Understanding Medium's Story Analytics: A Comprehensive Guide
Chapter 1: My Journey with Medium
Approximately ten months ago, I embarked on a writing journey on Medium, primarily to enhance my writing skills and maintain a record of my learnings. Initially, I paid little attention to the statistics of my articles until I noticed a spike in views for two specific stories, one about Meta's cache consistency and one about Twitter's event processing. This prompted me to monitor story statistics more closely and analyze various metrics.
As I contemplated my next writing topic, the idea of creating a blog post about how Medium’s analytics operate emerged. This led me to conduct thorough research into event publishing in systems and data analysis methodologies.
Section 1.1: Exploring Medium's Event Publishing
In this blog, I will delve into several key areas:
- The process of how events are transmitted while reading stories on Medium, which we will observe using developer tools.
- The potential backend architecture responsible for processing these events.
- The queries dispatched to retrieve data for the statistics page.
The focus will be on user-level monthly statistics and metrics for individual stories. Below, you'll find a screenshot illustrating the features I intend to cover.
Section 1.2: Understanding Story-Level Statistics
This blog will emphasize a crucial aspect that is vital for any platform: event processing and data analysis. We will examine some intriguing design choices that Medium may have implemented or could consider in the future. Please remember, this blog reflects my interpretation of how this system may be structured.
As you read any story, Medium actively tracks various interactions. Most large companies rely heavily on analytics, tracking everything from engagement metrics to business-critical events, and Medium, one of the premier blogging platforms, is no exception. To observe this functionality in practice, follow these steps:
- Open any Medium post.
- Access the developer tools in your browser and navigate to the network tab.
- Select Fetch/XHR and review the batch operations. You'll notice that events are triggered as you interact with the page.
Chapter 2: Analyzing Event Payloads
When we scrutinize the payload sent, it appears as follows:
```json
[
  {
    "key": "post.streamScrolled",
    "data": {
      "postIds": ["621b3456c9dc"],
      "collectionIds": [""],
      "sequenceIds": [""],
      "sources": ["post_page"],
      "tops": [101],
      "bottoms": [11066],
      "areFullPosts": [true],
      "loggedAt": 1712237305802,
      "timeDiff": 1001,
      "scrollTop": 665,
      "scrollBottom": 1662,
      "scrollableHeight": 14941,
      "viewStartedAt": 1712237301647,
      "service": "lite",
      "browserWidth": 1114,
      "referrerSource": "your_stories_page"
    },
    "type": "e",
    "timestamp": 1712237305802,
    "eventId": "lul9vsnejbnsgug6yt"
  }
]
```
This payload indicates that as we scroll through a story, various events are sent to the backend, providing valuable information such as the event type ("key") and the timestamp of the scroll action.
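To make the payload concrete, here is a short sketch that parses the captured event and derives two metrics a stats pipeline could compute from it. The field names come from the payload above; the derived "read depth" and "dwell time" metrics are my own interpretation, not something Medium has documented.

```python
import json

# A trimmed copy of the "post.streamScrolled" event captured in the network tab.
event_json = """
[{"key": "post.streamScrolled",
  "data": {"postIds": ["621b3456c9dc"],
           "scrollTop": 665,
           "scrollBottom": 1662,
           "scrollableHeight": 14941,
           "loggedAt": 1712237305802,
           "viewStartedAt": 1712237301647},
  "type": "e",
  "timestamp": 1712237305802,
  "eventId": "lul9vsnejbnsgug6yt"}]
"""

events = json.loads(event_json)
data = events[0]["data"]

# How far down the story the reader has scrolled, as a percentage.
read_depth = data["scrollBottom"] / data["scrollableHeight"] * 100

# How long the story has been open so far, in seconds.
dwell_seconds = (data["loggedAt"] - data["viewStartedAt"]) / 1000

print(f"read depth: {read_depth:.1f}%, dwell time: {dwell_seconds:.2f}s")
```

From this single event, the backend can already tell the reader was roughly 11% of the way through the story about four seconds after opening it.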
Section 2.1: Backend Processing of Events
Upon receiving events in the backend, several actions must occur:
- Store the event in a data store.
- Execute processing jobs to analyze the event data.
- Ensure that the user's monthly statistics can be quickly accessed.
For durable storage, systems like HDFS or managed services such as Amazon S3 and Amazon Redshift are commonly used. Once the data is collected, processing jobs are executed to analyze the events and store the results in a database. The choice of database technology becomes crucial here.
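The batch-processing step above can be sketched as a simple roll-up job. This is a minimal illustration, assuming each raw event carries the post author's ID and a millisecond timestamp; in a real pipeline the author would likely be joined in from a posts table, and the job would run on a framework like Spark rather than in-memory Python.

```python
from collections import defaultdict
from datetime import datetime, timezone

def aggregate_monthly_views(events):
    """Roll raw view events up into per-author monthly view counts."""
    monthly = defaultdict(int)
    for ev in events:
        # Convert the millisecond epoch timestamp into a UTC "YYYY-MM" bucket.
        ts = datetime.fromtimestamp(ev["timestamp"] / 1000, tz=timezone.utc)
        monthly[(ev["authorId"], ts.strftime("%Y-%m"))] += 1
    return dict(monthly)

# Hypothetical raw events, e.g. read from S3/HDFS by a scheduled job.
raw = [
    {"authorId": "u1", "postId": "p1", "timestamp": 1712237305802},  # Apr 2024
    {"authorId": "u1", "postId": "p2", "timestamp": 1712937305802},  # Apr 2024
    {"authorId": "u2", "postId": "p3", "timestamp": 1709337305802},  # Mar 2024
]
print(aggregate_monthly_views(raw))
```

The aggregated counts would then be written to a serving database keyed by (author, month), which is exactly the shape the stats page needs for fast reads.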
To understand the data retrieval process, let's examine what happens when a stats page loads on Medium.
Chapter 3: Querying User Statistics
When the stats page is accessed, Medium sends a request to the backend to retrieve the user's monthly data. The query structure is as follows:
```graphql
query UserMonthlyStoryStatsTimeseriesQuery($username: ID!, $input: UserPostsAggregateStatsInput!) {
  user(username: $username) {
    id
    postsAggregateTimeseriesStats(input: $input) {
      __typename
      ... on AggregatePostTimeseriesStats {
        ...MonthlyStoryStats_aggregatePostTimeseriesStats
        __typename
      }
    }
    __typename
  }
}
```
This query retrieves detailed timeseries statistics regarding a user’s posts, focusing on metrics like views and readers. The input parameters specify the user and the time frame for the data requested.
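For illustration, here is what the request body sent alongside that query might look like. The operation name and the `username`/`input` variable names come from the query itself; the internal shape of `UserPostsAggregateStatsInput` is not public, so the `startTime`/`endTime` fields (millisecond epochs) and the handle are assumptions.

```python
import json

# Hypothetical GraphQL request body for the stats page query above.
request_body = {
    "operationName": "UserMonthlyStoryStatsTimeseriesQuery",
    "variables": {
        "username": "your-username",      # hypothetical handle
        "input": {
            "startTime": 1709251200000,   # 2024-03-01 UTC (assumed field)
            "endTime": 1711929599000,     # 2024-03-31 UTC (assumed field)
        },
    },
}

payload = json.dumps(request_body)
print(payload)
```

A POST of this payload to Medium's GraphQL endpoint would return the monthly timeseries that the stats page renders.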
In conclusion, this analysis of Medium's analytics architecture illustrates the critical role of event processing and data storage in providing valuable insights into user engagement. The system mirrors concepts from the batch layer of lambda architecture, although the speed layer is absent. Can you envision features that might be implemented using a speed layer?