Explore Your Geospatial Impact Using Google Maps Data
Uncovering your digital footprint as you traverse the physical world is an intriguing notion. Your smartphone, often perceived as a harmless gadget, is actually a powerful geospatial tracker that diligently records your movements.
If you use an Android device, your travel history may be richer than you realize, offering a fascinating glimpse into your daily activities.
How does this happen?
The Google Location History feature on your Android phone continually logs your GPS coordinates and retains them. This geospatial information encompasses not just your location, but also details about the routes taken, modes of transportation, and time spent at various places.
So, how can we tap into this wealth of data for analysis? Let’s explore together!
First, we need to obtain your Google Maps data.
Right?
Google’s Takeout service allows you to download data from its products, including your Google Maps location history. You can access it directly via:
https://takeout.google.com/settings/takeout
You should see an interface similar to the one below:
Here, you can request a comprehensive copy of your data or select specific items. For our purposes, scroll down the list of Google products and select only the Location History option.
Once that’s done, proceed to the next step.
Here, you can select your preferred option. I typically choose “Export Once.”
Now, be patient while your selected data is compiled.
When your data file is ready, you will receive an email inviting you to download it, with a specified expiration date for security reasons.
After following the download instructions, you will have a .zip file that begins with the word “takeout.”
Upon unzipping the file, you will find a folder labeled “Semantic Location History.”
This folder is called Semantic Location History because it contains higher-level, processed data rather than raw location fixes.
This semantic data reflects the information available on the Timeline pages of Google Maps, enriched with additional layers of analysis.
If you're looking for raw data, I haven’t found a way to retrieve it as of yet... please share any alternatives you know in the comments!
Now, let’s dive into the fun part!
The folder comprises multiple subfolders for each year. Personally, I began using a smartphone back in 2013 when I was 15 years old.
Each yearly folder contains a JSON file for every month.
What might be my main challenge?
I transitioned from Android almost two years ago, in 2021... I’m curious about how this affects my Google Maps location history!
Let’s upload this data into our Python environment and start exploring. The data is in JSON format, a commonly used data interchange format that’s manageable in Python.
Libraries such as pandas and json are excellent for loading and manipulating this kind of information.
But first, let’s take a look at the structure of these JSON files.
I typically use an Online JSON Viewer to examine JSON structures.
You can paste your JSON structure into the tool and explore its format in the Viewer window.
Each JSON file, corresponding to a single month, contains all relevant information under the timelineObjects field. Upon further inspection, we can identify two main types of records:
- Activity segment: This includes paths between different locations and various modes of travel (car, walking, biking, etc.)
- Place Visit: This records all locations visited.
This indicates that we will generate two primary datasets: separate tables for placeVisit records and activitySegment records.
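To make the two record types concrete, here is a simplified, made-up illustration of one month's file (field names follow the Takeout export I received; the exact fields can vary between export versions):

```json
{
  "timelineObjects": [
    {
      "activitySegment": {
        "startLocation": { "latitudeE7": 414000000, "longitudeE7": 21500000 },
        "endLocation":   { "latitudeE7": 413900000, "longitudeE7": 21700000 },
        "duration": { "startTimestamp": "2019-05-01T08:10:00Z",
                      "endTimestamp":   "2019-05-01T08:45:00Z" },
        "distance": 4200,
        "activityType": "IN_PASSENGER_VEHICLE"
      }
    },
    {
      "placeVisit": {
        "location": { "latitudeE7": 413900000, "longitudeE7": 21700000,
                      "name": "Example Café", "address": "Barcelona, Spain" },
        "duration": { "startTimestamp": "2019-05-01T08:45:00Z",
                      "endTimestamp":   "2019-05-01T10:00:00Z" }
      }
    }
  ]
}
```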
Let’s break down the process into distinct steps:
#1. Loading Data into Python
We can easily load the data using the following approach:
- Iterate through each folder — representing a specific year — within the “Semantic Location History.”
- Iterate through each file — representing a specific month — in each folder.
- Append the data to an empty list.
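The loop above can be sketched roughly as follows (the function name and root path are my own; adjust the path to wherever you unzipped the export). I also keep each file's name alongside its contents, which will come in handy later:

```python
import json
from pathlib import Path

def load_semantic_history(root):
    """Collect every monthly JSON file under the yearly subfolders."""
    records = []
    for year_dir in sorted(Path(root).iterdir()):           # one folder per year
        if not year_dir.is_dir():
            continue
        for month_file in sorted(year_dir.glob("*.json")):  # one file per month
            with open(month_file, encoding="utf-8") as f:
                data = json.load(f)
            records.append((month_file.name, data))
    return records

# json_list = load_semantic_history("Takeout/Location History/Semantic Location History")
```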
With this method, we end up with one long list holding the parsed contents of every JSON file.
I know what you’re thinking... how do we work with such a massive list?
Next, we need to convert it into a more usable format.
#2. Creating DataFrames
As mentioned earlier, the Google Maps data comprises many details, but for our analysis we’ll focus on constructing two primary tables: VisitedPlaces and ActivitySegments.
- VisitedPlaces will encompass details about the locations you've visited, including name, address, latitude, longitude, and duration of the visit.
- ActivitySegments will contain data about your journeys between places, including start and end times, distance, duration, and mode of transport.
We will iterate through the list of JSON files, appending activitySegment records to one list and placeVisit records to another.
To keep track of which file each record comes from, we’ll create two additional lists: activitySegment_month_list and placeVisit_month_list, where we will store the original JSON file names containing each record.
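A minimal sketch of that splitting step, assuming `json_list` holds `(file_name, parsed_json)` pairs as built during loading:

```python
def split_records(json_list):
    """Separate activitySegment and placeVisit records, remembering each record's source file."""
    activitySegment_list, placeVisit_list = [], []
    activitySegment_month_list, placeVisit_month_list = [], []
    for file_name, data in json_list:
        for obj in data.get("timelineObjects", []):
            if "activitySegment" in obj:
                activitySegment_list.append(obj["activitySegment"])
                activitySegment_month_list.append(file_name)
            elif "placeVisit" in obj:
                placeVisit_list.append(obj["placeVisit"])
                placeVisit_month_list.append(file_name)
    return (activitySegment_list, activitySegment_month_list,
            placeVisit_list, placeVisit_month_list)
```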
After executing this code, we will have two distinct lists of JSON data.
What’s next?
We can utilize json_normalize to convert our data into a DataFrame.
The json_normalize() function from pandas flattens JSON objects into DataFrames, accommodating simple dictionaries, lists of dictionaries, and nested JSON structures.
In our case, as we’re working with a JSON list, we can easily normalize it using the command:
df = pd.json_normalize(json_list)
Now we will normalize both DataFrames and add an additional column indicating the specific JSON file for each record, which we’ll label “source_file.”
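As a sketch, with a single made-up placeVisit record (nested keys become dotted column names such as `location.name`):

```python
import pandas as pd

# Hypothetical records shaped like the export's placeVisit entries.
placeVisit_list = [
    {"location": {"latitudeE7": 413900000, "longitudeE7": 21700000,
                  "name": "Example Café"},
     "duration": {"startTimestamp": "2019-05-01T08:45:00Z",
                  "endTimestamp": "2019-05-01T10:00:00Z"}},
]
placeVisit_month_list = ["2019_MAY.json"]

places_df = pd.json_normalize(placeVisit_list)    # flatten nested keys into columns
places_df["source_file"] = placeVisit_month_list  # remember where each row came from
```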
This results in a pandas DataFrame that looks like this:
Surprisingly, I have only 6,508 location records and 7,284 activity records.
That’s quite low, isn’t it? I expected more!
Is that all?
Not at all!
Let’s take a closer look at the placeVisit DataFrame using the .info() command. I’ll display only the first 32 columns for clarity.
As we can see, many columns contain numerous null values, indicating that we should be cautious about which columns we include — as they might fill our table with null entries!
This leads us to our next step!
After loading the data, cleaning it is the next logical phase.
This is a critical stage in any data science project. Let’s proceed!
#3. Data Cleaning
First and foremost, we don’t need all the data contained in the JSON.
That’s why we will retain only those columns that contribute to our analysis and rename them for clarity.
Let’s focus on the visitedPlaces DataFrame.
First, I’ll select the necessary data and then rename the columns of interest.
Following the initial steps, we implement several key enhancements:
- Firstly, we rescale the latitude and longitude values: the exported latitudeE7 and longitudeE7 fields store degrees multiplied by 10^7, so we divide them by 10,000,000.
- Next, we convert datetime variables into the numpy datetime format for efficient manipulation.
- Finally, we create a new variable ‘duration’, which measures how long we remain stationary at a specific location.
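A sketch of those three cleaning steps, assuming the dotted column names that `json_normalize` produces (the renamed column names are my own choices):

```python
import pandas as pd

def clean_places(places_df):
    """Select, rename, and clean the placeVisit columns as described above."""
    df = places_df.rename(columns={
        "location.latitudeE7": "latitude",
        "location.longitudeE7": "longitude",
        "location.name": "name",
        "duration.startTimestamp": "start_time",
        "duration.endTimestamp": "end_time",
    })[["latitude", "longitude", "name", "start_time", "end_time"]]
    # 1. The E7 fields store degrees multiplied by 1e7.
    df["latitude"] = df["latitude"] / 1e7
    df["longitude"] = df["longitude"] / 1e7
    # 2. Parse the ISO timestamps into datetime64 values.
    df["start_time"] = pd.to_datetime(df["start_time"])
    df["end_time"] = pd.to_datetime(df["end_time"])
    # 3. How long we stayed at each place.
    df["duration"] = df["end_time"] - df["start_time"]
    return df
```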
And voilà, we have our cleaned and ready-to-use DataFrame!
All right, everyone, we now have our data primed for analysis.
However, before wrapping up this article, let’s take a closer look at the shiny new dataset we’ve worked diligently to clean and organize. More articles will follow to delve deeper into this data.
To start, let’s assess how many records we have.
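A quick way to count records per year, shown here on a made-up `start_time` column from the cleaned DataFrame:

```python
import pandas as pd

# Hypothetical cleaned DataFrame with a datetime 'start_time' column.
places_df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-05-01T08:45:00Z", "2019-06-02T12:00:00Z", "2020-01-15T09:30:00Z",
    ]),
})

# Number of place visits per year.
yearly_counts = places_df.groupby(places_df["start_time"].dt.year).size()
```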
It’s fascinating to note that there’s virtually no activity recorded from 2013 to 2017; 2015 has no records whatsoever!
Fast forward to 2019 and 2020 — a bounty of data emerged during these years.
Lastly, we have 2021 as the closing year of records — a bittersweet note, marking my switch to the iPhone. Here’s to new beginnings!
When we analyze this week by week, we see the same trends, along with the dramatic impact of the Covid lockdown on my mobility (it nearly dropped to zero!).
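The weekly view can be produced with pandas’ `resample`, sketched here on made-up timestamps:

```python
import pandas as pd

places_df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2020-03-02T09:00:00Z", "2020-03-03T18:00:00Z", "2020-03-20T11:00:00Z",
    ]),
})

# Count place visits in weekly bins (weeks with no visits show up as zero).
weekly_counts = (places_df.set_index("start_time")
                          .resample("W")
                          .size())
```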
And finally, here’s a map showcasing all my locations worldwide!
(My two primary areas of concentration are around Barcelona — where I currently reside — and Taiwan, where I have lived previously!)
# Main Conclusions
To sum it up, your Google Maps location history can be a valuable dataset for personal data analysis.
By retrieving and analyzing this information, you can uncover significant insights into your behaviors and routines.
While this can be an enjoyable and enlightening personal project, it also serves as a reminder of the extensive digital footprints we leave behind.
It’s essential to remember that while our smartphones are potent data collection tools, it’s up to us to interpret and utilize this data responsibly.
For instance, you could create a heatmap of your most frequently visited locations. This could provide an insightful glimpse into your routines and preferences.
Moreover, by merging external datasets like weather or events, you can assess how outside factors influence your movements.
Happy analyzing!
If you have any questions, feel free to comment!
You can find the full code from this article on my GitHub account.
Stay updated by subscribing to my Medium Newsletter for unique content!
If you haven’t yet joined Medium, consider checking it out to support me and other writers. It truly helps! :D
Connect with me on Twitter and LinkedIn as well!