SciPy 2023

Using Linear Tracking Data to Estimate Backcountry Recreation Popularity
07-14, 11:25–11:55 (America/Chicago), Grand Salon C

Geolocated data from smartphone apps are a well-established resource for research. While most of these data come as points (e.g., geotagged photos), a growing number of apps collect linear data from users' activities (e.g., running, hiking, off-road driving). Using established ecological methods, shallow machine learning packages, and multiprocessing, we demonstrate a novel approach that uses mobile app data to estimate backcountry recreation popularity at multiple scales. The topics covered include normalizing and thinning coordinate data, merging linear data from multiple sources, and accounting for spatial bias while preserving the integrity of the original data.


Official sources (typically governments) provide the cleanest and most trustworthy data. Decades of established standards and years of archived records provide a framework for reliable data collection, making these sources a strong foundation for research. The drawback of official centralized sources is that they often focus on the macro level, so their data tends to be lower resolution, leaving broad areas of obscurity at the micro level.

To geospatially estimate and represent backcountry recreation habits from the statewide level down to a square-mile grid, our team needed high-resolution datasets.

Social media data is high-resolution, dense, and valuable, which often leads companies to limit access to it. App downloads and active user counts fluctuate with the market, and the long-term utility of an app's data is not guaranteed. Despite these limitations, social media offers significantly higher-resolution data than official sources. We will discuss the methods we developed to overcome the unique challenges of processing and standardizing social media data so that it complements and informs official datasets.

We will cover acquiring data from multiple apps using modular methods that can be applied to new apps as older ones become obsolete. Apps rarely offer identical metrics, so we developed a flexible approach that translates different metrics into a standardized form. It is also important that a model addresses the inherent unknowns that lie beyond each app's user base.

Our use case relies on linear geolocation data gathered from mobile tracking apps. The data comes in the form of GeoJSON coordinates, Google Earth polylines, and shapefiles. We will discuss the specific packages used to read each format and store the results in a common format.
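As a rough illustration of what "a common format" could look like, the sketch below reads line geometries from GeoJSON and from Google Earth KML into plain lists of (lon, lat) tuples. It is not the talk's actual pipeline: it uses only the Python standard library, the function names `lines_from_geojson` and `lines_from_kml` are hypothetical, and shapefiles are left out because they are a binary format that needs a third-party reader.

```python
import json
import xml.etree.ElementTree as ET


def lines_from_geojson(text):
    """Return a list of lines, each a list of (lon, lat) tuples."""
    data = json.loads(text)
    # Accept a FeatureCollection, a single Feature, or a bare geometry.
    if data.get("type") == "FeatureCollection":
        geometries = [feature["geometry"] for feature in data["features"]]
    elif data.get("type") == "Feature":
        geometries = [data["geometry"]]
    else:
        geometries = [data]

    lines = []
    for geom in geometries:
        if geom is None:
            continue
        if geom["type"] == "LineString":
            parts = [geom["coordinates"]]
        elif geom["type"] == "MultiLineString":
            parts = geom["coordinates"]
        else:
            continue  # points and polygons are ignored in this sketch
        for part in parts:
            # Drop any optional elevation value; keep lon/lat only.
            lines.append([(float(c[0]), float(c[1])) for c in part])
    return lines


def _local_name(tag):
    """Strip an XML namespace such as '{http://...}LineString'."""
    return tag.rsplit("}", 1)[-1]


def lines_from_kml(text):
    """Return a list of lines, each a list of (lon, lat) tuples."""
    root = ET.fromstring(text)
    lines = []
    for elem in root.iter():
        if _local_name(elem.tag) != "LineString":
            continue
        for child in elem.iter():
            if _local_name(child.tag) == "coordinates" and child.text:
                # KML stores whitespace-separated "lon,lat[,alt]" tuples.
                points = []
                for token in child.text.split():
                    lon, lat = token.split(",")[:2]
                    points.append((float(lon), float(lat)))
                lines.append(points)
    return lines
```

Keeping every source as the same list-of-tuples shape means the later resampling and thinning steps do not need to know which app a line came from.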

Linear data brings unique challenges relative to point and polygon data. As a first step toward standardizing the linear data, we will describe how we used Python to redistribute points along line segments while maintaining a minimum distance between segment vertices. We then needed to "thin" the data to minimize spatial bias caused by overlapping line segments and circuitous routes. We will explain why we did not average or interpolate new data points between datasets and instead used a method that preserves the integrity of the original geolocation data.
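One way to picture the redistribution step is a fixed-spacing resampler, sketched below. This is an illustrative assumption rather than the method presented in the talk: the function name `resample_line` is hypothetical, coordinates are assumed to be in a projected (planar) system with units such as meters, and spacing is measured along the path, so vertices on either side of a sharp bend can end up closer together in a straight line than `spacing`.

```python
import math


def resample_line(points, spacing):
    """Place vertices every `spacing` units along a polyline.

    `points` is a list of (x, y) tuples in planar coordinates.
    The first vertex is always kept; the last one is kept only if it
    happens to fall exactly on a multiple of `spacing`.
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    if len(points) < 2:
        return list(points)

    out = [points[0]]
    next_at = spacing   # path distance at which the next vertex is placed
    travelled = 0.0     # path distance at the start of the current segment

    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        # Emit every vertex whose path distance falls inside this segment.
        while seg > 0 and next_at <= travelled + seg:
            t = (next_at - travelled) / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            next_at += spacing
        travelled += seg
    return out
```

For example, `resample_line([(0, 0), (10, 0)], 4)` places vertices at 0, 4, and 8 units along the line, regardless of how densely the app originally recorded the track.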

We will explain how multiprocessing and nonlinear data structures were used to process large numbers of vertices when we aggregated all datasets and ran the thinning algorithm on the combined points. The goal was to create an overall presence dataset that represents recreation across the state with minimal spatial bias.
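The sketch below shows one possible form of distance-based thinning that keeps only original points and never creates new ones. It is an assumption for illustration, not the talk's algorithm: the name `thin_points` is hypothetical, coordinates are assumed planar, and a dictionary of grid cells stands in for whatever nonlinear structure the authors used. Multiprocessing is omitted to keep the example short.

```python
import math


def thin_points(points, min_dist):
    """Return indices of points kept so no two kept points are closer
    than `min_dist`. Earlier points win over later ones.

    `points` is a list of (x, y) tuples in planar coordinates.
    """
    if min_dist <= 0:
        raise ValueError("min_dist must be positive")

    # Hash grid: (col, row) -> kept points in that cell. With a cell size
    # equal to min_dist, any conflicting point must sit in the same cell
    # or one of its 8 neighbors, so each lookup stays small.
    grid = {}
    kept = []

    for idx, (x, y) in enumerate(points):
        col = math.floor(x / min_dist)
        row = math.floor(y / min_dist)

        too_close = any(
            math.hypot(x - kx, y - ky) < min_dist
            for dc in (-1, 0, 1)
            for dr in (-1, 0, 1)
            for (kx, ky) in grid.get((col + dc, row + dr), ())
        )
        if not too_close:
            grid.setdefault((col, row), []).append((x, y))
            kept.append(idx)
    return kept
```

Returning indices rather than coordinates keeps a link from every surviving point to its source row, and avoiding a full pairwise comparison is what makes this kind of approach workable on large combined datasets.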

We will also review the resulting data structure: coordinate pairs, aggregated metrics, and IDs that point back to rows from the original datasets gathered from each app.
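A record with those three parts might look like the following sketch. The class name `PresencePoint`, its field names, the `absorb` method, and the choice to aggregate metrics by summing are all assumptions made for illustration; the talk does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class PresencePoint:
    """One thinned location plus everything that was merged into it."""
    lon: float
    lat: float
    # Standardized metric name -> aggregated value (summed in this sketch).
    metrics: dict = field(default_factory=dict)
    # (app_name, row_id) pairs pointing back to the original datasets.
    sources: list = field(default_factory=list)

    def absorb(self, app_name, row_id, metrics):
        """Fold a nearby original record into this point.

        The coordinates are left untouched so the point stays an
        original observation rather than an average.
        """
        self.sources.append((app_name, row_id))
        for name, value in metrics.items():
            self.metrics[name] = self.metrics.get(name, 0) + value
```

A short usage example with made-up values:

```python
p = PresencePoint(lon=-111.6, lat=35.2)
p.absorb("app_a", 17, {"visits": 3})
p.absorb("app_b", 4, {"visits": 2})
# p.metrics now holds the summed "visits"; p.sources lists both origins.
```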

The presentation will conclude with a summary of how the resulting presence data was processed with the MaxEnt ecological model to inform and supplement official data sources and to give the state of Arizona a clearer picture of recreation.

Vince is a data science graduate student with a background in biology. He lives in a small mountain town, where he works on a research team that estimates and quantifies the risks abandoned hard-rock mines pose to the population of Arizona.