SciPy 2023

Small Town Police Accountability: A Data Science Toolkit
07-14, 10:45–11:15 (America/Chicago), Grand Salon C

In this talk, we will share a Python library for obtaining and analyzing policing data, developed in conjunction with community activists, data scientists, social scientists, and the Small Town Police Accountability (SToPA) Research Lab. We will showcase components of the SToPA library that use Python tools such as web drivers, optical character recognition, geospatial mapping, machine learning, and statistical sampling to better understand the policing landscape. The goal of this work is to present an easily replicable framework for analyzing police and community interactions, with accessible on-ramps for activists, developers, and researchers.


Recent years have highlighted the urgent need for transparency and accountability within police departments across the United States. Large cities typically have access to policing data, as well as the resources to analyze and interrogate that data to hold authorities accountable. Small towns face the same injustices at the hands of police, but these issues receive comparatively little attention, in part because of a lack of resources and tools to investigate the data. Additional challenges arise from the limited clarity and consistency of whatever data is available. Consequently, residents of these towns are generally unable to take data-informed action toward social justice. The overarching goal of the Small Town Police Accountability (SToPA) Research Lab is to create an adaptable tool that enables small-town residents to analyze police actions and thereby increase transparency and accountability.

This talk will introduce the interdisciplinary work of the research group to (1) obtain data through digital portals and records requests, (2) create a flexible, scaffolded software toolkit for organizing and analyzing police data for users with varying levels of technical expertise, and (3) use data-driven modeling tools to uncover potential patterns and anomalies in data from select small towns, serving as a template for investigations elsewhere. The SToPA toolkit consists of a range of components, including instructions for data gathering; adaptable tools for reading, cleaning, and organizing data; and machine learning applications to analyze and understand patterns in policing.

Drawing on case studies of a handful of small towns, the SToPA toolkit provides a broadly applicable methodology for reading and parsing police data. Where data is available online in a somewhat structured format, the SToPA library offers tools for web crawling and scraping. In other cases, data is available only as printed physical copies, which requires digitization, text identification with tools such as PyTesseract, word-level data cleaning, and accuracy testing. This pipeline uses user-defined, non-standard language dictionaries (such as a list of town-specific locations), geometric methods for detecting word locations, regular expressions, and fuzzy string matching.
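As a rough illustration of the word-level cleaning step, the sketch below combines a regular expression with fuzzy matching against a town-specific dictionary. It is not taken from the SToPA library: the log-line layout, the `TOWN_LOCATIONS` list, the 0.8 cutoff, and the function names `parse_log_line` and `correct_location` are all hypothetical, and Python's standard-library `difflib.get_close_matches` stands in for whatever fuzzy matcher the toolkit actually uses.

```python
import re
from difflib import get_close_matches

# Hypothetical town-specific dictionary of valid location names.
TOWN_LOCATIONS = ["MAIN ST", "ELM ST", "RIVERSIDE PARK", "CITY HALL"]

# Hypothetical log-line layout: "HH:MM <incident number> <location>".
LOG_LINE = re.compile(
    r"^(?P<time>\d{1,2}:\d{2})\s+(?P<incident>\d{2}-\d{4,})\s+(?P<location>.+)$"
)


def correct_location(raw, vocabulary=TOWN_LOCATIONS, cutoff=0.8):
    """Snap an OCR-garbled location to the closest dictionary entry, or None."""
    matches = get_close_matches(raw.upper().strip(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def parse_log_line(line):
    """Split one OCR'd log line into named fields; None if the layout doesn't match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None  # Leave unmatched lines for manual review rather than guessing.
    record = m.groupdict()
    # Keep the raw OCR text alongside the dictionary-corrected value.
    record["location_clean"] = correct_location(record["location"])
    return record


if __name__ == "__main__":
    print(parse_log_line("14:32 23-0117 MA1N ST"))
```

For instance, `parse_log_line("14:32 23-0117 MA1N ST")` returns the time `14:32`, the incident `23-0117`, and a corrected location of `MAIN ST`, because the OCR misread of `I` as `1` still scores about 0.86 similarity against the dictionary entry. Raising the cutoff trades fewer false corrections for more unmatched entries that need manual review.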

After data is collected, cleaned, and structured, a second thrust of the SToPA lab is to analyze police interactions with machine learning and statistical tools. A diverse set of policing data, including dates, locations, names, and free-text narratives, offers rich opportunities for exploratory analysis and modeling. Interactive maps built with various mapping and plotting libraries reveal location-based patterns. Town-specific data from the US Census allows demographic comparisons between how residents are distributed and how they are policed. This analysis is further refined using statistical sampling and inference tools such as scikit-learn and PyEI. Narrative text data, consisting of unstructured language across thousands of reports, was also analyzed with natural language processing techniques such as topic modeling.
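One simple way to frame the census comparison is to set each group's share of police stops against its share of the town's population. The sketch below does this with the standard library only; the function name `disparity_table`, the group labels, and the counts in the example are invented for illustration and are not SToPA data or the lab's method. It also computes a Pearson chi-square goodness-of-fit statistic, treating population shares as the expected distribution of stops. Residential population is a crude benchmark (people stopped in a town may not live there), so a ratio like this is a starting point rather than a conclusion.

```python
def disparity_table(stops_by_group, population_by_group):
    """Compare each group's share of stops with its share of the population.

    Both arguments map a group label to a count and should use the same labels.
    Groups are taken from population_by_group; each needs a nonzero population.
    Returns (per-group table, Pearson chi-square statistic).
    """
    total_stops = sum(stops_by_group.values())
    total_pop = sum(population_by_group.values())
    table = {}
    chi_square = 0.0
    for group, pop in population_by_group.items():
        observed = stops_by_group.get(group, 0)
        pop_share = pop / total_pop
        stop_share = observed / total_stops
        # Expected stops if stops were distributed like the resident population.
        expected = pop_share * total_stops
        chi_square += (observed - expected) ** 2 / expected
        table[group] = {
            "population_share": pop_share,
            "stop_share": stop_share,
            # Above 1: the group is stopped more often than its population share suggests.
            "disparity_ratio": stop_share / pop_share,
        }
    return table, chi_square


if __name__ == "__main__":
    # Invented example counts, for illustration only.
    population = {"Group A": 800, "Group B": 200}
    stops = {"Group A": 60, "Group B": 40}
    table, chi2 = disparity_table(stops, population)
    for group, row in table.items():
        print(group, {k: round(v, 2) for k, v in row.items()})
    print("chi-square:", round(chi2, 2))
```

With these invented counts, Group B makes up 20% of the population but 40% of stops, a disparity ratio of 2.0, and the chi-square statistic is 25.0 on one degree of freedom. SciPy's `scipy.stats.chisquare` can turn the same observed and expected counts into a p-value.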

This talk aims to be accessible to a diverse audience and to empower and inspire others to contribute to the growing SToPA repository: https://qsideinstitute.github.io/SToPA/

Ariana Mendible is an assistant professor at Seattle University, where she teaches and uses data science to approach social justice research problems.