Darren Vengroff
Dr. Vengroff is a Computer Scientist with 20+ years of experience in Data Science, Machine Learning, Algorithms, and Software Development. He is the creator and principal maintainer of several open-source projects including censusdis and divintseg.
Dr. Vengroff has worked with organizations large and small, ranging from tech startups to the Bill and Melinda Gates Foundation, Microsoft, and Amazon. His recent work centers on metrics of diversity and integration (e.g. an interactive map of diversity and integration in the U.S.), and modeling techniques to identify systematic bias in areas including home valuation, eviction, and food accessibility. He holds a B.S.E. from Princeton University and an Sc.M. and Ph.D. from Brown University.
Dr. Vengroff's blog can be found at https://datapinions.com.
Sessions
The United States Census Bureau publishes over 1,600 data sets via its APIs. These are useful across a myriad of fields in the social sciences. In this interactive tutorial, attendees will learn how to use open-source Python tools to discover, download, analyze, and generate maps of U.S. Census data. The tutorial is full of practical examples and best practices to help participants avoid the tedium of data wrangling and concentrate on their research questions.
This hands-on tutorial will consider the full breadth and richness of data available from the U.S. Census. We will cover not only American Community Survey (ACS) and similarly well-known data sets, but also a number of data sets that are less well-known but nonetheless useful in a variety of research contexts.
The tutorial has no slides. Instead, it will be presented from a series of live Jupyter notebooks. After each lesson notebook is presented by the instructor, participants will be given a hands-on exercise to put what they just learned into practice. Essentially they will start with a research question and a blank notebook. Using what they just learned, they will then write the code to answer the question.
Lessons will start with the most basic queries and mapping and move through more advanced topics related to geographies, variables, groups and trees of related variables, and data set exploration.
After covering the concepts, the group as a whole will go through a complete end-to-end research example. Finally, individuals and small groups will have a chance to complete a series of short interactive exercises extending what they have learned and share the results with their peers.
All Python tooling used in the workshop is available as open-source software. Final versions of the notebooks used in the tutorial will also be released as open source.
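To give a flavor of the "most basic queries" mentioned above, here is a minimal sketch of how a request to the Census Bureau's public data API can be assembled by hand. It uses only the Python standard library rather than the tutorial's own tooling, whose interface is not described here. The helper name `build_census_query_url` and its parameters are hypothetical, chosen for illustration; the example asks for median household income (ACS variable `B19013_001E`) for every county in New Jersey (state FIPS code 34) from the 2020 ACS 5-year data set.

```python
from urllib.parse import urlencode

# Base URL of the Census Bureau's public data API.
CENSUS_API_BASE = "https://api.census.gov/data"


def build_census_query_url(vintage, dataset, variables, for_clause, in_clause=None):
    """Assemble a Census data API query URL (hypothetical helper, illustration only).

    vintage    -- year of the data set, e.g. 2020
    dataset    -- data set path, e.g. "acs/acs5"
    variables  -- list of variable names to download
    for_clause -- geography to return, e.g. "county:*"
    in_clause  -- optional enclosing geography, e.g. "state:34"
    """
    params = [("get", ",".join(variables)), ("for", for_clause)]
    if in_clause is not None:
        params.append(("in", in_clause))
    # Keep ',', ':' and '*' unescaped so the URL stays human-readable.
    query = urlencode(params, safe=",:*")
    return f"{CENSUS_API_BASE}/{vintage}/{dataset}?{query}"


# Median household income for all counties in New Jersey, 2020 ACS 5-year.
url = build_census_query_url(
    2020, "acs/acs5", ["NAME", "B19013_001E"], "county:*", "state:34"
)
print(url)
```

Fetching the resulting URL returns JSON rows that still need to be parsed, typed, and joined to map geometry, which is the kind of data wrangling the tutorial's tools are meant to take off participants' hands.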
Impact charts, as implemented in the impactchart package, make it easy to take a data set and visualize the impact of one variable on another in ways that techniques like scatter plots and linear regression can't, especially when there are other variables involved.
In this talk, we will introduce impact charts, demonstrate how they find easter-egg impacts we embed in synthetic data, show how they can find hidden impacts in a real-world use case, show how you can create your first impact chart with just a few lines of code, and finally talk a bit about the interpretable machine learning techniques they are built upon.
Impact charts are primarily visual, so this talk will be too.