SciPy 2024

An Introduction to Impact Charts
07-10, 16:05–16:35 (US/Pacific), Room 316

Impact charts, as implemented in the impactchart package,
make it easy to take a data set and visualize the impact of one variable
on another in ways that techniques like scatter plots and linear regression can't,
especially when there are other variables involved.

In this talk, we will introduce impact charts, demonstrate how they find easter-egg impacts
we embed in synthetic data, show how they can find hidden impacts in a real-world use case,
show how you can create your first impact chart with just a few lines of code,
and finally talk a bit about the interpretable machine learning techniques they are built upon.

Impact charts are primarily visual, so this talk will be too.


If you are a data scientist who regularly does exploratory data analysis on new data sets, impact
charts are for you. If you are a social scientist or quantitative policy maker who regularly muddles
through regression models to try to find out how one variable impacts another, impact charts are
for you too.

Impact charts, as implemented in the impactchart package,
make it easy to take a data set and visualize the impact of one variable
on another. Impact charts are easy to generate, easy to visually parse and understand,
and don't require any a priori parametric hypotheses to uncover impact.

In this talk, we will

  • introduce impact charts with some visual examples;
  • demonstrate exactly what impact charts do by showing how they can find easter-egg impacts we embed in synthetic data;
  • show how impact charts were used to find hidden impacts in a real-world use case involving race, ethnicity, income, and eviction;
  • show how you can create your first impact chart on top of your own data set with just a few lines of code, using the
    impactchart package;
  • talk a bit (but without any gory mathematical details) about the interpretable machine learning techniques impact charts
    are built upon, and how and why they work.

Only a basic understanding of data analysis tools and techniques (e.g. Python 3.x and pandas) is needed
to follow this talk. It will be mostly visual, because visual understanding of the impact of one variable
on another is what impact charts are all about.

Dr. Vengroff is a Computer Scientist with 20+ years of experience in Data Science, Machine Learning, Algorithms, and Software Development. He is the creator and principal maintainer of several open-source projects including censusdis and divintseg.

Dr. Vengroff has worked with organizations large and small, ranging from tech startups to the Bill and Melinda Gates Foundation, Microsoft, and Amazon. His recent work centers on metrics of diversity and integration (e.g. an interactive map of diversity and integration in the U.S.), and modeling techniques to identify systematic bias in areas including home valuation, eviction, and food accessibility. He holds a B.S.E. from Princeton University and an Sc.M. and Ph.D. from Brown University.

Dr. Vengroff's blog can be found at https://datapinions.com.
