SciPy 2024

Data Visualization with Vega-Altair
07-08, 08:00–12:00 (US/Pacific), Room 317

This tutorial is an introduction to data visualization using the popular Vega-Altair Python library. Vega-Altair provides a simple, friendly, and consistent API that supports rapid data exploration. Vega-Altair’s flexible support for interactivity enables the creation and sharing of beautiful interactive visualizations.

Participants will learn the foundational concepts that Vega-Altair is built on and will gain hands-on experience exploring a variety of datasets. Of particular interest to the scientific community, this tutorial will cover recent advancements in the Vega-Altair ecosystem that make it possible to scale visualizations to large datasets, and to easily export visualizations to static image formats for publication.


This tutorial is divided into four parts, each part consisting of ~30 minutes of instruction followed by ~30 minutes of hands-on exploration.

Part 1 - Data Types, Graphical Marks, and Visual Encoding Channels

(30 minutes instruction + 30 minutes exercise session)

The first part of the tutorial has two goals. First, it introduces the grammar-of-graphics approach to data visualization, in which variables (columns in a DataFrame) are encoded as visual properties of a chart (such as color or x-coordinate). Participants who have used Seaborn or Plotly Express in Python, or ggplot2 in R, will recognize this approach, but no previous experience with it is expected.

Second, it introduces the syntax of Vega-Altair: both the basic syntax needed to generate even the simplest chart and the syntax used to customize a chart beyond its default appearance. Version 5 of Vega-Altair, released in May 2023, introduced a method-based syntax that can be used for many chart customizations. It is more concise and more readable than the previous syntax (which is still supported), and the tutorial will primarily use it.

In the first exercise session, participants will have the chance to apply the concepts learned so far on a different dataset. Participants will be encouraged to experiment with some of the many chart types and customizations that were not already introduced.

Part 2 - Data Transformation

(30 minutes instruction + 30 minutes exercise session)

Often the data we want to display in a chart is not directly present in the original data source. Of course, one option would be to engineer this data using tools from a DataFrame library such as pandas or Polars, but often the same visualization can be produced directly within Vega-Altair from the original dataset. For example, a histogram could reasonably be produced in two fundamentally different ways: by using groupby in a DataFrame library, or by using binning and aggregation in Vega-Altair.

We will show how including data transformations within the chart specification enables faster iteration on chart designs, and thereby faster data exploration. A related concept is the notion of a faceted chart, in which data is again grouped together, but this time for the purpose of being displayed in separate subplots.

A frequent point of difficulty for newcomers to Vega-Altair is the default restriction that datasets contain no more than 5,000 rows. We will introduce options, provided by the VegaFusion library, for working efficiently in Vega-Altair with far larger datasets.

During the exercise session of Part 2, participants will be guided through some of the possibilities for transforming data within Vega-Altair, including making histograms and facet charts, as well as applying mathematical transformations to the data (again, directly within Vega-Altair, as opposed to externally with a DataFrame library).

Part 3 - Interactivity

(30 minutes instruction + 30 minutes exercise session)

One of the most exciting features of Vega-Altair is its ability to produce interactive charts. In fact, these interactive charts, once produced, are typically fully functional directly within a web browser, with no need for Python running in the background. We will explore some of the types of interactivity available, ranging from simple tooltips to complex multi-view charts.

We’ll also discuss Vega-Altair’s recently added Jupyter Widget integration that makes it possible to access interactive chart states from Python.

The exercise session will again provide participants with an opportunity to explore extensions of these concepts in a guided setting.

Part 4 - Next Steps: Sharing and Publishing Visualizations

(30 minutes instruction + 30 minutes exercise session)

After creating an effective data visualization, it’s important to be able to share it with other people. Vega-Altair supports a variety of export formats for sharing charts including interactive HTML files, self-contained URLs for opening charts in the online Vega editor, and static PNG, SVG, and PDF formats for use in traditional publications.

While dashboard toolkits will not be covered in this tutorial, we will provide recommendations for the toolkits that provide the best support for hosting Vega-Altair charts as components of a larger dashboard.

In the final exercise session of this tutorial, participants are encouraged to apply what they have learned to their own choice of dataset. (Alternatively, sample datasets will be provided.) As with all of the exercise sessions, the instructors will be on hand to assist with these explorations and to help troubleshoot.


Prerequisites

This tutorial assumes beginner-level proficiency with Python and pandas, but all concepts related to data visualization and Vega-Altair will be explained.

Installation Instructions

Vega-Altair is most frequently used in a notebook environment, such as Jupyter Notebook. Participants will have two options for following this tutorial. For participants who prefer to work from a local Python environment, a list of required PyPI packages (and their versions) will be provided before the tutorial, along with suggested installation instructions. Alternatively, the tutorial materials will be available online within Google Colab.

Jon is a visualization software engineer with past experience leading and contributing to a variety of open source Python visualization projects, including plotly.py, HoloViews, and Datashader. Jon is an active maintainer of Vega-Altair and the creator of VegaFusion (which scales Vega-Altair to large datasets) and VlConvert (which provides static image export of Vega-Altair visualizations). Jon has shared his experience through a variety of talks at past SciPy and PyData conferences. A full list of talks is available at https://jonmmease.dev/talks/.

Chris is a Professor of Teaching in the Math Department at the University of California, Irvine. Although his research background is in theoretical math, Chris has been teaching introductory programming courses in the math department since 2015. Chris first began contributing to Vega-Altair in 2021, and is currently one of the active maintainers of the library. Sample videos of Chris’s teaching are available at https://youtu.be/n61BNVCuTgM?si=ZYJLh73UgDCAXhZv (from 2013, on cryptography labs using the Sage mathematical software) and at https://youtu.be/Ph--xNiz3kM?si=qtLagdb0oFzHme1t (from a few years ago, on estimating probabilities using Matlab).