SciPy 2023

Better (Open Source) Homes and Gardens with Project Pythia
07-14, 13:55–14:25 (America/Chicago), Amphitheater 204

As scientists continue to embrace the Jupyter ecosystem for constructing computational narratives of their science through code, data, and rich text, they may encounter technical and community barriers to maintaining and sharing their science with new and existing audiences. We demonstrate the value of building an open-source science community, and show how to get there by relying on the open-source Jupyter ecosystem, pre-packaged GitHub- and BinderHub-based infrastructure, and documentation for creating, sharing, testing, and maintaining Pythia Cookbooks for their computational narratives.


A “community garden” metaphor is particularly apt for a free- and open-source software project and its community. Enthusiasm, creativity, and openness serve both the SciPy conference and Albany, NY’s Tulip Festival well. But a “garden”, be it botanical or cyber, requires nurturing. Free- and open-source software offers bounteous examples of such nurturing: pull requests (PRs) are sown and merged, issues are resolved, and bugs are removed. Yet we also see signs of formerly fruitful repositories that have been left to languish: issues proliferate like weeds, bugs roam freely, and eventually the repos’ stars fade away. It is incumbent on the SciPy community to ensure that the projects we are invested in take the more fruitful path.
One such open-source “greenspace” is Project Pythia (hereafter Pythia). Now in its third year, Pythia extends Pangeo by providing an educational and training hub for the geoscientific Python community. It has three key components:
1. Foundations: The core geoscientific Python stack (a Jupyter Book)
2. Cookbooks: Advanced and domain-specific workflows (Jupyter Books)
3. Resource Gallery: Externally hosted geoscientific Python resources

Here we discuss Pythia’s infrastructure, which sustains the above components in a year-round “community garden”.

Pythia’s content is built upon an open stack of infrastructure for reproducibility and collaboration that provides for the care and nurturing of the community it serves. We have built a cloud-based publishing system upon Jupyter Book that automates notebook execution in a reproducible, curated environment. Users can interact with notebooks via Binder links, launching directly into an identical environment. The platform provides automated code- and link-checking, ensuring a rapid healing cycle. Collaboration happens through PRs, each of which triggers the same execution infrastructure and produces a rich preview.
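To make the link-checking idea concrete, here is a minimal sketch of its first step: collecting the URLs cited in a notebook's Markdown cells so that each can later be tested for reachability. This is an illustration only, not Pythia's actual implementation; the function name `extract_links` and the URL pattern are assumptions introduced here, and only the Python standard library is used.

```python
import json
import re

# Assumed pattern: http(s) URLs, stopping at whitespace and common closing delimiters.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def extract_links(notebook_json: str) -> list[str]:
    """Return the unique URLs found in a notebook's Markdown cells, in order.

    Hypothetical helper for illustration; it only gathers links and does
    not check whether they resolve.
    """
    notebook = json.loads(notebook_json)
    links: list[str] = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # code cells are skipped in this sketch
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of strings.
        text = "".join(source) if isinstance(source, list) else source
        for url in URL_PATTERN.findall(text):
            url = url.rstrip(".,;:")  # drop trailing sentence punctuation
            if url not in links:
                links.append(url)
    return links
```

A checker built on this would then request each collected URL and report the ones that fail, which is the part that lets broken links be repaired quickly.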

Our infrastructure relies on GitHub, which encourages open development via PRs. Pythia uses this process extensively for building and maintaining its “garden”, for both core-team and community contributions. GitHub’s focus on collaboration gives users a sense of ownership of whatever “garden” they choose to visit, and provides a path for others to visit and contribute.

GitHub Actions power Pythia’s automation of key steps in the notebook execution and publishing process. We periodically re-run the publication workflow as a health check for ongoing maintenance of the materials, and we also run it for new “plantings” that arrive via PRs. Pythia’s web portal displays the updated content, which users can download to try out and build on in their own computing environments, their “backyard gardens”.

A garden may need more powerful tools. While GitHub Actions may often suffice, real-world scientific workflows have compute and data requirements that exceed GitHub’s free resources. Pythia’s notebooks can also be executed on our dedicated cloud using BinderHub, which provides a way to execute notebooks within custom environments. Pythia’s workflows are able to validate and deploy results directly from execution on its BinderHub. The same BinderHub instance powers interactive user sessions, guaranteeing that users execute code in the same environment in which the rendered web pages were built.
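As a rough illustration of how a rendered page can point readers at an interactive session, the sketch below assembles a Binder launch link for a notebook in a GitHub repository. It assumes the `/v2/gh/<org>/<repo>/<ref>` URL layout with a `labpath` query parameter, as used by mybinder.org; the hub address, organization, and repository names in the usage example are hypothetical placeholders, not Pythia's actual deployment.

```python
from urllib.parse import quote, urlencode


def binder_launch_url(hub_url: str, org: str, repo: str, ref: str,
                      notebook_path: str) -> str:
    """Build a launch link that opens `notebook_path` in JupyterLab.

    Hypothetical helper; assumes the hub follows the mybinder.org
    URL layout for GitHub repositories.
    """
    base = hub_url.rstrip("/")
    # Percent-encode the ref so branch names containing "/" stay in one path segment.
    encoded_ref = quote(ref, safe="")
    query = urlencode({"labpath": notebook_path})
    return f"{base}/v2/gh/{org}/{repo}/{encoded_ref}?{query}"


# Usage with placeholder names (not a real Pythia repository):
print(binder_launch_url("https://mybinder.org", "example-org",
                        "example-cookbook", "main", "notebooks/intro.ipynb"))
```

Because the link names a repository and a ref, the session is built from the environment specification stored at that ref, which is how a reader ends up in the same environment that produced the published page.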

Mr. Tyle is the Manager of Departmental Computing for the Department of Atmospheric and Environmental Sciences at the University at Albany, at which he received his M.S. in Atmospheric Science in 1995. He also has a B.A. in Psychology, with emphases on Neuroscience and Cognitive Science, from the University of Rochester. His main interest is promoting the use of free- and open-source software packages, mostly using Python, for the analysis, visualization and sharing of geoscientific datasets.

Scientific Python dev and educator @ UCAR/Unidata. MetPy, Siphon, Project Pythia.