SciPy 2023

Climate Model Evaluation Workflow Built on Jupyter Notebooks
07-13, 16:30–17:00 (America/Chicago), Grand Salon C

This project introduces an extensible workflow for evaluating climate model output with collections of Jupyter notebooks. The workflow supports parameterizing and batch-executing notebooks with Papermill while still allowing notebooks to be developed interactively. Additional features include integration with Dask and caching of intermediate data products generated by the notebooks. The final product of the workflow can be built automatically into a Jupyter book for easy presentation and sharing. While initially developed for climate modeling, the framework's flexible and extensible design makes it adaptable to any kind of data analysis work, and the presentation will highlight this capability.
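As a brief illustration of the core mechanism (not the project's own API, which is still under development), Papermill injects values into a notebook cell tagged "parameters" and executes the notebook headlessly. The notebook path and parameter names below are hypothetical:

    import papermill as pm

    # Execute a diagnostics notebook headlessly with injected parameters.
    # The notebook declares defaults in a cell tagged "parameters"; Papermill
    # overrides them with the values passed here. File and parameter names
    # below are hypothetical.
    pm.execute_notebook(
        "diagnostics_template.ipynb",       # source notebook
        "output/diagnostics_case01.ipynb",  # executed copy written here
        parameters={"case_name": "case01", "output_dir": "output/case01"},
    )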


Motivation

Within the field of climate modeling, there is a need to run collections of scripts that generate plots of common diagnostic metrics of climate model output, for example as models are run with different configurations during development. These scripts often involve manual configuration, and their output is not necessarily well organized for interpretation and sharing. Jupyter notebooks help address this problem by creating more readable workflows that can be annotated and edited interactively, then easily presented to others as a Jupyter book. However, Jupyter notebooks are not parameterizable or runnable in batches by default. This project addresses that gap by using Papermill to create a package that can run collections of Jupyter notebooks with configurable parameters, cache generated data products, and publish results as a Jupyter book, while continuing to support the interactive development work that Jupyter notebooks enable. The framework is not limited to climate modeling; the infrastructure is useful to any data science project that would benefit from a batch-executable, parameterizable, and shareable notebook-based workflow.
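To make the caching idea concrete, the following is a minimal sketch of one way a notebook could cache an intermediate data product to disk so that re-running the collection skips expensive recomputation. The helper name, file paths, and use of pickle are illustrative assumptions, not the package's actual caching mechanism (climate data products would more likely be stored as NetCDF or Zarr):

    import pickle
    from pathlib import Path

    def cached_compute(cache_path, compute_fn):
        """Return a cached intermediate product if present; otherwise compute and cache it."""
        cache_path = Path(cache_path)
        if cache_path.exists():
            # Reuse the previously generated data product.
            with cache_path.open("rb") as f:
                return pickle.load(f)
        result = compute_fn()
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        with cache_path.open("wb") as f:
            pickle.dump(result, f)
        return result

    # Hypothetical usage inside a diagnostics notebook:
    # global_means = cached_compute("cache/global_means.pkl", compute_global_means)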

Methods

This project brings together a number of existing open-source Python tools into a single workflow: it builds on the Jupyter ecosystem with Papermill and Jinja templating, supports Dask, and publishes results as a Jupyter Book. The project infrastructure will be released as a Python package and on GitHub, along with examples showcasing its functionality.
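A minimal sketch of how these pieces could fit together, under assumptions about the configuration layout and file names: a Jinja-templated YAML config is rendered, each notebook in the collection is executed with Papermill, and the executed copies are assembled with the jupyter-book CLI.

    import subprocess

    import papermill as pm
    import yaml
    from jinja2 import Template

    # Render a Jinja-templated YAML config (hypothetical layout) into concrete values.
    with open("config.yaml.j2") as f:
        config = yaml.safe_load(Template(f.read()).render(case_name="case01"))

    # Batch-execute each notebook in the collection with its parameters; the
    # executed copies become source pages of the Jupyter book. A notebook can
    # also start a dask.distributed.Client internally for parallel computation.
    for nb in config["notebooks"]:
        pm.execute_notebook(
            nb["path"],                     # e.g. "notebooks/sst_maps.ipynb"
            f"book/{nb['name']}.ipynb",
            parameters=nb.get("parameters", {}),
        )

    # Assemble the executed notebooks into a Jupyter book with the jupyter-book CLI.
    subprocess.run(["jupyter-book", "build", "book"], check=True)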

Results

As of March 1, 2023, the project is in development, with several working demos. By the time of the conference, a more complete version will be publicly available on GitHub with documentation, installable as a Python package, and accompanied by examples that can be downloaded and built upon.

Conclusion

We have developed a framework for data analysis using collections of parameterizable Jupyter notebooks, along with infrastructure supporting Dask, caching of data products, building a Jupyter book, and other features. This is a powerful application of the Jupyter ecosystem and can be applied to a wide range of fields beyond the climate model evaluation use case for which it was initially developed.

I'm an Associate Scientist 1 in the Oceanography Section of NCAR's Climate and Global Dynamics Lab.