SciPy 2025

Joe Hamman

Joe Hamman is a climate scientist, engineer, and the co-founder and CTO of Earthmover, where he leads the development of Arraylake, a cloud platform for scientific data teams. Previously, he was a founder and Technology Director at CarbonPlan and a scientist at the Climate and Global Dynamics Laboratory at the National Center for Atmospheric Research. He holds a Ph.D. in Civil and Environmental Engineering from the University of Washington, and is a licensed Professional Engineer in Washington State. He co-founded the Pangeo Project and is a core developer of both the Xarray and Zarr-Python projects.

Sessions

07-08
13:30
240min
Hierarchical Data Analysis with Xarray DataTree & Zarr
Deepak Cherian, Negin Sobhani, Ian Hunt-Isaak, Eniola Awowale, Tom Nicholas, Joe Hamman, Justus Magin

Xarray provides data structures for multi-dimensional labeled arrays and a toolkit for scalable data analysis on large, complex datasets. Many real-world datasets have hierarchical or heterogeneous structure and are best organized as groups of related data arrays. Through xarray.DataTree, the Xarray data model now supports opening datasets with a hierarchical structure of groups, such as HDF5 files and Zarr stores. This expanded data model is general enough to manage data across scientific disciplines, including the geosciences and biosciences. This hands-on tutorial focuses on intermediate and advanced workflows that use Xarray to analyze real-world hierarchical data.

Tutorials
Room 315
0min
Icechunk: Open-source, cloud-native transactional storage engine for multi-dimensional arrays
Joe Hamman

Icechunk is a new, open-source, cloud-native transactional storage engine for multi-dimensional arrays. Key features include data versioning, schema evolution, and virtualization of existing formats like NetCDF and HDF5. Leveraging object storage, Icechunk provides atomic writes and consistent reads. We'll demonstrate its Python API, Zarr storage interface, and integration with Xarray, Dask, and other libraries within the Scientific Python ecosystem. Real-world examples using gridded weather and climate data will showcase its potential to unlock petabytes for high-performance cloud analytics and machine learning.

Earth, Ocean, Geo, Climate, and Atmospheric Science