Tom Nicholas SciPy 2024

Tom Nicholas

Tom currently works at [C]Worthy, a non-profit building the computational tools needed to ensure safe, effective ocean-based carbon dioxide removal.

Before that, he was a Research Software Engineer working in Ryan Abernathey's Climate Data Science Lab at Lamont-Doherty Earth Observatory, Columbia University.

He first started using the open-source scientific Python stack during his PhD, when he was studying plasma turbulence in nuclear fusion reactors.

He is a member of the xarray core development team, and also works on Cubed, xGCM, pint-xarray, and xarray-datatree. He is heavily involved with the Pangeo community for Big Data Geoscience.

Sessions

07-08

13:30

240min

Xarray: Friendly, Interactive, and Scalable Scientific Data Analysis

Scott Henderson, Don Setiawan, Tom Nicholas, Wietze Suijker, Jessica Scheick, Max Jones, Luis Lopez, Negin Sobhani

Xarray provides data structures for multi-dimensional labeled arrays and a toolkit for scalable data analysis on large, complex datasets with many related variables. Xarray combines the convenience of labeled data structures inspired by Pandas with NumPy-like multi-dimensional arrays to provide an intuitive and scalable interface for scientific analysis. This hands-on tutorial focuses on intermediate and advanced workflows using complex real-world data. We encourage participants in this workshop to bring their own datasets, as we will dedicate ample time to applying tutorial concepts to datasets of interest!

Simplifying analysis of hierarchical HDF5 and NetCDF4 files with xarray-datatree

Eniola Awowale, Lucas Sterzinger, Tom Nicholas, Nick Lenssen

Xarray-datatree [1] is a Python package that supports HDF (Hierarchical Data Format) files with hierarchical group structures by providing a tree-like hierarchical data structure in xarray. When an HDF file is opened with Datatree, a DataTree object is created that contains all of the groups in the file. Once the DataTree object is instantiated, each group can be accessed directly through the tree. This eliminates the need for a user to open each group and subgroup separately to access observational data.

We will present our use case for Datatree in NASA’s Harmony Level 2 Subsetter (HL2SS). HL2SS provides variable and dimension subsetting for Earth observation data from different NASA data centers. To subset hierarchical datasets without Datatree, HL2SS flattens the entire data structure into a new file by copying all of the grouped and subgrouped variables into the root group. The variable or dimension subset is then performed on this new file. However, the flattened and subsetted file must have the same hierarchical structure as the original file, so it is unflattened, its attributes are copied, and the variables are regrouped to restore the original group hierarchy. With the open_datatree() function, HL2SS can open datasets containing multiple groups at once and have all of their group hierarchies preserved. This functionality would significantly streamline the HL2SS workflow, since it would eliminate the need to flatten and unflatten grouped datasets.
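The flatten, subset, and unflatten round trip described above can be sketched in plain Python. This is only an illustration under stated assumptions, not HL2SS or Datatree code: groups are modeled as nested dicts, variables as lists, attributes are not modeled, and the names `flatten`, `unflatten`, `subset_variables`, and the sample `granule` are all hypothetical.

```python
from typing import Any, Dict, Set

# Assumption: a "file" is a nested dict. A dict value is a group,
# anything else is a variable's data.
Tree = Dict[str, Any]


def flatten(tree: Tree, prefix: str = "") -> Dict[str, Any]:
    """Copy every variable into one flat mapping keyed by its full group path."""
    flat: Dict[str, Any] = {}
    for name, node in tree.items():
        path = f"{prefix}/{name}"
        if isinstance(node, dict):
            flat.update(flatten(node, path))  # descend into the subgroup
        else:
            flat[path] = node  # variable: record it under its full path
    return flat


def unflatten(flat: Dict[str, Any]) -> Tree:
    """Rebuild the nested group hierarchy from full-path keys."""
    tree: Tree = {}
    for path, value in flat.items():
        *groups, name = path.strip("/").split("/")
        node = tree
        for group in groups:
            node = node.setdefault(group, {})  # create missing groups on the way down
        node[name] = value
    return tree


def subset_variables(tree: Tree, wanted: Set[str]) -> Tree:
    """Flatten, keep only the requested variable paths, then restore the hierarchy."""
    kept = {path: value for path, value in flatten(tree).items() if path in wanted}
    return unflatten(kept)


# Hypothetical granule with a root variable and nested groups.
granule: Tree = {
    "time": [0, 1, 2],
    "geolocation": {"lat": [10.0, 20.0], "lon": [30.0, 40.0]},
    "science": {"sst": [280.5, 281.0], "quality": {"flag": [0, 1]}},
}

subset = subset_variables(granule, {"/time", "/science/sst", "/science/quality/flag"})
```

Every subsetting request in this sketch pays for two full passes over the hierarchy. A tree-aware data structure that keeps groups intact avoids both passes, which is the motivation given above for open_datatree().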

[1] https://github.com/xarray-contrib/datatree

Earth, Ocean, Geo, and Atmospheric Science

Room 315
