Thomas Nicholas
Tom is a Research Software Engineer working in Ryan Abernathey's Ocean Transport Group at Lamont-Doherty Earth Observatory, Columbia University.
He started using the open-source scientific Python stack during his PhD, when he was studying plasma turbulence in nuclear fusion reactors.
He is a member of the xarray core development team, and also works on xGCM, pint-xarray, and xarray-datatree.
Sessions
Xarray provides data structures for multi-dimensional labeled arrays and a toolkit for scalable data analysis on large, complex datasets with many related variables. Xarray combines the convenience of labeled data structures inspired by pandas with NumPy-like multi-dimensional arrays to provide an intuitive and scalable interface for scientific analysis. This tutorial will introduce data scientists already familiar with Xarray to intermediate and advanced topics, such as applying SciPy/NumPy functions that have no Xarray equivalent, advanced indexing concepts, and wrapping other array types in the scientific Python ecosystem.
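As a rough illustration of applying a NumPy function that has no direct Xarray equivalent, the sketch below wraps `np.linalg.norm` with `xr.apply_ufunc`. The array, its dimension names (`station`, `component`), and its values are made up for this example and are not taken from the tutorial material.

```python
import numpy as np
import xarray as xr

# Hypothetical data: a two-component velocity vector at three stations.
da = xr.DataArray(
    np.array([[3.0, 4.0], [6.0, 8.0], [0.0, 5.0]]),
    dims=("station", "component"),
    coords={"station": ["a", "b", "c"]},
    name="velocity",
)

# apply_ufunc moves the core dimension ("component") to the last axis,
# so the wrapped NumPy function can reduce over axis=-1.
speed = xr.apply_ufunc(
    np.linalg.norm,
    da,
    input_core_dims=[["component"]],
    kwargs={"axis": -1},
)

print(speed.dims)
print(speed.values)
```

The result keeps the `station` dimension and its labels, so it can be indexed by label like any other DataArray.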
Real scientific workflows often require working with many heterogeneous but related datasets, e.g. many different models, experimental and simulation data, or data at multiple resolutions.
Xarray-datatree provides a tree-like hierarchical data structure that is general enough to be useful in a wide variety of such cases. Extending xarray, datatree builds upon an interface that many scientists are already familiar with.
We will explain datatree's data model, its relation to netCDF and Zarr, and how to use the data structure to simplify your own work. We will demonstrate using datatree with real geoscience datasets, such as CMIP6 model data.