07-10, 16:30–17:00 (US/Pacific), Room 315
We illustrate the power and flexibility of a new extension point in Xarray's data model: "custom indexes", which let Xarray users neatly handle complex grids and enable at least one new data model (vector data cubes). We present a whirlwind tour of specific examples of this feature, and aim to stimulate experimentation during the sprints.
Xarray is an open-source Python project that enables its users to use a dataset's metadata for easy, expressive, and readable analytics.
For example, one can use dimension names and coordinate labels to select and subset data: .sel(time="2024-01-01").
In this way, Xarray users can express themselves quite naturally in the labeled coordinate system of the data, rather than the unlabeled coordinate system of bare arrays, e.g. in NumPy: array[0, ...].
The underlying functionality is enabled by an "index" (in this case, a Pandas Index).
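A minimal sketch of that contrast, using a small made-up daily series (the variable name and values are illustrative assumptions, not data from the talk). It selects by label with Xarray, selects by position on the bare array, and inspects the Pandas index that backs the label lookup:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily series, for illustration only.
time = pd.date_range("2024-01-01", periods=3, freq="D")
da = xr.DataArray(
    np.array([1.0, 2.0, 3.0]),
    dims="time",
    coords={"time": time},
    name="temperature",
)

# Label-based selection: expressed in the coordinate system of the data.
by_label = da.sel(time="2024-01-01")

# Position-based selection on the bare NumPy array: the reader must
# already know that position 0 corresponds to 2024-01-01.
by_position = da.values[0, ...]

# The label lookup is backed by an index; by default, a Pandas index.
time_index = da.indexes["time"]
print(type(time_index).__name__)
```

Both selections return the same value; only the first states which time was requested.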
Until very recently, Xarray's indexing features were limited to indexing along one-dimensional coordinates, leaving users to construct ad hoc solutions for more complicated grids.
Funded by CZI and NASA grants, steady work over the past few years has let us relax this restriction.
Xarray now allows a user to associate "custom indexes" across multiple coordinate variables and dimensions of an Xarray dataset, unlocking an incredible range of use cases for Xarray's geoscience users, from handling complex grids to entirely new data models!
We will take a whirlwind tour of specific examples that demonstrate the power and flexibility now available:
1. handling time and space intervals, periodic boundaries, and units;
2. using tree structures for geographical lat-lon indexing from simpler curvilinear grids to more complex discrete global grid systems (xdggs);
3. handling georeferenced coordinate spaces for raster data with affine transforms;
4. handling very large coordinates with lazy out-of-core indexes;
5. enabling sophisticated time-dimension indexing of weather Forecast Model Run Collections, such as selecting a "best estimate" time series or selecting all forecasts for a given future time instant, and more;
6. enabling the "vector data cube" model that marries the raster and vector data worlds by allowing one to index a dimension using Shapely geometries (xvec).
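To give a flavor of the first item, here is a rough sketch of a periodic boundary handled by a custom index. The class name `PeriodicLongitudeIndex` and the toy data are hypothetical, and subclassing `PandasIndex` is a shortcut assumed here for brevity; it is not necessarily how the talk's examples are built, and a fuller implementation would more likely derive from `xarray.indexes.Index`. The index wraps requested longitudes into [0, 360) before delegating to the default label lookup:

```python
import numpy as np
import xarray as xr
from xarray.indexes import PandasIndex


class PeriodicLongitudeIndex(PandasIndex):
    """Hypothetical index: wrap longitude labels into [0, 360) before lookup."""

    def sel(self, labels, method=None, tolerance=None):
        # Assumes scalar or array-like labels; slices are not handled here.
        wrapped = {
            name: np.asarray(value) % 360 for name, value in labels.items()
        }
        return super().sel(wrapped, method=method, tolerance=tolerance)


# Toy data on a coarse longitude grid (illustrative values only).
lon = np.arange(0, 360, 90)
da = xr.DataArray(np.arange(4) * 10.0, dims="lon", coords={"lon": lon})

# Swap the default index on "lon" for the custom one.
periodic = da.drop_indexes("lon").set_xindex("lon", PeriodicLongitudeIndex)

# 450 degrees wraps around to 90 degrees.
value = periodic.sel(lon=450).item()
print(value)
```

With the default index, the same `.sel(lon=450)` call raises a `KeyError` because 450 is not a coordinate label.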
We expect this talk to provoke discussion of possible new use cases, and to stimulate collaborations and contributions during the sprints.
Deepak Cherian is an Xarray maintainer and Forward Engineer at Earthmover. Previously he was an oceanographer at the National Center for Atmospheric Research. He helps build and maintain many parts of the scientific Python ecosystem, including Xarray, Dask, Zarr and related projects.