SciPy 2024

xCDAT (Xarray Climate Data Analysis Tools): A Python package for simple climate data analysis on structured grids
07-11, 15:50–16:20 (US/Pacific), Room 315

xCDAT (Xarray Climate Data Analysis Tools) is an open-source Python package that extends Xarray for climate data analysis on structured grids. This talk will cover a brief history of xCDAT, the value this package offers the climate science community, and a general overview of key features with technical examples. xCDAT’s scope focuses on routine climate research analysis operations such as loading, averaging, and regridding data on structured grids (e.g., rectilinear, curvilinear). Key features include temporal averaging, geospatial averaging, horizontal regridding, vertical regridding, and robust interpretation and handling of metadata and bounds for coordinates.


Analyzing climate data requires core operations such as reading and writing netCDF files, horizontal and vertical regridding, and spatial and temporal averaging. These operations must also be highly performant to handle a volume of climate data that continues to grow, driven by a larger pool of data products and the increasing spatiotemporal resolution of model and observational data. In recent years there has been a pressing need for modern analysis software that can efficiently handle large datasets through a simple-to-use interface. xCDAT addresses this need by combining the power of Xarray with geospatial analysis features inspired by the Community Data Analysis Tools (CDAT) library. xCDAT leverages powerful packages in the Xarray ecosystem, including xESMF, xgcm, and cf_xarray.

xCDAT aims to promote software sustainability and scientific reproducibility in climate analysis code. xCDAT streamlines climate data analysis by providing simple and robust APIs that abstract Xarray boilerplate code for core analysis operations. This results in code that is more reusable, more readable, and less error-prone than implementations in pure Xarray. xCDAT operates on datasets that are compliant with the Climate and Forecast (CF) metadata conventions to ensure general compatibility with data regardless of source.
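To make concrete the kind of bookkeeping that such an API hides, here is a minimal sketch of an area-weighted average over latitude bands, written with the Python standard library only. It is not xCDAT's implementation and does not use its API. The function names (`latitude_weights`, `weighted_mean`) and the sample values are hypothetical, and the weighting by the difference of the sines of the latitude bounds is the standard spherical-geometry result, assumed here for illustration.

```python
import math


def latitude_weights(lat_bounds):
    """Return relative area weights for latitude bands.

    Each band is given as (south, north) bounds in degrees. On a sphere,
    the area of a band is proportional to sin(north) - sin(south).
    """
    return [
        math.sin(math.radians(north)) - math.sin(math.radians(south))
        for south, north in lat_bounds
    ]


def weighted_mean(values, weights):
    """Weighted mean that skips missing values (None) and their weights."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("no valid data to average")
    return sum(v * w for v, w in pairs) / total_weight


# Hypothetical zonal-mean temperatures (K) on three latitude bands.
lat_bounds = [(-90.0, -30.0), (-30.0, 30.0), (30.0, 90.0)]
temperature = [260.0, 300.0, 270.0]

weights = latitude_weights(lat_bounds)
area_mean = weighted_mean(temperature, weights)

# An unweighted mean over-counts the polar bands, which cover less area.
naive_mean = sum(temperature) / len(temperature)

print(f"area-weighted: {area_mean:.2f} K, unweighted: {naive_mean:.2f} K")
```

The two results differ because the tropical band covers as much area as both polar bands combined. Getting the bounds, the weights, and the handling of missing values right by hand each time is where errors tend to creep in, which is the motivation for a tested, shared API.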

Performance is also one of the fundamental drivers in xCDAT’s software design. xCDAT tackles this area of focus by supporting Dask for parallelism in many of its computational operations. xCDAT conveniently inherits Dask support through Xarray, which enables users to more efficiently analyze large climate data using multithreading or multiprocessing. For more resource-intensive needs, users can configure and/or use a local or remote Dask cluster.
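The chunked, parallel style of computation mentioned above can be illustrated with a small sketch. It uses `concurrent.futures` from the standard library rather than Dask, so it only shows the general pattern (split the data into blocks, reduce each block independently, combine the partial results) and says nothing about how Dask or xCDAT schedule work internally. The function names and the chunk size are hypothetical, and the threads illustrate the structure of the computation rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, size):
    """Split a list into consecutive blocks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum_and_count(block):
    """Reduce one block to the pieces needed to build a global mean."""
    return sum(block), len(block)


def chunked_mean(values, chunk_size, max_workers=4):
    """Compute a mean by reducing blocks independently, then combining."""
    if not values:
        raise ValueError("cannot average an empty sequence")
    blocks = split_into_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_sum_and_count, blocks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


# Hypothetical data: the numbers 1..1000 as floats, in blocks of 100.
data = [float(i) for i in range(1, 1001)]
result = chunked_mean(data, chunk_size=100)
print(result)
```

Because each block is reduced on its own, no worker needs the full dataset in memory at once, which is what makes this pattern suitable for data larger than a single machine's memory.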

xCDAT’s mission is to serve the needs of the climate science community in the long term. xCDAT is a community-driven project that welcomes contributions from anyone interested through the GitHub repository. xCDAT is actively being integrated as a core component of the Program for Climate Model Diagnosis and Intercomparison (PCMDI) Metrics Package and the Energy Exascale Earth System Model (E3SM) Diagnostics Package. xCDAT is also included in the E3SM Unified Environment, which is deployed on various U.S. Department of Energy supercomputers to run E3SM software tools.

Resources:
* GitHub Repository
* Documentation
* Earth System Grid Federation (ESGF) Seminar Series Talk 04/23 (Virtual, YouTube)
* AMS 2023 Abstract (for a talk)
* LLNL Climate and Weather Series Talk (01/25/23) - A Gentle Introduction to xCDAT (Jupyter Notebook slides)

Hi, my name is Tom Vo. I am a software engineer in the LLNL Climate Program and a member of the Energy Exascale Earth System Model (E3SM) and Simplifying ESM Analysis Through Standards (SEATS) projects. I contribute to numerous open-source scientific software packages for climate science, including leading the development of Xarray Climate Data Analysis Tools (xCDAT). I was formerly a member of the Earth System Grid Federation (ESGF) project, where I was the lead full-stack web developer for MetaGrid, ESGF’s next-generation search portal for climate data.

My interests and expertise are in scientific Python package development, full-stack web development, and DevOps engineering.