SciPy 2023

Xarray: Friendly, Interactive, and Scalable Scientific Data Analysis
07-11, 13:30–17:30 (America/Chicago), Classroom 203

Xarray provides data structures for multi-dimensional labeled arrays and a toolkit for scalable data analysis on large, complex datasets with many related variables. Xarray combines the convenience of labeled data structures inspired by Pandas with NumPy-like multi-dimensional arrays to provide an intuitive and scalable interface for scientific analysis. This tutorial will introduce data scientists already familiar with Xarray to more intermediate and advanced topics, such as applying functions in SciPy/NumPy with no Xarray equivalent, advanced indexing concepts, and wrapping other array types in the scientific Python ecosystem.


Xarray is an open-source Python project that makes working with complex, multi-dimensional arrays elegant, intuitive, and efficient. Real-world datasets are often a collection of many related variables on a common grid rather than raw numbers. Such datasets are common in the disciplines of earth science, astronomy, biology, and finance. These datasets are more than just arrays of values: they have labels that describe how array values map to locations in dimensions such as space and time, and metadata that describes how the data was collected and processed.

Xarray embraces this complexity and enables users to use dataset metadata such as dimension names and coordinate labels to easily analyze, manipulate, and visualize their datasets. For example, the Pandas-inspired Xarray label-based syntax temperature.sel(place="Boston") is more intuitive and less error-prone than the equivalent positional NumPy syntax, temperature[0].
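
The comparison above can be sketched as follows. This is a minimal illustration, not tutorial material: the place names other than Boston, the time steps, and the values are all made-up assumptions, chosen so that Boston happens to sit at position 0.

```python
import numpy as np
import xarray as xr

# Hypothetical data: 3 places x 4 time steps (values are invented for illustration).
temperature = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("place", "time"),
    coords={"place": ["Boston", "Seattle", "Austin"], "time": [0, 1, 2, 3]},
    name="temperature",
)

# Label-based selection: says what is wanted, independent of axis order.
by_label = temperature.sel(place="Boston")

# Positional selection: only correct if Boston is known to be row 0.
by_position = temperature[0]

# Both return the same values here, but only the first survives a reordering.
same = np.array_equal(by_label.values, by_position.values)
```

If the places were later sorted or the dimensions transposed, temperature[0] would silently return different data, while the .sel call would keep returning the Boston series.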

This hands-on tutorial will introduce data scientists already familiar with Xarray to more advanced concepts. All material will be presented via Jupyter Notebooks, with participants actively coding and performing exercises to solidify understanding of key concepts. The tutorial intersperses teaching intermediate to advanced Xarray concepts with increasingly complex real-world data analysis tasks.

The participant learning goals for the tutorial are to:

  1. Effectively use Xarray’s powerful multidimensional indexing operations
  2. Become familiar with important parts of Xarray’s computational API
  3. Understand how to extend Xarray’s built-in capabilities with custom computation functions
  4. Understand how Xarray fits in with other array types in the scientific Python ecosystem
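
As a rough illustration of the third goal, a function written for plain NumPy arrays can be applied to labeled data with xarray.apply_ufunc. The sketch below is an assumption about the kind of pattern involved, not an excerpt from the tutorial: the peak_to_peak helper, the data values, and the dimension names are all hypothetical.

```python
import numpy as np
import xarray as xr


def peak_to_peak(values, axis=-1):
    """Plain NumPy helper (hypothetical): range of values along one axis."""
    return values.max(axis=axis) - values.min(axis=axis)


# Hypothetical labeled data: 2 places x 3 time steps.
data = xr.DataArray(
    np.array([[1.0, 4.0, 2.0], [10.0, 7.0, 8.0]]),
    dims=("place", "time"),
    coords={"place": ["Boston", "Seattle"]},
)

# apply_ufunc moves the core dimension ("time") to the last axis, calls the
# NumPy function on the raw array, and re-attaches the remaining labels.
result = xr.apply_ufunc(
    peak_to_peak,
    data,
    input_core_dims=[["time"]],  # the dimension the function reduces over
    kwargs={"axis": -1},
)
```

The result keeps the "place" dimension and its coordinate labels, so it can be indexed by name just like any other Xarray object.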

The structure of our tutorial is based on our extensive experience teaching Xarray over the past few years, including numerous similar tutorials at international conferences like SciPy, as well as in formal classes taught at the National Center for Atmospheric Research and the University of Washington.

The tutorial will be presented using Nebari, which will facilitate interactive computation and a consistent computational environment without requiring participants to install any software. Tutorial material will be available online (link), and we will ensure that proper environment files are available for participants who prefer running the tutorial locally. Participants are expected to have some familiarity with Jupyter notebooks, NumPy, Pandas, and Xarray. No specific domain knowledge (e.g. geoscience) is required to effectively participate in this tutorial.

If you are new to Xarray then please go through last year’s tutorial (link) prior to attending, as our tutorial will assume attendees have a working understanding of these basic concepts.


Installation Instructions

https://tutorial.xarray.dev/workshops/scipy2023/README.html

Prerequisites

Familiarity with Xarray, NumPy, Pandas, and Jupyter Notebooks

Xarray in 45 minutes (primer for tutorial session; not required)

Fundamentals of Xarray


Tom is a Research Software Engineer working in Ryan Abernathey's Ocean Transport Group at Lamont-Doherty Earth Observatory, Columbia University.

He first started using the open-source scientific Python stack during his PhD, when he was studying plasma turbulence in nuclear fusion reactors.

He is a member of the xarray core development team, and also works on xGCM, pint-xarray, and xarray-datatree.

Negin Sobhani is a High Performance Computing consultant and computational atmospheric scientist working at the National Center for Atmospheric Research (NCAR). She has several years of experience developing and supporting open-source tools and infrastructure to improve the performance and accessibility of Earth System models and bridge the gap between data science, atmospheric science, and software engineering. She is interested in applying and adopting cutting-edge data science and computational technologies to improve our understanding of the environment.

Don Setiawan is a Senior Research Software Engineer at the University of Washington, eScience Institute, Scientific Software Engineering Center (SSEC). He has expertise in Python programming, web development, geospatial data analytics, and cloud-based data engineering. He is interested in building scalable, open software to facilitate scientific discovery across fields and enforce software best practices. He has been a power user of the Xarray ecosystem for several years across various projects with the Ocean Observatories Initiative (OOI), U.S. Integrated Ocean Observing System (IOOS), National Oceanic and Atmospheric Administration (NOAA), and National Aeronautics and Space Administration (NASA). He is very excited to share his knowledge and help facilitate the Xarray tutorial, as this is his first time at SciPy!

Scott is a research scientist in the University of Washington (UW) Department of Earth and Space Sciences and a data science fellow at the eScience Institute. He works on numerous NASA-funded efforts to develop open cloud computing solutions for data-intensive research.
