SciPy 2025

Xarray across biology. Where are we and where are we going?
07-10, 10:45–11:15 (US/Pacific), Room 317

Xarray has enormous potential as a data model and toolkit for labeled N-D arrays in biology. Originally developed within the geosciences community, it is seeing increased usage in biology, with applications ranging from genomics to image analysis and beyond. However, it has not yet been widely adopted. This presentation will examine what has blocked wider adoption, showcase the power of Xarray in biology through existing use cases, and present a roadmap for Xarray in biological workflows based on recent and upcoming improvements in Xarray.


Background

Biological datasets come in a wide variety of shapes, sizes, and types. However, there are common challenges faced across biology when dealing with complex structured data and keeping track of real-world coordinates. Xarray provides a powerful solution to these issues. Additionally, Xarray provides first-class support for HDF and Zarr files, formats already in wide use in biology. Despite these advantages, Xarray has yet to see widespread adoption in the biological research community. Developers of many biology-focused Python packages want to see greater adoption of Xarray as a data model and toolkit for common challenges. This presentation will explore what has hindered greater adoption, describe what has changed, and present a roadmap for future work.
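As a rough illustration of what keeping track of real-world coordinates looks like in practice, here is a minimal sketch using a made-up time-lapse stack. The array contents, the 0.5 micrometer pixel size, the 30 second frame interval, and the attribute name `pixel_size_um` are all assumptions chosen for the example, not values from any real dataset.

```python
import numpy as np
import xarray as xr

# Hypothetical time-lapse stack: 4 frames of 8 x 8 pixels (random values).
rng = np.random.default_rng(0)
frames = xr.DataArray(
    rng.random((4, 8, 8)),
    dims=("time", "y", "x"),
    coords={
        "time": [0.0, 30.0, 60.0, 90.0],  # seconds since acquisition start
        "y": np.arange(8) * 0.5,          # micrometers
        "x": np.arange(8) * 0.5,          # micrometers
    },
    attrs={"pixel_size_um": 0.5},         # assumed metadata key
)

# Select by physical coordinate instead of by integer index.
frame_at_60s = frames.sel(time=60.0)

# Label-based slices include both endpoints: x and y in [1.0, 2.0] micrometers.
roi = frames.sel(x=slice(1.0, 2.0), y=slice(1.0, 2.0))

# Reduce over named dimensions; the time coordinate is carried along.
mean_trace = roi.mean(dim=("y", "x"))
```

Selections and reductions refer to dimension names and physical units, so the code does not depend on remembering which axis is which.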

Motivation

To motivate the use of Xarray by biologists, I will demonstrate it in common biological workflows:
- Examples from the sgkit project
- Managing segmentation of a time-lapse microscopy experiment
- Xarray as an interface to dask and zarr
- Keeping track of metadata
- Examples of processing neurophysiology data (similar to this)

What Has Limited Adoption

Based on interviews with biological software thought leaders conducted in Spring 2025, I will discuss the major factors that have prevented wider adoption of Xarray in biological research.

  • Social:
    • Lack of documentation and examples with biological data
    • Fragmentation of biology software between Python, Java, Matlab, etc.
    • Lack of clear reference implementations
  • Technical:
    • Integration (or lack of) with existing biology software tools
    • Data model incompatibility

Recent Improvements

Recent improvements in Xarray and developments in downstream packages will enable wider adoption of Xarray. For each point, I will demonstrate with a brief real-world example use case.

  • Technical:
    • xarray.DataTree aligns the data model with the next-generation microscopy data format (OME-Zarr)
      • Multiscale imaging
    • Flexible transformations: with an implementation merged in February 2025, Xarray now supports functional definitions of coordinates. This enables using Xarray for use cases such as volumetric imaging
    • Increased usage in user-facing packages
      • AllenSDK
      • napari
  • Social:
    • This presentation is part of a multi-year effort to increase the visibility of Xarray to biologists
    • Addition of biology examples to Xarray documentation
    • Blog post series demonstrating the advantages of Xarray for common biological workflows. This will also be a call for interested writers

Roadmap

To conclude, I will discuss how the biology community and the Xarray community can work together to accelerate the adoption of Xarray in biology. This roadmap will include efforts to better integrate Xarray with existing projects, education efforts, and feature development in Xarray.

I recently completed my PhD, during which I built software to manage the acquisition of combined epifluorescence and single-cell Raman spectroscopy time-lapse data. I extensively used Xarray and Zarr in both the data acquisition and analysis of this project. I have also presented multiple workshops on using Python for scientific data analysis and on using the SciPy stack (including Xarray) for microscopy data.

During graduate school, I discovered a passion for contributing to open-source scientific projects, which led me to my current role as an Xarray Community Developer at Earthmover. In this role, I am focused on improving Xarray for use cases in biological research.