SciPy 2023

Python Array API Standard: Toward Array Interoperability in the Scientific Python Ecosystem
07-14, 11:25–11:55 (America/Chicago), Amphitheater 204

The array API standard (https://data-apis.org/array-api/) is a common specification for Python array libraries, such as NumPy, PyTorch, CuPy, Dask, and JAX.

This standard will make it straightforward for array-consuming libraries, like scikit-learn and SciPy, to write code that uniformly supports all of these libraries. This will allow, for instance, running the same code on the CPU and GPU.
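As a rough illustration of what "uniformly supports" can look like in practice, the sketch below writes one function against names the standard defines (`__array_namespace__`, `mean`, `sqrt`, and element-wise `*`) and never imports a specific array library. `ToyArray`, `toy_xp`, and `root_mean_square` are hypothetical names invented for this example so that it runs with the standard library alone; they are not part of NumPy, CuPy, PyTorch, or the talk's own material.

```python
import math
from types import SimpleNamespace


class ToyArray:
    """Hypothetical stand-in for a real array type; not part of any library."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def __mul__(self, other):
        # Element-wise product of two equal-length toy arrays.
        return ToyArray(a * b for a, b in zip(self.values, other.values))

    def __array_namespace__(self, *, api_version=None):
        # Entry point defined by the standard: the array reports the
        # namespace whose functions can operate on it.
        return toy_xp


# Hypothetical namespace exposing two functions the standard specifies.
toy_xp = SimpleNamespace(
    mean=lambda x: ToyArray([sum(x.values) / len(x.values)]),
    sqrt=lambda x: ToyArray(math.sqrt(v) for v in x.values),
)


def root_mean_square(x):
    """Library-agnostic RMS: uses only names defined by the standard."""
    xp = x.__array_namespace__()
    return xp.sqrt(xp.mean(x * x))


result = root_mean_square(ToyArray([3.0, 4.0]))
print(result.values)
```

The point of the design is that `root_mean_square` contains no library-specific code: assuming an array type implements the standard, passing it in place of `ToyArray` should dispatch to that library's own `mean` and `sqrt`, whether the data lives on the CPU or the GPU.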

This talk will cover the scope of the array API standard, the supporting tooling (which includes a library-independent test suite and a compatibility layer), the work completed so far, and the plans going forward.


This talk will have the following outline:

  • A motivating example: adding array API standard usage to a real-world scientific data analysis script so it runs with CuPy and PyTorch in addition to NumPy.
  • History of the Data APIs Consortium and array API specification.
  • The scope and general design principles of the specification.
  • Current status of implementations:
    • Two versions of the standard have been released, 2021.12 and 2022.12.
    • The standard includes all important core array functionality and extensions for linear algebra and Fast Fourier Transforms.
    • NumPy and CuPy have complete reference implementations in submodules (numpy.array_api).
    • NumPy, CuPy, and PyTorch are nearly fully compliant and plan to approach full compliance.
    • array-api-compat is a wrapper library designed to be vendored by consuming libraries like scikit-learn that makes NumPy, CuPy, and PyTorch use a uniform API.
    • The array-api-tests package is a rigorous and complete test suite for the array API. It can be used to determine where an array library follows the specification and where it doesn’t.
  • Future work
    • Add full compliance to NumPy, as part of NumPy 2.0.
    • Focus on improving adoption by consuming libraries, such as SciPy and scikit-learn.
    • Reporting website that lists array API compliance by library.
    • Work is being done to create a similar standard for dataframe libraries. This work has already produced a common dataframe interchange API.

Aaron Meurer is a software engineer at Quansight, where he works on important projects affecting the scientific Python ecosystem including the array API standard, NumPy, and PyTorch. He is also a core maintainer of the SymPy symbolic mathematics library.

Thomas J. Fan is a Staff Software Engineer at Quansight Labs and is a maintainer for scikit-learn, an open-source machine learning library for Python. Previously, Thomas worked at Columbia University to improve interoperability between scikit-learn and AutoML systems. He is a maintainer for skorch, a neural network library that wraps PyTorch. Thomas has a Master's in Mathematics from NYU and a Master's in Physics from Stony Brook University.

I've been working in open source since 2019 as part of multiple projects involving scientific computing and IDE development. Over the last two years, much of my work has focused on improving the UI/UX of multiple applications. I've given multiple talks on different topics; the two most recent are available at the following links:

Got my B.S. & M.S. in Physics. After graduating, went to work at Howard Hughes Medical Institute for 5 years on image processing problems, particularly in neuroscience. Got more involved in open source during that work, with particular interest in packaging, storage, and distributed array processing. Then joined the NVIDIA RAPIDS team, where there has been good overlap with these past interests as well as new ones.

Staff Scientist at LANL

Software engineer @ Quansight working on Data APIs. Hypothesis maintainer.

Ralf has been deeply involved in the SciPy and PyData communities for over a decade. He is a maintainer of NumPy, SciPy and data-apis.org, and has contributed widely throughout the SciPy ecosystem. Ralf is currently the SciPy Steering Council Chair, and he served on the NumFOCUS Board of Directors from 2012-2018.

Ralf co-directs Quansight Labs, which consists of developers, community managers, designers, and documentation writers who build open-source technology and grow open-source communities around data science and scientific computing projects. Previously Ralf has worked in industrial R&D, on topics as diverse as MRI, lithography and forestry.

Andreas Müller is a Principal Research SDE at Microsoft, where he works on the interface of the Data Science ecosystem and cloud infrastructure.
He previously held positions as Associate Research Scientist at the Columbia Data Science Institute and as a Research Engineer at the NYU Center for Data Science.
He is one of the core developers of the scikit-learn machine learning library, a member of the scikit-learn technical committee, and the author of the book "Introduction to Machine Learning with Python".
His work focuses on practical aspects of machine learning and the development of user-centric machine learning software.

Currently working at OpenAI, previously at Google.

Travis is a long-time participant in the SciPy ecosystem.

Working on e-graphs in Python currently. Interested in cross library collaboration in the Python data science ecosystem.