SciPy 2023

John Kirkham

Got my B.S. and M.S. in Physics. After graduating, went to work at the Howard Hughes Medical Institute for 5 years on image processing problems, particularly in neuroscience. Got more involved in open source during that work, with a particular interest in packaging, storage, and distributed array processing. Then joined the NVIDIA RAPIDS team, where there has been good overlap with these past interests as well as new ones.

Sessions

07-13
15:50
30min
Zarr: Community specification of large, cloud-optimised, N-dimensional, typed array storage
Sanket Verma, John Kirkham, Josh Moore

A key feature of the Python data ecosystem is its reliance on simple but efficient primitives that follow well-defined interfaces, so that tools work seamlessly together (cf. http://data-apis.org/). NumPy provides an in-memory representation for tensors. Dask provides parallelisation of tensor access. Xarray provides metadata linking tensor dimensions. Zarr provides a missing piece: scalable, persistent storage for annotated hierarchies of tensors. Defined through a community process, the Zarr specification enables the storage of large, out-of-memory datasets both locally and in the cloud. Implementations exist in C++, C, Java, JavaScript, Julia, and Python, enabling …
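To make the idea of chunked, persistent array storage concrete, here is a toy sketch in plain Python. It is not the zarr-python API: `write_chunked` and `read_element` are hypothetical helpers, and the metadata document and `"i.j"` chunk keys only loosely follow the Zarr v2 layout. What it illustrates is that each chunk lives under its own key in a key-value store, so reading one element fetches only one chunk, and the `dict` could in principle be swapped for a directory or a cloud object store.

```python
import json
import math
import struct

# Toy illustration only, not the zarr library; layout loosely modelled on Zarr v2.

def write_chunked(store, data, shape, chunks):
    """Write a 2-D list of floats into a dict-like store, one key per chunk."""
    # Array metadata is stored as a JSON document under its own key.
    store[".zarray"] = json.dumps({
        "zarr_format": 2, "shape": list(shape), "chunks": list(chunks),
        "dtype": "<f8", "compressor": None, "fill_value": 0.0,
        "order": "C", "filters": None,
    }).encode()
    n_chunks = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    for ci in range(n_chunks[0]):
        for cj in range(n_chunks[1]):
            values = []
            for i in range(ci * chunks[0], (ci + 1) * chunks[0]):
                for j in range(cj * chunks[1], (cj + 1) * chunks[1]):
                    # Edge chunks keep full size; cells past the array edge get the fill value.
                    inside = i < shape[0] and j < shape[1]
                    values.append(data[i][j] if inside else 0.0)
            # Each chunk is encoded as little-endian float64 in row-major order.
            store[f"{ci}.{cj}"] = struct.pack(f"<{len(values)}d", *values)


def read_element(store, i, j):
    """Read one element by fetching and decoding only the chunk that holds it."""
    meta = json.loads(store[".zarray"])
    ch0, ch1 = meta["chunks"]
    raw = store[f"{i // ch0}.{j // ch1}"]
    values = struct.unpack(f"<{ch0 * ch1}d", raw)
    return values[(i % ch0) * ch1 + (j % ch1)]


# Usage: a 3x5 array split into 2x2 chunks gives 2x3 = 6 chunk keys plus metadata.
store = {}
data = [[float(r * 5 + c) for c in range(5)] for r in range(3)]
write_chunked(store, data, shape=(3, 5), chunks=(2, 2))
value = read_element(store, 2, 4)  # touches only chunk "1.2"
```

Because chunks are independent objects, many readers or writers can work on different chunks in parallel, which is the property that lets tools such as Dask scale over Zarr-stored data.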

Tending Your Open Source Garden: Maintenance and Community
Zlotnik Ballroom
07-14
10:45
30min
New CUDA Toolkit packages for Conda
John Kirkham, Thomson Comer, Rick Ratzel

In this talk, we will examine the new CUDA package layout for Conda (as included in conda-forge). We will show how the CUDA components have been broken out into separate packages, share how this affects development and package building, walk through the changes made to the conda-forge infrastructure to incorporate these new packages, and examine recipes that use the new packages along with what was needed to update them. We will also provide guidance on how to use these new packages in recipes and in library development.

General Track
Amphitheater 204
07-14
11:25
30min
Python Array API Standard: Toward Array Interoperability in the Scientific Python Ecosystem
Aaron Meurer, Thomas J. Fan, Stephannie Jimenez Gacha, John Kirkham, Stephan Hoyer, Tyler Reddy, Leo Fang, Matthew Barber, Ralf Gommers, Andreas Mueller, Athan Reines, Mario, Alexandre Passos, Travis E Oliphant, Saul Shanabrook

The array API standard (https://data-apis.org/array-api/) is a common specification for Python array libraries, such as NumPy, PyTorch, CuPy, Dask, and JAX.

This standard will make it straightforward for array-consuming libraries, like scikit-learn and SciPy, to write code that uniformly supports all of these libraries. This will allow, for instance, running the same code on the CPU and GPU.
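A minimal sketch of what "uniformly supports all of these libraries" can look like, assuming NumPy is installed. The function looks up the array's namespace through `__array_namespace__()`, the hook the standard defines on conforming arrays, and then calls only functions the standard specifies (`mean`, `std`). The helpers `get_namespace` and `standardize` are hypothetical. NumPy 2.0 and later expose `__array_namespace__` on `ndarray`; for older NumPy the sketch simply falls back to the `numpy` module, which is an assumption rather than guaranteed conformance.

```python
import numpy as np


def get_namespace(x):
    """Return the array API namespace for ``x`` (hypothetical helper)."""
    # Conforming array objects expose __array_namespace__(), which returns
    # the module implementing the standard for that array type.
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    # Assumed fallback for older NumPy releases without the hook.
    return np


def standardize(x):
    """Rescale x to zero mean and unit (population) standard deviation."""
    xp = get_namespace(x)
    # Only standard functions are used, so the same code would run on any
    # conforming library's arrays, e.g. on a GPU-backed array library.
    return (x - xp.mean(x)) / xp.std(x)


x = np.asarray([1.0, 2.0, 3.0, 4.0])
z = standardize(x)
```

The consuming function never imports a specific array library for its computation; the input array decides which implementation runs. In practice, the data-apis `array-api-compat` package provides an `array_namespace()` function that performs this lookup more robustly across libraries.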

This talk will cover the scope of the array API standard, its supporting tooling (including a library-independent test suite and a compatibility layer), the work completed so far, and the plans going forward.

General Track
Amphitheater 204