SciPy 2025

John Kirkham

Sessions

07-07
13:30
240min
Reproducible Machine Learning Workflows for Scientists with pixi
John Kirkham, Matthew Feickert, Ruben Arts

Scientific researchers need reproducible software environments for complex applications that can run across heterogeneous computing platforms. Modern open-source tools like pixi provide automatic reproducibility for all dependencies while offering a high-level interface well suited to researchers.

This tutorial will provide a practical introduction to using pixi to easily create scientific and AI/ML environments that benefit from hardware acceleration, across multiple machines and platforms. The focus will be on applications using the PyTorch and JAX Python machine learning libraries with CUDA enabled, as well as deploying these environments to production settings in Linux container images.

Tutorials
Room 315
0min
Unthrottling I/O bottlenecks to accelerate data analysis and machine learning using GPUs and Zarr v3
John Kirkham, Akshay Subramaniam, Mads R. B. Kristensen

With the advent of petabyte-scale datasets in fields such as weather forecasting, genomics, biology, and astronomy, storing and working with data has become complex, and sharing and collaborating on that data complicates matters further. Using cloud storage, with optimized I/O pipelines to read from it, is more critical than ever. Keeping data size manageable while minimizing computational impact requires a well-thought-out data compression and data loading strategy. For highly parallel workloads like deep learning, serving up data fast enough is the bottleneck. How do we solve this pressing user need?

In this presentation, we discuss the latest developments in Zarr, an open-source, community-developed storage format and Python library. We showcase how Zarr v3's approach to data sharding, combined with native GPU capabilities and integrations with nvCOMP GPU decompression and KvikIO's I/O support, can help feed data-hungry algorithms, alleviating I/O bottlenecks for accelerated computing use cases like data analysis and machine learning.

General