John Kirkham

Sessions
Scientific researchers need reproducible software environments for complex applications that can run across heterogeneous computing platforms. Modern open source tools like pixi make all dependencies automatically reproducible while providing a high-level interface well suited to researchers.
This tutorial will provide a practical introduction to using pixi to easily create scientific and AI/ML environments that benefit from hardware acceleration, across multiple machines and platforms. The focus will be on applications using the PyTorch and JAX Python machine learning libraries with CUDA enabled, as well as deploying these environments to production settings in Linux container images.
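To make the "hardware acceleration" goal concrete, below is a minimal sanity check one might run inside such a pixi-managed environment to confirm that PyTorch and JAX can see a CUDA device. It is only a sketch and assumes the environment already provides CUDA-enabled builds of torch and jax; it is not part of the tutorial material itself.

```python
# Quick sanity check for a CUDA-enabled PyTorch + JAX environment
# (e.g. one created and activated with pixi). Assumes torch and jax
# are installed with GPU support; prints what each framework can see.
import torch
import jax

print("PyTorch version:", torch.__version__)
print("CUDA available to PyTorch:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("PyTorch GPU:", torch.cuda.get_device_name(0))

print("JAX version:", jax.__version__)
print("JAX devices:", jax.devices())  # lists GPU devices when a CUDA build of JAX is installed
```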
With the advent of petabyte-scale datasets in many fields like weather forecasting, genomics, biology and astronomy, storing and working with this data is complex. This can be further complicated when sharing and collaborating on data. Cloud storage, and optimized I/O pipelines to read from it, become much more critical. Keeping data sizes manageable while minimizing computational impact requires a well-thought-out data compression and data loading strategy. For highly parallel workloads (like deep learning), serving data fast enough often becomes the bottleneck. How do we solve this pressing user need?
In this presentation, we discuss the latest developments in Zarr, an open source, community-developed storage format and Python library. We showcase how Zarr V3's approach to data sharding, combined with native GPU capabilities and integrations with nvCOMP GPU decompression and kvikIO's I/O support, can help feed data-hungry algorithms, alleviating I/O bottlenecks for accelerated computing use cases like data analysis and machine learning.
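To illustrate the sharding idea, here is a minimal sketch of creating a sharded Zarr V3 array with the zarr-python 3 API. The store path, array shape, and chunk/shard sizes are illustrative assumptions rather than values from the talk, and the GPU read path is shown only as a hedged, commented-out option.

```python
import numpy as np
import zarr  # assumes zarr-python >= 3, which implements the Zarr V3 format

# Create a sharded array: small chunks are the unit of (de)compression,
# while larger shards are the unit of storage, keeping object counts low
# on cloud object stores. All sizes below are illustrative.
arr = zarr.create_array(
    store="example.zarr",
    shape=(4096, 4096),
    chunks=(256, 256),    # compression granularity
    shards=(1024, 1024),  # one storage object per shard (16 chunks each)
    dtype="float32",
)
arr[:] = np.random.default_rng(0).random((4096, 4096), dtype="float32")

# Reading a region only touches the shards and chunks that intersect it.
tile = arr[:512, :512]

# Assumption: with a CUDA GPU and CuPy installed, zarr-python's GPU support
# can be enabled so reads return device arrays, e.g.:
# zarr.config.enable_gpu()
```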