Tom White
Tom White is an independent software engineer. His long-term professional interest centres on large-scale distributed storage and processing. Over the last few years, he has focused on big data infrastructure for scientists, including GATK, Scanpy, sgkit, and most recently Cubed. In a previous life, Tom wrote “Hadoop: The Definitive Guide”, published by O’Reilly. He lives in the Brecon Beacons in Wales with his family.

Sessions
Cubed is a framework for distributed processing of large arrays without a cluster. Designed to respect memory constraints at all times, Cubed can express any NumPy-like array operation as a series of embarrassingly parallel, bounded-memory steps. By using Zarr as persistent storage between steps, Cubed can run in a serverless fashion both on a local machine and on a range of cloud platforms. After explaining Cubed’s model, we will show how Cubed has been integrated with Xarray and demonstrate its performance on various large-array geoscience workloads.
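
To make the idea of embarrassingly parallel, bounded-memory steps concrete, here is a minimal sketch in plain Python. It does not use Cubed or Zarr: a dictionary stands in for the persistent chunk store, and the names `write_chunked`, `map_chunks`, and `chunked_mean` are hypothetical helpers invented for this illustration. It only shows the general pattern of splitting an operation into steps where each task touches one chunk at a time.

```python
from typing import Callable, Dict, List, Sequence

Chunk = List[float]
# Hypothetical stand-in for a persistent chunk store (Zarr plays this role
# in the abstract): maps a chunk index to that chunk's data.
Store = Dict[int, Chunk]


def write_chunked(values: Sequence[float], chunk_size: int) -> Store:
    """Split a 1-D sequence into fixed-size chunks keyed by chunk index."""
    return {
        start // chunk_size: list(values[start:start + chunk_size])
        for start in range(0, len(values), chunk_size)
    }


def map_chunks(func: Callable[[Chunk], Chunk], source: Store) -> Store:
    """Run one step: apply func to every chunk independently.

    Each iteration reads one chunk and writes one chunk, so the working set
    is bounded by the chunk size. The iterations do not depend on each
    other, so they could run as separate tasks.
    """
    target: Store = {}
    for key, chunk in source.items():
        target[key] = func(chunk)
    return target


def chunked_mean(values: Sequence[float], chunk_size: int) -> float:
    """Compute a mean in two steps: per-chunk partials, then a combine."""
    store = write_chunked(values, chunk_size)
    # Step 1: reduce each chunk to a small [sum, count] pair.
    partials = map_chunks(lambda c: [sum(c), float(len(c))], store)
    # Step 2: combine the small partial results.
    total = sum(p[0] for p in partials.values())
    count = sum(p[1] for p in partials.values())
    return total / count


result = chunked_mean(list(range(10)), chunk_size=3)
print(result)
```

In this sketch the final combine step loads all partials at once, which is acceptable only because they are tiny; a real system would presumably combine them in bounded-size groups as well.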