SciPy 2024

Dask in Production
07-10, 11:25–11:55 (US/Pacific), Ballroom

Distributed systems are neat to demo, but hard to use in reality.

This talk goes through lessons learned from running hundreds of thousands of Dask clusters and billions of Python functions for users in critical production settings across many companies and research groups.

We'll cover lessons learned like ...

  • GIL Vigilance is Good
  • Kubernetes is too heavyweight if all you want is lots of jobs
  • ARM is underused
  • Docker doesn't work well for data science folks
  • Availability zones are key for spot/GPU availability
  • Adaptive is underused (but hard)
  • Most workloads are small
  • Most workloads are fast
  • Most users don't scale up properly
  • Most people overestimate costs

These lessons will be motivated by tons of metadata collected and aggregated from real-world workloads.


Matthew is an open source software developer in the numeric Python ecosystem. He maintains several PyData libraries, but today focuses mostly on Dask, a library for scalable computing. Matthew worked for Anaconda Inc. for several years, then built out the Dask team at NVIDIA for RAPIDS, and most recently founded Coiled to improve Python's scalability with Dask for large organizations.

Matthew holds a bachelor's degree in physics and mathematics from UC Berkeley and a PhD in computer science from the University of Chicago.