James Bourbeau
James Bourbeau is a core maintainer of Dask, an engineer at Coiled, and a frequent conference speaker. He works to enable the use of data in our lives.
Sessions
07-10
14:35
30min
Pandas + Dask DataFrame 2.0 - Comparison to Spark, DuckDB and Polars
James Bourbeau
Dask is a library for distributed computing with Python that integrates tightly with pandas. Historically, Dask was the easiest choice to use (it’s just pandas) but struggled to achieve robust performance (there were many ways to accidentally perform poorly). The re-implementation of the DataFrame API addresses all of the pain points that users ran into. We will look at how Dask has become a lot faster, how it performs on benchmarks that it struggled with in the past, and how it compares to other tools like Spark, DuckDB, and Polars.
Maintainers and Community
Room 315