SciPy 2024

Scaling your data science workflows with Modin
07-10, 10:45–11:15 (US/Pacific), Ballroom

pandas is one of the most commonly used data science libraries in Python, with a convenient set of APIs for data cleaning, preparation, analysis, and exploration. However, despite its widespread adoption, pandas suffers from severe memory and performance issues on even moderately sized datasets. Modin (https://github.com/modin-project/modin) is an open-source project that serves as a fast, scalable drop-in replacement for pandas. By changing just a single line of code, Modin seamlessly speeds up pandas workflows on a laptop or in a cluster. Originally developed at UC Berkeley, Modin has been downloaded more than 17 million times and is used by leading data science teams across industries.


Modin is a highly scalable, drop-in replacement for pandas. Modin eliminates the complexity of working directly with distributed systems and lets users continue to use the pandas API with large datasets. In this talk, we will give an overview of Modin and a demo of how you can scale up your pandas workflows by changing just a single line of code.
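
As a rough illustration of the single-line change described above, the sketch below swaps the pandas import for Modin's and leaves the rest of the workflow untouched. It assumes Modin is installed together with an execution engine (for example via `pip install "modin[ray]"`); the sample data and the names `df`, `totals`, and `sf_total` are made up for illustration and are not taken from the talk. The fallback to plain pandas is only there so the snippet still runs where Modin is unavailable.

```python
# Illustrative sketch only: the data and variable names are hypothetical.
try:
    import modin.pandas as pd  # the single changed line (was: import pandas as pd)
except ImportError:
    import pandas as pd  # fallback so the sketch runs without Modin installed

# Everything below is ordinary pandas-style code and is unchanged.
df = pd.DataFrame(
    {"city": ["SF", "LA", "SF", "NY"], "sales": [10, 20, 30, 40]}
)

# Sum sales per city, then read out one group's total.
totals = df.groupby("city")["sales"].sum()
sf_total = int(totals["SF"])
print(sf_total)
```

On a dataset this small no speedup should be expected; the point is only that the code after the import line stays the same.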

Doris Lee currently leads Python and data science product efforts at Snowflake. Previously, Doris was the CEO and co-founder of Ponder, the company behind the open source project Modin. Ponder was acquired by Snowflake in 2023. Doris received her Ph.D. from the UC Berkeley RISE Lab and School of Information in 2021, where she developed tools that help data scientists explore and understand their data. She was named to the Forbes 30 Under 30 list for Enterprise Technology in 2023.