SciPy 2025

Breaking Out of the Loop: Refactoring Legacy Software with Polars
07-09, 11:25–11:55 (US/Pacific), Room 318

Data manipulation libraries like Polars allow us to analyze and process data much faster than with native Python, but that’s only true if you know how to use them properly. When the team working on NCEI's Global Summary of the Month first integrated Polars, they found it was actually slower than the original Java version. In this talk, we'll discuss how our team learned to think about computing problems the way spreadsheet programmers do, increasing our products’ processing speed by over 80%. We’ll share tips for rewriting legacy code to take advantage of parallel processing. We’ll also cover how we created custom, pre-compiled functions with Numba when the business requirements were too complex for native Polars expressions.


In this talk, we will explore best practices for modernizing legacy software using Polars, a popular data manipulation library. Our discussion will feature real-world examples from software engineers working for NOAA’s National Centers for Environmental Information (NCEI) who have successfully refactored climate science applications with Polars.

This session provides a unique opportunity to go under the hood of recently updated software projects, including:

• Global Summary of the Month (GSOM), which provides monthly weather summaries for over 100,000 weather stations worldwide

• International Best Track Archive for Climate Stewardship (IBTrACS), the most comprehensive global tropical cyclone dataset available

• Datzilla, a system used to track data issues and corrections within NOAA’s environmental datasets

By leveraging Polars, our teams significantly improved on the performance of the original Java programs. GSOM, for example, saw an 80% boost in processing speed! The refactoring wasn’t always straightforward, however. We’ll share the lessons we learned about writing Polars code that takes advantage of multi-core, parallel processing.

This talk is for anyone interested in atmospheric science, but will be particularly relevant to software engineers and data professionals interested in learning more about refactoring code in Polars. While prior knowledge of Polars is not required, familiarity with pandas, SQL, or spreadsheet macros would be helpful. During this session, Brodie will guide attendees through examples using Jupyter Notebook and VS Code. He’ll start with simpler use cases and gradually build toward advanced techniques, including user-defined functions (UDFs) compiled into machine code using Numba.

By the end of this talk, attendees will have a practical understanding of how to migrate legacy workflows to Polars and leverage its full potential to enhance performance. While the examples will primarily be related to climate science, the techniques covered in this session will help attendees write faster, more scalable code for any scientific application that requires large-scale data crunching.

Brodie Vidrine is a software engineer working for NOAA's National Centers for Environmental Information. Before working to modernize government software, Brodie spent 17 years writing code for Ascend Math, an online math tutorial program.