SciPy 2024

Gil Forsyth

Gil Forsyth is a software engineer at Voltron Data. He followed the common career path of Japanese language specialist -> administrative assistant -> mechanical engineer -> computational fluid dynamicist -> data scientist -> software engineer -> machine learning engineer -> software engineer. Gil contributes to several projects in the PyData ecosystem and is a core maintainer of xonsh and Ibis. He served as the program chair for the Scientific Computing with Python (SciPy) conference from 2017 to 2020.


Sessions

07-08
08:00
240min
Intro to Ibis: blazing fast analytics with DuckDB, Polars, Snowflake, and more, from the comfort of your Python repl.
Gil Forsyth, Phillip Cloud, Naty Clementi, Jim Crist-Harif

Tabular data is ubiquitous, and pandas has been the de facto tool in Python for analyzing it. However, as data size scales, analysis using pandas may become untenable. Luckily, modern analytical databases (like DuckDB) can analyze this same tabular data orders of magnitude faster than pandas, all while using less memory. Many of these systems provide only a SQL interface, though, which is far different from pandas’ dataframe interface and requires a rewrite of your analysis code.

This is where Ibis comes in. Ibis is a pure-Python open-source library that provides a dataframe interface to many popular databases and analytics tools (DuckDB, Polars, Snowflake, Spark, etc.). This lets users analyze data using the same consistent API, regardless of which backend they’re using, and without ever having to learn SQL. No more painful rewrites of pandas code when you run into performance issues; write your code once using Ibis and run it on any supported backend.

https://ibis-project.org/
https://github.com/ibis-project/ibis

Tutorials
Ballroom A
07-12
13:55
30min
Ibis: because SQL is everywhere and so is Python
Gil Forsyth

Tabular data is ubiquitous, and pandas has been the de facto tool in Python for
analyzing it. However, as data size scales, analysis using pandas may become
untenable. Luckily, modern analytical databases (like DuckDB) can analyze this
same tabular data orders of magnitude faster than pandas, all while using less
memory. Many of these systems provide only a SQL interface, though, which is
far different from pandas’ dataframe interface and requires a rewrite of your
analysis code.

This talk will lay out the current database and data landscape as it relates to
the SciPy stack, and explore how Ibis (an open-source, pure-Python dataframe
interface library) can help decouple interfaces from engines, improving both
performance and portability. We'll examine other solutions for interacting with
SQL from Python and discuss some of their strengths and weaknesses.
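
To make "decoupling interfaces from engines" concrete, here is a minimal toy sketch using only the Python standard library. It is not Ibis's API or implementation: the names GroupSum, run_in_python, to_sql, and run_in_sqlite, as well as the sample penguins data, are hypothetical and exist only for illustration. The idea it shows is that one engine-agnostic description of a query can be evaluated by two different engines, here plain Python and SQLite.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical names throughout: this is a toy, not Ibis's API.


@dataclass(frozen=True)
class GroupSum:
    """Engine-agnostic description of: sum `value` per `key` in `table`."""

    table: str
    key: str
    value: str


def run_in_python(query, tables):
    """Engine 1: evaluate the query over in-memory rows (a list of dicts)."""
    totals = {}
    for row in tables[query.table]:
        group = row[query.key]
        totals[group] = totals.get(group, 0) + row[query.value]
    return sorted(totals.items())


def to_sql(query):
    """Compile the same description into a SQL string."""
    return (
        f'SELECT "{query.key}", SUM("{query.value}") '
        f'FROM "{query.table}" GROUP BY "{query.key}" ORDER BY "{query.key}"'
    )


def run_in_sqlite(query, con):
    """Engine 2: hand the compiled SQL to SQLite."""
    return con.execute(to_sql(query)).fetchall()


# Made-up sample data, loaded into both "engines".
rows = [
    {"species": "adelie", "mass": 3700},
    {"species": "gentoo", "mass": 5000},
    {"species": "adelie", "mass": 3800},
]

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE penguins ("species" TEXT, "mass" INTEGER)')
con.executemany("INSERT INTO penguins VALUES (:species, :mass)", rows)

# The query is written once and run on either engine.
query = GroupSum(table="penguins", key="species", value="mass")
python_result = run_in_python(query, {"penguins": rows})
sqlite_result = run_in_sqlite(query, con)
```

Both engines return the same per-species totals, so the analysis code that builds the query does not need to change when the engine does. A real library has to cover far more operations and backend differences than this sketch suggests.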

General
Room 316