Intro to Ibis: blazing fast analytics with DuckDB, Polars, Snowflake, and more, from the comfort of your Python repl. SciPy 2024

2024-07-08 08:00–12:00, Ballroom A

Tabular data is ubiquitous, and pandas has been the de facto tool in Python for analyzing it. However, as data size scales, analysis using pandas may become untenable. Luckily, modern analytical databases (like DuckDB) can analyze this same tabular data orders of magnitude faster than pandas, all while using less memory. Many of these systems only provide a SQL interface, though, which is far different from pandas’ dataframe interface and requires a rewrite of your analysis code.

This is where Ibis comes in. Ibis is a pure-Python open-source library that provides a dataframe interface to many popular databases and analytics tools (DuckDB, Polars, Snowflake, Spark, etc.). This lets users analyze data using the same consistent API, regardless of which backend they’re using, and without ever having to learn SQL. No more painful rewrites of pandas code when you run into performance issues; write your code once using Ibis and run it on any supported backend.

https://ibis-project.org/
https://github.com/ibis-project/ibis

This tutorial is open to all. If you

have ever been thwarted by SQL or data stored somewhere, or
have ever been stuck trying to translate a pandas POC to PySpark for "production", or
are interested in how to write blazing fast analytics code that uses all of the cores on your laptop without running into memory limits (and without writing any SQL),

then this tutorial is for you!

We’ll cover:

The basic operations of Ibis (select, filter, group_by, order_by, join, and aggregate), and how these operations may be composed to form more complicated queries.
How Ibis may be used on a number of different local and remote backend engines to execute the same queries on different systems.
How to quickly compare performance between different backends without changing your code.
How Ibis integrates into the larger Python data ecosystem, including tools like scikit-learn, Matplotlib, PyArrow, pandas, Shapely, Altair, hvPlot, and VegaFusion.

Prerequisites:

This is a hands-on tutorial presented with Jupyter notebooks, with numerous examples to get your hands dirty. Participants should ideally have some experience using Python and pandas, but no SQL experience is necessary.

Installation Instructions:

We intend to use GitHub Codespaces for quick environment setup. Detailed instructions for Codespaces setup, and for local installation for attendees who prefer it, are available in the tutorial repo at https://github.com/ibis-project/ibis-tutorial.

Naty Clementi

Naty is a senior software engineer at Voltron Data. She is a former academic with a Master's in Physics and a PhD in Mechanical and Aerospace Engineering to her name. She currently contributes to Ibis, and in the past she has also contributed to and maintained Dask. She is also an active member of PyLadies and one of the directors of Women Who Code DC.

This speaker also appears in:

Ibis + DuckDB geospatial: a match made on Earth

Jim Crist-Harif

Gil Forsyth

Gil Forsyth is a software engineer at Voltron Data. He followed the common career path of Japanese language specialist -> administrative assistant -> mechanical engineer -> computational fluid dynamicist -> data scientist -> software engineer -> machine learning engineer -> software engineer. Gil contributes to several projects in the PyData ecosystem and is a core maintainer of xonsh and Ibis. He served as the program chair for the Scientific Computing with Python (SciPy) conference from 2017 to 2020.

This speaker also appears in:

Ibis: because SQL is everywhere and so is Python

Phillip Cloud

I'm Phillip Cloud, a software engineer. I work on Ibis full-time at Voltron Data. I like a lot of things, including Dune, jazz and puns. Let's chat!
