SciPy 2025

Python is all you need: an overview of the composable, Python-native data stack
07-09, 13:55–14:25 (US/Pacific), Ballroom

For the past decade, SQL has reigned as king of the data transformation world, and tools like dbt have formed a cornerstone of the modern data stack. Until recently, Python-first alternatives couldn't compete with the scale and performance of modern SQL. Now Ibis offers the benefits of SQL execution through a flexible Python dataframe API.

In this talk, you will learn how Ibis supercharges existing open-source libraries like Kedro and Pandera and how you can combine these technologies (and a few more) to build and orchestrate scalable data engineering pipelines without sacrificing the comfort (and other advantages) of Python.


Over the past year, integrations across the Python data ecosystem unlocked a cohesive vision for the composable Python analytics stack. In this talk, we will begin with a brief overview of the modern data stack, what it offers, and why the idea became so prevalent in data engineering. This will provide a baseline for the capabilities that a useful analytics stack should have.

Next, we will introduce Ibis, the data processing workhorse of the Python analytics stack. Ibis is a portable Python dataframe library that supports 20+ execution backends, from local computing engines like Polars and DuckDB to distributed cloud data warehouses like Snowflake and BigQuery. Crucially, its deferred execution model makes it perfect for large-scale in-database data transformation, much like SQL. While we won't weigh in on the never-ending Python versus SQL debate (SQL is prevalent and effective and benefits from a mature data engineering tooling ecosystem), we will cover a few relevant advantages of using Python and specifically Ibis, including portability and suitability for other (e.g. data science, machine learning, and AI) workloads.

Then, we will present two key components of the emerging stack: Kedro as the core transformation framework and Pandera for data validation. In both cases, we will highlight how Ibis extends the capabilities of the existing, established tool.

Kedro gained popularity as a framework for authoring production-ready data science pipelines. While it has also been used by data engineers since its inception, most data engineering use cases were Spark-based or small data. However, integrating Ibis enabled building data engineering pipelines that scale. Furthermore, it complemented Kedro's concept of dev/prod parity; the exact same code could now be tested locally and deployed in production with just a difference in configuration. We'll demonstrate a Kedro port of the Jaffle Shop project as an example pipeline leveraging the Ibis integration. (The Jaffle Shop is the canonical dbt starter project.)

Pandera is a lightweight Python data validation library. It already supported a variety of dataframe backends, including pandas, PySpark, and, most recently, Polars. By adding support for Ibis, we extended Pandera's flexible and expressive data testing API to execute natively on the database. We'll show what this looks like by adding validation to our Jaffle Shop pipeline.

Finally, we will step back and look at the remaining pieces of the composable analytics stack. We will fill out the picture with Python-native recommendations for ingestion (dlt) and orchestration (Dagster). We will also be transparent about some of the current gaps compared to the more established SQL-first approach and ongoing work to address them.

Attendees don't need previous experience with the modern data stack or any of the technologies mentioned above. Those who already understand the usual limitations of using Python for data engineering workflows will learn how to overcome them. Those less familiar will be introduced to popular frameworks like Ibis, Kedro, and Pandera that they can explore and start using in their day-to-day work or side projects.

Deepyaman is a software engineer at Dagster Labs. He joined from Voltron Data, where he was a Senior Staff Software Engineer on the Ibis team. Before Voltron Data acquired Claypot AI, he was a Founding Machine Learning Engineer there, working on its real-time feature engineering platform. Prior to that, he led data engineering teams and asset development across a range of industries at QuantumBlack, AI by McKinsey.

Deepyaman is passionate about building and contributing to the broader open-source data ecosystem. Outside of his day job, he helps maintain Kedro, an open-source Python framework for building production-ready data science pipelines.
