SciPy 2025

All the SQL a Pythonista needs to know: an introduction to SQL and DataFrames with DuckDB
07-07, 08:00–12:00 (US/Pacific), Room 318

Structured Query Language (or SQL for short) is a programming language to manage data in a database system and an essential part of any data engineer’s tool kit. In this tutorial, you will learn how to use SQL to create databases, tables, insert data into them and extract, filter, join data or make calculations using queries. We will use DuckDB, a new open source embedded in-process database system that combines cutting edge database research with dataframe-inspired ease of use. DuckDB is only a pip install away (with zero dependencies), and runs right on your laptop. You will learn how to use DuckDB with your existing Python tools like Pandas, Polars, and Ibis to simplify and speed up your pipelines. Lastly, you will learn how to use SQL to create fast, interactive data visualizations, and how to teach your data how to fly and share it via the Cloud.


In this tutorial, you will…

  • Learn how to quack SQL with DuckDB. We will dedicate the first half of the tutorial to a beginner-friendly introduction to SQL. You'll learn to load data, get a table, filter a table by column names, add a calculated column to your table and handy tools like "group by all". We'll also learn how to work with multiple tables by using joins and subqueries. For the exercises, you can use the example data we provide, or BYO data.
  • Learn how to use SQL seamlessly with your favorite Python tools. Here we’ll talk a bit about why we think DuckDB is so helpful for a Pythonista to have in their toolkit! If SQL is the right tool for the job, DuckDB is fluent in the world’s friendliest SQL dialect. DuckDB can also fit seamlessly into any existing dataframe workflow by reading and writing Pandas, Apache Arrow, and Polars dataframes. You'll learn how to use the DuckDB engine with no SQL in sight using either DuckDB’s relational API, PySpark API or Ibis.
  • Run SQL queries in live data viz for on-the-fly analytics. You'll learn how to use DuckDB in the browser and run SQL queries as part of interactive data visualization, and how to plot data directly from DuckDB with Matplotlib.
  • Share your data via the Cloud. Now that you've mastered the basics of SQL, you will teach your data to fly with Cloud providers such as AWS and MotherDuck.

Installation Instructions

1) Create a Google account to access Google Colab or BYO Jupyter Notebook, i.e. set up a Python environment that has Jupyter Notebook, pandas and duckdb installed, 2) (Optional for "Do more with your data" section) Sign up for MotherDuck via https://app.motherduck.com/?auth_flow=signup

Prerequisites

Beginner/Intermediate Python skills
Basic knowledge of DataFrames in Python (e.g. Pandas, Ibis)

Guen is a software engineer at MotherDuck on the Ecosystems team. Guen has broad experience with software engineering, data engineering and data science with Python in the context of scientific data acquisition, analysis and computation for experimental physics. She has given introductory talks and workshops on SQL and DuckDB at various conferences, hackathons and events.

Hello, I'm Alex! I am a customer software engineer at MotherDuck and I blog for DuckDB Labs. My background is Industrial and Systems Engineering from Virginia Tech, but I've decided I prefer working with data!

I joined MotherDuck 2 years ago after 9 years at Intel. I started at Intel as an industrial engineer, later became a technical analyst, and then jumped into a data scientist role. Back in 2020 I discovered DuckDB while building an internal self service analytics platform and became one of DuckDB's biggest Twitter fans! I have been diving deeper into Duck-themed databases ever since.