SciPy 2024

Alex Monahan

Alex is a forward-deployed software engineer at MotherDuck and writes blogs and docs part-time for DuckDB Labs. He has a bachelor's in Industrial and Systems Engineering from Virginia Tech. Alex recently joined MotherDuck after 9 years at Intel, where he started as an industrial engineer, later became a technical analyst, and then moved into a data scientist role. Back in 2020, Alex discovered DuckDB while building an internal self-service analytics platform. It was such a perfect fit that he quickly integrated it and began using it in multiple projects. Alex also became one of DuckDB's biggest Twitter fans! He has been diving deeper into duck-themed databases ever since.

Sessions

07-09
08:00
240min
All the SQL a Pythonista needs to know: an introduction to SQL and DataFrames with DuckDB
Guen Prawiroatmodjo, Alex Monahan, Mehdi Ouazza, Elena Felder

Structured Query Language (or SQL for short) is a programming language for managing data in a database system and an essential part of any data engineer’s toolkit. In this tutorial, you will learn how to use SQL to create databases and tables, insert data into them, and extract, filter, and join data or make calculations using queries. We will use DuckDB, a new open-source, embedded, in-process database system that combines cutting-edge database research with dataframe-inspired ease of use. DuckDB is only a pip install away (with zero dependencies) and runs right on your laptop. You will learn how to use DuckDB with your existing Python tools like Pandas, Polars, and Ibis to simplify and speed up your pipelines. Lastly, you will learn how to use SQL to create fast, interactive data visualizations, and how to teach your data how to fly and share it via the cloud.

Tutorials
Ballroom A
07-12
13:15
30min
How to bootstrap a Data Warehouse with DuckDB
Guen Prawiroatmodjo, Nicholas Ursa, Alex Monahan

A Data Warehouse (DW) is a powerful tool to manage your scientific data, training data, logs, or any other type of relational data. Most Data Warehouses are cloud-based and built to scale to petabyte-scale workflows, but might not be optimal for smaller workloads that need a fast iteration cycle. Likewise, a collection of CSV files and Python scripts can become painful to share and maintain. This is where DuckDB comes in! DuckDB is a fast, in-process database that runs on your laptop, supports a rich SQL dialect, and can be pushed to the cloud with just a single line of code. In this talk, we’ll show you how to bootstrap a Data Warehouse on your laptop using open-source tools, including ETL (extract-transform-load) data pipelines, dashboard visualization, and sharing via the cloud.

General
Room 316