Guen Prawiroatmodjo
Guen is a software engineer at MotherDuck on the Ecosystems team. Previously, she was a Sr. Quantum Measurement Engineer at Microsoft. Guen has broad experience with software engineering, data engineering and data science with Python in the context of scientific data acquisition, analysis and computation for experimental physics and biotech. She has given introductory talks and workshops on quantum computing with Python at various conferences, hackathons and events.
Sessions
Structured Query Language (or SQL for short) is a programming language to manage data in a database system and an essential part of any data engineer’s tool kit. In this tutorial, you will learn how to use SQL to create databases, tables, insert data into them and extract, filter, join data or make calculations using queries. We will use DuckDB, a new open source embedded in-process database system that combines cutting edge database research with dataframe-inspired ease of use. DuckDB is only a pip install away (with zero dependencies), and runs right on your laptop. You will learn how to use DuckDB with your existing Python tools like Pandas, Polars, and Ibis to simplify and speed up your pipelines. Lastly, you will learn how to use SQL to create fast, interactive data visualizations, and how to teach your data how to fly and share it via the Cloud.
A Data Warehouse (DW) is a powerful tool to manage your scientific data, training data, logs, or any other type of relational data. Most Data Warehouses are cloud-based and built to scale to petabyte workflows, but might not be optimal for smaller workloads that need a fast iteration cycle. Likewise, a collection of CSV files and python scripts can become painful to share and maintain. This is where DuckDB comes in! DuckDB is a fast, in-process database that you can run on your laptop, supports a rich SQL dialect, and you can push to the cloud with just a single line of code. In this talk, we’ll show you how to bootstrap a Data Warehouse on your laptop using open source, including ETL (extract-transform-load) data pipelines, dashboard visualization, and sharing via the cloud.