SciPy 2023

07:00

60min

Registration

Classroom 106

08:00

240min

Building better data structures, APIs and configuration systems for scientific software using Pydantic

Axel Donath, Nick Langellier

This tutorial is an introduction to Pydantic, a library for data validation and settings management using Python type annotations. Using a semi-realistic ML and/or scientific software pipeline scenario, we demonstrate how Pydantic can be used to support type validation for scientific data structures, APIs and configuration systems. We show how the use of Pydantic in scientific and ML software leads to a more pleasant user experience as well as more robust and easier-to-maintain code. Basic knowledge of Python type annotations, class definitions and data structures will be helpful for beginners but is not required.

Controlling Self-Landing Rockets Using CVXPY

Philipp Schiele, Steven Diamond, Eric Sager Luxenberg

In this tutorial, attendees will learn hands-on how to optimize the trajectory of a self-landing rocket in a real-time simulated setting using CVXPY, a Python-embedded modeling language for convex optimization. We integrate the optimization with Kerbal Space Program to showcase a complete landing mission without human intervention, ideally in one piece. CVXPY allows solving complex problems declaratively, letting convex optimization find an optimal way of meeting target conditions with respect to an objective function. After solving the initial problem, attendees will use a selection of advanced CVXPY features while making the example gradually more realistic.

Full-stack Machine Learning for Data Scientists

Hugo Bowne-Anderson, Savin Goyal

One of the key questions in modern data science and machine learning, for businesses and practitioners alike, is how to move machine learning projects from prototype and experiment to production as a repeatable process. In this workshop, we present an introduction to the landscape of production-grade tools, techniques, and workflows that bridge the gap between laptop data science and production ML workflows.

Introduction to Python and Programming

Matt Davis

Enjoy a gentle introduction to Python for folks who are completely new to it and may not have much experience programming. Learn how to write Python while practicing loops, if statements, functions, and usage of Python’s built-in features in a series of fun, interactive exercises inside Jupyter Notebooks. By the end you’ll be ready to write your own basic Python, but most importantly, I want you to learn the form and vocabulary of Python so that you can understand Python documentation, interpret code written by others, and get the most out of other SciPy tutorials.

Mosaic Magic with Matplotlib

Kyle Sunden

Communicating scientific data often relies on making comparisons between multiple datasets.
Join the Matplotlib team to learn about creating multi-axis figures to display such data side-by-side.
This intermediate level tutorial will cover a variety of tools for making multi-axis figures.
Of particular focus will be subplot_mosaic and the layout engines: tight, constrained, and compressed.
This tutorial will emphasize the use of Matplotlib's object-oriented (OO) API and why it is generally recommended over the pyplot (plt) API.

PPML: Machine Learning on data you cannot see

Valerio Maggio

Privacy guarantees are the most crucial requirement when it comes to analysing sensitive data. However, data anonymisation techniques alone do not always provide complete privacy protection; moreover, machine learning models can also be exploited to leak sensitive data when they are attacked and no countermeasure is applied. Privacy-preserving machine learning (PPML) methods hold the promise of overcoming all these issues, allowing machine learning models to be trained with full privacy guarantees. In this tutorial we will explore several methods for privacy-preserving data analysis, and how these techniques can be used to safely train ML models without actually seeing the data.

Image analysis and visualization in Python with scikit-image, napari, and friends

Juan Nunez-Iglesias, Lars Grüter, Kira Evans

Between telescopes and satellite cameras and MRI machines and microscopes, scientists are producing more images than they can realistically look at. They need specialized viewers for multi-dimensional images, and automated tools to help process those images into knowledge. In this tutorial, we will cover the fundamentals of algorithmic image analysis, starting with how to think of images as NumPy arrays, moving on to basic image filtering, and finishing with a complete workflow: segmenting a 3D image into regions and making measurements on those regions. At every step, we will visualize and understand our work using matplotlib and napari.

Tutorials

Classroom 202

12:00

90min

Lunch

Classroom 106

13:30

240min

3D Visualization with PyVista

Bane Sullivan, Tetsuo Koyama, Alexander Kaszynski

PyVista is a general-purpose 3D visualization library used by over 1400 open source projects for the visualization of everything from computer-aided engineering and geophysics to volcanoes and digital artwork.

PyVista exposes a Pythonic API to the Visualization Toolkit (VTK) to provide tooling that is immediately usable without any prior knowledge of VTK and is being built as the 3D equivalent of Matplotlib, with plugins to Jupyter to enable visualization of 3D data using both server- and client-side rendering.

How the Little Jupyter Notebook Became a Web App: Managing Increasing Complexity with nbdev

Nicole Brewer, Ludovico Bianchi

Already familiar with ipywidgets, but ready to take your skills to the next level? In this tutorial we walk through what it takes to transform an exploratory Jupyter Notebook into a mature web application. Web apps can be a valuable product of collaboration between researchers and software developers, and the packages used in this tutorial were selected to support this relationship, starting with using JupyterLab as an integrated development environment. Attendees will learn how to design and document a scientific web application that accommodates increasing complexity, yet can still be inherited by the researchers who maintain it in the long run.

Introduction to Causal Inference

Roni Kobrosly

This tutorial session is intended to give attendees a gentle introduction to applying causal thinking and causal inference to data using Python. Causal data analysis is very common in many academic domains (e.g. social psychology, epidemiology and macroeconomics) as well as in industry (all of the largest Silicon Valley tech companies employ teams of scientists who answer business questions purely with causal inference methods). The tutorial will involve a combination of presentations with open Q&A and hands-on exercises contained in Google Colab notebooks.

Introduction to Numerical Computing With NumPy

Sandhya Govindaraju

NumPy provides Python with a powerful array processing library and an elegant syntax that is well suited to expressing computational algorithms clearly and efficiently. We'll introduce basic array syntax and array indexing, review some of the available mathematical functions in NumPy, and discuss how to write your own routines.

Meet your coding best friend: VS Code💖 - A hands-on tutorial on how to get the most out of the world’s most popular Python editor

Guen Prawiroatmodjo, Sarah Kaiser, Leopold Talirz

Visual Studio Code (VS Code) is a free code editor that runs on Windows, Linux, macOS and in your browser. This tutorial is aimed at Python programmers of all levels who are already using VS Code or are interested in doing so, and will take them from zero (installing VS Code) to a production setup for Python development. We will cover starter topics, such as customizing the UI and extensions, using code autocomplete, code navigation, debugging, and Jupyter Notebooks. We will also go into advanced use cases, such as remote development, pair programming via Live Share, Dev containers, GitHub Codespaces & more.

Modern Deep Learning with PyTorch

Sebastian Raschka

We will kick off this tutorial with an introduction to deep learning and highlight its primary strengths and use cases compared to traditional machine learning. In recent years, PyTorch has emerged as the most widely used deep learning library for research. However, a lot has changed regarding how we train neural networks these days. After getting a firm grasp of the PyTorch API, you will learn how to train deep neural networks using various multi-GPU training paradigms. We will also fine-tune large language models (transformers) and deploy them to the cloud.

Scalable machine learning workloads with Ray AI Runtime

Emmy Li, Adam Breindel

Machine learning (ML) pipelines involve a variety of computationally intensive stages. As state-of-the-art models and systems demand more compute, there is an urgent need for adaptable tools to scale ML workloads. This idea drove the creation of Ray—an open source, distributed ML compute framework that not only powers systems like ChatGPT but also pushes theoretical computing benchmarks. Ray AIR is especially useful for parallelizing ML workloads such as pre-processing images, model training and finetuning, and batch inference. In this tutorial, participants will learn about AIR’s composable APIs through hands-on coding exercises.

Tutorials

Classroom 103

07:00

60min

Registration

Classroom 106

08:00

240min

An Introduction to Cloud-Based Geospatial Analysis with Earth Engine and Geemap

Qiusheng Wu, Steve Greenberg

This tutorial is an introduction to cloud-based geospatial analysis with Earth Engine and the geemap Python package. We will cover the basics of Earth Engine data types and how to visualize, analyze, and export Earth Engine data in a Jupyter environment using geemap. We will also demonstrate how to develop and deploy interactive Earth Engine web apps. Throughout the session, practical examples and hands-on exercises will be provided to enhance learning. The attendees should have a basic understanding of Python and Jupyter Notebooks. Familiarity with Earth science and geospatial datasets is not required, but will be useful.

Data of an Unusual Size: A practical guide to analysis and interactive visualization of massive datasets

Pavithra Eswaramoorthy, Dharhas Pothina, Christopher Ostrouchov

While most scientists aren't at the scale of black hole imaging research teams that analyze Petabytes of data every day, you can easily fall into a situation where your laptop doesn't have quite enough power to do the analytics you need.

In this hands-on tutorial, you will learn the fundamentals of analyzing massive datasets with real-world examples on actual powerful machines on a public cloud provided by the presenters – starting from how the data is stored and read, to how it is processed and visualized.

Explore generative models in AI with Keras

Divyashree Shivakumar Sreepathihalli, Chen Qian

This tutorial introduces Keras, a powerful deep learning library, and demonstrates how to enable generative models using Keras. The first part delves into the Keras training pipeline and extended modules. The second part explores image generative models using Stable Diffusion, with live coding examples to generate novel images and teach the model new concepts. Finally, you'll explore language generative models, including GPT and BART, with a live coding example that demonstrates how to enable these models. By the end of this tutorial, you'll have a solid understanding of how to harness Keras to create powerful AI applications.

Pandas can be tricky, and there is a lot of bad advice floating around. This tutorial will cut through some of the biggest issues I've seen with Pandas code after working with the library for a while and writing three books on it.

We will discuss:

Proper types
Chaining
Aggregation
Debugging

Power up your work with compiling and profiling

Cheuk Ting Ho

In this workshop, we will introduce Numba, a JIT compiler designed to speed up numerical calculations. To many people it seems like a mystery: it sounds like magic, but how does it work? Under what conditions does it work? Because of this, new users find it hard to get started, and the learning curve is steep. This workshop will provide all the knowledge you need to make Numba work for you.

Despite its reputation for being slow, Python is the leading language of scientific computing, which generally needs large-scale (fast) computations. This is because most scientific problems can be split into "metadata bookkeeping" and "number crunching," where the latter is performed by array-oriented (vectorized) calls into precompiled routines.

This tutorial is an introduction to array-oriented programming. We'll focus on techniques that are equally useful in NumPy, Pandas, xarray, CuPy, Awkward Array, and other libraries, and we'll work in groups on three class projects: Conway's Game of Life, evaluating decision trees, and computations on ragged arrays.

hvPlot and Panel: Visualize all your data easily, from notebooks to dashboards

James A. Bednar, Sophia Yang

This tutorial will show you how to use the Pandas or Xarray APIs you already know to interactively explore and visualize your data even if it is big, streaming, or multidimensional. Then just replace your expression arguments with widgets to get a web app that you can share as HTML+WASM or backed by a live Python server. These tools let you focus on your data rather than the API, and let you build linked, interactive drill-down exploratory apps without having to run a web-technology software development project, which you can then share without becoming an operations specialist.

Tutorials

Classroom 105

12:00

90min

Lunch

Classroom 106

13:30

240min

A Hands-on Introduction to Production-grade Data Science Orchestration with Flyte

Niels Bantilan

One of the biggest challenges for data scientists and machine learning engineers alike is the friction caused by the iteration cycle between prototyping and production. It’s not enough to deploy a working model to a serving app. The iterative process itself needs to be a tight feedback loop between experimentation, data and model refinement, deploying to production, and dealing with data drift. In this tutorial, attendees will learn how to unify the common tools in the Python Data/ML scientific stack into a single orchestration plane using Flyte so that they can reduce the friction between prototyping and production.

Advanced Dask Tutorial

James Bourbeau, Naty Clementi, Julia Signell, Charles Blackmon-Luca

Dask is a Python library for scaling and parallelizing Python code. It provides familiar, high-level interfaces to extend the SciPy ecosystem to larger-than-memory or distributed environments, as well as lower-level interfaces for parallelizing custom algorithms. In this tutorial, we’ll cover advanced features of Dask like applying custom operations to Dask DataFrames and arrays, debugging computations, diagnosing performance issues, and more. Attendees should walk away with a deeper understanding of Dask’s internals, an introduction to more advanced features, and ideas of how they can apply these features effectively to their own workloads.

Interactive data visualization with Bokeh

Pavithra Eswaramoorthy, Ian Thomas, Bryan Van de Ven, Timo Metzger, Victoria Adesoba

Bokeh is a library for interactive data visualization. You can use it with Jupyter Notebooks or create standalone web applications, all using Python. This tutorial is a complete guide to Bokeh, where we start with a basic line plot and step-by-step make our way to creating a dashboard with several interacting components. This tutorial will be helpful for scientists who are looking to level up their analysis and presentations, and tool developers interested in adding custom plotting functionality or dashboards.

Python for answering geospatial questions: exploring social inequity in our communities

bonny p mcclain

We love Python but maybe not enough to commit to an entire coding language. What if we could understand the fundamentals and begin working with real-time data in a single session? Actionable Python scripts and an understanding of the frameworks might be enough of a springboard for larger exploration projects.

Resampling and Monte Carlo Methods in SciPy.stats

Matt Haberland, Albert Steppi

Resampling and Monte Carlo statistical techniques are surprisingly intuitive, and they are often more flexible and accurate than their better-known analytical counterparts. In this tutorial, participants will develop their intuitive understanding of frequentist statistics and apply it using three functions in scipy.stats - monte_carlo_test, permutation_test, and bootstrap - to dramatically expand the statistical analyses they can perform with the SciPy Library.

SymPy Introductory Tutorial

Aaron Meurer, Anutosh Bhat, Sangyub Lee

SymPy is a Python library for symbolic mathematics. This tutorial will introduce SymPy to a beginner audience. It will cover an introduction to symbolic computing, basic operations, simplification, calculus, matrices, advanced expression manipulation, code generation, and selected advanced topics. The tutorial does not have any prerequisites beyond knowledge of Python and basic freshman level mathematics. It will be presented with Jupyter notebooks with regular exercises for the attendees. After attending this tutorial, attendees will be able to start using SymPy to solve their own problems.

Xarray: Friendly, Interactive, and Scalable Scientific Data Analysis

Deepak Cherian, Negin Sobhani, Scott Henderson, Anderson Banihirwe, Don Setiawan, Thomas Nicholas, Jessica Scheick

Xarray provides data structures for multi-dimensional labeled arrays and a toolkit for scalable data analysis on large, complex datasets with many related variables. Xarray combines the convenience of labeled data structures inspired by Pandas with NumPy-like multi-dimensional arrays to provide an intuitive and scalable interface for scientific analysis. This tutorial will introduce data scientists already familiar with Xarray to more intermediate and advanced topics, such as applying functions in SciPy/NumPy with no Xarray equivalent, advanced indexing concepts, and wrapping other array types in the scientific Python ecosystem.

Tutorials

Classroom 203

18:30

120min

SciPy Welcome Reception

200 W Cesar Chavez

SciPy Welcome Reception hosted by Enthought. Tuesday, July 11, 6:30-8:30 at Enthought HQ, 200 W Cesar Chavez, Austin. Meet fellow attendees! Food and drinks served!

Walk, get a ride, or take the bus with CapMetro!

Enthought - 200 W Cesar Chavez St

08:00

60min

Registration and Breakfast

Zlotnik Ballroom

09:00

15min

Opening Notes

Zlotnik Ballroom

09:15

45min

Keynote - Open Source Contributors in Space and Time

Michael Droettboom

Michael Droettboom is a Principal Software Engineering Manager at Microsoft where he leads the CPython Performance Engineering Team. That team contributes directly to the upstream CPython project, and recently helped make Python 3.11 up to 60% faster than 3.10.

Michael has been contributing to open source for over 25 years: he is the former lead maintainer of matplotlib, a major contributor to astropy, and he is the original author of Pyodide and airspeed velocity. His work has supported such diverse applications as the Hubble and James Webb Space Telescopes, the Firefox web browser, infrared retinal imaging, and optical sheet music recognition.

Keynote

Zlotnik Ballroom

10:00

25min

SciPy Tools Plenary

Zlotnik Ballroom

10:25

20min

Break

Zlotnik Ballroom

10:45