SciPy 2024

Ian Spektor

Lead Machine Learning Engineer @ Tryolabs | Founding Engineer @ Puppeteer AI | CTO @ Buen Provecho

Currently building
- Temporian, an open-source Python library for preprocessing and feature engineering of temporal data
- Puppeteer, an actually useful AI platform for the healthcare industry
- Buen Provecho, a startup fighting food waste in Latin America


Sessions

07-08
08:00
240min
A hands-on forecasting guide: from theory to practice
Ian Spektor, Diego Kiedanski, Mathieu Guillame-Bert

Forecasting is central to decision-making in virtually all technical domains. For instance, predicting product sales in retail, forecasting energy demand, and anticipating customer churn all have tremendous value across different industries. However, the landscape of forecasting techniques is as diverse as it is useful, and different techniques, along with the expertise they require, suit different types and sizes of data.
In this hands-on workshop, we give an overview of forecasting concepts, popular methods, and practical considerations. We’ll walk you through data exploration, data preparation, feature engineering, statistical forecasting (e.g., STL, ARIMA, ETS), forecasting with tabular machine learning models (e.g., decision forests), forecasting with deep learning methods (e.g., TimesFM, DeepAR), meta-modeling (e.g., hierarchical reconciliation and relational modeling, ensembles, resource models), and how to safely evaluate such temporal models.
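As an illustration of what evaluating a temporal model safely can mean, here is a minimal sketch of rolling-origin evaluation, where every test window lies strictly after the data used for fitting. It is not taken from the workshop material: the function names (`rolling_origin_splits`, `naive_forecast`, `mean_absolute_error`), the toy series, and the last-value baseline are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def rolling_origin_splits(
    n: int, initial_train: int, horizon: int
) -> List[Tuple[range, range]]:
    """Return (train, test) index ranges; each test window follows its train window."""
    splits = []
    end = initial_train
    while end + horizon <= n:
        splits.append((range(0, end), range(end, end + horizon)))
        end += horizon  # grow the training window by one horizon per fold
    return splits


def naive_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Hypothetical baseline: repeat the last observed value."""
    return [history[-1]] * horizon


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy series, invented for this example.
series = [10.0, 12.0, 11.0, 13.0, 14.0, 13.0, 15.0, 16.0]
splits = rolling_origin_splits(len(series), initial_train=4, horizon=2)

errors = []
for train_idx, test_idx in splits:
    history = [series[i] for i in train_idx]
    actual = [series[i] for i in test_idx]
    errors.append(mean_absolute_error(actual, naive_forecast(history, len(actual))))
```

Unlike a random shuffle split, no fold here lets the model see observations from after its forecast origin, which is the kind of leakage a temporal evaluation needs to avoid.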

Tutorials
Ballroom D
07-12
10:45
30min
Safe, fast, and easy time series preprocessing with Temporian
Mathieu Guillame-Bert, Ian Spektor

Temporal data is ubiquitous in data science and plays a vital role in machine learning pipelines and business decisions. Preprocessing temporal data using generic data tools can be tedious, lead to inefficient computation, and be prone to errors.
Temporian is an open-source library for safe, simple, and efficient preprocessing and feature engineering of temporal data. It supports common temporal data types, including non-uniformly sampled, multivariate, multi-index, and multi-source data. Temporian favors interactive development in notebooks and integration with other machine learning tools, and can run at scale using distributed computing.
This talk, aimed at data scientists and machine learning practitioners, will showcase Temporian’s key features along with its powerful API, and demonstrate its advantages over generic data preprocessing libraries for handling temporal data.
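To make the difficulty with non-uniformly sampled data concrete, the sketch below computes a moving average over a time window rather than over a fixed number of rows, using only past and current events. It is plain Python, not Temporian's API; the function name `time_window_moving_average` and the sample data are assumptions made for this example.

```python
from typing import List, Sequence


def time_window_moving_average(
    timestamps: Sequence[float], values: Sequence[float], window: float
) -> List[float]:
    """For each event at time t, average the values of events in (t - window, t].

    Assumes timestamps are sorted in ascending order.
    """
    result = []
    start = 0  # index of the oldest event still inside the window
    running_sum = 0.0
    for i, (t, v) in enumerate(zip(timestamps, values)):
        running_sum += v
        # Drop events that have fallen out of the time window.
        while timestamps[start] <= t - window:
            running_sum -= values[start]
            start += 1
        result.append(running_sum / (i - start + 1))
    return result


# Irregularly spaced events, invented for this example.
timestamps = [0.0, 1.0, 2.0, 10.0, 11.0]
values = [1.0, 2.0, 3.0, 10.0, 20.0]
averages = time_window_moving_average(timestamps, values, window=3.0)
print(averages)  # [1.0, 1.5, 2.0, 10.0, 15.0]
```

A row-based rolling mean over the last three rows would blend the event at time 10 with events from eight time units earlier; keying the window on timestamps avoids that, and restricting it to past events keeps future values out of the feature.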

Data Science and AI/Machine Learning
Ballroom