SciPy 2023

Niels Bantilan

Niels is the Chief Machine Learning Engineer at Union.ai. He is a core maintainer of Flyte, an open source workflow orchestration tool; the author of UnionML, an MLOps framework for machine learning microservices; and the creator of Pandera, a statistical typing and data testing tool for scientific data containers. His mission is to help data science and machine learning practitioners be more productive.

He has a Master's in Public Health with a specialization in sociomedical science and public health informatics and, prior to that, a background in developmental biology and immunology. His research interests include reinforcement learning, AutoML, creative machine learning, and fairness, accountability, and transparency in automated systems.

Sessions

07-11
13:30
240min
A Hands-on Introduction to Production-grade Data Science Orchestration with Flyte
Niels Bantilan

One of the biggest challenges for data scientists and machine learning engineers alike is the friction caused by the iteration cycle between prototyping and production. It’s not enough to deploy a working model to a serving app. The iterative process itself needs to be a tight feedback loop between experimentation, data and model refinement, deploying to production, and dealing with data drift. In this tutorial, attendees will learn how to unify the common tools in the Python Data/ML scientific stack into a single orchestration plane using Flyte so that they can reduce the friction between prototyping and production.

Tutorials
Classroom 104
07-12
13:15
30min
Pandera: Beyond Pandas Data Validation
Niels Bantilan

Data quality remains a core concern for practitioners of machine learning, data science, and data engineering, and in recent years specialized packages have emerged to validate and monitor data and models. However, as the open source community iterates on data frameworks – notably, highly performant entrants such as Polars – data quality libraries need to catch up to support them. In this talk, you will learn about Pandera and its journey from a pandas-only validator to a generic tool for testing arbitrary data containers, one that provides a standardized way of creating data validation tools.

Machine Learning, Data Science, and Ethics in AI
Zlotnik Ballroom