SciPy 2023

A Hands-on Introduction to Production-grade Data Science Orchestration with Flyte
07-11, 13:30–17:30 (America/Chicago), Classroom 104

One of the biggest challenges for data scientists and machine learning engineers alike is the friction caused by the iteration cycle between prototyping and production. It’s not enough to deploy a working model to a serving app. The iterative process itself needs to be a tight feedback loop between experimentation, data and model refinement, deploying to production, and dealing with data drift. In this tutorial, attendees will learn how to unify the common tools in the Python Data/ML scientific stack into a single orchestration plane using Flyte, reducing the friction between prototyping and production.


Background

This tutorial interleaves lecture-style content and coding exercises to give data scientists, machine learning engineers, and data engineers hands-on experience with Flyte. Flyte is an open source workflow orchestrator that has a Python SDK for writing and scheduling execution graphs in a type-safe, reproducible manner. The topics and concepts covered in this tutorial are transferable to other similar orchestration tools, or would be useful for anyone who wants to build their own orchestrator. We will anchor the tutorial to five challenges of model development and deployment: scalability, data quality, reproducibility, recoverability, and auditability. Using Flyte, we’ll see how to address these challenges and abstract them out to give you a broader understanding of how to overcome them.

Main Content

First I’ll define and describe what these five challenges mean in the context of model development. Then I’ll dive into the ways in which Flyte provides solutions to them, taking you through the reasoning behind Flyte’s data-centric and ML-aware design. We'll cover:

  • Flyte tasks and workflows: The building blocks for expressing execution graphs.
  • Dynamic workflows: Define execution graphs at runtime.
  • Map tasks: Scale embarrassingly parallel workflows.
  • Plugins: Extend Flyte's core functionality.
  • Type system: See the benefits of static type safety.
  • DataFrame types: Validate dataframe-like objects at runtime.
  • Reproducibility: Containerize and harden your execution graph.
  • Caching: Don't waste precious compute resources re-running nodes.
  • Recovering executions: Build fault-tolerant pipelines.
  • Checkpointing: Checkpoint progress within a node.
  • Flyte Decks: Create rich static reports associated with your tasks.
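To make two of the ideas in this list concrete (typed task boundaries and caching), here is a minimal sketch in plain Python. It is not Flyte's API or implementation: `toy_task`, `toy_workflow`, and the in-memory `_CACHE` are hypothetical names invented for illustration, and a real orchestrator would persist cached outputs in durable storage rather than a dictionary.

```python
import functools
import hashlib
import inspect
import pickle
from typing import Callable, Dict, get_type_hints

# Toy in-memory cache (assumption for illustration only).
_CACHE: Dict[str, object] = {}


def toy_task(cache_version: str) -> Callable:
    """Hypothetical decorator: checks annotated input types and memoizes results."""

    def decorator(fn: Callable) -> Callable:
        hints = get_type_hints(fn)
        signature = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            # Runtime type check against the function's annotations.
            for name, value in bound.arguments.items():
                expected = hints.get(name)
                if isinstance(expected, type) and not isinstance(value, expected):
                    raise TypeError(
                        f"{fn.__name__}: {name} must be {expected.__name__}"
                    )
            # Cache key: function name, cache version, and the input values.
            key_material = pickle.dumps(
                (fn.__qualname__, cache_version, sorted(bound.arguments.items()))
            )
            key = hashlib.sha256(key_material).hexdigest()
            if key not in _CACHE:
                wrapper.calls += 1  # count real executions, not cache hits
                _CACHE[key] = fn(*bound.args, **bound.kwargs)
            return _CACHE[key]

        wrapper.calls = 0
        return wrapper

    return decorator


@toy_task(cache_version="1")
def scale(x: float, factor: float) -> float:
    return x * factor


def toy_workflow(values: list) -> list:
    # A "workflow" here is just a function that composes tasks.
    return [scale(v, 2.0) for v in values]
```

Calling `toy_workflow([1.0, 2.0])` twice executes `scale` only on the first pass, and changing `cache_version` would produce new cache keys and force re-execution. Passing a string for `x` raises a `TypeError` before the function body runs.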

Attendees will learn how Flyte distributes and scales computation, enforces static and runtime type safety, leverages Docker to provide strong reproducibility guarantees, implements caching and checkpointing to recover from failed model training runs, and ships with built-in data lineage tracking for full data pipeline auditability.
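The idea of recovering a failed training run from a checkpoint can be sketched in plain Python as follows. This is an illustration under stated assumptions, not how Flyte implements checkpointing: `train`, `run_with_recovery`, the JSON checkpoint file, and the simulated failure are all hypothetical, and the "training" step is a stand-in that simply halves a loss value.

```python
import json
import tempfile
from pathlib import Path


def train(total_epochs: int, checkpoint_path: Path, fail_at: int = -1) -> dict:
    """Toy training loop that resumes from a JSON checkpoint if one exists."""
    if checkpoint_path.exists():
        state = json.loads(checkpoint_path.read_text())
    else:
        state = {"epoch": 0, "loss": 1.0}

    while state["epoch"] < total_epochs:
        if state["epoch"] == fail_at:
            raise RuntimeError(f"simulated failure at epoch {fail_at}")
        # Stand-in for one epoch of real training.
        state = {"epoch": state["epoch"] + 1, "loss": state["loss"] * 0.5}
        # Persist progress after every completed epoch.
        checkpoint_path.write_text(json.dumps(state))
    return state


def run_with_recovery(total_epochs: int = 4, fail_at: int = 2) -> dict:
    """Run once with a simulated failure, then retry from the checkpoint."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "checkpoint.json"
        try:
            train(total_epochs, path, fail_at=fail_at)
        except RuntimeError:
            pass  # the first attempt fails partway through
        # The retry picks up at the last saved epoch instead of epoch 0.
        return train(total_epochs, path)
```

In this sketch the retry repeats none of the completed epochs, which is the property that makes checkpointing worthwhile for long-running training nodes.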

Wrap-up

The tutorial will close with a summary of the main learnings, pointers to resources for learning more, and a discussion in which attendees can raise their questions.

Resources


Prerequisites

Must Have:
- Working knowledge of data frameworks like pandas, polars, and dask.
- Working knowledge of ML frameworks like scikit-learn, pytorch, and tensorflow.

Nice to Have:
- Familiarity with Docker.
- Familiarity with Kubernetes concepts.

Installation Instructions

https://github.com/flyteorg/flyte-conference-talks/tree/main/scipy-2023

Niels is the Chief Machine Learning Engineer at Union.ai. He is a core maintainer of Flyte, an open source workflow orchestration tool; the author of UnionML, an MLOps framework for machine learning microservices; and the creator of Pandera, a statistical typing and data testing tool for scientific data containers. His mission is to help data science and machine learning practitioners be more productive.

He has a Master's in Public Health with a specialization in sociomedical science and public health informatics, and prior to that a background in developmental biology and immunology. His research interests include reinforcement learning, AutoML, creative machine learning, and fairness, accountability, and transparency in automated systems.
