07-11, 13:15–13:45 (US/Pacific), Room 315
Real-time machine learning depends on features and data that by definition can’t be pre-computed. Detecting fraud or acute diseases like sepsis requires processing events that emerged seconds ago. How do we build an infrastructure platform that executes complex data pipelines end-to-end, on demand, in under 10 ms? All while meeting data teams where they are: in Python, the language of ML!
Learn how we built a symbolic interpreter that accelerates ML pipelines by transpiling Python into DAGs of static expressions. These expressions are optimized in C++ and ultimately run in production workloads at scale with Velox, an open-source (~4k stars) unified C++ query engine from Meta.
Relevance to SciPy audience and high-level overview
- Offers insights into parsing Python's abstract syntax tree using the built-in ast module
- Demonstrates how we parse type annotations to build a DAG that is later leveraged to dynamically create query plans at inference time
- Filters and projections are pushed down end-to-end
- Parse Python functions to generate static expressions that are executed either
- in C++ after fetching data
- or as SQL UDFs at query time, achieving orders-of-magnitude speedups
- We also built our own C++ SQL driver, which we interface with in place of SQLAlchemy
- Python functions that can’t be converted into static expressions are run in isolated processes (a Ray cluster or custom subprocesses) for parallelism
- Execute the query plan using Velox
- Velox is an extensible C++ execution engine library for building data management systems
- Open-source with 4k stars on GitHub
- We contribute upstream to Velox, as it wasn’t built with ML use cases in mind
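To make the AST-walking idea in the outline concrete, here is a minimal sketch (a toy illustration, not Chalk's actual implementation; the function and feature names are hypothetical): it parses a feature function with the ast module and records which names each assignment reads, yielding a small dependency DAG.

```python
import ast

SOURCE = """
def risk_score(amount: float, history_mean: float) -> float:
    ratio = amount / history_mean
    score = ratio * 100.0
    return score
"""

def build_expression_dag(source: str) -> dict[str, set[str]]:
    """Map each name to the names its defining expression reads --
    a toy version of compiling Python into a DAG of static expressions."""
    func = ast.parse(source).body[0]
    # Function parameters are the DAG's inputs (no dependencies).
    dag: dict[str, set[str]] = {arg.arg: set() for arg in func.args.args}
    for stmt in func.body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            deps = {n.id for n in ast.walk(stmt.value) if isinstance(n, ast.Name)}
            # Keep only dependencies on names defined earlier in the DAG.
            dag[stmt.targets[0].id] = deps & dag.keys()
    return dag

dag = build_expression_dag(SOURCE)
# 'score' depends on 'ratio'; 'ratio' depends on 'amount' and 'history_mean'
```

Because the DAG is built statically, downstream stages can be planned (and parallelized or pushed down) before any data is fetched.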
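The outline also mentions running static expressions as SQL at query time. A minimal sketch of that idea (a hypothetical operator subset, not Chalk's transpiler) translates a Python arithmetic expression AST into a SQL projection so the computation can be pushed down into the query engine:

```python
import ast

# Toy mapping of Python binary operators to SQL; a real transpiler
# covers far more of the language.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_sql(node: ast.AST) -> str:
    """Translate a small Python expression AST into a SQL expression string."""
    if isinstance(node, ast.Expression):
        return to_sql(node.body)
    if isinstance(node, ast.BinOp):
        return f"({to_sql(node.left)} {OPS[type(node.op)]} {to_sql(node.right)})"
    if isinstance(node, ast.Name):
        return node.id          # treated as a column reference
    if isinstance(node, ast.Constant):
        return repr(node.value)
    raise NotImplementedError(ast.dump(node))

expr = ast.parse("amount / history_mean * 100.0", mode="eval")
query = f"SELECT {to_sql(expr)} AS risk_score FROM transactions"
# SELECT ((amount / history_mean) * 100.0) AS risk_score FROM transactions
```

Executing the expression inside the scan, rather than in Python after the fetch, is what enables the orders-of-magnitude speedups mentioned above.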
Background & Motivation
- Real-time machine learning demands rapid response times
- Python’s speed is often a bottleneck
- Overview of the strategies we’ve employed to circumvent this
- Need for real-time is only growing
- Demand for contextually aware systems that can process and react to information instantly is increasing
- Demand also grows as the individual touchpoints between humans and AI systems multiply
What are the real-world applications for real-time ML?
Inference, recommendations, reranking, etc.
- Predictive monitoring + health
- Predictive monitoring with sensor data (pathogens, pollution, IoT, etc.)
- Acute disease detection
- Operations e.g. ambulance routing
- Fraud: Is this transaction fraudulent? Is this website phishing (AI-generated content)?
- Lots of features (which are only meaningful when meticulously combined)
- E-commerce: recommendations, e.g. other shoppers also bought X
- Collaborative filtering
- Marketplaces: Re-ranking matches between buyers and sellers
- Event-driven similarity and vector search at a huge scale
- a single user with real-time preferences vs. many sellers/products with real-time preferences
Methods & Approach
- DAG construction using Python’s AST
- Automatic parallel execution of concurrent pipeline stages
- Vectorization of pipeline stages that are written using scalar syntax
- Low-latency key/value stores like Redis, Bigtable, and DynamoDB to minimize cached feature fetch time
- Statistics-informed join planning
- JIT transformation of Python code into native code
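As a sketch of the "automatic parallel execution of concurrent pipeline stages" idea above (hypothetical stage names; a real scheduler is far more sophisticated), stages whose dependencies are all satisfied can be submitted together as one wave:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy pipeline: each stage declares its dependencies and a function
# that reads prior results and returns new ones.
STAGES = {
    "fetch_user": (set(), lambda r: {"user": 1}),
    "fetch_txns": (set(), lambda r: {"txns": [10, 20]}),
    "features":   ({"fetch_user", "fetch_txns"},
                   lambda r: {"mean_txn": sum(r["txns"]) / len(r["txns"])}),
}

def run_pipeline(stages):
    results, done = {}, set()
    with ThreadPoolExecutor() as pool:
        while len(done) < len(stages):
            # All not-yet-run stages whose dependencies are satisfied
            # form one wave and execute concurrently.
            ready = [n for n, (deps, _) in stages.items()
                     if n not in done and deps <= done]
            futures = {n: pool.submit(stages[n][1], results) for n in ready}
            for n, f in futures.items():
                results.update(f.result())
                done.add(n)
    return results

out = run_pipeline(STAGES)  # {'user': 1, 'txns': [10, 20], 'mean_txn': 15.0}
```

Here the two independent fetches run in parallel in the first wave, and the dependent feature computation runs in the second; the same wave scheduling falls out of any dependency DAG.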
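The "vectorization of pipeline stages that are written using scalar syntax" bullet can be illustrated with NumPy broadcasting (a toy stand-in for a real vectorizing transform; the stage below is hypothetical): the scalar-looking function applies element-wise to whole column batches unchanged.

```python
import numpy as np

# A pipeline stage written in scalar syntax...
def stage_scalar(amount, mean):
    return (amount - mean) / mean

# ...applied to whole column batches at once, because NumPy
# broadcasts the arithmetic element-wise across the arrays.
amounts = np.array([10.0, 30.0, 50.0])
means = np.array([20.0, 20.0, 20.0])
batch_result = stage_scalar(amounts, means)  # array([-0.5, 0.5, 1.5])
```

The batch form amortizes interpreter overhead across the whole column, which is the same reason the static-expression path is compiled to columnar C++ operators.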
Results & Effects
- Throughput of hundreds of millions of features per second
- Sub-second end-to-end latencies
- Zero ETL: our ML models are fully decoupled from ETL, unlocking affordances similar to infrastructure-as-code
- Single source of truth
- Features and logic are shared and consistent across teams, e.g. another team can easily reuse the exact same logic for computing creditworthiness
- Version control: easily roll back to a previous iteration without needing to backfill data
- Branch deploys: data scientists can experiment in their own branches
- Easily simulate model runs with historical production data
- Experiment, A/B test, and evaluate models
- Prevent drift between training/serving, staging/prod, etc.
- It’s the same underlying data
- Integrate more data sources
- Minimize dependency and schema requests to your data engineering teams, since data is transformed post-fetch
- Broadened context by pulling from Postgres, Kafka, AWS Glue, Snowflake, or a microservice / third-party API
Elliot Marx is one of the co-founders of Chalk. He started his career at Affirm, where he built the early risk and credit data infrastructure system (the inspiration for Chalk). He then co-founded Haven Money, which Credit Karma acquired to power its banking products. He holds a B.S. and M.S. in Computer Science from Stanford University.