07-10, 14:20–14:50 (US/Pacific), Ballroom
The rapidly evolving Python ecosystem makes it increasingly hard to keep code current by hand: developers frequently need to rewrite applications to take advantage of new libraries, hardware architectures, and optimization techniques. To address this challenge, the Numba team is developing a superoptimizing compiler built on term rewriting with equality saturation. This approach lets domain experts express and share optimizations without requiring extensive compiler expertise. This talk explores how Numba v2 enables sophisticated optimizations, from floating-point approximation and automatic GPU acceleration to energy-efficient multiplication for deep learning models, all through the familiar NumPy API. Join us to discover how Numba v2 brings superoptimization to the Python ecosystem.
In today's landscape of AI/ML-dominated computing and ever-increasing programming complexity, flexible compiler tooling has become more critical than ever. Building on a decade of experience developing Numba, our team is creating a next-generation compiler (Numba v2) that supports composable term rewriting rules, making compiler development modular, extensible, and accessible to domain experts across AI, machine learning, and traditional numerical applications.
Key Challenges in the Current Landscape
Our experience has highlighted three critical challenges:
- The Python ecosystem's strength lies in its diverse libraries, offering various implementations of core numerical routines and specialized hardware access through different APIs. As these libraries, APIs, and target hardware evolve, developers must continuously adapt their codebases to effectively utilize both existing and emerging capabilities.
- While numerical codebases rely heavily on compiler technology for performance optimization, current Python compilation faces two significant limitations:
* Python's language structure doesn't naturally align with the structured forms expected by common compilation technologies like MLIR/LLVM, complicating optimization efforts.
* Traditional compiler technology depends on heuristics (predefined compiler passes tuned for general cases), forcing developers to over-specialize their programs through various flags and implementation "tricks".
- Domain experts possess valuable optimization knowledge but lack straightforward methods to implement and share these optimizations without extensive source code modifications.
Numba v2: A New Approach
To address these challenges, we're developing a next-generation compiler that broadens access to compiler technology. Beyond modernizing the core compiler, Numba v2 introduces "rewrite rules" that let users express adaptations and optimizations for both existing and new code. Numba v2 uses equality saturation, which applies the rewrite rules repeatedly to build up the space of equivalent program variants (represented compactly in an e-graph), and then achieves superoptimization by extracting the lowest-cost variant under a cost model. These rules serve as shareable, distributable, domain-specific optimizations that improve both new and established workflows through simple recompilation.
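As a rough illustration of equality saturation followed by cost-based extraction, the toy sketch below rewrites the classic expression (a * 2) / 2. It is not Numba v2's API: the tuple term encoding, the four rules, and the COST table are all hypothetical. It also swaps the e-graph for a naive set that stores every equivalent term explicitly, so it only scales to tiny inputs. The rule x / x -> 1 is valid only for nonzero x, and the shift rule assumes integer operands.

```python
# Toy equality saturation + cost-based extraction (hypothetical rules and costs).
# Terms are leaves (str or int) or tuples (op, lhs, rhs).
# NOTE: a naive set of whole terms stands in for an e-graph here.


def rewrite_root(t):
    """Yield terms equivalent to t by applying one rule at the root."""
    if not isinstance(t, tuple):
        return
    op, lhs, rhs = t
    if op == "mul" and rhs == 2:
        yield ("shl", lhs, 1)  # x * 2 -> x << 1 (assumes integer operands)
    if op == "mul" and rhs == 1:
        yield lhs  # x * 1 -> x
    if op == "div" and lhs == rhs:
        yield 1  # x / x -> 1 (assumes x != 0)
    if op == "div" and isinstance(lhs, tuple) and lhs[0] == "mul":
        _, x, y = lhs
        yield ("mul", x, ("div", y, rhs))  # (x * y) / z -> x * (y / z)


def rewrite_anywhere(t):
    """Yield every term reachable from t by one rewrite at any position."""
    yield from rewrite_root(t)
    if isinstance(t, tuple):
        op, lhs, rhs = t
        for new_lhs in rewrite_anywhere(lhs):
            yield (op, new_lhs, rhs)
        for new_rhs in rewrite_anywhere(rhs):
            yield (op, lhs, new_rhs)


def saturate(t, max_terms=10_000):
    """Apply rules until no new equivalent term appears (or a size limit is hit)."""
    seen = {t}
    frontier = [t]
    while frontier and len(seen) < max_terms:
        next_frontier = []
        for term in frontier:
            for new in rewrite_anywhere(term):
                if new not in seen:
                    seen.add(new)
                    next_frontier.append(new)
        frontier = next_frontier
    return seen


COST = {"div": 8, "mul": 4, "shl": 1}  # hypothetical per-operation costs


def cost(t):
    """Sum operation costs over the term tree; leaves are free."""
    if not isinstance(t, tuple):
        return 0
    op, lhs, rhs = t
    return COST[op] + cost(lhs) + cost(rhs)


def extract(terms):
    """Pick the cheapest equivalent term (ties broken by repr for determinism)."""
    return min(terms, key=lambda t: (cost(t), repr(t)))


expr = ("div", ("mul", "a", 2), 2)
variants = saturate(expr)
best = extract(variants)
print(len(variants), best)  # prints: 5 a
```

Saturation finds five equivalent forms, including the shift-based (a << 1) / 2, and extraction returns the bare leaf "a" because it has cost 0. No single greedy rewrite reaches that result directly; it falls out of exploring all variants and choosing by cost.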
Practical Applications and Benefits
We will demonstrate how this approach enhances machine learning and numerical computing by unlocking new optimization opportunities, including:
- Numerical Approximation: Where an application can tolerate floating-point imprecision, Numba v2 can apply optimizations beyond traditional -ffast-math, incorporating ISA-specific techniques that push efficiency further.
- Automatic Hardware Acceleration: Numba v2 can seamlessly offload NumPy array expressions to GPUs, improving performance without requiring explicit user intervention.
- Energy-efficient Computation: New floating-point optimizations can replace fundamental operations, such as multiplication, with cheaper variants (for example L-Mul, which approximates floating-point multiplication using integer addition), potentially reducing power consumption in deep learning models and numerical applications.
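To make the L-Mul idea concrete, the sketch below emulates it in NumPy, following the formula as we understand it from the L-Mul paper. Each operand is written as (1 + m) * 2**e, and the exact mantissa product (1 + xm)(1 + ym) = 1 + xm + ym + xm*ym is approximated by 1 + xm + ym + 2**-offset_bits, dropping the cross term. The value offset_bits = 4 is what we recall that paper using for mantissas wider than 4 bits; treat it as an assumption. This emulation still uses ordinary floating-point arithmetic, so it only demonstrates accuracy, not energy savings. The names l_mul and max_relative_error are hypothetical and not part of Numba v2, and the tolerance gate at the end is our assumption about how an approximation rule might be accepted.

```python
import numpy as np


def l_mul(x, y, offset_bits=4):
    """Emulate L-Mul: approximate x * y elementwise (illustrative, not faster)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # np.sign(0) == 0, so a zero operand yields a zero result
    sign = np.sign(x) * np.sign(y)
    # frexp gives |v| = f * 2**e with f in [0.5, 1); rewrite as (1 + m) * 2**(e - 1)
    fx, ex = np.frexp(np.abs(x))
    fy, ey = np.frexp(np.abs(y))
    mx = 2.0 * fx - 1.0
    my = 2.0 * fy - 1.0
    # L-Mul: drop the mx * my cross term and add the constant offset 2**-offset_bits
    mantissa = 1.0 + mx + my + 2.0 ** -offset_bits
    return sign * np.ldexp(mantissa, (ex - 1) + (ey - 1))


def max_relative_error(approx, exact):
    """Worst-case relative error of approx against exact (exact must be nonzero)."""
    return float(np.max(np.abs(approx - exact) / np.abs(exact)))


rng = np.random.default_rng(0)
n = 100_000
a = rng.uniform(0.1, 10.0, n) * rng.choice([-1.0, 1.0], n)
b = rng.uniform(0.1, 10.0, n)
err = max_relative_error(l_mul(a, b), a * b)

# Hypothetical tolerance gate: accept the rewrite only if the error is acceptable
rtol = 0.25
print(f"max relative error: {err:.3f}, accepted: {err <= rtol}")
```

For example, l_mul(2.0, 3.0) gives 6.25 instead of 6.0. Because the error is |2**-offset_bits - xm*ym| / ((1 + xm)(1 + ym)), the worst case (about 23%) occurs when both mantissas approach 1, which is why a user-chosen tolerance matters.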
This talk is essential for numerical codebase maintainers, domain experts interested in sharing optimization knowledge, and anyone working with hardware acceleration or superoptimization techniques.