07-09, 15:25–15:55 (US/Pacific), Room 317
At CERN (the European Organization for Nuclear Research), machine learning models are developed and deployed for applications including data analysis, event reconstruction, and classification. These models must be not only highly sophisticated but also optimized for efficient inference. A critical application is in triggers: systems designed to identify and select interesting events from an immense stream of experimental data. Experiments like ATLAS and CMS generate data at rates of approximately 100 TB/s, requiring triggers to filter out irrelevant events in real time. This talk will explore the challenges of deploying machine learning in such high-throughput environments and discuss solutions to enhance performance and reliability.
Machine learning has paved the way for new discoveries in high-energy physics at the Large Hadron Collider at CERN. While we already have state-of-the-art models for tasks such as data analysis, simulation, and track reconstruction, alongside effective training methodologies, their inference still remains a challenge. Even if we develop a sophisticated model that captures the intricate patterns of fundamental particles, its impact is limited without efficient inference engines that enable its deployment and practical application. ML inference has become especially critical in high-energy physics, where data influx rates are extremely high.
Although popular frameworks like TensorFlow and PyTorch provide robust inference capabilities, integrating them into C++ environments presents several challenges, including flexibility constraints when interfacing with external frameworks. ONNX Runtime, which enables fast inference of ONNX models, also has limitations, offering little fine-grained control.
To address these challenges, SOFIE (System for Optimized Fast Inference code Emit) was developed. It is an inference engine designed to generate highly optimized C++ code from trained ML models. SOFIE converts models in the ONNX format into its own intermediate representation, and also offers limited support for models trained in Keras and PyTorch, as well as message-passing GNNs from DeepMind's Graph Nets library.
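As a rough sketch of this workflow, following the pattern of ROOT's TMVA/SOFIE tutorials (the file name `model.onnx` is a placeholder, and exact headers and namespaces may vary across ROOT versions), parsing an ONNX model and emitting inference code looks approximately like this:

```cpp
// Sketch only: based on ROOT's TMVA/SOFIE tutorials; requires a ROOT build
// with SOFIE enabled. "model.onnx" is a placeholder file name.
#include "TMVA/RModelParser_ONNX.hxx"

int main() {
    using namespace TMVA::Experimental;

    // Parse the ONNX file into SOFIE's intermediate representation (RModel).
    SOFIE::RModelParser_ONNX parser;
    SOFIE::RModel model = parser.Parse("model.onnx");

    // Emit standalone C++ inference code; the weights are typically written
    // to a separate .dat file referenced by the generated header.
    model.Generate();
    model.OutputGenerated("model.hxx");
}
```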
The key advantage of SOFIE is its ability to generate standalone C++ code that can be directly invoked within C++ applications with low latency and minimal dependencies, requiring only BLAS for numerical computations. This enables seamless integration into high-energy physics workflows and other computationally demanding applications. Additionally, the generated code can be compiled at runtime using Cling's just-in-time (JIT) compilation, allowing for flexible execution, including within Python environments. By eliminating the need for heavyweight machine learning frameworks during inference, SOFIE provides a highly efficient and easily deployable solution for ML inference.
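A hypothetical use of the header generated above might look as follows; the `TMVA_SOFIE_model` namespace and the `Session`/`infer` API follow the pattern shown in ROOT's SOFIE tutorials, with names derived from the model file, so they are assumptions here rather than fixed API:

```cpp
// Hypothetical consumer of the generated header from the previous sketch.
#include "model.hxx"

#include <vector>

int main() {
    // The Session loads the weights, here from the .dat file SOFIE wrote.
    TMVA_SOFIE_model::Session session("model.dat");

    // The input size must match the model's input tensor shape
    // (16 is an arbitrary placeholder for this sketch).
    std::vector<float> input(16, 1.0f);
    std::vector<float> output = session.infer(input.data());
}
```

Because the header is plain C++ with only a BLAS dependency, the same file can be JIT-compiled by Cling at runtime instead of being built ahead of time, which is what enables use from interactive ROOT and Python sessions.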
Recent benchmarking demonstrates that SOFIE provides faster inference for event-level evaluations and consumes less memory for smaller models than established runtimes such as ONNX Runtime and LibTorch, though there is still room for improvement in its optimizations. For GNNs, SOFIE scales better by avoiding the overhead of splitting models with a large number of operators.
Ongoing developments in SOFIE include GPU support via multiple stacks such as SYCL, ALPAKA, and CUDA, along with integration with hls4ml for FPGAs and support for models developed in Flax.
In this talk, we will explore machine learning opportunities at CERN and the challenges involved in implementing them. We will then delve into SOFIE’s architecture, use cases, and the latest developments in its optimization methods and extensions.
Outline
- Computing challenges at CERN: a brief introduction
- How machine learning solves them
- Limitations and opportunities
- Introducing TMVA SOFIE
  - Motivation
    - Why does CERN need super-fast ML inference with low latency and few dependencies?
    - Why aren't frameworks like TensorFlow or PyTorch sufficient for ML inference at CERN?
- SOFIE Architecture
  - Parser
  - Model Storage
  - Inference Code Generator
- SOFIE Parser
  - ONNX Parser
  - Keras Parser
  - PyTorch Parser
- SOFIE Inference Code Generator
- SOFIE Advanced Model Inference Support
  - Graph Neural Networks
  - Dynamic Computation Graphs
- SOFIE Optimization Methods
- Inference on Accelerators
- Benchmarking Results
- Future Goals
Pre-requisites
Intermediate knowledge of machine learning and the underlying mathematics will be helpful. The project is an ML inference engine developed in C++ with Python interfaces through the C-Python API, so a basic understanding of the relevant libraries will be beneficial. Familiarity with operations such as GEMM (general matrix multiplication) and ReLU, and with hardware accelerators, will be useful for following the latest developments of the project.
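For illustration, here is a minimal sketch, not SOFIE's actual output, of how a dense layer y = ReLU(Wx + b) lowers to a single BLAS GEMM call plus an element-wise ReLU pass, in the spirit of the BLAS-only code SOFIE emits; all dimensions and values are made up for the example:

```cpp
// Illustrative only: a dense layer y = ReLU(W x + b) expressed as one
// cblas_sgemm call followed by an in-place ReLU. Link against any CBLAS
// implementation (e.g. OpenBLAS).
#include <cblas.h>

#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    const int batch = 2, in = 4, out = 3;
    std::vector<float> X(batch * in, 1.0f);  // input activations
    std::vector<float> W(in * out, 0.5f);    // weights
    std::vector<float> B(out, -1.0f);        // bias
    std::vector<float> Y(batch * out);

    // Broadcast the bias into Y so that GEMM's beta = 1 accumulates onto it.
    for (int r = 0; r < batch; ++r)
        std::copy(B.begin(), B.end(), Y.begin() + r * out);

    // Y = 1.0 * X(batch x in) * W(in x out) + 1.0 * Y
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                batch, out, in,
                1.0f, X.data(), in, W.data(), out,
                1.0f, Y.data(), out);

    // ReLU applied element-wise in place.
    for (auto &v : Y) v = std::max(v, 0.0f);

    for (int r = 0; r < batch; ++r) {
        for (int c = 0; c < out; ++c) std::printf("%6.2f ", Y[r * out + c]);
        std::printf("\n");
    }
}
```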
References
- Sengupta S. "On Developing the Fast Machine Learning Inference Engine". Oral presentation at the CERN Summer Student Series; August 2022; Geneva, Switzerland.
- Sengupta S. 2022. "TMVA SOFIE: Enhancing the Machine Learning Inference Engine". Technical report published for the CERN Summer Student Programme. Geneva, Switzerland.
- Panagou I.M., Bellas N., Moneta L., Sengupta S. 2024. "Accelerating Machine Learning Inference on GPUs with SYCL". In Proceedings of the 12th International Workshop on OpenCL and SYCL. Association for Computing Machinery, New York, NY, USA, Article 17, 1–2.
- GitHub Repository
- An S., Moneta L., Sengupta S., Hamdan A., Shah N., Shende H., Mittal S., Zapata O. 2022. "ROOT Machine Learning Ecosystem for Data Analysis". In Proceedings of the 21st International Workshop on Advanced Computing and Analysis Techniques in Physics Research. Bari, Italy.
- Moneta L., Panagou I.M., Sengupta S. 2024. "Benchmark Studies of ML Inference with TMVA SOFIE". In Proceedings of the 27th International Conference on Computing in High-Energy & Nuclear Physics. Krakow, Poland.
- Moneta L., Sengupta S., Hamdan A. "New Developments of TMVA/SOFIE: Code Generation and Fast Inference for Graph Neural Networks". Oral presentation at the 26th International Conference on Computing in High Energy & Nuclear Physics; May 2023; Virginia, USA.
- An S., Moneta L., Sengupta S., Hamdan A., Sossai F., Saxena A. 2023. "C++ Code Generation for Fast Inference of Deep Learning Models in ROOT/TMVA". Journal of Physics: Conference Series 2438 012013.
- Documentation for SOFIE
- Blog post introducing SOFIE for GSoC 2021 by Sanjiban Sengupta
Sanjiban is a doctoral student at CERN, affiliated with the University of Manchester. He is researching optimization strategies for efficient machine learning inference for the High-Luminosity phase of the Large Hadron Collider at CERN within the Next-Gen Triggers project. Previously, he was a Summer Student at CERN in 2022 and contributed to CERN-HSF via the Google Summer of Code program in 2021. Within SOFIE, he worked in particular on the Keras and PyTorch parsers, storage functionality, machine learning operators based on the ONNX standard, and Graph Neural Network support. He has also volunteered as a mentor for Google Summer of Code contributors in 2022, 2023, 2024, and 2025, and for the CERN Summer Students of 2023 working on CERN's ROOT Data Analysis Project.
Previously, Sanjiban spoke at PyCon India 2023 about Python interfaces for Meta's Velox engine, and presented a talk on the Velox architecture at PyCon Thailand 2023. He contributes to open-source projects in data science and engineering, including ROOT, Apache Arrow, and Substrait.