07-09, 15:25–15:55 (US/Pacific), Room 317
At CERN (the European Organization for Nuclear Research), machine learning models are developed and deployed for applications including data analysis, event reconstruction, and classification. These models must be not only highly sophisticated but also optimized for efficient inference. A critical application is in triggers: systems designed to identify and select interesting events from an immense stream of experimental data. Experiments like ATLAS and CMS generate data at rates of approximately 100 TB/s, requiring triggers to filter out irrelevant events in real time. This talk will explore the challenges of deploying machine learning in such high-throughput environments and discuss solutions to enhance performance and reliability.
Machine learning has paved the way for new discoveries in high-energy physics at the Large Hadron Collider at CERN. While we already have state-of-the-art models for tasks such as data analysis, simulation, and track reconstruction, alongside effective training methodologies, their inference still remains a challenge. Even if we develop a sophisticated model that captures the intricate patterns of fundamental particles, its impact is limited without efficient inference engines that enable its deployment and practical application. ML inference has become especially critical in high-energy physics, where data influx rates are extremely high.
Although popular frameworks like TensorFlow and PyTorch provide robust inference capabilities, integrating them into C++ environments presents several challenges, including flexibility constraints when interfacing with external frameworks. ONNX Runtime, which enables fast inference of ONNX models, also has limitations, offering little fine-grained control.
To address these challenges, SOFIE (System for Optimized Fast Inference code Emit) was developed. It is an inference engine designed to generate highly optimized C++ code from trained ML models. SOFIE converts models in the ONNX format into its own intermediate representation, and also offers limited support for models trained in Keras and PyTorch, as well as message-passing GNNs from DeepMind's Graph Nets library.
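As a rough sketch of this workflow, following the pattern of ROOT's TMVA/SOFIE tutorials (the file name `model.onnx` is a placeholder, and exact headers and namespaces may vary across ROOT versions), parsing an ONNX model and emitting inference code looks approximately like this:

```cpp
// Sketch only: based on ROOT's TMVA/SOFIE tutorials; requires a ROOT build
// with SOFIE enabled. "model.onnx" is a placeholder file name.
#include "TMVA/RModelParser_ONNX.hxx"

int main() {
    using namespace TMVA::Experimental;

    // Parse the ONNX file into SOFIE's intermediate representation (RModel).
    SOFIE::RModelParser_ONNX parser;
    SOFIE::RModel model = parser.Parse("model.onnx");

    // Emit standalone C++ inference code; the weights are typically written
    // to a separate .dat file referenced by the generated header.
    model.Generate();
    model.OutputGenerated("model.hxx");
}
```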
The key advantage of SOFIE is its ability to generate standalone C++ code that can be directly invoked within C++ applications with low latency and minimal dependencies, requiring only BLAS for numerical computations. This enables seamless integration into high-energy physics workflows and other computationally demanding applications. Additionally, the generated code can be compiled at runtime using Cling's just-in-time (JIT) compilation, allowing for flexible execution, including within Python environments. By eliminating the need for heavyweight machine learning frameworks during inference, SOFIE provides a highly efficient and easily deployable solution for ML inference.
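A hypothetical use of the header generated above might look as follows; the `TMVA_SOFIE_model` namespace and the `Session`/`infer` API follow the pattern shown in ROOT's SOFIE tutorials, with names derived from the model file, so they are assumptions here rather than fixed API:

```cpp
// Hypothetical consumer of the generated header from the previous sketch.
#include "model.hxx"

#include <vector>

int main() {
    // The Session loads the weights, here from the .dat file SOFIE wrote.
    TMVA_SOFIE_model::Session session("model.dat");

    // The input size must match the model's input tensor shape
    // (16 is an arbitrary placeholder for this sketch).
    std::vector<float> input(16, 1.0f);
    std::vector<float> output = session.infer(input.data());
}
```

Because the header is plain C++ with only a BLAS dependency, the same file can be JIT-compiled by Cling at runtime instead of being built ahead of time, which is what enables use from interactive ROOT and Python sessions.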
Recent benchmarking demonstrates that SOFIE provides faster inference for event-level evaluations and consumes less memory for smaller models than established runtimes such as ONNX Runtime and LibTorch, though there is still room for improvement in its optimizations. For GNNs, SOFIE scales better by avoiding the overhead of splitting models with a large number of operators.
Ongoing developments in SOFIE include GPU support via multiple stacks such as SYCL, ALPAKA, and CUDA, along with integration with hls4ml for FPGAs and support for models developed in Flax.
In this talk, we will explore machine learning opportunities at CERN and the challenges involved in implementing them. We will then delve into SOFIE’s architecture, use cases, and the latest developments in its optimization methods and extensions.
Outline
- Computing challenges at CERN: a brief introduction
- How machine learning solves them
- Limitations and opportunities
- Introducing TMVA SOFIE
  - Motivation
    - Why does CERN need super-fast ML inference with low latency and few dependencies?
    - Why aren't frameworks like TensorFlow or PyTorch sufficient for ML inference at CERN?
- SOFIE Architecture
  - Parser
  - Model Storage
  - Inference Code Generator
- SOFIE Parser
  - ONNX Parser
  - Keras Parser
  - PyTorch Parser
- SOFIE Inference Code Generator
- SOFIE Advanced Model Inference Support
  - Graph Neural Networks
  - Dynamic Computation Graphs
- SOFIE Optimization Methods
- Inference on Accelerators
- Benchmarking Results
- Future Goals
Pre-requisites
Intermediate knowledge of machine learning and the underlying mathematics will be helpful. The project is an ML inference engine developed in C++ with Python interfaces through the C-Python API, so a basic understanding of the relevant libraries will be beneficial. Familiarity with operations such as GEMM (general matrix multiplication) and ReLU, and with hardware accelerators, will be useful for following the latest developments of the project.
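For illustration, here is a minimal sketch, not SOFIE's actual output, of how a dense layer y = ReLU(Wx + b) lowers to a single BLAS GEMM call plus an element-wise ReLU pass, in the spirit of the BLAS-only code SOFIE emits; all dimensions and values are made up for the example:

```cpp
// Illustrative only: a dense layer y = ReLU(W x + b) expressed as one
// cblas_sgemm call followed by an in-place ReLU. Link against any CBLAS
// implementation (e.g. OpenBLAS).
#include <cblas.h>

#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    const int batch = 2, in = 4, out = 3;
    std::vector<float> X(batch * in, 1.0f);  // input activations
    std::vector<float> W(in * out, 0.5f);    // weights
    std::vector<float> B(out, -1.0f);        // bias
    std::vector<float> Y(batch * out);

    // Broadcast the bias into Y so that GEMM's beta = 1 accumulates onto it.
    for (int r = 0; r < batch; ++r)
        std::copy(B.begin(), B.end(), Y.begin() + r * out);

    // Y = 1.0 * X(batch x in) * W(in x out) + 1.0 * Y
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                batch, out, in,
                1.0f, X.data(), in, W.data(), out,
                1.0f, Y.data(), out);

    // ReLU applied element-wise in place.
    for (auto &v : Y) v = std::max(v, 0.0f);

    for (int r = 0; r < batch; ++r) {
        for (int c = 0; c < out; ++c) std::printf("%6.2f ", Y[r * out + c]);
        std::printf("\n");
    }
}
```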
References
- Sengupta S. "On Developing the Fast Machine Learning Inference Engine". Oral presentation at the CERN Summer Student Series; August 2022; Geneva, Switzerland.
- Sengupta S. 2022. "TMVA SOFIE: Enhancing the Machine Learning Inference Engine". Technical report published for the CERN Summer Student Programme. Geneva, Switzerland.
- Panagou I.M., Bellas N., Moneta L., Sengupta S. 2024. "Accelerating Machine Learning Inference on GPUs with SYCL". In Proceedings of the 12th International Workshop on OpenCL and SYCL. Association for Computing Machinery, New York, NY, USA, Article 17, 1–2.
- GitHub Repository
- An S., Moneta L., Sengupta S., Hamdan A., Shah N., Shende H., Mittal S., Zapata O. 2022. "ROOT Machine Learning Ecosystem for Data Analysis". In Proceedings of the 21st International Workshop on Advanced Computing and Analysis Techniques in Physics Research. Bari, Italy.
- Moneta L., Panagou I.M., Sengupta S. 2024. "Benchmark Studies of ML Inference with TMVA SOFIE". In Proceedings of the 27th International Conference on Computing in High-Energy & Nuclear Physics. Krakow, Poland.
- Moneta L., Sengupta S., Hamdan A. "New Developments of TMVA/SOFIE: Code Generation and Fast Inference for Graph Neural Networks". Oral presentation at the 26th International Conference on Computing in High Energy & Nuclear Physics; May 2023; Virginia, USA.
- An S., Moneta L., Sengupta S., Hamdan A., Sossai F., Saxena A. 2023. "C++ Code Generation for Fast Inference of Deep Learning Models in ROOT/TMVA". Journal of Physics: Conference Series 2438 012013.
- Documentation for SOFIE
- Blog post introducing SOFIE for GSoC 2021 by Sanjiban Sengupta
Sanjiban is a doctoral student at CERN, affiliated with the University of Manchester. He is researching optimization strategies for efficient machine learning inference for the High-Luminosity phase of the Large Hadron Collider at CERN within the Next-Gen Triggers project. Previously, he was a Summer Student at CERN in 2022 and contributed to CERN-HSF via the Google Summer of Code program in 2021. Within SOFIE, he worked in particular on the Keras and PyTorch parsers, storage functionality, machine learning operators based on the ONNX standard, and Graph Neural Network support. He has also volunteered as a mentor for Google Summer of Code contributors in 2022, 2023, 2024, and 2025, and for the CERN Summer Students of 2023 working on CERN's ROOT Data Analysis Project.
Previously, Sanjiban spoke at PyCon India 2023 about Python interfaces for Meta's Velox engine, and presented a talk on the Velox architecture at PyCon Thailand 2023. He contributes to open-source projects in data science and engineering, including ROOT, Apache Arrow, and Substrait.