07-09, 16:05–16:35 (US/Pacific), Room 317
Computational needs in high energy physics applications are increasingly met by using GPUs as hardware accelerators, but achieving the highest throughput requires reading data directly into GPU memory. This has yet to be achieved for HEP’s standard domain-specific “ROOT” file formats. KvikUproot is a prototype package that uses KvikIO’s Python bindings to cuFile and nvCOMP to read ROOT file formats on the GPU. On GPUDirect Storage (GDS) enabled systems, data bypasses the CPU and is loaded directly from storage to the GPU. We will discuss the methodology we developed to read ROOT files into GPUs via RDMA.
High energy physics (HEP) analyses need ever larger datasets to push the limits of experimental sensitivity to theoretical calculations. To meet these needs, upgrades to detectors and detector infrastructure are increasing data rates, and GPUs are a natural choice for handling the increased data volume.
For analysts at the Large Hadron Collider (LHC), it has become necessary to find more efficient ways of processing data. Awkward Array already has rich functionality in its CUDA backend, which allows users to leverage the high throughput of GPUs without any knowledge of CUDA programming. However, current Python tools for reading and deserializing the domain-specific ROOT file formats of particle physics into GPU memory first read data from storage into CPU memory and only then copy it to the GPU. This unnecessarily introduces the CPU as a potential bottleneck in an analysis workflow.
KvikUproot is a prototype module for the Uproot library that uses the Python bindings to cuFile and nvCOMP provided by the KvikIO library to read and decompress the ROOT "TTree" and newer "RNTuple" file formats. On GPUDirect Storage (GDS) enabled systems, cuFile transfers data from storage directly to the GPU. nvCOMP provides a backend for decompressing raw data on the GPU. CuPy, which is nearly a drop-in replacement for NumPy, provides a Python interface to buffers stored on the GPU. ROOT data is then deserialized into Awkward Arrays using the Awkward Array CUDA backend.
Integrating cuFile and nvCOMP required restructuring Uproot’s current workflow to maximize performance. Currently, Uproot reads, decompresses, and deserializes chunks of data one after another before concatenating them. KvikUproot instead reads many chunks of data asynchronously with cuFile, decompresses the chunks in parallel with nvCOMP once they have been read, and then streams the buffer deserialization operations with CuPy. Even without GDS support, KvikUproot can already decrease read times by 20% for the TTree format and 30% for the RNTuple format.
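The difference between the two workflows can be sketched with a CPU-only toy. This is not KvikUproot or Uproot code: the standard library's zlib, a thread pool, and struct stand in for nvCOMP, asynchronous cuFile reads, and CuPy deserialization, and every function name, the in-memory "storage" list, and the integer chunk layout are hypothetical choices made for illustration.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def make_storage(n_chunks, values_per_chunk):
    """Build hypothetical 'storage': a list of compressed chunks of int32 values."""
    storage = []
    for i in range(n_chunks):
        values = range(i * values_per_chunk, (i + 1) * values_per_chunk)
        raw = struct.pack(f">{values_per_chunk}i", *values)
        storage.append(zlib.compress(raw))
    return storage


def read_chunk(storage, index):
    """Stand-in for a file read (cuFile in the GPU setting)."""
    return storage[index]


def decompress(raw):
    """Stand-in for GPU decompression (nvCOMP in the GPU setting)."""
    return zlib.decompress(raw)


def deserialize(buffer):
    """Stand-in for buffer deserialization (CuPy in the GPU setting)."""
    count = len(buffer) // 4
    return list(struct.unpack(f">{count}i", buffer))


def sequential_pipeline(storage):
    """Read, decompress, and deserialize each chunk before touching the next."""
    result = []
    for index in range(len(storage)):
        result.extend(deserialize(decompress(read_chunk(storage, index))))
    return result


def batched_pipeline(storage, workers=4):
    """Submit all reads up front, then decompress the whole batch in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Stage 1: every read is submitted before any result is awaited.
        futures = [pool.submit(read_chunk, storage, i) for i in range(len(storage))]
        raw_chunks = [future.result() for future in futures]
        # Stage 2: decompress all chunks concurrently; map preserves chunk order.
        buffers = list(pool.map(decompress, raw_chunks))
    # Stage 3: deserialize the decompressed buffers in order.
    result = []
    for buffer in buffers:
        result.extend(deserialize(buffer))
    return result


if __name__ == "__main__":
    storage = make_storage(n_chunks=8, values_per_chunk=1000)
    assert sequential_pipeline(storage) == batched_pipeline(storage)
```

Both functions return the same values; only the ordering of the work differs. The toy makes no claim about speed, since the gains quoted above come from GPU-side reads and decompression rather than from Python threads.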
There are challenges to the adoption of tools such as KvikUproot. Enabling GDS involves multiple hardware and software components with limited support for third-party solutions. Meeting the requirements for activating these high-performance features therefore requires research and development on our computing infrastructure. Additionally, cuFile, nvCOMP, and CuPy are specific to NVIDIA GPUs. Tools similar to KvikUproot must be developed separately for other GPU types to accommodate the diversity of computing resources available.
Despite these challenges, KvikUproot reduces read times of ROOT data compared to Uproot, even without GDS support. The continued development of KvikUproot furthers the mission of creating a suite of Python tools that lets HEP analysts run analyses entirely on the GPU, without the CPU bottleneck. Development is ongoing to support additional data types, create a more user-friendly API, and optimize integration with analysis workflows.
I am a physics PhD student at the University of Illinois Chicago (UIC) and part of the CMS group there. My areas of research are developing intuitive Python tools for accelerating high energy physics analyses and measuring the entanglement of top quark systems at the LHC.