SciPy 2023

libyt: a Tool for Parallel In Situ Analysis with yt
07-12, 13:55–14:25 (America/Chicago), Grand Salon C

In the era of exascale computing, storing and analyzing large-scale data has become both more important and more difficult. We present libyt, an open-source C++ library that lets researchers analyze and visualize data with yt or other Python packages, in parallel, while the simulation is running. We describe how it reads adaptive mesh refinement (AMR) data structures, passes data between the simulation and Python with minimal memory overhead, and runs analysis with no additional time penalty, using the Python C API and the NumPy C API. We demonstrate how it addresses this problem in astrophysical simulations and makes disk usage more efficient.


Motivation and Aims

In the era of exascale computing, storing and analyzing large-scale data has become both more important and more difficult.
We present libyt, an open-source C++ library that lets researchers analyze and visualize data with yt or any other Python package, in parallel, while the simulation is running.

Methods

Connecting Python and Simulation

We use the Python C API and the NumPy C API to connect variables and arrays in the simulation to Python. This includes creating a NumPy array by wrapping an existing C array without allocating additional memory, allocating new arrays and assigning values, and building Python objects and a module that hold simulation information. We also create Python C-extension methods that let Python request data from the simulation.
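The zero-copy wrapping described above can be illustrated from the Python side. The following is only an analogy, not libyt's C++ code: a ctypes array stands in for a simulation-owned C buffer, and the names sim_buffer and density are hypothetical.

```python
import ctypes
import numpy as np

# Stand-in for a simulation-owned C array of 8 doubles (hypothetical data).
sim_buffer = (ctypes.c_double * 8)(*[float(i) for i in range(8)])

# Wrap the existing memory as a NumPy array; no data is copied.
density = np.ctypeslib.as_array(sim_buffer)

# A write on the "simulation" side is visible through the NumPy view.
sim_buffer[3] = 42.0
print(density[3])

# The view and the buffer start at the same memory address.
same_address = density.ctypes.data == ctypes.addressof(sim_buffer)
print(same_address)
```

Because the array is a view, the buffer must outlive it; the owner of the memory (here the ctypes object, in libyt's case the simulation) remains responsible for freeing it.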

Executing Python Codes and Handling Errors

libyt runs in situ analysis using the Python interpreter. This is like running a Python prompt inside the ongoing simulation with the data already loaded.
libyt checks the syntax of the input Python code by compiling it to a code object. If an error occurs, libyt parses the error to determine whether the input is simply not finished yet or contains a real error.
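The distinction between unfinished input and a real syntax error can be sketched in pure Python with the standard library's codeop module. This is a stand-in for the idea, not libyt's implementation, which works through the Python C API; the function name classify_input and the filename string are hypothetical.

```python
import codeop

def classify_input(source):
    """Return 'complete', 'incomplete', or 'error' for a chunk of Python source."""
    try:
        # compile_command returns a code object for complete input,
        # None for input that could be continued, and raises on a real error.
        code = codeop.compile_command(source, "<libyt-prompt>")
    except (SyntaxError, ValueError, OverflowError):
        return "error"
    return "incomplete" if code is None else "complete"

print(classify_input("x = 1"))
print(classify_input("for i in range(3):"))
print(classify_input("x = = 1"))
```

A prompt built on this would keep reading lines while the result is 'incomplete', execute the code object when it is 'complete', and report the message when it is 'error'.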

In Situ Analysis Under Parallel Computing

Each MPI process runs one instance of the simulation code and one Python interpreter. All Python instances work together to conduct in situ analysis in parallel using mpi4py (Python bindings for MPI).
yt (a Python package for analyzing and visualizing volumetric data) supports MPI parallelism. libyt borrows this feature and handles data transfer both between MPI processes and between the simulation and Python. Because the data are distributed across processes and we cannot predict how Python will decompose the work and request data, we use one-sided MPI communication to exchange data among processes.

Applications

Analyzing Fuzzy Dark Matter Vortices Simulation using GAMER + libyt

We use GAMER, an astrophysics simulation code, to simulate the evolution of vortices that form from density voids in a Fuzzy Dark Matter halo.
Each snapshot takes 116 GB, and 321 snapshots are required to capture the vortices, for a total of about 37 TB of disk space. We address this by using yt through libyt to extract only our region of interest, which takes just 8 GB per step. The data size is roughly 15 times smaller.
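The figures above can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 TB = 1000 GB) and one extraction per snapshot; the extracted total is a derived number, not one stated in the talk.

```python
snapshot_gb = 116    # size of one full snapshot, from the text
n_snapshots = 321    # number of snapshots, from the text
extracted_gb = 8     # size of the extracted region per step, from the text

# Total disk space if every full snapshot is written (assumes 1 TB = 1000 GB).
total_full_tb = snapshot_gb * n_snapshots / 1000

# Derived total if only the region of interest is written at each step.
total_extracted_tb = extracted_gb * n_snapshots / 1000

# Per-step reduction factor.
reduction = snapshot_gb / extracted_gb

print(f"full: {total_full_tb:.1f} TB")
print(f"extracted: {total_extracted_tb:.1f} TB")
print(f"reduction: {reduction:.1f}x")
```

The full output comes to about 37.2 TB and the per-step reduction to 14.5, consistent with the rounded values of 37 TB and 15 times quoted above.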

Analyzing Core-Collapse Supernova Simulation using GAMER + libyt

We use GAMER to simulate core-collapse supernova explosions. We use libyt to call yt and draw slice plots of the entropy distribution.
Because entropy is not one of the variables in the simulation's iterative process, the entropy data are generated by the simulation only when yt requests them. In this way libyt keeps memory usage to a minimum.
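The on-demand generation of a derived field can be sketched as a small lookup that computes a field only when it is requested. Everything here is hypothetical and for illustration: the class LazyFields, the function toy_entropy, and the formula itself are not taken from libyt or GAMER.

```python
import numpy as np

class LazyFields:
    """Hypothetical field store: derived fields are computed only on request."""

    def __init__(self, stored):
        self._stored = stored    # fields the simulation already evolves
        self._derived = {}       # name -> function that computes the field
        self.calls = 0           # how many times a derived field was generated

    def register(self, name, func):
        self._derived[name] = func

    def __getitem__(self, name):
        if name in self._stored:
            return self._stored[name]
        # Nothing is held in memory for derived fields until this point.
        self.calls += 1
        return self._derived[name](self._stored)

def toy_entropy(fields):
    # Placeholder formula for illustration only, not GAMER's entropy definition.
    return fields["pressure"] / fields["density"] ** (5.0 / 3.0)

fields = LazyFields({"density": np.full(4, 8.0), "pressure": np.full(4, 64.0)})
fields.register("entropy", toy_entropy)

calls_before = fields.calls      # no entropy array exists yet
entropy = fields["entropy"]      # generated here, when the analysis asks for it
print(calls_before, fields.calls, entropy)
```

The design choice this illustrates is that a field outside the solver's own variables costs no memory until an analysis script asks for it.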

Discussion and Conclusion

  • libyt provides a promising solution that binds a simulation to Python with minimal memory overhead and no additional time penalty. It makes analyzing large-scale simulations feasible.
  • libyt focuses on using yt as its core analysis method, even though it can call arbitrary Python modules. We will extend it to support more data structures in the future.

Shin-Rong Tsai is a research scientist at the University of Illinois Urbana-Champaign School of Information Sciences. She has worked on developing astrophysics simulations, processing and visualizing extensive data, and improving application performance when scaling up in high-performance computing clusters. Her work now focuses on creating an in situ analysis tool that enables ongoing simulations to use Python to analyze data. She also develops tools for analyzing and visualizing volumetric data.