Leo Fang
Senior Math Libraries Engineer @NVIDIA
Sessions
If you're interested in NumPy, SciPy, Signal Processing, Simulation, DataFrames, or Graph Analysis, we'd love to hear what performance you're seeing and how you're measuring it. We've been working to accelerate your favorite packages on GPUs.
Traditionally, scientific Python libraries have offered optional support for accelerated numerical libraries such as OpenBLAS, FFTW, MKL, and the CUDA Math Libraries. However, each Python package integrates these libraries in its own ad hoc way, leading to fragmented feature coverage and non-uniform interfaces. We introduce a novel design for a Python math library that abstracts away C/C++, Fortran, and CUDA from end users, providing a unified pythonic interface that leverages accelerated CPU/GPU library backends while offering users a friendly installation experience and exposing most, if not all, library-specific tuning knobs. Pythonic device functions callable from within a JIT-compiling context are made possible as well. We illustrate that fundamental math operations such as FFT, MATMUL, and SOLVE can be offered in a “performant and productive” fashion with this design, while remaining transparently interoperable with all mainstream Python array libraries such as NumPy, CuPy, and PyTorch.
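To make the design idea concrete, here is a minimal, hypothetical sketch of a unified entry point that dispatches to an accelerated backend based on the array library of its operand. The names `fft` and `_backend_for` are illustrative only and are not the actual API of any NVIDIA library; the sketch simply shows how one pythonic call can serve NumPy, CuPy, and PyTorch operands transparently.

```python
# Hypothetical sketch: one entry point, multiple accelerated backends.
import numpy as np

def _backend_for(operand):
    """Pick an FFT backend matching the array library of `operand`."""
    library = type(operand).__module__.split(".")[0]
    if library == "numpy":
        import numpy.fft as backend        # CPU path (NumPy's built-in FFT)
    elif library == "cupy":
        import cupyx.scipy.fft as backend  # GPU path (cuFFT under the hood)
    elif library == "torch":
        import torch.fft as backend        # CPU or GPU, following the tensor's device
    else:
        raise TypeError(f"unsupported operand type: {type(operand)!r}")
    return backend

def fft(operand):
    """One pythonic call; the result stays in the caller's array library."""
    return _backend_for(operand).fft(operand)

# NumPy in, NumPy out; a CuPy array or a CUDA torch.Tensor would take the GPU path.
x = np.random.rand(1024)
print(type(fft(x)))
```

In the design described in the abstract, library-specific tuning knobs could then be surfaced as keyword arguments or options objects on the same entry point, rather than requiring users to drop down to C/C++, Fortran, or CUDA.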