07-09, 13:15–13:45 (US/Pacific), Ballroom
This talk walks Pythonistas through recent CuPy feature development. Join me to hear how an open-source novice started contributing to CuPy and, over the years, helped it grow into a full-fledged, reliable, GPU-accelerated array library covering most of the functionality of NumPy, SciPy, and Numba.
Today, CuPy is the go-to library for general-purpose GPU programming in Python. To achieve this while keeping the library performant, maintainable, and easy to package and deploy, CuPy resorted to just-in-time (JIT) compiling CUDA C++ kernels as its underlying workhorse. Equipped with this capability and highly optimized vendor libraries, CuPy quickly expanded its coverage over the majority of NumPy ufuncs (elementwise functions), reduction operations (e.g. cupy.sum), FFT, and linear algebra, so users could accelerate their NumPy code on GPUs.
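To make the drop-in acceleration concrete, here is a small, hypothetical snippet of the kind of NumPy code CuPy targets; because cupy.sin, cupy.cos, and cupy.sum mirror their NumPy counterparts, changing the import to `import cupy as np` runs the same ufuncs and reduction on the GPU:

```python
import numpy as np  # swap for `import cupy as np` to run the same code on a GPU

# One million samples; sin(x)**2 + cos(x)**2 is 1.0 everywhere,
# so the reduction should sum to ~1e6.
x = np.linspace(0.0, 1.0, 1_000_000, dtype=np.float64)
y = np.sin(x) ** 2 + np.cos(x) ** 2   # elementwise ufuncs
total = y.sum()                       # reduction (cupy.sum on the GPU)
print(float(total))
```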
Over time, it became clear that the same strategy opens two doors at once. One is to let CuPy cover more scientific computing routines from SciPy, wherever parallel algorithms exist that suit massively parallel architectures like GPUs. The other is to let users implement custom GPU workloads themselves when no library meets their needs. The two pathways paved from these doors eventually cross again at the @cupyx.jit decorator, which lets users write GPU code in Python syntax similar to Numba's @cuda.jit, but powered instead by a flexible Python-AST-to-CUDA-C++ transpiler. I argue that it is this programmability that solved CuPy's "last-mile" challenge and earned it its current status.
I will lead the audience through CuPy's history from my perspective as a long-time contributor, give Pythonistas a glimpse into the magic of GPU programming and Python's productive interoperability, and share my vision of what the future of CuPy might look like.
I am a senior software engineer at NVIDIA, currently leading the CUDA Python program, whose aim is to make Python a first-class language on the CUDA platform. In prior roles at NVIDIA, I co-founded several Python projects, including cuQuantum Python and nvmath-python, and spoke at PyCon 2022 as a sponsor. I have also supported many open-source efforts, such as conda-forge, the Python array API standard, and DLPack. Before joining NVIDIA, I was an assistant computational scientist at Brookhaven National Laboratory, where I contributed to open-source projects such as CuPy and mpi4py.