07-11, 10:45–11:15 (US/Pacific), Ballroom
As scientific computing increasingly relies on diverse hardware (CPUs, GPUs, etc.) and data structures, libraries face pressure to support multiple backends while maintaining a consistent API. This talk presents practical considerations for adding dispatching to existing libraries, enabling seamless integration with external backends. Using NetworkX and scikit-image as case studies, we demonstrate how they evolved to become a common API with multiple implementations, handle backend-specific behaviors, and ensure robustness through testing and documentation. We also discuss technical challenges, differences in approaches, community adoption strategies, and the broader implications for the SciPy ecosystem.
Scientific Python libraries often reinvent functionality to support new hardware or data structures, leading to fragmentation. For example, a GPU-enabled library might closely mimic an existing library's API (numpy/cupy, pandas/cudf, networkx/cugraph, scikit-learn/cuml, scikit-image/cucim, etc.).
Dispatching allows existing libraries to act as a common interface for multiple backends, reducing redundancy and empowering users to switch implementations without rewriting code. This greatly reduces the work for backend developers, because the original library provides documentation, tests, and a broader community of users. Similarly, the original library is enhanced by having external backends for little effort, because backends are separately developed, maintained, and tested.
Dispatching and backend selection form a complex design space with many possible implementations. Many projects implement multiple dispatch based on types, and other projects have experimented with explicit backend selection that goes beyond type dispatching and allows swapping in a different algorithm. NetworkX, the most popular library for graph analytics, has both. Dispatching in NetworkX has been developed over the last three years, and it has added features such as automatic conversion (and caching) of inputs and the incorporation of backend information into its documentation. Dispatching in scikit-image is much newer, and it takes a minimal, but similar, approach.
From Blaze to Ibis to uarray to array protocols to the Array API to Narwhals and many others, the SciPy ecosystem has seen many dispatching efforts over the years. Adding dispatching to an existing library such as NetworkX is less disruptive--and much less work--than trying to come up with a new standard API. NetworkX is already the de facto standard for graph analytics, having been developed by many contributors over decades. Its strengths are its API, documentation, tests, community, readability, and maintainability--all the difficult but vital aspects of open-source software! Its main shortcoming is scalability, because it is written in pure Python; dispatching to an accelerated backend such as nx-cugraph can overcome this.
When successful, dispatching can be a "win-win-win" for library maintainers, backend developers, and users, and we expect it to become increasingly necessary as hardware and software become more diverse and specialized.
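To make the two styles concrete, here is a minimal sketch in plain Python. It is not how NetworkX or scikit-image implement dispatching; the names `register_backend`, `dispatchable`, `total_weight`, and `node_count`, and the backend name "fast", are all hypothetical and chosen only for illustration. The first half shows explicit backend selection through a `backend=` keyword, and the second half shows type-based dispatch using the standard library's `functools.singledispatch`.

```python
from functools import singledispatch, wraps

# Hypothetical registry: backend name -> {function name: implementation}.
_BACKENDS = {}


def register_backend(name, **implementations):
    """Register alternative implementations under a backend name."""
    _BACKENDS.setdefault(name, {}).update(implementations)


def dispatchable(func):
    """Let callers pass backend="name" to swap in another implementation."""

    @wraps(func)
    def wrapper(*args, backend=None, **kwargs):
        if backend is None:
            # No backend requested: run the library's own implementation.
            return func(*args, **kwargs)
        try:
            impl = _BACKENDS[backend][func.__name__]
        except KeyError:
            raise NotImplementedError(
                f"backend {backend!r} does not implement {func.__name__}"
            ) from None
        return impl(*args, **kwargs)

    return wrapper


# --- Explicit backend selection -------------------------------------------
@dispatchable
def total_weight(edges):
    """Reference implementation: sum the weights of (u, v, w) edges."""
    total = 0.0
    for _u, _v, w in edges:
        total += w
    return total


def _total_weight_fast(edges):
    # Stand-in for an accelerated implementation living in another package.
    return float(sum(w for _u, _v, w in edges))


register_backend("fast", total_weight=_total_weight_fast)


# --- Type-based dispatch ----------------------------------------------------
@singledispatch
def node_count(graph):
    raise TypeError(f"unsupported graph type: {type(graph).__name__}")


@node_count.register
def _(graph: dict):
    # Adjacency mapping: every key is a node.
    return len(graph)


@node_count.register
def _(graph: list):
    # Edge list: collect the distinct endpoints.
    return len({n for u, v, *_ in graph for n in (u, v)})


edges = [("a", "b", 1.0), ("b", "c", 2.5)]
default_result = total_weight(edges)
fast_result = total_weight(edges, backend="fast")
```

In this sketch the caller's code changes only by one keyword argument, while the type-based variant needs no change at all because the argument's type picks the implementation. A real library would also have to decide how to convert inputs between backends and what to do when a backend lacks a function.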
Target audience:
- Maintainers of libraries seeking to support multiple backends
- Developers of accelerated implementations
- Users frustrated by API fragmentation
- Users interested in zero-code-change acceleration on NVIDIA GPUs
- Anybody interested in helping standardize dispatch patterns via scientific-python/spatch
Erik Welch is a senior system software engineer on the RAPIDS cuGraph team at NVIDIA. He has 20 years' experience using Python as a scientist, engineer, and open-source developer on a wide range of data and high-performance computing problems. He primarily works on nx-cugraph, an accelerated backend to NetworkX, and is a primary maintainer of the popular toolz library.