07-10, 15:25–15:55 (US/Pacific), Room 317
The SciPy package scipy.sparse is moving from its matrix API to an array API. This will allow dimensions other than 2D and clean up the treatment of the multiplication operator *. This talk will start by describing the changes and their impacts. We will then discuss the process of revamping an API without breaking too much existing user code, the trade-offs between slow changes over many releases and faster, perhaps breaking changes, and the choice of whether to just make a new package instead. The talk should be useful for users of scipy.sparse and also for packages considering a major API change.
It’s been many years since people started talking about replacing the SciPy sparse matrix API with an array API. Two major differences are the multiplication operator * and the restriction on shapes. The matrix API restricts to 2D and * means matrix multiplication. The array API allows any number of dimensions and * means element-wise multiplication. In 2014, Python’s PEP 465 introduced the matmul operator to perform matrix multiplication, so * vs @ had clear meanings backed by Python itself. The sparse package within SciPy was historically based on a matrix API. Every sparse matrix class has rows and columns with * signifying dot product matrix multiplication. It has been a long journey to switch the API to sparse arrays.
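As a minimal sketch of the operator difference described above, the following compares the same 2x2 data stored as a sparse matrix and as a sparse array. It assumes a SciPy version that provides csr_array (1.8 or later); the variable names are illustrative only.

```python
import numpy as np
from scipy import sparse

dense = np.array([[1, 2], [3, 4]])

M = sparse.csr_matrix(dense)  # matrix API: always 2D, * is matrix multiplication
A = sparse.csr_array(dense)   # array API: * is element-wise multiplication

matrix_star = (M * M).toarray()   # matrix product
array_star = (A * A).toarray()    # element-wise product
array_matmul = (A @ A).toarray()  # @ is matrix multiplication in both APIs
matrix_matmul = (M @ M).toarray()

print(matrix_star)
print(array_star)
print(array_matmul)
```

Because @ means matrix multiplication in both APIs, code written with @ keeps its meaning regardless of which class it receives, while code written with * does not.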
In May 2022 SciPy 1.8 was released with a set of sparse array classes intended to gather feedback and work toward eventual replacement of the sparse matrix classes. SciPy 1.11 cleaned up the methods somewhat, deprecating old-style matrix methods that didn’t fit the new array style. SciPy 1.12 introduced array construction functions (eye_array, random_array, etc.). Support for 1D arrays is planned for only a few formats (COO/DOK/CSR), with release in SciPy 1.13.
This talk will describe the changes between the two APIs. We will discuss strategies for users to update their code to the new array API, and how to update libraries that wish to support users who use either (or both) styles.
The process of changing APIs involves hundreds of minor decisions. We will discuss the rules of thumb that have evolved as criteria for making these decisions and the trade-offs between slower, safer changes and faster, possibly breaking changes. We will also discuss the trade-offs between changing the API and creating a new package to replace the existing one.
We will discuss some ideas for the path forward to nD support, user-facing issues of API design, and developer-facing issues of performance, continued complete testing, and effective compiled code.
Dan Schult is the Charles G Hetherington Professor of Mathematics at Colgate University. He comes to scientific computing from studies of spreading processes. In some circles he is better known as a founding developer of NetworkX. He has attended SciPy a number of times and it is always worthwhile.