SciPy 2023

Out-Performing NumPy is Hard: When and How to Try with Your Own C-Extensions
07-12, 10:45–11:15 (America/Chicago), Amphitheater 204

While the NumPy C API lets developers write C that creates or evaluates arrays, simply writing C is often not enough to outperform NumPy. NumPy's use of Single Instruction, Multiple Data (SIMD) routines, together with multi-source compiling, provides optimizations that simple C cannot beat. This presentation offers principles for determining whether an array-processing routine implemented as a C-extension might outperform NumPy called from Python. As an example, a C-extension implementing a narrow use case of the np.nonzero() routine will be studied.


While it is well known that C-extensions can improve the performance of Python programs, writing C-extensions that improve on NumPy array operations is a different matter. Many NumPy functions are backed by highly optimized C routines, some of which exploit low-level processor features such as SIMD instructions. In most cases, plain Python that calls NumPy is faster than a custom C-extension. However, for routines that are sufficiently narrow in scope, there are opportunities for optimization.

This presentation offers principles for determining whether a routine implemented as a C-extension might outperform related NumPy routines called from Python. Along the way, Python project setup and the basics of the NumPy C API will be introduced.

A narrow use case of the np.nonzero() function will be implemented in C as an example: rather than returning the indices of all non-zero values for all dtypes and dimensionalities (as np.nonzero() does), this new function, first_true_1d(), will return only the index of the first True value in a one-dimensional Boolean array. The performance of this far simpler routine, and why it sometimes cannot outperform np.nonzero(), will be examined.
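The core scan behind a routine like first_true_1d() might look like the minimal sketch below. It assumes the array data is a contiguous buffer of one-byte Boolean values (NumPy's bool dtype stores each element as a single byte) and that -1 signals "no True value"; the function names first_true_1d_scalar() and first_true_1d_chunked() are hypothetical, and this is not the presenter's implementation. The chunked variant tests eight bytes per comparison by copying them into a uint64_t, a portable word-at-a-time technique rather than explicit SIMD, then falls back to a byte loop to locate the exact index.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: plain byte-by-byte scan.
 * Assumes `data` points to `size` contiguous one-byte Boolean values
 * (in a real C-extension, e.g. obtained via PyArray_DATA() on a
 * C-contiguous bool array). Returns the index of the first non-zero
 * byte, or -1 (an assumed convention) if there is none. */
static ptrdiff_t first_true_1d_scalar(const unsigned char *data, size_t size) {
    for (size_t i = 0; i < size; i++) {
        if (data[i]) {
            return (ptrdiff_t)i; /* early exit: stop at the first True */
        }
    }
    return -1;
}

/* Hypothetical helper: word-at-a-time scan (not SIMD).
 * Same assumptions and return convention as above. */
static ptrdiff_t first_true_1d_chunked(const unsigned char *data, size_t size) {
    size_t i = 0;
    /* Test 8 bytes per iteration; memcpy avoids unaligned-access
     * undefined behavior and typically compiles to a single load. */
    for (; i + sizeof(uint64_t) <= size; i += sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, data + i, sizeof word);
        if (word != 0) {
            break; /* a True lies within this chunk; find it below */
        }
    }
    /* Locate the exact byte within the flagged chunk, or scan the tail. */
    for (; i < size; i++) {
        if (data[i]) {
            return (ptrdiff_t)i;
        }
    }
    return -1;
}
```

For a 20-element array whose first True value is at index 13, both functions return 13; for an all-False array, both return -1. The early exit is what the narrow routine gains over np.nonzero(), which must examine every element to collect all indices; whether that gain outweighs NumPy's optimized loops likely depends on where the first True value falls.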

See also: Sample performance comparison panel.

Christopher Ariza is Partner and Chief Technology Officer at Research Affiliates, a global leader in investment strategies and research. He is the creator and lead developer of StaticFrame, an alternative DataFrame library built on an immutable data model. Having worked in Python for over 20 years, he has developed tools in a variety of domains, including algorithmic music composition and computer-aided musicology, and has spoken at many conferences, including PyCon USA, PyData Global, and PyData Los Angeles.