Thomas J. Fan
Thomas J. Fan is a Staff Software Engineer at Quansight Labs and is a maintainer for scikit-learn, an open-source machine learning library for Python. Previously, Thomas worked at Columbia University to improve interoperability between scikit-learn and AutoML systems. He is a maintainer for skorch, a neural network library that wraps PyTorch. Thomas has a Master's in Mathematics from NYU and a Master's in Physics from Stony Brook University.
Sessions
Numerical Python libraries can run computations on many CPU cores with various parallel interfaces. When we simultaneously use multiple levels of parallelism, it may result in oversubscription and degraded performance. This talk explores the programming interfaces used to control parallelism exposed by libraries such as NumPy, SciPy, and scikit-learn. We will learn about parallel primitives used in these libraries, such as OpenMP and Python's multiprocessing module. We will see how to control parallelism in these libraries to avoid oversubscription. Finally, we will look at the overall landscape for configuring parallelism and highlight paths for improving the user experience.
The array API standard (https://data-apis.org/array-api/) is a common specification for Python array libraries, such as NumPy, PyTorch, CuPy, Dask, and JAX.
This standard will make it straightforward for array-consuming libraries, like scikit-learn and SciPy, to write code that uniformly supports all of these libraries. This will allow, for instance, running the same code on the CPU and GPU.
This talk will cover the scope of the array API standard, supporting tooling which includes a library-independent test suite and compatibility layer, what work has been completed so far, and the plans going forward.