07-10, 11:25–11:55 (US/Pacific), Room 318
This talk presents zfit, a general-purpose distribution-fitting library, with its newest improvements for complicated model building beyond fitting a normal distribution. The talk will cover all aspects of fitting, with a focus on the strong model-building part of zfit: composable distributions with sums, products and more, and the ability to build and mix binned and unbinned, analytic and templated functions in multiple dimensions. This includes the creation of arbitrary, custom distributions with minimal effort to fulfil everyone's needs.
Thanks to the numpy-like backend provided by TensorFlow, zfit is highly performant: it runs JIT-compiled code on CPUs and even GPUs, a showcase of scientific computing faster than numpy.
scalable, pythonic likelihood fitting
The library has a GitHub repository: https://github.com/zfit/zfit
and tutorials that can be run in the browser: https://zfit-tutorials.readthedocs.io/en/latest/
The problem
Fitting distributions, such as normal, Poisson and polynomial shapes, is a common task in many scientific fields. The Python ecosystem contains several libraries, such as scipy, lmfit and statsmodels, that provide basic tools for building models and fitting them. However, these tools are strongly limited: they generally lack the ability to compose models through sums, products or convolutions, or to build multidimensional distributions; they are restricted to analytically integrable functions; they offer few customization possibilities; and their performance is not competitive with libraries written in compiled languages.
Talk summary
We present zfit, a scalable, general-purpose fitting library built to significantly enhance the fitting capabilities in the Python ecosystem.
The talk will cover the topic of distribution fitting through the main features of zfit and is targeted at a wide scientific audience: anyone who has ever needed to fit a function.
The focus of the talk will be the extensive model-building part using distributions, including custom ones, as well as fitting and performance. The latter originates from the TensorFlow backend, which will be discussed as a general way of speeding up scientific computing. We will also discuss the integration of zfit with other libraries in the Python ecosystem for data loading, plotting and statistical inference.
Description of zfit in detail
Distributions
zfit has an extensive model-building part: it contains PDFs ranging from simple analytic functions, such as normal distributions, to complex multidimensional distributions, and allows these models to be composed. To incorporate functions that are specific to a domain and not made up of basic shapes, convenient base classes allow the user to implement a function using numpy-like syntax. These PDFs can be used directly, as zfit automatically takes care of the numerical normalization and sampling. Furthermore, PDFs can also be binned and mixed with unbinned data to accommodate large data samples. A minimal sketch of the model-building API is shown below.
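For illustration, a minimal sketch of how a composed model and a custom PDF could look (the observable range, parameter values and the class name MyCustomPDF are invented for this example; exact signatures should be checked against the zfit documentation):

    import zfit
    from zfit import z

    # Observable space on which the PDFs are defined
    obs = zfit.Space("x", limits=(-10, 10))

    # Parameters: name, initial value, lower and upper limits
    mu = zfit.Parameter("mu", 1.0, -5, 5)
    sigma = zfit.Parameter("sigma", 1.3, 0.1, 10)
    lam = zfit.Parameter("lam", -0.06, -1, -0.01)
    frac = zfit.Parameter("frac", 0.5, 0, 1)

    # Simple analytic shapes ...
    gauss = zfit.pdf.Gauss(mu=mu, sigma=sigma, obs=obs)
    expo = zfit.pdf.Exponential(lam, obs=obs)

    # ... composed into a sum, e.g. signal plus background
    model = zfit.pdf.SumPDF([gauss, expo], fracs=frac)


    # A custom, domain-specific shape: only the unnormalized function is
    # written by the user; normalization and sampling are done numerically.
    class MyCustomPDF(zfit.pdf.ZPDF):
        _PARAMS = ["alpha"]  # parameters of the shape

        def _unnormalized_pdf(self, x):
            x0 = x.unstack_x()
            alpha = self.params["alpha"]
            return z.exp(-alpha * x0 ** 2)


    alpha = zfit.Parameter("alpha", 0.5, 0, 5)
    custom = MyCustomPDF(alpha=alpha, obs=obs)

The composed and custom PDFs behave like any other model: they can be evaluated, integrated, sampled from and used in a loss.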
Fitting
The fitting is primarily based on the minimization of a loss function, typically a (negative log-)likelihood. The loss can be customized with constraints and penalties to incorporate simultaneous fits and arbitrary correlations between parameters. The minimization is performed with a variety of minimizers, including popular ones from scipy, nlopt and iminuit.
The returned result object provides methods for simple error estimation of the fitted parameters; a sketch of a typical fit follows below.
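A typical fit could look roughly as follows (a minimal, self-contained sketch with toy data; a single Gaussian stands in for the composed model above):

    import numpy as np
    import zfit

    # A simple model for illustration
    obs = zfit.Space("x", limits=(-10, 10))
    mu = zfit.Parameter("mu", 0.0, -5, 5)
    sigma = zfit.Parameter("sigma", 1.0, 0.1, 10)
    model = zfit.pdf.Gauss(mu=mu, sigma=sigma, obs=obs)

    # Toy data standing in for a real dataset
    data_np = np.random.normal(loc=0.5, scale=1.2, size=10_000)
    data = zfit.Data.from_numpy(obs=obs, array=data_np)

    # Unbinned negative log-likelihood as the loss
    nll = zfit.loss.UnbinnedNLL(model=model, data=data)

    # Minimize with one of the wrapped minimizers (here iminuit)
    minimizer = zfit.minimize.Minuit()
    result = minimizer.minimize(nll)

    # Simple, Hessian-based error estimation on the result object
    result.hesse()
    print(result.params)

Thanks to the common API, the minimizer can be exchanged for the scipy- or nlopt-based ones without changing the rest of the code.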
Thanks to the general API and workflow, other libraries, such as the statistics library hepstats, integrate zfit to perform further statistical inference.
Performance
zfit uses the numpy-like computing backend provided by TensorFlow. This allows the code to be just-in-time compiled, which speeds it up to C++-like performance, and to be run in a distributed manner on CPUs and GPUs. The automatic gradients provided by the backend are used by the gradient-based minimizers, which further speeds up the minimization. A generic illustration of this backend approach is sketched below.
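As a generic illustration of this idea, independent of the zfit API, plain TensorFlow already provides JIT compilation and automatic gradients (the function, shapes and values below are arbitrary examples):

    import tensorflow as tf


    @tf.function  # traces the Python function and compiles it into a graph
    def neg_log_likelihood(mu, sigma, x):
        # numpy-like operations, executed as compiled code on CPU or GPU
        return tf.reduce_sum(tf.math.log(sigma) + 0.5 * ((x - mu) / sigma) ** 2)


    x = tf.random.normal(shape=(100_000,), mean=1.0, stddev=2.0)
    mu = tf.Variable(0.0)
    sigma = tf.Variable(1.0)

    # Automatic gradients of the loss with respect to the parameters,
    # as used by gradient-based minimizers
    with tf.GradientTape() as tape:
        loss = neg_log_likelihood(mu, sigma, x)
    grad_mu, grad_sigma = tape.gradient(loss, [mu, sigma])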
Physicist at CERN with a dedicated focus on machine learning, statistical tools and software engineering.