SciPy 2023

Bayesian Statistics with Python, No Resampling Necessary
07-12, 16:05–16:35 (America/Chicago), Zlotnik Ballroom

TensorFlow Probability is a powerful library for statistical analysis in Python. Using TensorFlow Probability’s implementation of Bayesian methods, modelers can incorporate prior information and obtain parameter estimates together with a quantified degree of belief in the results. Resampling methods like Markov chain Monte Carlo can also be used to perform Bayesian analysis. As an alternative, we show how to use numerical optimization to estimate model parameters, and then show how numerical differentiation can be used to get a quantified degree of belief. We also demonstrate how to perform simulation in Python to corroborate our results.


This talk is a concise update of a talk previously delivered using PyStan, the Python interface for Stan, software for Bayesian inference. This time we focus on the TensorFlow Probability library.
Here are links for the previous talk:
https://github.com/c22hatal/bayes_confidence/tree/main/meetup11aug21/meetup11aug21
https://www.youtube.com/watch?v=-7l5QTq5Hz0&list=PLhbPZ4oC18muuVdH3pjpjGmHkJqxCldYR&index=11&t=1073s

We first briefly review the Bayesian concepts of prior and posterior and elaborate on how the posterior distribution of the parameters can be approximated by a normal distribution when the sample size is large. This is the key theoretical point of the talk and is discussed in section 4.1 of Bayesian Data Analysis [1]. Throughout the talk, we corroborate this result using resampling methods: with TensorFlow Probability, we show that the normal approximation and resampling give equivalent answers on large datasets. After the talk, users can confidently use TensorFlow Probability and SciPy/NumPy to perform Bayesian analysis without resampling when their samples are sufficiently large.
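For reference, the approximation can be written as follows, with theta-hat the posterior mode and I(theta-hat) the observed information (notation adapted from section 4.1 of [1]):

    p(\theta \mid y) \approx \mathrm{N}\big(\hat{\theta},\, [I(\hat{\theta})]^{-1}\big),
    \qquad I(\hat{\theta}) = -\frac{d^2}{d\theta^2} \log p(\theta \mid y) \Big|_{\theta = \hat{\theta}}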

After the theoretical discussion, we show how the posterior distribution can be modeled using TensorFlow Probability’s distribution classes. I will show how you can sample from the distributions and calculate the posterior log probability density.
We will focus on a linear regression setting, with Normal priors on the slope and intercept parameters and a χ² prior on the error scale; a short sketch follows the links below.
https://www.tensorflow.org/probability/api_docs/python/tfp/distributions/Chi2
https://www.tensorflow.org/probability/api_docs/python/tfp/distributions/Normal
https://www.tensorflow.org/probability/api_docs/python/tfp/distributions/JointDistributionNamed
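As a rough sketch of how such a model can be assembled with JointDistributionNamed (the prior settings and the placement of the χ² prior on the error scale are illustrative assumptions, not the exact model from the talk):

    import tensorflow as tf
    import tensorflow_probability as tfp

    tfd = tfp.distributions

    # Illustrative joint model: Normal priors on intercept and slope, a
    # chi-squared prior on the error scale, and a Normal likelihood.
    def make_model(x):
        return tfd.JointDistributionNamed(dict(
            intercept=tfd.Normal(loc=0., scale=10.),
            slope=tfd.Normal(loc=0., scale=10.),
            sigma=tfd.Chi2(df=3.),
            y=lambda intercept, slope, sigma: tfd.Independent(
                tfd.Normal(loc=intercept + slope * x, scale=sigma),
                reinterpreted_batch_ndims=1),  # sum over observations
        ))

    model = make_model(x=tf.constant([0., 1., 2.]))
    draw = model.sample()               # a dict with one draw of each variable
    log_density = model.log_prob(draw)  # scalar joint log density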

Then I show how the posterior mode can be estimated using TensorFlow Probability or SciPy optimization. The Broyden–Fletcher–Goldfarb–Shanno algorithm will be used in its limited-memory form (L-BFGS, linked below). This quasi-Newton method avoids computing the full Hessian, the matrix of second and cross partial derivatives of the log posterior, building an approximation from gradient information instead.
https://www.tensorflow.org/probability/api_docs/python/tfp/optimizer/lbfgs_minimize
https://docs.scipy.org/doc/scipy/reference/optimize.minimize-lbfgsb.html
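To make the optimization step concrete, here is a minimal, self-contained sketch using SciPy’s L-BFGS-B on toy data; the Normal(0, 10) priors and the fixed error scale are simplifying assumptions for illustration, not the talk’s full model:

    import numpy as np
    from scipy.optimize import minimize

    # Toy data: y = 1 + 2x plus noise.
    rng = np.random.default_rng(0)
    x = np.linspace(0., 1., 50)
    y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)

    def neg_log_posterior(params):
        # Negative log posterior for (intercept, slope) with Normal(0, 10)
        # priors and a known error scale of 0.5, up to an additive constant.
        b0, b1 = params
        resid = y - (b0 + b1 * x)
        log_lik = -0.5 * np.sum(resid ** 2) / 0.5 ** 2
        log_prior = -0.5 * (b0 ** 2 + b1 ** 2) / 10.0 ** 2
        return -(log_lik + log_prior)

    result = minimize(neg_log_posterior, x0=np.zeros(2), method='L-BFGS-B')
    mode = result.x  # estimated posterior mode, close to (1, 2)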

Under the normal approximation, the posterior covariance is the inverse of the Hessian of the negative log posterior evaluated at the mode. So I show how you can take numerical derivatives in NumPy to obtain it. Derivatives are taken according to the method in Numerical Recipes [2], and vectorized computations are used where possible.
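A simple central-difference version of that computation might look like the following; it reuses neg_log_posterior and mode from the previous sketch and trades the vectorization and step-size refinements of [2] for brevity:

    import numpy as np

    def numerical_hessian(f, x0, h=1e-5):
        # Central-difference approximation to the Hessian of scalar f at x0.
        n = x0.size
        hess = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                ei = np.zeros(n)
                ej = np.zeros(n)
                ei[i] = h
                ej[j] = h
                hess[i, j] = (f(x0 + ei + ej) - f(x0 + ei - ej)
                              - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4 * h ** 2)
        return hess

    # Posterior covariance under the normal approximation: invert the
    # Hessian of the negative log posterior at the mode.
    hess = numerical_hessian(neg_log_posterior, mode)
    cov = np.linalg.inv(hess)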

Finally, I use resampling, specifically Markov chain Monte Carlo sampling, to show how well the approximation to the posterior distribution works. This is accomplished using TensorFlow Probability’s MCMC functions. I also provide a framework for simulation in Python that is used to demonstrate these results.
https://www.tensorflow.org/probability/api_docs/python/tfp/mcmc
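Here is a minimal, self-contained illustration of the tfp.mcmc interface, run on a simple Normal target instead of the regression posterior; the kernel settings are arbitrary placeholders:

    import tensorflow as tf
    import tensorflow_probability as tfp

    tfd = tfp.distributions

    # Hamiltonian Monte Carlo on a standard Normal target; in the talk the
    # target_log_prob_fn would be the joint log posterior of the model.
    target = tfd.Normal(loc=0., scale=1.)

    kernel = tfp.mcmc.HamiltonianMonteCarlo(
        target_log_prob_fn=target.log_prob,
        step_size=0.5,
        num_leapfrog_steps=3)

    samples = tfp.mcmc.sample_chain(
        num_results=1000,
        num_burnin_steps=500,
        current_state=tf.constant(0.),
        kernel=kernel,
        trace_fn=None)  # keep only the chain states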

References

  1. A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. Bayesian Data Analysis (3rd ed.). Chapman and Hall/CRC, Boca Raton, 2013.
  2. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes: The Art of Scientific Computing (3rd ed.). Cambridge University Press, 2007.

Charles Lindsey is a Principal Data Scientist at Revionics. Charles earned a PhD in Statistics from Texas A&M in 2010, where he researched dimension reduction and classification. Charles then worked at StataCorp LLC. At StataCorp, Charles was the lead developer of the Extended Regression Model (ERM) commands, which allow causal inference on observational data with common complications like unobserved confounding variables and sample selection. At Revionics, Charles works on price optimization and sales forecasting using Bayesian methods and other machine learning techniques.