SciPy 2023

Resampling and Monte Carlo Methods in SciPy.stats
07-11, 13:30–17:30 (America/Chicago), Classroom 202

Resampling and Monte Carlo statistical techniques are surprisingly intuitive, and they are often more flexible and accurate than their better-known analytical counterparts. In this tutorial, participants will develop their intuitive understanding of frequentist statistics and apply it using three functions in scipy.stats (monte_carlo_test, permutation_test, and bootstrap) to dramatically expand the statistical analyses they can perform with the SciPy library.


Scientists and engineers often seek to answer questions of the following forms.

  1. Is my sample drawn from this hypothesized distribution?
  2. Are my samples drawn from the same distribution?
  3. Based on these samples, what can I infer about the populations from which they were drawn?

Common statistical procedures used to answer questions of these forms include:

  1. the one-sample t-test ("Is my sample drawn from a distribution with population mean m?"),
  2. the two-sample t-test ("Are my two samples drawn from distributions with the same population mean?"), and
  3. the confidence interval of the mean ("Given my sample, what can I say about the true value of the population mean?").
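
For orientation, the three classical procedures listed above might be run with scipy.stats roughly as follows. This is a minimal sketch, not part of the tutorial materials: the data are synthetic, and the seed, sample sizes, and hypothesized mean m = 0 are arbitrary assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
x = rng.normal(loc=0.5, scale=1.0, size=30)  # synthetic sample (illustrative)
y = rng.normal(loc=0.0, scale=1.0, size=40)  # second synthetic sample

# 1. One-sample t-test: is the population mean of x equal to m = 0?
res_1samp = stats.ttest_1samp(x, popmean=0.0)

# 2. Two-sample t-test: do x and y have the same population mean?
res_2samp = stats.ttest_ind(x, y)

# 3. 95% confidence interval of the mean of x, based on Student's t distribution
ci_low, ci_high = stats.t.interval(
    0.95, df=len(x) - 1, loc=np.mean(x), scale=stats.sem(x)
)

print("one-sample p-value:", res_1samp.pvalue)
print("two-sample p-value:", res_2samp.pvalue)
print("95% CI of the mean:", (ci_low, ci_high))
```

All three results rest on the normality assumption discussed next.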

Such procedures are developed under technical assumptions (e.g., the samples were drawn from normally distributed populations) that make the mathematics tractable, yet in practice, these assumptions can never be met exactly. Fortunately for science, the conclusions drawn from the procedures above are relatively insensitive to deviations from these assumptions… except when they’re not!

One solution is to abandon frequentist statistics in favor of another paradigm (e.g., Bayesian), but the approach suggested by this tutorial is to remove the assumptions, reduce reliance on the analytical approximations, and instead use computers to approximate (or even exactly calculate) answers to the original questions. This idea will lead us to three techniques:

  1. Monte Carlo tests (scipy.stats.monte_carlo_test)
  2. Permutation tests (scipy.stats.permutation_test)
  3. The Bootstrap (scipy.stats.bootstrap)
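
As a rough preview, each of the three functions can be called in a few lines. The sketch below uses synthetic data; the choice of statistics (skewness, difference in means, mean), the sample sizes, and n_resamples=999 are assumptions made here for illustration and are not taken from the tutorial notebooks.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed for the synthetic data
x = rng.exponential(scale=1.0, size=25)  # illustrative, clearly non-normal
y = rng.exponential(scale=1.5, size=30)


def skewness(sample):
    # Statistic for the Monte Carlo test; receives a 1-D sample
    return stats.skew(sample)


def mean_diff(a, b):
    # Statistic for the permutation test; receives two 1-D samples
    return np.mean(a) - np.mean(b)


# 1. Monte Carlo test: is the skewness of x consistent with samples
#    drawn from a standard normal distribution (the null hypothesis)?
mc = stats.monte_carlo_test(
    x, rng.standard_normal, skewness, vectorized=False, n_resamples=999
)

# 2. Permutation test: are x and y drawn from the same distribution?
perm = stats.permutation_test(
    (x, y), mean_diff, permutation_type="independent",
    vectorized=False, n_resamples=999,
)

# 3. Bootstrap: confidence interval for the population mean of x
boot = stats.bootstrap(
    (x,), np.mean, vectorized=True, n_resamples=999, confidence_level=0.95
)

print("Monte Carlo p-value:", mc.pvalue)
print("Permutation p-value:", perm.pvalue)
print("Bootstrap 95% CI:", boot.confidence_interval)
```

Because the permutation and bootstrap calls are not seeded here, their outputs will vary slightly from run to run.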

For many of the same reasons that arithmetic (sums and differences) seems simpler than calculus (integrals and derivatives), these techniques are relatively easy to grasp. Likewise, just as computational methods for integration, equation solving, and optimization can solve a wider variety of problems than analytical approaches, these computational statistical techniques are comparatively flexible and easy to apply.

During this tutorial, participants will write their own code to execute fundamental resampling and Monte Carlo algorithms and compare the results of their code against the equivalent functions in SciPy. They will apply their new understanding of SciPy's monte_carlo_test, permutation_test, and bootstrap functions to reproduce and extend the capabilities of SciPy's other statistics functions (e.g. to small samples, to discrete distributions). Through this tutorial, participants will improve their ability to apply existing statistical procedures to a given situation and gain the ability to create customized statistical procedures for demanding applications.
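
To give a flavor of that kind of exercise, here is one way a hand-written exact permutation test could be checked against scipy.stats.permutation_test. It is a sketch under stated assumptions: the two small samples are made up, the statistic is the difference in means, and the alternative is one-sided ("greater"). The tutorial's own exercises may differ.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Made-up small samples (illustrative only)
x = np.array([12.1, 14.3, 15.2, 16.8])
y = np.array([9.5, 10.2, 11.7, 13.0, 8.9])


def mean_diff(a, b):
    return np.mean(a) - np.mean(b)


pooled = np.concatenate([x, y])
observed = mean_diff(x, y)

# Enumerate every split of the pooled data into groups of size
# len(x) and len(y), and compute the statistic for each split.
null = []
for idx in combinations(range(len(pooled)), len(x)):
    mask = np.zeros(len(pooled), dtype=bool)
    mask[list(idx)] = True
    null.append(mean_diff(pooled[mask], pooled[~mask]))
null = np.array(null)

# One-sided p-value: fraction of splits with a statistic at least as
# large as the observed one (small tolerance guards against rounding).
tol = 1e-12
p_manual = np.mean(null >= observed - tol)

# SciPy performs an exact test when n_resamples (default 9999) is at
# least the number of distinct splits, which is 126 here.
res = stats.permutation_test(
    (x, y), mean_diff, permutation_type="independent",
    vectorized=False, alternative="greater",
)

print("hand-written p-value:", p_manual)
print("SciPy p-value:       ", res.pvalue)
```

For these data only two of the 126 splits give a difference in means at least as large as the observed one, so both p-values should come out to 2/126 ≈ 0.0159.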


Prerequisites

No prior statistics education is required; the intent is for both statistics novices and formally trained experts to gain a new perspective on making inferences from data.

Installation Instructions

Participants do not need to install anything to participate; the tutorial materials will all be accessible and editable using Colab on any machine with Internet access and a standard browser. However, participants may also use any locally-installed notebook software that can load .ipynb notebooks (e.g. JupyterLab) and import SciPy, matplotlib, and their dependencies (e.g. NumPy).

Matt Haberland (@mdhaber) is an Assistant Professor in the BioResource and Agricultural Engineering Department at Cal Poly. He earned his Ph.D. in Mechanical Engineering at MIT in 2014 for his thesis "Extracting Principles from Biology for Application to Running Robots", and previously created the Contact Sensor / Stabilizer for the rock drill of the Mars rover Curiosity. Matt has been attending the SciPy conference since 2019 as a maintainer of the SciPy library.

Albert Steppi (@steppi) is a Senior Software Engineer at Quansight Labs. He earned a PhD in Statistics from Florida State University in 2018. Albert has been a maintainer of the SciPy library since 2021.