### Jacob Schreiber

Jacob Schreiber is a postdoctoral researcher at Stanford University, where he studies human genomics using modern machine-learning tools. In his "free time," he contributes to the Python data science ecosystem through pomegranate, a package for probabilistic modeling, and apricot, a package that uses submodular optimization to summarize large data sets. He was previously a core developer of scikit-learn.

#### Sessions

An important problem in genomics is identifying the proteins that bind to DNA. Although many methods attempt to learn the DNA motifs underlying protein binding as position-weight matrices (PWMs), these PWMs cannot faithfully represent real biology. For instance, a static PWM cannot describe a zinc-finger protein whose fingers may optionally be separated by a one-nucleotide spacer. TF-MoDISco is a framework for extracting motifs using attribution scores from a machine-learning model. The motifs and syntax it learns overcome many of the limitations of PWMs. I will describe the TF-MoDISco algorithm and showcase its efficient re-implementation, tfmodisco-lite.
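To make the fixed-width limitation concrete, here is a minimal sketch of ordinary PWM scanning. It is not TF-MoDISco or tfmodisco-lite, just a generic log-odds scan, and the motif (consensus `ACGT`), its probabilities, and the uniform background are hypothetical values chosen for illustration. The same motif scores highly when contiguous but poorly once a single spacer base is inserted, because every window the PWM sees has exactly four positions.

```python
import math

# Hypothetical 4-position motif with consensus "ACGT": each position puts
# probability 0.85 on the consensus base and 0.05 on each of the other three.
CONSENSUS = "ACGT"
PWM = [{b: (0.85 if b == c else 0.05) for b in "ACGT"} for c in CONSENSUS]
BACKGROUND = 0.25  # assumed uniform background base frequency


def window_score(pwm, window):
    """Log-odds (base 2) score of one window against the PWM."""
    return sum(math.log2(pos[base] / BACKGROUND) for pos, base in zip(pwm, window))


def best_score(pwm, seq):
    """Slide the fixed-width PWM along seq and return the best window score."""
    width = len(pwm)
    return max(window_score(pwm, seq[i:i + width])
               for i in range(len(seq) - width + 1))


exact = best_score(PWM, "TTACGTTT")    # motif present with no spacer
spaced = best_score(PWM, "TTACTGTTT")  # "AC" + 1-nt spacer "T" + "GT"
print(f"exact: {exact:.2f}, spaced: {spaced:.2f}")
# → exact: 7.06, spaced: -1.11
```

A single PWM could be widened to absorb the spacer, but then it would penalize the contiguous form instead; capturing both variants requires something beyond one static matrix.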

pomegranate is a probabilistic modeling library for Python that includes methods such as hidden Markov models and Bayesian networks. As it nears its 10th birthday, pomegranate has been given a makeover: it has been completely rewritten in PyTorch! The rewrite has significantly increased speed (up to 300x for dense hidden Markov models), added GPU support, enabled integration with any PyTorch model, and simplified the underlying code base and API while preserving the same flexibility as before. In this talk, I will describe these changes through exciting examples and demonstrate how pomegranate can now scale to modern, massive data sets.