Jacob Schreiber
Jacob Schreiber is a post-doctoral researcher at Stanford University, where he studies human genomics using modern machine-learning tools. In his "free time," he contributes to the Python data science ecosystem in the form of pomegranate, a package for probabilistic modeling, and apricot, a package for submodular optimization for summarizing large data. In the past, he was a core developer for scikit-learn.
Session
An important problem in genomics is identifying the proteins that bind to DNA. Although many methods attempt to learn DNA motifs underlying protein binding as position-weight matrices (PWMs), these PWMs cannot faithfully represent real biology. For instance, a static PWM cannot describe a zinc-finger protein whose fingers can optionally include one-nucleotide spacing. TF-MoDISco is a framework for extracting motifs using attribution scores from a machine-learning model. The learned motifs and syntax overcome many of the limitations presented by PWM. I will describe the TF-MoDISco algorithm and showcase its efficient re-implementation, tfmodisco-lite.