SciPy 2023

From Espaloma to SAKE: To brew, distill, and mix force fields with balanced briskness, smoothness, and intricacy.
07-14, 14:35–15:05 (America/Chicago), Grand Salon C

Force fields (FFs), the parametrized mappings from geometry to energy, are a crucial component of molecular dynamics (MD) simulations, whose associated Boltzmann-like target probability densities are sampled to estimate ensemble observables and harvest quantitative insights into the system. State-of-the-art force fields are either fast (molecular mechanics, MM-based) or accurate (quantum mechanics, QM-based), but seldom both. Here, leveraging graph-based machine learning and incorporating inductive biases crucial to chemical modeling, we approach the balance between accuracy and speed from two angles: making MM more accurate and making machine learning force fields faster.
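To make the geometry-to-energy mapping and its Boltzmann-like target density concrete, here is a minimal toy sketch (not any production force field): a single harmonic-bond energy term and the corresponding unnormalized Boltzmann weight. The parameter values `k`, `r0`, and `kT` are arbitrary reduced units chosen purely for illustration.

```python
import numpy as np

# Toy "force field": harmonic bond energy E(r) = 0.5 * k * (r - r0)**2.
# The force constant k and equilibrium length r0 are the parameters of the
# geometry -> energy mapping; the values here are illustrative only.
def bond_energy(r, k=500.0, r0=1.1):
    return 0.5 * k * (r - r0) ** 2

# Unnormalized Boltzmann-like weight exp(-E / kT) that MD sampling targets
# (kT = 1 in reduced units for simplicity).
def boltzmann_weight(r, kT=1.0):
    return np.exp(-bond_energy(r) / kT)

r = np.linspace(0.9, 1.3, 5)   # a few bond-length geometries
weights = boltzmann_weight(r)
# The weight peaks at r = r0, where the energy is minimal.
```

Ensemble observables are then estimated as weighted averages over geometries sampled from this density.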


A force field as accurate as quantum mechanics (QM) and as fast as molecular mechanics (MM), with which one can simulate a biomolecular system efficiently and meaningfully enough to gain quantitative insights, is among the most ardent dreams of biophysicists. Machine learning force fields have been designed to bring us one step closer to this dream by fitting simpler functional forms to QM data and extrapolating to chemically and geometrically diverse regions. Nonetheless, current state-of-the-art architectures, though approaching or surpassing quantum chemical accuracy, are orders of magnitude slower than MM and manifest various pathologies when it comes to interpretability, generalizability, and stability.

In this talk, we introduce our efforts to approach this lotusland from two angles: making MM force fields more accurate (using a graph neural network to replace legacy atom-typing schemes, Espaloma) and making state-of-the-art machine learning force fields faster (maintaining local universal approximation power without employing spherical harmonics, SAKE). Along the way, we show a plethora of useful gadgets, including the first unified force field for joint protein--ligand parametrization, an AM1-BCC surrogate charge model that is thousands of times faster with errors smaller than the discrepancies among backends, and a way to forecast the fate of dynamic systems before the simulation even starts.

With these, we identify the opportunities and challenges of machine learning force field design: What interpretable, stable, simple yet expressive functional forms should we use? How do we bake domain knowledge in, e.g., that forces vanish when particles are far apart and diverge when they come close? Can we detach sophisticated neural networks during inference? Can force fields be uncertainty-aware? And finally, how do we stir these ingredients well to achieve a delicious balance among stability, speed, and accuracy?
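As one illustration of the asymptotics such domain knowledge encodes, the classic Lennard-Jones pair potential already has interactions that decay to zero at large separation and blow up at short range; a minimal sketch with illustrative parameters (reduced units, epsilon = sigma = 1):

```python
# Lennard-Jones pair energy E(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6):
# a standard MM functional form whose asymptotics encode the domain
# knowledge above. Parameters are reduced units for illustration.
def lj_energy(r, epsilon=1.0, sigma=1.0):
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

def lj_force(r, epsilon=1.0, sigma=1.0):
    # F = -dE/dr, computed analytically.
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 ** 2 - sr6) / r

# The force decays toward zero at large r, diverges as r -> 0,
# and vanishes at the energy minimum r = 2**(1/6) * sigma.
```

A learned force field that lacks such built-in asymptotics can behave arbitrarily outside its training data, which is one source of the stability pathologies mentioned above.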

Simons Center Fellow, NYU