SciPy 2024

Towards MDAnalysis 3.0: a fast, interoperable, and extensible community-driven ecosystem for handling molecular simulation data
07-10, 13:15–13:45 (US/Pacific), Room 316

MDAnalysis (https://www.mdanalysis.org) is one of the most widely used open-source Python libraries for molecular simulation analysis, with applications ranging from understanding the interaction of drugs with proteins to the design of novel materials. With over 200 contributors and 18 years of development, MDAnalysis has established a mature, stable API and a broad user community. Here we present the current status of the library’s capabilities as it approaches its next major release. We also detail ongoing work to address modern challenges in the ever-evolving landscape of molecular simulation, such as handling increasingly large simulation datasets and meeting the tenets of FAIR.


Talk overview:

Molecular dynamics (MD) simulations have become an essential tool for understanding the behavior of chemical systems at the atomic scale. Combining the laws of classical mechanics with approximate descriptions of interatomic interactions enables the calculation of structural and thermodynamic properties of molecular assemblies. A key step in all MD-based projects is the extraction of relevant features from simulation data, which come in a plethora of formats and with sizes ranging up to terabytes.

Here we present MDAnalysis (https://www.mdanalysis.org), a free open-source library for molecular simulation analysis with over 200 contributors and 18 years of development. The library offers users an MD package-agnostic means to programmatically access and manipulate simulation data, with a well-tested Python API based on NumPy. MDAnalysis provides a variety of built-in analysis methods and efficient routines for coordinate manipulation, and supports over 40 file formats covering most simulation packages.
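To illustrate what extracting a feature from simulation data looks like when coordinates are available as NumPy arrays, the following is a minimal sketch that computes a mass-weighted radius of gyration for every frame of a trajectory. It deliberately uses plain NumPy and a synthetic random array in place of a real trajectory, and it does not call the MDAnalysis API. The function name radius_of_gyration, the array layout (n_frames, n_atoms, 3), and the uniform carbon-like masses are assumptions made for this example only.

```python
import numpy as np


def radius_of_gyration(positions, masses):
    """Return the mass-weighted radius of gyration for each frame.

    positions: array-like of shape (n_frames, n_atoms, 3)
    masses:    array-like of shape (n_atoms,)
    (Hypothetical helper for illustration; not part of MDAnalysis.)
    """
    positions = np.asarray(positions, dtype=float)
    masses = np.asarray(masses, dtype=float)
    total_mass = masses.sum()

    # Centre of mass of every frame, shape (n_frames, 3).
    com = np.einsum("a,fax->fx", masses, positions) / total_mass

    # Squared distance of each atom from its frame's centre of mass,
    # shape (n_frames, n_atoms).
    sq_dist = ((positions - com[:, None, :]) ** 2).sum(axis=-1)

    # Mass-weighted mean of the squared distances, then the square root.
    return np.sqrt((sq_dist * masses).sum(axis=1) / total_mass)


# Synthetic stand-in for a trajectory: 5 frames of 100 atoms (assumption).
rng = np.random.default_rng(0)
positions = rng.normal(size=(5, 100, 3))
masses = np.full(100, 12.011)

rgyr = radius_of_gyration(positions, masses)
print(rgyr.shape)  # one value per frame
```

With a real trajectory, the positions array would instead be filled from the coordinates that a trajectory reader provides frame by frame; the array arithmetic stays the same.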

In this contribution we first provide an overview of the current state of the MDAnalysis library and how it can be leveraged to explore and analyze simulation data. We then detail ongoing work towards the v3.0 release of the library, which aims to address key scientific and software challenges (https://www.mdanalysis.org/2023/10/25/towards_3.0/), including: the efficient handling of increasingly large and complex simulation datasets, improved handling of chemical context information, and tighter interoperability with other widely used software libraries.

Finally, we discuss challenges encountered in maintaining the library and its surrounding ecosystem over the years. We reflect on an ongoing strategy for sustainable community and ecosystem building through the MDAKits framework (https://conference.scipy.org/proceedings/scipy2023/ian_kenney.html). This approach aims to train and empower the next generation of developers by offering the tools and documentation needed to develop and maintain FAIR-compliant (findable, accessible, interoperable, and reusable) code that uses MDAnalysis.

Presenter Biography:

Dr Irfan Alibay has been a maintainer for the MDAnalysis project (https://www.mdanalysis.org/) since 2020. For his day job, he acts as the Science Lead for the Open Free Energy initiative (https://openfree.energy/) where he leverages open source tools to accurately estimate protein-ligand binding affinities.

Examples of previous talks from Irfan include an MDAnalysis BioExcel webinar (https://www.youtube.com/watch?v=1Wot83DSt4E) and the MDAnalysis 2023 User Group Meeting State of the Union (https://zenodo.org/records/8388971).

Author information list:
Irfan Alibay[a], Oliver Beckstein[b], Richard J. Gowers[a], Hugo MacDermott-Opeskin[c], Michaela Matta[d], Rocco Meli[e], Fiona B. Naughton[f], Tyler Reddy[g], Lily Wang[a]

[a] Open Molecular Software Foundation, Irvine, CA, USA
[b] Department of Physics, Arizona State University, Tempe, AZ, USA
[c] Computational and Systems Biology Program, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
[d] Department of Chemistry, King’s College London, London, UK
[e] Swiss National Supercomputing Center (CSCS), ETH Zürich, Switzerland
[f] Cardiovascular Research Institute, University of California San Francisco, San Francisco, CA, USA
[g] CCS-7 Applied Computer Science, Los Alamos National Laboratory, Los Alamos, NM, USA
