SciPy 2023

Designing user-friendly APIs for the NIST Interatomic Potentials Repository
07-12, 16:05–16:35 (America/Chicago), Grand Salon C

The NIST Interatomic Potentials Repository project has developed Python APIs to support user interactions with the repository data hosted at https://potentials.nist.gov. The associated code is layered, starting with generic methods for JSON/XML-based data and databases, and building up to user-friendly interfaces specific to the repository. This design lets basic users explore the data easily, and lets expert users perform more complicated operations or create custom APIs for other databases. The repository APIs help users find and compare interatomic models, set up simulations, perform high throughput calculations, and access the high throughput results.


This presentation outlines the Python APIs developed for the public database of the NIST Interatomic Potentials Repository. The entire framework consists of six different Python packages designed for data interaction and generation: DataModelDict, cdcs, yabadaba, potentials, atomman, and iprPy. These packages have an import hierarchy with each subsequent package incorporating or inheriting the previous.
All project data is represented with JSON/XML equivalent data models. Having data that can be equivalently represented in JSON and XML takes advantage of the benefits of both formats while placing only minor limits on schema designs. The “DataModelDict” Python class extends the basic dict to allow for easy transformations between Python, JSON and XML, and includes additional methods for exploring and manipulating individual records.
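The JSON/XML equivalence described above can be illustrated with a small standard-library sketch. This is not the DataModelDict implementation; the helper names `dict_to_xml` and `xml_to_dict` and the example record are made up for illustration. It assumes dict keys map to element tags and list values map to repeated sibling elements.

```python
import json
import xml.etree.ElementTree as ET


def dict_to_xml(tag, value):
    """Build an XML element from a nested dict/list/scalar structure."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            # A list value becomes repeated sibling elements with the same tag
            children = child if isinstance(child, list) else [child]
            for item in children:
                elem.append(dict_to_xml(key, item))
    else:
        elem.text = str(value)
    return elem


def xml_to_dict(elem):
    """Invert dict_to_xml; repeated sibling tags collapse back into a list."""
    if len(elem) == 0:
        return elem.text
    result = {}
    for child in elem:
        value = xml_to_dict(child)
        if child.tag in result:
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Illustrative record only; not taken from the repository
record = {"potential": {"id": "example-potential", "element": ["Al", "Ni"]}}

root_tag, content = next(iter(record.items()))
xml_text = ET.tostring(dict_to_xml(root_tag, content), encoding="unicode")
json_text = json.dumps(record)

# Converting to XML and back recovers the same Python structure
round_trip = {root_tag: xml_to_dict(ET.fromstring(xml_text))}
```

In this sketch the round trip is exact only because every value is a string and every list has at least two items; numbers would come back as strings and a one-item list would come back as a scalar. Constraints of this kind are one way to read the "minor limits on schema designs" mentioned above.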
All public potentials data are hosted in a CDCS database accessible at https://potentials.nist.gov. CDCS databases store XML-formatted records, support multiple schemas, and provide both a web-based interface and a REST API for interacting with the data. The “cdcs” package defines Python methods for common database interactions that wrap around the REST API calls. The package also provides options to build custom REST calls to the database for features not yet directly supported.
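The pattern of convenience methods layered over a generic REST call builder might look roughly like the sketch below. It is not the cdcs package API: the `RestClient` class, its method names, the `rest/data/` path, and the `title` parameter are all placeholders chosen for illustration and should be checked against the CDCS REST documentation. The request is only prepared, never sent.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class RestClient:
    """Hypothetical thin wrapper around a REST API (not the cdcs package)."""

    def __init__(self, host):
        # Normalize the host so that urljoin appends paths predictably
        self.host = host.rstrip("/") + "/"

    def build_request(self, method, path, params=None):
        """Generic escape hatch: prepare any REST call without sending it."""
        url = urljoin(self.host, path.lstrip("/"))
        if params:
            url += "?" + urlencode(params)
        return Request(url, method=method, headers={"Accept": "application/json"})

    def record_request(self, title):
        """Convenience method built on build_request.

        The 'rest/data/' path and 'title' parameter are assumed placeholders.
        """
        return self.build_request("GET", "rest/data/", {"title": title})


client = RestClient("https://potentials.nist.gov")
request = client.record_request("example-record")
```

Passing the prepared request to `urllib.request.urlopen` would send it. Calls that the convenience methods do not cover can still go through `build_request` directly.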
The JSON/XML equivalent data models mean that all records can also be stored in JSON-based Mongo databases or as local collections of JSON or XML files. The “yabadaba” Python package provides an intermediate abstraction layer that lets users interact with data stored in any of these three infrastructures (CDCS, Mongo, or local files) using common methods. It also provides a framework for interpreting and building data records associated with different schemas. These features make it possible for end users to explore and generate data while remaining agnostic to the infrastructure used to store the data.
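A storage-agnostic layer of this kind can be sketched with an abstract base class and interchangeable backends. The class and method names below (`RecordStore`, `MemoryStore`, `LocalJSONStore`, `find`) are hypothetical and are not the yabadaba API. An in-memory store and a directory of JSON files stand in for the real backends, and the records are invented examples.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class RecordStore(ABC):
    """Hypothetical common interface shared by every storage backend."""

    @abstractmethod
    def add(self, name, record): ...

    @abstractmethod
    def get(self, name): ...

    @abstractmethod
    def names(self): ...

    def find(self, **criteria):
        """Return names of records whose top-level fields match all criteria."""
        return [n for n in self.names()
                if all(self.get(n).get(k) == v for k, v in criteria.items())]


class MemoryStore(RecordStore):
    """Keeps records in a dict; stands in for a remote database."""

    def __init__(self):
        self._records = {}

    def add(self, name, record):
        self._records[name] = dict(record)

    def get(self, name):
        return self._records[name]

    def names(self):
        return sorted(self._records)


class LocalJSONStore(RecordStore):
    """Keeps each record as <name>.json inside one directory."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def add(self, name, record):
        (self.directory / f"{name}.json").write_text(json.dumps(record))

    def get(self, name):
        return json.loads((self.directory / f"{name}.json").read_text())

    def names(self):
        return sorted(p.stem for p in self.directory.glob("*.json"))


def load_and_search(store):
    """User code written once against the interface, for any backend."""
    store.add("rec-1", {"element": "Al", "style": "pair"})
    store.add("rec-2", {"element": "Ni", "style": "pair"})
    return store.find(element="Ni")


memory_hits = load_and_search(MemoryStore())
with tempfile.TemporaryDirectory() as tmp:
    file_hits = load_and_search(LocalJSONStore(tmp))
```

Because `load_and_search` only touches the shared interface, it returns the same result for both backends, which is the sense in which user code can stay agnostic to where the data lives.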
While the “cdcs” and “yabadaba” packages provide APIs for interacting with an arbitrary CDCS database, the “potentials” package provides APIs specifically focused on the interatomic potentials content in potentials.nist.gov. Using the yabadaba features, any user can create their own copy of all interatomic potentials listings and then search and explore from either location. Searches can be performed either with simple Python methods or with Jupyter widget-based GUIs. The potentials package also forms the basis for adding new listings to the repository and for generating the traditional static repository website at https://www.ctcms.nist.gov/potentials/.
The “atomman” package focuses on setting up and analyzing atomic configurations and LAMMPS simulations. On the data side, it extends the “potentials” package functionality to interpreting schemas of atomic configurations. Finally, the “iprPy” package is centered around providing a collection of standard atomistic property calculation methods for characterizing interatomic potentials. The iprPy calculations can be performed individually or in high throughput, and can be executed from the command line, from within Python, or using transparent-box demonstration Jupyter Notebooks.

Dr. Lucas Hale is a materials research scientist at NIST where he is the content manager for the Interatomic Potentials Repository project. In support of the project, he has developed numerous Python packages for interacting with the repository data, designing atomistic simulations for investigating bulk crystals and crystalline defects, and developing and performing high throughput calculations.