07-10, 11:25–11:55 (US/Pacific), Room 315
Synthetic aviation fuels (SAFs) offer a pathway to improving efficiency, but high cost and volume requirements hinder property testing and increase the risk of developing low-performing fuels. To support SAF research, we used Fourier Transform Infrared (FTIR) spectra to train accurate, interpretable fuel property models. In this presentation, we will discuss how we used standard Python libraries (NumPy, pandas, and scikit-learn) together with Non-negative Matrix Factorization to decompose FTIR spectra and develop predictive models. Specifically, we will review the pipeline developed for preprocessing FTIR data, the ensemble models used for property prediction, and how the resulting features correlate with physicochemical properties.
Synthetic aviation fuels (SAFs), derived from biological sources, represent a critical opportunity to enhance efficiency in the aviation industry. However, high costs and volume requirements often delay the experimental testing of novel blends, increasing the risk of scaling up SAFs that may underperform. This presentation addresses these challenges by showing how machine learning models can predict fuel properties early in the SAF development process, before costly testing.
In recent years, machine learning has emerged as a powerful tool for developing property prediction models that accelerate SAF development by enabling early predictions of key fuel properties. However, many existing models face limitations, including reliance on complex analytical techniques, narrow focus on specific property ranges, and lack of interpretability.
In 2020, we presented at SciPy an approach that predicts properties for over 10,000 molecules using molecular descriptors, later published in Fuel (https://doi.org/10.1016/j.fuel.2022.123836). In 2023, we introduced our preliminary method for high-throughput prediction of aviation fuel properties from FTIR spectra, focusing on feature cleaning and transformation while evaluating dimensionality reduction techniques for spectra.
This year, we present our finalized approach, which employs non-negative matrix factorization (NMF) to decompose FTIR spectra into interpretable features. By integrating these NMF features with property data in ensemble models, we achieve accurate predictions of fuel properties and uncover significant correlations with blend composition. This enhanced methodology not only improves prediction accuracy but also provides critical insights into the relationships between fuel composition and performance.
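As a rough illustration of the idea described above, the following minimal sketch chains NMF with an ensemble regressor in scikit-learn. It is not the pipeline from the talk: the spectra and the property are synthetic stand-ins, and the number of components, the choice of RandomForestRegressor as the ensemble model, and all variable names are assumptions made only for this example.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in for FTIR data (assumption): each "spectrum" is a
# non-negative mixture of four Gaussian absorption bands.
n_samples, n_wavenumbers, n_components = 120, 300, 4
grid = np.linspace(0.0, 1.0, n_wavenumbers)
centers = np.array([0.2, 0.4, 0.6, 0.8])
bands = np.exp(-((grid[None, :] - centers[:, None]) ** 2) / (2 * 0.03**2))
weights = rng.random((n_samples, n_components))
spectra = weights @ bands + 0.01 * rng.random((n_samples, n_wavenumbers))

# Hypothetical fuel property that depends on the mixture weights.
prop = weights @ np.array([1.0, -0.5, 2.0, 0.3])
prop = prop + 0.01 * rng.normal(size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    spectra, prop, test_size=0.25, random_state=0
)

# NMF reduces each spectrum to a few non-negative features; the
# ensemble model then maps those features to the property.
model = make_pipeline(
    NMF(n_components=n_components, init="nndsvda", max_iter=1000, random_state=0),
    RandomForestRegressor(n_estimators=200, random_state=0),
)
model.fit(X_train, y_train)

r2 = model.score(X_test, y_test)  # R^2 on held-out spectra
components = model.named_steps["nmf"].components_  # one row per NMF feature
print(f"held-out R^2: {r2:.3f}")
print("component matrix shape:", components.shape)
```

Each row of `components` is a non-negative pattern over the wavenumber grid, so it can be plotted and inspected like a spectrum, which is what makes this kind of feature easier to interpret than an unconstrained projection.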
Our presentation will detail the refined workflow for training property prediction models, using libraries such as NumPy, pandas, and scikit-learn. Key aspects will include the mathematical intuition of NMF and its features, its practical implementation, and the challenges encountered with pipeline optimization tools like TPOT, which can yield suboptimal results compared to a more tailored approach.
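For readers unfamiliar with NMF, the standard formulation can be written as below. The symbols are chosen for this note and are not taken from the talk: X is the matrix of spectra (n samples by m wavenumbers), W holds the per-sample feature weights, H holds the k spectral components, and the Frobenius-norm objective shown is the common default (for example in scikit-learn), which may differ from the loss used in the presented work.

```latex
X \approx W H, \qquad
X \in \mathbb{R}_{\ge 0}^{n \times m},\quad
W \in \mathbb{R}_{\ge 0}^{n \times k},\quad
H \in \mathbb{R}_{\ge 0}^{k \times m}

\min_{W \ge 0,\; H \ge 0} \; \tfrac{1}{2}\,\lVert X - W H \rVert_F^2
```

Because every entry of W and H is non-negative, each spectrum is reconstructed as a purely additive combination of the k components, with no cancellation between them.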
We will conclude by presenting our model results, emphasizing their interpretability and the insights gained regarding blend composition and predicted properties. By demonstrating how our open-source web tool (https://feedstock-to-function.lbl.gov/) can significantly reduce the time and costs associated with bioprocess optimization and the scale-up of SAFs, we highlight the potential for scientific Python to address pressing research challenges in synthetic aviation fuel development.
This talk is positioned to engage the scientific computing community by illustrating how computational techniques can solve complex research problems, ultimately contributing to the advancement of new energy solutions.
As a Scientific Engineering Associate at Lawrence Berkeley National Laboratory, Ana conducts multidisciplinary research focused on the development of innovative solutions, including tools that accelerate jet fuel research and tools that autonomously design semantic models and data infrastructure for buildings. Ana enjoys using machine learning and data science to discover complex patterns and ultimately advance scientific research.