SciPy 2023

A Computer Vision (ML) Approach to Classifying Clouds and Aerosols from Satellite Observations
07-12, 11:25–11:55 (America/Chicago), Zlotnik Ballroom

The NASA Atmosphere SIPS, located at the University of Wisconsin, is responsible for producing operational cloud and aerosol scientific products from satellite observations. With decades of satellite observations available, new scientific algorithms are employing Machine Learning (ML) methods to improve processing efficiency and scientific analyses. In preparation for future developments, we are working with NASA Atmospheric Science Teams to understand ML requirements and to help develop new tools that will benefit both the Science Teams and the broader Open-Source Science community. This talk will step through an ML methodology being used to identify cloud types and severe aerosols.


The purpose of this talk is to share how to make the most efficient use of existing machine learning (ML) software, such as TensorFlow, to implement scientific ML methods. We will first describe the science objectives we are trying to achieve, then elaborate on lessons learned, and finally introduce future challenges.

Our primary science objective is to identify different cloud types and aerosols from satellite imagery, where the cloud types are indicative of different meteorological conditions. The discussion of the science objective will be tailored to the broader scientific community and will assume little to no background in atmospheric science or ML. To relate the data to the audience, we will present visualizations of satellite imagery throughout the talk.

Next, we will introduce the ML techniques we have been using. We employ VGG16, a pretrained convolutional neural network (CNN), which we fine-tune to identify cloud types and aerosols from satellite imagery. Accompanying animations will illustrate this process and show how the inference flows into the final softmax layer that provides the result.
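A minimal sketch of fine-tuning a pretrained VGG16 with tf.keras, assuming an ImageNet-pretrained base. The class count, the choice to unfreeze only the last four base layers, and the head (global average pooling plus one 256-unit dense layer) are hypothetical; the talk does not specify them. The final Dense layer's softmax is where the network's features become one probability per class.

```python
import tensorflow as tf

NUM_CLASSES = 5  # hypothetical number of cloud/aerosol classes


def build_classifier(num_classes=NUM_CLASSES, input_shape=(224, 224, 3),
                     weights="imagenet", trainable_base_layers=4):
    """Pretrained VGG16 convolutional base topped with a new softmax head."""
    base = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=input_shape)
    # Freeze everything except the last few base layers (block5 by default),
    # so only the most task-specific features are fine-tuned.
    for layer in base.layers[:-trainable_base_layers]:
        layer.trainable = False

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    # Softmax turns the final scores into one probability per class.
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    # A small learning rate avoids destroying the pretrained weights.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

Inputs should first pass through tf.keras.applications.vgg16.preprocess_input, which applies the scaling the ImageNet weights expect; model.fit would then train on one-hot labels.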

The key lesson learned in using ML software is to consider which parts of the code execute on the CPU and which on the GPU. Initially, we noticed that GPU utilization was not consistently at 100% during inference. To demonstrate the potential, we dumped the data to a 200 GB file and streamed it directly to the GPU. This test showed what was possible and led us to rewrite our generator as a Keras Sequence whose __init__ and __getitem__ methods call tf.device to pin data I/O and preprocessing to the CPU, leaving the GPU solely for inference. This approach yielded a 2x performance increase.
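A minimal sketch of that generator pattern, assuming in-memory uint8 images as a stand-in for the real data I/O. The class name CPUBatchSequence, the batch size, and the resize target are hypothetical, and the 2x speedup is the authors' measurement, not something this sketch demonstrates.

```python
import math

import numpy as np
import tensorflow as tf


class CPUBatchSequence(tf.keras.utils.Sequence):
    """Hypothetical generator that keeps data I/O and preprocessing on the CPU."""

    def __init__(self, images, batch_size=32, target_size=(224, 224)):
        super().__init__()
        with tf.device("/CPU:0"):
            # Stand-in for data I/O: hold the raw uint8 images in host memory.
            self.images = tf.convert_to_tensor(images, dtype=tf.uint8)
        self.batch_size = batch_size
        self.target_size = target_size

    def __len__(self):
        # Number of batches per pass over the data (last batch may be short).
        return math.ceil(int(self.images.shape[0]) / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        with tf.device("/CPU:0"):
            # Slicing, casting and resizing all run on the CPU.
            batch = self.images[start:start + self.batch_size]
            batch = tf.cast(batch, tf.float32)
            batch = tf.image.resize(batch, self.target_size)
            # A real pipeline would also apply the model's own preprocessing
            # here, e.g. tf.keras.applications.vgg16.preprocess_input.
        return batch


if __name__ == "__main__":
    raw = np.random.randint(0, 256, size=(10, 64, 64, 3), dtype=np.uint8)
    seq = CPUBatchSequence(raw, batch_size=4, target_size=(32, 32))
    print(len(seq), seq[0].shape)
```

An instance can be passed to model.predict, which pulls batches from it while the model runs; the tf.device("/CPU:0") scopes keep the slicing and resizing ops off the GPU so the GPU only executes the network.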

Because our goal is to add value to existing NASA algorithm methodologies via ML, we need labeled data. We experimented with existing labeling packages but in the end decided to incorporate the labeling task into software the community already uses. Thankfully, as part of NASA’s participation in Open-Source Science, one of the primary tools used by the Atmospheric Science community, NASA Worldview, is already open source. This allowed us to install its Docker images and extend the tool, bringing the labeling task directly to the scientists.

Additionally, I will talk about the importance of visualizing the data throughout the entire process of training a CNN. For example, I have a video that flips through thousands of images from our training set, and I will use it to emphasize the importance of looking at the data throughout the process and of being able to share information. Open-Source Science is great, but being able to convey how ML works is just as important.
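One simple way to look at many training images at once is to tile them into a single grid and page through the grids. The make_montage helper below is a hypothetical NumPy sketch of that idea, not the tool used to produce the video described in the talk; it assumes all images share the same height, width, and channel count.

```python
import numpy as np


def make_montage(images, ncols):
    """Tile a stack of same-sized images (N, H, W, C) into one grid image."""
    images = np.asarray(images)
    n, h, w, c = images.shape
    nrows = -(-n // ncols)  # ceiling division
    # Unused cells in the last row stay zero (black).
    grid = np.zeros((nrows * h, ncols * w, c), dtype=images.dtype)
    for i, img in enumerate(images):
        r, col = divmod(i, ncols)
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w] = img
    return grid
```

Each grid can be displayed with matplotlib's imshow or saved as one frame of a flip-through video.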

Steve Dutcher is a Software Developer/Engineer at the University of Wisconsin, Space Science & Engineering Center, with 20 years of experience applying his computer science degree to Atmospheric Science. He currently works on the NASA Atmosphere SIPS, which is responsible for producing operational cloud and aerosol products from polar-orbiting satellites. Additional projects include machine learning applications, producing a low-latency fire product from direct broadcast, and supporting instrument field campaigns.