SciPy 2023

Seeing the Sun through the Clouds: Accelerating the SunPy Data Analysis Ecosystem with Dask
07-12, 14:35–15:05 (America/Chicago), Grand Salon C

Over the last decade, the SunPy ecosystem, a Python solar data analysis environment, has evolved organically to serve the needs of scientists analyzing solar physics data, mostly on desktop and laptop computers. However, modern solar observatories are producing data volumes in the tens of petabytes, necessitating parallelized and out-of-core computation. HelioCloud is a cloud computing environment tailored for heliophysics research and colocated with many terabytes of solar physics data. In this talk, we will show how the SunPy ecosystem, combined with Dask on HelioCloud, can be used to efficiently process high-resolution solar data.


The SunPy ecosystem is a set of community-developed, free and open-source Python packages for solar data analysis. The ecosystem consists of the core sunpy package, which provides general capabilities such as data download, data structures, and coordinate transformations, as well as a growing set of affiliated packages that provide more application-specific functionality such as image processing techniques. The entire SunPy ecosystem depends heavily on the broader scientific Python ecosystem, including numpy, scipy, and scikit-image, and especially on the astropy package, a community Python package for astronomy.

Over the last decade, the SunPy ecosystem has evolved organically to serve the needs of scientists analyzing solar physics data. Analysis of observational solar data has traditionally been carried out on desktop or laptop computers or small compute clusters (see Bobra et al., 2020). This limitation is partly due to the solar physics community's longstanding reliance on the proprietary Interactive Data Language (IDL), whose scalability is limited in part by licensing restrictions. However, modern space- and ground-based solar observatories are producing data volumes in the tens of petabytes, necessitating parallelized and out-of-core computation. The surge in popularity of Python within the broader astronomy community, as well as the growing availability of computing resources, has led many solar researchers to use Python in cloud environments. All of these factors have propelled the development of HelioCloud. Inspired by similar science platforms for other disciplines, such as Pangeo, HelioCloud is a NASA-funded, AWS-backed cloud computing environment tailored for heliophysics research. HelioCloud provides both a dashboard for creating custom virtual machines and a JupyterLab interface. The latter allows for interactive, scalable computation enabled by Dask across many compute nodes. Most importantly, HelioCloud is colocated with nearly 1 petabyte of solar physics data, so researchers can perform their analysis without the added latency of downloading the data.

In this talk, we will demonstrate how the SunPy ecosystem, combined with Dask on HelioCloud, can be used to efficiently process high-resolution solar data. First, we will provide a brief description of the SunPy project with particular emphasis on the ndcube and sunkit-image affiliated packages. Next, we will provide a brief description of the JupyterLab interface of the HelioCloud platform. Finally, we will demonstrate a typical scientific workflow on HelioCloud by efficiently analyzing many hours' worth of solar active region evolution using sunpy, ndcube, sunkit-image, and Dask to scale out our computation over many workers. Additionally, we will discuss existing incompatibilities between Dask and the astropy ecosystem and how collaboration with the broader scientific Python community could resolve such friction.
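
As a rough illustration of the chunked, lazy style of computation that Dask enables, the following minimal sketch processes a stack of images block by block. It is not the workflow from the talk: the random array stands in for a time series of solar images, and the `normalize` step and the chunk size are hypothetical choices made only for this example.

```python
import numpy as np
import dask.array as da

# Hypothetical stand-in for a time series of solar images with axes
# (time, y, x). A real workflow would load these lazily from data files.
rng = np.random.default_rng(0)
frames = rng.random((24, 128, 128))

# Split the stack into chunks of 4 frames each, so that every chunk can be
# processed independently (and, on a cluster, by a different worker).
stack = da.from_array(frames, chunks=(4, 128, 128))


def normalize(block):
    """Divide each frame in the block by its own median (illustrative step)."""
    med = np.median(block, axis=(1, 2), keepdims=True)
    return block / med


# map_blocks only builds a lazy task graph; nothing is computed yet.
normalized = stack.map_blocks(normalize, dtype=frames.dtype)

# Reduce each frame to a single number: the mean normalized intensity.
light_curve = normalized.mean(axis=(1, 2))

# compute() executes the graph, here with Dask's default local scheduler.
result = light_curve.compute()
print(result.shape)
```

Run locally, this uses Dask's default scheduler on a single machine. The same task graph could in principle be executed on many workers by first connecting a `dask.distributed.Client` to a cluster, which is the kind of scaling the talk describes.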

Research Term Faculty, American University, Washington, D.C., USA
Research Scientist, NASA Goddard Space Flight Center, Greenbelt, MD, USA

Nabil Freij works as a Research Software Engineer for the Bay Area Environmental Research Institute, supporting several NASA missions at the Lockheed Martin Solar and Astrophysics Laboratory.

Before this, he was a Software Engineer at the Institute for Environmental Analytics, based at the University of Reading, where he focused on providing customized weather and climate data to growers and farmers.

Before his pivot to Software Engineering, he was a research scientist at Universitat de les Illes Balears working on coronal heating and MHD waves.

I have worked in the field of solar physics since 1995. I am a co-founder of the SunPy and Helioviewer Projects. I am currently working as the Project Scientist for NASA's Solar Data Analysis Center, and US Project Scientist for the Solar and Heliospheric Observatory.

Stuart writes open source software for solar physics and astrophysics. He is the lead developer of SunPy, contributes to Astropy, and spends most of his time working with the DKIST data center on data products and Python software for users of DKIST data.