07-11, 15:00–15:30 (US/Pacific), Room 316
HyperSpy is a community-developed open-source library providing a
framework to facilitate interactive and reproducible analyses of
multidimensional datasets. Born out of the electron microscopy
scientific community and building on the extensive scientific Python
environment, HyperSpy provides tools to efficiently explore, manipulate,
and visualize complex datasets of arbitrary dimensionality, including
those larger than a system's memory. After 14 years of development,
HyperSpy recently celebrated its 2.0 version release. This presentation
will (re)introduce HyperSpy's features and community, with a focus on
recent efforts to pare the library down to a domain-agnostic core
supported by a robust ecosystem of extensions that provide
domain-specific scientific functionality.
I am submitting this proposal to SciPy on behalf of, and with the consent of, the HyperSpy
development team in the hope of sharing the features and capabilities
of our project, together with the challenges of sustainably managing a
mature community-driven project through substantial change and developer
turnover. The basics of HyperSpy were previously presented at the 2016
SciPy conference [1], and the project has seen substantial development
and community growth since that time.
HyperSpy is a core Python library and an ecosystem of
extensions that strive to make multidimensional data analysis as easy
and natural as possible, while providing a powerful and scalable
framework for advanced data analysis pipelines. It builds on the
scientific Python ecosystem (NumPy, Jupyter, SciPy, Dask, scikit-learn,
etc.) to facilitate the visualization and analysis of multidimensional
signals, primarily via scripting and/or Jupyter interfaces.
This presentation will focus on four areas:
- Introducing HyperSpy
I will briefly discuss the history and open-source journey of the
HyperSpy project, from its origins in 2007 as a PhD student's
collection of domain-specific scripts for analyzing electron
energy loss spectroscopy (EELS) spectrum images to the
community-maintained, general-purpose multidimensional analysis
toolkit of today. As of February, the project has over 1,100
scholarly references [2], over 60 contributors, and 470 GitHub stars
[3], and is used by over 150 other repositories, evidence of an
active and well-established community.
- What is unique about HyperSpy?
HyperSpy has a range of unique features that have led to its
strong growth and position as the leading open-source data
analysis package in the materials characterization community. The
most important feature is HyperSpy's conceptual model of an
extensible Signal type, which wraps a NumPy or Dask data array.
Regardless of the underlying data shape or type, the Signal
provides a consistent API built on scientifically intuitive axis
definitions, with indexing along separate navigation and signal
dimensions (for an EELS spectrum image, the two spatial axes are
navigation dimensions and the energy-loss axis is the signal
dimension). Signals can be extended and customized based on
scientific need, leading to a large community of HyperSpy
"extensions" that provide domain-specific capabilities. While
HyperSpy was developed out of the needs of the electron microscopy
community, its data model is well-suited to data structures common
in a range of scientific disciplines including medical/biological
imaging, remote sensing, particle physics, and others.
- Recent developments in the HyperSpy ecosystem
HyperSpy recently (December 2023) celebrated a long-awaited
version 2.0 release that -- among other improvements --
substantially refactored the overall project structure and reduced
lines of code by over 40%. A key goal of this release was to
reduce the size of HyperSpy's main library and to extract
input/output functionality [4] and domain-specific analysis code
into extensions [5], leaving a "core" that is completely
domain-agnostic to facilitate future extension development. I will
summarize the reasoning behind these efforts and why they better
position HyperSpy for the future.
- How we foster the HyperSpy community
HyperSpy's greatest asset is the user and developer community
that has developed organically over 15 years of open-source
development, welcoming and bidding farewell to countless
students, postdocs, and general users. This portion of the
presentation will highlight the community and the efforts of the
HyperSpy maintainers to develop, cultivate, and sustain it
in the face of the constant turnover inherent in the
academic world.
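The navigation/signal model described under "What is unique about HyperSpy?" can be illustrated with a small toy sketch. This is not HyperSpy's implementation or API: the `MiniSignal` class and its method names are hypothetical, and it assumes only NumPy and the convention that the trailing axes of the array are the signal dimensions.

```python
import numpy as np


class MiniSignal:
    """Toy stand-in for the navigation/signal split (not HyperSpy's Signal class)."""

    def __init__(self, data, signal_ndim=1):
        self.data = np.asarray(data)
        # Assumption: the trailing `signal_ndim` axes are signal dimensions,
        # and every axis before them is a navigation dimension.
        self.signal_ndim = signal_ndim

    @property
    def _nav_ndim(self):
        return self.data.ndim - self.signal_ndim

    @property
    def navigation_shape(self):
        return self.data.shape[: self._nav_ndim]

    @property
    def signal_shape(self):
        return self.data.shape[self._nav_ndim :]

    def at(self, *nav_index):
        """Return the signal stored at one navigation position."""
        return MiniSignal(self.data[nav_index], self.signal_ndim)

    def crop_signal(self, start, stop):
        """Slice the last signal axis at every navigation position."""
        return MiniSignal(self.data[..., start:stop], self.signal_ndim)

    def sum_over_navigation(self):
        """Collapse all navigation axes, leaving a single summed signal."""
        nav_axes = tuple(range(self._nav_ndim))
        return MiniSignal(self.data.sum(axis=nav_axes), self.signal_ndim)


# A 4 x 3 map of positions, each holding a 5-channel spectrum.
spectrum_image = MiniSignal(np.arange(4 * 3 * 5).reshape(4, 3, 5))
one_spectrum = spectrum_image.at(1, 2)
cropped = spectrum_image.crop_signal(1, 3)
total = spectrum_image.sum_over_navigation()

print(spectrum_image.navigation_shape, spectrum_image.signal_shape)
print(one_spectrum.data.shape, cropped.data.shape, total.data.shape)
```

The point of the split is that the same three operations (pick a position, crop the measurement, aggregate over positions) read the same way regardless of how many navigation axes the dataset has. HyperSpy's own Signal classes expose this idea through the `inav` and `isig` indexers, with axis metadata and units layered on top.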
References
[1] Tomas Ostasevicius. (2016, July 15). HyperSpy: How to Easily Bend
Multi-dimensional Data to your Analytical Will [Video]. YouTube.
https://youtu.be/kVlf3bMZcsc
[2] Francisco de la Peña, et al. (2017, July). Electron Microscopy
(Big and Small) Data Analysis With the Open Source Software Package
HyperSpy. Microscopy and Microanalysis, 23(S1). pp. 214-215.
https://doi.org/10.1017/S1431927617001751
and Google Scholar search
(https://scholar.google.com/scholar?q=hyperspy)
[3] HyperSpy. (2024). GitHub repository.
https://github.com/hyperspy
[4] RosettaSciIO. (2024). GitHub repository.
https://github.com/hyperspy/rosettasciio
[5] HyperSpy Extensions List. (2024). GitHub repository.
https://github.com/hyperspy/hyperspy-extensions-list
Dr. Joshua Taillon is a staff scientist within the NIST Office of Data and Informatics, working in the Data Science group as a Materials Research Engineer. Drawing on his extensive background in materials characterization, his professional interests lie at the intersection of materials characterization and data science, utilizing machine learning, artificial intelligence, and state-of-the-art signal/data processing techniques to facilitate greater understanding of material systems.
Prior to this appointment, Josh was an NRC Postdoctoral Associate in NIST's Microscopy and Microanalysis Research Group. During this time, his research included the development and application of novel data acquisition and processing schemes in both electron and ion-beam microscopy. He received a B.S. from Cornell University, and as an NSF Graduate Research Fellow, received his Ph.D. in Materials Science and Engineering from the University of Maryland where he specialized in analytical transmission electron microscopy and focused ion beam nanotomography.