SciPy 2024

HyperSpy – Your Multidimensional Data Analysis Toolbox
07-11, 15:00–15:30 (US/Pacific), Room 316

HyperSpy is a community-developed open-source library providing a
framework to facilitate interactive and reproducible analyses of
multidimensional datasets. Born out of the electron microscopy
scientific community and building on the extensive scientific Python
environment, HyperSpy provides tools to efficiently explore, manipulate,
and visualize complex datasets of arbitrary dimensionality, including
those larger than a system's memory. After 14 years of development,
HyperSpy recently celebrated its 2.0 version release. This presentation
will (re)introduce HyperSpy's features and community, with a focus on
recent efforts to split the library into a domain-agnostic core and a
robust ecosystem of extensions providing specific scientific
functionality.


I am submitting this proposal to SciPy on behalf of (and with the consent of) the HyperSpy
development team in the hope of sharing the features and capabilities
of our project, together with the challenges of sustainably managing a
mature community-driven project through substantial change and developer
turnover. The basics of HyperSpy were previously presented at the 2016
SciPy conference [1], and the project has seen substantial development
and community growth since that time.

HyperSpy consists of a core Python library and an ecosystem of
extensions that strive to make multidimensional data analysis as easy
and natural as possible, while providing a powerful and scalable
framework for advanced data analysis pipelines. It leverages the robust
scientific Python ecosystem (NumPy, Jupyter, SciPy, Dask, scikit-learn,
etc.) to facilitate the visualization and analysis of multidimensional
signals, primarily through scripting and Jupyter interfaces.
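HyperSpy handles datasets larger than memory by wrapping Dask arrays, which process data block by block. The sketch below shows that blockwise idea with NumPy alone. It is not HyperSpy's or Dask's code. It averages every spectrum in a scan stored on disk, reading only a few scan rows at a time through `numpy.memmap`. The function name `chunked_mean_spectrum`, the raw `(ny, nx, n_channels)` float32 file layout, and the chunk size are illustrative assumptions.

```python
import os
import tempfile

import numpy as np


def chunked_mean_spectrum(path, shape, dtype=np.float32, rows_per_chunk=16):
    """Average every spectrum in a (ny, nx, n_channels) array stored on disk.

    Hypothetical helper: only `rows_per_chunk` scan rows are copied into
    memory at a time, so the full array never has to fit in RAM.
    """
    ny, nx, n_channels = shape
    # Map the raw file into the address space; pages are read only when accessed
    data = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    total = np.zeros(n_channels, dtype=np.float64)
    for start in range(0, ny, rows_per_chunk):
        # Copy one block of rows into RAM and accumulate its per-channel sum
        block = np.asarray(data[start:start + rows_per_chunk], dtype=np.float64)
        total += block.sum(axis=(0, 1))
    return total / (ny * nx)


# Usage: write a small synthetic scan to a temporary raw file, then compare
# the chunked result against an in-memory mean of the same data.
shape = (64, 32, 128)
rng = np.random.default_rng(0)
scan = rng.random(shape, dtype=np.float32)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scan.raw")
    scan.tofile(path)
    mean_spectrum = chunked_mean_spectrum(path, shape)

expected = scan.astype(np.float64).mean(axis=(0, 1))
print(np.allclose(mean_spectrum, expected))  # True
```

Dask applies this same blockwise pattern automatically and can run it in parallel. Roughly speaking, that is why a signal wrapping a Dask array can offer the same interface as one wrapping an in-memory NumPy array.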

This presentation will focus on four areas:

  1. Introducing HyperSpy

    I will briefly discuss the history and open-source journey of the
    HyperSpy project: from its origins in 2007 as a PhD student's
    collection of domain-specific scripts for analyzing electron
    energy-loss spectroscopy (EELS) spectrum images, to the robust,
    community-maintained, general-purpose multidimensional analysis
    toolkit of today. As of February 2024, the project has over 1,100
    scholarly references [2], over 60 contributors, and 470 GitHub
    stars [3], and it is used by over 150 other repositories,
    reflecting a robust community.

  2. What is unique about HyperSpy?

    HyperSpy has a range of unique features that have led to its
    strong growth and position as the leading open-source data
    analysis package in the materials characterization community. The
    most important feature is HyperSpy's conceptual model of an
    extensible Signal type, which wraps a NumPy or Dask data array.
    Regardless of the underlying data shape or type, the Signal
    provides a consistent API built on scientifically intuitive axis
    definitions, which separate navigation dimensions from signal
    dimensions so that each can be indexed independently. Signals can
    be extended and customized based on scientific need, giving rise
    to a large community of HyperSpy "extensions" that provide
    domain-specific capabilities. While
    HyperSpy was developed out of the needs of the electron microscopy
    community, its data model is well-suited to data structures common
    in a range of scientific disciplines including medical/biological
    imaging, remote sensing, particle physics, and others.

  3. Recent developments in the HyperSpy ecosystem

    In December 2023, HyperSpy celebrated a long-awaited version 2.0
    release that, among other improvements, substantially refactored
    the overall project structure and reduced its lines of code by
    over 40%. A key goal of this release was to shrink HyperSpy's main
    library by extracting input/output functionality [4] and
    domain-specific analysis code into extensions [5]. This leaves a
    "core" that is completely domain-agnostic, which makes future
    extensions easier to develop. I will summarize the reasoning behind
    these efforts and explain why they better position HyperSpy for
    the future.

  4. How we foster the HyperSpy community

    HyperSpy's greatest asset is the user and developer community
    that has grown organically over 15 years of open-source
    development, welcoming and bidding farewell to countless students,
    postdocs, and general users along the way. This portion of the
    presentation will highlight the community and the efforts of the
    HyperSpy maintainers to develop, cultivate, and sustain it despite
    the constant turnover inherent in the academic world.
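Items 2 and 3 above describe a Signal that wraps an array, separates navigation axes from signal axes, and can be specialized by domain-specific extensions. The code below is a deliberately simplified, hypothetical model of that idea. It is not HyperSpy's implementation or API.

The names `ToySignal`, `ToyEELSSpectrum`, `register`, `as_signal_type`, and `peak_channel` are all invented for illustration. HyperSpy's own equivalents differ in several ways:

- The real `inav` and `isig` indexers also handle calibrated axes.
- They index navigation axes in reversed ("natural") order relative to NumPy.
- `set_signal_type` converts a signal in place.
- Extensions register their signal classes through a `hyperspy_extension.yaml` file rather than a Python registry.

```python
import numpy as np

# Hypothetical registry from a signal_type string to a class
_SIGNAL_TYPES = {}


def register(cls):
    """Record a signal class under its signal_type string."""
    _SIGNAL_TYPES[cls.signal_type] = cls
    return cls


@register
class ToySignal:
    """Domain-agnostic container: the last `signal_ndim` axes of `data` are
    signal axes; every axis before them is a navigation axis."""

    signal_type = ""

    def __init__(self, data, signal_ndim=1):
        self.data = np.asarray(data)
        if not 0 <= signal_ndim <= self.data.ndim:
            raise ValueError("signal_ndim must be between 0 and data.ndim")
        self.signal_ndim = signal_ndim

    @property
    def navigation_shape(self):
        return self.data.shape[: self.data.ndim - self.signal_ndim]

    @property
    def signal_shape(self):
        return self.data.shape[self.data.ndim - self.signal_ndim :]

    @property
    def inav(self):
        return _AxisIndexer(self, navigation=True)

    @property
    def isig(self):
        return _AxisIndexer(self, navigation=False)

    def as_signal_type(self, signal_type):
        """Return the same data wrapped in the class registered for signal_type."""
        return _SIGNAL_TYPES[signal_type](self.data, self.signal_ndim)


class _AxisIndexer:
    """Applies a NumPy-style key to only the navigation or only the signal axes."""

    def __init__(self, signal, navigation):
        self.signal = signal
        self.navigation = navigation

    def __getitem__(self, key):
        key = key if isinstance(key, tuple) else (key,)
        s = self.signal
        nav_ndim = len(s.navigation_shape)
        n_axes = nav_ndim if self.navigation else s.signal_ndim
        if len(key) > n_axes:
            raise IndexError("too many indices for this set of axes")
        key = key + (slice(None),) * (n_axes - len(key))
        if self.navigation:
            full_key = key + (slice(None),) * s.signal_ndim
            new_signal_ndim = s.signal_ndim
        else:
            full_key = (slice(None),) * nav_ndim + key
            # An integer index removes that signal axis
            new_signal_ndim = s.signal_ndim - sum(
                isinstance(k, (int, np.integer)) for k in key
            )
        # Keep the (possibly domain-specific) subclass after indexing
        return type(s)(s.data[full_key], new_signal_ndim)


@register
class ToyEELSSpectrum(ToySignal):
    """Stand-in for the kind of class a domain-specific extension adds."""

    signal_type = "EELS"

    def peak_channel(self):
        # Index of the most intense channel in each 1D spectrum
        return np.argmax(self.data, axis=-1)


# Usage: a 4 x 5 scan with a 100-channel spectrum at each position
data = np.zeros((4, 5, 100))
data[..., 42] = 1.0  # synthetic peak at channel 42 everywhere
s = ToySignal(data, signal_ndim=1)
print(s.navigation_shape, s.signal_shape)  # (4, 5) (100,)
spectrum = s.inav[2, 3]  # one spectrum, all signal channels
print(spectrum.navigation_shape, spectrum.signal_shape)  # () (100,)
window = s.isig[40:50]  # every position, a 10-channel window
print(window.navigation_shape, window.signal_shape)  # (4, 5) (10,)
eels = s.as_signal_type("EELS")
print(type(eels).__name__, eels.peak_channel()[0, 0])  # ToyEELSSpectrum 42
```

In this sketch, the generic class knows nothing about EELS. All domain knowledge lives in the registered subclass, loosely mirroring the split between a domain-agnostic core and its extensions.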

References

[1] Tomas Ostasevicius. (2016, July 15). HyperSpy: How to Easily Bend
Multi-dimensional Data to your Analytical Will [Video]. YouTube.
https://youtu.be/kVlf3bMZcsc
[2] Francisco de la Peña, et al. (2017, July). Electron Microscopy
(Big and Small) Data Analysis With the Open Source Software Package
HyperSpy. Microscopy and Microanalysis, 23(S1), pp. 214-215.
https://doi.org/10.1017/S1431927617001751; and Google Scholar search
(https://scholar.google.com/scholar?q=hyperspy)
[3] HyperSpy. (2024). GitHub repository.
https://github.com/hyperspy
[4] RosettaSciIO. (2024). GitHub repository.
https://github.com/hyperspy/rosettasciio
[5] HyperSpy Extensions List. (2024). GitHub repository.
https://github.com/hyperspy/hyperspy-extensions-list

Dr. Joshua Taillon is a staff scientist within the NIST Office of Data and Informatics, working in the Data Science group as a Materials Research Engineer. Drawing on his extensive background in materials characterization, his professional interests lie at the intersection of materials characterization and data science, utilizing machine learning, artificial intelligence, and state-of-the-art signal/data processing techniques to facilitate greater understanding of material systems.

Prior to this appointment, Josh was an NRC Postdoctoral Associate in NIST's Microscopy and Microanalysis Research Group. During this time, his research included the development and application of novel data acquisition and processing schemes in both electron and ion-beam microscopy. He received a B.S. from Cornell University, and as an NSF Graduate Research Fellow, received his Ph.D. in Materials Science and Engineering from the University of Maryland where he specialized in analytical transmission electron microscopy and focused ion beam nanotomography.