07-11, 10:45–11:15 (US/Pacific), Room 317
Neuroscientists record brain activity using probes that capture rapid voltage changes ('spikes') from neurons. Spike sorting, the process of isolating these signals and attributing them to specific neurons, faces significant challenges: incompatible file formats, diverse algorithms, and inconsistent quality control. SpikeInterface provides a unified Python framework that standardizes data handling across technologies and enables reproducibility. In this talk, we will discuss: 1) SpikeInterface's modular components for I/O, processing, and sorting; 2) containerized dependency management that eliminates complex installation conflicts between diverse spike sorters; and 3) parallelization tools optimized for the memory-intensive nature of large-scale electrophysiology recordings.
Background and Motivation
Spike sorting—the process of extracting individual neuron spiking activity from raw electrophysiological recordings—is a central technique in systems neuroscience, yet it remains fraught with challenges. Researchers routinely struggle with a fragmented landscape of file formats (both open and proprietary) and heterogeneous sorting algorithms, each with its own API and complex dependencies. These technical hurdles are further compounded by variability in data pre-processing pipelines, massive datasets generated by dense electrode arrays, and persistent difficulties in reproducing analyses across laboratories.
SpikeInterface aims to solve this problem by providing an end-to-end, unified Python framework for spike sorting. It establishes a common interface to diverse recording formats (e.g., Open Ephys, Plexon, Neurodata Without Borders) and spike sorters (e.g., Kilosort2, SpyKING Circus, IronClust), freeing neuroscientists to focus on scientific questions rather than on complex data conversion and management code.
Technical Discussion and Broader Relevance
This talk will highlight several core technical features in SpikeInterface:
Unified I/O and Metadata Management.
By abstracting data I/O behind a standardized Extractor object, SpikeInterface simplifies loading and saving data. This approach not only resolves many cross-format incompatibilities but also supports attaching metadata (probe and electrode geometries, sampling rates, and other descriptors) that is crucial for reproducibility.
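The extractor idea can be sketched in a few lines of plain Python. This is a hypothetical, simplified illustration of the pattern, not SpikeInterface's actual class hierarchy: format-specific readers implement one common interface, so downstream code is written once and never needs to know which file format a recording came from.

```python
# Simplified sketch of the extractor pattern (hypothetical classes,
# not the real SpikeInterface API).
import numpy as np


class BaseRecordingExtractor:
    """Uniform view over a recording, regardless of on-disk format."""

    def get_sampling_frequency(self) -> float:
        raise NotImplementedError

    def get_num_channels(self) -> int:
        raise NotImplementedError

    def get_traces(self, start_frame: int, end_frame: int) -> np.ndarray:
        """Return a (num_frames, num_channels) array of voltages."""
        raise NotImplementedError


class InMemoryRecording(BaseRecordingExtractor):
    """Toy 'format' backed by an in-memory array; a real extractor
    would parse a vendor file instead."""

    def __init__(self, traces: np.ndarray, sampling_frequency: float):
        self._traces = traces
        self._fs = sampling_frequency
        self.annotations = {}  # attached metadata: probe geometry, units, ...

    def get_sampling_frequency(self) -> float:
        return self._fs

    def get_num_channels(self) -> int:
        return self._traces.shape[1]

    def get_traces(self, start_frame, end_frame):
        return self._traces[start_frame:end_frame]


# Any code written against the base interface works for every backend:
rec = InMemoryRecording(np.zeros((30000, 4)), sampling_frequency=30000.0)
rec.annotations["probe"] = "4-channel linear, 25 um pitch"
print(rec.get_num_channels())        # 4
print(rec.get_traces(0, 100).shape)  # (100, 4)
```

A new format then only requires one new extractor class; every preprocessing, sorting, and metrics step downstream works unchanged.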
Modular Pre-processing and Post-processing.
SpikeInterface supports a wide range of pre-processing (e.g., filtering, re-referencing) and post-processing (e.g., principal component analysis, waveform extraction, quality metrics) steps. Most are implemented in a lazy fashion to enable memory-efficient access.
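The lazy pattern can be illustrated with a minimal common-median-reference step (hypothetical class names, not the real SpikeInterface implementation): constructing the preprocessor costs nothing, and the computation runs chunk by chunk only when traces are actually requested, so recordings never have to fit in memory at once.

```python
# Sketch of lazy preprocessing (hypothetical classes, not the
# actual SpikeInterface implementation).
import numpy as np


class NumpyRecording:
    """Minimal stand-in for a recording extractor."""

    def __init__(self, traces, fs):
        self._traces, self._fs = traces, fs

    def get_sampling_frequency(self):
        return self._fs

    def get_traces(self, start_frame, end_frame):
        return self._traces[start_frame:end_frame]


class LazyCommonMedianReference:
    """Subtract the across-channel median from every sample, lazily:
    nothing is computed until get_traces() is called."""

    def __init__(self, parent):
        self.parent = parent

    def get_sampling_frequency(self):
        return self.parent.get_sampling_frequency()

    def get_traces(self, start_frame, end_frame):
        # Pull only the requested chunk from the parent, then process it.
        chunk = self.parent.get_traces(start_frame, end_frame)
        return chunk - np.median(chunk, axis=1, keepdims=True)


rng = np.random.default_rng(0)
raw = NumpyRecording(rng.normal(size=(30000, 8)) + 5.0, fs=30000.0)
cmr = LazyCommonMedianReference(raw)  # instant: no computation yet
chunk = cmr.get_traces(0, 1000)       # work happens only per chunk
print(chunk.shape)  # (1000, 8)
```

Because each step wraps its parent and exposes the same interface, lazy steps chain naturally (filter, then re-reference, then sort) while keeping peak memory proportional to the chunk size.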
Containerized Algorithms and Parallelization.
We provide containerized versions of the most common sorting algorithms through Docker and Singularity, plus a framework for adding future ones. This approach effectively resolves library conflicts between sorters implemented in MATLAB, C++, Python, or mixed languages, while maintaining a consistent API for end users regardless of the underlying implementation.
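A sketch of how such dispatch can work, with hypothetical image names and a hypothetical `run-sorter` container entry point (not SpikeInterface's actual runner): the host only needs a container engine, while each sorter's toolchain, MATLAB runtimes included, stays inside its image.

```python
# Sketch of container-based sorter dispatch. Image names and the
# "run-sorter" entry point are hypothetical placeholders.
import shlex

SORTER_IMAGES = {
    # hypothetical registry mapping sorter name -> container image
    "kilosort2": "example/kilosort2-compiled:latest",
    "spykingcircus": "example/spyking-circus:latest",
}


def build_container_command(sorter_name, data_dir, output_dir,
                            engine="docker"):
    """Compose the container invocation that runs one sorter on one
    recording folder; the caller would hand this to subprocess.run()."""
    image = SORTER_IMAGES[sorter_name]
    return [
        engine, "run", "--rm",
        "-v", f"{data_dir}:/data:ro",   # recording mounted read-only
        "-v", f"{output_dir}:/output",  # results written back to the host
        image,
        "run-sorter", sorter_name, "/data", "/output",
    ]


cmd = build_container_command("kilosort2", "/tmp/rec", "/tmp/out")
print(shlex.join(cmd))  # full command line, ready for subprocess.run()
```

Because the same function signature covers every sorter, swapping `engine="docker"` for `engine="singularity"`-style execution, or trying a different sorter, is a one-argument change rather than a new installation.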
Modular Sorting Components.
Despite diverse designs, most spike sorting algorithms share common building blocks: peak detection, feature extraction, clustering, template matching, and drift correction. SpikeInterface exposes these steps modularly, allowing users to modify peak detection thresholds, compare feature extraction strategies, or experiment with different clustering approaches. This composable design enables mixing methods across sorters, reusing validated components, and benchmarking against various datasets (ground-truth, hybrid, and synthetic). Researchers can adapt core steps for different recording conditions while maintaining a unified API, reducing code redundancy and accelerating research and experimentation.
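As one example of a swappable building block, here is a minimal threshold-based peak detector (a simplified illustration, not the actual SpikeInterface `sortingcomponents` implementation) using the MAD-based noise estimate common in spike sorting:

```python
# Simplified peak-detection stage: threshold crossing with a robust
# (median absolute deviation) noise estimate. Illustrative only.
import numpy as np


def detect_peaks(trace, threshold_std=5.0):
    """Return sample indices where the signal crosses an N-sigma
    negative threshold (extracellular spikes are negative deflections)."""
    noise = np.median(np.abs(trace)) / 0.6745  # robust noise estimate (MAD)
    threshold = -threshold_std * noise
    below = trace < threshold
    # keep only the first sample of each threshold crossing
    return np.flatnonzero(below & ~np.roll(below, 1))


rng = np.random.default_rng(42)
trace = rng.normal(0.0, 1.0, size=30000)
trace[[5000, 12000, 25000]] = -20.0  # inject three large negative 'spikes'
peaks = detect_peaks(trace, threshold_std=5.0)
print(peaks)  # includes 5000, 12000, 25000
```

In a modular design, this stage could be replaced by a locally-exclusive or template-based detector without touching the clustering or template-matching stages downstream, which is what makes cross-sorter benchmarking tractable.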
Intended Audience
The talk is aimed at neuroscientists, data engineers, and scientific software developers. Attendees interested in the following topics will benefit most:
- Strategies for handling heterogeneous file formats and complex dependency trees
- Robust pre-processing, post-processing, and curation workflows for large-scale electrophysiological data
- Reproducible benchmarking for spike sorters
Project Resources and Previous Presentations
- SpikeInterface documentation
- Source code on GitHub
- Previous presentations on SpikeInterface project:
- Spike Sorting Workshop (2024): overview of SpikeInterface, main features and capabilities, and history of the project
- Oxford Cortex Club slides (2025): slides for a more recent presentation with similar content
Heberto Mayorquin
Heberto Mayorquin holds a BSc in Physics, an MSc in Complexity Science, and a PhD in Computational Neuroscience. After a brief stint in the private sector optimizing SQL queries, he returned to science by joining CatalystNeuro. At CatalystNeuro, he helps neuroscience labs standardize their data—from extracting information buried in proprietary binary formats to streamlining metadata documentation and optimizing data layouts for long-term cloud storage. Within the organization, he serves as the lead maintainer of NeuroConv and is also a maintainer of SpikeInterface. His focus is on developing open-source tools and workflows that make it easier for researchers to share and reuse their own data, as he believes that open collaboration is a catalyst for scientific progress.
I am an engineer and software developer focused on methods and analysis tools for neuroscience research, especially for extracellular electrophysiology. I am passionate about science, software, and engineering, and my mission is to support neuroscientists and facilitate their research efforts by providing state-of-the-art analysis methods and software tools. Among these, I am the core developer of several open-source scientific tools, including SpikeInterface, a widely used software framework to unify and simplify the analysis of extracellular electrophysiology data.
In March 2022, I joined the Allen Institute for Neural Dynamics team as an electrophysiology pipeline development engineer consultant, with the goal of building open-source and computationally efficient processing pipelines to analyze large amounts of electrophysiological data. Since July 2020, I have been working part-time at CatalystNeuro, a consulting company with the mission of facilitating collaborations in neuroscience and standardizing data analysis and data storage solutions.
Previously, I was a Postdoctoral Fellow at the Bio Engineering Laboratory at ETH Zurich, working on multimodal approaches to probe neural activity and to construct detailed biophysical models. Before that, I was at the Center for Integrated Neuroplasticity (CINPLA) at the University of Oslo, where I received my PhD.