SciPy 2025

AI as a Detector: Lessons in Real Time Pulsar Discovery
07-10, 15:50–16:20 (US/Pacific), Room 317

The Universe isn't always so quiet: neutron stars, fast radio bursts, and potentially alien civilizations emit bursts of electromagnetic energy - radio transients - into the unknown. In some cases, these emissions, like with pulsars, are constant and periodic; but in others, like with fast radio bursts, they're short in duration and infrequent. Classical detection surveys typically rely on dedispersion techniques and human-crafted signal processing filters to remove noise and highlight a signal of interest. But what if we're missing something?

In this talk we will introduce a workflow that avoids classical processing altogether. By feeding RF samples directly from the telescope's digitizers into GPU computing, we can train an AI model to serve as a detector -- not only enabling real-time performance, but also making decisions directly on raw spectrogram data, eliminating the need for classical processing. We will demonstrate how each step of the pipeline works - from AI model training and data curation to real-time inferencing at scale. Our hope is that this new sensor processing architecture can simplify development, democratize science, and process increasingly large amounts of data in real time.


Background
The radio astronomy and broader signal processing communities have long relied on classical signal processing techniques to identify needles of interest in the galactic haystack. Techniques to improve the signal-to-noise ratio, like dedispersion, and to detect known emissions, like matched filtering, work well when the RF environment is relatively clean and one knows what one is looking for; but what do we do when we're hunting for new discoveries we haven't yet observed?
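
To make the dedispersion step concrete, here is a minimal NumPy sketch (the array sizes, frequencies, and DM are illustrative, not the production pipeline's values): each frequency channel is shifted by the cold-plasma dispersion delay so that a dispersed pulse lines up in time.

```python
import numpy as np

K_DM = 4.148808e3  # dispersion constant, MHz^2 s per (pc cm^-3)

def dedisperse(spec, freqs_mhz, dm, dt_s):
    """Incoherently dedisperse a dynamic spectrum.

    spec      : (n_chan, n_time) spectrogram
    freqs_mhz : center frequency of each channel, MHz
    dm        : trial dispersion measure, pc cm^-3
    dt_s      : time resolution, seconds
    """
    f_ref = freqs_mhz.max()
    # Delay of each channel relative to the highest-frequency channel
    delays = K_DM * dm * (freqs_mhz**-2 - f_ref**-2)
    shifts = np.round(delays / dt_s).astype(int)
    out = np.empty_like(spec)
    for i, s in enumerate(shifts):
        out[i] = np.roll(spec[i], -s)  # advance the delayed channels
    return out

# Toy example: inject a dispersed pulse, then recover it
n_chan, n_time = 64, 512
freqs = np.linspace(1500.0, 1200.0, n_chan)  # MHz, descending
dt = 1e-3                                    # 1 ms samples
dm = 56.7                                    # roughly the Crab pulsar's DM
spec = np.zeros((n_chan, n_time))
t0 = 100
for i, f in enumerate(freqs):
    delay = K_DM * dm * (f**-2 - freqs.max()**-2)
    spec[i, t0 + int(round(delay / dt))] = 1.0

dedispersed = dedisperse(spec, freqs, dm, dt)
profile = dedispersed.sum(axis=0)  # channels now align at t0
```

Summing the dedispersed channels concentrates the pulse energy into a single time bin, which is exactly the signal-to-noise boost the paragraph above describes.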

Artificial Intelligence techniques have shown great promise for new event detection in the radio astronomy field. Gerry Zhang et al. reported 72 previously undiscovered fast radio bursts in recorded observations at the Green Bank Observatory by applying a convolutional neural network to C-band spectrograms.

But how do we apply these types of detectors to real data streams? Through SciPy tools like scipy.signal and GPU equivalents (cupyx.scipy.signal), it's become possible to deliver high-performance signal processing with easy data movement between AI frameworks like PyTorch, ensuring that data doesn't migrate needlessly between CPU and GPU. Moreover, the emergence of AI sensor processing frameworks like NVIDIA's Holoscan allows one to easily connect GPU compute resources to physical sensors, like FPGAs. Together, these community-driven tools can culminate in real time scientific discovery.
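
As a sketch of that interoperability (the sample rate and tone are invented for illustration), the snippet below builds a dynamic spectrum on the CPU with scipy.signal; on a GPU, cupyx.scipy.signal exposes compatible routines that operate on CuPy arrays, whose results can then be handed to PyTorch zero-copy via DLPack (torch.from_dlpack), so the data never leaves the device.

```python
import numpy as np
from scipy import signal

# Synthetic complex baseband: a tone plus noise at 1 MS/s (illustrative numbers)
fs = 1_000_000
rng = np.random.default_rng(42)
t = np.arange(262_144) / fs
x = np.exp(2j * np.pi * 125_000 * t) \
    + 0.1 * (rng.normal(size=t.size) + 1j * rng.normal(size=t.size))

# STFT -> dynamic spectrum. On a GPU, the analogous call in
# cupyx.scipy.signal runs on a CuPy array, and its output can be
# passed to PyTorch without a host round-trip via torch.from_dlpack.
f, tt, Sxx = signal.spectrogram(x, fs=fs, nperseg=1024, noverlap=512,
                                return_onesided=False)
power_db = 10 * np.log10(Sxx + 1e-12)  # log-power features for the model
```

The resulting (frequency, time) array is the raw spectrogram representation that the detector model consumes.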

Methods
We trained our fast radio transient model, a modified ResNet architecture, on over 200,000 fast radio burst samples simulated with the InjectFRB library. These augmented real collections of FRBs found previously at a variety of instruments. The model was trained in PyTorch using the Adam optimizer and was later optimized with NVIDIA's TensorRT SDK. Further, we determined that the ResNet-based approach to signal identification outperformed SPANDAK, the current fast radio transient search algorithm used in production at the Allen Telescope Array.
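
The data-curation step can be sketched as follows. This toy generator is our own illustration, not InjectFRB's actual API: it injects a dispersed pulse with a random DM, strength, and arrival time into a noise spectrogram and pairs each example with a binary label, producing batches shaped for a CNN such as a ResNet.

```python
import numpy as np

K_DM = 4.148808e3  # dispersion constant, MHz^2 s per (pc cm^-3)

def make_example(rng, n_chan=64, n_time=256, dt=1e-3,
                 f_top=1500.0, f_bot=1200.0, burst=True):
    """Generate one labeled training spectrogram: noise, optionally
    with a dispersed burst injected (illustrative parameters)."""
    spec = rng.normal(size=(n_chan, n_time))
    label = 0
    if burst:
        freqs = np.linspace(f_top, f_bot, n_chan)
        dm = rng.uniform(50.0, 300.0)    # random trial dispersion measure
        amp = rng.uniform(5.0, 20.0)     # burst strength above the noise
        t0 = int(rng.integers(0, n_time // 4))
        for i, f in enumerate(freqs):
            delay = K_DM * dm * (f**-2 - f_top**-2)
            idx = t0 + int(delay / dt)
            if idx < n_time:             # part of the sweep may run off the edge
                spec[i, idx] += amp
        label = 1
    return spec.astype(np.float32), label

rng = np.random.default_rng(0)
X, y = zip(*(make_example(rng, burst=(k % 2 == 0)) for k in range(8)))
X = np.stack(X)[:, None]  # (batch, 1, chan, time), the usual CNN input layout
y = np.array(y)
```

A training loop would then feed these (spectrogram, label) pairs to the classifier with a standard optimizer such as Adam.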

Once the model was trained and optimized, we leveraged NVIDIA's Holoscan real-time AI sensor processing SDK to connect incoming RF samples from the instrument to a GPU running a real-time signal processing and ML inferencing pipeline.

Results
In this experiment, we collected an aggregate 100 Gbps of UDP Ethernet data from a total of 28 antenna feeds at the Allen Telescope Array. Each feed collected 96 MHz of bandwidth, and at 2 polarizations, the total experiment had an aggregate bandwidth of 5.4 GHz. The GPU processor used was an NVIDIA IGX (Orin + ConnectX-7 NIC + RTX A6000 GPU). On the IGX, a Holoscan pipeline read UDP data straight from the NIC to the GPU and then performed GPU-based beamforming and AI inferencing for radio transient detection. We detected a candidate pulse at 1.236 GHz when pointed at the Crab Nebula. Upon further processing, we confirmed this emission originated from the Crab Pulsar (PSR B0531+21), making this the first pulsar detected with an online AI pipeline running on raw sensor data.
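
The beamforming stage can be illustrated with a narrowband phase-shift (delay-and-sum) beamformer. The uniform linear array geometry, frequencies, and noise levels below are invented for the sketch and are not the ATA's actual configuration: per-antenna phase weights compensate the geometric delays so that signals from the chosen direction add coherently.

```python
import numpy as np

def steering_vector(n_ant, d_m, f_hz, theta_rad, c=3.0e8):
    """Narrowband steering vector for a uniform linear array,
    with theta measured from broadside."""
    tau = np.arange(n_ant) * d_m * np.sin(theta_rad) / c  # geometric delays
    return np.exp(-2j * np.pi * f_hz * tau)

def beamform(x, theta_rad, d_m, f_hz):
    """Phase-shift (delay-and-sum) beamformer: x is (n_ant, n_samp)."""
    w = steering_vector(x.shape[0], d_m, f_hz, theta_rad)
    return (np.conj(w)[:, None] * x).mean(axis=0)

n_ant, n_samp = 28, 4096
f_hz = 1.4e9                      # illustrative L-band center frequency
d_m = 0.5 * 3.0e8 / f_hz          # half-wavelength element spacing
theta_src = np.deg2rad(20.0)      # direction of the simulated source

rng = np.random.default_rng(1)
s = (rng.normal(size=n_samp) + 1j * rng.normal(size=n_samp)) / np.sqrt(2)
noise = (rng.normal(size=(n_ant, n_samp))
         + 1j * rng.normal(size=(n_ant, n_samp))) / np.sqrt(2)
x = steering_vector(n_ant, d_m, f_hz, theta_src)[:, None] * s + noise

on_source = beamform(x, theta_src, d_m, f_hz)          # points at the source
off_source = beamform(x, np.deg2rad(-40.0), d_m, f_hz) # points elsewhere
```

Pointing at the source boosts its power relative to the off-source beam by roughly the number of antennas, which is what makes a faint pulse detectable after the AI stage.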

Conclusion
This talk is a comprehensive "how-to" guide to both building science-focused ML models and deploying them to a real-time instrument in production. It covers critical pieces of a real-time pipeline - ML model optimization, data movement from sensors to compute, real-time visualization, and accelerated computing - and culminates in the detection of a pulsar. For the non-astronomers, we hope this talk will provide pointers on how to build real-time AI workflows with liberal use of the SciPy library ecosystem.

Luigi Cruz is a computer engineer working as a staff engineer at the SETI Institute. He created the CUDA-accelerated digital signal processing backend called BLADE currently in use at the Allen Telescope Array (ATA) and Very Large Array (VLA) for beamforming and high-spectral-resolution observations. Luigi is also the maintainer of multiple open-source projects like PiSDR, an SDR-specialized Raspberry Pi image; CyberEther, a heterogeneous accelerated signal visualization library; and Radio Core, a Python library for demodulating SDR signals using the GPU with the help of CuPy.