SciPy 2025

ReSCU-Nets: recurrent U-Nets for segmentation of multidimensional microscopy data
07-09, 11:25–11:55 (US/Pacific), Room 317

Image analysis is a central tool in modern biology. Cell and developmental biologists generate multidimensional microscopy data, including imaging of cellular, subcellular and tissue structures, in three dimensions, over time, and with multiple molecular markers. Segmentation and tracking of multidimensional microscopy data requires high accuracy across many images (e.g. timepoints) and is a labour-intensive part of biological image processing pipelines. We present ReSCU-Nets, recurrent convolutional neural networks that use the segmentation results from the previous frame as a prompt to segment the current frame. We demonstrate that ReSCU-Nets outperform state-of-the-art segmentation models in different tasks on biological multidimensional microscopy sequences.


Quantification of microscopy images is a central tool in cell and developmental biology. Quantification pipelines often begin with segmentation and tracking of the structures to be measured. Neural network architectures, such as the U-Net, have improved the accuracy of automated microscopy image segmentation. However, segmentation of multidimensional images remains challenging, as photobleaching and environmental changes can reduce the signal-to-noise ratio during image acquisition, limiting the ability of neural networks to recognize the same object in multiple images of a sequence.

The sequential information in timelapse images provides an avenue for improving segmentation. A common way to use temporal information is to add recurrence. In a recurrent network, the output of a layer depends on both the current input and previous outputs. For example, in the Long Short-Term Memory (LSTM) U-Net, convolutional layers are replaced with layers that recall information previously seen during inference. Because recurrence in LSTM U-Nets is within layers, the segmentation results are still solely based on the image being considered. Promptable segmentation methods, such as the Segment Anything Model (SAM), can be used recurrently: the user provides a prompt for the first image of a sequence, and the segmentation masks produced by the network are used as prompts for subsequent images. SAM is built on large transformer models that require an abundance of training data (over 1 billion masks), something not accessible to the standard microscopist.

We designed a neural network architecture to accurately segment multidimensional microscopy images with limited training data. Thus, we began from a U-Net, given the low data requirements of U-Net training. To incorporate temporal context, we added a prompt encoder. The prompt encoder used the output mask produced by the network to inform the segmentation of the same object in the following image. The image to be segmented and the prompt entered the network through split input streams that were eventually concatenated. We refer to this architecture as Recurrent Split Concatenated U-Net (ReSCU-Net).
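The split-and-concatenate idea can be illustrated at the level of array shapes. The following is a toy sketch only, not the authors' implementation: `encode_stream` and `split_concatenate` are hypothetical names, and the fixed random weights stand in for the learned convolutional layers that a real network would use in each input stream.

```python
import numpy as np


def encode_stream(x, n_features, seed):
    """Hypothetical stand-in for one convolutional input stream.

    Maps a 2D array of shape (H, W) to feature maps of shape
    (n_features, H, W) using fixed random per-channel weights and a ReLU.
    A trained network would use learned convolutions instead.
    """
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(n_features, 1, 1))
    return np.maximum(weights * x[np.newaxis, :, :], 0.0)


def split_concatenate(image, prompt_mask, n_features=8):
    """Encode image and prompt in separate streams, then join the channels."""
    image_features = encode_stream(image.astype(float), n_features, seed=0)
    prompt_features = encode_stream(prompt_mask.astype(float), n_features, seed=1)
    # Concatenate along the channel axis so later layers see both streams.
    return np.concatenate([image_features, prompt_features], axis=0)


image = np.random.default_rng(42).random((64, 64))
prompt = np.zeros((64, 64), dtype=bool)
prompt[20:40, 20:40] = True  # mask from the previous frame

features = split_concatenate(image, prompt)
print(features.shape)  # (16, 64, 64)
```

Keeping the two streams separate before concatenation lets each input get its own early filters; where in the network the streams are joined is a design choice that this sketch does not attempt to reproduce.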

ReSCU-Nets combine the advantages of recurrent and promptable methods. The network is prompted with a segmentation of the first image in a sequence, produced manually, with a U-Net, or with other methods. The network then uses its outputs as prompts to segment the rest of the images. To assess network performance, we assembled three timelapse datasets, generated by imaging living Drosophila embryos. Datasets included the nuclei of migrating cardiac progenitors, the edge of epidermal wounds, and the membranes of epidermal cells.
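The prompting loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `segment_sequence` and `toy_segment_frame` are hypothetical names, and the toy function (a threshold restricted to the neighbourhood of the prompt) merely stands in for a trained network.

```python
import numpy as np


def segment_sequence(frames, first_mask, segment_frame):
    """Segment a sequence by feeding each output mask back in as the next prompt."""
    masks = [first_mask]
    for frame in frames[1:]:
        masks.append(segment_frame(frame, masks[-1]))
    return masks


def toy_segment_frame(frame, prompt_mask):
    """Hypothetical stand-in for a trained network.

    Keeps bright pixels that lie within one pixel of the prompt mask.
    """
    height, width = prompt_mask.shape
    padded = np.pad(prompt_mask, 1)  # pads with False
    near_prompt = np.zeros_like(prompt_mask)
    for dy in range(3):
        for dx in range(3):
            near_prompt |= padded[dy:dy + height, dx:dx + width]
    return (frame > 0.5) & near_prompt


# Synthetic timelapse: one object drifting right, plus a static bright distractor.
frames = np.zeros((5, 32, 32))
for t in range(5):
    frames[t, 10:16, 5 + t:11 + t] = 1.0
    frames[t, 25:28, 25:28] = 1.0

first_mask = np.zeros((32, 32), dtype=bool)
first_mask[10:16, 5:11] = True  # e.g. drawn manually on the first frame

masks = segment_sequence(frames, first_mask, toy_segment_frame)
```

In this toy example the mask follows the moving object from frame to frame, while the distractor is never segmented because it is far from every prompt.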

We compared the performance of ReSCU-Nets to U-Nets, LSTM U-Nets, and the pretrained SAM ViT Huge. ReSCU-Nets produced the most accurate segmentations, with intersection-over-union values of 88±6% (nuclei), 90±5% (wounds), and 89±6% (cells). Owing to prompting, ReSCU-Nets did not produce false positives. The high accuracy of segmentations combined with the lack of false positives resulted in true positive values of 98±2%, 99±1%, and 99±1% for nuclei, wounds, and cells, respectively, significantly higher than those of any other network. Thus, ReSCU-Nets maximize accuracy and minimize user interventions required to correct false positives, outperforming state-of-the-art models for segmentation of timelapse microscopy images.
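For reference, intersection over union for a pair of binary masks can be computed as below. This is a generic sketch: the function name is hypothetical, the handling of two empty masks is an assumption, and the abstract does not specify how values were aggregated across objects or frames.

```python
import numpy as np


def intersection_over_union(predicted, reference):
    """Return |A ∩ B| / |A ∪ B| for two binary masks of the same shape."""
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    union = np.logical_or(predicted, reference).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    intersection = np.logical_and(predicted, reference).sum()
    return float(intersection / union)


reference = np.zeros((20, 20), dtype=bool)
reference[0:10, 0:10] = True
predicted = np.zeros((20, 20), dtype=bool)
predicted[0:10, 2:12] = True  # same square, shifted two pixels to the right

iou = intersection_over_union(predicted, reference)
print(f"IoU = {iou:.3f}")  # 80 shared pixels / 120 pixels in the union
```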