Main

The development of powerful microscopy techniques that allow us to characterize biological structures with subcellular resolution and on large fields of view has tremendously increased the complexity of quantitative image analysis tasks1. The resulting images exhibit a wide range of structures that need to be identified, counted, precisely located and segmented. Expert knowledge is commonly required to achieve successful identification and segmentation of the structures of interest in microscopy images2,3. These tasks can be tedious and time-consuming, especially for large databanks or for the comparison of multiple biological conditions. It was recently demonstrated that deep convolutional neural networks (DNNs) are excellent feature extractors4. They were successfully applied to segmentation (for example, whole cells, nuclei and dendritic spines), enumeration (for example, cell counting) and classification (for example, the state of a cell) in microscopy images5,6,7,8,9,10,11. The most common deep learning (DL) approaches applied to microscopy and biomedical images are fully supervised and require precisely annotated datasets8,10,11. Hence, the annotation process is often a limiting step in the application of DL for quantitative analysis of biomedical imaging3,12. To alleviate this, weakly supervised DL methods were introduced12,13,14,15. Bounding box annotations are commonly used for weakly supervised segmentation tasks as they are simple, allow the task to be spatially constrained2,14,16,17,18 and have been shown to reduce the annotation time by 15-fold compared to the precise identification of structure boundaries19. Methods for training with binary, image-level targets, reducing even further the complexity and duration of the annotation task, have been proposed for cases in which multiple instances are displayed on a single image20. Unfortunately, when applied to microscopy and biomedical image analysis, such weakly supervised approaches using whole-image annotations resulted in lower segmentation precision compared to approaches using precisely identified structures21,22,23. Visualizing the spatial attention of the model through class activation maps is a powerful tool in assessing whether a classification is biologically meaningful24. The resultant attention maps were leveraged in the context of microscopy images to extract coarse segmentation masks of the underlying structure23,25.

In this Article we propose MICRA-Net (MICRoscopy Analysis Neural Network), a new approach relying only on image-level classification annotations for training a DNN to perform different types of microscopy image analysis tasks, such as semantic segmentation, cell counting and the detection of sparse features. MICRA-Net builds on latent learning26, which refers to a model retaining information (that is, a latent space) that is not required for the task at hand in order to learn new complementary tasks26. MICRA-Net uses binary, image-level classification targets to build a general representation of the specific dataset. It generates detailed feature maps from which specific tasks, such as semantic segmentation and detection, can be addressed without the need for a precisely annotated dataset for training.

Results

The generation of precisely annotated large datasets to train DNNs in a fully supervised manner remains a challenge in the field of microscopy and biomedical imaging. MICRA-Net, a convolutional neural network (CNN)-based method, addresses this challenge by using solely whole-image binary targets for training. This approach outperforms established DL baselines trained in a weakly supervised manner for the semantic segmentation of diverse biological structures. It is therefore of great interest for the automated quantitative analysis of microscopy datasets for which no fully supervised training dataset is available. In the following we first investigate the impacts of the annotation burden, before characterizing the performance of MICRA-Net on synthetic and real data for various tasks. We then evaluate how MICRA-Net can be fine-tuned to leverage information from a previously acquired, but different, dataset. Finally, we show how the proposed approach could be used to support experts in the annotation of sparse and small structures in large images.

Annotation task reduction analysis

MICRA-Net is trained on a simple multi-class classification task and therefore only requires the expert to identify class-specific positive and negative images with respect to the structures of interest. In contrast to the identification of structure boundaries using precise or bounding box contours, image-level annotations do not require specification of the positions of objects in the images (Fig. 1a).

Fig. 1: Various supervision levels can be employed for training a DL model to segment structures of interest in microscopy images.

a, Representative image from the CTC dataset7, overlaid with the corresponding fully and weakly supervised annotations. Annotated images are presented in decreasing spatial level of supervision and required annotation time (from left to right). b, The averaged inter-participant variability from the User-Study from six selected cell lines of the CTC using three levels of supervision (precise, bounding boxes (BBOX) and points). The inter-participant agreement was calculated using the F1 score as a function of IOU for precise (blue) and BBOX (orange) annotations in an all-versus-one manner27. The F1 score for points annotation (green) was calculated with a maximal distance of association of 30 pixels. Given that it is not possible to report the IOU between points annotations, we show the average F1 score as a constant line. Plotted are the bootstrapped mean (line) and 95% confidence interval (shading, 10,000 repetitions). c–e, Distribution of median scores from the inter-participant comparison calculated in an all-versus-one manner: distance between associated point markers (c); Hausdorff distance between the precise contours of the participants' annotations in the User-Study62 (d); average annotation time per object on different cell lines for each supervision level calculated from the User-Study (e). f, Evaluation of the annotation time that would be required to generate the training set for the four microscopy datasets used throughout the Article for the MICRA-Net (MN) and fully supervised (FS) approaches. Reported is the time in hours required for manual annotation. For MICRA-Net, an average of 2 s per annotated crop is required to generate the image-level binary label, regardless of the dataset. For fully supervised learning, the annotation time was evaluated for each dataset separately on a precisely annotated subset of images (Methods).

We quantified the time required to generate annotations with different levels of precision (precise contours, bounding boxes and points) by conducting a User-Study in which we asked microscopists to annotate testing images from the Cell Tracking Challenge (CTC) dataset7 (Methods). We chose this annotation task, as whole-cell detection and segmentation is a common task in microscopy, allowing us to recruit experienced participants in the User-Study. We analysed the inter-participant variability using a metric that combines both the level of association between objects (F1 score) and the quality of annotated objects27 (intersection over union (IOU); Fig. 1b and Supplementary Figs. 1–3). As a general tendency, simpler annotation tasks reduced the inter-participant variability (higher F1 score at a given IOU). We measured an error on the position of point annotations ranging from 4 to 10 pixels (median = 7 pixels; Fig. 1c), whereas for precise annotations the Hausdorff distance on the cell boundaries was between 5 and 24 pixels (median = 16 pixels) depending on the cell line (Fig. 1d). Several factors can reduce the precision of the annotations, such as the contrast (Fluo-N2DL-HeLa is high contrast whereas PhC-C2DL-PSC is low contrast) and the shape (Fluo-N2DH-GOWT1 is round whereas PhC-C2DH-U373 is irregular) (Fig. 1c,d and Supplementary Fig. 3).

The time required to annotate a single cell is increased by approximately twofold when going from points annotations to bounding boxes and from bounding boxes to precise annotations (Fig. 1e). Compared to the precise annotations required to train fully supervised DL approaches, the generation of whole image binary annotations reduces the required annotation duration by 6.7–157.6 h ([1.2, 19] fold change, median = 5.76, Supplementary Table 1) on the four datasets evaluated in this study (Fig. 1f).

MICRA-Net architecture and baselines

Figure 2a shows the architecture of MICRA-Net, which is designed around a CNN backbone composed of eight convolutional layers (L1 to L8), followed by a fully connected layer. For each dataset, the number of predicted classes was adjusted (Supplementary Notes 1–5). The gradient-weighted class activation maps (grad-CAMs; Methods) were extracted for each predicted class and at every layer of the network (L1–8; Fig. 2a–e and Extended Data Fig. 1) by backpropagating each activated class through the network. Thereafter, rectified linear unit (ReLU) activation and thresholding on the grad-CAM of the last convolutional layer (L8) were applied to generate a coarse class-specific feature map28. This yielded activation maps that provide a general overview of the structure24,25. To increase the information contained in the extracted feature map, we concatenated the local maps from layers L1–7, resulting in a class-specific seven-dimensional feature space (Fig. 2b,c). We retrieved the first principal component of every pixel using principal component analysis (PCA) decomposition on the feature space to generate a single feature map for each dataset (Fig. 2b,c and Methods).

Fig. 2: MICRA-Net architecture and experimental results on the modified MNIST dataset.

a, MICRA-Net architecture (detailed in the Methods). Each depth is composed of two sequential convolutional layers (Conv2D), batch normalization (BatchNorm2D) and ReLU. A 2 × 2 maxpooling (MaxPool2D) was employed to increase the richness of the representation from the model. A linear layer is used to project the globally pooled L8 layer (256 filters, Global Maxpool2D) to the specified number of classes. b, Concatenation of low- and high-level feature maps obtained from the grad-CAMs of every layer is performed to generate the multi-dimensional feature space for every predicted class. c, Feature maps generated from the calculated grad-CAMs for classes 0 and 6 on the modified MNIST dataset. d, Coarse and detailed segmentation maps of the digits of a representative image (256 × 256 pixel) and insets (right, taken from the dashed white boxes) from the modified MNIST dataset using MICRA-Net. The colour code corresponds to the digit class and the red arrow indicates a missed digit in the field of view. e, The segmentation performances of the coarse (using only the L8 grad-CAM) and precise (using a PCA on the L1–7 grad-CAMs) methods are compared using F1 score, IOU and SBD. The bar graph shows the average and s.d. over the ten classes (see Supplementary Fig. 5 for class-wise and density-wise performances). A significant increase is shown when using the information from L1–7 grad-CAMs in comparison to L8 only, for all calculated metrics (Supplementary Table 2). f, Mean performance over the ten classes obtained with the U-Net trained with and without dilation of the ground-truth contours. The segmentation maps are presented in Supplementary Fig. 6a. The MICRA-Net segmentation performance (colour-coded dashed lines) surpasses the U-Net trained with ten-pixel dilation and is not statistically different from the U-Net trained with five-pixel dilation on all measured metrics. Only fully supervised training outperforms MICRA-Net segmentation on all measured metrics. P values were calculated using resampling (Methods) and are reported in Supplementary Table 2 with *P < 0.05, **P < 0.01 and ***P < 0.001. Bar graphs show the mean values and s.d.

To characterize the performance of MICRA-Net, we compared the results obtained on different datasets with three established baselines in microscopy: (1) U-Net8, (2) Mask R-CNN9 and (3) Ilastik29. These baselines were chosen as they are widely used in the literature and allow semantic segmentation, detection or enumeration of biological structures in microscopy images with no or only simple modifications (Supplementary Notes 2 and 3).

Multi-class segmentation of synthetic images

To validate the classification and segmentation performance of MICRA-Net, we created a synthetic dataset containing N randomly sampled cluttered handwritten digits from the MNIST dataset30 (Modified MNIST dataset; Fig. 2c and Methods). The first step was to classify the digits appearing on each image to validate the representation capability of the network, which was confirmed by the obtained class-wise mean classification testing accuracy of 98.9(5)% (mean ± s.d.).

The information embedded in the grad-CAMs was used to generate class-specific segmentation of the digits. As described above, we concatenated the grad-CAMs of layers L1–7 to generate detailed segmentation maps (Fig. 2 and Supplementary Fig. 5). The semantic segmentation performance of MICRA-Net was compared to a U-Net31 trained with fully and weakly supervised training schemes. Fully supervised learning consisted of training with the binary digits contours from MNIST, whereas weak contours were generated by a dilation of the digits with a square of size {5, 10, 25} pixels as a structuring element (Supplementary Note 1). Figure 2f shows that MICRA-Net achieves similar or superior segmentation performance compared to all weakly supervised training instances of the U-Net and is only outperformed on all measured metrics (F1 score, IOU and symmetric boundary dice (SBD)) by fully supervised training (Supplementary Fig. 6 and Supplementary Table 2).

Semantic segmentation of super-resolution microscopy images

The next question that we addressed was the applicability of our approach to super-resolution microscopy image segmentation, for which precisely annotated datasets are rarely available. The specific task was the semantic segmentation of a publicly available weakly annotated dataset of stimulated emission depletion (STED) microscopy images of two F-actin nanostructures in neurons: (1) a periodical lattice structure and (2) longitudinal fibres (Fig. 3a,c)2. Image-level annotation reduced the annotation time by more than threefold compared to the polygonal bounding box annotations that were recently used for weakly supervised training of the U-Net architecture2.

Fig. 3: Semantic segmentation of F-actin nanostructures observed on super-resolution microscopy images.

a,c, Representative raw images from a dataset of STED microscopy images of two F-actin nanostructures in fixed cultured hippocampal neurons: periodical lattice (a) and longitudinal fibres (c). Arrows point towards the periodical lattice (green) and longitudinal fibres (magenta). Segmentation masks obtained from an expert, MICRA-Net, weakly supervised U-Net, weakly supervised Mask R-CNN and weakly supervised Ilastik are also reported for both structures as comparison. b, Performance evaluation of MICRA-Net and weakly supervised baselines segmentation on the precisely annotated testing dataset using custom metrics for the periodical lattice. The FFT metrics compare the frequency contents of the provided masks. The segmentation resulting from MICRA-Net is not significantly different from the expert annotations, whereas the other baselines are (U-Net, Mask R-CNN and Ilastik). d, Average difference between the number of low-intensity pixels found within the precise expert annotations and the DL-based segmentation approaches for the F-actin fibre nanostructures (Methods). The raw number of low-intensity pixels segmented by MICRA-Net is not significantly different for any low value of intensity pixel from the expert. This is not the case for all baselines (U-Net, Mask R-CNN and Ilastik), as these annotated a significantly different number of low-intensity pixels corresponding to the space between single fibres that was not annotated by the expert. P values were calculated using resampling (Methods) and are reported in Supplementary Tables 5 and 6 with *P < 0.05. Performance evaluation was performed within the dendritic mask (a,c; yellow lines). Scale bars (a,c), 1 μm.

On the classification task, MICRA-Net achieved accuracies of 75.2% and 83.7% on the testing dataset for the F-actin periodical lattice and longitudinal fibres, respectively. This is in line with a mean inter-participant classification accuracy of 80(5)% and 75(7)% for periodical lattice and longitudinal fibres, respectively (calculated for six participants using a leave-one-out scheme from 50 images; Supplementary Fig. 7). Detailed masks were obtained from MICRA-Net to solve the semantic segmentation task. We relied on a precisely annotated dataset consisting of 25 images of each structure (Extended Data Fig. 2) to evaluate the performance of all trained models: (1) MICRA-Net, (2) multi-participant polygonal bounding box annotations (six participants on 25 images of each structure: User-Study), (3) U-Net and (4) Mask R-CNN trained with polygonal bounding boxes2, and (5) Ilastik trained using scribbles (Supplementary Note 2). MICRA-Net achieved equivalent or superior segmentation performance on the precisely annotated dataset in comparison to both the User-Study and all baselines when comparing the common segmentation metrics (Extended Data Fig. 2, Supplementary Figs. 8 and 9 and Supplementary Tables 3 and 4). Thus, even if trained with weak image-level annotations, MICRA-Net can extract the necessary structural information to generate segmentation maps for both nanostructures.

A qualitative visual inspection of the segmentation masks suggested that MICRA-Net produced a finer detailed mask of the boundaries of the F-actin nanostructures compared to the weakly supervised baselines2 (especially for fibres, for which it provided detailed segmented contours of single fibre strands). Custom performance metrics adapted to the F-actin nanostructures were required to better characterize this observation. For the F-actin periodical lattice, we measured the fast Fourier transform (FFT) of the segmented areas for frequencies corresponding to the periodicity of the lattice (180–190 nm, ref. 32; Fig. 3b and Methods). The FFT metric calculated on the areas segmented with MICRA-Net is not significantly different from the one obtained from the precisely annotated dataset (Fig. 3b), whereas all other baselines showed a significant increase (Supplementary Tables 4 and 5). Similarly, a custom metric based on the pixel intensity distribution of the segmented areas was developed to evaluate the approaches on the fibre segmentation task (Methods). We inspected the reduction in the number of low-intensity pixels between single fibre strands that is observed in the predicted masks compared to the precisely annotated dataset. Only the segmentation masks obtained with MICRA-Net do not show a significantly different pixel intensity distribution to the one obtained from the precisely annotated dataset (Fig. 3d and Supplementary Table 6). This supports the ability of MICRA-Net to precisely identify the contours of individual fibres and the boundaries of the periodical lattice regions in STED images using solely image-level annotations for training.

Single-cell semantic segmentation

Cell counting and segmentation is a common challenge in high-throughput analysis of optical microscopy images7,8,11,33,34,35. Both fully and weakly supervised DL approaches have been shown to be very powerful for these tasks on multiple cell lines6,23. To characterize the performance of MICRA-Net for semantic instance segmentation, we used the images from six cell lines of the CTC dataset. Although this dataset is generally used to compare DL approaches on the instance segmentation task, we tested the ability of MICRA-Net to perform semantic instance segmentation on those images. We chose the CTC as it is widely used across the microscopy community, it is associated with a precisely labelled multi-class training set and it could be integrated in a User-Study. We acknowledge that the more recent dataset CoNSeP (2019) could also have been considered35. We trained MICRA-Net on resampled images of the CTC and obtained a classification accuracy of 95.8(4)%. Despite this high classification accuracy, the semantic instance segmentation performance of MICRA-Net was strongly reduced when no negative samples were provided (Supplementary Fig. 10; DIC-C2DH-HeLa and Fluo-N2DH-GOWT1). The cell density on the images of DIC-C2DH-HeLa did not allow us to find a scaling factor that would generate negative and positive crops with entire cells. This cell line was therefore removed from the training set (Supplementary Fig. 11). Hence, we report the performance of all trained models on five selected cell lines from the CTC for the detection and semantic instance segmentation tasks.

We compared the semantic instance segmentation of MICRA-Net with fully and weakly supervised baselines: U-Net8, Mask R-CNN9 and Ilastik29 (Supplementary Note 3 and Supplementary Tables 7 and 8). The weak labels were generated by dilating/eroding each label of the fully supervised dataset by a value sampled from a normal distribution with 0 mean and s.d. in {2, 5, 10} (Altered-X or ALT-X), or by taking the bounding boxes of each object (Methods and Fig. 4a). We compared the variability in the annotation obtained from participants of our User-Study to that of the altered versions of the precise labels that were used to train fully supervised approaches (Fig. 4a,b). Figure 4b shows the distribution of IOU between associated objects (Object-IOU) of the User-Study and the altered versions of the precise labels (eight repetitions) compared to the original precisely annotated dataset. From Fig. 4b we can conclude that the distribution of the User-Study is similar to the distribution of ALT-5. Hence, training DL architectures with a training set obtained from multiple participants (for example, crowd-sourced) should result in similar baseline performance as when trained with ALT-5.
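To illustrate how such altered labels can be generated, the sketch below dilates or erodes each instance mask by an offset drawn from a zero-mean normal distribution. It is a minimal stand-in that assumes the instances are stored as an integer-labelled mask; the function name and the use of a disk structuring element are our assumptions, not the released implementation.

```python
import numpy as np
from skimage.morphology import binary_dilation, binary_erosion, disk

def make_alt_labels(instance_mask, sigma, rng=np.random.default_rng()):
    """ALT-X-style weak labels: each object is dilated or eroded by an offset
    sampled from N(0, sigma); sigma corresponds to X in {2, 5, 10}."""
    altered = np.zeros_like(instance_mask)
    for obj_id in np.unique(instance_mask):
        if obj_id == 0:  # background
            continue
        obj = instance_mask == obj_id
        offset = int(round(rng.normal(loc=0.0, scale=sigma)))
        if offset > 0:
            obj = binary_dilation(obj, footprint=disk(offset))
        elif offset < 0:
            obj = binary_erosion(obj, footprint=disk(-offset))
        altered[obj] = obj_id  # overlapping objects are overwritten by later ones
    return altered
```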

Fig. 4: Semantic instance segmentation on five selected cell lines of the CTC dataset.

a, Representative examples of the various levels of supervision used to train the selected baselines. b, Quantification of the IOU between associated objects for the User-Study and altered versions (ALT-X) of the testing set with the ground-truth objects for each cell line of the CTC. The precision of the participants is similar to an ALT-5 version of the testing set. c, Representative examples of MICRA-Net semantic instance segmentation. Each outline colour depicts a different segmented object. See Supplementary Figs. 12–14 for baseline examples. d, Left: comparison of the difference of the pooled area under the curve (AUC; F1 score versus IOU) of all cell lines for MICRA-Net over the baselines on the precisely annotated dataset. The raw curves and non-pooled data are provided in Supplementary Figs. 15–20. Higher and lower performances of MICRA-Net are reported in blue and red, respectively. MICRA-Net is only outperformed by U-Net trained using ALT-2 or fully supervised (FS) training. Right: comparison of the pooled AUC for all cell lines for the conducted User-Study using precise annotations and bounding boxes (BBOX). The precision of the segmentation masks generated with MICRA-Net is similar to the precise annotations and better than the bounding boxes obtained in the User-Study. Asterisks are used to highlight a significant change (Supplementary Tables 10 and 11). Scale bars, 25 μm.

For the semantic instance segmentation task, MICRA-Net was trained to predict both the presence of a cell and the contact between cells, the latter being subtracted from the former (Methods and Extended Data Fig. 3). We quantified the results by extracting the normalized area under the curve of the detection F1 score as a function of the IOU of associated objects27 (AUC; Methods). Figure 4d (left) reports the variation in AUC of MICRA-Net relative to the baselines trained with various levels of supervision when pooling data from all cell lines (Methods, Supplementary Figs. 12–20 and Supplementary Tables 9–16). As shown in Fig. 4d and Supplementary Fig. 20, the performance of the baselines that were developed for fully supervised datasets is affected when reducing the supervision level (Supplementary Figs. 12–14). MICRA-Net achieves similar performance to the baselines trained with weak labels (ALT-2 and ALT-5 datasets). On the ALT-10 dataset, MICRA-Net achieves superior performance to all considered baselines, which is also reflected when using the standard metrics (DET and SEG) from the CTC (Supplementary Tables 17 and 18). Notably, we observed that training MICRA-Net only on phase contrast (PhC) images in an instance segmentation task significantly improved performance (Supplementary Table 19). We also measured a similar or significantly increased testing AUC when using MICRA-Net compared to the precise contour annotations and bounding boxes that were obtained from the User-Study (Fig. 4d, right and Supplementary Figs. 2 and 3). Therefore, when no precisely annotated and proofed dataset is available, or when the manual annotation error may be high owing to the difficulty of the task, baseline architectures cannot be guaranteed to achieve superior semantic instance segmentation performance (Supplementary Fig. 20 and Supplementary Tables 12 and 13). Although the CTC is a precisely annotated and curated dataset that can be readily used for fully supervised approaches, using image-level binary labels for training would reduce the annotation time by more than ~157 h if the training set of the CTC needed to be generated from scratch (Figs. 1f and 4b).

Multi-device analysis

Although DL approaches can be very powerful when tackling tasks on very similar images, challenges are often encountered when the imaging conditions change over time (for example, due to a new device)36,37. We investigated how MICRA-Net could be fine-tuned on a dataset that contains similar structures but was acquired on a different device. We chose a brightfield microscopy dataset of Giemsa-stained Plasmodium vivax (malaria)-infected human blood smears (Fig. 5a), for which the training and testing datasets had very distinct intensity distributions (Fig. 5a,b)33,38. In the context of parasite detection and stage determination for malaria, the most important task, which we address in this publication, is the detection of infected smears; some related works also perform phenotype classification33,38,39,40.

Fig. 5: Detection of Giemsa-stained red blood cells from two different datasets of brightfield microscopy images from ref. 38.

a, Representative images from the training (two left) and testing (two right) datasets. The training dataset is composed of images taken from two different laboratories, and the testing images were acquired in a third laboratory. b, A change in the brightness and contrast is observed between the training and testing datasets. This results in a large difference in the mean pixel intensities (training, blue line; testing, orange line; with s.d. represented by the pale region) of the training and testing images. c, Left: a precision–recall graph quantifying the detection performance of MICRA-Net on the testing dataset. Without fine-tuning, the performance on the testing dataset (Naïve, grey ellipse) is characterized by a recall of 0.79 and a poor precision of 0.32. A variable number of images ({12, 24, 36}) from the testing dataset were used to adjust the detection threshold (Threshold, blue ellipse), which increased the precision but also reduced the recall by approximately twofold. Fine-tuning of the model on the sampled {12, 24, 36} images from the testing set with different settings: (1) allowing the linear layer (orange) and (2) different depths (depth 4, green; depths 3, 4, red; all, purple) to be updated (Extended Data Fig. 4 and Supplementary Note 4) resulted in a precision–recall above human agreement. Right: magnified region of the precision–recall performance of MICRA-Net. When the number of trainable parameters increases, the number of images required for a model with good generalization properties also increases. d, Detection efficiency (F1 score) of the various trained fine-tuned models. As a general tendency, increasing the number of images sampled from the testing set and allowing more layers to be updated resulted in better detection of infected red blood cells. The best detection accuracy of all trained models is highlighted in bold. See Supplementary Table 21 for the calculated P values.

We predicted the presence of infected smears in a 256 × 256-pixel image. A mean classification accuracy of 80(10)% (mean ± s.d., five different instances) was obtained. A classification accuracy over 87% was obtained when updating the threshold of the linear layer and over 88% for fine-tuned models trained on {12, 24, 36} sampled images from the original test set acquired on another device (Supplementary Note 4, Extended Data Fig. 3 and Supplementary Table 20).

With fine-tuning of at least the linear layer and depth 4 of MICRA-Net, the F1 score was significantly increased, surpassing the inter-expert agreement (0.64, averaged from infected cells reported in ref. 38). Additionally, increasing the number of images sampled from the testing set can significantly increase the detection accuracy (0.82(1)), as obtained by updating either Linear + 3, 4 or All layers (Supplementary Table 21).

We compared the segmentation results of MICRA-Net with expert precise annotations. Owing to the lack of a precisely annotated dataset in the original publication33, we asked an expert to manually segment all infected smears from the test set (303 smears), which were then proofed by a second expert in microscopy annotation processes. In contrast to the results obtained for the detection accuracy, updating more layers while fine-tuning (Linear + 3, 4 {12, 24, 36} and All {12, 24}) significantly reduced the IOU compared to only updating the linear layer (Supplementary Fig. 21 and Supplementary Table 22). Hence, a trade-off should be made by the users according to their specific needs. For instance, with these P. vivax datasets, the best trade-off to maximize both detection and segmentation efficiency requires the fine-tuning of at least the linear layer and depth 4.

Expert detection and segmentation assistance

The next step was to assess how MICRA-Net could be implemented as a tool to guide experts in the annotation of sparse and small structures in large images of an electron microscopy (EM) dataset. Our approach was tested on a dataset of scanning electron microscopy (SEM) images of ultrathin mouse brain sections in which axons were genetically labelled with a small engineered peroxidase APEX241 (axon DAB (3,3′-diaminobenzidine); Methods). In the SEM dataset, 1–10 small axonal regions (average size of 113 × 113 pixels) needed to be identified in images of around 10,000 × 10,000 pixels (Fig. 6a). Applied to this dataset, MICRA-Net was used to suggest regions containing the axon DAB marker.

Fig. 6: MICRA-Net is used as a tool to assist experts in the detection of sparse axon DAB markers in large SEM images of ultrathin mouse brain sections.

a, Schematic representation of the proposed approach. MICRA-Net is first swept over the entire field of view with a 75% overlap in both directions to output the probability of the presence of an axonal DAB marker. The probabilities of overlapping crops are then averaged to generate a probabilistic map of positions. The plausible positive regions are then viewed by the expert, who can accept or decline them. For each accepted region, the segmentation maps of the axon DAB generated by MICRA-Net can be inspected by the expert. b, The total percentage of annotated area is colour-coded as a function of the PU ratio for different recall values. Using MICRA-Net trained with a PU ratio of 1:5 as an assisting tool results in the validation of ~3% of an image, which would require less than 15 min for an expert to validate the complete testing set (44 images) and result in a recall of 0.9. The annotated area as a function of the recall for each PU ratio is shown in Supplementary Fig. 22. c, Total number of detections from the testing dataset with and without assistance from MICRA-Net. Using MICRA-Net, the expert could identify 57 new axon DAB positive regions, corresponding to an increase of 25% in the total number of detections. Scale bars: 5 μm, full field of view; 1 μm, extracted crops.

An expert identified axon DAB positive regions on the training (158 images) and testing (44 images) sets (Methods). Thereafter, all positive and negative crops (1,024 × 1,024 pixel, 5.12 × 5.12 μm2) were extracted from the training images without overlap to train MICRA-Net (Fig. 6a and Supplementary Note 5).

In the context of very sparse detections, positive-unlabelled (PU) learning can improve the performance of a given architecture42. On the classification task, an accuracy between 83% and 90% was obtained for all PU ratios between 2:1 and 1:16 (Fig. 6b and Supplementary Tables 23 and 24). With MICRA-Net, this sparse detection task resulted in an increase of the measured recall above the inter-expert agreement (0.791, Supplementary Fig. 22), while requiring an expert to proof only 3.13% of a newly acquired image. The detection recall obtained by MICRA-Net (>0.90) is similar to other methods applied in EM to detect small structures such as synapses (0.8843 and 0.8544) or extracellular vesicles (0.75, average over three datasets45). Using MICRA-Net, the area that would be required by an expert to inspect new images would be reduced by 30-fold. Additionally, MICRA-Net allowed the expert to detect 57 new axon DAB regions in the test set (representing 25% more detections) that had been missed during the initial image annotation process (Fig. 6c). In comparison, Ilastik trained with circle annotation of the axon DAB structures achieved a classification accuracy of 8%, requiring expert inspection of nearly all newly acquired images (Supplementary Fig. 23). MICRA-Net could also be used to generate coarse segmentation masks of the detected axon DAB regions (Supplementary Fig. 24a and Supplementary Table 25). Application of MICRA-Net to this EM annotation task was thus successful in reducing the burden of generating the training dataset, while also notably increasing the discovery of regions that were missed by the manual expert annotation.

Discussion

Although pixel-wise metrics and ground-truth annotations are well established in the field of DL and computer vision with natural images, retrieval of ground-truth annotations in biomedical imaging is a laborious process, requires highly trained experts, and annotation imprecision often occurs3,46 (Fig. 1). This highlights the need for weakly supervised DL approaches that do not rely on spatially precise annotations of the structure, but rather on annotations that are easier and faster to retrieve. MICRA-Net, a CNN-based method, relies on the information embedded in the latent space of a classification task to solve complementary tasks without the need to generate task-specific precisely annotated training sets. This is done by combining high- and low-level features of the hidden layers with grad-CAMs to generate detection or segmentation maps. We designed multiple experiments to challenge MICRA-Net at solving common tasks (semantic segmentation, detection or localization) on different modalities (STED, fluorescence, phase contrast, brightfield and EM) relevant to high-throughput microscopy image analysis3,8. Unlike multi-task learning47, MICRA-Net does not combine auxiliary tasks to increase the learning performance of a main task, nor does it require more annotations from the dataset for each task48,49. Hence, MICRA-Net should substantially reduce the burden of task-specific annotation of bioimaging datasets for which precise annotations are challenging to retrieve, thereby increasing the accessibility of such DL-based microscopy image analysis.

Although fully supervised DL approaches have the drawback of being costly to train, they can benefit from pre-training8,50,51, given that the image space is similar52, and have access to precise information about the structure boundaries. On the other hand, MICRA-Net extracts spatial features from the hidden layers of the network to generate detailed feature maps using solely easy-to-retrieve, binary image-level annotations for training. Considering the observed reduction of the inter-expert variability when diminishing the complexity of the annotations (Fig. 1), this will be an important aspect for future DL applications leveraging crowd-sourced annotations for training3,53.

Additionally, we have demonstrated that MICRA-Net could be fine-tuned, for example, when images are acquired on two different microscopes. This is of particular interest for large-scale studies, conducted on multiple sites, that require the analysis framework to be easily adaptable to new experimental conditions24,33,54. Future work on fine-tuning of such approaches to new structures of interest and analysis tasks will be an important step to increase their accessibility to a larger community of researchers.

Finally, MICRA-Net was used to assist an expert in the complex annotation task of detecting small sparse objects in EM images. Originally, this task was prone to identification errors and fatigue, limiting the performance of the experts and increasing inter-expert variability. When using the detections from MICRA-Net, experts could focus their attention on less than 5% of the image. Not only was the annotation time decreased by 30-fold, but with MICRA-Net the number of detected objects was also increased by 25%.

MICRA-Net provides similar performance on multiple tasks to the established weakly and fully supervised learning approaches, thus making it an unprecedented alternative to address bioimaging analysis challenges for which large and precisely annotated datasets are not available.

Methods

MICRA-Net

Architecture

Figure 2a presents a schematic representation of the MICRA-Net architecture. MICRA-Net is based on the encoder part of a U-Net31. The rationale is that U-Net is an established method for solving different analysis tasks (for example, segmentation, localization and detection) on biomedical datasets. Each depth of the network contains two blocks of convolutions (kernel size of 3) followed by batch normalization and ReLU activation. The number of filters in the convolutional layers is doubled after maxpooling (stride and kernel size of 2) to increase the richness of the representation. The number of filters for each depth is {32, 64, 128, 256}. Global maxpooling on the output layer allows a reduction of the dimensionality, and a fully connected layer (FCL) is used to provide a classification prediction. Dropout (probability of 0.5) is applied to the input features of the FCL.
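For concreteness, the following is a minimal PyTorch sketch of the encoder described above. Padding, the exact placement of the pooling operations and the input channel count are not specified in the text and are therefore assumptions; the released implementation may differ.

```python
import torch
import torch.nn as nn

class MicraNetSketch(nn.Module):
    """Sketch of the MICRA-Net encoder: four depths of two (Conv2D -> BatchNorm2D ->
    ReLU) blocks with {32, 64, 128, 256} filters, 2 x 2 max pooling between depths,
    global max pooling, dropout and a linear classifier."""

    def __init__(self, in_channels=1, num_classes=1):
        super().__init__()

        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.Conv2d(c_out, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
            )

        self.depths = nn.ModuleList([
            block(in_channels, 32),
            block(32, 64),
            block(64, 128),
            block(128, 256),
        ])
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.dropout = nn.Dropout(p=0.5)
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, x):
        for i, depth in enumerate(self.depths):
            x = depth(x)
            if i < len(self.depths) - 1:   # maxpool between depths
                x = self.pool(x)
        x = torch.amax(x, dim=(2, 3))      # global max pooling over H, W
        return self.classifier(self.dropout(x))
```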

At inference, MICRA-Net predicts a whole-image target from a given sample. Then, for each activated class c, a local map \(L^{l}\) is calculated from the weighted combination of the activation map \(A^{l,k}\) and the mean gradient \({\alpha }_{l,k}^{c}\) of each layer l (ref. 28). The mean gradient \({\alpha }_{l,k}^{c}\) is calculated from the backpropagated class activation \(y^{c}\) as

$${{\alpha}^{c}_{l, k}} = {\frac{1}{Z}}{ \sum\limits_i}{\sum\limits_j} \underbrace{\frac{\partial {y^c}}{\partial {{A}^{l, k}_{i, j}}}}_{{\text{gradients via backprop}}}.$$
(1)

The local map \(L^{l}\) is calculated as the linear combination of the activation map and the mean gradient of each layer of convolutions in the network

$${{L}^{l}}={\mathop{\sum}\limits_{k}}{{\alpha }_{l,k}^{c}}{{A}^{l,k}}.$$
(2)

Because MICRA-Net produces spatially reduced feature maps, the local maps are upsampled using nearest-neighbour interpolation to match the input image size of 256 × 256 pixels. These maps are then normalized in the range [0, 1] using min–max scaling. ReLU activation is applied on the last layer (L8) of the network, as in the seminal implementation of grad-CAM28, to be used for coarse segmentation. Local maps from layers L1–7 (Fig. 2a–c) are concatenated into a feature space, and the first principal component of every pixel is retrieved using PCA55 decomposition to retain the prominent information of the feature space. The network was built and trained with the PyTorch library56.
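A minimal sketch of this computation is given below: it evaluates equations (1) and (2) for each requested layer using forward hooks and automatic differentiation, then reduces the seven local maps to a single detailed feature map with PCA. The way the layers are exposed (a name-to-module dictionary) and the numerical details are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F
from sklearn.decomposition import PCA

def gradcam_local_map(activation, gradient, out_size):
    """L^l = sum_k alpha^c_{l,k} A^{l,k} (equations (1)-(2)), upsampled with
    nearest-neighbour interpolation and min-max scaled to [0, 1]."""
    alpha = gradient.mean(dim=(2, 3), keepdim=True)         # eq. (1): mean over i, j
    local = (alpha * activation).sum(dim=1, keepdim=True)   # eq. (2)
    local = F.interpolate(local, size=out_size, mode="nearest")
    local = local - local.min()
    return local / (local.max() + 1e-8)

def detailed_feature_map(model, image, layers, class_index):
    """Concatenate the L1-7 local maps of one activated class and keep the first
    principal component of every pixel. `layers` maps names to the seven modules
    whose outputs are the activation maps A^{l,k}; image is a 1 x C x H x W tensor."""
    acts = {}
    handles = [m.register_forward_hook(
        lambda mod, inp, out, n=name: acts.update({n: out}))
        for name, m in layers.items()]
    logits = model(image)
    grads = torch.autograd.grad(logits[0, class_index], list(acts.values()))
    for h in handles:
        h.remove()

    maps = [gradcam_local_map(a, g, image.shape[-2:])
            for a, g in zip(acts.values(), grads)]
    stack = torch.cat(maps, dim=1)[0].detach().cpu().numpy()   # 7 x H x W feature space
    pixels = stack.reshape(len(maps), -1).T                     # (H*W) x 7
    pc1 = PCA(n_components=1).fit_transform(pixels)
    return pc1.reshape(image.shape[-2], image.shape[-1])        # detailed feature map
```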

To facilitate the analysis of new images using MICRA-Net, a graphical user interface (GUI) is provided to qualitatively analyse the influence of each local map (Extended Data Fig. 1). Although the implementation of MICRA-Net uses layers L1–7 with a PCA decomposition of the resultant feature space, the GUI allows the user to arbitrarily combine different local maps of the MICRA-Net architecture and threshold the resultant detailed feature map.

Training procedure

The general training procedure for the MICRA-Net architecture is reported in this section. MICRA-Net was trained using the Adam optimizer with a learning rate specific to each dataset and otherwise default parameters57. A learning rate scheduler was used to reduce the learning rate of the optimizer, with a minimal possible learning rate of 1 × 10−5. The number of training epochs was adapted to the specific dataset (Supplementary Tables 26–30). Early stopping was used to reduce overfitting. Unless otherwise specified, we used the binary cross-entropy with logits loss. We kept the model with the best generalization properties on the validation set (calculated from the objective loss function). The training of MICRA-Net was repeated five times with different random seeds.
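The corresponding training setup can be sketched as follows; the scheduler type (ReduceLROnPlateau), its factor and patience are assumptions, whereas the minimal learning rate of 1 × 10−5 and the loss follow the text.

```python
import torch

def build_training(model, learning_rate):
    """Optimizer, scheduler and loss as described above (scheduler hyperparameters
    are assumed; only min_lr = 1e-5 is taken from the text)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5, patience=10, min_lr=1e-5)
    criterion = torch.nn.BCEWithLogitsLoss()
    return optimizer, scheduler, criterion

# After each validation pass: scheduler.step(val_loss); checkpoint the weights
# whenever val_loss improves (model with the best generalization properties) and
# stop early if it has not improved for a dataset-specific number of epochs.
```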

Data augmentation was used to increase the performance of the network. Supplementary Tables 26–30 provide the detailed data augmentation procedure for each dataset. All operations were applied in a random order, each with a probability of 50%.
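The random-order application of the augmentation operations can be sketched as below; the operation names in the usage example are placeholders for the dataset-specific operations listed in Supplementary Tables 26–30.

```python
import random

def augment(image, operations, p=0.5, rng=random):
    """Apply the augmentation operations in a random order, each with
    probability p (0.5 in the text)."""
    ops = list(operations)
    rng.shuffle(ops)
    for op in ops:
        if rng.random() < p:
            image = op(image)
    return image

# e.g. augment(crop, [horizontal_flip, vertical_flip, rotate_90, intensity_scale])
```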

Specific tasks

This section presents how MICRA-Net can be used to solve common tasks in microscopy images analysis.

Segmentation (and semantic segmentation) is solved by first extracting a detailed semantic feature map. The segmentation masks are obtained by thresholding the resultant feature map using a common thresholding algorithm (for example, Otsu or percentile thresholding). The dataset-specific thresholding is detailed in Supplementary Notes 1–5.
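As a sketch, and assuming the detailed feature map is available as a NumPy array, the thresholding step can be expressed as follows; the choice between Otsu and a percentile, and the percentile value, are dataset-specific and shown here only for illustration.

```python
import numpy as np
from skimage.filters import threshold_otsu

def segmentation_mask(feature_map, method="otsu", percentile=75):
    """Binarize the detailed feature map with a common thresholding algorithm."""
    if method == "otsu":
        threshold = threshold_otsu(feature_map)
    else:
        threshold = np.percentile(feature_map, percentile)
    return feature_map >= threshold
```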

The detection task on the P. vivax and EM microscopy datasets is solved by predicting the probability of the presence of an object on all extracted crops. The overlap between the crops is 75% in both directions. Overlapping crops are averaged and reassigned to an output feature map of the same shape as the image. The detection threshold is inferred from the validation set using a precision–recall curve.
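A possible implementation of this sweep is sketched below for a single-channel image; the wrapper `model_probability`, which returns the classifier's probability for one crop, is a placeholder, and the margins not covered by a full crop are ignored for brevity.

```python
import numpy as np

def detection_map(model_probability, image, crop=256, overlap=0.75):
    """Sweep the classifier over the image with 75% overlap in both directions,
    average the probabilities of overlapping crops and return a per-pixel map."""
    stride = int(crop * (1 - overlap))
    h, w = image.shape[:2]
    prob_sum = np.zeros((h, w), dtype=np.float32)
    counts = np.zeros((h, w), dtype=np.float32)
    for y in range(0, max(h - crop, 0) + 1, stride):
        for x in range(0, max(w - crop, 0) + 1, stride):
            p = model_probability(image[y:y + crop, x:x + crop])
            prob_sum[y:y + crop, x:x + crop] += p
            counts[y:y + crop, x:x + crop] += 1
    return prob_sum / np.maximum(counts, 1)

# The detection threshold applied to this map is chosen from the validation
# precision-recall curve.
```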

Semantic instance segmentation is required for the CTC dataset. MICRA-Net is trained to predict (1) the presence of an object and (2) the contact between objects. The grad-CAMs of the activated classes are extracted from the architecture and combined using a PCA. If a contact is predicted on an image, the grad-CAM from L8, which contains the prominent information about the contact, is extracted. The contact feature map is subtracted from the object feature map, as in some fully supervised techniques27. An Otsu threshold is used to generate the semantic segmentation masks of the instances.
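Assuming the object and contact feature maps are available as arrays, the post-processing can be sketched as follows; the minimum object size and hole filling follow the evaluation procedure described later in the Methods.

```python
import numpy as np
from scipy.ndimage import binary_fill_holes
from skimage.filters import threshold_otsu
from skimage.measure import label
from skimage.morphology import remove_small_objects

def instances_from_maps(object_map, contact_map=None, min_size=20 * 20):
    """Subtract the cell-contact feature map from the cell feature map, threshold
    with Otsu, clean up small objects and holes, and label connected components."""
    combined = object_map - (contact_map if contact_map is not None else 0.0)
    mask = combined >= threshold_otsu(combined)
    mask = binary_fill_holes(remove_small_objects(mask, min_size=min_size))
    return label(mask)
```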

Datasets

The Modified MNIST dataset

We generated the modified MNIST training dataset by randomly sampling N digits from the original MNIST training dataset and randomly distributing them on a 256 × 256-pixel field of view. To avoid overlap between digits we used a random Poisson disc sampling algorithm with a radius size of 25 pixels58. The number of digits N was uniformly sampled from {1, 2, 3, 4, 5, 10, 15, 20, Max}, where Max corresponds to the maximum number of digits that can be placed without overlap. A rotation of ±30°, uniformly sampled, was applied to the digits before placement on the image. We applied, in a random order, a Gaussian blur with sigma uniformly sampled in [0, 2] and artificial normalized Poisson noise with \({\lambda }={\frac{\sqrt{255}}{2}}\) to mimic slight variations akin to those that may be observed in microscopy images. The resulting image intensities were clipped to lie in [0, 1]. Using this technique, we generated 2,000 and 1,000 images for training and validation, respectively.

The modified MNIST testing dataset consists of 1,000 images of handwritten digits sampled from the original MNIST testing dataset. As for the training dataset, we also applied, in random order, Gaussian blur and artificial normalized Poisson noise, sampled as before.

The F-actin dataset

The F-actin dataset was generated by using a sliding window (256 × 256 pixels) with a stride of 192 pixels over 260 complete images with an approximate size of 1,000 × 1,000 pixels. Because the super-resolution microscopy images used are mostly composed of background, we kept only the crops containing at least 10% dendritic area, thereby reducing the number of crops to identify. The dendritic mask was obtained from the foreground detection on the confocal image of the dendritic marker MAP2 using a global Otsu thresholding on the normalized Gaussian blurred image2,59. The sigma parameter of the Gaussian blur was set to 20 pixels as it provided suitable dendrite detection over a wide range of images. We next annotated each generated crop as being positive to the presence of the F-actin periodical lattice or longitudinal fibres. The resulting training dataset contained 3,832 crops (256 × 256 pixels, 897 images positive to the periodical lattice and 1,456 positive to the longitudinal fibres), the validation dataset contained 1,287 crops (405 positive to the periodical lattice and 377 positive to fibres) and the testing dataset contained 416 crops (83 positive to the periodical lattice and 132 positive to fibres). The images \(x\) were rescaled to \(x^{\prime}\), lying in the [0, 1] interval. The maximum value for scaling (max) was obtained by sampling the maximal value of each training image and calculating the median plus 3 s.d. of these values. The minimum value (min) was calculated as the median of the minima. To ensure a proper scaling of the images we also added a scaling factor of 0.8:

$${x^{\prime}} ={\frac{{x}-{\min }}{0.8({\max }-{\min })}}.$$
(3)

To evaluate the segmentation performance of the trained models, an expert precisely highlighted the contours of the structures in 50 images (25 images positive to the periodical lattice and 25 images positive to fibres) randomly sampled from the testing set. The contours of the structures were proofed by a second expert user. This small segmentation dataset only served to compare the segmentation performances from MICRA-Net, weakly supervised baselines (U-Net, Mask R-CNN, Ilastik) and User-Study.

The CTC dataset

We selected six cell-line datasets from the CTC7: the DIC-C2DH-HeLa dataset, which was acquired using differential interferometry contrast microscopy, three non-synthetic fluorescence microscopy datasets (Fluo-C2DL-MSC, Fluo-N2DH-GOWT1 and Fluo-N2DL-HeLa) and two phase contrast microscopy datasets (PhC-C2DH-U373 and PhC-C2DL-PSC). All original images were rescaled in the [0, 1] range using a per image min–max scale. We then resized each image and associated precise annotations according to the specific needs using bi-linear interpolation and nearest neighbours, respectively, with the Scikit-Image60 Python library (Supplementary Table 7 provides the scaling factors). We used a sliding window of size 128 × 128 pixels or 256 × 256 pixels with a 25% overlap between crops in both directions. Using this sliding window technique yielded a total of 27,106 positive crops and 3,364 negative crops for the 256 × 256-pixel crops resized to have an effective pixel size of 0.5 μm. The sliding window with size 128 × 128-pixel crops and resized to have single cells in the field of view yielded a total of 66,466 positive crops (20,724 positive to contact) and 88,722 negative crops for training and 17,621 positive crops (5,606 positive to contact) and 22,279 negative crops for validation. We simulated weak annotations from the precise contours of the cells provided in the original CTC dataset by identifying an image crop as positive if the corresponding annotated crop contained at least the size of the average annotated cell, and negative otherwise. To evaluate the segmentation and detection tasks, we manually segmented four images randomly sampled per cell line in the testing set. The images were annotated by one person and verified by a second person. Both were experts in optical microscopy and very well trained in image annotation processes.

The P. vivax dataset

We used image set BBBC041v1, available from the Broad Bioimage Benchmark Collection33. The complete dataset contained 1,327 three-channel images and was already split into training (1,207 images) and testing (120 images) sets. The dataset is composed of blood smears that were stained with Giemsa reagent and acquired on three different brightfield microscopes from three different laboratories. All blood smears (infected or uninfected) were annotated using bounding boxes. The blood smears were later classified as infected (gametocytes, rings, trophozoites and schizonts) or uninfected (red blood cells and leukocytes) by an expert. The task was to differentiate infected from uninfected blood smears. The dataset is highly unbalanced towards red blood cells, which compose over 95% of the annotated cells.

For training and testing we applied a whitening normalization (null mean and s.d. of 1) to each image (and channel) to minimize the impact of a very different intensity distribution. The binary targets for training were generated using the provided bounding boxes. A crop was considered positive if it contained at least 5% of overlap with an infected cell (otherwise negative). The crops were 256 × 256 pixels.

We manually extracted and precisely annotated all infected cells in the testing set, resulting in 303 small crops of size 256 × 256 pixels centred on the cell of interest. Each cell was annotated by a person and verified by a second person. Both were experts in optical microscopy and very well trained in image annotation processes.

The SEM dataset

The SEM dataset contained 92 images of 10,240 × 10,240 pixels for training, 66 for validation and 44 for testing. An expert annotated the images using positional markers to locate the axon DAB markers. On average, the large fields of view contained three small detections (113 × 113 pixels, between one and ten detections per image). This resulted in an annotation time of ~30 min per field of view. Training and inference were performed on 512 × 512-pixel crops. The dataset contained all positive crops (1,024 × 1,024 pixels, centred on the axon DAB markers) and all negative crops (without overlap). To manually annotate the images, the expert inverted the acquired images. Hence, we provided MICRA-Net with the inverted image to mimic the expert task. We rescaled the provided 8-bit depth images in the [0, 1] range by dividing by a scalar value of 255.

All axon DAB markers were extracted from the testing set (170 positive markers) and an expert carefully identified their contours.

Evaluation procedure

Classification

The classification accuracy of MICRA-Net was evaluated by performing inference on the testing images. To quantitatively assess performance, the classification accuracy was calculated for each trained model. We reported the mean ± s.d. of the trained models.

Detection

The centroid of each detected object was obtained from MICRA-Net by using the dataset-specific procedures detailed in Supplementary Notes 1–5. Each detected centroid was associated with the centroid of objects in the ground-truth mask using the Hungarian algorithm61 with a maximal distance of N pixels, where N is approximately the object radius. In this context, an associated detected object is considered as a true positive, a non-associated detected object is a false positive and a missed ground-truth object is a false negative. To evaluate the detection capability of MICRA-Net, we reported the F1 score. For a quantitative comparison, we repeated the evaluation for each trained model. We then bootstrapped the average of the trained models to show the bootstrapped mean and 95% confidence interval (CI) (10,000 repetitions).
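A minimal sketch of this association step, using SciPy's Hungarian solver, is given below; object centroids are assumed to be available as lists of (row, column) coordinates.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def detection_f1(pred_centroids, gt_centroids, max_distance):
    """Associate predicted and ground-truth centroids with the Hungarian algorithm,
    count pairs closer than the maximal association distance N as true positives
    and return the F1 score."""
    tp = 0
    if len(pred_centroids) and len(gt_centroids):
        cost = cdist(np.asarray(pred_centroids), np.asarray(gt_centroids))
        rows, cols = linear_sum_assignment(cost)
        tp = int(np.sum(cost[rows, cols] <= max_distance))
    fp = len(pred_centroids) - tp
    fn = len(gt_centroids) - tp
    return 2 * tp / max(2 * tp + fp + fn, 1)
```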

Segmentation

The segmentation performance of the trained models was evaluated using three common evaluation metrics: F1 score, IOU and SBD62. If multiple instances of a model were trained on the same task, we bootstrapped the average of the trained models to show the bootstrapped mean and 95% confidence interval (10,000 repetitions).
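The bootstrapping used throughout can be sketched as follows, assuming one aggregate score per trained model instance:

```python
import numpy as np

def bootstrap_mean_ci(model_scores, repetitions=10_000, rng=np.random.default_rng()):
    """Bootstrapped mean and 95% confidence interval over the average scores of the
    trained model instances (10,000 repetitions)."""
    scores = np.asarray(model_scores)
    means = np.array([rng.choice(scores, size=scores.size, replace=True).mean()
                      for _ in range(repetitions)])
    low, high = np.percentile(means, [2.5, 97.5])
    return scores.mean(), (low, high)
```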

Instance segmentation

Before evaluation we removed small objects (<20 × 20 pixels) from the segmentation masks and filled holes for all trained models. All segmentation masks were resized to the baseline scale (Supplementary Table 7) for proper comparison. The instance segmentation performance was evaluated using the method proposed in ref. 27 (Supplementary Figs. 15–18). Briefly, this method evaluates the detections and failures of the architecture as a function of the IOU. Reference 27 used a minimal IOU of 0.5 to avoid multiple predicted objects being associated with a ground-truth object. The goal is to maximize the F1 score versus IOU, while the failure modes should be minimized. We, on the other hand, solved the association between the ground-truth and predicted objects using the Hungarian algorithm61, allowing us to report the performance and failure modes across the entire range of IOU. Using a broader range of IOU allows the performance in both instance detection and segmentation to be reported. The normalized area under the resultant curves for each trained model is bootstrapped to obtain the mean and 95% confidence interval (10,000 repetitions) and is reported in Fig. 4.
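Given the IOU of every associated object pair (from the Hungarian matching described above), the F1-versus-IOU curve and its normalized AUC can be sketched as follows; the threshold grid is an assumption.

```python
import numpy as np

def normalized_auc(associated_ious, n_pred, n_gt, thresholds=np.linspace(0.0, 1.0, 101)):
    """F1 score of Hungarian-associated objects as a function of the IOU threshold,
    summarized by the normalized area under the curve."""
    ious = np.asarray(associated_ious)
    f1 = []
    for t in thresholds:
        tp = int(np.sum(ious >= t))
        fp, fn = n_pred - tp, n_gt - tp
        f1.append(2 * tp / max(2 * tp + fp + fn, 1))
    return np.trapz(f1, thresholds) / (thresholds[-1] - thresholds[0])
```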

Custom performance metrics

The F-actin periodical lattice is detected as an oscillating pattern between high- and low-intensity stripes with 180–190-nm periodicity32. We designed a metric that takes this periodicity into account to evaluate the MICRA-Net detailed segmentation performance. We computed, as a baseline, the Fourier transform (FT) of the original image (\({\rm{FT}}_{\rm{b}}\)) and the FT of the segmented regions for the expert (\({\rm{FT}}_{\rm{e}}\)) and for the predicted segmentation masks (\({\rm{FT}}_{\rm{pred}}\)). The variation from the baseline was computed as the difference in the FT spectrum, for spatial frequencies in the range [170, 200] nm, between \({\rm{FT}}_{\rm{e,pred}}\) and \({\rm{FT}}_{\rm{b}}\) over the sum of \({\rm{FT}}_{\rm{b}}\). A smaller absolute difference between the variation of the expert and the variation of the predicted mask implies more similar segmentation.
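The sketch below illustrates one way to compute such a band-limited spectral variation. It is a simplified, one-dimensional (row-wise) stand-in, since the windowing and directionality of the original implementation are not specified; the 20 nm pixel size follows the STED acquisition parameters.

```python
import numpy as np

def periodicity_variation(image, mask, pixel_size_nm=20.0, band_nm=(170.0, 200.0)):
    """Relative change, within the 170-200 nm spatial-frequency band, of the Fourier
    spectrum of the segmented region with respect to the whole image (simplified)."""
    def band_spectrum(img):
        spectrum = np.abs(np.fft.rfft(img, axis=1)).mean(axis=0)    # row-wise FT
        freqs = np.fft.rfftfreq(img.shape[1], d=pixel_size_nm)      # cycles / nm
        in_band = (freqs >= 1.0 / band_nm[1]) & (freqs <= 1.0 / band_nm[0])
        return spectrum[in_band]
    ft_baseline = band_spectrum(image)
    ft_segmented = band_spectrum(np.where(mask, image, 0.0))
    return np.sum(ft_segmented - ft_baseline) / np.sum(ft_baseline)
```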

Given that F-actin fibres are contiguous and have a high intensity on the dendrites, we designed a metric that uses the distribution of pixel intensities under a segmented mask. The rationale behind this metric is that the F-actin nanostructures on dendrites are composed of both high- and low-intensity pixels. Because F-actin fibres have high intensities, a detailed segmentation of fibres would include few annotated low-intensity pixels, while a coarse segmentation would include more low-intensity pixels. Hence, we considered a pixel within the segmentation mask as part of a fibre if its value was greater than a given threshold. We calculated this threshold by first measuring the 25th percentile of pixel intensities outside of the expert mask for all images. We then extracted the 90th percentile of these values from all images containing F-actin fibres. This resulted in a threshold between high- and low-intensity pixels within the dendritic mask of 9.
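The sketch below follows this description; our reading of which percentile is taken over which set of values is an interpretation, so the helper names and details are illustrative only.

```python
import numpy as np

def low_intensity_threshold(images, expert_masks):
    """Threshold between high- and low-intensity pixels: the 90th percentile of the
    per-image 25th percentiles of intensities outside the expert masks (a value of
    9 for the F-actin fibre dataset)."""
    background_q25 = [np.percentile(img[~mask], 25)
                      for img, mask in zip(images, expert_masks)]
    return np.percentile(background_q25, 90)

def low_intensity_pixel_difference(image, expert_mask, predicted_mask, threshold):
    """Difference in the number of low-intensity pixels included in the predicted
    mask versus the expert mask; large values indicate coarse fibre segmentation."""
    return int(np.sum(image[predicted_mask] < threshold)
               - np.sum(image[expert_mask] < threshold))
```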

User-Study

We conducted two different User-Studies in this Article, one for the F-actin nanostructure segmentation and one for instance segmentation on the CTC. All participants were trained microscopists who were familiar with biomedical images.

F-actin segmentation

We performed a User-Study in which six participants highlighted the contours of the F-actin periodical lattice and longitudinal fibres on a small dataset of 50 images using polygonal bounding boxes. We used polygonal bounding boxes as this annotation method reduces the time required by a participant by more than threefold compared to precisely identifying the boundaries of the structures (Supplementary Fig. 9). We used our own annotation application that was optimized for this type of task. Annotation of the full dataset required ~40 min for the participants. The averaged performance of the six participants was compared to MICRA-Net using F1 score, IOU and SBD.

CTC instance segmentation

A User-Study was conducted on the CTC to analyse the required time per cell and the achievable inter-participant annotation performance for such a task. The User-Study consisted of annotating the 24 testing images with different levels of supervision (precise, bounding boxes and points). For each level of supervision, the participants were asked to annotate a quarter of the testing image, which was the same for all participants. The image intensity scale was fixed at a constant value for all participants. The participants used the Fiji software to annotate the images. The medians of the participant scores on the testing set are reported, as well as the inter-participant scores. The time required by each participant to annotate each image was recorded, which allowed us to calculate the time per cell for each cell line. Representative examples of the participant annotations are provided in Supplementary Figs. 1–3 and the specific curves per cell line in Supplementary Fig. 4. The images annotated for the User-Study were not used in the testing phase of MICRA-Net and the other baselines.

In-house dataset acquisition

Sample preparation and STED imaging of F-actin

Before dissection of hippocampi, neonatal Sprague–Dawley rats were euthanized by decapitation, in accordance with procedures approved by the Animal Care Committee of Université Laval. Dissociated cells were plated on poly-d-lysine-coated glass coverslips, fixed and immunostained as described previously2. F-actin was stained with Phalloidin-STAR635 (Abberior) and dendrites with microtubule-associated-protein (MAP2)2. STED images of the F-actin nanostructures were acquired on a four-colour Abberior Expert-Line STED microscope (Abberior Instruments), equipped with a ×100 1.4 NA oil objective and using pulsed (40 MHz) excitation (640 nm) and depletion (775 nm) lasers. Fluorescence was detected with an avalanche photodiode (APD) and an ET685/70 (Chroma) fluorescence filter. Pixel size was set to 20 nm.

Animals and stereotaxic injections for the SEM dataset

This study was carried out on three-month-old mice, weighing 25–35 g. Animals were housed under a 12-h light–dark cycle with water and food ad libitum. All procedures were approved by the Comité de Protection des Animaux de l’Université Laval, in accordance with the Canadian Council on Animal Care’s Guide to the Care and Use of Experimental Animals (Ed2), and with the ARRIVE guidelines. Maximum efforts were made to minimize the number of animals used. Transgenic e-Pet Cre mice expressing Cre recombinase under the control of the Fev promoter, known to be specific for serotonin (5-HT) neurons63, were injected in the dorsal raphe nucleus (DRN) with 1 μl of AAV9-CAG-DIO-APEX2NES-WPRE. Stereotaxic injections were performed using a 30° angle along the frontal plane at anteroposterior, −4.78; mediolateral, +2.00; dorsoventral, −3.20. In these injected transgenic mice, the small engineered peroxidase APEX2 (ref. 41) is specifically expressed in the cytosol/cytoplasm of 5-HT-infected neurons of the DRN and is used, in the presence of hydrogen peroxide, to oxidize the DAB chromogen, which is readily visible at the light and electron microscope levels.

Tissue preparation for the SEM dataset

After a period of 21 days following stereotaxic injection, the mice were anaesthetized with a mixture of ketamine (100 mg kg−1) and xylazine (10 mg kg−1) and transcardially perfused with 50 ml of phosphate-buffered saline (PBS; 50 mM at pH 7.4) followed by 150 ml of 4% paraformaldehyde (PFA) and 1% glutaraldehyde diluted in phosphate buffer (PB; 100 mM at pH 7.4). Brains were dissected out, post-fixed for 24 h in the same fixative solution and cut with a vibratome (model VT1200; Leica) into 50-μm-thick frontal sections, which were serially collected in sodium PBS (100 mM, pH 7.4). Frontal brain sections at the level of the subthalamic nucleus (STN) were processed to reveal the presence of APEX2 in axons arising from DRN-infected neurons using DAB (cat. no. D5637, Sigma-Aldrich) as the chromogen. Briefly, selected 50-μm-thick sections were washed three times in PBS and then twice in Tris. Sections were then incubated for 1 h in 0.05% DAB solution diluted in Tris, then for 1 h in 0.05% DAB solution containing 0.015% hydrogen peroxide. The sections were then rinsed twice in Tris and three times in PBS, then temporarily mounted in PBS and coverslipped for light microscope examination. STN sections containing DAB-labelled axons were selected for further processing. These sections were washed three times in phosphate buffer (PB), then incubated for 1 h in 2% osmium tetroxide diluted in 1.5% potassium ferrocyanide solution. They were then washed three times in ddH2O, incubated for 20 min in 1% thiocarbohydrazide (TCH) solution and washed again three times in ddH2O. The sections were placed for 30 min in 2% osmium tetroxide and washed three times in ddH2O, then dehydrated in ethanol and propylene oxide and flat-embedded in Durcupan (Electron Microscopy Science). Areas of interest were cut from embedded sections and glued to the tip of resin blocks. The blocks were cut with an ultramicrotome (Leica EM UC7) into ultrathin sections (80 nm), which were serially collected on a silicon-coated 10 × 10-mm chip wafer (Ted Pella; #16006).

SEM

Serial sections were imaged in a SEM (Zeiss Gemini 540) using the ATLAS acquisition software. Images were acquired at a resolution of 5 nm per pixel, using an acceleration voltage of 1.4 kV and a current of 1.2 nA. Serial section acquisition produced a stack of 38 rectangular images of 25,370 × 25,633 pixels (126.850 × 128.165 μm) from the 38 ultrathin sections. In addition, a single large-section acquisition produced a trapezoidal image of 31,065 pixels for the small base (155.329 μm), 91,393 pixels for the large base (456.967 μm) and 53,161 pixels for the height (265.809 μm). All acquired images were subdivided into overlapping square tiles of 10,240 × 10,240 pixels (51.2 × 51.2 μm).
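For illustration only, tiling such large acquisitions into overlapping squares can be sketched as follows; the 1,024-pixel overlap is an assumed value (the overlap used here is not specified above), and border tiles are clipped to the image boundary:

```python
import numpy as np

def tile_image(image, tile=10240, overlap=1024):
    """Subdivide a large acquisition into overlapping square tiles.

    The tile size matches the 10,240 x 10,240 pixel tiles described above; the
    overlap is an illustrative value.
    """
    step = tile - overlap
    tiles = []
    for y in range(0, max(image.shape[0] - overlap, 1), step):
        for x in range(0, max(image.shape[1] - overlap, 1), step):
            tiles.append(image[y:y + tile, x:x + tile])
    return tiles
```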

Statistical assessment using resampling

Resampling was used as a statistical test to assess the difference between two groups64. Statistical analysis was performed using a randomization test, with the null hypothesis being that the two conditions (A, B) belong to the same distribution. The absolute difference between the mean values of A and B was calculated (Dgt = |μA − μB|). For the randomization test, each value belonging to A and B was randomly reassigned to A′ and B′, with the sizes of A′ and B′ being NA and NB, respectively. The absolute difference between the mean values of A′ and B′ was determined (Drand = |μA′ − μB′|) and the randomization test was repeated 10,000 times. The obtained distribution of Drand was compared with the absolute difference of the means of A and B (Dgt) to test the null hypothesis.
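A minimal sketch of this two-group randomization test is given below (illustrative function name; NumPy assumed):

```python
import numpy as np

def randomization_test(a, b, n_repeats=10000, rng=None):
    """Two-group randomization test on the absolute difference of means.

    Returns the P value, lower-bounded by 1/n_repeats as described in the text.
    """
    rng = np.random.default_rng() if rng is None else rng
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    d_gt = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_repeats):
        shuffled = rng.permutation(pooled)
        a_rand, b_rand = shuffled[:len(a)], shuffled[len(a):]
        count += abs(a_rand.mean() - b_rand.mean()) >= d_gt
    return max(count / n_repeats, 1.0 / n_repeats)
```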

When the number of groups was greater than two, an analogous resampling procedure based on the F-statistic was used. The F-statistic was calculated from all groups (A, B, C and so on) as a ground truth (Fgt). Each value was then randomly reassigned to new groups (A′, B′, C′ and so on), where group X′ had the same size as group X. The F-statistic of the newly formed groups (Frand) was calculated, and this process was repeated 10,000 times. We compared the distribution of Frand with Fgt to test the null hypothesis that all groups have the same mean. When the null hypothesis was rejected, that is, when at least one group had a different mean, we compared the groups in a one-to-one manner using the randomization test described above. In all cases, a confidence level of 0.05 was used to reject the null hypothesis. Because the precision of the calculated P value is limited to 1/N, where N is the number of repetitions, we report a P value of <1.0000 × 10−4 instead of 0.
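The multi-group version can be sketched in the same way, here using SciPy's one-way F-statistic (a sketch under the same assumptions, not the exact implementation):

```python
import numpy as np
from scipy.stats import f_oneway

def f_randomization_test(groups, n_repeats=10000, rng=None):
    """Multi-group randomization test on the one-way F-statistic."""
    rng = np.random.default_rng() if rng is None else rng
    groups = [np.asarray(g, dtype=float) for g in groups]
    f_gt = f_oneway(*groups).statistic
    pooled = np.concatenate(groups)
    sizes = [len(g) for g in groups]
    count = 0
    for _ in range(n_repeats):
        shuffled = rng.permutation(pooled)
        splits = np.split(shuffled, np.cumsum(sizes)[:-1])
        count += f_oneway(*splits).statistic >= f_gt
    return max(count / n_repeats, 1.0 / n_repeats)
```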

Evaluation of the time required for fully supervised training

For F-actin, the mean annotation time per crop (50) was calculated using the precisely annotated dataset. For CTC, the mean image annotation time of 900 s was obtained from the precise annotation of each image of the testing set. For the P. vivax dataset, the annotation time for fully supervised annotations was estimated at 2 min per image from the precise annotation of ten images. For EM, the required annotation time was calculated as the average time required by the expert per image (30 min per image, 156 images) to detect all axon DAB markers. We added 14 s (calculated from highlighting the contours of the axon DAB regions on the testing set) for each positive detection (537 detections) to account for precise annotation.
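As a back-of-the-envelope check of the EM estimate above, the numbers quoted work out to roughly 80 h of expert time:

```python
# Estimated fully supervised EM annotation time (illustrative calculation)
detection_time_s = 156 * 30 * 60   # 156 images at 30 min per image
contour_time_s = 537 * 14          # 14 s for each of the 537 positive detections
print((detection_time_s + contour_time_s) / 3600)  # ~80.1 h
```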

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this Article.