Introduction

Breast-conserving surgery (lumpectomy) for breast carcinoma has been shown to be equivalent to mastectomy in terms of survival, with the benefit of improved cosmesis [1]. Achieving negative margins (no carcinoma at the margins) is important, as positive margins are associated with higher rates of local recurrence [2, 3]. Therefore, accurate microscopic analysis of margins by the pathologist is critical for determining the need for additional surgery and adjuvant radiation.

Various methods are used for orienting and evaluating the margins of lumpectomy specimens [4,5,6]. The “cavity-shave” method, the standard technique used at our institution, involves the surgeon excising additional segments of tissue from the wall of the lumpectomy cavity after the main specimen containing the carcinoma has been removed. This method, which allows intraoperative designation of the margin by the surgeon, has been associated with lower rates of positive margins and fewer re-excisions [7,8,9]. Evaluation of cavity shave margins is the preferred method for many pathologists because it does not require assessment of multiple differentially inked margins of the same specimen. However, microscopic review of these margin specimens can be time-consuming, as it requires examination of multiple additional hematoxylin and eosin (H&E) slides that most often show benign findings.

The development of machine learning approaches, particularly deep learning approaches [10] using convolutional neural networks, along with the rapidly advancing technology of digitizing histologic slides, could potentially improve the efficiency of the pathologist’s daily workload. The latter could conceivably be accomplished in a setting in which all histologic slides are scanned prior to review by the pathologist; in analogy to automated Papanicolaou test screening, a machine learning model could screen WSIs of margin specimens before manual review, resulting in increased productivity. We used a deep convolutional neural network, the deep multi-magnification network (DMMN) [11], which utilizes morphologic features from both high and low magnifications for more accurate segmentation, and studied its potential utility as a screening tool for evaluating WSIs from breast margin specimens.

Materials and methods

Case selection

After institutional review board approval, lumpectomy specimens containing invasive ductal carcinoma, no special type (IDC), and/or ductal carcinoma in situ (DCIS) were identified for which all slides in the case had previously been digitally scanned. Invasive lobular carcinoma cases and other special types of breast carcinoma were not studied. Neoadjuvant-treated cases were excluded. We selected cases of IDC and/or DCIS of various histologic grades and included cases with IDC or DCIS in the shave margin slides as well as benign cases. All digital WSIs were reviewed by a breast pathologist to confirm the ground truth diagnosis for each WSI.

Digital slide scanning and slide viewer

Glass slides were scanned using Leica Aperio AT2 (Leica Biosystems, Buffalo Grove, Illinois, USA) whole slide scanners at ×40 (0.25 μm/pixel) equivalent magnification, using a 20×/0.75 NA Plan Apo objective and a ×2 automatic optical magnification changer. WSIs are accessed through the PICSPlus module in CoPathPlus and launched into the MSK Slide Viewer, an internally developed, scanner- and vendor-agnostic WSI viewer [12]. The MSK Slide Viewer allows standard whole-image viewing, slide label viewing, zooming, panning, and slide navigation. The viewer has tools for manual annotation, including a ruler for measurements, tracking of viewed slide regions, screenshots, and comments [12].

Tissue segmentation and classification

To segment carcinoma from benign tissue, we used a convolutional neural network called DMMN, with multi-encoder, multi-decoder, and multi-concatenation [11] (Fig. 1). The DMMN looks at 256 × 256-pixel patches at 20×, 10×, and 5× magnification, centered at the same coordinate, to extract both nuclear/textural features from high magnification and architectural patterns from low magnification. Our DMMN model can accurately segment carcinomas by utilizing features from multiple magnifications with a wider field of view. As our initial model, we employed a segmentation model pretrained on a triple-negative breast cancer (TNBC) dataset [11]. We used deep interactive learning (DIaL) [13] to reduce the manual labeling required to finetune the model. Specifically, we segmented carcinoma in breast margin WSIs using the initial model and manually labeled (or corrected) any false-positive or false-negative regions to update the model. Corrected patches were repeated twice to emphasize them during finetuning.
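The geometry of the multi-magnification input can be illustrated with the coordinate arithmetic below. The 256 × 256 patch size and the 20×/10×/5× levels follow the description above, but the function itself is a hypothetical sketch, not the authors' implementation: at a fixed patch size, each lower magnification covers a proportionally wider region around the same center.

```python
def patch_regions(cx, cy, patch_size=256, mags=(20, 10, 5), base_mag=20):
    """Return, for each magnification, the bounding box (x0, y0, x1, y1) in
    base-magnification (20x) pixel coordinates covered by a patch_size x
    patch_size patch centered at (cx, cy).

    A patch at magnification m spans patch_size * (base_mag / m) base-mag
    pixels per side, so lower magnifications see a wider field of view.
    """
    regions = {}
    for m in mags:
        half = patch_size * base_mag // (2 * m)  # half the side length
        regions[m] = (cx - half, cy - half, cx + half, cy + half)
    return regions
```

For a patch centered at (1000, 1000), the 20× patch covers a 256-pixel square, the 10× patch a 512-pixel square, and the 5× patch a 1024-pixel square, all sharing the same center.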

Fig. 1: Deep multi-magnification network (DMMN).

Whole slide image (WSI) from a breast margin specimen with DCIS. The DMMN looks at a set of patches from multiple magnifications of the WSI, allowing a wider field of view. The segmentation prediction image shows carcinoma highlighted in red, while the remaining tissue is highlighted in yellow.

Once finetuning is complete, the model classifies WSIs as positive or negative for carcinoma. If the number of pixels segmented as carcinoma in a WSI is greater than a set threshold, the image is classified as positive. The model does not distinguish between invasive carcinoma and DCIS. All segmentation prediction images were reviewed to confirm that the segmented pixels represented carcinoma in the WSI. In the segmentation predictions, pixels classified as carcinoma are highlighted in red, while the remaining tissue is highlighted in yellow.
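The WSI-level decision rule reduces to counting carcinoma pixels in the prediction mask and comparing the count against the threshold. A minimal sketch, in which the mask representation and label encoding are our assumptions rather than the actual implementation:

```python
def classify_wsi(pred_mask, threshold, carcinoma_label="red"):
    """Classify a WSI as positive if the number of pixels segmented as
    carcinoma exceeds the threshold; pred_mask is a 2D grid of labels."""
    count = sum(row.count(carcinoma_label) for row in pred_mask)
    return "positive" if count > threshold else "negative"
```

With this rule, the threshold directly trades sensitivity against specificity: lowering it flags smaller segmented foci as positive at the cost of more false-positive WSIs.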

Sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve [14] were calculated to assess the performance of the machine learning model in segmenting carcinoma in margin samples.
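These metrics can be computed directly from the WSI-level labels. The sketch below is illustrative (not the authors' evaluation code) and uses the Mann-Whitney formulation of the AUC: the probability that a randomly chosen positive WSI receives a higher model score than a randomly chosen negative one.

```python
def sensitivity_specificity(labels, preds):
    """labels/preds per WSI: 1 = carcinoma, 0 = benign."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(labels, scores):
    """Area under the ROC curve: fraction of positive/negative pairs in
    which the positive WSI gets the higher score (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Sweeping the pixel classification threshold and recomputing sensitivity and specificity at each value traces out the ROC curve.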

Results

Cases

The validation set included 408 WSIs (mean, 4.3 WSIs/specimen; range, 1–10) from 98 separate shave margin specimens from 20 patients (Table 1). These included 348 (85.3%) benign WSIs and 60 (14.7%) WSIs containing carcinoma. Forty WSIs contained DCIS (six low-grade, 22 intermediate-grade, 12 high-grade) and 20 WSIs contained IDC (six well-differentiated, 12 moderately differentiated, and two poorly differentiated), including 15 WSIs containing both IDC and DCIS.

Table 1 Summary of margin diagnoses in whole slide images analyzed in the validation and testing data sets.

The testing set, which was independent of the validation set, included 427 WSIs (mean, 4.1 WSIs/specimen; range, 1–12) from 104 shave margin specimens from 20 patients (Table 1). These included 374 (87.6%) benign WSIs and 53 (12.4%) WSIs with carcinoma. Forty-one WSIs contained DCIS (five low-grade, 18 intermediate-grade, 18 high-grade) and 12 WSIs contained IDC (three well-differentiated, four moderately differentiated, five poorly differentiated), including four WSIs with both IDC and DCIS.

DMMN segmentation predictions

Validation set

The initial model applied to the validation set of WSIs achieved an area under the curve (AUC) of 0.941. A review of the WSIs incorrectly classified as carcinoma revealed that the false-positive classifications were due to a variety of reactive changes, including biopsy site changes/fat necrosis and stromal elastosis. Other causes of false-positive classification included fibrocystic changes such as florid usual ductal hyperplasia, apocrine metaplasia, and cysts/cyst contents. Through further annotation, we corrected those false-positive regions on 22 WSIs. After finetuning the model with DIaL, we observed fewer false positives in our segmentation predictions, with the AUC improving to 0.968 (Fig. 2A). We selected a pixel classification threshold of 105 pixels to achieve a sensitivity of 100% (57/57), at which threshold the specificity was 78%. With this pixel classification threshold, cancer regions larger than a circle with a radius of 90 µm would be detected in WSIs.
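The relation between the 105-pixel area threshold and the 90 µm detectable radius implies that the segmentation masks use a coarser effective pixel size than the 0.25 µm/pixel scan resolution. The arithmetic below checks the resolution implied by the two stated numbers; the resulting figure is our inference, not a value given in the text.

```python
import math

AREA_THRESHOLD_PX = 105  # pixel classification threshold (validation set)
MIN_RADIUS_UM = 90       # stated radius of the smallest detectable circle

# Radius, in mask pixels, of a circle whose area equals the threshold.
radius_px = math.sqrt(AREA_THRESHOLD_PX / math.pi)   # ~5.8 pixels

# Effective mask resolution implied by the stated 90 um radius.
um_per_pixel = MIN_RADIUS_UM / radius_px             # ~15.6 um/pixel
```

An effective resolution of roughly 15.6 µm/pixel would correspond to a substantial downsampling of the 0.25 µm/pixel scan for the prediction masks.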

Fig. 2: Receiver-operating characteristic (ROC) curves for the machine learning model’s classification of margin specimens.

ROC curves for the validation data set (A) and testing data set (B). AUC area under the ROC curve.

Testing set

We applied both our initial model and the updated model to the testing set of WSIs. Figure 2B shows that the updated model improved the AUC to 0.927, compared with 0.900 for the initial model. Examples of segmentation predictions for both DCIS and IDC are shown in Figs. 3–5. With the same pixel classification threshold of 105 pixels selected from the validation set, we achieved a sensitivity of 92% (47/51) and a specificity of 78% for detecting carcinoma in margin WSIs. We observed four false-negative WSIs in the testing data set. These included a 0.5 mm focus of intermediate-grade DCIS with cautery artifact (Fig. 6A), a 1 mm focus of cribriform low-grade DCIS (Fig. 6B), and two cases in which foci of well-differentiated invasive carcinoma measuring 3 mm and 1.5 mm were present in the margin specimens (Fig. 6C, D, respectively). Of the 374 benign WSIs, 84 (22%) were incorrectly classified as carcinoma. The false-positive WSIs resulted from artifacts such as electrocautery (Fig. 7A, B) and tissue folds (Fig. 7C, D), and from reactive and benign processes such as biopsy site changes (Fig. 7E, F) and apocrine metaplasia (Fig. 7G, H).

Fig. 3: Ductal carcinoma in situ (DCIS) in margin specimen with segmentation classifications.

A Margin specimen with DCIS (scale bar = 2 mm). B Segmentation classification showing DCIS correctly classified (red). Adjacent benign breast glandular tissue is not classified as carcinoma. C Higher power view of DCIS (scale bar = 0.5 mm). D Segmentation prediction. E DCIS involving adenosis (scale bar = 0.2 mm). F Segmentation of foci in DCIS in red.

Fig. 4: Examples of ductal carcinoma in situ (DCIS) in margin specimens with segmentation classifications.

A DCIS with pagetoid growth in a duct (left) and adjacent benign duct (right) (scale bar = 0.2 mm). B Segmentation prediction highlights DCIS in red. C High-grade DCIS in margin associated with reactive stroma and chronic inflammation (right) (scale bar = 0.5 mm). D Segmentation image showing correct classification of carcinoma.

Fig. 5: Invasive ductal carcinoma in margin specimens and segmentation predictions.

A, C Two examples of invasive carcinoma in margin specimens ((A): scale bar = 0.5 mm; (C): scale bar = 0.5 mm). B, D Segmentation predictions show most of the carcinoma was correctly segmented red.

Fig. 6: False negative classifications from the testing set of whole slide images.

A Intermediate-grade DCIS with cautery artifact (scale bar = 0.2 mm). B Low-grade cribriform DCIS (scale bar = 0.5 mm). C, D Two examples of well-differentiated invasive carcinoma ((C): scale bar = 0.2 mm; (D): scale bar = 0.2 mm).

Fig. 7: Examples of whole slide images from benign margin specimens that were falsely classified as positive by the machine learning model.

A, B Benign lobule with cautery artifact ((A): scale bar = 0.1 mm). C, D Fold in tissue led to incorrect classification with the model ((C): scale bar = 0.5 mm). E, F Biopsy site with reactive epithelial changes ((E): scale bar = 0.2 mm). G, H Apocrine metaplasia incorrectly segmented as carcinoma ((G): scale bar = 0.5 mm).

Discussion

We applied a deep convolutional neural network called DMMN to WSIs from breast lumpectomy shave margin specimens and showed that this model can accurately segment carcinoma from benign tissue in these specimens. By applying a previously developed segmentation model to a validation set of margin WSIs and then correcting incorrect segmentation predictions via manual annotation of additional WSIs, we finetuned the model to achieve high accuracy for detecting carcinoma, with an AUC of 0.927, a sensitivity of 92%, and a specificity of 78%.

A review of WSIs from the testing data set showed that the false-negative classifications resulted from the model classifying two small foci of DCIS and two foci of well-differentiated invasive carcinoma as benign. Although the performance of the model for identifying carcinoma was excellent in this study, correctly classifying 92% of carcinoma WSIs, this rate would be unacceptable in actual clinical practice, where the model would be used as a screening tool. The false-negative classifications were likely due in part to insufficient training of the model with low-grade carcinomas. Additional training of the model via manual annotation of low-grade DCIS and well-differentiated carcinoma is expected to improve its performance.

Convolutional neural networks have gained increasing popularity in computational pathology for carcinoma detection and characterization, and many studies have investigated their utility in breast cancer. The Cancer Metastases in Lymph Nodes Challenge 2016 (CAMELYON16) was an international competition in which research groups were asked to develop algorithms to detect metastatic carcinoma in sentinel lymph nodes from breast cancer patients [15]. Results from the challenge showed that some convolutional neural network algorithms achieved a better AUC for identifying lymph node metastases than pathologists working under a time constraint [15]. The follow-up CAMELYON17 challenge showed that convolutional neural network algorithms were also able to stratify metastases according to pN stage, although the submitted algorithms performed poorly at detecting isolated tumor cells [16]. Other studies specific to breast cancer have used convolutional neural networks for quantifying tumor extent [17], mitosis detection [18, 19], predicting grade and molecular subtype [20], and classifying tumor-associated versus benign stroma in core biopsies [21].

Ours is the first study to specifically assess breast shave margin WSIs with a machine learning model. The cavity shave method of margin assessment for breast carcinoma lumpectomy specimens is the standard method at our institution and many others. The surgeon removes the entire targeted malignancy in one specimen that is submitted to pathology without orientation. In addition, five to six separate margin specimens are submitted, taken from the wall of the lumpectomy cavity and oriented with a stitch indicating the true final margin. The area of the tissue containing the stitch is inked, and the specimen is sectioned perpendicular to the true final margin. The pathologist examines multiple histologic sections from these specimens. In a busy breast pathology service, microscopic analysis of these margin specimens can be time-consuming and tedious, and a model such as DMMN could facilitate this process. In such a scenario, the model would be applied to scanned WSIs of margin specimens prior to review by the pathologist. The model could categorize and triage each individual slide as positive or negative for carcinoma and direct the pathologist to regions containing carcinoma.

To mirror how pathologists examine microscopic slides, we used a convolutional neural network called DMMN in this analysis. This model is unique in that a set of patches from three different magnifications (5×, 10×, and 20×) is used as the input for training the model. Multiple magnifications, rather than a patch-based method at a single magnification [22], are employed, similar to how a pathologist uses multiple microscope objectives. To train the model, partial annotation was performed by a pathologist on a set of invasive breast carcinoma WSIs from primary resection specimens. The annotated tissue types were carcinoma (invasive and in situ), benign glandular breast tissue, fibrous stroma, adipose tissue, necrosis, and background (no tissue present). Partial annotation of one WSI takes ~30 min. Following training of the model, multi-class segmentation of breast margin WSIs can be performed using input patches from the different magnifications. For this study, we were only interested in segmenting carcinoma from all other tissue types, so we used a binary classification in which carcinoma regions were highlighted in red and all non-cancer types were combined and highlighted in yellow.

There are several limitations to our study. We limited our study to cases of IDC and DCIS and did not study invasive lobular carcinomas or other special histologic types of breast carcinoma in this initial study. Invasive lobular carcinomas are typically of low nuclear grade and infiltrate the stroma as single cells. Training of the model was achieved using a set of invasive and in situ ductal carcinomas of various histologic grades that did not include lobular carcinomas. Expansion of the DMMN algorithm to include lobular carcinomas, as well as other types, would be necessary to implement this model in practice. This would be true even for cases of IDC, as invasive lobular carcinoma may be identified in margin specimens as an incidental finding, and the model would need to identify these cases. In addition, our current model does not segment DCIS separately from invasive carcinoma but rather segments carcinoma (DCIS or IDC) from benign tissues. There are also some challenges associated with the evaluation of margin specimens using a machine learning model. Cautery artifact and biopsy site changes are common findings in margins and were sources of false-positive classifications in our analysis. More important, however, carcinoma in margin specimens usually presents as small foci, making it necessary to have a model sensitive enough to identify minimal carcinoma in these specimens. As we have shown in our validation set, a perfect sensitivity rate is feasible by adjusting the model’s pixel threshold. For a model that would be clinically applicable, the ultimate goal must be the elimination of false negatives. Furthermore, future efforts using this DMMN algorithm to quantify carcinoma in the margin, as well as to measure the distance to the true inked margin, will be undertaken.

In summary, we have shown that a machine learning model using patches from multiple magnifications can accurately differentiate carcinoma from benign tissue in WSIs of shave margin specimens from breast lumpectomies. Using this approach as a screening tool could save significant time for pathologists. The accuracy of the model compared with a pathologist in a true clinical setting, and whether the model improves the efficiency of manual assessment of these specimens, requires further study in a prospective setting.