Anti-cancer immunotherapy dramatically changes the clinical management of many types of tumours towards less harmful and more personalized treatment plans than conventional chemotherapy or radiation. Precise analysis of the spatial distribution of immune cells in the tumourous tissue is necessary to select patients that would best respond to the treatment. Here, we introduce a deep learning-based workflow for cell nuclei segmentation and subsequent immune cell identification in routine diagnostic images. We applied our workflow on a set of hematoxylin and eosin (H&E) stained breast cancer and colorectal cancer tissue images to detect tumour-infiltrating lymphocytes. Firstly, to segment all nuclei in the tissue, we applied the multiple-image input layer architecture (Micro-Net, Dice coefficient (DC)
Host-tumour immune conflict is a well-known process that occurs during tumourigenesis. It is now clear that tumours aim to escape host immune responses through a variety of biological mechanisms (Beatty and Gladney,
As opposed to individual nuclei detection, models proposed in Turkki
Such a high-level tissue segmentation approach has been widely used for cancer tissue segmentation tasks, such as stroma-epithelium tissue classification (Morkunas
The Hover-Net model published in Graham
Our study focuses on the customization of cell segmentation autoencoder architecture and aims to investigate a two-step cell segmentation and subsequent lymphocyte classification workflow using digital histology images of H&E stained tumour tissues. Robust separation of clumped cell nuclei is a common challenge in whole slide image analysis (Guo
The paper is organized as follows. In Section
One whole slide image (WSI) was obtained from The Cancer Genome Atlas database, tile ID: TCGA_AN_A0AM (Grossman
Two additional public datasets were used for classification testing purposes. The CRCHistoPhenotypes dataset (CRCHP) contains colorectal adenocarcinoma cell nuclei. Of these, 1143 nuclei are annotated as inflammatory (used as the lymphocyte category in our experiments) and 1040 as epithelial (used as the other cell type category) (Sirinukunwattana
Two datasets were used for segmentation and classification tasks. Segmentation experiments were performed on
Segmentation set | Tumour type | Raw set | Final augmented set | Origin |
Training | BC | 192 | 3648 | NCP |
Training | CRC | 82 | 1558 | NCP |
Training | total | 274 | 5206 | NCP |
Validation | BC | 54 | 54 | NCP |
Validation | CRC | 16 | 16 | NCP |
Validation | total | 70 | 70 | NCP |
Testing | BC | 96 | 96 | TCGA |
Testing | total | 96 | 96 | TCGA |
Classification set | Nucleus type | Raw set | Final augmented set | Origin |
Training | lymphocyte nuclei | 11032 | 50950 | NCP |
Training | other nuclei | 10922 | 55825 | NCP |
Training | total nuclei | 21954 | 106775 | NCP |
Validation | lymphocyte nuclei | 2588 | 2588 | NCP |
Validation | other nuclei | 2751 | 2751 | NCP |
Validation | total nuclei | 5339 | 5339 | NCP |
Testing I | BC lymphocytes | 903 | 903 | TCGA |
Testing I | CRC lymphocytes | 1143 | 1143 | CRCHP |
Testing I | total lymphocytes | 2046 | 2046 | |
Testing I | BC other | 1195 | 1195 | TCGA |
Testing I | CRC other | 1040 | 1040 | CRCHP |
Testing I | total other | 2235 | 2235 | |
Testing I | total nuclei | 4281 | 4281 | |
Testing II | BC lymphocytes | 2949 | 2949 | JAN |
Testing II | BC other | 1921 | 1921 | JAN |
Testing II | total nuclei | 4870 | 4870 | JAN |
Image augmentation techniques and parameters used for training dataset expansion.
Augmentation | Parameters |
Transposition, rotation, axis flipping | Perpendicular rotation angles |
CLAHE (Zuiderveld, | Cliplimit = 2.0, tilegridsize = (8, 8) |
Brightness adjustment | HSV colourspace, hue layer increased by 30 |
RGB augmentation | Random pixel value adjustments up to 0.1 |
RGB2HED colour adjustments (Ruifrok and Johnston, | Colour values adjusted within range |
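As an illustration of the geometric augmentations listed in the table (transposition, perpendicular rotations, axis flipping), the following is a minimal sketch in plain Python operating on an image stored as a list of rows. The function names are hypothetical and are not taken from the project's code; the colour augmentations (CLAHE, HSV, RGB and HED adjustments) are deliberately left out.

```python
def transpose(img):
    """Swap rows and columns of a 2-D image given as a list of rows."""
    return [list(col) for col in zip(*img)]


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rotate90(img, k=1):
    """Rotate the image clockwise by k * 90 degrees."""
    out = [row[:] for row in img]
    for _ in range(k % 4):
        # A clockwise quarter turn: reverse the row order, then transpose.
        out = transpose(out[::-1])
    return out


def geometric_variants(img):
    """Return the named geometric variants of one image (hypothetical helper)."""
    return {
        "transpose": transpose(img),
        "flip_horizontal": flip_horizontal(img),
        "flip_vertical": flip_vertical(img),
        "rot90": rotate90(img, 1),
        "rot180": rotate90(img, 2),
        "rot270": rotate90(img, 3),
    }


tile = [[1, 2],
        [3, 4]]
variants = geometric_variants(tile)
```

For a segmentation task, the same geometric transformation would normally be applied to the image and to its annotation mask so that the two stay aligned.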
The segmentation dataset is summarized in Table
Overall schema of the proposed workflow. On top, a training phase for both segmentation and classification models is shown. The segmentation network is trained on original image patches and manually annotated ground truth images. The classification model is trained on cropped nuclei to discriminate lymphocytes (in the red box) from other nuclei. In the middle, a testing phase is shown. The trained segmentation model accepts new images and produces segmentation masks (for clarity, the active contour layer in the resulting segmentation mask is not shown). Resulting segmentation masks are used to crop out detected cell nuclei that are fed into the classifier model and sorted into lymphocytes and non-lymphocyte nuclei. In the bottom panel, on the left, we have representative segmentation results (lymphocyte nuclei are coloured in red for clarity), and on the right, we have an original image with detected nuclei contours outlined and detected lymphocyte nuclei depicted with red dots. Green dots indicate lymphocyte ground truth.
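The cropping step of the workflow, in which the segmentation mask is used to cut out individual nuclei for the classifier, can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a binary mask in which nuclei are already separated, uses 4-connected component labelling, and crops a padded bounding box. The names `label_nuclei`, `crop_nuclei` and the `pad` parameter are hypothetical.

```python
from collections import deque


def label_nuclei(mask):
    """Return one list of (row, col) pixels per 4-connected foreground blob."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(pixels)
    return blobs


def crop_nuclei(image, mask, pad=1):
    """Crop a padded bounding box from the image around every blob in the mask."""
    h, w = len(mask), len(mask[0])
    crops = []
    for pixels in label_nuclei(mask):
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        top, bottom = max(min(ys) - pad, 0), min(max(ys) + pad, h - 1)
        left, right = max(min(xs) - pad, 0), min(max(xs) + pad, w - 1)
        crops.append([row[left:right + 1] for row in image[top:bottom + 1]])
    return crops


mask = [[0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 0],
        [0, 1, 1, 0, 0, 1],
        [0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0]]
image = [[r * 10 + c for c in range(6)] for r in range(5)]
crops = crop_nuclei(image, mask, pad=0)
```

Each crop would then be passed to the classifier, which sorts it into the lymphocyte or non-lymphocyte category.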
The overall schema of the proposed workflow is summarized in Fig.
The architecture of the proposed deep learning model.
The autoencoder architecture for nuclei segmentation is shown in Fig.
We used elu activation after each convolution layer and sigmoid activation for the output layer. Adam optimizer was used with initial learning rate
Performance metrics of the segmentation and classifier models. A: training and validation metrics of the segmentation autoencoder (top: Dice coefficient; bottom: loss values per epoch); B: confusion matrix depicting cell nuclei classifier performance on the testing set (true positive lymphocyte predictions and true negatives marked in grey, false predictions in red); C: ROC curve obtained from the nuclei classifier testing data.
The model converged after 36 epochs (see Fig.
A multilayer perceptron (MLP) was employed to solve the binary classification problem of lymphocyte identification. The model consists of three dense layers (number of nodes: 4096, 2048, 1024), with softmax as the output layer activation function. Each dense layer uses relu activation followed by batch normalization, except the middle layer, where a dropout layer (dropout rate 0.4) replaces batch normalization to reduce overfitting. We used the Adam optimizer with initial learning rate
Neural network models for nuclei segmentation and cell-type classification were trained on a GeForce GTX 1050 GPU with 16 GB RAM using the TensorFlow and Keras machine learning libraries (Abadi Link to GitHub repository of the project:
The optimal model architecture was determined experimentally using a hyperparameter grid search. To test segmentation robustness, we evaluated both pixel-level and object-level metrics. The Dice coefficient was used to track pixel-level segmentation performance, while object-level segmentation quality was evaluated by calculating intersection over union (IoU). We treated a predicted nucleus as a true positive if at least 50% of its area overlapped with a ground truth nucleus mask. To prevent multiple predicted objects from mapping to the same ground truth nucleus, each ground truth nucleus mask could be mapped to only a single predicted object. Results of hyperparameter tuning are provided in Table
Performance metrics of convolutional autoencoders (CAE) used for the hyperparameter grid search for nuclei segmentation. Dice coefficients (mean Dice coefficient ± standard deviation). Mean and standard deviation values were calculated from the individual Dice coefficients of each tile in the testing set. DO – dropout rate, BN – batch normalization.
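The object-level matching rule described above can be sketched as follows. This is an illustrative reading of the rule, not the authors' code: each nucleus is represented as a set of pixel coordinates, predictions are processed in list order, and each prediction is greedily assigned to the unused ground truth nucleus it overlaps most. The greedy order and the names `iou` and `match_nuclei` are assumptions.

```python
def iou(a, b):
    """Intersection over union of two nuclei given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_nuclei(predicted, ground_truth, min_overlap=0.5):
    """Greedy one-to-one matching of predicted nuclei to ground truth nuclei.

    A prediction counts as a true positive if at least `min_overlap` of its
    area lies inside a ground truth nucleus that has not been matched yet.
    Returns (true_positives, false_positives, false_negatives).
    """
    used = set()  # indices of ground truth nuclei that are already matched
    tp = 0
    for pred in predicted:
        if not pred:
            continue  # an empty prediction can never match; counted as FP below
        best_idx, best_frac = None, 0.0
        for idx, gt in enumerate(ground_truth):
            if idx in used:
                continue
            frac = len(pred & gt) / len(pred)
            if frac > best_frac:
                best_idx, best_frac = idx, frac
        if best_idx is not None and best_frac >= min_overlap:
            used.add(best_idx)
            tp += 1
    fp = len(predicted) - tp
    fn = len(ground_truth) - tp
    return tp, fp, fn


ground_truth = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(5, 5), (5, 6)}]
predicted = [
    {(0, 0), (0, 1), (1, 1)},  # fully inside the first ground truth nucleus
    {(0, 1), (1, 1)},          # overlaps the same nucleus, which is already taken
    {(9, 9)},                  # overlaps nothing
]
tp, fp, fn = match_nuclei(predicted, ground_truth)
```

In this toy case the second prediction is rejected because its only overlapping ground truth nucleus is already matched, which is the behaviour the one-to-one constraint is meant to enforce.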
Act func | Output act func | Kernel size | DO | BN | Dice coefficient | Accuracy | Precision | Recall | f-score |
U-Net | |||||||||
64 | 0.2 | − | |||||||
Micro-Net model | |||||||||
tanh | 64 | − | − | ||||||
Our model | |||||||||
16 | 0.2 | − | |||||||
32 | 0.2 | − | |||||||
48 | 0.2 | − | |||||||
16 | 0.3 | − | |||||||
32 | 0.3 | − | |||||||
48 | 0.3 | − | |||||||
32 | − | + | |||||||
32 | − | + | |||||||
32 | − | + | |||||||
32 | − | + |
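The per-tile summary described in the table caption (a Dice coefficient computed for each testing tile, then averaged) can be illustrated with the short sketch below. It assumes binary masks stored as lists of rows, treats two empty masks as a perfect match, and uses the sample standard deviation; none of these conventions is stated in the text, and the function names are hypothetical.

```python
from statistics import mean, stdev


def dice_coefficient(pred, truth):
    """Dice = 2|A∩B| / (|A| + |B|) for two binary masks given as lists of rows."""
    inter = sum(1 for pr, tr in zip(pred, truth) for p, t in zip(pr, tr) if p and t)
    total = sum(1 for row in pred for p in row if p) + sum(1 for row in truth for t in row if t)
    # Assumed convention: two empty masks agree perfectly.
    return 2 * inter / total if total else 1.0


def summarize_dice(pairs):
    """Mean and sample standard deviation of per-tile Dice coefficients."""
    scores = [dice_coefficient(pred, truth) for pred, truth in pairs]
    return mean(scores), stdev(scores)


tiles = [
    ([[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # one of two predicted pixels is correct
    ([[0, 1], [0, 1]], [[0, 1], [0, 1]]),  # perfect agreement
]
dice_mean, dice_std = summarize_dice(tiles)
```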
Rather than selecting the optimal model solely on the Dice coefficient and object-level testing metrics, we also evaluated the grid-search models on their loading and image prediction times relative to the original Micro-Net model. Since no significant differences were observed between dropout rates, we chose a custom model with a 0.2 dropout rate, elu activation function, and sigmoid output activation function, with differing layer widths of 16, 32, and 48 kernels. The testing results provided in Table
A comparison table of autoencoder parameter size and performance speed. Model loading and prediction times were obtained relative to the original Micro-Net model. The best performing model is highlighted in bold.
Model | Parameters | Relative loading time | Relative prediction time |
Micro-Net | 73 467 842 | 1 | 1 |
Custom-16 | 131 746 | 0.212 | 0.314 |
Custom-48 | 507 138 | 0.268 | 0.359 |
To evaluate the impact of the active contour layer on nuclei separation, we trained a convolutional autoencoder using single-layered nuclei masks and compared the results with an identical model trained on two-layered annotations. For this experiment, we used the best-scoring model architecture from the hyperparameter search. The model trained on masks supplemented with the active contour layer outperformed the model trained on single-layered masks on both pixel-level and object-level measurements, as shown in Table
The active contour layer effect on nuclei segmentation autoencoder performance. Pixel-level Dice coefficients (mean Dice coefficient ± standard deviation) were obtained from a testing set consisting of 96
Mask layers | Dice coefficient | Accuracy | Precision | Recall | f-score |
2-layered | |||||
1-layered |
The cell classification problem was approached with several different statistical models. Random forest was chosen as the baseline machine learning algorithm. We used the Python implementation of the random forest classifier from the sklearn machine learning library (Feurer
The hyperparameter grid search results for cell nuclei classifier (mean ± standard deviation). The model performance was evaluated on the testing set. Mean and standard deviation values were obtained by running each experiment 5 times.
Models | Accuracy | Precision | Recall | f-score |
Random forest | ||||
Multilayer perceptron | ||||
Convolutional neural network | ||||
Kernels per layer: 16 | ||||
Kernels per layer: 32 |
The confusion matrix for our cell classification model shows that, of 2046 labelled lymphocytes, 310 were misclassified as other cell types, while 13 false-positive observations were registered among the 2235 nuclei labelled as other cell types, as shown in Fig.
Of note, the proposed two-step lymphocyte detection model can potentially be adapted to detect more cell types by replacing the existing lymphocyte classifier with a model trained on several classes.
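The confusion-matrix counts quoted above can be converted into the usual binary classification metrics with a few lines of arithmetic. The sketch below takes lymphocytes as the positive class and uses only the four counts given in the text; the helper name `binary_metrics` is hypothetical. The resulting values describe this single confusion matrix and need not coincide with figures averaged over repeated runs.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and f-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Counts quoted in the text: 2046 labelled lymphocytes, of which 310 were
# misclassified; 2235 other nuclei, of which 13 were false positives.
lymphocytes, missed = 2046, 310
others, false_pos = 2235, 13

metrics = binary_metrics(
    tp=lymphocytes - missed,
    fn=missed,
    fp=false_pos,
    tn=others - false_pos,
)
```

The asymmetry between the two error counts means precision is much higher than recall for the lymphocyte class in this matrix.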
Exemplary 5 testing images from breast cancer lymphocyte dataset (Janowczyk and Madabhushi,
The proposed lymphocyte identification workflow has been tested on the lymphocyte dataset published by Janowczyk and Madabhushi ( Link to the dataset:
The first analysis results – nuclei segmentation – are shown in the second column of Fig.
The confusion matrix in Fig.
To address high staining variability between different histological samples, the lymphocyte testing dataset was normalized using the Reinhard stain normalization method. The Reinhard algorithm adjusts the source image’s colour distribution to that of the target image by matching the mean and standard deviation of the pixel values in each channel (Reinhard
Testing metrics for breast cancer lymphocyte dataset. A: confusion matrix for testing images with original sample staining; B: confusion matrix for testing images with Reinhard stain normalization applied on image stain.
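The core of the Reinhard normalization referred to above, matching the per-channel mean and standard deviation of a source image to those of a target image, can be sketched as below. This is a simplified illustration with hypothetical function names: it operates on flat lists of pixel tuples and omits the colour-space conversion (Reinhard's original method works in the lαβ colour space) as well as clipping back to a valid pixel range.

```python
from statistics import mean, pstdev


def match_channel(source, target):
    """Shift and scale source values so their mean and std match the target's."""
    s_mean, s_std = mean(source), pstdev(source)
    t_mean, t_std = mean(target), pstdev(target)
    # A constant source channel has no spread to rescale; only shift it.
    scale = t_std / s_std if s_std > 0 else 1.0
    return [(v - s_mean) * scale + t_mean for v in source]


def reinhard_like(source_pixels, target_pixels):
    """Apply match_channel independently to every channel.

    Pixels are tuples with one value per channel; both inputs are flat lists.
    """
    src_channels = list(zip(*source_pixels))
    tgt_channels = list(zip(*target_pixels))
    matched = [
        match_channel(list(s), list(t))
        for s, t in zip(src_channels, tgt_channels)
    ]
    return [tuple(px) for px in zip(*matched)]


source = [(10, 0, 5), (20, 0, 15), (30, 0, 25)]
target = [(100, 50, 1), (110, 60, 3), (120, 70, 5)]
normalized = reinhard_like(source, target)
```

After the transformation each channel of `normalized` has the target's mean, and every channel with non-zero spread in the source also has the target's standard deviation.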
The effect of stain normalization on lymphocyte detection was evaluated by comparing testing metrics before and after applying the Reinhard algorithm. The confusion matrix in Fig.
Comparison of the effect of stain normalization on lymphocyte identification performance. For reference, we also give the results of studies that used the same dataset. Note that we used this dataset only to test our method, whereas the studies referenced in the table also used part of it for training.
Method | Accuracy | Precision | Recall | f-score |
Proposed method, original staining | 0.71 | 0.76 | 0.75 | 0.70 |
Proposed method, with stain normalization | 0.81 | 0.80 | 0.81 | 0.80 |
Janowczyk and Madabhushi ( | – | 0.89 | – | 0.90 |
Alom | 0.90 | – | – | 0.91 |
Both Janowczyk and Madabhushi (
In this paper, we propose an end-to-end deep learning-based algorithm for cell nuclei segmentation and consecutive lymphocyte identification in H&E stained
The convolutional texture blocks of our proposed autoencoder achieve a nuclei segmentation Dice score similar to that of the Micro-Net model (our model achieved a 1% higher testing Dice coefficient).
An additional active contour layer in the nuclei annotation masks increases nuclei segmentation accuracy by 1.5%.
Lymphocyte classification by multilayer perceptron network achieves
The nuclei segmentation autoencoder architecture investigated in this paper has lower model complexity than the U-Net and Micro-Net models, which reduces computational resource usage. Our suggested pipeline shows good generalization properties, eliminates overfitting, and can be easily extended to multi-class nuclei identification by replacing the nuclei classification MLP model while reusing the same pre-trained segmentation autoencoder.
The authors are thankful for the HPC resources provided by the IT APC at the Faculty of Mathematics and Informatics of Vilnius University Information Technology Research Centre.