Deep Learning Model for Cell Nuclei Segmentation and Lymphocyte Identification in Whole Slide Histology Images

Anti-cancer immunotherapy dramatically changes the clinical management of many types of tumours towards less harmful and more personalized treatment plans than conventional chemotherapy or radiation. Precise analysis of the spatial distribution of immune cells in the tumourous tissue is necessary to select patients that would best respond to the treatment. Here, we introduce a deep learning-based workflow for cell nuclei segmentation and subsequent immune cell identification in routine diagnostic images. We applied our workflow to a set of hematoxylin and eosin (H&E) stained breast cancer and colorectal cancer tissue images to detect tumour-infiltrating lymphocytes. Firstly, to segment all nuclei in the tissue, we applied the multiple-image input layer architecture (Micro-Net, Dice coefficient (DC) 0.79 ± 0.02). We supplemented the Micro-Net with an introduced texture block to increase segmentation accuracy (DC = 0.80 ± 0.02). We preserved the shallow architecture of the segmentation network with only 280 K trainable parameters (for comparison, U-Net has ∼1900 K parameters and reached DC = 0.78 ± 0.03). Subsequently, we added an active contour layer to the ground truth images to further increase the performance (DC = 0.81 ± 0.02). Secondly, to discriminate lymphocytes from the set of all segmented nuclei, we explored a multilayer perceptron and achieved a 0.70 classification f-score. Remarkably, the binary classification of segmented nuclei was significantly improved (f-score = 0.80) by colour normalization. To inspect model generalization, we evaluated the trained models on a public dataset that was not used during training. We conclude that the proposed workflow achieved promising results and, with little effort, can be employed in multi-class nuclei segmentation and identification tasks.


Introduction
A host-tumour immune conflict is a well-known process occurring during tumourigenesis. It is now clear that tumours aim to escape host immune responses by a variety of biological mechanisms (Beatty and Gladney, 2015; Zappasodi et al., 2018; Allard et al., 2018). Thus, the importance of tumour-infiltrating lymphocytes (TILs) in pathology diagnosis, prognosis, and treatment is increasing. Quantification of the immune infiltrate along tumour margins in the tumour microenvironment has attracted researchers' attention as a reliable prognostic measure for various cancer types (Basavanhally et al., 2010; Galon et al., 2012; Huh et al., 2012; Rasmusson et al., 2020). With the emergence of whole slide imaging (WSI) and the recent Food and Drug Administration (FDA) approval of WSI for clinical practice, various techniques have been proposed to detect lymphocytes in digital pathology images, focusing on algorithms based on colour, texture, and shape feature extraction, morphological operations, region growing, and image classification.
Recent works. In general, prior studies were limited to lymphocyte detection and therefore relied on unsupervised approaches such as in Basavanhally et al. (2010), where lymphocytes were automatically detected by a combination of region growing and Markov random field algorithms. Before detection, applying tissue epithelium-stroma classification reduced the noise irrelevant for the lymphocyte nuclei detection by 18 texture features (Kuse et al., 2010).
As opposed to individual nuclei detection, the models proposed in Turkki et al. (2016) and Saltz et al. (2018) were trained to identify TIL-enriched areas rather than standalone lymphocytes. In the study by Saltz et al. (2018), the authors developed a convolutional neural network (CNN) classifier capable of identifying TIL-enriched areas in whole slide images from the TCGA (The Cancer Genome Atlas) database. Similarly, in Turkki et al. (2016), lymphocyte-rich areas were identified by training an SVM classifier on a set of features extracted by the VGG-F neural network from CD45 IHC-guided superpixel-level annotations in digitized H&E specimens.
Such a high-level tissue segmentation approach has been widely used for cancer tissue segmentation tasks, such as stroma-epithelium tissue classification (Morkunas et al., 2018). However, lymphocyte infiltration quantification accuracy would benefit from a more granular analysis using object segmentation models. Convolutional encoder-decoder based model architectures (convolutional autoencoders, CAEs) have been established as an efficient method for medical imaging tasks. The U-Net autoencoder model, proposed in Ronneberger et al. (2015), has become a gold standard model for medical applications ranging from cell nuclei segmentation to tissue analysis in computed tomography (CT) scans (Ma et al., 2019). The deep, semantic feature maps from the U-Net decoder are combined with shallow, low-level feature maps from the encoder part of the model via skip connections, thus maintaining the fine-grained features of the input image. This renders U-Net applicable in medical image segmentation, where precise detail recreation is of utmost importance. Specifically for lymphocyte detection, approaches utilizing fully convolutional neural networks on digital H&E slides were published by Chen and Srinivas (2016) and Linder et al. (2019). Both approaches investigate convolutional autoencoders using histology sample patches with annotated lymphocyte nuclei. Detection and classification, but not segmentation, of nuclei in H&E images were done using a spatially constrained CNN in Sirinukunwattana et al. (2016). Notably, the classification into four cell types (epithelial, inflammatory, fibroblast, and miscellaneous) was performed on patches centred on nuclei, considering their local neighbourhood. A more recent adaptation, the Micro-Net model (Raza et al., 2019), incorporates an additional input image downsampling layer that circumvents the max-pooling process, thus maintaining the input features ignored by the max-pooling layer. This way, more detailed contextual information is passed into the output layer, enabling better segmentation of adjacent cell nuclei.
The Hover-Net model published in Graham et al. (2019) enables simultaneous cell nuclei segmentation and classification by three dedicated branches of the model (segmenting, separating, and classifying). Hover-Net was applied to two datasets and achieved classification f-scores of 0.573 and 0.631. In Janowczyk and Madabhushi (2016), AlexNet was employed to identify centres of lymphocyte nuclei. The network was trained on cropped lymphocyte nuclei as the positive class, and the negative class was sampled from the regions most distant from the annotated ground truth. The trained network produces the posterior class membership probabilities for every pixel in the test image; subsequently, potential centres of lymphocyte nuclei are identified by disk kernel convolution and thresholding. In Alom et al. (2019), the same dataset was utilized to evaluate different advanced neural networks for a variety of digital pathology tasks, including lymphocyte detection. The authors proposed a densely connected recurrent convolution network (DCRCNN) to directly regress the density surface with peaks corresponding to lymphocyte centres. When compared to AlexNet, the DCRCNN improves the f-score by 1%, yet it is worth mentioning that neither Janowczyk and Madabhushi (2016) nor Alom et al. (2019) demonstrates method generalization: in the respective studies, the same dataset was used for training and testing.
Our study focuses on the customization of cell segmentation autoencoder architecture and aims to investigate a two-step cell segmentation and subsequent lymphocyte classification workflow using digital histology images of H&E stained tumour tissues. Robust separation of clumped cell nuclei is a common challenge in whole slide image analysis (Guo et al., 2018). To tackle this nuclei segmentation challenge, our cell nuclei segmentation model uses an additional active contour layer, which improves the segmentation of adjacent cell nuclei. Apart from overlapping nuclei, image magnification is another critical factor for nuclei segmentation models. Publicly available annotated nuclei datasets contain histological samples scanned at 40× magnification, preserving texture features and facilitating precise feature extraction. In pathology practice, however, samples scanned at 20× magnification are more common. Image analysis at a lower resolution is faster and less memory-exhaustive, yet precise cell nuclei segmentation becomes a more difficult task. As reported by Cui et al. (2019), the active contour layer improves adjacent nuclei separation; this has been observed in our experiments as well. We report that repeated re-injection of downsampled images into the model, an approach initially proposed for the Micro-Net model in Raza et al. (2019), has significantly boosted nuclei segmentation performance compared to the baseline U-Net model (Ronneberger et al., 2015). We further observe that our customized model architecture component, two parallel blocks of convolutional layers referred to as a texture block, increases segmentation quality compared to the original Micro-Net model and reduces model complexity to less than 280 000 parameters. For the lymphocyte classification task, we evaluated a Random Forest classifier, a multilayer perceptron, and a CNN. We have performed minimal hyperparameter tuning of classification models in a grid search procedure. We have used a private dataset to train our models, and a public dataset for final workflow evaluation, thus demonstrating the generalization of the proposed models.
The paper is organized as follows. In Section 2.1, we describe the datasets used in the study. In Section 2.2, we introduce the segmentation method based on autoencoder neural network architecture, followed by the classification of segmented nuclei. In Section 3, we present experimental results comparing different cell nuclei segmentation as well as lymphocyte discrimination approaches. In particular, Section 3.3 covers the evaluation of our method on the publicly available annotated data set of breast cancer H&E images. We formulate conclusions in Section 4.

The Datasets
Images. In our study, we used 4 whole-slide histology sample images prepared with H&E staining (2 WSI slides from breast cancer patients and 2 WSIs from colorectal cancer). These slides were produced in the National Center of Pathology, Lithuania (NCP), and digitized with the Aperio ScanScope XT Slide Scanner at 20× magnification.
One WSI was obtained from The Cancer Genome Atlas database (tile ID: TCGA_AN_A0AM; Grossman et al., 2016) and used for both segmentation and classification testing.
Two additional public datasets were used for classification testing purposes. The CRCHistoPhenotypes dataset (CRCHP) contains colorectal adenocarcinoma cell nuclei: 1143 nuclei are annotated as inflammatory (used as the lymphocyte category in our experiments), and 1040 are annotated as epithelial (used as the other cell type category) (Sirinukunwattana et al., 2016). The breast cancer dataset (JAN) published by Janowczyk and Madabhushi (2016) consists of 100 images (100 × 100 pixels) with annotated lymphocytes. Samples were stained with hematoxylin and eosin and digitized at 20× magnification. An expert pathologist annotated lymphocytes by marking lymphocyte nuclei centres. In contrast to the CRCHP dataset, this image corpus is more suitable for our tasks since the data was prepared specifically for lymphocyte identification. The CRCHP dataset uses broader cell type categories, in which lymphocytes are annotated under the inflammatory label together with other immune cells such as mast cells and macrophages.
Segmentation dataset. To train and validate the segmentation model, we randomly selected 344 tiles of 256 × 256 pixel size. The dataset was split into training and validation sets. To test the segmentation model, we prepared 96 tiles from the breast cancer TCGA slide. The tiles generated from the TCGA slide and from the NCP slides were manually annotated by EB and MM. In the annotation process, each cell nucleus present in an image patch was manually outlined, and 2 pixel-wide active contour borders surrounding each nucleus were added as a second layer to the nuclei segmentation masks. Each outlined nucleus was assigned a class label (lymphocyte or other). To the training set, we applied various image augmentation methods (rotation, flip, transpose, RGB augmentation, brightness adjustment, and CLAHE (Zuiderveld, 1994)) to obtain the final training set of 5206 images. The segmentation dataset is summarized in Table 1, and the techniques used to augment training patches are summarized in Table 2.
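As an illustration of the two-layer annotation masks described above, the following sketch derives a 2 pixel-wide border ring around each nucleus from a labelled mask. It is only a sketch under stated assumptions: the paper does not say how the borders were generated, so the 8-connected dilation, the placement of the ring outside the nucleus, and the names `dilate` and `two_layer_mask` are hypothetical choices.

```python
import numpy as np

def dilate(mask: np.ndarray) -> np.ndarray:
    """Grow a boolean mask by one pixel using 8-connectivity (assumed)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    # OR together the nine one-pixel shifts of the mask.
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def two_layer_mask(labels: np.ndarray, border_width: int = 2) -> np.ndarray:
    """Return an (H, W, 2) mask: channel 0 = nuclei, channel 1 = border ring.

    `labels` holds 0 for background and a distinct positive integer per nucleus.
    """
    nuclei = labels > 0
    contour = np.zeros_like(nuclei)
    for nucleus_id in np.unique(labels[labels > 0]):
        nucleus = labels == nucleus_id
        grown = nucleus
        for _ in range(border_width):
            grown = dilate(grown)
        # The ring is the grown region minus the nucleus itself.
        contour |= grown & ~nucleus
    return np.stack([nuclei, contour], axis=-1).astype(np.uint8)
```

Because each nucleus is processed separately, the ring of one nucleus may cover pixels of a touching neighbour, which is what marks the boundary between adjacent nuclei in this sketch.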
Classification dataset. To train and validate the classification models, we generated a dataset from the same image patches used to train the segmentation model. In particular, manually generated segmentation masks were used to crop out all types of cell nuclei from raw images. Each extracted nucleus was centred in a blank 32 × 32 pixel-sized patch. Each nucleus-containing patch inherited a class label (assigned manually to the ground truth).

Table 2 Image augmentation techniques and parameters used for training dataset expansion.
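The nucleus cropping step for the classification dataset can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `crop_nucleus`, the use of zeros as the "blank" background, and the centre-cropping of nuclei larger than the patch are all assumptions.

```python
import numpy as np

def crop_nucleus(image: np.ndarray, mask: np.ndarray, patch_size: int = 32) -> np.ndarray:
    """Cut one masked nucleus out of an (H, W, C) image and centre it in a blank patch."""
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    # Keep nucleus pixels only; everything else becomes zero (assumed blank value).
    cut = image[y0:y1, x0:x1] * mask[y0:y1, x0:x1, None].astype(image.dtype)
    # Assumption: nuclei larger than the patch are centre-cropped.
    h, w = cut.shape[:2]
    if h > patch_size:
        top = (h - patch_size) // 2
        cut = cut[top:top + patch_size]
    if w > patch_size:
        left = (w - patch_size) // 2
        cut = cut[:, left:left + patch_size]
    h, w = cut.shape[:2]
    patch = np.zeros((patch_size, patch_size, image.shape[2]), dtype=image.dtype)
    oy, ox = (patch_size - h) // 2, (patch_size - w) // 2
    patch[oy:oy + h, ox:ox + w] = cut
    return patch
```

Removing the background in this way is also what discards the context information discussed later in the workflow evaluation.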

The Proposed Method
The overall schema of the proposed workflow is summarized in Fig. 1.

Modified Micro-Net Model
The autoencoder architecture for nuclei segmentation is shown in Fig. 2. The model consists of 3 encoder and 3 decoder blocks, each consisting of 2 convolution layers (3 × 3 convolutional filters with stride 2), dropout (dropout rate 0.2), and max-pooling layers. Our model adopts multiple downsized image input layers after each max-pooling operation, which were originally proposed in the Micro-Net model by Raza et al. (2019). We propose an additional model enhancement by introducing a texture block after each image input layer. The texture block consists of 2 parallel blocks of 3 convolution layers, which enhance image texture extraction. To ensure robust nuclei separation, we supplement our nuclei annotations with an additional active contour layer. Our experiments indicate that the proposed model architecture is more compact and requires fewer computational resources than the original Micro-Net structure. We used ELU activation after each convolution layer and sigmoid activation for the output layer. The Adam optimizer (Kingma and Ba, 2014) was used with an initial learning rate lr = 0.001, which was reduced by a factor of 0.1 if the validation loss did not improve for 4 consecutive epochs (min lr = 1 × 10⁻⁶). The Dice coefficient (1) was used as the model metric, with the binary cross-entropy Dice loss (3) as a custom loss function.
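To make the idea of downsized image inputs concrete, the sketch below builds the set of progressively halved copies of an input tile that could be fed to the network after each max-pooling step. The resizing method is not stated in the paper; the 2 × 2 block averaging used here and the names `downsample2x` and `input_pyramid` are assumptions for illustration.

```python
import numpy as np

def downsample2x(image: np.ndarray) -> np.ndarray:
    """Halve height and width by averaging non-overlapping 2x2 blocks.

    Assumes an (H, W, C) array with even H and W.
    """
    h, w, c = image.shape
    return image.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def input_pyramid(image: np.ndarray, levels: int = 3) -> list:
    """Return the full-resolution image followed by `levels - 1` halved copies."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid
```

For a 256 × 256 × 3 tile and three levels, this yields inputs of 256, 128, and 64 pixels per side, matching the spatial sizes reached after successive 2 × 2 pooling steps.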
The model converged after 36 epochs (see Fig. 3A) using a batch size of 1 (input image dimensions: 256 × 256 × 3) for training and validation. Input images were normalized by scaling pixel values to the range [0, 1].
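$$\mathrm{DC} = \frac{2\,TP}{2\,TP + FP + FN} \qquad (1)$$

where TP is true positive, FP is false positive and FN is false negative.

$$L_{BCE} = -\left( y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \right) \qquad (2)$$

where y is the binary class indicator and ŷ is the predicted probability. The binary cross-entropy Dice loss (3) combines the cross-entropy term (2) with a Dice term derived from (1).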
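The Dice coefficient and the cross-entropy term can be written out in a few lines of NumPy. The exact form of the combined loss (3) is not recoverable from the text, so the sum of binary cross-entropy and (1 − soft Dice) used in `bce_dice_loss` below is an assumption (a commonly used combination), as are the smoothing constant and all function names.

```python
import numpy as np

def dice_coefficient(y_true, y_pred, threshold=0.5):
    """Dice coefficient from TP, FP, FN counts after thresholding predictions."""
    pred = np.asarray(y_pred) >= threshold
    true = np.asarray(y_true).astype(bool)
    tp = np.sum(pred & true)
    fp = np.sum(pred & ~true)
    fn = np.sum(~pred & true)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else float(2 * tp / denom)

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def soft_dice(y_true, y_pred, smooth=1.0):
    """Differentiable Dice on probabilities (smoothing term is an assumption)."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    return float((2 * np.sum(y * p) + smooth) / (np.sum(y) + np.sum(p) + smooth))

def bce_dice_loss(y_true, y_pred):
    # Assumed combination: cross-entropy plus (1 - soft Dice).
    return binary_cross_entropy(y_true, y_pred) + (1 - soft_dice(y_true, y_pred))
```

The thresholded version is suited to reporting, while the soft version is the one that can serve as a training objective because it stays differentiable with respect to the predicted probabilities.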

Multilayer Perceptron
The multilayer perceptron model was employed to solve the binary classification problem of lymphocyte identification. The model consists of three dense layers (number of nodes: 4096, 2048, 1024), with softmax as the output layer activation function. For each layer, we used ReLU activation, followed by batch normalization. In the middle layer, a dropout layer (dropout rate 0.4) was used instead of batch normalization to avoid model overfitting. We used the Adam optimizer with an initial learning rate lr = 0.001, which was reduced by a factor of 0.1 if the validation loss did not improve for 6 consecutive epochs (min lr = 1 × 10⁻⁶). Accuracy was used as the metric, with binary cross-entropy (2) as the loss function. The model was trained until convergence using batch sizes of 64 and 32 for training and validation, respectively.
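The learning-rate rule described above (multiply by 0.1 after 6 epochs without improvement, never going below 1 × 10⁻⁶) can be simulated in plain Python. This is a sketch of the rule as stated in the text, not the training code; Keras offers a `ReduceLROnPlateau` callback with `factor`, `patience`, and `min_lr` arguments, but whether the authors used it is not stated, and the function name `reduce_on_plateau` is hypothetical.

```python
def reduce_on_plateau(val_losses, lr=1e-3, factor=0.1, patience=6, min_lr=1e-6):
    """Return the learning rate in effect at each epoch for a validation-loss history."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    schedule = []
    for loss in val_losses:
        schedule.append(lr)
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the rate, but not below the floor.
                lr = max(lr * factor, min_lr)
                wait = 0
    return schedule
```

Calling the function with `patience=4` reproduces the setting reported for the segmentation model.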

Implementation
Neural network models for nuclei segmentation and cell-type classification were trained on a GeForce GTX 1050 GPU with 16 GB RAM, using the TensorFlow and Keras machine learning libraries (Abadi et al., 2016). The proposed neural model architectures are available in the GitHub repository. 1

Hyperparameter Tuning
The optimal model architecture was experimentally evaluated using a hyperparameter grid search. To test segmentation robustness, we evaluated both pixel-level and object-level metrics. The Dice coefficient was used to track pixel-level segmentation performance, while object-level segmentation quality was evaluated by calculating intersection over union (IoU). We treated a predicted nucleus as a true positive if at least 50% of its area overlapped with a ground truth nucleus mask. To prevent multiple predicted objects from mapping to the same ground truth nucleus, each ground truth nucleus mask could be mapped to only a single predicted object. Results of hyperparameter tuning are provided in Table 3. The hyperparameter space was investigated by changing dropout rates, the number of convolution filters per network layer, and activation functions. Due to multiple image down-sampling and concatenation operations in the CNN architecture, models with more than 500 000 parameters exceeded memory limitations. Our experiments indicate that expanding the model layer width (tested widths: 16, 32, and 48 kernels) did not dramatically affect the model prediction metrics, which suggests that the texture block component may ensure consistent feature extraction across a wide range of model widths.
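The object-level matching rule (at least 50% of the predicted area overlapping a ground truth nucleus, with each ground truth nucleus matched at most once) might be implemented along the following lines. The greedy processing order, the choice of the largest-overlap candidate, and the name `match_objects` are assumptions of this sketch; the paper does not describe these details.

```python
import numpy as np

def match_objects(pred_labels, gt_labels, min_overlap=0.5):
    """Count object-level TP, FP, FN between two labelled masks (0 = background)."""
    used_gt = set()  # ground truth nuclei already claimed by a prediction
    tp = 0
    pred_ids = [i for i in np.unique(pred_labels) if i != 0]
    gt_ids = [i for i in np.unique(gt_labels) if i != 0]
    for pid in pred_ids:
        region = pred_labels == pid
        ids, counts = np.unique(gt_labels[region], return_counts=True)
        # Pick the still-unmatched ground truth nucleus with the largest overlap.
        best_id, best_count = 0, 0
        for gid, count in zip(ids, counts):
            if gid != 0 and gid not in used_gt and count > best_count:
                best_id, best_count = gid, count
        if best_id != 0 and best_count / region.sum() >= min_overlap:
            used_gt.add(best_id)
            tp += 1
    fp = len(pred_ids) - tp
    fn = len(gt_ids) - tp
    return tp, fp, fn
```

Under this rule, a nucleus that the model splits into two fragments contributes one true positive and one false positive, so over-segmentation is penalized at the object level even when pixel-level Dice stays high.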

Model Performance Speed
Rather than basing our model selection solely on the Dice coefficient and object-level testing metrics, we also evaluated the grid search models by their loading and image prediction times relative to the original Micro-Net model. Since no significant changes were observed between dropout rates, we compared custom models with a 0.2 dropout rate, ELU activation, and sigmoid output activation at layer widths of 16, 32, and 48 kernels. The testing results in Table 4 indicate that the lowest relative image prediction and model loading times were observed for the segmentation autoencoder with 32 convolutional kernels per layer, a 0.2 dropout rate, ELU activation, and sigmoid activation for the output layer, with fewer than 280 000 parameters in total. In comparison to the U-Net autoencoder (>1.9 M parameters), which reached a Dice coefficient of 0.78 ± 0.028 on the testing dataset, our selected model achieved a Dice coefficient of 0.81 ± 0.018 with over 6-fold lower model complexity.

Active Contour Layer
To evaluate the impact of the active contour layer on nuclei separation, we trained a convolutional autoencoder using single-layered nuclei masks and compared the results with an identical model trained on two-layered annotations. For this experiment, we used the best-scoring model architecture from the hyperparameter search experiment. The model trained on masks supplemented with the active contour layer outperformed the model trained on single-layered masks on both pixel-level and object-level measurements, as shown in Table 5. The active contour layer increased object segmentation accuracy and f-score by 1 percent (0.75 ± 0.062 and 0.85 ± 0.04, respectively).

Hyper Parameter Tuning and Model Comparison
The cell classification problem was approached with several different statistical models. Random Forest was chosen as a baseline machine learning algorithm. We used the Python implementation of a random forest classifier from the sklearn machine learning library (Feurer et al., 2015), with the Gini impurity criterion as the split quality measure and 10 estimators. The random forest classifier was trained on linearized nuclei images (32 × 32 RGB-coloured images linearized to 3072-length vectors) and achieved 0.77 testing accuracy. In addition, we investigated two deep-learning-based strategies for cell nuclei classification: a multilayer perceptron (MLP) consisting of three consecutive dense layers, and a convolutional neural network (CNN) consisting of 4 convolutional, 2 max-pooling, and 2 dense layers. Model performance metrics were evaluated for several hyperparameter combinations, including the number of nodes per layer, the activation functions, and the number of convolutional kernels. The hyperparameter search is summarized in Table 6. During our experiments, a multilayer perceptron with three dense layers, softmax output and ReLU layer activation functions, 2 batch-normalization layers, and a dropout layer achieved the highest testing accuracy score of 0.78, with f-score, precision, and recall values of 0.82, 0.71, and 0.99, respectively. The confusion matrix for our cell classification model (Fig. 3B) shows that out of 2046 labelled lymphocytes, 310 were misclassified as other cell types, while 13 false-positive observations were registered out of 2235 nuclei labelled as other cell types. The receiver operating characteristic (ROC) curve shown in Fig. 3C indicates the low false-positive rate of our lymphocyte classifier.

Table 6 The hyperparameter grid search results for cell nuclei classifier (mean ± standard deviation). The model performance was evaluated on the testing set. Mean and standard deviation values were obtained by running each experiment 5 times.
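For reference, the accuracy, precision, recall, and f-score values used throughout this section follow from the four confusion-matrix counts as sketched below. The function name `classification_metrics` is hypothetical, and the counts in the usage note are illustrative only and are not taken from the paper.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }
```

For example, `classification_metrics(tp=80, fp=20, fn=20, tn=80)` gives 0.8 for all four metrics; here the lymphocyte class is treated as the positive class, which is an assumption about how the reported values were computed.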
Of note, the proposed two-step lymphocyte detection model can potentially be adapted to detect more cell types by replacing existing lymphocyte classifier with a model trained on several classes.

Workflow Evaluation
The proposed lymphocyte identification workflow has been tested on the lymphocyte dataset published by Janowczyk and Madabhushi (2016). 2 The dataset is composed of 100 breast cancer images stained with hematoxylin and eosin and digitized at 20× magnification. The lymphocyte centres were manually annotated by an experienced pathologist. The same dataset was used in Alom et al. (2019). Since our nuclei segmentation model was trained on 256 × 256 pixel image patches, each testing image was zero-padded to the desired input size while preserving the original image scale. Each testing image was first analysed with the autoencoder to segment all cell nuclei, followed by nuclei cropping and subsequent classification of each cropped nucleus using a pre-trained multilayer perceptron for lymphocyte identification. If the nucleus was classified as a lymphocyte, the cell centre was marked with a green dot. The classifier's testing results were evaluated using the dataset annotations as a reference.
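The zero-padding step can be illustrated with a short NumPy sketch. Where the original image is placed inside the padded canvas is not stated in the paper; centring it, returning the offset, and the name `pad_to_input` are assumptions made here.

```python
import numpy as np

def pad_to_input(image: np.ndarray, size: int = 256):
    """Zero-pad an (H, W, C) image to (size, size, C) without rescaling it.

    Returns the padded image and the (top, left) offset of the original,
    so that predicted nuclei centres can be mapped back.
    """
    h, w = image.shape[:2]
    if h > size or w > size:
        raise ValueError("image is larger than the network input size")
    top, left = (size - h) // 2, (size - w) // 2
    padded = np.zeros((size, size) + image.shape[2:], dtype=image.dtype)
    padded[top:top + h, left:left + w] = image
    return padded, (top, left)
```

Padding rather than resizing keeps nuclei at the pixel scale the segmentation model saw during training, which is the point made in the text about preserving the original image scale.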
The first analysis results (nuclei segmentation) are shown in the second column of Fig. 4. Nuclei segmentation masks generated by the autoencoder demonstrate consistent cell nuclei detection efficiency regardless of image staining intensity. This can be explained by two factors. Firstly, due to robust image colour augmentation during autoencoder training, the CAE model learned to generalize the input image by texture rather than colour. Secondly, our modified Micro-Net model architecture incorporates texture convolutional blocks shown in Fig. 2, which facilitate relevant feature extraction for the autoencoder.

2 Link to the dataset: http://www.andrewjanowczyk.com/use-case-4-lymphocyte-detection/.

Fig. 4. Exemplary 5 testing images from breast cancer lymphocyte dataset (Janowczyk and Madabhushi, 2016) with corresponding lymphocyte identification model outputs. From left to right: 1st column: original testing image from the lymphocyte dataset. 2nd column: nuclei segmentation masks predicted by autoencoder. 3rd column: expert pathologist's annotation supplied in the dataset. 4th column: lymphocyte classifier result (if the nucleus was predicted as a lymphocyte, its centre was labelled with a green dot). 5th column: lymphocyte classifier result after Reinhard stain normalization.
The confusion matrix in Fig. 5A shows a low false-positive lymphocyte misclassification rate. However, the high false-negative rate suggests that the lymphocyte classification model is sensitive to image stain intensity. This is well reflected in the fourth column of Fig. 4 (classifier result on unmodified images), where lymphocyte detection efficiency conspicuously decreases as image staining intensity fades. This is not a surprising result, given that the multilayer perceptron was trained on lymphocytes cropped from histology samples prepared in a different laboratory, where image staining is more consistent across different histology samples. This result illustrates the main limitation of the lymphocyte classification model: cropped nuclei images lose image background information, which could otherwise be leveraged to compare nucleus stain intensity against background colour intensity.

The Effect of Colour Normalization on Overall Model Performance
To address high staining variability between different histological samples, the lymphocyte testing dataset was normalized using the Reinhard stain normalization method. Reinhard algorithm adjusts the source image's colour distribution to the colour distribution of the target image by equalizing the mean and standard deviation pixel values in each channel (Reinhard et al., 2001).
$$l_{mapped} = \frac{l_{source} - \bar{l}_{source}}{\hat{l}_{source}}\,\hat{l}_{target} + \bar{l}_{target}$$
$$\alpha_{mapped} = \frac{\alpha_{source} - \bar{\alpha}_{source}}{\hat{\alpha}_{source}}\,\hat{\alpha}_{target} + \bar{\alpha}_{target}$$
$$\beta_{mapped} = \frac{\beta_{source} - \bar{\beta}_{source}}{\hat{\beta}_{source}}\,\hat{\beta}_{target} + \bar{\beta}_{target}$$

where l, α, β are the colour channels in the LAB colourspace, the hat (ˆ) denotes the standard deviation, and the bar (¯) denotes the mean of all pixel values in a channel. The colour normalization algorithm was implemented using the OpenCV (Bradski, 2000) and NumPy (Oliphant, 2006) Python libraries, with a representatively stained image from the training dataset as the target for stain normalization. The effect of stain normalization on lymphocyte detection was evaluated by comparing testing metrics before and after applying the Reinhard algorithm. The confusion matrix in Fig. 5B indicates a lower false-negative rate for lymphocytes. Stain normalization increased accuracy, precision, recall, and f-score values by approximately 10%, as shown in Table 7. These results indicate that stain normalization is an effective pre-processing step that can mitigate high staining intensity variance between histology samples. The improvement in lymphocyte classification accuracy obtained with the relatively simple Reinhard stain normalization suggests that this part of our workflow can be further explored. Structure-preserving image normalization methods (Vahadane et al., 2016; Mahapatra et al., 2020) demonstrate promising results; certain medical image denoising techniques (Meiniel et al., 2018; Pham et al., 2020) could also prove useful in future work.

Both Janowczyk and Madabhushi (2016) and Alom et al. (2019) used the same dataset to train and evaluate their proposed models; therefore, to deal with overfitting, the authors had to hold out part of the data. 5-fold cross-validation was used in Janowczyk and Madabhushi (2016), and Alom et al. (2019) reserved 10% of the dataset for testing purposes. In contrast, we used the whole dataset exclusively for evaluation of the proposed model, so none of its images could influence training. Our result (f-score = 0.80) indicates good model generalization and comparable performance to both of the above-mentioned methods.
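The Reinhard mapping used in this section, which equalizes the per-channel mean and standard deviation of a source image to those of a target image, can be sketched in NumPy as below. The sketch assumes both images have already been converted to a LAB-type colourspace (for instance with OpenCV's `cv2.cvtColor` and `cv2.COLOR_BGR2LAB`, which is an assumption about the authors' pipeline); the function name `reinhard_match` and the small `eps` guard are hypothetical additions.

```python
import numpy as np

def reinhard_match(source_lab: np.ndarray, target_lab: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Match per-channel mean and std of `source_lab` to those of `target_lab`.

    Both inputs are (H, W, 3) arrays in the same LAB-type colourspace.
    The result is float64 and is not clipped or converted back to RGB.
    """
    src = source_lab.astype(np.float64)
    tgt = target_lab.astype(np.float64)
    src_mean, src_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1))
    tgt_mean, tgt_std = tgt.mean(axis=(0, 1)), tgt.std(axis=(0, 1))
    # Standardize each source channel, then rescale to the target statistics.
    # eps avoids division by zero for a perfectly flat channel.
    return (src - src_mean) / (src_std + eps) * tgt_std + tgt_mean
```

In practice the output would still need to be clipped to the valid channel range and converted back to the display colourspace before being passed to the classifier.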

Conclusions
In this paper, we propose an end-to-end deep learning-based algorithm for cell nuclei segmentation and consecutive lymphocyte identification in H&E stained 20× magnified breast and colorectal cancer whole slide images. Our experiments suggest that:
• Our proposed autoencoder structure component, convolutional texture blocks, can achieve a Dice nuclei segmentation score similar to that of the Micro-Net model (our model achieved a 1% higher testing Dice coefficient).
• An additional active contour layer in nuclei annotation masks increases nuclei segmentation accuracy by 1.5%.
• Lymphocyte classification by the multilayer perceptron network achieves 78 ± 0.3% testing accuracy on the private dataset (NCP), and 0.71 on the public dataset (0.81 with Reinhard stain normalization).
Nuclei segmentation autoencoder architecture investigated in this paper has lower model complexity compared to U-Net and Micro-Net models, which brings the advantage of lower computational resource usage. Our suggested pipeline shows good generalization properties, eliminates overfitting, and can be easily extended for multi-class nuclei identification by replacing the nuclei classification MLP model and re-employing the same pre-trained segmentation autoencoder.