1 Introduction
(1) We propose a novel brain tumour segmentation method, MAU-Net, which captures spatial contextual information by combining convolutional kernels of varying scales with an attention mechanism, thereby preserving the location details of small tumours and improving segmentation accuracy.
(2) MAU-Net introduces Mixed Depth-wise Convolutions in the encoder and decoder to extract multi-scale brain tumour features, and embeds Context Pyramid Modules, which combine multi-scale features with attention, at the skip connections to fuse local and global information. In addition, it adopts Self-ensemble in the decoding process to further improve segmentation performance.
(3) We comprehensively evaluate MAU-Net on the publicly available BraTS 2019, BraTS 2020 and BraTS 2021 brain tumour datasets. Ablation experiments, visualization results and comparisons with representative methods demonstrate the effectiveness of MAU-Net for MRI brain tumour segmentation.
2 Related Work
2.1 Brain Tumour Segmentation
2.2 Attention Mechanism
3 Proposed Model: MAU-Net
3.1 Overall Structure
Table 1
Output size | Encoder | Decoder |
$128\times 128\times 128$ | Conv3d, $3\times 3\times 3$, 16 → MDConv, 16/G → Conv3d, $3\times 3\times 3$, 32 → MDConv, 32/G → Conv3d, $1\times 1\times 1$, 32 | Conv3d, $3\times 3\times 3$, 64 → Trilinear Interpolation |
$64\times 64\times 64$ | Max Pooling → Conv3d, $3\times 3\times 3$, 32 → MDConv, 32/G → MDConv, 64/G → Conv3d, $1\times 1\times 1$, 64 | Conv3d, $3\times 3\times 3$, 32 → Trilinear Interpolation → Conv3d, $3\times 3\times 3$, 64 |
$32\times 32\times 32$ | Max Pooling → Conv3d, $3\times 3\times 3$, 64 → MDConv, 64/G → MDConv, 128/G → Conv3d, $1\times 1\times 1$, 128 | Conv3d, $3\times 3\times 3$, 256 → Trilinear Interpolation → Conv3d, $3\times 3\times 3$, 128 |
$16\times 16\times 16$ | Max Pooling → Conv3d, $3\times 3\times 3$, 128 → MDConv, 128/G → Context Pyramid Module (CPM) | Trilinear Interpolation → Conv3d, $3\times 3\times 3$, 256 → MDConv, 256/G |
$8\times 8\times 8$ | Max Pooling → Conv3d, $3\times 3\times 3$, 256 → MDConv, 256/G | |
Here, "MDConv, $n$/G" denotes the block of depth-wise convolutions DWConv3d with kernel sizes $3\times 3\times 3$, $5\times 5\times 5$, …, $k\times k\times k$, each with $n/G$ channels.
3.2 Mixed Depth-Wise Convolution (MDConv)
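For illustration, the MDConv blocks listed in Table 1 split the input channels into G groups and filter each group with a depth-wise 3D convolution that has its own kernel size ($3\times 3\times 3$, $5\times 5\times 5$, …, $k\times k\times k$). A minimal PyTorch-style sketch of this idea is given below; the class name, the default kernel sizes and the equal channel split (G taken as the number of kernel sizes) are illustrative assumptions rather than the exact implementation.

```python
import torch
import torch.nn as nn

class MDConv3d(nn.Module):
    """Illustrative mixed depth-wise 3D convolution: channels are split into G
    groups and each group is filtered by a depth-wise Conv3d with its own
    kernel size, then the groups are concatenated again."""

    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        groups = len(kernel_sizes)            # G taken as the number of kernel sizes
        assert channels % groups == 0, "channels must be divisible by G"
        split = channels // groups            # 'n/G' channels per branch, as in Table 1
        self.splits = [split] * groups
        self.branches = nn.ModuleList(
            nn.Conv3d(split, split, kernel_size=k, padding=k // 2, groups=split)
            for k in kernel_sizes             # groups == channels -> depth-wise
        )

    def forward(self, x):                     # x: (B, C, D, H, W)
        chunks = torch.split(x, self.splits, dim=1)
        outs = [branch(chunk) for branch, chunk in zip(self.branches, chunks)]
        return torch.cat(outs, dim=1)         # same shape as the input

# example: y = MDConv3d(32, kernel_sizes=(3, 5, 7))(torch.randn(1, 32, 16, 16, 16))
```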
3.3 Context Pyramid Module (CPM)
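Eq. (1) below defines the attention-weighted pyramid aggregation of the CPM. One possible reading is sketched here purely for illustration: the $G^{\prime}_{i}$ are treated as pooled region features from an $s\times s$ pyramid of the input $X$, the $P_{i}$ as softmax attention weights, $W$ as a $1\times 1\times 1$ convolution over the concatenated terms, $\delta$ as ReLU and $\beta$ as a learnable scalar. All of these choices are assumptions, not the exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextPyramidSketch(nn.Module):
    """Hedged sketch of Eq. (1): A_p = beta * delta(W[P_1 G'_1, ..., P_{s^2} G'_{s^2}]) + X."""

    def __init__(self, channels, s=2):
        super().__init__()
        self.s = s
        # W: 1x1x1 convolution over the concatenated weighted region features (assumption)
        self.W = nn.Conv3d(channels * s * s, channels, kernel_size=1)
        # beta: learnable scale of the attention branch, initialised to zero (assumption)
        self.beta = nn.Parameter(torch.zeros(1))

    def forward(self, x):                      # x: (B, C, D, H, W)
        b, c = x.shape[:2]
        # G'_i: s*s pooled region features of X (assumption: average pooling over
        # depth and an s x s in-plane grid)
        g = F.adaptive_avg_pool3d(x, (1, self.s, self.s)).view(b, c, -1)   # (B, C, s^2)
        # P_i: attention weight of each region (assumption: softmax over the
        # mean response of each region)
        p = torch.softmax(g.mean(dim=1), dim=-1)                           # (B, s^2)
        weighted = g * p.unsqueeze(1)                                      # P_i * G'_i
        # delta(W[P_1 G'_1, ..., P_{s^2} G'_{s^2}]), with delta taken as ReLU
        a = F.relu(self.W(weighted.reshape(b, -1, 1, 1, 1)))               # (B, C, 1, 1, 1)
        return self.beta * a + x                                           # Eq. (1): A_p
```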
(1)
\[ {A_{p}}=\beta \times \delta \big(W\big[{P_{1}}{G^{\prime }_{1}},{P_{2}}{G^{\prime }_{2}},\dots ,{P_{({s^{2}})}}{G^{\prime }_{({s^{2}})}}\big]\big)+X.\]
3.4 Self-Ensemble
4 Experiments
4.1 Experimental Environment
4.2 Datasets and Data Processing
4.2.1 Datasets
4.2.2 Data Preprocessing
4.2.3 Data Postprocessing
4.3 Evaluation Metrics
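The experiments report the Dice similarity coefficient (DSC) and the 95th-percentile Hausdorff distance (Hausdorff95), as listed in Tables 2–9. For reference, a minimal NumPy/SciPy sketch of both metrics on binary masks is given below; handling of empty masks and the exact voxel spacing are simplified and follow common practice rather than the official BraTS evaluation tool.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def dice(pred, gt):
    """Dice similarity coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)

def hausdorff95(pred, gt, spacing=(1.0, 1.0, 1.0)):
    """95th-percentile symmetric Hausdorff distance (in mm) between the
    surfaces of two binary masks; `spacing` is the voxel size. Empty masks
    are not handled in this sketch."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    pred_surf = pred ^ binary_erosion(pred)        # surface voxels of the prediction
    gt_surf = gt ^ binary_erosion(gt)              # surface voxels of the ground truth
    # distance of every voxel to the other mask's surface
    dist_to_gt = distance_transform_edt(~gt_surf, sampling=spacing)
    dist_to_pred = distance_transform_edt(~pred_surf, sampling=spacing)
    d = np.concatenate([dist_to_gt[pred_surf], dist_to_pred[gt_surf]])
    return np.percentile(d, 95)
```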
4.4 Loss Function
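Eqs. (5) and (6) below define the cross-entropy and soft Dice terms of the training loss. A minimal PyTorch-style sketch of such a combined loss follows; the 1:1 weighting of the two terms and the value of the smoothing constant $\xi$ are illustrative assumptions, while the terms themselves follow the equations.

```python
import torch
import torch.nn.functional as F

def ce_dice_loss(logits, target, xi=1e-5, dice_weight=1.0):
    """Combined cross-entropy (Eq. 5) and soft Dice (Eq. 6) loss.
    logits: (N, L, D, H, W) raw network outputs; target: (N, D, H, W) integer labels.
    `xi` is the smoothing constant; the weighting is an illustrative choice."""
    num_classes = logits.shape[1]
    # Eq. (5): voxel-wise multi-class cross-entropy
    l_ce = F.cross_entropy(logits, target)
    # Eq. (6): soft Dice over the one-hot ground truth g and the softmax output p
    p = torch.softmax(logits, dim=1)
    g = F.one_hot(target, num_classes).permute(0, 4, 1, 2, 3).float()
    inter = (g * p).sum()
    l_dice = 1.0 - (2.0 * inter + xi) / (g.sum() + p.sum() + xi)
    return l_ce + dice_weight * l_dice
```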
(5)
\[\begin{aligned}{}& {L_{CE}}=-\frac{1}{N}{\sum \limits_{i=1}^{N}}{\sum \limits_{j=1}^{L}}{g_{ij}}\log ({p_{ij}}),\end{aligned}\]
(6)
\[\begin{aligned}{}& {L_{\textit{Dice}}}=1-\frac{2\big({\textstyle\sum _{i=1}^{N}}{\textstyle\sum _{j=1}^{L}}{g_{ij}}{p_{ij}}\big)+\xi }{{\textstyle\sum _{i=1}^{N}}{\textstyle\sum _{j=1}^{L}}{g_{ij}}+{\textstyle\sum _{i=1}^{N}}{\textstyle\sum _{j=1}^{L}}{p_{ij}}+\xi }.\end{aligned}\]
5 Results
5.1 Ablation Experiment
Table 2
Dataset | Methods | DSC (%)↑ | Hausdorff95 (mm)↓ | ||||
ET | WT | TC | ET | WT | TC | ||
BraTS2019 | U-Net (baseline) | 77.3 | 90.0 | 82.3 | 6.48 | 5.34 | 8.54 |
U-Net+Mix | 78.1* | 90.3 | 83.1* | 4.39* | 4.32* | 7.35 | |
U-Net+CPM | 78.3* | 90.5 | 83.5* | 4.51* | 4.51 | 6.78* | |
U-Net+Mix+CPM | 78.9* | 90.7* | 84.0* | 3.78* | 4.01* | 6.03* | |
MAU-Net | 79.5* | 90.8* | 84.3* | 3.45* | 3.79* | 5.43* | |
BraTS2020 | U-Net (baseline) | 79.4 | 90.5 | 82.6 | 26.60 | 5.14 | 8.73 |
U-Net+Mix | 79.6 | 91.1* | 84.0* | 25.48* | 4.97 | 10.47 | |
U-Net+CPM | 79.5 | 91.5* | 83.2* | 26.61 | 4.64 | 6.84* | |
U-Net+Mix+CPM | 80.4* | 91.1* | 84.4* | 28.01 | 4.48* | 8.48 | |
MAU-Net | 80.7* | 91.3* | 85.1* | 26.31 | 4.35* | 6.16* |
Table 3
Dataset | Methods | DSC (%)↑ | Hausdorff95 (mm)↓ | ||||
ET | WT | TC | ET | WT | TC | ||
BraTS2019 | U-Net (baseline) | 76.5 | 90.6 | 80.3 | 3.94 | 4.68 | 6.82 |
U-Net+Mix | 76.8 | 90.8 | 81.8* | 2.81* | 4.40 | 5.95* | 
U-Net+CPM | 76.7 | 90.5 | 81.8* | 3.97 | 4.56 | 6.55 | |
U-Net+Mix+CPM | 77.2* | 90.6 | 82.1* | 3.14* | 4.30* | 6.14 | |
MAU-Net | 77.9* | 90.6 | 82.7* | 3.05* | 4.07* | 5.40* | |
BraTS2020 | U-Net (baseline) | 76.9 | 89.3 | 79.9 | 32.56 | 7.70 | 12.11 |
U-Net+Mix | 77.4* | 90.0* | 81.5* | 33.00 | 8.26 | 15.61 | |
U-Net+CPM | 77.4* | 89.9 | 81.4* | 31.08* | 7.41 | 12.28 | |
U-Net+Mix+CPM | 77.4* | 90.0* | 82.3* | 32.11 | 7.09* | 11.49 | |
MAU-Net | 78.5* | 90.2* | 82.8* | 26.96* | 7.61 | 8.61* |
Table 4
MAU-Net | DSC (%)↑ | Hausdorff95 (mm)↓ | |||||
Batch size | Learning rate | ET | WT | TC | ET | WT | TC |
2 | 0.005 | 79.7 | 90.9 | 84.4 | 27.04 | 4.87 | 6.91 |
0.001 | 80.3 | 91.2 | 84.9 | 26.24 | 4.55 | 6.71 | |
0.0005 | 78.1 | 90.4 | 83.4 | 29.33 | 5.84 | 7.32 | |
0.0001 | 77.4 | 89.9 | 83.5 | 29.97 | 6.32 | 7.94 | |
4 | 0.005 | 79.3 | 90.7 | 84.6 | 27.51 | 4.79 | 6.94 |
0.001 | 80.7 | 91.3 | 85.1 | 26.31 | 4.35 | 6.16 | |
0.0005 | 78.3 | 90.1 | 83.9 | 28.77 | 5.32 | 6.47 | |
0.0001 | 77.6 | 90.2 | 84.2 | 30.21 | 6.33 | 7.56 |
Table 5
Hyper-parameter | DSC (%)↑ | Hausdorff95 (mm)↓ | ||||
ET | WT | TC | ET | WT | TC | |
$\alpha =0$ | 79.3 | 90.8 | 84.2 | 27.48 | 5.09 | 7.33 |
$\alpha =0.25$ | 80.7* | 91.3* | 85.1* | 26.31 | 4.35* | 6.16* |
$\alpha =0.50$ | 80.1* | 90.4 | 84.5 | 25.47* | 4.55 | 6.74 |
$\alpha =0.75$ | 80.4* | 91.1 | 84.7 | 26.11 | 4.78 | 6.14* |
$\alpha =1$ | 78.8 | 91.0 | 83.9 | 27.14 | 5.51 | 7.07 |
Table 6
MAU-Net | DSC (%)↑ | Hausdorff95 (mm)↓ | |||||
ET | WT | TC | ET | WT | TC | ||
Padding methods | Zero padding | 78.5 | 90.2 | 82.8 | 26.96 | 7.61 | 8.61 |
Reflection padding | 78.7 | 90.4 | 83.0 | 25.31* | 6.45* | 7.44* | |
Replication padding | 78.5 | 90.3 | 82.7 | 26.23 | 7.34 | 7.96 | |
Noise type | – | 78.5 | 90.2 | 82.8 | 26.96 | 7.61 | 8.61 |
Gaussian noise | 78.5 | 90.1 | 82.6 | 27.14 | 8.10 | 8.94 | |
Salt-and-pepper noise | 78.3 | 89.5 | 82.3 | 27.54 | 7.94 | 9.01 | |
Rayleigh noise | 78.0 | 90.2 | 82.4 | 28.04* | 7.84 | 8.73 |
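The perturbations in Table 6 (Gaussian, salt-and-pepper and Rayleigh noise) can be reproduced along the following lines; the noise strengths below are illustrative choices rather than the exact settings used in the experiments.

```python
import numpy as np

def add_noise(volume, kind="gaussian", rng=None):
    """Apply one of the perturbations from Table 6 to a (normalized) MRI volume.
    The noise parameters below are illustrative, not the paper's settings."""
    rng = rng or np.random.default_rng(0)
    if kind == "gaussian":
        return volume + rng.normal(0.0, 0.1, volume.shape)
    if kind == "salt_and_pepper":
        noisy = volume.copy()
        mask = rng.random(volume.shape)
        noisy[mask < 0.01] = volume.min()        # pepper
        noisy[mask > 0.99] = volume.max()        # salt
        return noisy
    if kind == "rayleigh":
        return volume + rng.rayleigh(0.1, volume.shape)
    raise ValueError(f"unknown noise type: {kind}")
```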
5.2 Visualization
Fig. 7
Table 7
Methods | DSC (%)↑ | Hausdorff95 (mm)↓ | ||||
ET | WT | TC | ET | WT | TC | |
Jun et al. (2021) | 76.0* | 88.8* | 77.2* | 5.20* | 7.76* | 8.26* |
Milletari et al. (2016) | 70.9* | 87.4* | 81.2* | 5.06* | 9.43* | 8.72* |
Akbar et al. (2022) | 74.2* | 88.5* | 81.0* | 6.67* | 10.83* | 10.25* |
Liu et al. (2021a) | 75.9* | 88.5* | 85.1 | 4.80* | 5.89* | 6.56* |
Zhao et al. (2020) | 75.4* | 91.0 | 83.5 | 3.84* | 4.57 | 5.58 |
Guo et al. (2020) | 77.3* | 90.3 | 83.3 | 4.44* | 7.10* | 7.68* |
Wang et al. (2021) | 73.7* | 89.4* | 80.7* | 5.99* | 5.68* | 7.36* |
Chang et al. (2023) | 78.2 | 89.0* | 81.2* | 3.82* | 8.53* | 7.43* |
MAU-Net (Ours) | 78.0 | 90.6 | 82.8 | 3.04 | 4.05 | 5.37 |
5.3 Comparison with Representative Methods
Table 8
Methods | DSC (%)↑ | Hausdorff95 (mm)↓ | ||||
ET | WT | TC | ET | WT | TC | |
Jun et al. (2021) | 75.2* | 87.8* | 77.9* | 30.65* | 6.30 | 11.02* |
Milletari et al. (2016) | 68.8* | 84.1* | 79.1* | 50.98* | 13.37* | 13.61* |
Akbar et al. (2022) | 72.9* | 88.9* | 80.2* | 31.97* | 10.26* | 13.58* |
González et al. (2021) | 77.3* | 90.2 | 81.5* | 21.80 | 6.16 | 7.55 |
Vu et al. (2021) | 77.2* | 90.6 | 82.7 | 27.04* | 4.99 | 8.63 |
Cirillo et al. (2021) | 75.0* | 89.3* | 79.2* | 36.00* | 6.39 | 14.07* |
Jiang et al. (2022) | 77.4* | 89.1* | 80.3* | 26.84 | 8.56* | 15.78* |
Wang et al. (2021) | 78.7 | 90.1 | 81.7* | 17.95 | 4.96 | 9.77* |
Li et al. (2024b) | 75.4* | 89.9 | 83.0 | 22.07 | 6.64 | 6.09 |
MAU-Net (Ours) | 78.7 | 90.4 | 83.0 | 25.31 | 6.45 | 7.44 |
Table 9
Methods | DSC (%)↑ | Hausdorff95 (mm)↓ | ||||
ET | WT | TC | ET | WT | TC | |
Jia et al. (2023) | 85.1* | 92.1* | 90.1* | – | – | – |
Vijay et al. (2023) | 85.0* | 90.0* | 90.0* | 6.30* | 9.43* | 7.78* |
Hatamizadeh et al. (2022) | 86.2* | 92.5* | 91.8* | 11.28* | 7.74* | 7.85* |
Peiris et al. (2022) | 85.3* | 93.1 | 90.2* | 10.78* | 6.76* | 7.56* |
Li et al. (2024a) | 89.7 | 92.8* | 92.9 | 2.29 | 5.12 | 4.16 |
MAU-Net (Ours) | 88.9 | 93.7 | 93.2 | 3.75 | 4.68 | 4.03 |