Table 1. Mean ± standard deviation of Dice, precision, and recall (sensitivity) for each individual model and our ensemble, evaluated on the test set. Bold values indicate maxima, and asterisks indicate statistically significant differences with respect to the ensemble (p < 0.05) based on one-tailed paired Wilcoxon signed-rank tests.
Metric | Evaluation area | |||
---|---|---|---|---|
Model | CSF | 3 CMM area | 3–1 ring | Full volume |
Dice | ||||
U-Net | 0.867 ± 0.075* | 0.908 ± 0.028* | 0.914 ± 0.026 | 0.908 ± 0.025* |
All-Dropout | 0.789 ± 0.175* | 0.885 ± 0.049* | 0.898 ± 0.032* | 0.894 ± 0.036* |
BRU-Net | 0.865 ± 0.076 | 0.905 ± 0.027* | 0.910 ± 0.029 | 0.908 ± 0.022* |
U2-Net | 0.874 ± 0.074 | 0.912 ± 0.026 | 0.917 ± 0.024 | 0.909 ± 0.024* |
Ensemble | 0.875 ± 0.075 | 0.912 ± 0.026 | 0.917 ± 0.024 | 0.911 ± 0.024 |
Precision | ||||
U-Net | 0.916 ± 0.052 | 0.918 ± 0.043* | 0.919 ± 0.044* | 0.920 ± 0.039* |
All-Dropout | 0.930 ± 0.053 | 0.936 ± 0.046 | 0.937 ± 0.048 | 0.939 ± 0.029 |
BRU-Net | 0.889 ± 0.050* | 0.903 ± 0.050* | 0.904 ± 0.053* | 0.913 ± 0.036* |
U2-Net | 0.920 ± 0.047 | 0.923 ± 0.041 | 0.924 ± 0.043 | 0.925 ± 0.038 |
Ensemble | 0.925 ± 0.041 | 0.923 ± 0.043 | 0.923 ± 0.046 | 0.926 ± 0.037 |
Recall (Sensitivity) | ||||
U-Net | 0.832 ± 0.116 | 0.900 ± 0.042 | 0.910 ± 0.034 | 0.898 ± 0.045 |
All-Dropout | 0.708 ± 0.224* | 0.842 ± 0.068* | 0.864 ± 0.045* | 0.856 ± 0.068* |
BRU-Net | 0.854 ± 0.127 | 0.910 ± 0.035 | 0.918 ± 0.030 | 0.904 ± 0.042 |
U2-Net | 0.840 ± 0.117 | 0.902 ± 0.039 | 0.912 ± 0.033 | 0.895 ± 0.043 |
Ensemble | 0.840 ± 0.122 | 0.904 ± 0.038 | 0.913 ± 0.030 | 0.899 ± 0.044 |
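
For readers who want to reproduce entries of the kind reported in Table 1, the following minimal sketch shows how Dice, precision, and recall are commonly computed from a pair of binary masks and then summarized as mean ± standard deviation over cases. It is an illustration under stated assumptions, not the authors' evaluation code: the names `overlap_metrics` and `summarize` are hypothetical, masks are assumed to be flattened binary sequences of equal length, the sample standard deviation is assumed (the table does not state which estimator was used), and returning 1.0 when a denominator is zero is only one possible convention.

```python
from statistics import mean, stdev


def overlap_metrics(pred, truth):
    """Return (dice, precision, recall) for two equal-length binary sequences."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")

    # Count true positives, false positives, and false negatives voxel by voxel.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)

    # Assumed convention: a metric with an empty denominator is scored as 1.0.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return dice, precision, recall


def summarize(values):
    """Format per-case scores as 'mean ± sample standard deviation'."""
    return f"{mean(values):.3f} ± {stdev(values):.3f}"


# Toy example with two hypothetical cases (not data from the study).
cases = [
    ([1, 1, 0, 0], [1, 0, 1, 0]),
    ([1, 1, 1, 0], [1, 1, 0, 0]),
]
dice_scores = [overlap_metrics(pred, truth)[0] for pred, truth in cases]
print(summarize(dice_scores))
```

Restricting `pred` and `truth` to the voxels of a given evaluation area before calling `overlap_metrics` would yield per-area scores analogous to the columns of the table. The per-case scores of two models could then be compared with a paired test such as the Wilcoxon signed-rank test named in the caption.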