Table 1 Training, validation and held-out test sets used in our study.

From: Data valuation for medical imaging using Shapley value and application to a large-scale chest X-ray dataset

Set                 | Sample size | Pneumonia cases, n (%)
Training set        | 2,000       | 200 (10.0%)
Validation set      | 500         | 249 (49.8%)
Held-out test set   | 610         | 306 (50.2%)

  1. All three sets were sampled from the training or validation splits of the ChestX-ray14 dataset⁹. The training set was used to run the data Shapley algorithm and compute a Shapley value for each training image, the validation set was used to compute the predictor performance score during this process, and the held-out test set was used to report the final results. Because pneumonia labels are highly imbalanced in ChestX-ray14, we sampled pneumonia cases into the training set at a higher proportion than in the source dataset (10.0%) and sampled balanced validation and held-out test sets (49.8% and 50.2% pneumonia, respectively).
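To make the division of roles concrete (training points are the items being valued, the validation set supplies the performance score), here is a minimal sketch of data Shapley estimation. It assumes the truncated Monte Carlo (TMC) estimator of Ghorbani and Zou; this table does not state which variant, predictor, metric, truncation rule or number of permutations the study used. The names `tmc_shapley`, `val_accuracy`, `TRAIN` and `VAL`, the 1-nearest-neighbour predictor, the 0.5 empty-set baseline and the toy data are all hypothetical stand-ins for the real chest X-ray model and images.

```python
import random
from typing import Callable, List, Sequence, Tuple

Score = Callable[[Sequence[int]], float]


def tmc_shapley(n_points: int, score: Score, n_permutations: int = 200,
                tolerance: float = 0.0, seed: int = 0) -> List[float]:
    """Estimate one data Shapley value per training point.

    `score(subset)` must fit a predictor on the training points indexed by
    `subset` and return its performance on the validation set.
    """
    rng = random.Random(seed)
    full_score = score(list(range(n_points)))
    totals = [0.0] * n_points
    for _ in range(n_permutations):
        order = list(range(n_points))
        rng.shuffle(order)
        prefix: List[int] = []
        prev = score(prefix)  # performance with no training data
        for idx in order:
            # Truncation: once the prefix scores as well as the full set
            # (within `tolerance`), remaining marginal gains are taken as zero.
            if abs(full_score - prev) <= tolerance:
                break
            prefix.append(idx)
            new = score(prefix)
            totals[idx] += new - prev  # marginal contribution of point idx
            prev = new
    # Average the marginal contributions over all sampled permutations.
    return [t / n_permutations for t in totals]


# Hypothetical toy data: (feature, label), label 1 = pneumonia.
# Training point 4 is deliberately mislabelled.
TRAIN: List[Tuple[float, int]] = [(0.0, 0), (0.3, 0), (1.0, 1), (1.3, 1), (0.12, 1)]
VAL: List[Tuple[float, int]] = [(0.1, 0), (0.25, 0), (1.1, 1), (1.2, 1)]


def val_accuracy(subset: Sequence[int]) -> float:
    """Validation accuracy of a 1-nearest-neighbour classifier fit on `subset`."""
    if not subset:
        return 0.5  # assumed chance-level score on a balanced validation set
    correct = 0
    for x, y in VAL:
        # Break distance ties by index so the score depends only on the set.
        nearest = min(subset, key=lambda i: (abs(TRAIN[i][0] - x), i))
        correct += TRAIN[nearest][1] == y
    return correct / len(VAL)


values = tmc_shapley(len(TRAIN), val_accuracy, n_permutations=200, seed=0)
for i, v in enumerate(values):
    print(f"training point {i}: {v:+.3f}")
```

With this toy data the mislabelled point 4 should receive a negative value, and because `tolerance=0.0` only truncates when the prefix already matches the full-set score, the values sum to the full-set validation accuracy minus the empty-set baseline (0.75 minus 0.5). Each permutation can require up to one model fit per training point, which is why truncation matters at the scale of a 2,000-image training set.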