Table 1. Mean absolute error (mean ± s.d.) when estimating population uniqueness (100 trials per population)

From: Estimating the success of re-identifications in incomplete datasets using generative models

 

| | MERNIS | USA | ADULT | HDV | MIDUS |
|---|---|---|---|---|---|
| Corpus | | | | | |
| n | 8,820,049 | 3,061,692 | 32,561 | 8,403 | 7,108 |
| c | 10 | 40 | 50 | 50 | 60 |
| [min Ξ, max Ξ] | [0.087, 0.844] | [0.000, 0.961] | [0.000, 0.794] | [0.002, 0.941] | [0.052, 0.944] |
| Sampling fraction | | | | | |
| 100% | 0.029 ± 0.019 | 0.028 ± 0.026 | 0.018 ± 0.016 | 0.006 ± 0.009 | 0.018 ± 0.014 |
| 10% | 0.030 ± 0.019 | 0.028 ± 0.016 | 0.022 ± 0.020 | 0.011 ± 0.009 | 0.035 ± 0.044 |
| 5% | 0.029 ± 0.019 | 0.027 ± 0.016 | 0.027 ± 0.023 | 0.015 ± 0.012 | 0.037 ± 0.055 |
| 1% | 0.029 ± 0.019 | 0.029 ± 0.015 | 0.027 ± 0.014 | 0.045 ± 0.050 | 0.055 ± 0.079 |
| 0.5% | 0.028 ± 0.019 | 0.029 ± 0.015 | 0.048 ± 0.039 | - | - |
| 0.1% | 0.026 ± 0.017 | 0.058 ± 0.037 | - | - | - |

Our model correctly estimates population uniqueness even when only a small to very small fraction of the population is available. n denotes the population size and c the corpus size (the total number of populations considered per corpus). We do not estimate population uniqueness when the sampled dataset contains fewer than 50 records; those cells carry no value.
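
The 50-record rule in the note can be checked with a little arithmetic. The sketch below is illustrative only and is not the authors' code. It assumes that a sample drawn at fraction f from a population of size n holds floor(n × f) records, which the source does not state explicitly. The names `POPULATION_SIZE`, `SAMPLING_FRACTIONS`, `expected_sample_size` and `skipped_cells` are hypothetical helpers introduced here.

```python
from fractions import Fraction

# Population sizes n, copied from the table.
POPULATION_SIZE = {
    "MERNIS": 8_820_049,
    "USA": 3_061_692,
    "ADULT": 32_561,
    "HDV": 8_403,
    "MIDUS": 7_108,
}

# Sampling fractions from the table, stored as exact rationals
# so that no floating-point rounding affects the comparison.
SAMPLING_FRACTIONS = {
    "100%": Fraction(1),
    "10%": Fraction(1, 10),
    "5%": Fraction(1, 20),
    "1%": Fraction(1, 100),
    "0.5%": Fraction(1, 200),
    "0.1%": Fraction(1, 1000),
}

MIN_RECORDS = 50  # threshold stated in the table note


def expected_sample_size(n, fraction):
    """Assumption (not from the source): the sample holds floor(n * fraction) records."""
    return int(n * fraction)


def skipped_cells():
    """List (dataset, fraction) pairs whose sample would hold fewer than MIN_RECORDS records."""
    return [
        (name, label)
        for label, fraction in SAMPLING_FRACTIONS.items()
        for name, n in POPULATION_SIZE.items()
        if expected_sample_size(n, fraction) < MIN_RECORDS
    ]


if __name__ == "__main__":
    for name, label in skipped_cells():
        n = POPULATION_SIZE[name]
        size = expected_sample_size(n, SAMPLING_FRACTIONS[label])
        print(f"{name} at {label}: about {size} records, below {MIN_RECORDS}")
```

Under this assumption, the flagged combinations are HDV and MIDUS at 0.5%, and ADULT, HDV and MIDUS at 0.1%. That is consistent with the two missing entries in the 0.5% row and the three missing entries in the 0.1% row.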