Table 1 Mean absolute error (mean ± s.d.) when estimating population uniqueness (100 trials per population)

		MERNIS	USA	ADULT	HDV	MIDUS
Corpus	n	8,820,049	3,061,692	32,561	8403	7108
	c	10	40	50	50	60
	[min Ξ, max Ξ]	[0.087, 0.844]	[0.000, 0.961]	[0.000, 0.794]	[0.002, 0.941]	[0.052, 0.944]
Sampling fraction	100%	0.029 ± 0.019	0.028 ± 0.026	0.018 ± 0.016	0.006 ± 0.009	0.018 ± 0.014
	10%	0.030 ± 0.019	0.028 ± 0.016	0.022 ± 0.020	0.011 ± 0.009	0.035 ± 0.044
	5%	0.029 ± 0.019	0.027 ± 0.016	0.027 ± 0.023	0.015 ± 0.012	0.037 ± 0.055
	1%	0.029 ± 0.019	0.029 ± 0.015	0.027 ± 0.014	0.045 ± 0.050	0.055 ± 0.079
	0.5%	0.028 ± 0.019	0.029 ± 0.015	0.048 ± 0.039	–	–
	0.1%	0.026 ± 0.017	0.058 ± 0.037	–	–	–

Our model correctly estimates population uniqueness even when only a small to very small fraction of the population is available. n denotes the population size and c the corpus size (the total number of populations considered per corpus). We do not estimate population uniqueness when the sampled dataset contains fewer than 50 records, which is why some cells in the 0.5% and 0.1% rows have no value.
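
The 50-record cutoff can be checked against the population sizes in the table with a short sketch. It assumes that the sample size is the population size n multiplied by the sampling fraction and rounded down; the rounding rule and all function names are illustrative assumptions, not the authors' code.

```python
# Population sizes n, copied from the table above.
POPULATION_SIZES = {
    "MERNIS": 8_820_049,
    "USA": 3_061_692,
    "ADULT": 32_561,
    "HDV": 8_403,
    "MIDUS": 7_108,
}

# Sampling fractions expressed in per-mille (tenths of a percent)
# so that the arithmetic stays in exact integers.
SAMPLING_PER_MILLE = {
    "100%": 1000,
    "10%": 100,
    "5%": 50,
    "1%": 10,
    "0.5%": 5,
    "0.1%": 1,
}

MIN_RECORDS = 50  # cutoff stated in the caption


def expected_sample_size(n, per_mille):
    """Assumed sample size: n times the sampling fraction, rounded down."""
    return n * per_mille // 1000


def is_estimated(n, per_mille):
    """True when the sample is large enough for uniqueness to be estimated."""
    return expected_sample_size(n, per_mille) >= MIN_RECORDS


def skipped_cells():
    """List (sampling fraction, corpus) pairs that fall below the cutoff."""
    return [
        (label, corpus)
        for label, per_mille in SAMPLING_PER_MILLE.items()
        for corpus, n in POPULATION_SIZES.items()
        if not is_estimated(n, per_mille)
    ]


if __name__ == "__main__":
    for label, corpus in skipped_cells():
        n = POPULATION_SIZES[corpus]
        size = expected_sample_size(n, SAMPLING_PER_MILLE[label])
        print(f"{corpus} at {label}: about {size} records, below {MIN_RECORDS}")
```

Under this assumption the skipped combinations are HDV and MIDUS at 0.5%, and ADULT, HDV and MIDUS at 0.1%, which matches the cells without a value in the last two rows of the table.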
