To the Editor:

I have read with interest the paper entitled “Longitudinal changes in the frequency of mosaic chromosome Y loss in peripheral blood cells of aging men varies profoundly between individuals” by Danielsson et al. [1]. Aneuploidy of the sex chromosomes in human leukocytes increases with age and involves a small proportion of cells in normal individuals [2, 3]. A longitudinal follow-up of loss of the Y chromosome (LOY), as reported by Danielsson et al., is of great interest because LOY has been found in leukaemic or preleukaemic male patients, in whom a 45,X0 clone can represent up to 90% of the cell population (clonal hematopoiesis). LOY is also associated with increased mortality rates and with non-hematological malignancies [4, 5].

LOY can be quantified using either classical molecular cytogenetics [3] or genomics data (for instance, whole-genome sequencing (WGS) data or the mean intensity log-R ratio (mLRR) of micro-array probes) [6]. Several formulas have been proposed to translate mLRR into LOY(%). This has been done by relating mLRR to a direct readout of LOY for the same samples, such as qPCR or digital droplet PCR. I have recently proposed an empirical equation to estimate LOY(%), assuming that the fraction of cells carrying a Y chromosome in a sample (i.e., XY/(XY + X0)) would be directly related to 2^mLRR, so that LOY(%) would be proportional to 1 − 2^mLRR [7]. By analyzing published mLRR data, the equation LOY(%) = 100 × (1.8 × (1 − 2^mLRR) + 0.015) was obtained, which was corroborated with another dataset based on WGS [8].
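As a minimal sketch, the equation above can be written as a short Python function. It is read here with coefficients 1.8 and 0.015 and with 2 raised to the power mLRR; the function name and the example mLRR values are hypothetical and serve only as an illustration.

```python
def loy_percent_veitia(mlrr: float) -> float:
    """LOY(%) = 100 * (1.8 * (1 - 2**mLRR) + 0.015), as read from the text."""
    return 100.0 * (1.8 * (1.0 - 2.0 ** mlrr) + 0.015)


# Hypothetical mLRR values, chosen only to show how the estimate behaves.
for mlrr in (0.0, -0.05, -0.10, -0.20):
    print(f"mLRR = {mlrr:+.2f} -> LOY = {loy_percent_veitia(mlrr):.1f}%")
```

Note that the constant term implies a non-zero estimate even at mLRR = 0, consistent with an empirical fit rather than a purely theoretical relation.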

Danielsson et al. [1] have proposed the empirical formula LOY(%) = 100 × (1 − 2^(2·mLRRY)) and compared it to Veitia’s. This analysis showed that the two formulas generate similar predictions of mosaicism for “low” levels of LOY. However, they find that at “higher levels of mosaicism, only the formula presented here asymptotically approaches the theoretical maximum of 100% mosaicism”. I fully agree that Veitia’s formula, which was obtained from a dataset of normal individuals, goes beyond 100% for abnormally high levels of mosaicism. This mathematical divergence can be easily seen in Danielsson’s Fig. 4A. The observed divergence with real data can be appreciated in Fig. 4B, where it mostly concerns a few data-points. As can be seen in most of the figures of Danielsson’s paper, such data-points are the exception rather than the rule. That said, my purpose here is to better define the LOY ranges where each formula can be applied. For this, I extracted the numerical values of Danielsson’s Fig. 2A and B using the software WebPlotDigitizer (https://automeris.io/WebPlotDigitizer/) for samples studied with SNP-array and either WGS (Fig. 2A) or ddPCR (Fig. 2B). Data extraction produced 25 data points for the WGS dataset and 71 for ddPCR, which were perfectly superimposable on the original figures of the paper. After applying Danielsson’s and Veitia’s formulas, the estimates according to Danielsson’s formula were lower than those according to Veitia’s for all data points. However, this does not tell whether one formula performs better than the other. Next, I computed the sum of the squares of the deviations between the values predicted by each formula and the observed ones (i.e., by WGS or ddPCR). This provides a measure of the goodness of fit. For the WGS data, Veitia’s formula produced a lower sum of squares up to 50% of LOY. The better fit of Veitia’s formula extended to 66% of LOY according to the ddPCR data (Online Fig. 1A and B).
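The comparison described above can be sketched in Python as follows. The two formulas are written as read from the text (with 2 raised to the power mLRR, or to twice mLRR). The (mLRR, observed LOY) pairs are HYPOTHETICAL placeholders, not the digitized values, and all function names are assumptions introduced for illustration.

```python
def loy_veitia(mlrr: float) -> float:
    """Veitia's formula as read from the text."""
    return 100.0 * (1.8 * (1.0 - 2.0 ** mlrr) + 0.015)


def loy_danielsson(mlrr: float) -> float:
    """Danielsson's formula as read from the text."""
    return 100.0 * (1.0 - 2.0 ** (2.0 * mlrr))


def sum_sq_dev(predicted, observed):
    """Sum of squared deviations between predicted and observed values."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


# HYPOTHETICAL (mLRR, observed LOY %) pairs; replace with real measurements.
samples = [(-0.02, 3.0), (-0.10, 13.0), (-0.30, 34.0), (-0.60, 57.0), (-1.50, 88.0)]


def sse_up_to(formula, cutoff):
    """Sum of squares restricted to samples whose observed LOY is <= cutoff."""
    kept = [(m, o) for m, o in samples if o <= cutoff]
    return sum_sq_dev([formula(m) for m, _ in kept], [o for _, o in kept])


for cutoff in (25, 50, 100):
    print(cutoff, sse_up_to(loy_veitia, cutoff), sse_up_to(loy_danielsson, cutoff))
```

Writing x = 2^mLRR, the difference between the two estimates is 100 × (x² − 1.8x + 0.815), a quadratic with a negative discriminant, so Veitia's estimate exceeds Danielsson's for any mLRR. This agrees with the observation that Danielsson's estimates were lower for all data points.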

I also compared the estimates of LOY(%) using microarray data with those previously published based on molecular cytogenetics (from Fig. 2A of reference [7]). Specifically, I calculated the arithmetic mean values of LOY(%) for age bins ranging from 42 to 93 years using Danielsson’s and Veitia’s formulas over more than 6000 control individuals. Online Figure 1C shows a significant correlation between the LOY(%) estimates provided by each formula and those observed using molecular cytogenetics for matching ages. However, as expected, application of Danielsson’s formula led to values of LOY(%) systematically lower than those observed using molecular cytogenetics, suggesting again that Veitia’s works better for the low LOY(%) values observed in the normal population.

Danielsson’s Fig. 1C gives the impression that every progressor history is specific. However, there is an underlying common theme. LOY progression can be described by the logistic function that ranges from 0 to 100% [9]:

$$LOY(\%) = \frac{100\,LOY_0}{LOY_0 + \left(100 - LOY_0\right)e^{-kt}}$$

where LOY0 is LOY “at birth” and k is a rate constant. This equation can be linearized as shown in reference [10], which allows k and LOY0 to be determined from two or more data-points for each individual. Using the data I could unambiguously extract from Fig. 1C of Danielsson et al. (38 progressors), this analysis yielded an average k value of 0.16 ± 0.10 year⁻¹, a variability that can be attributed to technical issues but also to inter-individual differences. As shown by Veitia, 2019 [10], a plot of Δ(100/LOY − 1)/Δt versus 100/LOY − 1 is a straight line whose slope gives k (= 0.18 year⁻¹ according to Online Fig. 2). A correction for LOY values < 50% according to Veitia’s formula yields k = 0.15 year⁻¹. Regarding LOY0, it is well below 1% in most cases. However, for 3 progressors, values ranging from 1.5 to 7.7% were observed. For these individuals, k values were below the average, although they were not flagged as outliers by Grubbs’ test at p < 0.05 (https://www.graphpad.com/quickcalcs/grubbs1/).
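To illustrate how k and LOY0 can be recovered from a few time points, note that the logistic equation rearranges to 100/LOY − 1 = ((100 − LOY0)/LOY0) × e^(−kt), so that ln(100/LOY − 1) is linear in t with slope −k. The Python sketch below uses this log-linear form, which is one possible linearization and may differ in detail from the procedure of reference [10]. The function names and the synthetic values (ages, LOY0 = 0.001%) are hypothetical; k = 0.16 is borrowed from the text only to generate example data.

```python
import math


def logistic_loy(t, loy0, k):
    """Logistic LOY(%) at time t (years), given LOY0 (%) and rate constant k."""
    return 100.0 * loy0 / (loy0 + (100.0 - loy0) * math.exp(-k * t))


def fit_k_loy0(ages, loys):
    """Least-squares line through (age, ln(100/LOY - 1)); returns (k, LOY0).

    Requires at least two distinct ages and 0 < LOY < 100 for every point.
    """
    ys = [math.log(100.0 / loy - 1.0) for loy in loys]
    n = len(ages)
    mean_t = sum(ages) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ages)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ages, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    k = -slope  # the fitted line has slope -k
    loy0 = 100.0 / (1.0 + math.exp(intercept))  # intercept = ln(100/LOY0 - 1)
    return k, loy0


# Synthetic, noise-free example: hypothetical ages and parameters.
ages = [65.0, 70.0, 75.0]
loys = [logistic_loy(t, loy0=0.001, k=0.16) for t in ages]
k_est, loy0_est = fit_k_loy0(ages, loys)
print(k_est, loy0_est)
```

With real, noisy measurements the estimates would scatter around the true values, which is one way technical variability could contribute to the spread in k.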

These analyses suggest that, as a rule of thumb, Veitia’s formula should be used for LOY below 50%. In turn, Danielsson’s should be used when dealing with abnormal cases where LOY can soar well beyond 50%. The second part of the analysis shows that progressors display a logistic expansion of X0 cells. For some of them, the process could in principle start early in life, but this requires further experimental confirmation.