Introduction

The brain shows major changes over the course of aging. It is not fully understood how neurodegenerative diseases affect brain regions and networks that are also affected by normal aging. However, increasing evidence suggests that neural systems vulnerable to age are also vulnerable to Alzheimer’s disease (AD) and other neurodegenerative diseases1. Recent availability of large-scale neuroimaging datasets has facilitated the application of machine learning techniques and enabled development of models that can predict behavior and characteristics of brain structure and function known to change with age2,3,4,5,6,7,8,9,10. We investigated whether predicted brain age may be a relevant biomarker of neurodegenerative disease2, inasmuch as disease may cause deviations from normal aging trajectories, and the factors that influence these deviations may be studied. As an example, brain age predictive models using data from structural magnetic resonance imaging (MRI) have shown accelerated biological aging in individuals who develop AD dementia11,12,13,14. Similar phenomena are already apparent in others who have mild cognitive impairment (MCI) that progresses to dementia12,15. Such inter-individual differences between predicted biological and chronological age have been studied in relation to lifestyle variables16,17,18,19 and to genetic determinants14,17,20. It is currently unknown, however, whether accelerated brain aging precedes evidence of cognitive decline, and whether it can be detected in the pre-clinical phase of AD.

The dementia of AD is characterized by progressive cognitive decline that becomes sufficient to impair activities of daily living21. Prior work has shown that brain changes characteristic of an AD process can be demonstrated two or three decades before symptom onset22,23. Typically, this sequence begins with the accumulation of cerebral beta-amyloid (Aβ), followed by the deposits of hyperphosphorylated tau (neurofibrillary tangles), metabolic brain alterations, and other evidence of neurodegeneration that precede cognitive and functional symptoms22,24. Functional brain alterations revealed by MRI measures of resting state connectivity (rs-fMRI) become detectable almost synchronously with Aβ and tau measured by positron emission tomography (PET) and are therefore evident several years before atrophy can be detected by structural MRI25,26. Conjunction of such functional and biological changes appears to extend throughout the development of AD from its pre-clinical to its dementia stages24. These findings suggest that MRI measures of resting state functional connectivity may be a more sensitive modality than structural imaging for detection of brain changes in pre-clinical AD.

AD dementia symptoms appear only after massive, evidently irreversible brain changes. Therefore, a more promising approach, at least in theory, is to prevent such changes. However, AD prevention requires improved understanding of the pre-clinical phase of AD27. Identification of individuals in this clinically silent phase of the disease is challenging because it is mostly unknown who will develop dementia during the lifespan. One way to circumvent this problem is the study of autosomal dominant AD (ADAD), a group of rare genetically determined variants of AD caused by mutations in the amyloid precursor protein (APP), presenilin 1 (PSEN1) or presenilin 2 (PSEN2) genes, all involved in Aβ production22,28. Because these mutations are fully penetrant, progression to disease is predictable, making ADAD an ideal model for the study of the pre-clinical (i.e., pre-symptomatic) phase of AD.

Although it is impossible to determine with certainty who will develop dementia due to sporadic AD (sAD), some factors are known to increase the risk of its development. Prominent among these is the ε4 allele at the polymorphic APOE locus that encodes apolipoprotein E, known to be involved in Aβ clearance28,29. More broadly, a strong family history of sAD dementia has also been associated with a 2- to 4-fold increased incidence of dementia30,31. Individuals whose brains show Aβ pathology are also known to experience brain changes and related cognitive decline over time32,33. Thus, asymptomatic individuals can be classified as being in the pre-clinical phase of the disease if they have Aβ pathology27. Likewise, their risk of dementia is increased if they carry an APOE ε4 allele or other known genetic risk factor and/or if they have a strong family history of the disease34,35. Here, we tested whether individuals in the pre-clinical phase of ADAD, or at risk of pre-clinical sAD, show evidence of accelerated brain aging prior to the symptoms predicted by their genetic risk or Aβ status.

We studied 1624 cognitively unimpaired participants between 18 and 94 years of age, recruited and scanned in different studies and centers. Within these, we developed a method that predicts brain age from rs-fMRI. We relied on measures of network integration and segregation, known as graph metrics36, to represent global brain functioning and developed a neural net. Briefly, we trained this model initially on a cohort of cognitively unimpaired individuals ranging in age from 18 to 90 years. We then validated its generalizability in another group of cognitively unimpaired individuals (ranging in age from 19 to 79 years) from another study/site. After such validation, we tested whether individuals with pre-clinical ADAD showed accelerated functional brain aging in comparison with their age-matched relatives without a causal mutation. Importantly, none of these latter participants had been involved in the development or validation of the brain age model. In these same individuals, we also tested whether Aβ pathology was a further predictor of brain age. Finally, in a cohort of asymptomatic individuals having a parental or other strong family history of sAD, we tested whether APOE ε4 and/or Aβ pathology were associated with predicted functional brain age.

Our results showed, first, that pre-symptomatic carriers of ADAD mutations (DIAN cohort) had evidence of accelerated functional brain aging. Importantly, this finding was stronger in individuals who already accumulated significant Aβ pathology as evidenced by PET imaging. In the cohort at elevated risk of sAD (PREVENT-AD cohort), neither APOE ε4 status nor PET evidence of Aβ pathology was associated with apparent accelerated brain aging but individuals closer to their parental age of onset tended to show accelerated brain aging. Secondary analyses in a third independent cohort including a small subset of individuals diagnosed with either sAD dementia or MCI (ADNI cohort) confirmed the expected acceleration in functional brain aging in patients vs. cognitively normal older adults, suggesting that functional brain age is accelerated in cognitively impaired individuals with sAD and therefore validating the sensitivity of our model to sporadic AD-related processes. We conclude that asymptomatic persons with strong genetic determinants show a characteristic pattern of functional brain changes that are associated with accelerated biological brain aging. The biological development of AD is therefore characterized by a pattern of advanced brain aging that can be detected prior to symptom onset, at least in individuals having rare genetic mutations that cause AD and significant Aβ pathology.

Results

Separation of the multisite data into training, validation, and test sets

We gathered rs-fMRI data from 1624 cognitively unimpaired participants between 18 and 94 years of age, provided by the Dominantly Inherited Alzheimer Network (DIAN), the Pre-symptomatic Evaluation of Experimental or Novel Treatments for Alzheimer’s Disease cohort (PREVENT-AD), the Cambridge Centre for Ageing and Neuroscience (CamCAN), the 1000-Functional Connectomes Project—Cambridge site (FCP-Cambridge), the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the International Consortium for Brain Mapping (ICBM) cohorts, to build a “brain age” predictive model (Table 1). Considering our focus on the pre-clinical phase of AD, individuals with mild cognitive impairment (MCI) or AD dementia were excluded from the main analyses. In secondary analyses, we nevertheless tested whether cognitively impaired individuals with sAD evidenced accelerated brain aging using our functional predictive model that was built solely on cognitively unimpaired individuals.

Table 1 Dataset characteristics.

After processing and quality control, 1340 cognitively unimpaired individuals remained for the analyses. These were divided into a training set of 773 persons (large multi-cohorts dataset covering the lifespan used to build the predictive models), a validation set (independent lifespan dataset of 46 persons from ICBM used to test the generalizability of the developed models and select the final model), and one multi-cohort test set (125 DIAN mutation carriers and 29 without a mutation, 256 PREVENT-AD individuals thought to be at enhanced genetic risk of sAD, 96 from CamCAN, and 15 cognitively normal individuals from ADNI). A harmonized pre-processing pipeline was applied to all individuals, and 26 graph metrics were chosen based on their ability to quantify whole-brain connectivity and extracted from each participant’s correlation matrix (see Material and Methods for details). Further details are shown in Table 1 and Fig. 1.

Fig. 1: Methodology overview.

a Multiple cohorts covering the lifespan were included in the study. They were separated into a training and validation set, both used to develop the predictive brain age model, and a test set in which our model was applied. b All participants underwent resting state functional magnetic resonance imaging that was processed with a uniform pipeline. Functional connectivity matrices were generated from the Power atlas82, from which graph metrics were calculated. Graph metrics were the input in our brain age model, and thus all possible metrics were of interest. c The first step toward building the model was to rank the different graph metrics from the most to least related to aging in our training set, to determine an order of importance to our model inputs using both support vector machine and regression tree ensemble algorithms. Neural networks were then tested to identify the best brain age model. Different architectures were tested, and the model applied in the training set that best generalized to the validation set was chosen as the final model (see Fig. 2). d The model was applied to the left-out test set and our measure of interest was the predicted age difference (PAD). Mut−: mutation non-carriers, Mut+: mutation carriers, MRI: magnetic resonance imaging, PAD: predicted age difference.
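The step from a functional connectivity matrix to graph metrics (panel b) can be illustrated with a minimal sketch. It is not the authors' pipeline: the 4-node toy matrix, the 0.4 binarization threshold, and the helper names are assumptions made for illustration, and only two simple metrics (mean clustering coefficient and edge density) are shown rather than the 26 used in the study.

```python
from itertools import combinations


def binarize(corr, threshold):
    """Binary adjacency matrix: an edge wherever correlation > threshold (no self-loops)."""
    n = len(corr)
    return [[1 if i != j and corr[i][j] > threshold else 0 for j in range(n)]
            for i in range(n)]


def clustering_coefficient(adj):
    """Mean local clustering coefficient of an undirected binary graph."""
    n = len(adj)
    coeffs = []
    for i in range(n):
        neighbors = [j for j in range(n) if adj[i][j]]
        k = len(neighbors)
        if k < 2:
            coeffs.append(0.0)  # convention: nodes with fewer than 2 neighbors count as 0
            continue
        # Number of edges that exist among the neighbors of node i
        links = sum(adj[u][v] for u, v in combinations(neighbors, 2))
        coeffs.append(2.0 * links / (k * (k - 1)))
    return sum(coeffs) / n


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    return 2.0 * edges / (n * (n - 1))


# Hypothetical 4-region correlation matrix (illustrative values only)
toy_corr = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.6, 0.2],
    [0.7, 0.6, 1.0, 0.5],
    [0.1, 0.2, 0.5, 1.0],
]
toy_adj = binarize(toy_corr, threshold=0.4)
toy_clustering = clustering_coefficient(toy_adj)
toy_density = density(toy_adj)
```

Each participant would contribute one such vector of whole-brain metrics, which then serves as the input to the age-prediction models.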

Feature ranking as a step for reducing the number of features in the final model

First, to reduce the number of inputs of the model, we searched for graph metrics that most reliably predicted chronological age10. To do so, training set data was entered in parallel in support vector machine (SVM) and regression tree ensemble models to identify graph metrics with highest weights. The root mean squared error (rmse) for predicted chronological age in SVM and the tree ensemble were 16.45 and 16.08, respectively. Graph metrics were then ranked separately by order of SVM weights and ensemble model importance (i.e., highest load corresponding to the most important). We used the average rank from both models to determine the overall importance of each metric, as presented in Fig. 2a. Feature rank determined which metrics would be used as input into the subsequent neural network models, to build our predictive brain age model.
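The rank-averaging step described above can be sketched as follows. This is an illustration under stated assumptions: the metric names and the weight and importance values are hypothetical, and the tie-breaking rule (original order, then name) is a choice made here, not something reported in the text.

```python
def ranks_descending(scores):
    """Rank 1 = largest absolute score; ties keep their original order."""
    order = sorted(range(len(scores)), key=lambda i: -abs(scores[i]))
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def average_rank(svm_weights, tree_importance):
    """Average the per-model ranks of each feature."""
    svm_ranks = ranks_descending(svm_weights)
    tree_ranks = ranks_descending(tree_importance)
    return [(a + b) / 2 for a, b in zip(svm_ranks, tree_ranks)]


# Hypothetical graph metrics with made-up SVM weights and tree importances
metric_names = ["metric_a", "metric_b", "metric_c", "metric_d"]
svm_weights = [0.9, -0.2, 0.5, 0.1]
tree_importance = [0.7, 0.1, 0.8, 0.3]

avg = average_rank(svm_weights, tree_importance)
# Lower average rank = more important; ties are resolved alphabetically here
ordered = [name for _, name in sorted(zip(avg, metric_names))]
```

Taking the first k entries of `ordered` gives the k inputs of a candidate network, so that larger models always contain the features of smaller ones.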

Fig. 2: Features ranking and neural networks performance.

a Scatter plots of SVM model weights (y-axis) and ensemble tree feature importance (x-axis). Model weights are absolute values, normalized such that 1 indicates highest importance. Numbers next to data points indicate their rank (i.e., 1 = highest average rank between both SVM and ensemble models; orange dots correspond to the top 10 features, blue dots represent lower-ranked features). b Root mean square error of different neural network models with inputs sorted according to rank for the training set (left), and the validation set (middle). Values were averaged over 3 iterations of the models. Neural networks trained with randomly-ranked inputs served as our null models (right). The x-axis indicates the number of inputs into the model (number of graph metrics) while the y-axis indicates the network architecture. For example, 5 means 1 hidden layer with 5 units, 5 2 means 2 hidden layers, the first one with 5 units and the second with 2 units. Darker (blue) colors indicate higher accuracy, while lighter (yellow) colors indicate lower accuracy. The red square identifying the model that provides the best generalizability in the validation set (lowest rmse) contains 2 hidden layers of 5 and 2 units, and uses the 10 highest-ranked graph metrics as input. The same neural network trained on randomly-ranked inputs (null model, gray square) provides lower accuracy. c Brain age model performance across datasets. Correlations between chronological age (x-axis) and age predicted by the neural network (y-axis) are represented for the training (n = 773), validation (n = 46) and test (n = 521) sets. Statistical values (c) were obtained from Pearson’s correlations (two-sided test, with no adjustment). Source data are provided as a Source data file. SVM: support vector machine, rmse: root mean square error, mae: mean absolute error.

Building the Brain Age model and improving its generalizability

We chose the optimal neural net architecture after having built different neural networks with increasing complexity, varying in number of input features (5, 10, 15, 20, or 25 most-important graph metrics, ranked as described previously), hidden layers, and hidden layer units. Importantly, each graph metric was only entered once as input for each neural network architecture tested, and the inputs were kept constant across the model’s iterations, such that features of more complex models always included the features of the simpler ones. We used an average of three determinations of rmse to assess the performance for each model. The different neural networks were applied separately in the training and validation set (Fig. 2b). To test the relevance of the metrics’ ranking, we also assessed the performance of neural nets on the training data when including the metrics randomly (null model, see Fig. 2b, right panel) and compared it to the models created based on ranked metrics (Fig. 2b, left panel). This null model suggested that the neural network performed better when features were ordered using SVM weights and ensemble feature importance, at least in simpler models.

To select the optimal neural network architecture for our brain age model, we generated the different models using the training set and evaluated which of these provided the best generalizability to the validation set (i.e., avoiding overfitting). Thus, the validation set helped us to determine the best balance between improved age prediction and good generalizability. In general, increasing model complexity (more features and hidden layers/units) led to better performance in the training set (Fig. 2b, left panel). However, as expected, too much complexity resulted in overfitting, as evidenced by improved performance in the training set accompanied by poorer fit in the validation set (Fig. 2b, middle panel). The model that produced the lowest rmse in the validation set (averaged rmse over 3 iterations = 13.89) had 10 inputs (i.e., the 10 most important metrics, see Fig. 2a) and 2 hidden layers (5 units in the first layer and 2 in the second). The performance of this specific model was similar to that obtained in the training set (averaged rmse over 3 iterations = 13.75). This model was thus applied to the remaining unseen data (test set) to test whether genetics or AD pathology accelerated apparent functional brain aging. Subgraph centrality, clustering/modularity coefficients, and small-worldness were among the selected graph metrics (10 first metrics Fig. 2a). For reference, the covariance matrix of the 10 selected graph metrics is presented as Supplementary material (Supplementary Fig. 1).
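The selection rule described here (lowest validation rmse, averaged over three runs) can be sketched as below. The rmse values and the candidate grid are hypothetical placeholders, not the study's results; only the selection logic is illustrated.

```python
from statistics import mean


def select_model(validation_rmse):
    """Return the (n_inputs, hidden_layers) key with the lowest mean validation rmse."""
    averaged = {key: mean(runs) for key, runs in validation_rmse.items()}
    best = min(averaged, key=averaged.get)
    return best, averaged[best]


# Keys: (number of ranked inputs, units per hidden layer).
# Values: hypothetical validation rmse from three training runs.
validation_rmse = {
    (5, (5,)): [15.2, 15.0, 15.4],
    (10, (5, 2)): [13.9, 13.8, 14.0],
    (20, (10, 5)): [14.9, 15.3, 15.1],
}

best_key, best_rmse = select_model(validation_rmse)
```

Selecting on the validation set rather than the training set is what guards against picking an overfitted architecture, since training error keeps decreasing as complexity grows.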

Performance of the final brain age model

We show the association between chronological age and the model-predicted brain age for each dataset in Fig. 2c. As expected, predicted age was correlated with chronological age in the training set (R² = 0.53, p < 0.0001; rmse = 14.01, mean absolute error [mae] = 11.00; Fig. 2c left) and the validation set (R² = 0.49, p < 0.0001; rmse = 13.84; mae = 11.90; Fig. 2c middle). Of note, the neural net model outperformed the simpler models used in our feature ranking step (rmse = 16.45 for SVM and 16.08 for tree ensemble, see above). Importantly, the model was able to predict chronological age from functional brain properties in the test set (R² = 0.36, p < 0.0001; rmse = 13.24; mae = 11.58; Fig. 2c right). Notably, the same was true when restricting the analyses to the CamCAN cohort, considered as a lifespan dataset representative of healthy aging (R² = 0.26; p < 0.001; rmse = 16.70; mae = 14.32).
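For readers less familiar with these performance measures, the following sketch computes rmse, mae, and the squared Pearson correlation on a toy example. The ages are invented for illustration, and treating R² as the squared Pearson correlation is an assumption based on the statement that statistical values were obtained from Pearson's correlations.

```python
from math import sqrt


def rmse(chronological, predicted):
    """Root mean squared error between predicted and chronological age."""
    n = len(chronological)
    return sqrt(sum((p - c) ** 2 for c, p in zip(chronological, predicted)) / n)


def mae(chronological, predicted):
    """Mean absolute error between predicted and chronological age."""
    n = len(chronological)
    return sum(abs(p - c) for c, p in zip(chronological, predicted)) / n


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented ages: younger participants overestimated, older ones underestimated
toy_chronological = [20, 40, 60, 80]
toy_predicted = [30, 45, 55, 70]

toy_rmse = rmse(toy_chronological, toy_predicted)
toy_mae = mae(toy_chronological, toy_predicted)
toy_r2 = pearson_r(toy_chronological, toy_predicted) ** 2
```

The toy data show that a high correlation can coexist with a sizeable error, which is why both kinds of measure are reported.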

Functional brain aging and pre-clinical Alzheimer’s disease

To assess the characteristics of functional brain aging in pre-clinical AD and evaluate whether genetic determinants/risk and Aβ pathology were related to accelerated brain aging, we calculated the predicted age difference or PAD (Fig. 1d). This was computed as predicted brain age minus chronological age for each participant in the test set3. PAD deviation from zero should not be interpreted in isolation due to the potential existence of site/cohort effects, and we, therefore, interpret group comparisons only within cohorts.
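As a minimal sketch of the PAD computation, assuming invented ages and group labels, the difference can be computed per participant and summarized within a single cohort. The group means shown here are raw and not age-adjusted, so they do not reproduce the general linear models used in the analyses.

```python
from statistics import mean


def predicted_age_difference(predicted, chronological):
    """PAD = predicted brain age minus chronological age, per participant."""
    return [p - c for p, c in zip(predicted, chronological)]


def mean_pad_by_group(pad, groups):
    """Raw (not age-adjusted) mean PAD for each group label."""
    by_group = {}
    for value, group in zip(pad, groups):
        by_group.setdefault(group, []).append(value)
    return {group: mean(values) for group, values in by_group.items()}


# Invented participants from one hypothetical cohort
chronological = [30, 35, 40, 45]
predicted = [38, 33, 49, 41]
groups = ["carrier", "non-carrier", "carrier", "non-carrier"]

pad = predicted_age_difference(predicted, chronological)
group_means = mean_pad_by_group(pad, groups)
```

A positive PAD indicates a brain that appears older than the participant's chronological age; comparing groups within one cohort avoids confounding by site effects.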

Analyses considered DIAN (Fig. 3a–d) and PREVENT-AD (Fig. 3e, f) participants from the test set (Table 2). We tested whether genes predisposing to AD, either ADAD mutations or the broader familial risk of sAD, were associated with accelerated brain aging. To do so, we compared PAD between mutation carriers vs non-carriers from DIAN, and APOE ε4 carriers vs non-carriers from PREVENT-AD. Considering the tendency of the model to overestimate younger ages and underestimate older ages, all subsequent analyses were controlled for chronological age (see ref. 14 for a similar procedure). The model’s prediction in DIAN mutation carriers overestimated their chronological age (i.e., positive PAD = 8.19 years) in contrast to mutation non-carriers (i.e., negative PAD = −3.54 years; F1,152 = 4.88, p = 0.03; Table 3 and Fig. 3a, b). Overall, the predicted age in the PREVENT-AD cohort overestimated the chronological age by ~5 years (Fig. 3e), but APOE ε4 status was not associated with differences in this PAD (F1,253 < 1; p = 0.49, Table 3 and Fig. 3f).

Fig. 3: Predicted age difference in DIAN and PREVENT-AD.

Density plot of chronological age vs predicted age in the test set participants in DIAN (n = 154) (a). Brain age is overestimated in autosomal dominant mutation carriers (n = 125) compared to non-carriers (n = 29) (b). The overestimation in mutation carriers is in part due to Aβ, with a difference between mutation non-carriers (n = 29) and Aβ+ mutation carriers (n = 39) only (Aβ− mutation carriers [n = 75] did not differ from the other groups) (c), and an association between Aβ load and predicted age difference across the whole cohort (n = 154) (d). Light (yellow) colors represent DIAN mutation non-carriers and darker (orange) colors represent DIAN mutation carriers. Density plot of chronological age vs predicted age in the test set participants in PREVENT-AD (n = 256) (e). In individuals at risk of sporadic Alzheimer’s disease, brain age is overestimated irrespective of APOE ε4 genotype (f). Light (salmon) colors represent PREVENT-AD APOE ε4 non-carriers (n = 147) and darker (dark orange) colors represent PREVENT-AD APOE ε4 carriers (n = 108). For b, c and f the interquartile range (25th Percentile, Median and 75th Percentile), the whiskers (lines indicating variability outside the upper and lower quartiles) and the individual dots are presented. For d, shaded (gray) area represents confidence intervals (95%). Statistical values were obtained from general linear models (b, c, f) or partial Pearson’s correlations (d), controlling for chronological age, without further adjustment (two-sided tests). Aβ: beta-amyloid, Aβ−: amyloid-negative, Aβ+: amyloid-positive; APOE4: apolipoprotein E4, PIB: Pittsburgh compound B, SUVR: standardized uptake value ratio. Source data are provided as a Source data file.

Table 2 DIAN and PREVENT-AD test set characteristics.
Table 3 Model’s prediction according to the presence of genetic mutation/risk and Aβ pathology in DIAN and PREVENT-AD cohorts (test set only).

Given the importance of Aβ deposition in the cascade of events leading to AD dementia, we investigated whether Aβ burden is related to functional brain aging. We assessed the effect of Aβ deposition, measured by PET, on the PAD in both the DIAN and PREVENT-AD cohorts. Aβ-PET was acquired using 11C-PIB in DIAN and 18F-NAV4694 in PREVENT-AD, and Aβ burden was determined for each cohort according to their own processing pipelines and methods (see Methods section for Aβ measurements details). We assessed the influence of Aβ burden on functional brain aging by comparing Aβ-positive to Aβ-negative individuals. We also explored the possible influence of Aβ as a continuous variable by assessing the partial correlation between PAD and Aβ load. All analyses were controlled for chronological age.
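The partial correlation between PAD and Aβ load, controlling for chronological age, can be sketched with the standard first-order formula. The data below are invented, and the function names are illustrative; the authors' statistical software is not specified in this section.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Sanity check: when the covariate is uncorrelated with both variables,
# the partial correlation equals the plain correlation.
x_check = [1, 2, 3, 4]
y_check = [2, 1, 4, 3]
z_check = [1, -1, -1, 1]
r_check = partial_correlation(x_check, y_check, z_check)

# Invented example: PAD vs amyloid load, controlling for age
toy_age = [30, 35, 40, 45, 50]
toy_amyloid = [1.0, 1.1, 1.4, 1.3, 1.8]
toy_pad = [2, -1, 5, 3, 9]
r_pad_amyloid = partial_correlation(toy_pad, toy_amyloid, toy_age)
```

Controlling for age matters here because both PAD (through the model's age bias) and Aβ load tend to vary with age, which could otherwise produce a spurious association.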

In DIAN, we found a grading effect of (quasi-continuous) Aβ on PAD. Higher PAD was observed in Aβ-positive mutation carriers when compared to the group of non-carriers (F1,65 = 6.9, p = 0.02; Fig. 3c). However, the PAD in Aβ-negative carriers compared to non-carriers was only marginally higher (F1,101 = 2.73, p = 0.10; Fig. 3c). There were no significant differences between DIAN Aβ-positive and Aβ-negative mutation carriers (F1,111 = 1.93, p = 0.17; Fig. 3c, Table 3). Partial Pearson correlations showed that accelerated brain age was associated with increased fibrillar Aβ load in the entire DIAN cohort (r140 = 0.18, p = 0.04; Fig. 3d), a finding that was no longer significant when the analysis was restricted to mutation carriers (r111 = 0.14, p = 0.14).

In PREVENT-AD, among the 64 individuals who underwent Aβ-PET imaging (test set only), 50 were Aβ-negative and 14 were Aβ-positive (Table 3). The PAD was not associated with Aβ burden, either when looking at Aβ-positivity (F1,61 < 1; p = 0.33) or the influence of Aβ load (r61 = 0.12; p = 0.35). Adding the delay between PET and rs-fMRI assessments as a covariate provided similar results (F1,60 < 1; p = 0.36 using Aβ-status and r61 = 0.12; p = 0.37 for the partial correlation with Aβ load).

In supplementary analyses, we explored the association between PAD and estimated years to symptom onset (EYO). EYO has been widely used as an estimate of disease progression in DIAN22,37, and it has been associated with amyloid pathology in individuals having a parental history of sporadic AD38,39. This index, calculated as the difference between parental age at symptom onset and participant’s chronological age, estimates each individual’s proximity to symptom onset (see Supplementary Methods for details). A weak but positive association was found between EYO and PAD in PREVENT-AD (r = 0.13, p = 0.05), such that individuals who had higher PAD also tended to be closer to their expected age of onset. No such association was found in DIAN mutation carriers (r = −0.13, p = 0.16), or non-carriers (r = −0.23, p = 0.23).

Finally, we performed additional post hoc analyses to test whether sAD symptomatic individuals (MCI and dementia) had a higher PAD than asymptomatic individuals at risk of sAD (APOE ε4 carriers). This analysis was not initially planned, and was conducted only in a small subsample of the ADNI dataset (15 asymptomatic APOE ε4 carriers from the test set and 100 symptomatic individuals). The findings do suggest, as expected, increased PAD among individuals with cognitive impairment as compared with asymptomatic individuals at risk of sAD (using parametric, F1,112 = 2.85, p = 0.047, or non-parametric Mann–Whitney-U = 965, p = 0.04, one-tailed test).

Discussion

Variation in notional biological aging has been proposed to account for inter-individual differences in the way people age40. Combined with larger and more available datasets, machine learning methods can improve our understanding of brain function and our ability to predict health trajectories from brain properties. Previous models of brain aging have been informed primarily by characteristics of brain structure41. Accelerated structural brain aging has been found in individuals with MCI and AD dementia12,14,15. However, functional brain abnormalities are generally detectable prior to structural changes in the AD continuum, the latter being typically more proximate to the expression of clinical symptoms25,42,43. Here we developed a model that could evidently predict brain age across the entire adult human lifespan (ages 18–94). This model relied on topological properties of graphs constructed from rs-fMRI and demonstrates the feasibility of predicting brain age from rs-fMRI using global measures of network integration and segregation6,36. Applying our predictive functional model to ADAD in the DIAN cohort, we observed that brain aging was apparently accelerated in individuals with pre-clinical ADAD. This association was especially clear in individuals who had PET evidence of Aβ deposition. Among individuals at elevated risk of sAD (PREVENT-AD cohort), neither APOE ε4 nor Aβ was associated with accelerated brain aging. However, asymptomatic individuals who were closer to their expected age of symptom onset tended to show accelerated brain aging. The latter observation was corroborated by observations that symptomatic individuals with sAD showed accelerated brain aging when compared to asymptomatic individuals at risk (ADNI cohort, secondary analyses).

We developed the described model in participants from different cohorts and sites, and validated its generalizability in an independent monocentric dataset. While there is undoubtedly a cost to (internal) accuracy when optimizing model (external) generalizability, this external validation step is a major strength of this work. While modest in size, the validation set represented a completely independent dataset that covers the entire adult lifespan. Although we cannot exclude the possibility that a larger multicenter validation cohort might have led to selection of a slightly different network architecture, we note that our model’s rmse was very similar between the validation and the test sets. Importantly, the test set was never used in the development/validation of the brain aging model. Also, the model was not modified any further after the hypotheses were tested, i.e., hypotheses were only tested once using a model that appeared (from development and validation work) to be optimal. This approach ensured that our results regarding brain aging in pre-symptomatic AD were independent of the way the model was built.

To assess information integration in the brain, we relied on global brain function while applying graph metrics6,36. This approach provides a holistic view of brain function that has been shown previously to change through aging and AD44. Graph theory has the advantage that it quantifies and simplifies the many “moving parts” of dynamic systems inasmuch as every connection is defined by its relation to all others. We also used feature selection as an intermediate step to simplify the final model. We suggest that graph theory and feature selection are steps in the right direction toward interpretability of complex models. We are encouraged that the 10 graph metrics suggested as most important by these algorithms provided much lower error in our final neural network model in comparison to random choice of graph metrics. Of note, models using individual functional connections as inputs are also possible, but such models have been shown to require multiple dozens45 or hundreds10 of functional connections whose inter-relationships are not defined.

Compared with structural predictive models, previous modeling approaches using rs-fMRI data have found higher error7,19,46,47. These observations could partly be attributable to known characteristics of rs-fMRI data. Such data are typically noisier and experience more dynamic changes than structural data, and they may be more sensitive to multi-site effects. Despite these difficulties, we attempted to derive our brain age model from rs-fMRI because this modality appears better suited to study of the pre-clinical phase of AD. An extensive literature suggests that connectivity disruption appears early in the course of sAD as well as in “normal” aging48,49,50,51,52,53. Moreover, training, validating, and testing our predictive model across multiple cohorts also increased the error of our model compared to the previous studies3,7,11,13,14. Yet, inclusion of data from different sites should logically improve the generalizability of the model, a key strength when the model is applied to new data from different cohorts7,54. Finally and importantly, brain age models tend to overestimate younger ages and underestimate older ages19,55. While some researchers apply an age-bias correction procedure to their model19, we report the non-adjusted model and used chronological age as a nuisance variable in our PAD analyses instead of applying this correction prior to the PAD calculation. In sum, while we recognize that the error of our model is higher than most previous brain age models, it was derived from rs-fMRI data, no age-bias correction was applied to test the model accuracy, and we suggest that it is more generalizable than previous models. Crucially, it also appears to be sensitive to the questions of interest here.

Applying our model in the context of AD, we found evidence of accelerated functional brain aging in individuals in the pre-clinical phase of dominantly inherited AD. ADAD is widely believed to be a disease caused by overproduction of Aβ, and studies of ADAD have shown that biomarkers such as CSF-Aβ, start changing in mutation carriers as early as 25 years before symptom onset22. This is followed by the accumulation of fibrillar Aβ deposition as measured by PET imaging, alongside changes in concentrations of tau in the CSF and cerebral atrophy. Later changes include glucose hypometabolism and episodic memory decline and global cognitive decline22. Studies employing rs-fMRI in this disease are relatively rare, however, and have considered the entire ADAD spectrum48,56,57,58. One of these studies compared asymptomatic mutation carriers and non-carriers and suggested reduction in Default Mode Network functional connectivity among asymptomatic carriers56. This finding is concordant with literature on sAD suggesting that change in rs-fMRI is one of the earliest biomarkers of the disease25,26. Our rs-fMRI predictive model implied that functional brain age of ADAD pre-symptomatic mutation carriers (DIAN) exceeded their chronological age by about 10 years (based on the findings in non-carriers). This observation alone suggests that the pre-symptomatic phase of ADAD is accompanied by accelerated brain aging. The relative importance of Aβ burden on accelerated brain aging was less clear. While no association was found between Aβ burden when restricted to DIAN mutation carriers, the difference between mutation carriers and non-carriers was stronger (i.e., significant only) in those with fibrillar Aβ as detected with PET imaging. The observations of accelerated brain aging in carriers may therefore not be entirely attributable to the accumulation of Aβ. 
While Aβ is often hypothesized to be the starting point of the AD neuropathological cascade59, tau is believed to be more toxic60,61 and might therefore be more closely associated with accelerated aging. Mutated genes in ADAD could also have life-long effects on the brain that are not fully dependent on Aβ accumulation. Consistent with this view, a previous study in PSEN1 mutation carriers from the Colombian cohort showed early changes in brain function before evidence of cerebral Aβ plaque accumulation62. Finally, we cannot exclude the possibility that some Aβ-negative individuals would in fact be Aβ accumulators63,64, or present other forms of Aβ that cannot be detected through PET. What seems to be clear is that AD genetic mutations influence functional brain properties in pre-clinical ADAD. The exact mechanisms that drive this accelerated brain aging will need further investigation.

When investigating the characteristics of PAD in individuals with a family history of sAD (PREVENT-AD), we did not find differences between APOE ε4 carriers and non-carriers, nor associations with Aβ burden (for similar results with structural and metabolic brain age, see refs. 11,65). These findings do not necessarily contradict an extensive literature suggesting association between APOE ε4 status, Aβ, and rs-fMRI44,66,67,68,69. They seem instead to suggest that we are capturing different constructs. While the previous studies tested the direct effect of these two factors on rs-fMRI metrics, we tested the associations between these AD risk factors and a proxy of biological aging derived from rs-fMRI. Although we found no association between Aβ and PAD, we did observe an association between EYO and PAD, such that PREVENT-AD participants who were closer to their parents’ age of onset tended to have an older predicted brain age than others. In the same cohort, EYO had previously been associated with functional changes mimicking brain changes characteristic of AD dementia70.

While our focus was the pre-clinical phase of the disease, we performed post-hoc analyses using rs-fMRI data from a small subset of ADNI patients. We found accelerated functional aging in persons with symptomatic sAD (MCI or dementia) when compared with others who were asymptomatic, but at increased risk of sAD (APOE ε4 participants from our test set). These additional analyses suggest accelerated functional brain aging in individuals with clinical sAD and further confirm the validity of our brain age model.

Several limitations should be mentioned. These relate both to the model and to the cohorts used to test our hypotheses. First, our choice not to update or tweak the model after it was used to test our hypotheses (a main strength of our approach) meant that a few small errors made when constructing the model remained (e.g., two PREVENT-AD APOE ε4 carriers were included in the training set). While these oversights were unlikely to have affected the final results (APOE ε4 carriers from other cohorts without genotype data were presumably included in the training set), they nevertheless pose a small threat to the integrity of the model. Second, we also cannot exclude the possible influence of collinearity when determining the age predictive graph metrics. The SVM and the tree ensemble were however mostly in agreement, and it is unlikely that multicollinearity would have had equal influence on these two very different algorithms. Also, while we made great efforts to increase the generalizability of our predictive model, most of the participants included in this study were Caucasian (see Supplementary Methods), stressing the need to increase diversity in both lifespan and AD cohorts. Functional brain age was also found to exceed chronological age in the PREVENT-AD cohort, while this was not the case in other sites/cohorts of similar ages. While it is tempting to interpret these results as resulting from the participants’ family history, we think they largely reflect a site effect. To minimize such possible site effects, we drew on data from a variety of cohorts and sites, validated the model on a completely independent validation set (new site) and applied similar processing methods to all data. No further harmonization procedure was applied. The site effects are inherently related to the different age composition of the sites (or cohorts, see Fig. 1), and thus harmonizing by sites would have removed the age difference between participants (see Supplementary Fig. 2 for an example of site correction using ComBat; https://github.com/Jfortin1/ComBatHarmonization). While the possibility of site effects limits our ability to make direct comparisons across cohorts, it cannot reasonably threaten the integrity of our main findings, which resulted from within-cohort comparisons. One obvious limitation of inference from the PREVENT-AD data compared to those from DIAN is that we cannot know which participants will later develop AD dementia. The lack of evidence for accelerated brain aging in PREVENT-AD APOE ε4 carriers (vs non-carriers) might reflect nothing more than the known fact that not all APOE ε4 carriers will develop AD dementia (i.e., not all are in the pre-clinical phase of the disease) while some non-carriers will develop the disease. The subsample of PREVENT-AD participants having Aβ pathology in the test set was also relatively small, which likely limits inference.

In sum, using rs-fMRI graph metrics, we developed a model that can predict brain age across the whole human lifespan. Applying this model to predict brain aging in the context of pre-clinical AD revealed that the pre-symptomatic phase of ADAD is characterized by accelerated functional brain aging. Whether a similar relationship holds for pre-clinical sAD and by which underlying mechanisms AD accelerates brain aging will require further evaluation.

Methods

Cohorts and participants

Dominantly Inherited Alzheimer Network—DIAN

DIAN is a multisite longitudinal study71, which enrolls individuals aged 18 and older who have a biological parent carrying a genetic mutation responsible for ADAD. They all underwent clinical and cognitive assessments, genetic testing, and imaging (magnetic resonance imaging [MRI] and amyloid-positron emission tomography [PET]). Data were obtained upon request and after IRB approval (information can be found at dian.wustl.edu/our-research/observational-study/). Baseline data from cognitively unimpaired mutation carriers and non-carriers archived in the DIAN data freeze 10 (January 2009 to May 2016) were used in the present study. All selected individuals had a Clinical Dementia Rating (CDR)72 score of 0. Baseline data from 280 cognitively unimpaired individuals (mutation carriers and non-carriers) aged between 18 and 69 years, for whom structural MRI and rs-fMRI data were available, were included.

Pre-symptomatic Evaluation of Experimental or Novel Treatments for Alzheimer’s Disease—PREVENT-AD

The PREVENT-AD (Douglas Mental Health University Institute, Montréal) is a monocentric longitudinal cohort73. Briefly, 399 cognitively unimpaired older individuals with a family history of sAD (at least one parent or multiple siblings) were enrolled between September 2011 and November 2017. Inclusion criteria included (i) being 60 or older; 55–59 for individuals who were less than 15 years from the age of their relative at symptom onset, (ii) being cognitively normal and (iii) no history of major neurological or psychiatric disease. Normal cognition was defined as CDR of 0 and a Montreal Cognitive Assessment (MoCA)74 ≥24. In the few cases of ambiguous results (3 participants having a CDR of 0.5 and 1 participant with a MoCA of 23 in the present sample), participants were further evaluated with a more extensive neuropsychological test battery, which was carefully reviewed by neuropsychologists and physicians to ensure normal cognition. Participants underwent clinical and cognitive examinations, blood tests, and MRI annually. Data from the present study were archived in the Data Release 5.0 and are partially available at https://openpreventad.loris.ca/. PET scans were acquired in a subset of participants between February 2017 and July 2019. Three hundred and fifty-three participants, aged 55–84, for whom baseline structural MRI and rs-fMRI were available were included in the present study.

Cambridge Centre for Ageing and Neuroscience—CamCAN

The Cambridge Centre for Ageing and Neuroscience (Cam-CAN; http://www.cam-can.org/) is a large-scale collaborative research project, launched in October 2010, using epidemiological, behavioral, and neuroimaging data to characterize age-related changes in cognition and brain structure and function, and to uncover the neurocognitive mechanisms that support healthy cognitive ageing75. In the present study, 648 individuals aged between 18 and 88, with structural MRI and rs-fMRI data were included.

Alzheimer’s Disease Neuroimaging Initiative—ADNI

ADNI data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu)76. ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. Considering the focus on pre-clinical AD, forty-nine cognitively unimpaired individuals with structural MRI and rs-fMRI data were included in the present study. An additional 106 (100 after quality control) individuals with MCI or dementia and structural MRI and rs-fMRI data were included in post hoc analyses to validate the model in cognitively impaired sAD individuals.

1000-Functional Connectomes Project (Cambridge site)—FCP-Cambridge

The 1000-Functional connectomes project (FCP) is a large initiative that gathers functional data from cognitively unimpaired adults recruited worldwide (33 sites) and makes it publicly available to facilitate discovery science of brain function (http://fcon_1000.projects.nitrc.org/fcpClassic/FcpTable.html)77. We used the large dataset from Cambridge-Buckner that includes 198 subjects between 18 and 30 years old, all collected at the Cambridge site ([FCP-Cambridge], PI: Buckner, R.L.).

International Consortium for Brain Mapping—ICBM

The ICBM dataset78 is publicly available as part of the 1000-FCP repository (see above; see also ref. 79 for details). The dataset comprises 86 cognitively unimpaired adults aged 19 to 95 years who underwent structural MRI and rs-fMRI at the same site (Montreal Neurological Institute, Canada).

For the purpose of the brain age model, participants were divided into training, validation, and test sets. In order to reach the most accurate model of “healthy” brain aging from our data, cognitively unimpaired individuals from the different cohorts were assigned randomly to the training set, except when their genetic status was available (DIAN, PREVENT-AD, and ADNI), in which case only individuals with no genetic predisposition for AD were included in the training set; the remaining data (including individuals at increased risk of AD) were assigned to the test set. Thus, mutation non-carriers from DIAN (~80% of DIAN non-carriers, randomly selected) were assigned to the training set, along with ADNI APOE4 non-carriers, individuals from FCP-Cambridge, and ~80% of the cognitively unimpaired individuals selected randomly from CamCAN. While the PREVENT-AD cohort has an increased risk of sAD, a few individuals from this cohort (~10%) were assigned to the training set to expose the model to this site’s characteristics; these individuals were randomly selected from the subsample of APOE4 non-carriers (with the exception of two APOE4 carriers who were included in the training set by mistake). ICBM was used as an independent sample of healthy individuals to assess the generalizability of the brain age model to other datasets (validation set). Finally, the test set included our population of interest (DIAN mutation carriers, most PREVENT-AD participants) and the remaining asymptomatic individuals from the other cohorts (remaining ~20% DIAN mutation non-carriers and CamCAN participants, along with ADNI APOE4 carriers).

Standard protocol approvals, registrations, and participants consents

All studies were approved by study sites’ respective regional ethics committees.

More specifically, DIAN study procedures were approved by the Washington University Human Research Protection Office and the local institutional review boards of the participating sites.

PREVENT-AD study was approved by the Research, Ethics and Compliance Committee of McGill University (Montréal, Canada).

The Ethics committees/institutional review boards that approved the ADNI study are: Albany Medical Center Committee on Research Involving Human Subjects Institutional Review Board, Boston University Medical Campus and Boston Medical Center Institutional Review Board, Butler Hospital Institutional Review Board, Cleveland Clinic Institutional Review Board, Columbia University Medical Center Institutional Review Board, Duke University Health System Institutional Review Board, Emory Institutional Review Board, Georgetown University Institutional Review Board, Health Sciences Institutional Review Board, Houston Methodist Institutional Review Board, Howard University Office of Regulatory Research Compliance, Icahn School of Medicine at Mount Sinai Program for the Protection of Human Subjects, Indiana University Institutional Review Board, Institutional Review Board of Baylor College of Medicine, Jewish General Hospital Research Ethics Board, Johns Hopkins Medicine Institutional Review Board, Lifespan—Rhode Island Hospital Institutional Review Board, Mayo Clinic Institutional Review Board, Mount Sinai Medical Center Institutional Review Board, Nathan Kline Institute for Psychiatric Research & Rockland Psychiatric Center Institutional Review Board, New York University Langone Medical Center School of Medicine Institutional Review Board, Northwestern University Institutional Review Board, Oregon Health and Science University Institutional Review Board, Partners Human Research Committee, Research Ethics Board Sunnybrook Health Sciences Centre, Roper St. Francis Healthcare Institutional Review Board, Rush University Medical Center Institutional Review Board, St. Joseph’s Phoenix Institutional Review Board, Stanford Institutional Review Board, The Ohio State University Institutional Review Board, University Hospitals Cleveland Medical Center Institutional Review Board, University of Alabama Office of the IRB, University of British Columbia Research Ethics Board, University of California Davis Institutional Review Board Administration, University of California Los Angeles Office of the Human Research Protection Program, University of California San Diego Human Research Protections Program, University of California San Francisco Human Research Protection Program, University of Iowa Institutional Review Board, University of Kansas Medical Center Human Subjects Committee, University of Kentucky Medical Institutional Review Board, University of Michigan Medical School Institutional Review Board, University of Pennsylvania Institutional Review Board, University of Pittsburgh Institutional Review Board, University of Rochester Research Subjects Review Board, University of South Florida Institutional Review Board, University of Southern California Institutional Review Board, UT Southwestern Institutional Review Board, VA Long Beach Healthcare System Institutional Review Board, Vanderbilt University Medical Center Institutional Review Board, Wake Forest School of Medicine Institutional Review Board, Washington University School of Medicine Institutional Review Board, Western Institutional Review Board, Western University Health Sciences Research Ethics Board, and Yale University Institutional Review Board.

The CamCAN study has been approved by the local ethics committee, Cambridgeshire 2 Research Ethics Committee.

For the 1000-Functional Connectomes Project (ICBM and FCP-Cambridge), each contributor’s respective ethics committee approved submission of deidentified data. The institutional review boards of NYU Langone Medical Center and New Jersey Medical School approved the receipt and dissemination of the data.

All participants gave written informed consent prior to participation.

MRI acquisition and processing

DIAN: DIAN imaging data was acquired at multiple sites on 3T scanners by applying ADNI parameters and procedures22,71. T1-weighted MRI (used for rs-fMRI processing) were acquired with the following parameters: repetition time (TR) = 2400 ms, echo time (TE) = 16 ms, flip angle = 8°, acquisition matrix = 256 × 256, voxel size = 1 × 1 × 1 mm. Eyes-open rs-fMRI images were acquired using the following parameters: TR = 2230 ms or 3000 ms; TE = 30 ms, flip angle = 80°, voxel-size = 3.3 × 3.3 × 3.3 mm, field of view (FOV) = 212, 140 volumes; acquisition lasting ~5 min or 7 min.

PREVENT-AD: MRI data were acquired on a 3T Magnetom Tim Trio (Siemens) scanner. T1-weighted images were obtained using a GRE sequence with the following parameters: TR = 2300 ms; TE = 2.98 ms; flip angle = 9°; matrix size = 256 × 256; voxel size = 1 × 1 × 1 mm; 176 slices. For resting state fMRI scans, two consecutive functional T2*-weighted scans were collected eyes-closed with a blood oxygenation level-dependent (BOLD) sensitive, single-shot echo planar sequence with the following parameters: TR = 2000 ms; TE = 30 ms; flip angle = 90°; matrix size = 64 × 64; voxel size = 4 × 4 × 4 mm; 32 slices; 150 volumes; acquisition time = 5 min 45 s. For consistency with the other cohorts that only had one run, only the first run was considered for each participant.

CamCAN: Images were acquired on a 3T Magnetom Tim Trio (Siemens). T1-weighted MRI were acquired using the following parameters: 3D MPRAGE GRAPPA = 2, TR = 2250 ms, TE = 2.99 ms, TI = 900 ms; flip angle = 9°; voxel-size 1 mm isotropic; FOV = 256 × 240 × 192 mm; acquisition time = 4 min 32 s. Rs-fMRI data were acquired eyes closed using a T2* GE EPI sequence with the following parameters: TR = 1970 ms; TE = 30 ms, flip angle = 78°; voxel-size = 3 × 3 × 4.44 mm, FOV = 192 × 192; 261 volumes of 32 axial slices 3.7 mm thick with a 0.74 mm gap, acquisition time = 8 min 40 s.

ADNI: Data were acquired at multiple sites, following the ADNI protocol80. Structural images were acquired using a 3D MPRAGE T1-weighted sequence with the following parameters: TR = 2300 ms; TE = 2.98 ms; TI = 900 ms; flip angle = 9°; voxel size = 1.1 × 1.1 × 1.2 mm³; FOV = 256 × 240 mm²; 170 slices. The rs-fMRI images were obtained, eyes open, using a T2-weighted echo-planar imaging sequence with the following parameters: TR = 3000 ms; TE = 30 ms; flip angle = 80°; 48 slices of 3.3 mm; 140 volumes; acquisition lasting ~5 min.

FCP-Cambridge: Images were acquired using a Siemens 3T Trio scanner. High-resolution T1-weighted images were acquired as follows: MP-RAGE TR = 2200 ms, TE = 1.04–7.01 ms, flip angle = 7°, voxel size = 1.2 × 1.2 × 1.2 mm, FOV = 230 mm, 144 sagittal slices. Rs-fMRI were collected, eyes open, with the following parameters: EPI TR = 3000 ms, TE = 30 ms, flip angle = 85°, voxel size = 3 × 3 × 3 mm, FOV = 216 mm, 47 axial slices, 124 volumes, lasting ~6 min.

ICBM: Data were acquired on a Siemens Sonata 1.5 T MR scanner at the MNI. The T1-weighted scan was acquired as follows: TR = 2200 ms, TE = 92 ms, flip angle = 30°, 256 × 256 matrix with a 1 × 1 mm² resolution, 176 contiguous sagittal slices covering the whole brain, slice thickness = 1 mm. Three rs-fMRI runs were acquired eyes-closed with the following parameters: 2D echoplanar BOLD MOSAIC sequence, TR = 2000 ms, TE = 50 ms, flip angle = 90°, 64 × 64 matrix with a 4 × 4 mm² resolution, 23 contiguous axial slices covering the cortex but not the cerebellum, slice thickness = 4 mm, 138 volumes; each run lasting ~4 min 30 s. For consistency with the other cohorts that only had one run, only the first run was considered for each participant.

Rs-fMRI processing

In order to limit site effects, all functional images were processed in our laboratory (by APB) applying the exact same pipeline and processing steps. The NeuroImaging Analysis Kit version 0.12.4 (NIAK; http://niak.simexp-lab.org/) was used for rs-fMRI preprocessing, following the procedure applied in previous publications54,70. Briefly, images underwent slice timing correction, and rigid-body motion parameters were estimated. T1-weighted images were linearly and non-linearly normalized to the MNI space. After coregistration to structural scans, functional images were normalized to the MNI space by applying parameters from the T1-weighted images and resampled to 2 mm isotropic. Slow time drifts, average white matter and cerebrospinal fluid signal and motion artifacts (first principal components of the six realignment parameters, and their squares) were regressed out from the rs-fMRI time series. Finally, fMRI volumes were smoothed with a 6 mm Gaussian kernel. Framewise displacement was calculated, and frames exhibiting displacement >0.5 were removed (scrubbed), along with the one preceding frame and the two following frames81. Images with less than 40% of their original frames remaining after scrubbing were discarded (see Supplementary Table 1 for the percentage of frames retained in each cohort).
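The scrubbing rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the NIAK implementation: the function names `scrub_mask` and `passes_retention` and the example displacement values are hypothetical, while the 0.5 threshold, the one-before/two-after window, and the 40% retention criterion come from the text.

```python
def scrub_mask(fd, threshold=0.5, n_before=1, n_after=2):
    """Return one boolean per frame: True = kept, False = scrubbed.

    A frame whose displacement exceeds `threshold` is removed together
    with `n_before` preceding and `n_after` following frames.
    """
    keep = [True] * len(fd)
    for i, value in enumerate(fd):
        if value > threshold:
            start = max(0, i - n_before)
            stop = min(len(fd), i + n_after + 1)
            for j in range(start, stop):
                keep[j] = False
    return keep


def passes_retention(keep, min_fraction=0.40):
    """True if at least `min_fraction` of the original frames remain."""
    return sum(keep) / len(keep) >= min_fraction


# Hypothetical framewise displacement values for a 10-frame run.
fd = [0.1, 0.2, 0.7, 0.1, 0.1, 0.1, 0.3, 0.2, 0.1, 0.1]
keep = scrub_mask(fd)
usable = passes_retention(keep)
```

In this toy run a single high-motion frame costs four frames in total, which illustrates why short acquisitions can fall below the retention criterion quickly.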

Overall, 266 individuals (16 DIAN, 60 PREVENT-AD, 130 CamCAN, 4 ADNI, 1 FCP-Cambridge, and 39 ICBM, as well as 6 ADNI patients [included in secondary analyses]) were discarded due to failing preprocessing standards or having insufficient data after scrubbing.

Average BOLD signals were extracted from 272 regions corresponding to the Power and Petersen functional atlas82, to which key regions of the limbic system were added83. Regions labeled as “uncertain”, or with weak or non-existent signal in any one image were excluded from all images, resulting in 238 total regions (see Supplementary Table 2 for the total listing of the regions). For each subject, BOLD activity time series from these regions were used to construct a 238 × 238 Pearson correlation matrix, which was then Fisher’s Z-transformed.
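As a minimal sketch of the connectivity step, the following standard-library Python computes pairwise Pearson correlations between regional time series and applies the Fisher Z-transform. The function names are hypothetical, and leaving the diagonal at zero is an assumption made here because the transform is undefined for a correlation of exactly 1; the source does not state how the diagonal was handled.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher Z-transform (inverse hyperbolic tangent) of a correlation."""
    return math.atanh(r)


def connectivity_matrix(timeseries):
    """Build a symmetric Fisher-Z connectivity matrix.

    `timeseries` is a list of regions, each a list of BOLD values over time.
    The diagonal is left at 0.0 (assumption, see lead-in).
    """
    n = len(timeseries)
    z = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            z[i][j] = z[j][i] = fisher_z(pearson(timeseries[i], timeseries[j]))
    return z
```

With 238 regions this yields a 238 × 238 matrix containing 28,203 unique off-diagonal values per subject.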

Motion-related noise was further mitigated using the mean regression (MR) technique as outlined previously84. Briefly, the average of all correlation values within the upper triangle of the correlation matrix was calculated for each subject in the training data. For each element of the correlation matrix, a linear fit was then generated across subjects between these per-subject average values and the subjects’ values at that element, yielding a slope and intercept term for each element of the matrix. The final value used in each element of the correlation matrix was equal to the residual between the MR-model fit and the original correlation value. Importantly, the MR model was created with only the training data.
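One way to read the mean regression procedure is sketched below, assuming each subject's matrix has already been flattened to a list of upper-triangle values. The function names are hypothetical and this is not the referenced implementation; the key property it illustrates is that slopes and intercepts are estimated on training subjects only and then reused for every other subject.

```python
def fit_mean_regression(train_edges):
    """Fit one (slope, intercept) pair per edge using training subjects only.

    `train_edges` is a list of subjects, each a list of upper-triangle
    connectivity values in a fixed edge order.
    """
    means = [sum(edges) / len(edges) for edges in train_edges]
    n = len(means)
    m_bar = sum(means) / n
    sxx = sum((m - m_bar) ** 2 for m in means)
    params = []
    for k in range(len(train_edges[0])):
        y = [subject[k] for subject in train_edges]
        y_bar = sum(y) / n
        slope = sum((m - m_bar) * (v - y_bar) for m, v in zip(means, y)) / sxx
        params.append((slope, y_bar - slope * m_bar))
    return params


def apply_mean_regression(edges, params):
    """Replace each edge by its residual from the training-derived fit."""
    m = sum(edges) / len(edges)
    return [v - (slope * m + intercept)
            for v, (slope, intercept) in zip(edges, params)]
```

Validation and test subjects would be passed through `apply_mean_regression` with the parameters returned for the training set, so that no information from those sets enters the fit.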

For each subject, 26 graph metrics, chosen based on their ability to quantify whole-brain connectivity, were extracted from the correlation matrix using the Brain Connectivity Toolbox (https://sites.google.com/site/bctnet/)36, in Matlab. Both weighted and unweighted metrics were calculated, if applicable. Graph metrics were chosen because they outperformed models trained directly on the weighted edges of the matrices. In the case of unweighted metrics, correlation matrices were thresholded at 5% link density, which ensured only the top 5% strongest correlation values were counted as connections in the matrix85. Only 5 out of the 26 metrics used binarized matrices and out of those 5, only one was retained in the final model (i.e., weighted modularity coefficient). One global value was extracted for each graph metric. In cases where a metric was outputted for each region (e.g., subgraph centrality), the median or median of log values was used as a global estimate. Small-worldness and resilience metrics, not included in the toolbox but both shown to be strong indicators of age, were calculated as previously described (see Supplementary Methods for details)6. Briefly, small-worldness was calculated as the ratio of two quantities: the averaged clustering coefficient of the correlation matrix divided by that of a random network with the same node-edge count, and the averaged efficiency of a random network divided by that of the correlation matrix. In graph theory, resilience of network G is defined as the relative number of edges that must be removed for the network to lose property P, and is a measure of the network’s robustness to targeted or random attacks. Here, resilience is calculated as the slope of the log-log degree distribution. Subjects with any graph metric that was 5 standard deviations beyond the training set group mean were removed from the analysis entirely.
A total of 15 individuals from the training set (1 DIAN mutation non-carrier, 11 CamCAN, 2 FCP-Cambridge, 1 ADNI), 1 from the validation set (ICBM) and 8 from the test set (1 DIAN mutation non-carriers, 3 DIAN mutation carriers, 1 PREVENT-AD, 3 CamCAN) were excluded.
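Three of the operations in this paragraph are simple enough to sketch directly: thresholding at a fixed link density, the small-worldness ratio, and the 5-standard-deviation exclusion rule. The code below is an illustrative Python sketch, not the Brain Connectivity Toolbox code used in the study. All function names are hypothetical, ranking edges by signed rather than absolute value is an assumption, and the clustering and efficiency inputs are taken as already computed.

```python
def binarize_at_density(matrix, density=0.05):
    """Keep the strongest `density` fraction of unique edges as 1, rest 0.

    Assumption: edges are ranked by signed correlation value.
    """
    n = len(matrix)
    edges = sorted(((matrix[i][j], i, j)
                    for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    n_keep = round(density * len(edges))
    adjacency = [[0] * n for _ in range(n)]
    for _, i, j in edges[:n_keep]:
        adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


def small_worldness(c, c_rand, e, e_rand):
    """(C / C_rand) divided by (E_rand / E), following the text's wording."""
    return (c / c_rand) / (e_rand / e)


def is_outlier(values, train_mean, train_sd, n_sd=5.0):
    """True if any metric lies more than `n_sd` training SDs from the training mean."""
    return any(abs(v - m) > n_sd * s
               for v, m, s in zip(values, train_mean, train_sd))
```

Because the exclusion rule uses the training-set mean and standard deviation for every subject, validation and test subjects are screened against the same reference distribution as the training subjects.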

Brain age model

The general procedure for iterating through different models included 5-fold cross-validation within the training data, and a second validation with an independent, external-site dataset. Models with the lowest error in predicting age on this validation set then served as candidates for the final model. Once the final model was determined, our hypotheses were then tested on the test set. Of importance, the test set was composed of unseen data that were not used to create, optimize, or validate the model. Neither the model nor the hypotheses were modified after the model was considered final and ready for hypothesis testing.

First, in order to reduce the number of inputs to the model, we searched for the graph metrics that were the most reliably predictive of age. To do so, the training set data was entered in a support vector machine (SVM) and a regression tree ensemble model to estimate which graph metrics were the most important to predict chronological age (i.e., highest weights). For the SVM model, the features (i.e., the 26 metrics) were standardized by subtracting the mean and dividing by the standard deviation of the training group. SVM was implemented with the fitrlinear function using a linear kernel, and Bayesian-optimized ridge regularization. For ensemble methods, feature standardization is not recommended, and thus the unstandardized 26 metrics were used as input. The fitrensemble function was used with Bayesian optimization of hyperparameters including the method (Bag or LSBoost), number of learning cycles, and the learning rate. In both models, chronological age was the response vector, and parameter optimization was determined with the minimum 5-fold cross-validation loss. Feature selection was not part of cross-validation. Graph metrics were then ranked separately by order of SVM weights and ensemble model importance (i.e., highest load corresponding to the most important). Importance in the ensemble model was determined using the predictorImportance function, which is equal to the sum of changes in mean squared error due to splits on every predictor, divided by the number of branch nodes. The average rank from both models was then used to determine the overall importance of each metric.
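The rank-averaging step can be illustrated with a short Python sketch. The study used Matlab's fitrlinear and fitrensemble; the code below only shows how two importance vectors might be combined once they are available. The function names are hypothetical, and ranking SVM weights by absolute value and breaking ties by original order are assumptions not stated in the text.

```python
def rank_descending(scores):
    """Return ranks where 1 is the largest score (ties keep input order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def average_rank(svm_weights, ensemble_importance):
    """Average the per-metric ranks obtained from the two models.

    Assumption: SVM weights are ranked by magnitude, ignoring sign.
    """
    r_svm = rank_descending([abs(w) for w in svm_weights])
    r_ens = rank_descending(ensemble_importance)
    return [(a + b) / 2 for a, b in zip(r_svm, r_ens)]


def order_by_importance(names, svm_weights, ensemble_importance):
    """Metric names sorted from most to least important (lowest average rank first)."""
    avg = average_rank(svm_weights, ensemble_importance)
    return [name for _, name in sorted(zip(avg, names))]
```

A lower average rank means a metric was consistently important in both models, which is the ordering later used to enter features into the neural network.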

In a second step, we aimed at creating an accurate model requiring the fewest number of features possible. We used training data to generate a neural network model and assessed its accuracy using the validation set. More specifically, the neural network was optimized by (i) generating different models using the training set, each model varying in number of features used as input and network complexity, and (ii) applying each model to the validation set (independent dataset/site) to evaluate which one provided the best generalizability (i.e., avoided overfitting and gave the best prediction on an independent set). Graph metrics in both training and validation sets were standardized by subtracting the training group mean and dividing by the training group standard deviation. Network models had 5–25 input features in increments of 5, entered according to their importance, as determined previously (see above). A null model was also tested by applying the same feature increment procedure but entering the graph metrics in a random order. Each graph metric was only entered once as input for each neural network architecture tested, and the inputs were kept constant across the model’s iterations; features of more complex models always included the features of the simpler ones. Architecture of the network was also tested with various numbers of hidden layers (1 or 2) and numbers of units in the hidden layers (2, 5, 7, or 10). Age was modeled on the training data using the fitnet function with Bayesian regularization backpropagation. Model accuracy was ultimately determined by the root mean squared error (rmse) between actual and predicted age on the validation data, with lower rmse reflecting higher accuracy. Because neural network units are initialized with random values, the rmse changed slightly each time model error was measured. Thus, the best model was determined by the lowest rmse, averaged over three iterations.
Once the most accurate model was determined, it was applied on unseen data (test set).
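The bookkeeping around model selection (standardizing with training statistics, building nested feature sets, and averaging the rmse over repeated fits) can be sketched as follows. This is not the fitnet training itself, which was done in Matlab; the function names are hypothetical and the use of the sample standard deviation (n − 1 denominator) is an assumption.

```python
import math


def column_stats(rows):
    """Per-feature mean and sample standard deviation of the training rows."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    sds = [math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
           for col, m in zip(columns, means)]
    return means, sds


def standardize(rows, train_mean, train_sd):
    """Z-score any set of rows using training-set statistics only."""
    return [[(v - m) / s for v, m, s in zip(row, train_mean, train_sd)]
            for row in rows]


def nested_feature_sets(ranked_features, sizes=(5, 10, 15, 20, 25)):
    """Top-k feature lists, so each larger set contains the smaller ones."""
    return [ranked_features[:k] for k in sizes]


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ages."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))


def mean_rmse(actual, predictions_per_run):
    """Average rmse over repeated fits with different random initializations."""
    return (sum(rmse(actual, p) for p in predictions_per_run)
            / len(predictions_per_run))
```

Under this scheme, each candidate architecture would be scored by `mean_rmse` on the validation set over three runs, and the candidate with the lowest value retained.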

Additional measures in DIAN and PREVENT-AD samples (test set)

Genetics

DIAN genotyping was performed by the DIAN Genetics Core at Washington University22. The presence or absence of ADAD mutation was determined using PCR-based amplification of the appropriate exon followed by Sanger sequencing. APOE genotype was determined using an ABI predesigned real-time Taqman assay (C___3084793_20 and C____904973_10 for rs429358 and rs7412 variants, respectively).

APOE genotype in PREVENT-AD was determined using the PyroMark Q96 pyrosequencer (Qiagen, Toronto, Canada) and the following primers: rs429358_amplification_forward 5′-ACGGCTGTCCAAGGAGCTG-3′, rs429358_amplification_reverse_biotinylated 5′-CACCTCGCCGCGGTACTG-3′, rs429358_sequencing 5′-CGGACATGGAGGACG-3′, rs7412_amplification_forward 5′-CTCCGCGATGCCGATGAC-3′, rs7412_amplification_reverse_biotinylated 5′-CCCCGGCCTGGTACACTG-3′ and rs7412_sequencing 5′-CGATGACCTGCAGAAG-3′.

The full list of primers is provided in Supplementary Tables 3 and 4.

PET acquisition and processing

In DIAN, 28 mutation non-carriers and 117 mutation carriers from the test set had an Aβ-PET scan available at baseline. Aβ-PET scans were acquired in different centers, following the ADNI protocol86. Briefly, participants were injected intravenously with 8 mCi to 18 mCi of 11C-PIB. Some participants underwent a full dynamic acquisition of 70 min, starting at the time of injection. The remaining participants underwent a 30-min scan after a rest period of 40 min. A standard brain transmission scan (or computed tomography [CT] transmission scan for PET/CT scanners) was obtained for attenuation correction. Aβ-PET data were motion corrected and registered to each participant’s MRI87. Standardized uptake value ratios (SUVRs) were calculated using the cerebellar gray matter as a reference, and a global measure of Aβ burden was calculated by averaging SUVRs from the prefrontal cortex, temporal lobe, gyrus rectus, and precuneus of the Desikan-Killiany atlas88. A threshold of 1.31 was used to determine Aβ-positivity87.

In the PREVENT-AD cohort, Aβ-PET scans were performed at the MNI (Montréal, Canada) on a Siemens HRRT. Sixty-four individuals from the test set underwent this examination, at a mean of 10.30 ± 5.63 months from their closest MRI session and 43.10 ± 17.95 months after their baseline session. A 30-min acquisition scan started 40 min after intravenous injection of ~5.4 mCi of 18F-NAV4694. Transmission scans were acquired for attenuation correction. Data were processed using a standard pipeline (see ref. 38 and https://github.com/villeneuvelab/vlpp for details). A global index of neocortical Aβ burden was derived by extracting, in native space, the mean standardized uptake value ratio (SUVR) of the frontal, temporal, parietal, and posterior cingulate cortex of the Desikan-Killiany atlas88, using the cerebellum grey matter as reference region. A threshold for positivity was determined using Gaussian Mixture modeling38 and scans with global neocortical Aβ burden ≥1.39 were considered positive.
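A minimal sketch of how a global Aβ index and a positivity label could be derived from regional SUVRs is given below. The region keys, function names, and example values are hypothetical, and an unweighted mean across regions is an assumption (the actual pipelines may weight regions differently). The thresholds of 1.31 (DIAN) and 1.39 (PREVENT-AD) come from the text; treating the DIAN threshold as inclusive is also an assumption.

```python
# Hypothetical region keys for the PREVENT-AD global index.
PREVENT_AD_REGIONS = ("frontal", "temporal", "parietal", "posterior_cingulate")

# Cohort-specific positivity thresholds reported in the text.
THRESHOLDS = {"DIAN": 1.31, "PREVENT-AD": 1.39}


def global_suvr(regional_suvr, regions):
    """Unweighted mean SUVR over the listed regions (assumption: no weighting)."""
    return sum(regional_suvr[r] for r in regions) / len(regions)


def is_amyloid_positive(suvr, cohort):
    """True if the global SUVR reaches the cohort-specific threshold."""
    return suvr >= THRESHOLDS[cohort]


# Hypothetical participant.
participant = {"frontal": 1.5, "temporal": 1.3,
               "parietal": 1.4, "posterior_cingulate": 1.4}
burden = global_suvr(participant, PREVENT_AD_REGIONS)
positive = is_amyloid_positive(burden, "PREVENT-AD")
```

Because the two cohorts used different tracers and thresholds, positivity labels are only comparable within a cohort, which matches the within-cohort design of the analyses.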

Estimated years to onset

Estimated years from expected symptom onset (EYO) was calculated in each cohort taking the parental age at onset as a reference (see Supplementary Methods for details).
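As a rough illustration of the EYO calculation, the sketch below takes the difference between a participant's age at the visit and the parental age at symptom onset. The function name and the sign convention (negative values indicating years remaining before expected onset) are assumptions. The exact procedure used in each cohort is described in the Supplementary Methods and may differ.

```python
def estimated_years_to_onset(age_at_visit, parental_age_at_onset):
    """EYO relative to parental onset age (assumed convention).

    Negative values: the participant is younger than the parental onset age.
    Positive values: the participant has passed the parental onset age.
    """
    return age_at_visit - parental_age_at_onset

# Hypothetical participant: 40 years old, parent developed symptoms at 50.
eyo_example = estimated_years_to_onset(40, 50)
```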

Statistical analyses on the predicted age difference (test set)

To analyze the specificities of brain aging in the context of pre-clinical AD, we calculated the predicted age difference (PAD) for DIAN and PREVENT-AD participants in the test set, as previously detailed3, by subtracting the actual chronological age from the predicted brain age (output from the model). We were particularly interested in the influence of the genes involved in AD, which are either responsible for ADAD or increase the risk of sAD. We compared, in the test set, the PAD between (1) mutation non-carriers and mutation carriers from DIAN, and (2) APOE4 carriers and non-carriers from PREVENT-AD. We also sought to understand the influence of Aβ accumulation on functional brain aging in asymptomatic individuals. To do so, we assessed the effect of Aβ deposition, measured by PET, on the PAD in both the DIAN and PREVENT-AD cohorts, by comparing Aβ-positive and Aβ-negative individuals (dichotomous variable) and by assessing the correlation between PAD and Aβ load (continuous variable). All analyses were controlled for chronological age14.
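The PAD and its age-adjusted association with a predictor can be sketched as follows. This is not the SPSS procedure used in the study. It is a minimal ordinary least-squares illustration in which the predictor (a 0/1 group indicator or a continuous Aβ load) and the PAD are both residualized on chronological age, which yields the same coefficient as fitting PAD on the predictor and age together. All function names and data values are hypothetical.

```python
from statistics import mean

def predicted_age_difference(predicted_age, chronological_age):
    """PAD = predicted brain age minus chronological age."""
    return [p - c for p, c in zip(predicted_age, chronological_age)]

def _residualize(y, x):
    """Residuals from a least-squares regression of y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [(yi - my) - slope * (xi - mx) for xi, yi in zip(x, y)]

def age_adjusted_effect(pad, predictor, age):
    """Coefficient of `predictor` in the model PAD ~ predictor + age."""
    r_pad = _residualize(pad, age)
    r_pred = _residualize(predictor, age)
    return sum(a * b for a, b in zip(r_pred, r_pad)) / sum(a * a for a in r_pred)

# Synthetic example: carriers (group = 1) look 2 years older at any given age.
age = [60.0, 62.0, 65.0, 70.0, 58.0, 66.0]
group = [0, 1, 0, 1, 1, 0]
predicted = [a + 1.0 + 0.1 * a + 2.0 * g for a, g in zip(age, group)]
pad = predicted_age_difference(predicted, age)
effect = age_adjusted_effect(pad, group, age)
```

In this synthetic example the recovered group effect is 2 years by construction. Passing a continuous Aβ load as `predictor` gives the corresponding age-adjusted slope. Significance testing is omitted from the sketch.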

Exploratory analyses were conducted to assess the correlation between the PAD and EYO. Finally, we verified that our model captured advanced brain aging in sAD patients with cognitive impairment by comparing our cognitively unimpaired ADNI participants (15 APOE4 carriers from the test set) with a subset of 100 ADNI participants with MCI or dementia using a general linear model (one-tailed test), controlling for chronological age. Considering the small sample size in the control group, analyses were also replicated using a nonparametric test (Mann–Whitney).
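For readers unfamiliar with the nonparametric replication, the Mann–Whitney U statistic can be computed by direct pairwise comparison, as in the sketch below. This illustrates the statistic only and is not the SPSS implementation used in the study. The function name and the example values are hypothetical, and no p-value is computed.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a.

    Counts the (a, b) pairs in which a exceeds b; tied pairs count as 0.5.
    """
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical PAD values (years) for an impaired and an unimpaired group.
pad_impaired = [4.0, 6.5, 3.0, 8.0]
pad_unimpaired = [1.0, -2.0, 3.0]
u_impaired = mann_whitney_u(pad_impaired, pad_unimpaired)
```

The pairwise form is quadratic in sample size, which is acceptable for groups of the size analyzed here (15 vs. 100). A rank-based formulation gives the same statistic.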

Analyses were conducted using Statistical Package for the Social Sciences (SPSS), and statistical significance was set at p < 0.05.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.