Introduction

Depressive disorders affect an estimated 322 million people globally and are a leading cause of disability, mortality, and economic disparity1,2. Current treatment options are considered suboptimal and a more complete understanding of etiology, as well as the identification of effective interventional strategies, are urgently needed3. A promising novel development in this area pertains to the potential role of the gut microbiome, i.e., the diverse microbial communities living in the gut and their genetic material4. Research has demonstrated that the composition of the intestinal microbiota may impact cognition and affect through multiple pathways, collectively known as the gut-brain axis4. These novel insights have fueled the idea that modification of microbial ecology may provide new options for the treatment and prevention of depression5.

At present, much of the supporting evidence still takes the form of extrapolations from non-human research, whereas the human data remain sparse and are mostly limited to small-scale studies4 that yield inconsistent findings6,7,8,9. While disappointing, such inconsistency might be expected in light of the complex composition of the gut microbiota, which is shaped by hundreds of bacterial species that exhibit a marked diversity across groups and individuals10,11,12,13,14,15,16. This complexity only adds to the similarly multi-determined and heterogeneous nature of depression17. Notably missing from the literature, therefore, are adequately powered studies in well-characterized populations that would allow more rigorous analyses of individual differences.

Two large-scale population studies have been published that would appear suitable to address the above issues12,13. The LifeLines Study (N = 1135) showed that depression (based on self-reported diagnosis) is significantly associated with β-diversity, indicating that depressed individuals have a microbiota composition that is distinguishable from those without depression12. The Flemish Gut Flora Project (N = 1068), in which a diagnosis of depression was obtained from physician records13, replicates this association while adjusting for age, sex, BMI, and gastro-intestinal parameters. Further, after excluding participants on antidepressive medication and cross-validation in a separate cohort (i.e., the aforementioned LifeLines Study)13, the study identified two genera (Dialister and Coprococcus), both belonging to the phylum Firmicutes, that were less abundant among those depressed. Whilst lending credibility to the idea of links between depression and the gut microbiome, both studies applied sparse confounder adjustments, e.g., related to lifestyle and health. Thus, uncertainties remain as to the exact interpretation of the microbiome-depression associations, which limits further progress towards diagnostic and clinical applications4.

An additional issue is that prior epidemiological associations are established in ethnically homogeneous populations of North-European ancestry12,13. Demographic factors probably represent the largest source of individual variation in the gut microbiome11,12,14,15,18. For example, analyses of a large epidemiological survey (Healthy Life in an Urban Setting study, HELIUS) showed that ethnicity explained far more of the differences in gut microbiota than any of the other measures collected, which included other demographic factors (e.g., age, sex), lifestyle factors, and medical information14. It is unknown to what extent microbiome-depression associations generalize across ethnic groups and this, too, limits interpretation, especially when considering the parallel and substantial ethnic disparities in depression19.

In light of the preceding discussion, the present study investigated associations between gut microbiota and depressive symptom levels in a large (N = 3211) multi-ethnic cohort (the HELIUS study), comprised of six ethnic groups living in the same urban geographic area14,20,21. The primary aim was to identify which taxonomic features of the gut microbiota are linked to depressive symptom levels, while adjusting for possible confounding by demographic, lifestyle, and medical factors. For most individuals depression is transient, with a median duration of three to six months. Auxiliary analyses therefore also took pre-existing markers of depression risk into account, as these may provide a window on the temporal specificity of associations between the microbiota and current symptom levels; these included prior depressive episodes, parental history of depression, and the personality trait neuroticism (a generic risk marker for psychopathology)22. The second aim was to determine if microbiota-depression associations generalize across ethnic groups. Such generalizability would greatly broaden the potential applicability of microbiome-based diagnostics and interventions23. Finally, the present study aimed to assess if ethnic differences in gut microbiota may account for ethnic disparities in depression. A parallel study24 provides a large-scale epidemiological investigation of the relation between fecal microbiota and depressive symptoms among subjects of European ancestry, cross-validating data from the Amsterdam HELIUS cohort and the Rotterdam Study cohort.

Results

After applying exclusion criteria (see description in “Methods” section) and accounting for occasional missing data, a total of between N = 3211 (Regression Model 1) and N = 3088 (Regression Model 3) participants were available for analyses. Table 1 provides summary data of the study sample and the main covariates.

Table 1 Summary data of the study sample

Alpha-diversity predicts depressive symptoms

As shown in Table 2, the Shannon Index predicted PHQ-9 depressive symptom scores in linear regression analyses. Inclusion of demographic covariates (Model 1: age, sex, ethnicity, education) substantially attenuated the association between the Shannon Index and the PHQ-9 sum scores (standardized β = –0.0738, p < 0.001) while improving the overall model fit (ΔR2 = 0.0597, p < 0.001; total R2 = 0.0736). Ethnicity had by far the largest contribution to this model fit: after adjustment for sex and age the contribution of ethnicity was ΔR2 = 0.0431 (p < 0.001), with a modest additional impact of education (ΔR2 = 0.0015, p = 0.024). After sequentially adding lifestyle factors (Model 2: ΔR2 = 0.0087, p < 0.001) and medical variables (Model 3: ΔR2 = 0.0267), the Shannon index continued to predict depressive symptom scores (standardized β = –0.0597, p = 0.001 and –0.0422, p = 0.023, respectively). No significant ethnicity by alpha-diversity interaction was detected in any of the three models (Model 1: p = 0.232; Model 2: p = 0.134; Model 3: p = 0.325), indicating that the association between alpha diversity and depressive symptoms did not differ across ethnic groups. Also, when results were stratified per ethnic group, the I2 consistently approximated zero (see Supplementary Fig. 1). Repeating the above analyses for the Simpson index yielded comparable results (see Supplementary Table 1).
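As an illustration of the two α-diversity measures named above, the following minimal sketch computes a Shannon and a Simpson index from a vector of ASV counts. The natural logarithm and the Gini-Simpson form (1 minus the sum of squared proportions) are assumptions made here, since the text does not state which variants were used; the function names and the example counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts.

    The natural logarithm is an assumption; the source does not state the base.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Gini-Simpson diversity 1 - sum(p_i^2); the choice of this form is an assumption."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical ASV counts for two samples with equal depth
even_sample = [25, 25, 25, 25]    # reads spread evenly over four ASVs
skewed_sample = [97, 1, 1, 1]     # one ASV dominates

print(shannon_index(even_sample), shannon_index(skewed_sample))
print(simpson_index(even_sample), simpson_index(skewed_sample))
```

Both indices are higher for the evenly distributed sample, which is the sense in which a higher α-diversity reflects a richer and more even community within one individual.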

Table 2 Results of linear regression models with depressive symptom scores as the dependent variable

To estimate the specificity of the above associations, analyses were repeated while adjusting for parental history of depression, number of prior depressive episodes, and neuroticism. α-diversity no longer significantly predicted depressive symptoms after adjustment for neuroticism. Conversely, α-diversity was significantly associated with neuroticism after adjustment for depressive symptoms in all 3 regression models, indicating that neuroticism was the stronger predictor. Parental history and the number of prior depressive episodes only minimally attenuated the associations with depressive symptoms (Model 3, standardized β > –0.0384, p < 0.033).

Table 3 additionally presents the results of linear regression analyses using α-diversity (Shannon) as the outcome, i.e., reversing X and Y. The fully adjusted model (Model 3) explained approximately 18% of variance in α-diversity, which was mostly attributed to ethnicity (ΔR2 = 0.1143, p < 0.001, after inclusion of age and sex), whereby PHQ-9 scores remained a significant predictor of the Shannon index (see Table 3).
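The ΔR2 values reported for the sequential models are differences in explained variance between nested regressions. The sketch below shows that calculation with ordinary least squares in plain Python. It is an illustration under stated assumptions, not the analysis code of the study: the helper names and the toy data are hypothetical, and the solver assumes the predictors are not perfectly collinear.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(predictors, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(p) for p in predictors]
    k = len(cols)
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(beta[i] * cols[i][t] for i in range(k)) for t in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((y[t] - fitted[t]) ** 2 for t in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


def delta_r_squared(base, added, y):
    """Gain in R^2 when the `added` predictors join the `base` predictors."""
    return r_squared(base + added, y) - r_squared(base, y)


# Hypothetical toy data: age as base covariate, Shannon index as added predictor
age = [20, 30, 40, 50, 60, 35]
shannon = [4.1, 3.2, 3.9, 2.8, 3.5, 4.4]
phq9 = [3, 8, 4, 11, 6, 2]
print(delta_r_squared([age], [shannon], phq9))
```

Because the models are nested, ΔR2 cannot be negative (apart from rounding error); whether a given increment is significant would additionally require an F-test, which is not shown here.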

Table 3 Results of linear regression models with Shannon index as the dependent variable

Beta-diversity predicts depressive symptoms

The principal coordinates (Principal Coordinate Analyses, PCoA) derived from Bray-Curtis dissimilarity or weighted UniFrac distance matrices were entered as predictors in linear regression (with PHQ-9 sum scores as the dependent variable). Forward selection of the first 20 coordinates yielded 6 coordinates that compiled information predictive of depressive symptom scores, and these coordinates were used in subsequent regression analyses. Among these coordinates was PCoA #2, which explained 6.50% (Bray-Curtis) and 9.73% (weighted UniFrac) of the variation in microbiome composition. Notably, the multidimensional information compiled in this principal coordinate demonstrated a high correlation (r = 0.83) with the Shannon index, indicating that within this statistical approach (and contrary to how α-diversity is typically conceptualized) α-diversity is integral to β-diversity (see also Supplementary Fig. 2C).
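For readers less familiar with the distance measure underlying these coordinates, the sketch below computes a Bray-Curtis dissimilarity matrix from per-sample ASV counts. It covers only the dissimilarity step; the subsequent PCoA (an eigendecomposition of the double-centered squared dissimilarity matrix) and the weighted UniFrac distance, which requires a phylogenetic tree, are not shown. Function names and example counts are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("samples must cover the same ASVs")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


def dissimilarity_matrix(samples):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = bray_curtis(samples[i], samples[j])
    return d


# Hypothetical counts for three samples over four ASVs
samples = [
    [10, 0, 5, 5],
    [8, 2, 5, 5],
    [0, 15, 0, 5],
]
for row in dissimilarity_matrix(samples):
    print([round(x, 3) for x in row])
```

A value of 0 indicates identical composition and a value of 1 indicates that two samples share no ASVs.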

Figure 1A shows that the 6 principal coordinates jointly explained between 1.5% (ΔR2 Model 1) and 0.5% (ΔR2 Model 3) of the variance in depression scores. The results presented in Fig. 1B further revealed that fecal microbial composition explained between 28% (Model 1) and 18% (Model 3) of the ethnic differences in depression symptom scores. The β-diversity coordinates still significantly predicted depressive symptoms after adjustment for parental history of depression, prior depressive episodes, or neuroticism (all analyses: ΔR2 > 0.0036, p < 0.002). Replicating these analyses using weighted UniFrac distances (instead of Bray-Curtis dissimilarity) yielded equivalent results. The source data of Fig. 1 can be found in Supplementary Data file 1.

Fig. 1: Beta-diversity is linked with ethnic differences in depressive symptom scores.

A Beta-diversity predicting PHQ9 depression. Results of linear regression analyses that model β-diversity as a predictor of depressive symptom levels. Horizontal bars present ΔR2 after progressive adjustments for confounders (models 1a to 3), respectively without and with ethnicity included in each regression model. B Ethnicity predicting PHQ9 depression. Results of linear regression in which β-diversity is modeled as a mediator of the association between ethnicity and depressive symptom levels (see lower figure). Bars present ΔR2 for the prediction of PHQ9 by ethnicity after progressive adjustments (models 1a to 3). Blue bars present ΔR2 without β-diversity incorporated as a mediator in the model, and the orange bars present ΔR2 when mediation is assumed. The percentages in the table (right) indicate the attenuation of the direct effect by mediation. Regression models: We used two-sided linear regression analyses; no adjustments were made for multiple comparisons. Model 1a adjusted for age and gender; Model 1b added education; Model 2 further added behavioral factors (alcohol, smoking, exercise, BMI); Model 3 added GI disease, diabetes, PPI use, recent antibiotics, and diarrhea. All ΔR2 p ≤ 0.001, except for ethnicity-inclusive Model 3 (p = 0.023).

Most taxa associated with depressive symptoms are Firmicutes

As shown in Fig. 2, out of 416 non-trivial ASVs, 117 showed significant unadjusted correlations with PHQ-9 scores (Rho, FDR < 0.05), with most (99 ASVs) showing a negative correlation (indicating a relative depletion). The source data for Table 2 is provided in the supplement (Supplementary Data file 3), which shows a subset of data from Supplementary Data file 3 (Supplementary Data file 3 is presented in table format in Supplementary Data files 4, 5). Approximately 65% of these (76 ASVs) belonged to the phylum Firmicutes. To circumvent excessive multiple testing, only significant associations obtained in unadjusted analyses (FDR corrected) were further analyzed in subsequent Models 1–3 (using rank-transformed dependent Y). Figure 2 shows that 70 taxa remained significantly associated with PHQ-9 scores after adjustment for age, gender and ethnicity. The vast majority (60 ASVs) of these belonged to the phylum Firmicutes, with a prominent presence of the genus Christensenellaceae (group R7) and various genera within the families Lachnospiraceae (e.g., Blautia, Lachnospiraceae NK4A136, Marvinbryantia, Roseburia) and Ruminococcaceae (e.g., Oscillibacter, Ruminococcus 1, Ruminococcaceae NK4A214 group, Ruminococcaceae UCG-005). Less prominent phyla included Bacteroidetes (e.g., genus Bacteroides) and Proteobacteria (genus Desulfovibrio and Escherichia/Shigella). Further adjustment for behavioral and medical variables (models 2 and 3) reduced the number of significant associations, yielding respectively 48 and 23 taxa that remained significantly associated with depression scores (see Fig. 2 and V).
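The screening step described above combines rank correlations (Rho) with a false discovery rate correction. The following sketch illustrates both ingredients in plain Python: a Spearman correlation computed from average ranks, and a Benjamini-Hochberg adjustment of a list of p-values. The use of the Benjamini-Hochberg procedure is an assumption, as the text only states that results were FDR corrected; the p-values for the correlations themselves are taken as given, and all names and numbers are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: one ASV's relative abundance against PHQ-9 scores
abundance = [0.12, 0.08, 0.30, 0.02, 0.15]
phq9 = [4, 7, 1, 12, 3]
print(spearman_rho(abundance, phq9))

# Hypothetical unadjusted p-values for four ASVs
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

An ASV would be carried forward to the adjusted models only if its adjusted p-value falls below the chosen FDR threshold (0.05 in the text).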

Fig. 2: Selection of ASVs (rows) that were significantly associated with depressive symptom levels in unadjusted analyses (Model 0) and results of subsequent adjusted analyses (Models 1–3).

Bars indicate effect size (standardized regression coefficient). Green bars indicate a positive and red bars a negative association (plotted range 0.10 ≥ β ≥ –0.10). Checkmarks indicate p < 0.05. The column “Core” highlights ASVs with >75% overall prevalence in the sample population (indicated by a green checkmark).

Supplementary Fig. 2 (panels A and B) provides a focused overview of correlations between PHQ-9 scores and individual ASVs (simultaneously plotted against correlations with alpha-diversity on the y-axis). Notable from these Supplementary Figures, as well as from Fig. 2 (also see the corresponding source data), is that ASVs within the same genus occasionally showed opposite associations with depressive symptom scores, e.g., Blautia, Bacteroides, and Oscillospira (note that ASVs that the Greengenes database allocates to the single genus Oscillospira are attributed to multiple genera in the Silva database, see the discussion), whereas for other genera a more consistent pattern of associations was observed (e.g., Christensenellaceae, Desulfovibrio, Streptococcus).

Supplementary Data 4, 5 (and the corresponding Supplementary Data source data file 3) provide a heatmap depicting the correlations between individual ASVs, depressed mood and relevant depression risk factors and covariates. Heatmap inspection revealed that taxa that showed a strong correlation with depressive symptoms also tended to exhibit stronger correlations with selected covariates as well as with markers of alpha-diversity. As a further visualization, Supplementary Fig. 3 shows several examples of such associations in pair-wise scatterplots (these are based on the same source data files as Supplementary Data files 4, 5).

Associations are mostly invariant across ethnic groups

When each of the 3 regression models was applied, only a small proportion of ASVs (<6%) exhibited a significant ethnicity by ASV interaction (unadjusted for multiple testing), which thus approximated an expected Type 1 error rate. Ethnicity-stratified analysis of the age and gender-adjusted associations showed that most standardized regression coefficients (81%; N = 337) had an I2 below 30% and only 15 correlations (3.6% of total) showed a substantial ethnic heterogeneity (I2 > 50%) (see Supplementary Data files 4, 5 and corresponding source data Supplementary data file 3).
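The I2 statistic used here expresses the share of variation in stratum-specific estimates that exceeds what sampling error alone would produce. A minimal sketch of the standard calculation from Cochran's Q is given below, assuming a fixed-effect (inverse-variance) pooled estimate; the per-group coefficients and standard errors are hypothetical and serve only to illustrate the arithmetic.

```python
def i_squared(effects, std_errors):
    """I^2 (in percent) from Cochran's Q with inverse-variance weights.

    I^2 = max(0, (Q - df) / Q) * 100, where df = number of strata - 1.
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need matching effects and standard errors for >= 2 strata")
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical standardized coefficients for one ASV in six ethnic groups
betas = [-0.06, -0.05, -0.08, -0.04, -0.07, -0.05]
ses = [0.036, 0.036, 0.044, 0.054, 0.046, 0.047]
print(i_squared(betas, ses))  # small spread relative to the SEs
```

When the stratum-specific estimates differ by no more than their standard errors would suggest, Q stays below its degrees of freedom and I2 is truncated at zero, which is the pattern described for α-diversity and for most ASVs.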

Core microbiota similarly associate with depressive symptoms

It is proposed that ‘core taxa’, i.e., bacteria with a near ubiquitous presence, may exhibit a stronger relevance to health25. Hence, auxiliary analyses compared the results obtained for all ASVs to the results obtained for a core subset of taxonomic units (defined as ASVs with a ≥75% prevalence across ethnic groups). Because these highly prevalent core taxa are minimally zero-inflated, these comparisons additionally function as sensitivity analyses for zero-inflation bias. Comparisons between core and non-core taxa revealed no differences with regard to the proportion of significant associations with depression, the average or distribution of effect sizes, robustness to covariate adjustments, or ethnic heterogeneity (I2) in the associations with depression.
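The core subset can be illustrated with a simple prevalence filter. The sketch below treats prevalence as the fraction of all samples in which an ASV has a non-zero count and applies the ≥75% threshold to the pooled sample; whether the threshold was applied to the pooled sample or within each ethnic group is not fully specified in the text, so the pooled definition is an assumption. The table layout and ASV identifiers are hypothetical.

```python
def core_asvs(table, threshold=0.75):
    """Return ASV ids whose prevalence (share of samples with count > 0) meets the threshold.

    `table` maps an ASV id to its list of per-sample counts.
    """
    core = []
    for asv_id, counts in table.items():
        prevalence = sum(1 for c in counts if c > 0) / len(counts)
        if prevalence >= threshold:
            core.append(asv_id)
    return core


# Hypothetical rarefied counts for three ASVs in four samples
table = {
    "asv_a": [3, 0, 5, 2],   # present in 3 of 4 samples (75%)
    "asv_b": [0, 0, 1, 0],   # present in 1 of 4 samples (25%)
    "asv_c": [1, 1, 1, 1],   # present in all samples
}
print(core_asvs(table))
```

Because core ASVs contain few zero counts by construction, repeating the association analyses on this subset serves as a check on zero-inflation bias.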

Discussion

A primary aim of the current study was to identify which taxonomic features of the gut microbiota are linked to depressive symptom levels. This investigation involved the largest study cohort to date examining microbiome-depression associations, and is the first to study ethnicity as a potentially relevant factor in this association. Consistent associations between the gut microbiota and depressive symptom levels were confirmed at multiple levels of analysis, ranging from global parameters of microbiota diversity (i.e., α-diversity, β-diversity) to the relative abundances of specific taxa. These associations withstood adjustment for a broad range of sociodemographic, behavioral, and medical covariates. Analyses further revealed that these associations were largely invariant across ethnic groups, notwithstanding the substantial ethnic differences in both depressive symptom levels and composition of the gut microbiota14,18,19. Moreover, ethnic disparities in depressive symptom levels were partly explained by between-subject differences in microbiota composition (i.e., β-diversity)19.

Inspection of (mutually adjusted) regression coefficients revealed α-diversity predicted depressive symptoms with effect sizes comparable to several other established risk factors of depression, such as alcohol consumption, exercise, smoking, and BMI26. Conversely, the ability of depressive symptoms to statistically predict α-diversity was in the same range as being diagnosed with, for example, diabetes or a GI disorder11,12,13,14,15,16. By implication, then, these analyses suggest that conditions and interventions that influence the gut microbiome may have the potential to impact well-being on a population-level.

A notable finding was that both Bray-Curtis and weighted UniFrac Principal Coordinate #2 shared substantial variance with α-diversity (Shannon). This finding indicates that α-diversity (a measure of within-subject microbial diversity) also meaningfully characterized between-subject diversity (β-diversity); in other words, taxa that correlate highly with α-diversity are unevenly distributed across individuals. This is a pertinent observation because exactly these taxa also tended to correlate with depressive symptom scores, as well as established risk factors of depression (e.g., BMI, inflammation, diabetes)27,28,29. The latter replicates prior findings12,30,31,32. Taken together, then, these results are consistent with the idea of α-diversity as a generic biomarker of health and vulnerability33,34 (including depression), as well as with the notion of a common set of bacteria that tend to non-specifically respond to disease and poor health35.

The association of α-diversity with depression dissolved after adjustment for the personality trait neuroticism, which is a constitutional and generic risk factor for common mental disorders, including depression22. This dominant effect of neuroticism might help clarify the observation that disruptions in the gut microbiome have been associated with a rather broad range of psychological disorders without concomitant evidence of taxonomic specificity (i.e., whereby specific taxa differentiate specific disorders)4,8,36. Of note, the other principal coordinates of β-diversity appeared impervious to adjustment for neuroticism, and these might thus identify the more depression-specific features of microbiota composition.

Unadjusted analyses of relative abundances initially yielded 117 ASVs (identifying 59 genera, mostly belonging to the phylum Firmicutes) that correlated with depressive symptom scores. Significantly, many of those taxa have also been linked to other domains of health32,35,37,38,39,40,41, including health factors associated with increased depression risk (e.g., BMI). A prominent example was the genus Christensenellaceae (R-7 group), which likewise showed a negative association with depressive symptoms in the Rotterdam study cohort24. Further, and replicating prior research, Christensenellaceae abundances were additionally correlated with lower BMI and relatively depleted in the presence of diabetes and gastro-intestinal diseases42. These links with medical outcomes may therefore explain why most associations with Christensenellaceae became nonsignificant after adjustment for the medical covariates.

After full adjustment (i.e., Model 3), 23 ASVs identifying at least 15 genera remained significantly associated with depression scores. These included a negative association with the abundant genus Coprococcus (designated GCA-900066575 in the Silva database), hereby confirming results from two independent population cohorts13 and the Rotterdam study24. This genus harbors many butyrate-producing species and has been ascribed anti-inflammatory properties43, both of which have been (inversely) linked to depression. Our analyses showed a positive association with the genus Dialister13, an ASV of which we could map onto the oral pathogen D. invisus44. This observation is in step with data showing that poor oral health is a correlate of depression45,46 and a possible upstream determinant of the gut microbiota47.

Other bacteria relatively depleted in relation to depressive symptoms were the Bacteroidetes genus Bacteroides, Ruminococcaceae UCG005, Ruminococcus 1, Peptococcus, Holdemanella (sp. H. biformis), various genera in the family Lachnospiraceae, e.g., Lachnospiraceae groups FCS020 and NK4A136, Marvinbryantia (sp. M. formatexigens), Blautia (among which the species B. obeum, which was until recently classified under the genus Ruminococcus and exhibits overlapping physiological characteristics48), Roseburia (sp. R. inulinivorans), and the Proteobacteria genus Desulfovibrio. Simultaneously, five ASVs were enriched in those with high symptom levels. These included the Blautia species B. caecimuris and B. producta, the genera Lachnoclostridium and Oscillibacter, and the aforementioned Dialister invisus. Overall, a majority of associations were within the phylum Firmicutes. Random forest analyses, using the Rotterdam Study as training cohort and HELIUS as the testing cohort24, replicated associations for several genera, including Ruminococcaceae UCG005, Coprococcus, Lachnoclostridium, Eggerthella, Sellimonas, Roseburia, Bacteroides, Blautia, Veillonella, and Desulfovibrio, of which several were also retained in their adjusted analyses24.

Unexpectedly, ASVs identifying the genus Bifidobacterium showed a positive correlation with depression36, including the abundant species B. longum (Greengenes database) that has been tested in multiple probiotic studies for its potential to enhance mood49,50,51. In parallel, we found that B. longum was negatively associated with α-diversity, which replicates observations from another large cohort12, while simultaneously showing the (expected) negative associations with some markers of poor health32. The Bifidobacterium genus is highly diverse52 and it is therefore conceivable that the previously reported beneficial mood effects of supplementation are highly strain-specific. Probiotic trials using this genus have yielded inconsistent results whereby a majority have failed to establish beneficial mood effects49,50,51. The results presented here may help identify other candidates for psychobiotic interventions.

The present results suggested that the custom of analysing bacteria at an aggregate level (e.g., genus, OTU) may be a potential cause of inconsistent findings in microbiome-depression studies53. It may thus also clarify some variance with the study of Radjabzadeh et al.24, which used closed reference OTU clustering instead of ASVs. For example, we observed that ASVs identifying the genus Blautia, Bacteroides, or Oscillospira exhibited both significant positive and significant negative correlations with depressive symptom levels. Such potentially biologically meaningful associations may average out when aggregated on a genus level. For example, heterogeneous associations within the same genus may reflect interspecific competition as ecological competition is known to be especially fierce within the same genus54, i.e., indicating that depression is associated with a gut environment conducive to some species while disadvantageous to others55,56. Imprecisions in taxonomic allocation could likewise account for heterogeneous associations57,58. For example, the notably heterogeneous pattern of associations between depression and 10 ASVs that Greengenes maps onto the genus Oscillospira disappeared when utilizing the Silva database (which assigned these 10 ASVs to 8 different genera). Together these observations align with the burgeoning view that ASVs are preferred as the standard unit of marker-gene analysis and reporting53,59.

In closing, several strengths and limitations warrant mentioning. The present analyses pertained to ethnic groups living in the same urban area, hereby preventing confounding by geographical effects15. While ethnic differences in the gut microbiota may involve both genetic and environmental factors, the balance of evidence seems to indicate that the latter may dominate60,61,62. More fine-grained analyses, e.g., comparing 1st with 2nd generation immigrants, or comparing the history of local acculturation among 1st generation migrants, may further identify the specific role of environmental exposures. Another advancement is that the present study applied an unprecedented confounder adjustment. Future studies may still consider additional explanatory factors (e.g., diet63,64). We may add that the attenuation of effect sizes with progressive covariate adjustments should not be taken as indicative of spurious associations, since some covariates may be on the causal pathway65. We note that rating instruments like the PHQ-9 do not provide a clinical diagnosis of depression, although these assessment approaches tend to be highly correlated66. The use of a continuous symptom-score is more in step with contemporary views of depression as a continuum67. Depression is a heterogeneous construct with different subtypes based on symptom profile. This aspect warrants further research in light of evidence that such subtypes may exhibit distinct biological profiles68,69.

Three additional statistical considerations also warrant mentioning. A strength is that key analyses were cross-validated using different methods, e.g., utilizing multiple parameters of alpha-diversity and beta-diversity, the comparative use of two databases for taxonomic allocation, and performing both meta-analysis and GLM to determine ethnic heterogeneity. The fact that these different approaches yielded a comparable pattern of results supported the robustness of the present findings. Further, in the context of p-value testing and small effect sizes, the observation that one study or subsample shows a significant association and the other does not cannot immediately be taken as evidence of a non-replication or inconsistency (e.g., see Radjabzadeh et al.24 as well as Supplementary Fig. 1 in the current paper), but may reflect normal between-sample variation and Type 2 error. Large-scale aggregate analyses of multiple cohorts may therefore be a relevant approach. Finally, although depression symptoms were modeled as the outcome variable in most analyses, causal inferences obviously remain speculative at this point.

In summary, analyses of a large and ethnically diverse population demonstrated robust associations between the gut microbiota and depressive symptoms. These associations were largely invariant across ethnic groups and withstood adjustment for a uniquely large set of relevant confounders, which included demographic, behavioral, and medical factors. The study findings identified potential targets for psychobiotic interventions that warrant further investigation, and may positively impact depression and well-being at an individual or population level.

Methods

Procedures and participants

The HELIUS (Healthy Life in an Urban Setting) study is a multi-ethnic cohort study among citizens of Amsterdam, The Netherlands20,21. The city proper is a moderately sized area (219.49 km²) with approximately 900,000 inhabitants, and is the national capital. The full study protocol is described elsewhere20,21. In short, participants aged 18–70 years were randomly sampled, stratified by ethnic origin, through the municipal registry of Amsterdam (participation N = 24,789, response rate 28%). Data were collected through physical examination and by questionnaire, which was either self-administered or collected by interview using an ethnically matched interviewer. The HELIUS study complied with all relevant ethical regulations and was conducted in accordance with the Declaration of Helsinki (6th, 7th revisions). Written informed consent was obtained from all participants prior to inclusion. The study was approved by the Institutional Review Board of the Amsterdam University Medical Centers, location AMC.

At the time of the present analyses, fecal 16S rRNA data were available for a total of 3,343 participants belonging to 8 ethnic groups. Because of small numbers, those identifying as Indonesian-Surinamese background (N = 46) and “another or unknown ethnicity” (N = 63) were excluded. Applying these criteria, and excluding those without data on depressive symptoms (PHQ-9, see below; N = 93), yielded the following 6 ethnic groups: Dutch (N = 769), African Surinamese (N = 767), South-Asian Surinamese (N = 527), Turkish (N = 349), Moroccan (N = 473), and Ghanaian (N = 458). Ethnic groups were classified on the basis of migratory background14,20,21. Accordingly, a person was considered to be of non-Dutch ethnicity when meeting one of the following two criteria: (1) born outside the Netherlands and at least one parent born outside the Netherlands (i.e., first generation), or (2) born in the Netherlands with both parents born outside the Netherlands (second generation). For participants with a Surinamese ethnicity further subgroups were identified according to self-described ethnic origin14,20. For the Dutch sample, we only invited people who were born in the Netherlands and whose parents were born in the Netherlands.

Depressive symptoms, sociodemographic, behavioral, and medical variables

Depressive symptoms were recorded using the 9-item Patient Health Questionnaire-9 (PHQ-9)66,70. The PHQ-9 boasts good psychometric properties and has been shown to measure the same concept (i.e., is invariant) across all six ethnic groups included in this study19,66,71. Each of the PHQ-9 items evaluates the presence of one of the nine DSM-IV symptom criteria experienced during the past 2 weeks, utilizing a four-point Likert-scale (not at all - almost every day). The severity of depressed mood was assessed by the sum score (ranging between 0 and 27). In case of a single missing item the mean score of the remaining items was used to replace the missing item; with >1 missing items the entire PHQ-9 was considered missing18.
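The scoring rule described above, including the handling of missing items, can be sketched as follows. Items are assumed to be coded 0 to 3 (consistent with a sum score range of 0 to 27), and the imputed value is left unrounded; both points, like the function name, are assumptions for illustration rather than details confirmed by the text.

```python
def phq9_sum_score(items):
    """PHQ-9 sum score with single-item mean imputation.

    `items` is a list of nine responses coded 0-3, with None for a missing item.
    Returns None when more than one item is missing.
    """
    if len(items) != 9:
        raise ValueError("the PHQ-9 has exactly nine items")
    answered = [v for v in items if v is not None]
    if any(v not in (0, 1, 2, 3) for v in answered):
        raise ValueError("item responses must be coded 0, 1, 2 or 3")
    n_missing = 9 - len(answered)
    if n_missing > 1:
        return None  # entire questionnaire treated as missing
    if n_missing == 1:
        # Replace the missing item by the mean of the eight answered items
        return sum(answered) + sum(answered) / len(answered)
    return float(sum(answered))


print(phq9_sum_score([1, 1, 1, 1, 1, 1, 1, 1, 1]))
print(phq9_sum_score([0, 1, 2, 3, 0, 1, 2, 3, None]))
print(phq9_sum_score([0, 1, None, 3, 0, None, 2, 3, 1]))
```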

Data on sociodemographic, behavioral, and medical variables were collected by self-report or physical examination20. Demographic data included ethnicity, sex, age, educational level. The latter comprised 4 categories ranging between ‘elementary education or less’ and ‘advanced vocational or university education’ (i.e., BA/BSc or higher). Analyses of behavioral factors focused on physical activity (i.e., minutes per week times the intensity activity of each minute-activity; as categorized in 3 METs groups as based on the compendium of Ainsworth)72, smoking (yes/no), alcohol (alcohol consumption and alcohol-related problem behaviors, as assessed by the 10-item Alcohol Use Disorders Identification Test73), and body mass index (BMI). Medical covariates included a (self-reported) diagnosis of a gastro-intestinal disorder and diabetes. The latter was established using a Boolean algorithm whereby caseness was classified as a self-reported clinical diagnosis, or increased fasting glucose (≥7 mmol/l), or increased HbA1c (≥48 mmol), or the use of glucose-lowering medication. Medication and supplement intake were also recorded and included the use of proton pump inhibitors, antidepressants, and use of antibiotics (past 2 weeks), as well as use of probiotics. Participants also reported symptoms of diarrhea experienced over the past week. Data on inflammatory activity (plasma C-reactive protein) were available for a subset of participants (N = 975) and only used in auxiliary analyses. Approximately 26% of questionnaires was filled out with support of an interviewer-assistant at the assessment center. This entailed that an ethnically and language-matched person was present at the assessment center to provide clarification due to language or reading problems. These assistants were trained to ensure standardization. 
Follow-up studies yielded no response differences by either administration mode (paper-pencil, digital, assistant-supported) or ethnicity on the main outcome variable (PHQ-9)19. Further, analyses showed minimal ethnic and socio-economic differences between participants and those who declined or did not respond21.
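
The Boolean algorithm for diabetes caseness described in this section can be sketched as follows. The function and parameter names are hypothetical, and treating an unavailable laboratory value as "not elevated" is an assumption that the source does not address.

```python
from typing import Optional


def has_diabetes(
    self_reported_diagnosis: bool,
    fasting_glucose_mmol_l: Optional[float],
    hba1c_mmol_mol: Optional[float],
    glucose_lowering_medication: bool,
) -> bool:
    """Classify diabetes caseness as the logical OR of four criteria.

    A laboratory value of None (not measured) is treated as not elevated;
    this handling of missing values is an assumption.
    """
    elevated_glucose = (
        fasting_glucose_mmol_l is not None and fasting_glucose_mmol_l >= 7.0
    )
    elevated_hba1c = hba1c_mmol_mol is not None and hba1c_mmol_mol >= 48.0
    return bool(
        self_reported_diagnosis
        or elevated_glucose
        or elevated_hba1c
        or glucose_lowering_medication
    )
```

Because the criteria are combined with a logical OR, any single positive indicator is sufficient for caseness.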

Stool sample collection

Participants were given a stool collection tube and a safety bag (for transport), either by mail before the assessment center visit or at the end of the visit, as preferred. They were asked to bring a ‘fresh’ stool sample to the assessment center within 6 h after collection. If this was not possible, participants were instructed to keep the stool sample in their home freezer overnight and to bring it in frozen to the research location the next morning. All samples were immediately frozen at −20 °C at each assessment center, transported within 1 to 4 weeks to the University Medical Center, and stored at −80 °C until processing. The time period each sample was stored locally at −20 °C was not logged. During the physical examination, participants were asked whether they (1) used probiotics (frequency, type), (2) had used antibiotics in the past three months or two weeks, and (3) had experienced diarrhea in the past week. Standardized procedures were used at the collection sites using written SOPs and training of research personnel. Quality checks on the staff/procedures were done at regular intervals during the data collection period.

Bioinformatics

Fecal microbiota composition was profiled by sequencing the V4 region of the 16S rRNA gene on an Illumina MiSeq instrument (Illumina RTA v1.17.28; MCS v2.5) with 515F and 806R primers designed for dual indexing74 and the V2 Illumina kit (2 × 250 bp paired-end reads)14. 16S rRNA genes from each sample were amplified in duplicate reactions in volumes of 25 μl containing 1x Five Prime Hot Master Mix (5 PRIME GmbH), 200 nM of each primer, 0.4 mg/ml BSA, 5% DMSO, and 20 ng of genomic DNA. PCR was carried out under the following conditions: initial denaturation for 3 min at 94 °C, followed by 25 cycles of denaturation for 45 s at 94 °C, annealing for 60 s at 52 °C and elongation for 90 s at 72 °C, and a final elongation step for 10 min at 72 °C. Duplicates were combined, purified with the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel) and quantified using the Quant-iT PicoGreen dsDNA kit (Invitrogen). Purified PCR products were diluted to 10 ng/μl and pooled in equal amounts. The pooled amplicons were purified again using Ampure magnetic purification beads (Agencourt) to remove short amplification products. Raw sequencing reads were quality checked using FastQC. USEARCH (v11.0.667 64-bit Linux version)75 was used to process the raw reads. Read pairs were merged with a maximum of 30 accepted differences and 80% minimum overlap identity, then filtered using a threshold of maximum 1 expected error per merged contig. Reads passing the filter were subsequently dereplicated. Sequences occurring at least 8 times in the entire dataset were used to infer biological sequences with the UNOISE3 algorithm (α-parameter set to 2.0)76. All merged reads (including reads that failed quality filtering) were mapped back to the inferred Amplicon Sequence Variants (ASVs) in order to construct an ASV table59. Taxonomy was assigned to the ASVs with the SINTAX algorithm77 using Greengenes v.13.5 and Silva 13278. The ASV table was rarefied to 14,942 counts per sample.
ASV sequences were then used as input for MAFFT (v.7.427)79,80 in order to obtain a multiple sequence alignment, based on which a phylogenetic tree was constructed using IQ-TREE (v. 1.6.11)48. The phylogenetic tree was midpoint-rooted using the “phytools” R package (Revell, 2012). The “phyloseq” R package81 was used to integrate the ASV counts, taxonomy assignments, phylogenetic tree and sample metadata. The above analyses identified 1438 ASVs, of which 418 were deemed to have non-trivial counts (>0.02%, corresponding to approximately 3 counts per sample out of 14,942 reads). A core microbiota subset was defined on the basis of ASVs that were present in at least 75% of the cohort, yielding 109 ASVs.
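
As an illustration of two steps described above, rarefying to a fixed depth and defining a core subset by prevalence, a minimal Python sketch is given below. It is not the pipeline used in this study: the function names are hypothetical, rarefaction is implemented as simple random subsampling without replacement, and returning `None` for samples below the target depth is an assumption.

```python
import random
from collections import Counter
from typing import Dict, List, Optional


def rarefy(
    counts: Dict[str, int], depth: int, rng: random.Random
) -> Optional[Dict[str, int]]:
    """Subsample `depth` reads without replacement from one sample.

    Returns None when the sample holds fewer than `depth` reads (assumed handling).
    """
    if sum(counts.values()) < depth:
        return None
    # Expand the count table to one label per read, then draw without replacement.
    reads = [asv for asv, n in counts.items() for _ in range(n)]
    drawn = Counter(rng.sample(reads, depth))
    return {asv: drawn.get(asv, 0) for asv in counts}


def core_asvs(samples: List[Dict[str, int]], min_prevalence: float = 0.75) -> List[str]:
    """Return ASVs detected (count > 0) in at least `min_prevalence` of samples."""
    n = len(samples)
    if n == 0:
        return []
    asvs = sorted({asv for sample in samples for asv in sample})
    return [
        asv
        for asv in asvs
        if sum(1 for sample in samples if sample.get(asv, 0) > 0) / n >= min_prevalence
    ]
```

In the study's setting, `depth` would correspond to 14,942 reads and `min_prevalence` to the 75% threshold; the toy inputs used to exercise the sketch are much smaller.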

Covariate selection

To avoid overfitting, covariates were selected a priori, as informed by prior epidemiological analyses (mainly11,12,82 and insofar as available in our dataset). The selected covariates involved sociodemographic variables (ethnicity, age, sex, education), behavioral/lifestyle variables (smoking, alcohol, exercise, bodyweight (BMI)), and medical variables (diabetes, diagnosis of GI disorder, proton-pump inhibitor (PPI) use, recent antibiotic use, recent diarrhea; see “Statistical methods” section). Never-smokers and former smokers were categorized as non-smokers. Post hoc analyses yielded comparable results when former smokers were omitted from analyses (results not shown here). To ascertain that no important medical covariates were overlooked, exploratory analyses were performed to identify variables with additional explanatory value in the fully adjusted models. Among the variables tested were presence of metabolic syndrome and its components83, glucocorticoid medication, statins, beta-blockers, and inflammatory diseases. None of these were retained in the final analyses because they did not significantly alter the association between predictor and main outcome. Following recent recommendations13, the small number of antidepressant users (N = 132) was excluded from the main analyses. Auxiliary analyses included covariates that reflect pre-existing psychological risk factors for depression, for which we selected parental history of depressive disorders, number of prior depressive episodes, and neuroticism.

Statistical methods

Statistical analyses were performed in R version 4.0.1, SPSS v27, or JASP 0.13.1. Multiple linear regression was used to determine the association between α-diversity or β-diversity and depressed mood (PHQ-9 sum scores), with the latter as the outcome variable. Covariates (see above) were added in a stepwise fashion, yielding 3 models: Model 1 was adjusted for sociodemographics (age, sex, ethnicity, education), Model 2 was additionally adjusted for health-related behaviors (smoking, alcohol, exercise, BMI), and Model 3 further incorporated medical variables (diabetes, self-reported diagnosis of GI disorder, use of proton pump inhibitors, antibiotic use past 2 weeks, diarrhea past week).
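
The nesting of the three covariate sets can be made explicit with a short Python sketch that builds R-style model formulas. The variable names (e.g., `phq9_sum`, `ppi_use`) and the helper `build_models` are hypothetical and do not correspond to the variable names in the study's dataset.

```python
from typing import Dict, List

# Covariate blocks, in the order in which they enter the models.
SOCIODEMOGRAPHIC: List[str] = ["age", "sex", "ethnicity", "education"]
BEHAVIORAL: List[str] = ["smoking", "alcohol", "exercise", "bmi"]
MEDICAL: List[str] = ["diabetes", "gi_disorder", "ppi_use", "antibiotics_2wk", "diarrhea_1wk"]


def build_models(outcome: str, predictor: str) -> Dict[str, str]:
    """Return formulas for Models 1-3, each adding one covariate block to the last."""
    models: Dict[str, str] = {}
    covariates: List[str] = []
    for i, block in enumerate([SOCIODEMOGRAPHIC, BEHAVIORAL, MEDICAL], start=1):
        covariates = covariates + block  # cumulative adjustment set
        models[f"Model {i}"] = f"{outcome} ~ " + " + ".join([predictor] + covariates)
    return models
```

Calling `build_models("phq9_sum", "shannon")` yields three formulas in which the microbiome predictor is always the first term, so that its coefficient can be compared across increasingly adjusted models.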

Alpha- and beta-diversity indexes were calculated using the ‘vegan’ package in R84 (see above). The Shannon index was used as the primary marker of α-diversity, but analyses were repeated for other measures of α-diversity (i.e., Phylogenetic Diversity, Chao1, Abundance-based Coverage Estimator, observed richness, Simpson index). For β-diversity, principal coordinate analyses (PCoA) were performed using weighted UniFrac metrics and Bray-Curtis distances. The first 20 principal coordinates were selected for inclusion in multivariable regression analyses by applying forward selection, and the resulting coordinates were used as predictor variables in linear regression (see further description in the results section), utilizing the same 3 regression models described above. The three regression models were also used to examine associations between individual ASVs and depressed mood. For these analyses both the outcome (PHQ-9 sum scores) and the predictor variables (relative abundances) were rank-ordered to yield a more robust estimate. P-values were FDR-corrected for multiple testing (Benjamini-Hochberg)85; a corrected P-value < 0.05 was considered statistically significant.
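
For readers less familiar with these quantities, the Shannon index, the Bray-Curtis dissimilarity, and the Benjamini-Hochberg adjustment can be written out in a few lines of Python. This is an illustrative sketch using the standard textbook definitions (natural logarithm for the Shannon index), not the code used in this study, where the calculations were done in R; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def shannon_index(counts: Sequence[float]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no counts")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two samples: sum|u_i - v_i| / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("samples must cover the same taxa")
    denom = sum(u) + sum(v)
    if denom <= 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / denom


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

An association would then be flagged as significant when its adjusted P-value falls below 0.05.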

Possible heterogeneity of associations between ethnic groups was examined by two methods. First, GLM (SPSS UNIANOVA) was used to determine significant interactions between ethnicity and each ASV (FDR adjusted85). Second, the associations were stratified by ethnicity and the heterogeneity of microbiota-depression associations was quantified as I2 (i.e., comparable to a meta-analysis). Associations showing an I2 > 30% and >50% across ethnic groups were considered to reflect moderate or high heterogeneity, respectively86.
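
The I2 statistic can be computed from the ethnicity-specific regression coefficients and their standard errors. The sketch below assumes the conventional definition based on Cochran's Q with fixed-effect inverse-variance weights, I2 = max(0, (Q − df)/Q) × 100%; the source does not state which estimator or software was used, and the function name is hypothetical.

```python
from typing import Sequence


def i_squared(estimates: Sequence[float], standard_errors: Sequence[float]) -> float:
    """I2 (in percent) across strata, from Cochran's Q with inverse-variance weights."""
    k = len(estimates)
    if k < 2 or len(standard_errors) != k:
        raise ValueError("need matching estimates and standard errors for >= 2 strata")
    weights = [1.0 / se ** 2 for se in standard_errors]
    # Fixed-effect pooled estimate across strata.
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = k - 1
    if q <= df:
        return 0.0  # I2 is truncated at zero
    return 100.0 * (q - df) / q
```

With six ethnic groups, `estimates` and `standard_errors` would each hold six values per ASV, and the resulting I2 would be compared against the 30% and 50% thresholds.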

Mediation analyses were used to test whether β-diversity (i.e., microbial diversity between individuals) may statistically account for ethnic disparities in depressive symptom levels, following the inferential steps as described by Kenny and Baron87.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.