Abstract
Autism spectrum disorder (ASD) is a highly heritable and heterogeneous group of neurodevelopmental phenotypes diagnosed in more than 1% of children. Common genetic variants contribute substantially to ASD susceptibility, but to date no individual variants have been robustly associated with ASD. With a marked sample-size increase from a unique Danish population resource, we report a genome-wide association meta-analysis of 18,381 individuals with ASD and 27,969 controls that identified five genome-wide-significant loci. Leveraging GWAS results from three phenotypes with significantly overlapping genetic architectures (schizophrenia, major depression, and educational attainment), we identified seven additional loci shared with other traits at equally strict significance levels. Dissecting the polygenic architecture, we found both quantitative and qualitative polygenic heterogeneity across ASD subtypes. These results highlight biological insights, particularly relating to neuronal function and corticogenesis, and establish that GWAS performed at scale will be much more productive in the near term in ASD.
Main
ASD is the term for a group of pervasive neurodevelopmental disorders characterized by impaired social and communication skills along with repetitive and restrictive behavior. The clinical presentation is highly heterogeneous, including individuals with severe impairment and intellectual disability (ID) as well as individuals with above-average intelligence quotient (IQ) and high levels of academic and occupational functioning. ASD affects 1–1.5% of individuals and is highly heritable, and both common and rare variants contribute to its etiology1,2,3,4. Common variants have been estimated to account for a major part of ASD liability2, as has been observed for other common neuropsychiatric disorders. In contrast, de novo mutations, mostly copy number variants (CNVs) and gene-disrupting point mutations, have larger individual effects but collectively explain <5% of the overall liability1,2,3 and far less of the heritability. Although a number of genes have been convincingly implicated via excess statistical aggregation of de novo mutations, the largest genome-wide association study (GWAS) to date (n = 7,387 cases scanned)—although providing compelling evidence for the bulk contribution of common variants—did not conclusively identify single variants at genome-wide significance5,6,7. These results underscore that common variants, as in other complex diseases such as schizophrenia, individually have low impact and that a substantial scale-up in sample numbers would be needed.
Here we report what are, to our knowledge, the first common risk variants robustly associated with ASD, by more than doubling the discovery sample size relative to that in previous GWAS5,6,7,8. We describe strong genetic correlations between ASD and other complex disorders and traits, confirming shared etiology, and we show results indicating differences in the polygenic architecture across clinical subtypes of ASD. Leveraging these relationships and recently introduced computational techniques9, we identify additional previously undescribed ASD-associated variants that are shared with other phenotypes. Furthermore, by integrating with complementary data from Hi-C chromatin-interaction analysis of fetal brains and brain transcriptome data, we explore the functional implications of our top-ranking GWAS results.
Results
GWAS
As part of the iPSYCH project10, we collected and genotyped a Danish nationwide population-based case–cohort sample including nearly all individuals born in Denmark between 1981 and 2005 and diagnosed with ASD (according to the International Statistical Classification of Diseases and Related Health Problems, 10th revision (ICD-10)) before 2014. We randomly selected controls from the same birth cohorts (Supplementary Table 1). We previously validated registry-based ASD diagnoses11,12 and demonstrated the accuracy of genotyping DNA extracted and amplified from blood spots collected shortly after birth13,14. Genotypes were processed with Ricopili15, performing stringent quality control of data, removal of related individuals, exclusion of ancestry outliers based on principal component analysis (PCA), and imputation by using the 1000 Genomes Project phase 3 reference panel. After this processing, genotypes from 13,076 cases and 22,664 controls from the iPSYCH sample were included in the analysis. As is now standard in human complex-trait genomics, our primary analysis was a meta-analysis of the iPSYCH ASD results with five family-based trio samples of European ancestry from the Psychiatric Genomics Consortium (PGC; 5,305 cases and 5,305 pseudocontrols)16. All PGC samples had been processed with the same Ricopili pipeline for quality control, imputation, and analysis as used here.
Supporting the consistency between the study designs, the iPSYCH population-based and PGC family-based analyses showed a high degree of genetic correlation, with rG = 0.779 (s.e.m. = 0.106; P = 1.75 × 10−13), similar to the genetic correlations observed between datasets in other mental disorders17. Likewise, polygenic risk scores (PRSs) behaved consistently across the samples, supporting homogeneity of effects across samples and study designs (see the PRS results on a five-way split of the sample below). The SNP heritability (\(h_{\mathrm{G}}^2\)) was estimated to be 0.118 (s.e.m. = 0.010), assuming a population prevalence of 0.012 (ref. 18).
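A population prevalence is needed because SNP heritability estimated in an ascertained case–control sample is normally converted from the observed (0/1) scale to the liability scale. The text does not name the conversion used, so the sketch below assumes the standard observed-to-liability transformation of Lee et al. (2011); the function name is hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs: float, pop_prev: float, sample_prev: float) -> float:
    """Convert observed-scale SNP heritability to the liability scale.

    Assumes the Lee et al. (2011) transformation:
        h2_liab = h2_obs * K^2 (1 - K)^2 / (z^2 * P (1 - P))
    pop_prev (K): population prevalence of the disorder.
    sample_prev (P): proportion of cases in the analyzed sample.
    """
    norm = NormalDist()
    # Liability threshold t above which a fraction K of the population lies.
    t = norm.inv_cdf(1.0 - pop_prev)
    # Height of the standard normal density at that threshold.
    z = norm.pdf(t)
    k, p = pop_prev, sample_prev
    return h2_obs * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))
```

For example, with K = 0.012 and the meta-analysis case fraction P = 18,381/46,350 ≈ 0.40, the multiplier applied to the observed-scale estimate is roughly 0.6.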
The main GWAS meta-analysis included a total of 18,381 ASD cases and 27,969 controls and applied an inverse-variance-weighted fixed-effects model. To ensure that the analysis was well powered and robust, we restricted it to markers with minor-allele frequency (MAF) ≥0.01, imputation INFO score ≥0.7, and an effective sample size of >70% of the total. This final meta-analysis included results for 9,112,387 autosomal markers and yielded 93 genome-wide-significant markers in three separate loci (Fig. 1, Table 1a and Supplementary Figs. 1–44). Each locus was strongly supported by both the Danish case–control and the PGC family-based data. Although modest inflation was observed (λ = 1.12, λ1000 = 1.006), linkage disequilibrium (LD)-score regression analysis19 indicated that it arose from polygenicity (>96%; Methods) rather than confounding. The strongest signal among the 294,911 markers analyzed on the X chromosome had P = 7.8 × 10−5.
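The per-marker pooling step can be illustrated directly. This is a minimal sketch of the textbook inverse-variance-weighted fixed-effects estimator (the model METAL applies in its standard-error scheme); it is not METAL itself, and the function name is hypothetical.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis of one marker.

    betas: per-study effect estimates (for example, log odds ratios).
    ses: their standard errors.
    Returns (pooled beta, pooled SE, z, two-sided P).
    """
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)              # SE of the pooled estimate
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal P value
    return beta, se, z, p
```

Each study is weighted by its precision, so the larger or better-imputed batches dominate the pooled estimate, and the pooled standard error shrinks as studies are added.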
We next obtained replication data for the top 88 loci with P values <1 × 10−5 in five cohorts of European ancestry, including a total of 2,119 additional cases and 142,379 controls (Supplementary Tables 2 and 3). An overall replication of the direction of effects was observed (53 of 88 (60%) at P <1 × 10−5; 16 of 23 (70%) at P <1 × 10−6; sign tests, P = 0.035 and P = 0.047, respectively), and two additional loci achieved genome-wide significance in the combined analysis (Table 1a). More details on the identified loci can be found in Supplementary Table 4, and selected candidates are described in Box 1.
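The reported sign-test P values match a one-sided exact binomial test of directional concordance against 50%. The sidedness is inferred from the numbers rather than stated in the text, and the function name is hypothetical.

```python
from math import comb


def sign_test_one_sided(n_concordant: int, n_total: int) -> float:
    """P(X >= n_concordant) for X ~ Binomial(n_total, 0.5).

    Under the null of no true association, each replication effect is equally
    likely to agree or disagree in direction with the discovery effect.
    """
    tail = sum(comb(n_total, k) for k in range(n_concordant, n_total + 1))
    return tail / 2 ** n_total
```

Here `sign_test_one_sided(53, 88)` gives about 0.035 and `sign_test_one_sided(16, 23)` about 0.047, in line with the values above.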
Correlation with other traits and multitrait GWAS
To investigate the extent of genetic overlap between ASD and other phenotypes, we estimated the genetic correlations with a broad set of psychiatric and other medical diseases, disorders, and traits available at LD Hub20, by using bivariate LD-score regression (Fig. 2 and Supplementary Table 5). Significant correlations were found for several traits including schizophrenia15 (rG = 0.211, P = 1.03 × 10−5) and measures of cognitive ability, especially educational attainment21 (rG = 0.199, P = 2.56 × 10−9), indicating a substantial genetic overlap with these phenotypes and corroborating previous reports5,22,23,24. In contrast to findings in previous reports16, we find a strong and highly significant correlation with major depression25 (rG = 0.412, P = 1.40 × 10−25), and we report a novel and prominent overlap with ADHD26 (rG = 0.360, P = 1.24 × 10−12). Moreover, we confirm the genetic correlation with social communication difficulties at age 8 in a non-ASD population sample, previously reported on the basis of a subset of the ASD sample27 (rG = 0.375, P = 0.0028).
To leverage these observations for the discovery of loci that may be shared between ASD and these other traits, we selected three particularly well-powered and genetically correlated phenotypes: schizophrenia (n = 79,641)15, major depression (n = 424,015)25, and educational attainment (n = 328,917)21. We used the recently introduced MTAG method9, which, in brief, generalizes the standard inverse-variance-weighted meta-analysis to multiple phenotypes. Here, MTAG exploits the fact that, given an overall genetic correlation between ASD and a second trait, the effect-size estimate and evidence for association with ASD can be improved by appropriate use of the association information from the second trait. The results of these three ASD-anchored MTAG scans are correlated with the primary ASD scan (and with each other), but because we explored three scans, we used a more conservative threshold of 1.67 × 10−8 (5 × 10−8/3) for declaring significance across these secondary scans, giving an estimated maximum false discovery rate (maxFDR) of 0.021. In addition to stronger evidence for several of the ASD hits defined above, variants in seven additional regions achieved genome-wide significance, including three loci shared with educational attainment and four shared with major depression (Table 1b, Box 1, Supplementary Table 6 and Supplementary Figs. 49–55). In these seven instances, the effect-size estimate is stronger in ASD than in the secondary trait, and the loci are not among the strongest signals in these other scans (Supplementary Tables 7–9); in fact, three of the seven were not significant in the secondary trait and constitute potentially novel findings. Moreover, as a benchmark, we ran MTAG with two very large and heritable traits with no expected links to ASD (height28, n = 252,288, and body mass index (BMI)29, n = 322,154), and no significant loci were added to the list of ASD-only significant associations.
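The core of MTAG can be written as a generalized least-squares (GLS) estimator. The sketch below is a simplified single-SNP version that assumes the genetic covariance matrix Ω and the sampling covariance matrix Σ are already known. The real method estimates them genome-wide, for example with LD-score regression, so this is not the MTAG software. It also reproduces the stricter significance threshold, 5 × 10−8 divided by the three secondary scans. All names are hypothetical.

```python
import numpy as np

# Genome-wide threshold divided across the three ASD-anchored MTAG scans.
MTAG_ALPHA = 5e-8 / 3


def mtag_single_snp(beta_hat, omega, sigma, t=0):
    """GLS estimate of trait t's effect at one SNP from all traits' estimates.

    beta_hat: length-T vector of per-trait GWAS effect estimates at the SNP.
    omega: T x T genetic (co)variance matrix of true per-SNP effects.
    sigma: T x T sampling covariance matrix of beta_hat at this SNP.
    Model (MTAG's assumption): E[beta_hat | beta_t] = (omega[:, t] / omega[t, t]) * beta_t.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    omega = np.asarray(omega, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    w_t = omega[:, t]
    a = w_t / omega[t, t]                     # expected loading of each trait on beta_t
    # Residual covariance: genetic variation not explained by beta_t, plus sampling noise.
    v = omega - np.outer(w_t, w_t) / omega[t, t] + sigma
    v_inv_a = np.linalg.solve(v, a)
    precision = a @ v_inv_a
    weights = v_inv_a / precision             # GLS weights; weights @ a == 1 (unbiased)
    beta_mtag = weights @ beta_hat
    se_mtag = np.sqrt(1.0 / precision)
    return beta_mtag, se_mtag, weights
```

With no genetic correlation (diagonal Ω), the secondary trait gets zero weight and the result collapses to the single-trait ASD estimate. With modest correlations it gets a modest weight, which mirrors the discussion of MTAG weights.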
Gene and gene-set analysis
Next, we performed gene-based association analysis on our primary ASD meta-analysis by using MAGMA30, testing for the joint association of all markers within each protein-coding gene in the genome. This analysis identified 15 genes surpassing the significance threshold (Supplementary Table 10). As expected, most of these genes were located within the genome-wide-significant loci identified in the GWAS, but seven genes were located in four additional loci: KCNN2, MMP12, NTM, and a cluster of genes on chromosome 17 (KANSL1, WNT3, MAPT, and CRHR1) (Supplementary Figs. 57–71). In particular, KCNN2 was strongly associated (P = 1.02 × 10−9), far beyond even single-variant statistical thresholds, and is included in the descriptions in Box 1.
Enrichment analyses using gene coexpression modules from human neocortex transcriptomic data (M13, M16, and M17 from Parikshak et al.31) and loss-of-function-intolerant genes (probability of loss-of-function intolerance, pLI >0.9)32,33, for which there is evidence of enrichment in neurodevelopmental disorders26,31,34, yielded only nominal significance, for the loss-of-function-intolerant genes (P = 0.014) and for M16 (P = 0.050) (Supplementary Table 11). Genes implicated in ASD by studies of rare variants in Sanders et al.35 fell just short of nominally significant enrichment (P = 0.063), whereas enrichment in the curated gene list from the SPARK consortium36 was significant (P = 0.0034). Analysis of Gene Ontology sets37,38 for molecular function from the Molecular Signatures Database (MSigDB)39 showed no significant sets after Bonferroni correction for multiple testing (Supplementary Table 12).
Dissection of the polygenic architecture
Because ASD is a highly heterogeneous disorder, we explored how \(h_{\mathrm{G}}^2\) partitioned across phenotypic subcategories in the iPSYCH sample, and we estimated the genetic correlations among these groups by using GCTA40. We examined cases with (n = 1,873) and those without ID and the ICD-10 diagnostic subcategories of childhood autism (F84.0, n = 3,310), atypical autism (F84.1, n = 1,607), Asperger’s syndrome (F84.5, n = 4,622), and other/unspecified pervasive developmental disorders (PDDs, F84.8–F84.9, n = 5,795), restricting to nonoverlapping groups when performing pairwise comparisons (Supplementary Table 13). Whereas the pairwise genetic correlations were consistently high among all subgroups (95% confidence intervals (CIs) including 1 in all comparisons), the \(h_{\mathrm{G}}^2\) of Asperger’s syndrome (\(h_{\mathrm{G}}^2\) = 0.097, s.e.m. = 0.001) was found to be twice the \(h_{\mathrm{G}}^2\) of both childhood autism (\(h_{\mathrm{G}}^2\) = 0.049, s.e.m. = 0.009, P = 0.001) and the group of other/unspecified PDDs (\(h_{\mathrm{G}}^2\) = 0.045, s.e.m. = 0.008, P = 0.001) (Supplementary Tables 14 and 15 and Supplementary Figs. 82 and 83). Similarly, the \(h_{\mathrm{G}}^2\) of ASD without ID (\(h_{\mathrm{G}}^2\) = 0.086, s.e.m. = 0.005) was three times that of cases with ID (\(h_{\mathrm{G}}^2\) = 0.029, s.e.m. = 0.013, P = 0.015).
To further examine the apparent polygenic heterogeneity across subtypes, we investigated how PRSs trained on different phenotypes were distributed across distinct ASD subgroups. We focused on phenotypes showing strong genetic correlation with ASD (for example, educational attainment) but also included traits with little or no correlation to ASD (for example, BMI) as negative controls. In this analysis, we regressed the normalized scores on ASD subgroups, including covariates for batches and principal components (PCs) in a multivariate regression. Of the eight phenotypes evaluated, only the cognitive phenotypes showed strong heterogeneity (educational attainment21, P = 1.8 × 10−8; IQ41, P = 3.7 × 10−9) (Supplementary Fig. 84). Interestingly, the groups defined by case status and ID all differed significantly in their loading for the two cognitive phenotypes: controls with ID had the lowest scores, followed by ASD cases with ID, whereas ASD cases without ID had significantly higher scores than any other group (educational attainment, P = 2.6 × 10−12; IQ, P = 8.2 × 10−12).
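A heterogeneity test of this kind can be sketched as follows. The sketch assumes an ordinary least-squares regression of the standardized PRS on subgroup indicators plus covariates, with an F test against a covariates-only model. The text does not give the exact test statistic, so the F test, the function name, and the simulated data in the check are all assumptions.

```python
import numpy as np
from scipy import stats


def prs_heterogeneity_test(prs, groups, covariates):
    """F test of whether the mean standardized PRS differs across subgroups.

    prs: (n,) polygenic scores; groups: (n,) integer subgroup labels;
    covariates: (n, k) array, e.g. genotyping-batch dummies and PCs.
    Returns (F statistic, P value).
    """
    y = (prs - prs.mean()) / prs.std()                  # normalize the score
    n = len(y)
    base = np.column_stack([np.ones(n), covariates])    # intercept + covariates
    levels = np.unique(groups)
    # Dummy-code subgroups, using the first level as the reference.
    dummies = np.column_stack([(groups == g).astype(float) for g in levels[1:]])
    full = np.column_stack([base, dummies])

    def rss(x):
        # Residual sum of squares of the least-squares fit of y on x.
        coef, *_ = np.linalg.lstsq(x, y, rcond=None)
        resid = y - x @ coef
        return resid @ resid

    rss0, rss1 = rss(base), rss(full)
    df1 = dummies.shape[1]
    df2 = n - full.shape[1]
    f = ((rss0 - rss1) / df1) / (rss1 / df2)
    return f, stats.f.sf(f, df1, df2)
```

A small P value means that, after adjusting for batches and PCs, the subgroups do not share a common mean score, which is the sense of "heterogeneity" used above.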
With respect to the diagnostic subcategories constructed hierarchically from ASD subtypes (Supplementary Table 13), the cognitive phenotypes again showed the strongest heterogeneity across the diagnostic classes (educational attainment, P = 2.6 × 10−11; IQ, P = 3.4 × 10−8), whereas neuroticism23 (P = 0.0015), chronotype42 (P = 0.011), and subjective well-being23 (P = 0.029) showed a weaker but nominally significant degree of heterogeneity, and schizophrenia, major depressive disorder, and BMI29 were nonsignificant across the groups (P > 0.19) (Fig. 3). This pattern weakened only slightly when we excluded subjects with ID (Supplementary Fig. 85). For neuroticism, there was a clear split, with atypical and other/unspecified PDD cases having significantly higher PRSs than childhood autism and Asperger’s syndrome, P = 0.00013. Given the genetic overlap of each subcategory with each phenotype, the hypothesis of homogeneity across subphenotypes was strongly rejected (P = 1.6 × 10−11), thereby establishing that these subcategories indeed have differences in their genetic architectures.
Focusing on educational attainment, we found a significant enrichment of PRSs for Asperger’s syndrome (P = 2.0 × 10−17) in particular, and for childhood autism (P = 1.5 × 10−5), but not for the group of other/unspecified PDD (P = 0.36) or for atypical autism (P = 0.13) (Fig. 3). Excluding individuals with ID only marginally changed this result: atypical autism became nominally significant (P = 0.020) (Supplementary Fig. 85). These results show that the genetic architecture underlying educational attainment is indeed shared with ASD but to a variable degree across the disorder spectrum. We found that the observed excess in ASD subjects of alleles positively associated with educational attainment43,44 was confined to Asperger’s syndrome and childhood autism and was not seen here in atypical autism or in other/unspecified PDD.
Finally, we evaluated the predictive ability of ASD PRSs by using five different sets of target and training samples within the combined iPSYCH-PGC sample. The observed mean variance explained by PRSs (Nagelkerke’s R²) was 2.45% (P = 5.58 × 10−140), with a pooled PRS-based case–control odds ratio (OR) = 1.33 (95% CI 1.30–1.36) (Supplementary Figs. 89 and 91). Dividing the target samples into PRS decile groups revealed an increase in ORs with increasing PRSs: subjects in the highest decile had OR = 2.80 (95% CI 2.53–3.10) relative to the lowest decile (Fig. 4a and Supplementary Fig. 92). To leverage correlated phenotypes to improve prediction of ASD, we generated a multiphenotype PRS as a weighted sum of phenotype-specific PRSs (Methods). As expected, Nagelkerke’s R² increased with each PRS included, reaching its maximum in the full model at 3.77% (P = 2.03 × 10−215) for the pooled analysis, with OR = 3.57 (95% CI 3.22–3.96) for the highest decile (Fig. 4b and Supplementary Figs. 93 and 94). These results demonstrate that an individual’s ASD risk depends on the level of polygenic burden of thousands of common variants in a dose-dependent manner, which can be reinforced by adding SNP weights from ASD-correlated traits.
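The two summary measures used here can be sketched as follows: Nagelkerke's R² computed from the log-likelihoods of nested logistic models (a null model and a model with the PRS), and per-decile ORs relative to the lowest decile. For simplicity, the decile ORs below come from raw 2×2 counts without covariates. That is a simplification, since the study's models may adjust for covariates (Methods). Function names are hypothetical.

```python
import numpy as np


def nagelkerke_r2(ll_null: float, ll_full: float, n: int) -> float:
    """Nagelkerke's R^2 from the log-likelihoods of nested logistic models."""
    r2_cs = 1.0 - np.exp(2.0 * (ll_null - ll_full) / n)  # Cox-Snell R^2
    r2_max = 1.0 - np.exp(2.0 * ll_null / n)              # maximum attainable Cox-Snell R^2
    return r2_cs / r2_max


def decile_odds_ratios(scores, is_case):
    """Odds ratio of each PRS decile relative to the lowest decile (2x2 counts)."""
    scores = np.asarray(scores, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    ranks = np.argsort(np.argsort(scores))   # rank 0..n-1 of each score
    decile = ranks * 10 // len(scores)        # 0 = lowest, 9 = highest
    odds = []
    for d in range(10):
        in_d = decile == d
        cases = np.sum(is_case & in_d)
        controls = np.sum(~is_case & in_d)
        odds.append(cases / controls)
    return np.array(odds) / odds[0]
```

Dividing by the maximum attainable Cox–Snell value rescales R² so that a perfectly fitting model scores 1, which makes values such as 2.45% and 3.77% comparable across target sets with different case fractions.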
Functional annotation
To obtain information on the possible biological underpinnings of our GWAS results, we conducted several analyses. First, we examined how the ASD \(h_{\mathrm{G}}^2\) partitioned on functional genomic categories as well as on cell-type-specific regulatory elements, by using stratified LD-score regression45. This analysis identified significant enrichment of heritability in conserved DNA regions and histone H3 lysine 4 monomethylation (H3K4me1) marks46, as well as in genes expressed in central-nervous-system cell types as a group (Supplementary Figs. 95 and 96), in line with observations in schizophrenia15, major depression25, and bipolar disorder22. Analyzing the enhancer-associated mark H3K4me1 in individual cells/tissues46, we found significant enrichment in brain and neuronal cell lines (Supplementary Fig. 97). The highest enrichment was observed in the developing brain, germinal matrix, cortex-derived neurospheres, and embryonic-stem-cell-derived neurons, results consistent with ASD as a neurodevelopmental disorder with largely prenatal origins, as supported by data from analysis of rare de novo variants31.
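For reference, stratified LD-score regression models the expected association statistic of SNP j as a linear function of its LD scores with each functional annotation C. This is the standard formulation of Finucane et al. (2015), given as background rather than as a description of this study's exact settings:

```latex
\mathbb{E}\left[\chi^2_j\right] = N \sum_{C} \tau_C \, \ell(j, C) + N a + 1,
\qquad
\ell(j, C) = \sum_{k \in C} r^2_{jk},
\qquad
\mathrm{Enrichment}(C) = \frac{h_{\mathrm{G}}^2(C) / h_{\mathrm{G}}^2}{M_C / M}
```

Here N is the sample size, \(\tau_C\) the per-SNP heritability contributed by annotation C, a a term capturing confounding, \(r^2_{jk}\) the LD between SNPs j and k, and \(M_C/M\) the fraction of SNPs in C. An enrichment above 1 means that the annotation carries more heritability than its share of SNPs would predict.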
Common variants associated with ASD are located in regions that are highly enriched in regulatory elements predicted to be active in human corticogenesis (Supplementary Figs. 95–97). Because most gene regulatory events occur at a distance via chromosome looping, we leveraged Hi-C data from the germinal zone (GZ) and postmitotic-zone cortical plate (CP) in the developing fetal brain to identify potential target genes for these variants47. We performed fine-mapping of 28 loci to identify the set of credible variants with likely causal genetic risk48 (Methods). Credible SNPs were significantly enriched in enhancer marks in the fetal brain (Supplementary Fig. 98), again supporting a likely regulatory role of these SNPs during brain development.
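The fine-mapping model is not described in this section (see Methods and ref. 48). As an illustration only, the sketch below builds a 95% credible set under a simpler, swapped-in model: one causal variant per locus, with Wakefield's approximate Bayes factors computed from each SNP's effect estimate and standard error. The prior variance W = 0.04 (prior s.d. 0.2 on the log-OR scale) and the function name are assumptions.

```python
import numpy as np


def credible_set(betas, ses, prior_var=0.04, coverage=0.95):
    """Single-causal-variant credible set from Wakefield approximate Bayes factors.

    Assumes exactly one causal SNP in the locus and equal prior probability per SNP.
    Returns (indices of SNPs in the set, posterior inclusion probability of every SNP).
    """
    betas = np.asarray(betas, dtype=float)
    v = np.asarray(ses, dtype=float) ** 2
    z2 = betas ** 2 / v
    r = prior_var / (v + prior_var)
    # Log Bayes factor for association vs. no association at each SNP.
    log_abf = 0.5 * np.log(1.0 - r) + 0.5 * r * z2
    # With one causal SNP and equal priors, posteriors are normalized Bayes factors.
    post = np.exp(log_abf - log_abf.max())
    post /= post.sum()
    order = np.argsort(post)[::-1]                    # most to least probable
    cum = np.cumsum(post[order])
    n_keep = int(np.searchsorted(cum, coverage)) + 1  # smallest set reaching coverage
    return order[:n_keep], post
```

The "credible SNPs" are then the members of each locus's set, and those are what get intersected with promoter, coding, and Hi-C annotations.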
On the basis of location or evidence of physical contact from Hi-C, the 380 credible SNPs (28 loci) were assigned to 95 genes (40 protein coding), including 39 SNPs within promoters assigned to 9 genes, and 16 SNPs within the protein-coding sequence of 8 genes (Supplementary Table 16 and Supplementary Fig. 98). Hi-C identified 86 genes that interacted with credible SNPs in either the CP or GZ during brain development. Among these genes, 34 interacted with credible SNPs in both CP and GZ, thus representing a high-confidence gene list. Notable examples are illustrated in Fig. 5 and highlighted in Box 1. By analyzing their mean expression trajectory, we observed that the identified ASD-candidate genes (Supplementary Table 16) showed the highest expression during fetal corticogenesis, a finding in line with the enrichment of heritability in the regulatory elements in developing brain (Fig. 5e–g). Interestingly, both common and rare variation in ASD preferentially affect genes expressed during corticogenesis31, thus highlighting a potential spatiotemporal convergence of genetic risk on this specific developmental epoch, despite the disorder’s profound genetic heterogeneity.
Discussion
The high heritability of ASD has been recognized for decades and remains among the highest for any complex disease, despite many clinical diagnostic changes over the past 30–40 years that have resulted in a broader phenotype characterizing more than 1% of the population. Although early GWAS permitted estimates that common polygenic variation should explain a substantial fraction of the heritability of ASD, individually significant loci remained elusive. This lack of results was suspected to be due to limited sample size, because studies of schizophrenia (with similar prevalence and heritability, and lower fitness) and of major depression achieved striking results only when sample sizes five to ten times larger than those available in ASD were used. This study bears out that expectation with definitive genome-wide-significant ‘hits’.
Here we report what are, to our knowledge, the first common risk variants robustly associated with ASD, on the basis of unique Danish resources in conjunction with results from the earlier PGC data, more than doubling the previous largest discovery sample. Of these, five loci were defined in ASD alone, and seven additional loci were identified at a stricter threshold by using GWAS results from three correlated phenotypes (schizophrenia, depression, and educational attainment) and a recently introduced analytic approach, MTAG. Both genome-wide LD-score regression analysis and the finding that even the loci defined in ASD alone received additional support from these other trait scans indicate that the polygenic architecture of ASD is significantly shared with the risk of adult psychiatric illness and with higher educational attainment and intelligence. Of note, the MTAG analyses were carried out as three pairwise analyses, thereby avoiding the complex interactions that might have arisen had we run three or four correlated phenotypes at a time9. Indeed, despite the secondary summary statistics coming from large, high-powered studies, the weights of their contributions were relatively modest because the genetic correlations were modest: the largest weight was 0.27 for schizophrenia, followed by 0.24 for major depression and 0.11 for educational attainment. Moreover, the estimated worst-case FDR was 0.021, just 0.001 higher than that of the ASD GWAS alone. Thus, all loci identified by MTAG were found with an acceptable degree of certainty and had substantial contributions from ASD alone (Table 1a,b and Supplementary Table 6). We expect that most or all such loci will be identified in future ASD-only GWAS as sample sizes increase substantially; however, given how new these methods are, the precise phenotypic consequences of these particular variants await expansion of all these trait GWAS.
In most GWAS, there has been little evidence of heterogeneity of association across phenotypic subgroups. In this study, however, we observed strong heterogeneity of genetic overlap with other traits when our ASD samples were divided into distinct subsets. In particular, the excess of alleles associated with higher intelligence and educational attainment was observed only in the higher-functioning categories (particularly in individuals with Asperger’s syndrome and individuals without comorbid ID) and not in the other/unspecified PDD and ID categories. These results are reminiscent of, and logically inverted relative to, the much greater role of spontaneous mutations in these latter categories, particularly in genes known to have an even larger effect in cohorts ascertained for ID/developmental delay49. Interestingly, other/unspecified PDDs and atypical autism also have significantly higher PRSs for neuroticism than childhood autism and Asperger’s syndrome. The different enrichment profiles observed provide evidence of a heterogeneous and qualitatively different genetic architecture among subtypes of ASD, which should inform future studies aiming at identifying etiologies and disease mechanisms in ASD.
The strong differences in estimated SNP heritability between ASD cases with versus without ID, and the highest values observed in Asperger’s syndrome, provide genetic evidence for longstanding observations. In particular, the results align well with the observations that de novo variants are more frequently observed in ASD cases with ID than in cases without comorbid ID, that IQ correlates positively with family history of psychiatric disorders50, and that severe ID (encompassing many syndromes that confer high risk of ASD) shows far less heritability than that observed for mild ID51, intelligence in general52, and ASDs. Thus, it is perhaps unsurprising that our data suggest that the contribution of common variants may be more prominent in high-functioning ASD, such as Asperger’s syndrome.
We further explored the functional implications of these results with complementary functional genomics data including Hi-C analyses of fetal brains and brain transcriptome data. Analyses at genome-wide scale (partitioned \(h_{\mathrm{G}}^2\) (Supplementary Figs. 95–97) and brain transcriptome enrichment (Fig. 5e–g)) as well as at single loci (Fig. 5a–d and Box 1) highlighted the involvement of processes relating to brain development and neuronal function. Notably, several genes located in the identified loci have previously been linked to ASD risk in studies of de novo and rare variants (Box 1 and Supplementary Table 4), including PTBP2, CADPS, and KMT2E, which were found to interact with credible SNPs in the Hi-C analysis (PTBP2 and CADPS) or to contain a loss-of-function credible SNP (KMT2E). Interestingly, aberrant splicing of the sister gene of CADPS, CADPS2, which has almost identical function, has been found in autism cases, and Cadps2-knockout mice display behavioral anomalies with translational relevance to autism53. PTBP2 encodes a neuronal splicing factor, and alterations in alternative splicing have been identified in brains from individuals diagnosed with ASD54.
In summary, we established an initial robust set of common variant associations in ASD and have begun laying the groundwork through which the biology of ASD and related phenotypes will inevitably be better articulated.
URLs
GenomeDK high-performance-computing cluster in Denmark, https://genome.au.dk/; iPSYCH project, http://ipsych.au.dk/; iPSYCH download site, http://ipsych.au.dk/downloads/; NIMH Repository, https://www.nimhgenetics.org/available_data/autism/; PGC download site, https://www.med.unc.edu/pgc/results-and-downloads/; LISA cluster at SURFsara, https://userinfo.surfsara.nl/systems/lisa/; plink 1.9, http://www.cog-genomics.org/plink/1.9/; LDSC and associated files, https://github.com/bulik/ldsc/; LD Hub, http://ldsc.broadinstitute.org/ldhub/; GTEx portal, https://gtexportal.org/home/
Methods
Subjects
iPSYCH sample
The iPSYCH ASD sample is part of a population-based case–cohort sample extracted from a baseline cohort10 consisting of all children born in Denmark between 1 May 1981 and 31 December 2005. Singletons who were born to a known mother and were resident in Denmark on their first birthday were included. Cases were identified from the Danish Psychiatric Central Research Register (DPCRR)12, which includes data on all individuals treated in Denmark at psychiatric hospitals (from 1969 onward) as well as at outpatient psychiatric clinics (from 1995 onward). Subjects were diagnosed with ASD in 2013 or earlier by a psychiatrist according to ICD-10, including diagnoses of childhood autism (ICD-10 code F84.0), atypical autism (F84.1), Asperger’s syndrome (F84.5), other pervasive developmental disorders (F84.8), and pervasive developmental disorder, unspecified (F84.9). For controls, we selected a random sample from the set of eligible children, excluding those with an ASD diagnosis by 2013.
The samples were linked by using the unique national personal identification number to the Danish Newborn Screening Biobank (DNSB) at Statens Serum Institute (SSI), where DNA was extracted from Guthrie cards, and whole-genome amplification was performed in triplicate, as described previously13,97. Genotyping was performed at the Broad Institute of Harvard and MIT (Cambridge, MA, USA) with PsychChip arrays from Illumina according to the manufacturer’s instructions. Genotype calling of markers with MAF >0.01 was performed by merging call sets from GenCall98 and Birdseed99, and less frequent variants were called with zCall100. Genotyping and data processing were carried out in 23 waves.
All analyses of the iPSYCH sample and joint analyses with the PGC samples were performed at the secured national GenomeDK high-performance computing cluster in Denmark. The study was approved by the Regional Scientific Ethics Committee in Denmark and the Danish Data Protection Agency.
PGC samples
In brief, five cohorts provided genotypes to the sample (n denotes the number of trios for which genotypes were available): the Geschwind Autism Center of Excellence (ACE; n = 391), the Autism Genome Project94 (AGP; n = 2,272), the Autism Genetic Resource Exchange101,102 (AGRE; n = 974), the NIMH Repository, the Montreal103/Boston Collection (MONBOS; n = 1,396), and the Simons Simplex Collection104,105 (SSC; n = 2,231). The trios were analyzed as cases and pseudocontrols. A detailed description of the sample is available on the PGC website, and additional details are provided in Anney et al.5. Analyses of the PGC genotypes were conducted on the computer cluster LISA at the Dutch HPC center SURFsara.
Follow-up samples
As follow-up for the loci with P values <10−5, we requested look-ups in five samples of Nordic and Eastern European origin, including 2,119 cases and 142,379 controls in total: BUPGEN (Norway: 164 cases and 656 controls), PAGES (Sweden: 926 cases and 3,841 controls not part of the PGC sample above), the Finnish autism case–control study (Finland: 159 cases and 526 controls), and deCODE (Iceland: 574 cases and 136,968 controls; Eastern Europe: 296 cases and 388 controls) (details in Supplementary Note).
Statistical analyses
All statistical tests were two sided unless otherwise stated. Software versions and additional information can be found in the Nature Research Reporting Summary.
GWAS analysis
Ricopili15, the pipeline developed by the PGC Statistical Analysis Group, was used for quality control, imputation, PCA, and primary association analysis (details in the Supplementary Note). The data were processed separately in the 23 genotyping batches in the case of iPSYCH and separately for each study in the PGC sample. Phasing was achieved with SHAPEIT106, and imputation was done with IMPUTE2 (refs. 107,108) with haplotypes from the 1000 Genomes Project, phase 3 (ref. 109) as a reference.
After exclusion of regions of high LD110, the genotypes were pruned down to a set of approximately 30,000 markers (details in Supplementary Note). With PLINK’s111 identity-by-state analysis, pairs of subjects with \(\hat \pi > 0.2\) were identified, and one subject of each such pair was excluded at random (with a preference for keeping cases). PCA was carried out with smartPCA112,113. In iPSYCH, a subsample of European ancestry was selected as an ellipsoid in the space of PC1–3, centered and scaled by using the mean and eight s.d. of the subsample whose parents and grandparents were all known to have been born in Denmark (n = 31,500). In the PGC sample, the European (CEU) subset was chosen by using a Euclidean-distance measure weighted by the variance explained by each of the first three PCs. Individuals more distant than ten s.d. from the combined CEU and Toscani in Italy (TSI) HapMap reference populations were excluded. We conducted a secondary PCA on the remaining 13,076 cases and 22,664 controls to provide covariates for the association analyses. Numbers of subjects in the data-generation flow for the iPSYCH sample can be found in Supplementary Table 1.
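Two of the sample-level filters described above can be sketched as follows. These are illustrations under stated assumptions, not Ricopili or PLINK code. The ellipsoid is taken as axis-aligned in PC1–PC3, with semi-axes of 8 s.d. around the reference subsample's mean. Related pairs are resolved greedily in input order. Function names are hypothetical.

```python
import random

import numpy as np


def in_ancestry_ellipsoid(pcs, ref_pcs, n_sd=8.0):
    """True for samples inside an axis-aligned ellipsoid in PC1-PC3.

    pcs, ref_pcs: arrays with at least three columns (PC1, PC2, PC3);
    ref_pcs is the reference subsample (here, Danish-born parents and grandparents).
    """
    mu = ref_pcs[:, :3].mean(axis=0)
    sd = ref_pcs[:, :3].std(axis=0)
    scaled = (pcs[:, :3] - mu) / (n_sd * sd)
    return (scaled ** 2).sum(axis=1) <= 1.0


def prune_related(pairs, is_case, seed=0):
    """Drop one member of each pair with pi_hat > 0.2, preferring to keep cases.

    pairs: iterable of (id_a, id_b, pi_hat); is_case: dict mapping id -> bool.
    Returns the set of ids to exclude.
    """
    rng = random.Random(seed)
    excluded = set()
    for a, b, pi_hat in pairs:
        if pi_hat <= 0.2 or a in excluded or b in excluded:
            continue  # not related enough, or the pair is already broken
        if is_case[a] != is_case[b]:
            excluded.add(b if is_case[a] else a)  # keep the case
        else:
            excluded.add(rng.choice([a, b]))       # same status: drop one at random
    return excluded
```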
We performed association analyses by applying PLINK 1.9 to the imputed dosage data (the sum of imputation probabilities P(A1A2) + 2P(A1A1)). In iPSYCH, we included the first four PCs as covariates, together with any further PCs that were significantly associated with ASD in the sample, whereas the case–pseudocontrol samples from the PGC trios required no PC covariates. Combined results for iPSYCH, and for iPSYCH with the PGC, were obtained by meta-analysis of batchwise and studywise results with METAL114 (July 2010 version), using an inverse-variance-weighted fixed-effect model115. On chromosome X, males and females were analyzed separately and then meta-analyzed together. Subsequently, we applied a quality filter retaining only markers with an imputation info score ≥ 0.7, MAF ≥ 0.01, and an effective sample size (Supplementary Note) of at least 70% of the study maximum. The degree to which the inflation in the test statistics could be ascribed to cryptic relatedness and population stratification rather than to polygenicity was measured from the intercept in LD-score regression19 (LDSC) as the ratio (intercept – 1)/(mean χ2 – 1).
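The sketch below shows the three quantities named in this paragraph: the imputed dosage, the inverse-variance-weighted fixed-effect combination of per-batch effects, and the LDSC attenuation ratio. It illustrates only the textbook formulas. It is not METAL's or LDSC's code, and the function names are hypothetical.

```python
import math


def dosage(p_het, p_hom_a1):
    """Expected A1 allele count from imputation probabilities: P(A1A2) + 2P(A1A1)."""
    return p_het + 2.0 * p_hom_a1


def ivw_fixed_effect(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis of per-batch log(OR)s.

    Returns the pooled effect, its standard error, the z score and a two-sided P value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided P under a standard normal
    return beta, se, z, p


def ldsc_attenuation_ratio(intercept, mean_chi2):
    """Share of mean chi^2 inflation attributed to confounding: (intercept - 1)/(mean chi^2 - 1)."""
    return (intercept - 1.0) / (mean_chi2 - 1.0)
```

Consider two batches that each estimate log(OR) = 0.1 with s.e. 0.05. They pool to log(OR) = 0.1 with s.e. 0.05/√2, which gives z = 2√2. A low attenuation ratio indicates that most of the inflation reflects polygenicity rather than confounding.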
MTAG9 was applied with standard settings. The iPSYCH-PGC meta-analysis summary statistics were paired with the summary statistics for each of major depression25 (excluding the Danish samples but including summary statistics from 23andMe57; 111,902 cases, 312,113 controls, and mean χ2 = 1.477), schizophrenia15 (also excluding the Danish samples; 34,129 cases, 45,512 controls, and mean χ2 = 1.804), and educational attainment21 (328,917 samples and mean χ2 = 1.648). These studies have considerably more statistical power than the ASD scan. However, their genetic correlations with ASD are modest for the purposes of MTAG, so the weights assigned to the secondary phenotypes in the MTAG analyses remain relatively low (no higher than 0.27). The maximum FDR was estimated as recommended in the MTAG paper9 (details in the Supplementary Note).
The results were clumped, and we highlighted loci of interest by selecting those that were significant at 5 × 10−8 in the iPSYCH-PGC meta-analysis or in the meta-analysis with the follow-up sample, or at 1.67 × 10−8 in any of the three MTAG analyses. A composite GWAS, consisting of the minimal P value at each marker over these five analyses, was used as the background in the Manhattan plots of the individual analyses. Each plot therefore shows both what was maximally achieved and what the individual analysis contributed.
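The locus-selection rule and the composite GWAS can be sketched as follows. The value 1.67 × 10−8 equals 5 × 10−8 divided by three, which we read (as an assumption) as a Bonferroni-style split over the three MTAG analyses. Clumping is omitted. The analysis names and marker IDs are hypothetical.

```python
GENOME_WIDE = 5e-8
# Assumed rationale: 1.67e-8 = 5e-8 / 3, one share per MTAG analysis
MTAG_THRESHOLD = GENOME_WIDE / 3


def composite_min_p(results):
    """results: dict analysis -> dict marker -> P value.

    Returns the composite GWAS: the minimal P value per marker over all analyses.
    """
    composite = {}
    for per_marker in results.values():
        for marker, p in per_marker.items():
            composite[marker] = min(p, composite.get(marker, 1.0))
    return composite


def highlighted_markers(results, mtag_analyses):
    """Markers passing 5e-8 in a standard analysis or ~1.67e-8 in an MTAG analysis."""
    hits = set()
    for name, per_marker in results.items():
        threshold = MTAG_THRESHOLD if name in mtag_analyses else GENOME_WIDE
        hits.update(m for m, p in per_marker.items() if p < threshold)
    return hits
```

A marker with P = 2 × 10−8 in an MTAG analysis is not highlighted because it exceeds the stricter threshold. It still appears at that P value in the composite background.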
Gene-based association and gene-set analyses
MAGMA 1.06 (ref. 30) was applied to the ASD GWAS summary statistics to test for gene-based association. Using NCBI 37.3 gene definitions and restricting the analysis to SNPs located within the transcribed region, we tested mean SNP association, with the sum of –log(SNP P value) as the test statistic. The resulting gene-based P values were then used in competitive gene-set enrichment analyses in MAGMA. One analysis explored several candidate sets: M13, M16, and M17 from Parikshak et al.31; constrained, loss-of-function-intolerant genes (pLI > 0.9; refs. 32,33) derived from Exome Aggregation Consortium data (details in Supplementary Note); gene sets from studies of rare variants in autism by Sanders et al.35; and the curated gene list from the SPARK consortium36. Another analysis was an agnostic test of the Gene Ontology sets37,38 for molecular function from MSigDB 6.0 (ref. 39). We analyzed only genes outside the broad MHC region (hg19: Chr 6: 25–35 Mb) and included only gene sets with 10–1,000 genes. The gene sets from Sanders et al. and SPARK included only one gene in the MHC and were exempted from the MHC exclusion to stay as true to the original sets as possible. All gene sets with significant enrichment were inspected to ensure that the signal was not driven by one or a few associated loci containing multiple genes in close LD.
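The filtering rules above, together with the idea behind a competitive gene-set test, can be sketched in Python. The test asks whether genes in the set are more strongly associated than the remaining genes. This is a simplified stand-in and not MAGMA's model. MAGMA fits a regression of gene Z on set membership with gene-size and gene-density covariates, and the sketch omits those covariates and the correlation between neighbouring genes. All names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np

# Broad MHC region on hg19 chromosome 6, as given in the text
MHC = ("6", 25_000_000, 35_000_000)


def outside_mhc(chrom, start, end):
    """True if the gene [start, end) does not overlap the broad MHC region."""
    mhc_chrom, mhc_start, mhc_end = MHC
    return not (chrom == mhc_chrom and start < mhc_end and end > mhc_start)


def keep_gene_set(genes):
    """Gene-set size filter from the text: 10 to 1,000 genes."""
    return 10 <= len(genes) <= 1000


def gene_p_to_z(p):
    """Probit transform of a gene-based P value (larger Z = stronger association)."""
    return NormalDist().inv_cdf(1.0 - p)


def competitive_test(gene_z, in_set):
    """One-sided test that genes in the set have higher mean Z than other genes.

    Simplified: a two-sample comparison of means with a normal approximation,
    with no gene-size/density covariates and no gene-gene LD correction.
    """
    gene_z = np.asarray(gene_z, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    inside, outside = gene_z[in_set], gene_z[~in_set]
    diff = inside.mean() - outside.mean()
    se = math.sqrt(inside.var(ddof=1) / inside.size + outside.var(ddof=1) / outside.size)
    t = diff / se
    return t, 0.5 * math.erfc(t / math.sqrt(2.0))  # upper-tail normal P value
```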
SNP heritability
SNP heritability, \(h_{\mathrm{G}}^2\), was estimated by using LDSC19 for the full ASD GWAS sample and GCTA40,116,117 for subsamples too small for LDSC. For LDSC, we used precomputed LD scores based on the European-ancestry samples of the 1000 Genomes Project118, restricted to HapMap3 (ref. 119) SNPs. The summary statistics, after standard LDSC filtering, were regressed onto these scores. For liability-scale estimates, we used a population prevalence for Denmark of 1.22% (ref. 18). Lacking reliable prevalence estimates for the subtypes, we scaled the full-spectrum prevalence according to the composition of the case sample.
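The text does not spell out the observed-to-liability conversion. The sketch below assumes the standard transformation of Lee et al. (2011), which LDSC also applies. In that formula, K is the population prevalence, P is the proportion of cases in the sample, t is the liability threshold and z is the standard normal density at t. The subtype helper encodes one reading of the scaling rule above and is hypothetical.

```python
from statistics import NormalDist


def liability_h2(h2_obs, K, P):
    """Convert observed-scale SNP heritability to the liability scale.

    h2_liab = h2_obs * K^2 (1 - K)^2 / (P (1 - P) z^2),
    where z is the standard normal density at the threshold t = Phi^-1(1 - K).
    """
    nd = NormalDist()
    t = nd.inv_cdf(1.0 - K)   # liability threshold
    z = nd.pdf(t)             # normal density at the threshold
    return h2_obs * K ** 2 * (1 - K) ** 2 / (P * (1 - P) * z ** 2)


def subtype_prevalence(K_full, n_subtype_cases, n_cases):
    """Hypothetical helper: scale the full-spectrum prevalence by the subtype's share of cases."""
    return K_full * n_subtype_cases / n_cases
```

For example, `liability_h2(h2_obs, 0.0122, P)` would use the Danish prevalence of 1.22%. As a sanity check, when K = P = 0.5 the conversion factor reduces to π/2.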
For subsamples too small for LDSC, the GREML approach of GCTA40,116,117 was used. A genetic relatedness matrix was computed for the association sample (i.e., the subjects of European ancestry with \(\hat \pi \le 0.2\)) from best-guess genotypes (genotype probability >0.8, missing rate <0.01, and MAF >0.05) with indels removed. This provided a relatedness estimate for every pair of individuals. The phenotypic variance explained by the SNPs was estimated by REML. The model included PC1–4 and any other PC nominally significantly associated with the phenotype as continuous covariates, and batches as categorical indicator covariates. Equal heritability for nonoverlapping groups was tested with permutation tests (1,000 permutations), keeping the controls fixed and randomly reassigning the case labels.
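The permutation scheme keeps the controls fixed and shuffles only the case labels, as in the sketch below. Refitting GREML is not shown. The `statistic` callable is a hypothetical placeholder for a wrapper that would fit each case group against the fixed controls and return the heritability difference. The add-one P-value estimator is a common convention and is an assumption here.

```python
import random


def permutation_test(case_groups, statistic, n_perm=1000, seed=1):
    """Permutation test of equal SNP heritability for two nonoverlapping case groups.

    case_groups: (ids_a, ids_b) case identifiers; controls are never permuted.
    statistic: callable(ids_a, ids_b) -> float, e.g. a (hypothetical) wrapper
        that refits GREML for each case group against the controls and
        returns h2_a - h2_b.
    """
    ids_a, ids_b = list(case_groups[0]), list(case_groups[1])
    observed = statistic(ids_a, ids_b)
    pooled = ids_a + ids_b
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        # Reassign case labels at random while keeping the group sizes fixed.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(ids_a)], pooled[len(ids_a):]
        if abs(statistic(perm_a, perm_b)) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated P value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)
```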
Following Finucane et al.45, we conducted an enrichment analysis of heritability for SNPs in functional annotations and for SNPs located in cell-type-specific regulatory elements. First, we used the same 24 overlapping functional annotations as Finucane et al. (reduced from 53). We regressed the χ2 statistics from the ASD GWAS summary statistics on the cell-type-specific LD scores downloaded from the site mentioned above, with baseline scores, regression weights, and allele frequencies based on European-ancestry 1000 Genomes Project data. The enrichment of a category was defined as the proportion of SNP heritability in the category divided by the proportion of SNPs in that category. Still following Finucane et al., we performed a similar analysis using 220 cell-type-specific annotations divided into ten overlapping groups. In addition, we conducted an analysis based on annotations derived from H3K4me1 imputed gapped-peak data from the Roadmap Epigenomics Mapping Consortium120, excluding the broad MHC region (Chr 6: 25–35 Mb).
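The enrichment definition above is a ratio of two proportions. In stratified LDSC, the category heritability comes from the fitted regression coefficients. In the sketch below it is simply taken as an input, and the function name is illustrative.

```python
def heritability_enrichment(h2_category, h2_total, n_snps_category, n_snps_total):
    """Enrichment = (share of SNP heritability in the category) / (share of SNPs in it)."""
    prop_h2 = h2_category / h2_total
    prop_snps = n_snps_category / n_snps_total
    return prop_h2 / prop_snps
```

For instance, a category holding 2% of the SNPs but 20% of the SNP heritability has a tenfold enrichment.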
Genetic correlation
For the main ASD samples, SNP-based genetic correlations, rG, were estimated by using LDSC19. For the analysis of ASD subtypes and subgroups, in which sample sizes were generally small, we used GCTA40. In both cases, we followed the procedures explained above. For all but a few phenotypes, LDSC estimates of genetic correlation were obtained by uploading to LD Hub20, allowing comparison with 234 phenotypes in total.
Polygenic risk scores
For the PRSs, we clumped the summary statistics by applying standard Ricopili parameters (details in the Supplementary Note). For summary statistics not generated by Ricopili with the same imputation reference, we excluded all strand-ambiguous markers to avoid potential strand conflicts. PRSs were generated at the default P-value thresholds (5 × 10–8, 1 × 10–6, 1 × 10–4, 0.001, 0.01, 0.05, 0.1, 0.2, 0.5, and 1) as a weighted sum of the allele dosages in the ASD GWAS sample. Each sum ran over the markers passing the P-value threshold in the training set, and each marker was weighted by its additive-scale effect estimate (log(OR) or β) from the training set. Scores were normalized before analysis.
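The scoring step can be sketched as a matrix product over the markers that pass each threshold. Several details are assumptions: the inputs are taken to be the clumped, strand-unambiguous markers; a marker is taken to pass a threshold when P ≤ threshold; and "normalized" is taken to mean standardized to mean 0 and s.d. 1. The function name is hypothetical.

```python
import numpy as np

THRESHOLDS = [5e-8, 1e-6, 1e-4, 1e-3, 0.01, 0.05, 0.1, 0.2, 0.5, 1.0]


def polygenic_scores(dosages, effects, pvals, thresholds=THRESHOLDS):
    """Weighted allele-dosage sums at each P-value threshold.

    dosages: (n_individuals, n_markers) allele dosages in the target sample.
    effects: per-marker log(OR) or beta from the training GWAS.
    pvals: per-marker P values from the training GWAS.
    Returns dict threshold -> standardized score vector.
    """
    dosages = np.asarray(dosages, dtype=float)
    effects = np.asarray(effects, dtype=float)
    pvals = np.asarray(pvals, dtype=float)
    scores = {}
    for t in thresholds:
        keep = pvals <= t                       # markers passing this threshold
        raw = dosages[:, keep] @ effects[keep]  # sum of effect * dosage
        centered = raw - raw.mean()
        sd = raw.std()
        scores[t] = centered / sd if sd > 0 else centered
    return scores
```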
We evaluated the predictive power by using Nagelkerke’s R2 and plots of ORs and CIs over score deciles. Both R2 and ORs were estimated in regression analyses including the relevant PCs and indicator variables for genotyping waves.
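Nagelkerke's R2 can be computed from the log-likelihoods of fitted logistic models, for example the `llf` attribute of a statsmodels `Logit` fit. In PRS work, a common convention is to report the increase in R2 when the score is added to a covariates-only model (here PCs and wave indicators). That convention is assumed in `prs_r2` below, and the helper names are hypothetical.

```python
import math


def null_log_likelihood(n_cases, n_controls):
    """Log-likelihood of an intercept-only logistic model for a binary outcome."""
    n = n_cases + n_controls
    p = n_cases / n
    return n_cases * math.log(p) + n_controls * math.log(1 - p)


def nagelkerke_r2(ll_null, ll_full, n):
    """Nagelkerke's R2: Cox-Snell R2 rescaled by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


def prs_r2(ll_null, ll_covariates, ll_covariates_prs, n):
    """Increase in Nagelkerke's R2 when the PRS is added to the covariates."""
    return nagelkerke_r2(ll_null, ll_covariates_prs, n) - nagelkerke_r2(ll_null, ll_covariates, n)
```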
Lacking a large independent ASD sample outside iPSYCH and the PGC, we trained a set of ASD PRSs internally as follows. We divided the sample into five subsamples of approximately equal size, respecting the division into batches. We then ran five GWAS, each leaving out one group from the training set, and meta-analyzed each of them with the PGC results. This procedure produced, for each of the five subsamples, a PRS trained on its complement. Before analysis, each score was normalized within the group in which it was defined. We evaluated the predictive power in each group and in the whole sample combined.
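The text does not say how whole batches were allocated to the five subsamples. One simple possibility, shown purely as an assumption, is a greedy "largest batch into the smallest group" rule. It keeps each batch intact and keeps group totals within one batch size of each other. The function and batch names are hypothetical.

```python
def assign_batches_to_folds(batch_sizes, n_folds=5):
    """Greedily assign whole genotyping batches to n_folds groups of similar size.

    batch_sizes: dict batch_id -> number of individuals.
    Returns dict batch_id -> fold index.
    """
    fold_totals = [0] * n_folds
    assignment = {}
    # Place the largest batches first, each into the currently smallest fold.
    for batch, size in sorted(batch_sizes.items(), key=lambda kv: -kv[1]):
        fold = min(range(n_folds), key=lambda f: fold_totals[f])
        assignment[batch] = fold
        fold_totals[fold] += size
    return assignment


def training_batches(assignment, fold):
    """Batches whose GWAS (meta-analyzed with the PGC) trains the score for `fold`."""
    return sorted(b for b, f in assignment.items() if f != fold)
```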
To exploit the genetic overlap with other phenotypes to improve prediction, we created a series of new PRSs by adding to the internally trained ASD score the PRSs of other highly correlated phenotypes in a weighted sum (details in the Supplementary Note).
To analyze ASD subtypes in relation to PRSs, we defined a hierarchical set of phenotypes as follows. The first hierarchical subtype was childhood autism. Hierarchical atypical autism was defined as all individuals with atypical autism and no childhood autism diagnosis. Hierarchical Asperger’s syndrome was defined as all individuals with an Asperger’s syndrome diagnosis and neither a childhood autism nor an atypical autism diagnosis. Finally, we combined other pervasive developmental disorders and pervasive developmental disorder, unspecified, into a group termed pervasive developmental disorders mixed. Its hierarchical version consisted of all subjects with such a diagnosis and none of the preceding diagnoses (Supplementary Table 13). We examined how PRSs were distributed over the distinct ASD subtypes for a number of phenotypes showing high rG with ASD, as well as a few with low rG as negative controls. We did so by multivariate linear regression of the scores on the subtypes, adjusting for relevant PCs and wave-indicator variables (details in the Supplementary Note).
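The hierarchy amounts to "first matching subtype wins", as the sketch below shows. The diagnosis strings and subtype labels are illustrative placeholders, not register codes.

```python
# Ordered from highest to lowest precedence; labels are illustrative.
HIERARCHY = (
    ("childhood_autism", {"childhood autism"}),
    ("atypical_autism", {"atypical autism"}),
    ("asperger", {"asperger's syndrome"}),
    ("pdd_mixed", {"other pervasive developmental disorders",
                   "pervasive developmental disorder, unspecified"}),
)


def hierarchical_subtype(diagnoses):
    """Return the first subtype in the hierarchy matched by any of a person's diagnoses."""
    diagnoses = {d.lower() for d in diagnoses}
    for subtype, members in HIERARCHY:
        if diagnoses & members:
            return subtype
    return None  # no ASD subtype diagnosis recorded
```

A person diagnosed with both Asperger's syndrome and atypical autism is therefore counted only under hierarchical atypical autism.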
Hi-C analysis
The Hi-C data were generated from two major cortical laminae: the germinal zone (GZ), containing primarily mitotically active neural progenitors, and the cortical and subcortical plate, consisting primarily of postmitotic neurons47. We first derived a set of credible SNPs (putative causal SNPs) from the top-ranking loci in the ASD GWAS by using CAVIAR48. The 30 loci showing the strongest association were intersected with the Hi-C reference data, resulting in 28 loci for analysis. To test whether credible SNPs were enriched in active marks in the fetal brain120, we used GREAT, as previously described47,121. Credible SNPs were divided into SNPs without known function (unannotated) and functionally annotated SNPs (SNPs in gene promoters and SNPs causing nonsynonymous changes) (Supplementary Fig. 98). We then integrated the unannotated credible SNPs with chromatin-contact profiles from fetal corticogenesis47, defining genes that physically interact with intergenic or intronic SNPs (Supplementary Fig. 98).
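The annotation split and the contact-based gene mapping can be sketched as follows. CAVIAR fine-mapping, GREAT enrichment and Hi-C contact calling are not reproduced. The input structures are hypothetical: promoter windows are given as intervals, and significant contacts are given as pairs of anchor intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int
    end: int  # half-open [start, end)

    def contains(self, chrom, pos):
        return self.chrom == chrom and self.start <= pos < self.end

    def overlaps(self, other):
        return self.chrom == other.chrom and self.start < other.end and other.start < self.end


def split_credible_snps(snps, promoters, nonsynonymous):
    """snps: dict snp_id -> (chrom, pos); promoters: dict gene -> Interval;
    nonsynonymous: set of snp_ids. Returns (annotated, unannotated) sets of snp_ids."""
    annotated = set()
    for snp, (chrom, pos) in snps.items():
        in_promoter = any(iv.contains(chrom, pos) for iv in promoters.values())
        if in_promoter or snp in nonsynonymous:
            annotated.add(snp)
    return annotated, set(snps) - annotated


def contact_genes(snp_ids, snps, contacts, promoters):
    """contacts: iterable of (Interval, Interval) significant Hi-C contacts.
    Returns genes whose promoter overlaps the anchor opposite a given SNP."""
    genes = set()
    for snp in snp_ids:
        chrom, pos = snps[snp]
        for a, b in contacts:
            for here, there in ((a, b), (b, a)):
                if here.contains(chrom, pos):
                    genes.update(g for g, p in promoters.items() if p.overlaps(there))
    return genes
```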
The spatiotemporal transcriptomic atlas of the human brain was obtained from Kang et al.122. We used transcriptomic profiles of multiple brain regions, with developmental epochs spanning the prenatal (6–37 weeks postconception) and postnatal (4 months to 42 years) periods. Expression values were log-transformed and centered on the mean expression level of each sample by using scale(center = T, scale = F) + 1 in R. ASD candidate genes identified by the Hi-C analyses (Supplementary Fig. 98) were selected in each sample, and their average centered expression values were calculated and plotted.
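R's scale(x, center = TRUE, scale = FALSE) subtracts each column's mean. With genes as rows and samples as columns, the step above can therefore be mirrored in NumPy. The log base (2) and the pseudocount of 1 are assumptions, because the text does not specify them. The function names are hypothetical.

```python
import numpy as np


def centered_expression(expr, pseudocount=1.0):
    """expr: (n_genes, n_samples) expression matrix.

    Log2-transform (with an assumed pseudocount), centre each sample (column)
    on its mean, then add 1, mirroring R's scale(x, center = T, scale = F) + 1.
    """
    logged = np.log2(np.asarray(expr, dtype=float) + pseudocount)
    return logged - logged.mean(axis=0, keepdims=True) + 1.0


def candidate_profile(centered, gene_names, candidates):
    """Average centred expression of the candidate genes in every sample."""
    wanted = set(candidates)
    rows = [i for i, g in enumerate(gene_names) if g in wanted]
    return centered[rows].mean(axis=0)
```

After centering, every sample has a mean of exactly 1 across genes. A candidate-gene profile above 1 therefore marks samples in which those genes are expressed above that sample's average.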
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary.
Data availability
The summary statistics are available for download at the iPSYCH and PGC download sites (see URLs). For access to genotype data from the PGC samples and the iPSYCH sample, researchers should contact the lead principal investigators M.J.D. and A.D.B. for PGC-ASD and iPSYCH-ASD, respectively.
References
De Rubeis, S. et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 515, 209–215 (2014).
Gaugler, T. et al. Most genetic risk for autism resides with common variation. Nat. Genet. 46, 881–885 (2014).
Iossifov, I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014).
Krumm, N. et al. Excess of rare, inherited truncating mutations in autism. Nat. Genet. 47, 582–588 (2015).
Autism Spectrum Disorders Working Group of The Psychiatric Genomics Consortium. Meta-analysis of GWAS of over 16,000 individuals with autism spectrum disorder highlights a novel locus at 10q24.32 and a significant overlap with schizophrenia. Mol. Autism 8, 21 (2017).
Ma, D. et al. A genome-wide association study of autism reveals a common novel risk locus at 5p14.1. Ann. Hum. Genet. 73, 263–273 (2009).
Devlin, B., Melhem, N. & Roeder, K. Do common variants play a role in risk for autism? Evidence and theoretical musings. Brain Res. 1380, 78–84 (2011).
Anney, R. et al. Individual common variants exert weak effects on the risk for autism spectrum disorders. Hum. Mol. Genet. 21, 4781–4792 (2012).
Turley, P. et al. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat. Genet. 50, 229–237 (2018).
Pedersen, C. B. et al. The iPSYCH2012 case-cohort sample: new directions for unravelling genetic and environmental architectures of severe mental disorders. Mol. Psychiatry 23, 6–14 (2018).
Lauritsen, M. B. et al. Validity of childhood autism in the Danish Psychiatric Central Register: findings from a cohort sample born 1990–1999. J. Autism Dev. Disord. 40, 139–148 (2010).
Mors, O., Perto, G. P. & Mortensen, P. B. The Danish Psychiatric Central Research Register. Scand. J. Public Health 39 (Suppl.), 54–57 (2011).
Hollegaard, M. V. et al. Robustness of genome-wide scanning using archived dried blood spot samples as a DNA source. BMC Genet. 12, 58 (2011).
Hollegaard, M. V. et al. Genome-wide scans using archived neonatal dried blood spot samples. BMC Genomics 10, 297 (2009).
Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
Cross-Disorder Group of the Psychiatric Genomics Consortium et al. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat. Genet. 45, 984–994 (2013).
Gratten, J., Wray, N. R., Keller, M. C. & Visscher, P. M. Large-scale genomics unveils the genetic architecture of psychiatric disorders. Nat. Neurosci. 17, 782–790 (2014).
Hansen, S. N., Overgaard, M., Andersen, P. K. & Parner, E. T. Estimating a population cumulative incidence under calendar time trends. BMC Med. Res. Methodol. 17, 7 (2017).
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Zheng, J. et al. LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics 33, 272–279 (2017).
Okbay, A. et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533, 539–542 (2016).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Okbay, A. et al. Genetic variants associated with subjective well-being, depressive symptoms, and neuroticism identified through genome-wide analyses. Nat. Genet. 48, 624–633 (2016).
Clarke, T.-K. et al. Common polygenic risk for autism spectrum disorder (ASD) is associated with cognitive ability in the general population. Mol. Psychiatry 21, 419–425 (2016).
Wray, N. R. et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 50, 668–681 (2018).
Demontis, D. et al. Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat. Genet. 51, 63–75 (2019).
St Pourcain, B. et al. ASD and schizophrenia show distinct developmental profiles in common genetic overlap with population-based social communication difficulties. Mol. Psychiatry 23, 263–270 (2018).
Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014).
Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
Parikshak, N. N. et al. Integrative functional genomic analyses implicate specific molecular pathways and circuits in autism. Cell 155, 1008–1021 (2013).
Samocha, K. E. et al. A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944–950 (2014).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Pardiñas, A. F. et al. Common schizophrenia alleles are enriched in mutation-intolerant genes and in regions under strong background selection. Nat. Genet. 50, 381–389 (2018).
Sanders, S. J. et al. Insights into autism spectrum disorder genomic architecture and biology from 71 risk loci. Neuron 87, 1215–1233 (2015).
SPARK Consortium. SPARK: a US cohort of 50,000 families to accelerate autism research. Neuron 97, 488–493 (2018).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
Gene Ontology Consortium. Gene Ontology Consortium: going forward. Nucleic Acids Res. 43, D1049–D1056 (2015).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Sniekers, S. et al. Genome-wide association meta-analysis of 78,308 individuals identifies new loci and genes influencing human intelligence. Nat. Genet. 49, 1107–1112 (2017).
Jones, S. E. et al. Genome-wide association analyses in 128,266 individuals identifies new morningness and sleep duration loci. PLoS Genet. 12, e1006125 (2016).
Robinson, E. B. et al. Genetic risk for autism spectrum disorders and neuropsychiatric variation in the general population. Nat. Genet. 48, 552–555 (2016).
Weiner, D. J. et al. Polygenic transmission disequilibrium confirms that common and rare variation act additively to create risk for autism spectrum disorders. Nat. Genet. 49, 978–985 (2017).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Shlyueva, D., Stampfel, G. & Stark, A. Transcriptional enhancers: from properties to genome-wide predictions. Nat. Rev. Genet. 15, 272–286 (2014).
Won, H. et al. Chromosome conformation elucidates regulatory relationships in developing human brain. Nature 538, 523–527 (2016).
Hormozdiari, F., Kostem, E., Kang, E. Y., Pasaniuc, B. & Eskin, E. Identifying causal variants at loci with multiple signals of association. Genetics 198, 497–508 (2014).
Kosmicki, J. A. et al. Refining the role of de novo protein-truncating variants in neurodevelopmental disorders by using population reference samples. Nat. Genet. 49, 504–510 (2017).
Robinson, E. B. et al. Autism spectrum disorder severity reflects the average contribution of de novo and familial influences. Proc. Natl Acad. Sci. USA 111, 15161–15165 (2014).
Reichenberg, A. et al. Discontinuity in the genetic and environmental causes of the intellectual disability spectrum. Proc. Natl Acad. Sci. USA 113, 1098–1103 (2016).
Polderman, T. J. C. et al. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nat. Genet. 47, 702–709 (2015).
Sadakata, T. et al. Autistic-like phenotypes in Cadps2-knockout mice and aberrant CADPS2 splicing in autistic patients. J. Clin. Invest. 117, 931–943 (2007).
Parikshak, N. N. et al. Genome-wide changes in lncRNA, splicing, and regional gene expression patterns in autism. Nature 540, 423–427 (2016).
Davies, G. et al. Genome-wide association study of cognitive functions and educational attainment in UK Biobank (N=112 151). Mol. Psychiatry 21, 758–767 (2016).
Deary, V. et al. Genetic contributions to self-reported tiredness. Mol. Psychiatry 23, 609–620 (2017).
Hyde, C. L. et al. Identification of 15 genetic loci associated with risk of major depression in individuals of European descent. Nat. Genet. 48, 1031–1036 (2016).
Thorleifsson, G. et al. Genome-wide association yields new sequence variants at seven loci that associate with measures of obesity. Nat. Genet. 41, 18–24 (2009).
Willer, C. J. et al. Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nat. Genet. 41, 25–34 (2009).
Speliotes, E. K. et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat. Genet. 42, 937–948 (2010).
Berndt, S. I. et al. Genome-wide meta-analysis identifies 11 new loci for anthropometric traits and provides insights into genetic architecture. Nat. Genet. 45, 501–512 (2013).
Hashimoto, T., Yamada, M., Maekawa, S., Nakashima, T. & Miyata, S. IgLON cell adhesion molecule Kilon is a crucial modulator for synapse number in hippocampal neurons. Brain Res. 1224, 1–11 (2008).
Hashimoto, T., Maekawa, S. & Miyata, S. IgLON cell adhesion molecules regulate synaptogenesis in hippocampal neurons. Cell Biochem. Funct. 27, 496–498 (2009).
Pischedda, F. et al. A cell surface biotinylation assay to reveal membrane-associated neuronal cues: Negr1 regulates dendritic arborization. Mol. Cell. Proteomics 13, 733–748 (2014).
Pischedda, F. & Piccoli, G. The IgLON family member Negr1 promotes neuronal arborization acting as soluble factor via FGFR2. Front. Mol. Neurosci. 8, 89 (2016).
Marg, A. et al. Neurotractin, a novel neurite outgrowth-promoting Ig-like protein that interacts with CEPU-1 and LAMP. J. Cell Biol. 145, 865–876 (1999).
Funatsu, N. et al. Characterization of a novel rat brain glycosylphosphatidylinositol-anchored protein (Kilon), a member of the IgLON cell adhesion molecule family. J. Biol. Chem. 274, 8224–8230 (1999).
Sanz, R., Ferraro, G. B. & Fournier, A. E. IgLON cell adhesion molecules are shed from the cell surface of cortical neurons to promote neuronal growth. J. Biol. Chem. 290, 4330–4342 (2015).
Schäfer, M., Bräuer, A. U., Savaskan, N. E., Rathjen, F. G. & Brümmendorf, T. Neurotractin/kilon promotes neurite outgrowth and is expressed on reactive astrocytes after entorhinal cortex lesion. Mol. Cell. Neurosci. 29, 580–590 (2005).
Lee, A. W. S. et al. Functional inactivation of the genome-wide association study obesity gene neuronal growth regulator 1 in mice causes a body mass phenotype. PLoS One 7, e41537 (2012).
Doan, R. N. et al. Mutations in human accelerated regions disrupt cognition and social behavior. Cell 167, 341–354.e12 (2016).
Vuong, J. K. et al. PTBP1 and PTBP2 serve both specific and redundant functions in neuronal pre-mRNA splicing. Cell Rep. 17, 2766–2775 (2016).
Boutz, P. L. et al. A post-transcriptional regulatory switch in polypyrimidine tract-binding proteins reprograms alternative splicing in developing neurons. Genes Dev. 21, 1636–1652 (2007).
Makeyev, E. V., Zhang, J., Carrasco, M. A. & Maniatis, T. The microRNA miR-124 promotes neuronal differentiation by triggering brain-specific alternative pre-mRNA splicing. Mol. Cell 27, 435–448 (2007).
Spellman, R., Llorian, M. & Smith, C. W. J. Crossregulation and functional redundancy between the splicing regulator PTB and its paralogs nPTB and ROD1. Mol. Cell 27, 420–434 (2007).
Zheng, S. et al. Psd-95 is post-transcriptionally repressed during early neural development by PTBP1 and PTBP2. Nat. Neurosci. 15, 381–388 (2012).
Li, Q. S., Parrado, A. R., Samtani, M. N., Narayan, V. A. & Alzheimer’s Disease Neuroimaging Initiative. Variations in the fra10ac1 fragile site and 15q21 are associated with cerebrospinal fluid aβ1–42 level. PLoS One 10, e0134000 (2015).
Wassenberg, J. J. & Martin, T. F. J. Role of CAPS in dense-core vesicle exocytosis. Ann. NY Acad. Sci. 971, 201–209 (2002).
Shinoda, Y. et al. CAPS1 stabilizes the state of readily releasable synaptic vesicles to fusion competence at CA3-CA1 synapses in adult hippocampus. Sci. Rep. 6, 31540 (2016).
Farina, M. et al. Caps-1 promotes fusion competence of stationary dense-core vesicles in presynaptic terminals of mammalian neurons. eLife 4, e05438 (2015).
Rietveld, C. A. et al. Common genetic variants associated with cognitive performance identified using the proxy-phenotype method. Proc. Natl Acad. Sci. USA 111, 13790–13794 (2014).
Sun, J. et al. Ube3a regulates synaptic plasticity and learning and memory by controlling sk2 channel endocytosis. Cell Rep. 12, 449–461 (2015).
Cook, E. H. Jr. et al. Autism or atypical autism in maternally but not paternally derived proximal 15q duplication. Am. J. Hum. Genet. 60, 928–934 (1997).
Lin, M. T., Luján, R., Watanabe, M., Adelman, J. P. & Maylie, J. SK2 channel plasticity contributes to LTP at Schaffer collateral-CA1 synapses. Nat. Neurosci. 11, 170–177 (2008).
Hammond, R. S. et al. Small-conductance Ca2+-activated K+ channel type 2 (SK2) modulates hippocampal learning, memory, and synaptic plasticity. J. Neurosci. 26, 1844–1853 (2006).
Murthy, S. R. K. et al. Small-conductance Ca2+-activated potassium type 2 channels regulate the formation of contextual fear memory. PLoS One 10, e0127264 (2015).
Fakira, A. K., Portugal, G. S., Carusillo, B., Melyan, Z. & Morón, J. A. Increased small conductance calcium-activated potassium type 2 channel-mediated negative feedback on N-methyl-d-aspartate receptors impairs synaptic plasticity following context-dependent sensitization to morphine. Biol. Psychiatry 75, 105–114 (2014).
Goes, F. S. et al. Genome-wide association study of schizophrenia in Ashkenazi Jews. Am. J. Med. Genet. B. Neuropsychiatr. Genet. 168, 649–659 (2015).
Mas-Y-Mas, S. et al. The human mixed lineage leukemia 5 (mll5), a sequentially and structurally divergent set domain-containing protein with no intrinsic catalytic activity. PLoS One 11, e0165139 (2016).
Sun, X.-J. et al. Genome-wide survey and developmental expression mapping of zebrafish SET domain-containing genes. PLoS One 3, e1499 (2008).
Ali, M. et al. Molecular basis for chromatin binding and regulation of MLL5. Proc. Natl Acad. Sci. USA 110, 11296–11301 (2013).
Lemak, A. et al. Solution NMR structure and histone binding of the PHD domain of human MLL5. PLoS One 8, e77020 (2013).
Zhang, X., Novera, W., Zhang, Y. & Deng, L.-W. MLL5 (KMT2E): structure, function, and clinical relevance. Cell. Mol. Life Sci. 74, 2333–2344 (2017).
Anney, R. et al. A genome-wide scan for common alleles affecting risk for autism. Hum. Mol. Genet. 19, 4072–4082 (2010).
Torrico, B. et al. Lack of replication of previous autism spectrum disorder GWAS hits in European populations. Autism Res. 10, 202–211 (2017).
Feijs, K. L. H., Forst, A. H., Verheugd, P. & Lüscher, B. Macrodomain-containing proteins: regulating new intracellular functions of mono(ADP-ribosyl)ation. Nat. Rev. Mol. Cell Biol. 14, 443–451 (2013).
Børglum, A. D. et al. Genome-wide study of association and interaction with maternal cytomegalovirus infection suggests new schizophrenia loci. Mol. Psychiatry 19, 325–333 (2014).
Illumina, Inc. Illumina Gencall Data Analysis Software. (Illumina, Inc., San Diego, 2005).
Korn, J. M. et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat. Genet. 40, 1253–1260 (2008).
Goldstein, J. I. et al. zCall: a rare variant caller for array-based genotyping: genetics and population analysis. Bioinformatics 28, 2543–2545 (2012).
Lajonchere, C. M. & AGRE Consortium. Changing the landscape of autism research: the autism genetic resource exchange. Neuron 68, 187–191 (2010).
Geschwind, D. H. et al. The autism genetic resource exchange: a resource for the study of autism and related neuropsychiatric conditions. Am. J. Hum. Genet. 69, 463–466 (2001).
Gauthier, J. et al. Autism spectrum disorders associated with X chromosome markers in French-Canadian males. Mol. Psychiatry 11, 206–213 (2006).
Fischbach, G. D. & Lord, C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron 68, 192–195 (2010).
Chaste, P. et al. A genome-wide association study of autism using the Simons Simplex Collection: does reducing phenotypic heterogeneity in autism increase genetic homogeneity? Biol. Psychiatry 77, 775–784 (2015).
Delaneau, O., Marchini, J. & Zagury, J.-F. A linear complexity phasing method for thousands of genomes. Nat. Methods 9, 179–181 (2011).
Howie, B., Fuchsberger, C., Stephens, M., Marchini, J. & Abecasis, G. R. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat. Genet. 44, 955–959 (2012).
Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).
1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Price, A. L. et al. Long-range LD can confound genome scans in admixed populations. Am. J. Hum. Genet. 83, 132–135; author reply 135–139 (2008).
Chang, C. C. et al. Second-generation plink: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Begum, F., Ghosh, D., Tseng, G. C. & Feingold, E. Comprehensive literature review and statistical considerations for GWAS meta-analysis. Nucleic Acids Res. 40, 3777–3784 (2012).
Lee, S. H., Wray, N. R., Goddard, M. E. & Visscher, P. M. Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet. 88, 294–305 (2011).
Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).
1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
Altshuler, D. M. et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).
Kang, H. J. et al. Spatio-temporal transcriptome of the human brain. Nature 478, 483–489 (2011).
Acknowledgements
The iPSYCH project is funded by the Lundbeck Foundation (R102-A9118 and R155-2014-1724) and the universities and university hospitals of Aarhus and Copenhagen. Genotyping of iPSYCH and PGC samples was supported by grants from the Lundbeck Foundation, the Stanley Foundation, the Simons Foundation (SFARI 311789 to M.J.D.), and NIMH (5U01MH094432-02 to M.J.D.). The Danish National Biobank resource was supported by the Novo Nordisk Foundation. Data handling and analysis on the GenomeDK HPC facility was supported by NIMH (1U01MH109514-01 to M.C.O.D and A.D.B.). High-performance computer capacity for handling and statistical analysis of iPSYCH data on the GenomeDK HPC facility was provided by the Centre for Integrative Sequencing, iSEQ, Aarhus University, Denmark (grant to A.D.B.). S.D.R. and J.D.B. were supported by NIH grants MH097849 (to J.D.B.) and MH111661 (to J.D.B.), and by the Seaver Foundation (to S.D.R. and J.D.B.). J. Martin was supported by the Wellcome Trust (grant 106047). O.A.A. received funding from the Research Council of Norway (213694, 223273, 248980, and 248778), Stiftelsen KG Jebsen, and South-East Norway Health Authority. We thank the research participants and employees of 23andMe for making this work possible.
Author information
Contributions
Analysis: J.G., S.R., T.D.A., M.M., R.K.W., H.W., J.P., S.A., F.B., J.H.C., C.C., K.D., S.D.R., B.D., S.D., M.E.H., S.H., D.P.H., H.H., L.K., J. Maller, J. Martin, A.R.M., M. Nyegaard, T.N., D.S.P., T.P., B.S.P., P.Q., J.R., E.B.R., K. Roeder, P.R., S. Sandin, F.K.S., S. Steinberg, P.F.S., P.T., G.B.W., X.X., D.H.G., B.M.N., M.J.D., A.D.B. J.G., B.M.N., M.J.D., and A.D.B. supervised and coordinated the analyses. Sample and/or data provider and processing: J.G., S.R., M.M., R.K.W., E.A., O.A.A., R.A., R.B., J.D.B., J.B.-G., M.B.-H., F.C., K.C., D.D., A.L.D., J.I.G., C.S.H., M.V.H., C.M.H., J.L.M., A.P., C.B.P., M.G.P., J.B.P., K. Rehnström, A.R., E.S., G.D.S., H.S., C.R.S., Autism Spectrum Disorder Working Group of the Psychiatric Genomics Consortium, BUPGEN, Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium, 23andMe Research Team, K.S., D.M.H., O.M., P.B.M., B.M.N., M.J.D., and A.D.B. Core PI group: K.S., D.H.G., M. Nordentoft, D.M.H., T.W., O.M., P.B.M., B.M.N., M.J.D., and A.D.B. Core writing group: J.G., M.J.D., and A.D.B. Direction of study: M.J.D. and A.D.B.
Ethics declarations
Competing interests
H.S., K.S., S. Steinberg, and G.B.W. are employees of deCODE genetics/Amgen. The 23andMe Research Team members are employed by 23andMe. D.H.G. is a scientific advisor for Ovid Therapeutics, Falcon Computing, and Axial Biotherapeutics. T.W. has acted as scientific advisor and lecturer for H. Lundbeck A/S.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Text and Figures
Supplementary Note, Supplementary Tables 1–16 and Supplementary Figures 1–98
About this article
Cite this article
Grove, J., Ripke, S., Als, T.D. et al. Identification of common genetic risk variants for autism spectrum disorder. Nat Genet 51, 431–444 (2019). https://doi.org/10.1038/s41588-019-0344-8
DOI: https://doi.org/10.1038/s41588-019-0344-8