Main

Autism spectrum disorder (ASD) comprises three separate diagnoses: autism and two milder but qualitatively similar disorders, Asperger's syndrome and Pervasive Developmental Disorder-Not Otherwise Specified (PDD-NOS). ASDs are neurodevelopmental disorders characterized by deficits in communication, abnormal social interactions, and rigid or repetitive interests and behaviors. Twin, family and disease modeling studies have indicated that ASDs are complex genetic disorders and that an estimated five to 15 interacting genes may be involved in disease etiology.1,2,3,4,5

The CNS structure most commonly affected in autistic individuals is the cerebellum. Of 22 autopsy studies, 21 report cerebellar abnormalities, including a reduced number of Purkinje cells. These defects occur without any obvious sign of degeneration, suggesting that autism results from developmental defects.6,7,8,9,10,11,12 Cerebellar hypoplasia has also been observed in autistic individuals.11,13,14,15,16,17 Recently, the growth pattern of the cerebellum during childhood has also been shown to be abnormal: cerebellar growth is initially accelerated in autistic individuals relative to unaffected controls but then declines after the age of 6 years.18,19 Moreover, functional MRI studies have demonstrated that the cerebellum is active during tasks that are impaired in ASD, including language generation, attention and problem solving.11,20,21,22,23,24,25,26,27,28,29 Together, these studies indicate that cerebellar development is perturbed in autistic individuals and that these defects might contribute to the behavioral abnormalities observed in ASD.

Mouse genetic studies have identified a number of genes that function during cerebellar development.30,31 One such gene is Engrailed 2 (En2), a homeobox transcription factor orthologous to Drosophila melanogaster engrailed. Both loss-of-function and transgenic misexpression mutants have been generated for the mouse En2 gene. Interestingly, both types of mutation produce a phenotype reminiscent of the cerebellar anatomical abnormalities reported in autistic individuals. Adult mice of both mutant lines are nonataxic, but their cerebella are hypoplastic, with a reduction in the number of Purkinje cells and other cell types.32,33,34,35,36 Both mouse mutants also display cerebellar foliation defects, which have not been reported in autistic individuals. Closer anatomical examination of these mice has revealed that these phenotypes are due to abnormal postnatal development.32,33,34,35,36,37

Human EN2 maps to distal chromosome 7 (7q36.3), a region for which three studies, from two independent genome scans, have provided suggestive evidence of linkage.38,39,40 For these reasons, EN2 was examined as a susceptibility locus for ASD using family-based association analysis.

Materials and methods

Subjects

Families recruited to the Autism Genetic Resource Exchange (AGRE) were used for these studies. AGRE is a central repository of family DNA samples created by The Cure Autism Now (CAN) Foundation and the Human Biological Data Interchange. The selection criteria require that at least two family members have a diagnosis of autism, Asperger's syndrome or PDD-NOS. The diagnosis and characteristics of these families are described in detail elsewhere.38,41 For our analysis, a narrow diagnosis is defined as autism only, while a broad diagnosis includes individuals affected with autism, Asperger's syndrome or PDD-NOS. Although unaffected siblings have not undergone an Autism Diagnostic Interview-Revised (ADI-R) evaluation, extensive medical histories, including neurological, psychological and medical evaluations, are available for 17 of the 169 unaffected siblings used in this study; none of these 17 siblings displays characteristics of the broad phenotype. Karyotypic data are now available for 73 families on the AGRE website (http://www.agre.org), and 60 of these families are included in our study. Only two of the 60 are karyotypically abnormal (AU0065 and AU0106), each carrying a duplication of SNRPN on chromosome 15q12 (a marker for cytogenetic abnormality at the chromosome 15 autism critical region); the others are either normal or still being analyzed.

In this study, DNA was initially obtained from 138 autistic individuals and their parents (parent–offspring triads). Each triad was derived from an independent nuclear family by randomly selecting one child with a diagnosis of autism together with both parents. In the second stage of our analysis, the sample was expanded to include other affected and unaffected siblings from the 138 original families and an additional 29 nuclear families, giving a total of 167 pedigrees. Under the broad diagnosis, these families comprised 316 triads and 169 phenotypically discordant sib pairs (DSP) (753 subjects in total). For the narrow diagnostic classification, 166 pedigrees with 256 triads and 135 DSP were analyzed (n=689).

The 167 nuclear pedigrees were selected on the basis of the initial Columbia University genome scan38 and the subsequent QTL analysis.39 Although these studies report results for 110 and 152 families, respectively, linkage data on 183 families were available at the time our samples were selected. Of these, a number of families were not ordered, owing either to substantial missing genotype data or, in the case of three families, to evidence of nonidiopathic autism due to fragile X syndrome. Of our 167 families, 34 include multiple births: 20 families with DZ multiple births (18 with DZ twins and two with DZ triplets) and 14 families with MZ twins (eight of which have DNA available for both co-twins). For the DZ families, all siblings were genotyped and included in the analysis. For the MZ twins, only one co-twin was selected for analysis. Five of the eight MZ pairs with DNA from both co-twins are phenotypically concordant (autism : autism) and three are discordant (autism : PDD); for these three, the twin with the narrow diagnosis of autism was selected. The eight MZ twin pairs with DNA available served as internal controls for estimating genotyping error, and genotypes were concordant between MZ co-twins for all four single-nucleotide polymorphism (SNP) assays.

DNA analysis

The dbSNP database (http://www.ncbi.nlm.nih.gov/SNP) was used to identify EN2 SNPs. The frequency and validity of each SNP were determined by directly sequencing approximately 200 bp of DNA encompassing each SNP in 24 unrelated individuals (23 Caucasian and one of Hispanic/Latino descent); no additional SNPs were identified by this analysis. Once the SNPs were confirmed, individuals were genotyped using either a tetraprimer Amplification Refractory Mutation System (ARMS)-PCR strategy42 or a primer extension strategy (Pyrosequencing™ system).43,44 Primers were designed using publicly available software (tetraprimer ARMS-PCR: http://cedar.genetics.soton.ac.uk/public_html/primer1.html; Pyrosequencing™: http://www.pyrosequencing.com/pages/technical_supp.html). Sequenced individuals served as controls for optimizing the genotyping assays. SNPs rs3735653 and rs2361689 were typed as simplex Pyrosequencing™ assays on the automated PSQ HS 96A platform as described previously.43,44 Primer sequences are available from the authors on request. For rs2361689, PCR amplification used 0.25 μM of each primer, 0.2 mM dNTP, 1.875 mM MgCl2, 50 mM KCl, 10 mM Tris-HCl (pH 9.0) and 0.1% Triton X-100, with standard cycling conditions: 94°C for 1 min (1 cycle); 45 cycles of 94°C for 30 s, 59°C for 30 s and 74°C for 30 s; and a final 74°C step for 10 min (1 cycle). For rs3735653, identical conditions were used except that the reaction contained 0.1 mM each of dATP, dCTP and dTTP, 25 μM dGTP and 75 μM 7-deaza-2′-deoxyguanosine triphosphate.

For rs1861972 and rs1861973, a tetra-ARMS PCR genotyping assay was employed as described.42 Cycling conditions were as described above for rs2361689, except for the following changes: for rs1861972, a 72°C annealing temperature in 10 mM Tris-HCl (pH 8.3), 1.5 mM MgCl2, 25 mM KCl buffer for 30 cycles; for rs1861973, a 65°C annealing temperature in 10 mM Tris-HCl (pH 8.8), 1.5 mM MgCl2, 25 mM KCl 10 × buffer for 30 cycles. As an additional control for the rs1861972 and rs1861973 tetra-ARMS PCR assays, Pyrosequencing™ assays have recently been developed for both SNPs; these yielded identical genotypes for all 25 individuals tested.

Error checking of the genotype data revealed only two Mendelian inconsistencies (one for SNP rs1861972 and one for SNP rs3735653), both of which were resolved by repeat genotyping. Full four-SNP haplotype analysis identified a further 15 likely genotyping errors (two for SNP rs3735653, five for SNP rs1861972, three for SNP rs1861973 and six for SNP rs2361689), which were again resolved by repeat genotyping. Some sample genotypes could not be assigned because of repeated assay failure or unclear or poor-quality results; these 39 missing genotypes comprised 18 for SNP rs3735653, three for SNP rs1861972, 10 for SNP rs1861973 and eight for SNP rs2361689. In summary, of the 3012 genotypes assayed, two Mendelian inconsistencies and a further 15 errors revealed by haplotype analysis were corrected, and 39 genotypes could not be assigned and were treated as missing data.
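The totals above can be cross-checked with simple arithmetic. The Python sketch below assumes (our inference, not stated explicitly in the text) that the 3012 genotypes correspond to all 753 subjects typed at each of the four SNPs; the per-SNP missing counts are those reported above, while the call-rate summary and all variable and function names are our own illustration.

```python
"""Genotyping completeness for the four EN2 SNPs, from counts given in the text."""

N_SUBJECTS = 753  # total subjects (broad diagnosis); assumed typed at every SNP
SNPS = ["rs3735653", "rs1861972", "rs1861973", "rs2361689"]

# Missing (unassignable) genotypes per SNP, as reported in the text
MISSING = {"rs3735653": 18, "rs1861972": 3, "rs1861973": 10, "rs2361689": 8}


def call_rates(n_subjects: int, missing: dict[str, int]) -> dict[str, float]:
    """Fraction of subjects with an assigned genotype, per SNP."""
    return {snp: (n_subjects - m) / n_subjects for snp, m in missing.items()}


total_assayed = N_SUBJECTS * len(SNPS)           # 753 x 4
total_missing = sum(MISSING.values())
overall_call_rate = (total_assayed - total_missing) / total_assayed

if __name__ == "__main__":
    print(f"genotypes assayed: {total_assayed}")
    print(f"missing: {total_missing} ({total_missing / total_assayed:.1%})")
    for snp, rate in call_rates(N_SUBJECTS, MISSING).items():
        print(f"{snp}: call rate {rate:.3f}")
    print(f"overall call rate: {overall_call_rate:.4f}")
```

Under this assumption the 39 missing genotypes amount to about 1.3% of the 3012 assayed.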

Statistical analysis

All genotype data were checked for Mendelian inconsistencies before transmission/disequilibrium test (TDT) analysis using PEDCHECK version 1.1.45 Each SNP was assessed for deviation from Hardy–Weinberg equilibrium using genotype data from all parents and standard formulae. Error checking for haplotype inconsistencies, and all haplotype constructions in the extended pedigrees, were carried out using SIMWALK version 2.83.46 The linkage disequilibrium (LD) coefficient delta (δ) was calculated for pairs of SNPs according to published methods,47 using parental genotypes from the original 138 parent–offspring triads. The sign of the coefficient indicates whether LD is positive (allelic association between the common alleles of the two loci) or negative (association between the common allele at one locus and the rare allele at the other); δ=0 indicates no LD between the two loci. The significance of δ can be tested using Nδ², where N is the total number of haplotypes observed; this statistic is distributed as χ² with one degree of freedom.
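The Hardy–Weinberg check can be illustrated with the Pearson χ² goodness-of-fit test, one common choice among "standard formulae"; the text does not specify which test was used, so this is an assumption. The function name `hwe_chi2` and its genotype-count arguments are illustrative only.

```python
import math


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium at a biallelic SNP.

    n_aa, n_ab, n_bb: counts of the two homozygous classes and the heterozygotes.
    Returns (chi-square statistic, upper-tail P-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip empty expected classes (monomorphic SNP) to avoid division by zero
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # 3 genotype classes - 1 - 1 estimated allele frequency = 1 df;
    # for 1 df, P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For example, hypothetical parental counts of 30, 40 and 30 give χ² = 4.0 (P ≈ 0.046), whereas 25, 50 and 25 fit the equilibrium proportions exactly (χ² = 0).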

Single-marker and haplotype TDT analyses for the initial 138 triads were performed using TRANSMIT version 2.5.4.48 Compared with other programs, TRANSMIT has the advantage that it can handle both missing parental data and the transmission of multilocus haplotypes, even when phase is unknown. It uses a score test based on the ‘conditional on parental genotype’ likelihood; when transmissions are fully observed, the score test reduces to the familiar Pearson χ2 test. The minimum haplotype frequency was set at 0.05 for the TRANSMIT analysis (using the –c# flag), ie rare haplotypes with a frequency below 5% were pooled. For multilocus analysis, both the global P-value (which assesses the significance of transmission distortion across all test haplotypes) and haplotype-specific P-values are calculated. The bootstrap simulation procedure implemented in TRANSMIT repeatedly resamples the data to control for haplotype ambiguity and to derive exact P-values (provided that sufficient bootstrap samples are drawn).

The TRANSMIT output gives the total numbers of observed and expected transmissions for each allele/haplotype, including uninformative transmissions from homozygous parents. The number of informative transmissions from heterozygous parents was calculated as described in the TRANSMIT documentation (http://www-gene.cimr.cam.ac.uk/clayton/software/transmit.txt), as follows:

  i) Multiply the tabulated value of Var(O−E) (variance of observed minus expected) by 4 to give N, the number of fully informative transmissions, that is, the total number of heterozygous parents.

  ii) N/2 = the expected number of informative transmissions under the null hypothesis.

  iii) The difference between the expected count in the TRANSMIT output table and the informative expected count (N/2) equals the number of expected transmissions from uninformative homozygous parents. Subtracting this number from the observed transmissions in the TRANSMIT output gives the observed transmissions (T) from informative heterozygous parents. We then replaced the expected column with the untransmitted counts (UT), derived as UT = N − T.
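Steps i to iii can be written as a small function. Step iii works because a homozygous parent transmits its allele with certainty, so its contribution to the observed count equals its contribution to the expected count. The Python sketch below uses hypothetical names (`informative_from_transmit`, `InformativeCounts`) and does not parse real TRANSMIT output; the numbers in the usage note are made up.

```python
from typing import NamedTuple


class InformativeCounts(NamedTuple):
    n_het: float          # N: heterozygous (fully informative) parents
    transmitted: float    # T: informative transmissions of the allele/haplotype
    untransmitted: float  # UT = N - T


def informative_from_transmit(observed: float, expected: float, var_oe: float) -> InformativeCounts:
    """Recover informative T/UT counts from one TRANSMIT output row (steps i-iii)."""
    n_het = 4.0 * var_oe                     # step i: N = 4 * Var(O - E)
    expected_informative = n_het / 2.0       # step ii: expectation under the null
    from_homozygotes = expected - expected_informative  # step iii: homozygous share
    transmitted = observed - from_homozygotes            # observed T from heterozygotes
    return InformativeCounts(n_het, transmitted, n_het - transmitted)
```

For a hypothetical row with O = 130, E = 115 and Var(O−E) = 25, this gives N = 100, 65 transmissions attributable to homozygous parents, T = 65 and UT = 35.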

To test for a sex difference in susceptibility at this locus, haplotype transmissions of the two intronic markers in the 138 triads were examined with TRANSMIT using the –s# flag (which considers each sex separately by recoding the other sex as unknown before analysis). In this series, there are 105 male and 33 female autistic probands. The transmission ratios obtained for the male-proband and female-proband triads were then compared using Fisher's exact test on a 2 × 2 contingency table; a significant difference would indicate a sex bias in susceptibility at this locus.
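The comparison can be sketched as a two-sided Fisher's exact test on a 2 × 2 table whose rows are male- and female-proband triads and whose columns are transmitted and untransmitted A–C haplotype counts. This table layout is our assumption, and because the sex-specific counts are not reported (data not shown), no real counts are used here. The function name is hypothetical; it uses Python's `math.comb` and the common definition of the two-sided P-value as the sum over all tables, with the observed margins, that are no more probable than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Hypothetical layout: rows = male / female probands,
    columns = transmitted / untransmitted haplotype counts.
    """
    row1 = a + b  # first-row total (held fixed)
    col1 = a + c  # first-column total (held fixed)
    n = a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    # Sum every table no more probable than the observed one; the small
    # relative tolerance keeps exact ties despite floating-point rounding.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

As a check, the textbook table [[3, 1], [1, 3]] gives P = 34/70 ≈ 0.486.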

Haplotypes were constructed using SIMWALK version 2.83,46 assuming no recombination between markers. Where parental phase was unambiguous, each haplotype was recoded as a single allele (creating a pseudosingle marker). This allowed haplotype transmissions to be tested by programs designed for single-locus analysis, such as the pedigree disequilibrium test (PDT)49 and ETDT version 2.4.50 Because TRANSMIT lacks this feature, ETDT50 was used to assess evidence for a parent-of-origin effect, using the intronic haplotype data from the original 138 triads. Haplotype data for these triads were extracted from the nuclear family data after recoding as a pseudosingle marker, before ETDT analysis.
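Recoding a phased haplotype as a pseudosingle marker amounts to giving each distinct allele combination its own allele code. The Python sketch below illustrates this for the two intronic SNPs (alleles A/G at rs1861972 and C/T at rs1861973, as reported in the Results). The integer codes are arbitrary, the function names are hypothetical, phase is assumed to have been resolved already, and the input file formats of PDT and ETDT are not reproduced.

```python
from itertools import product

# Alleles of the two intronic SNPs, in haplotype order
ALLELES = {"rs1861972": ("A", "G"), "rs1861973": ("C", "T")}


def haplotype_codes(alleles: dict[str, tuple[str, ...]]) -> dict[tuple[str, ...], int]:
    """Assign each possible haplotype a distinct integer allele code (1, 2, ...)."""
    combos = product(*alleles.values())
    return {hap: code for code, hap in enumerate(combos, start=1)}


def recode_genotype(hap1: tuple[str, ...], hap2: tuple[str, ...],
                    codes: dict[tuple[str, ...], int]) -> tuple[int, int]:
    """Turn a pair of phased haplotypes into an unordered pseudosingle-marker genotype."""
    a, b = sorted((codes[hap1], codes[hap2]))
    return a, b
```

With these alleles the codes are A–C = 1, A–T = 2, G–C = 3 and G–T = 4, so an individual carrying G–T and A–C haplotypes is recoded as genotype (1, 4).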

PDT (version 4.0)49 was used for TDT analysis in the extended pedigrees. PDT was designed to use data from related triads and disease-discordant sibships in extended pedigrees when testing for transmission disequilibrium. It detects association by testing for unequal transmission of either allele from parents to affected offspring and/or unequal sharing of either allele between discordant sibs. Informative extended pedigrees contain at least one informative triad (ie an affected child with at least one parent heterozygous at the marker) and/or discordant sibship (ie at least one affected and one unaffected sibling with different marker genotypes). PDT has been shown to provide substantial gains in power over similar tests that use only a subset of the family data. Furthermore, although misclassifying affected individuals as unaffected is expected to reduce power, PDT has been shown to be more robust than other tests when extended family data are available.49 PDT has two global statistics: the ‘PDTsum’, which sums the contributions of all triads and discordant sibships within each family, and the ‘PDTave’, which averages them so that equal weight is given to every family in a data set. For our data, the two statistics gave very similar χ2 values and P-values; we report the PDTsum results.
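The PDTsum statistic can be sketched for a single biallelic marker. The Python below follows our reading of the published PDT definition (ref. 49) and assumes complete genotypes: each triad contributes the transmitted minus untransmitted copies of the test allele, each affected–unaffected sib pair contributes the difference in allele copies, these contributions are summed within a family to give a family score D, and T = ΣD/√(ΣD²) is referred to a standard normal distribution (T² is the corresponding χ2). The `Family` class and function names are hypothetical; the real PDT software also handles multiallelic markers, missing genotypes and the PDTave weighting, none of which is reproduced here.

```python
import math
from dataclasses import dataclass, field

Genotype = tuple[str, str]


def count(allele: str, genotype: Genotype) -> int:
    """Number of copies of `allele` in a genotype."""
    return sum(1 for a in genotype if a == allele)


@dataclass
class Family:
    # (father, mother, affected child) genotype triples
    triads: list[tuple[Genotype, Genotype, Genotype]] = field(default_factory=list)
    # (affected sib, unaffected sib) genotype pairs
    discordant_pairs: list[tuple[Genotype, Genotype]] = field(default_factory=list)


def family_score(fam: Family, allele: str) -> int:
    """Family score D for PDTsum: triad and DSP contributions summed, not averaged."""
    d = 0
    for father, mother, child in fam.triads:
        # transmitted minus untransmitted copies = 2*child - father - mother
        d += 2 * count(allele, child) - count(allele, father) - count(allele, mother)
    for affected, unaffected in fam.discordant_pairs:
        d += count(allele, affected) - count(allele, unaffected)
    return d


def pdt_sum(families: list[Family], allele: str) -> tuple[float, float]:
    """Return (T, two-sided normal P) with T = sum(D) / sqrt(sum(D^2))."""
    scores = [family_score(f, allele) for f in families]
    denom = math.sqrt(sum(d * d for d in scores))
    if denom == 0:
        return 0.0, 1.0  # no informative families
    t = sum(scores) / denom
    return t, math.erfc(abs(t) / math.sqrt(2.0))
```

Positive T indicates that the test allele is overtransmitted and/or over-represented in affected sibs, the pattern reported for the rs1861972 A-allele and rs1861973 C-allele.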

Results

EN2 comprises two exons separated by a single 3.3 kb intron. Four SNPs (rs3735653, rs1861972, rs1861973 and rs2361689) spanning the majority of the EN2 gene were tested for transmission disequilibrium in parent–offspring triads and small nuclear families from the AGRE data set. SNP rs3735653 in exon 1 changes the coding sequence from Leu to Phe, whereas rs2361689 in exon 2 is a synonymous change at a Leu codon. SNPs rs1861972 and rs1861973 lie 152 bp apart in the intron, approximately 1.3 kb 5′ of rs2361689 (exon 2) and 2.5 kb 3′ of rs3735653 (exon 1). Allele frequencies for each SNP are given in Table 1a. LD of varying strength is observed between all four SNPs, ranging from delta=0.89 (P=6.4 × 10⁻⁴⁹) for the intronic SNPs rs1861972 and rs1861973 to delta=−0.22 (P=0.0004) for rs3735653 and rs1861972 (Table 1b).47
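The LD coefficient and its χ2 test can be sketched from phased haplotype counts. We assume here, consistent with the Nδ² test described in the Methods, that delta is the allelic correlation coefficient δ = D/√(p₁(1−p₁)q₁(1−q₁)), where D is the usual disequilibrium coefficient and p₁, q₁ are the common-allele frequencies at the two SNPs. Ref. 47 may define or estimate it differently (for example, directly from unphased parental genotypes), so this Python sketch, including the function name `ld_delta`, is illustrative only and is not expected to reproduce Table 1b.

```python
import math


def ld_delta(n11: int, n12: int, n21: int, n22: int) -> tuple[float, float, float]:
    """Allelic correlation between two biallelic SNPs from haplotype counts.

    n_ij counts haplotypes carrying allele i at SNP 1 and allele j at SNP 2,
    where index 1 is the common allele, so a positive value means the two
    common alleles tend to occur on the same haplotype.
    Returns (delta, chi2 = N * delta**2, P with 1 df).
    """
    n = n11 + n12 + n21 + n22  # N: total haplotypes observed
    p11 = n11 / n
    p1 = (n11 + n12) / n       # common-allele frequency at SNP 1
    q1 = (n11 + n21) / n       # common-allele frequency at SNP 2
    d = p11 - p1 * q1          # disequilibrium coefficient D
    denom = math.sqrt(p1 * (1 - p1) * q1 * (1 - q1))
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    delta = d / denom
    chi2 = n * delta ** 2      # equals the Pearson chi-square of the 2 x 2 table
    return delta, chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

Complete association (only the two coupling haplotypes observed) gives δ = 1, while counts that match the product of the allele frequencies give δ = 0.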

Table 1 (a) EN2 SNP characteristics and (b) LD between SNPs

To test initially whether particular EN2 variants were transmitted to autistic individuals more often than expected by chance, each SNP was genotyped in 138 parent–offspring triads that met the narrow diagnostic criteria. After all genotypes were verified to be in Hardy–Weinberg equilibrium, allelic transmissions for each SNP were assessed by the TDT51 using TRANSMIT.48 Significant overtransmission of the rs1861972 A-allele to affected offspring was observed: 73 of 113 heterozygous parents (65%) transmitted the A-allele and only 40 transmitted the G-allele (P=0.0018) (Table 2a). A similar distortion was observed for rs1861973, with 71 of 104 heterozygous parents (68%) transmitting the C-allele to their autistic offspring (P=0.0003) (Table 2a). However, no evidence for association was obtained for either of the flanking exonic SNPs (Table 2a).
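For fully observed transmissions from heterozygous parents, the classic TDT is a McNemar-type χ2 with one degree of freedom, (b − c)²/(b + c). The Python sketch below (the function name `tdt` is ours) illustrates that statistic; it is not TRANSMIT's score test, which also accommodates incomplete parental data, so its P-values need not match those reported exactly.

```python
import math


def tdt(b: int, c: int) -> tuple[float, float]:
    """McNemar-type transmission/disequilibrium test.

    b: heterozygous parents transmitting the allele of interest
    c: heterozygous parents transmitting the other allele
    Returns (chi-square with 1 df, P-value).
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

For the rs1861972 counts (73 A-allele versus 40 G-allele transmissions), this gives χ2 = 1089/113 ≈ 9.64 with P ≈ 0.002, close to the TRANSMIT value of 0.0018.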

Table 2 (a) TDT results for rs3735653, rs1861972, rs1861973 and rs2361689; (b) rs1861972–rs1861973 haplotypes; and (c) rs3735653–rs1861972–rs1861973–rs2361689 haplotypes in 138 autism parent–offspring triads

Haplotype transmissions for these SNPs were then examined using TRANSMIT.48 A bootstrap simulation procedure with 1 000 000 replicates was used to control for ambiguous haplotypes and to derive exact P-values. Two types of haplotype analysis were performed. First, the intronic markers were analyzed alone, since association had been observed only for these SNPs. Next, the entire four-SNP haplotype was examined for transmission distortion. In our population, the rs1861972–rs1861973 A–C haplotype has an observed frequency of 68.8%. This haplotype is specifically overtransmitted to affected individuals (bootstrap P=0.00011), and all other haplotypes are undertransmitted (Table 2b). A global χ2 test based on the common haplotypes A–C and G–T (frequency > 5%) yielded a bootstrap P-value of 0.000005, demonstrating significant evidence for association between these markers and the autism phenotype (Table 2b).

Extending the haplotypes to include the exonic markers rs3735653 and rs2361689 reduced the overall significance of the haplotype transmission distortions (Table 2c). Four of the 13 haplotypes observed in this population contained the core intronic A–C haplotype; although each showed some excess transmission, inclusion of the exonic markers diluted this effect, and statistical significance was no longer achieved (Table 2c).

The haplotype transmissions of the two intronic markers were also investigated for parent-of-origin and sex differences using ETDT50 and TRANSMIT48 software, respectively. When maternal and paternal transmissions were examined separately, no parental differences in transmission ratios were observed (data not shown). Analysis of the haplotype transmission data by sex of proband likewise showed no significant difference in transmission ratios (data not shown); male and female affected individuals therefore appear equally likely to inherit the A–C haplotype. Thus, these data do not support significant parent-of-origin or sex bias effects at this locus.

To determine whether these association findings could be extended to a larger sample, the entire pedigrees of the initial 138 triads (ie including other affected and unaffected siblings) plus a further 29 nuclear families (167 in total) were genotyped for the four SNPs. Because these pedigrees include siblings with the broader ASD diagnoses of Asperger's syndrome or PDD-NOS, association could be tested under both narrow (autism) and broad (autism, Asperger's syndrome, PDD-NOS) diagnostic schemes. PDT (version 4.0)49 was chosen for this analysis because it was designed specifically to test for association using data from multiple related triads as well as allele sharing among discordant sibships within extended pedigrees.

The results of the PDT analysis for each SNP are presented in Table 3a. Once again, significant evidence for association was observed for the intronic markers rs1861972 and rs1861973 under both the narrow and broad diagnostic schemes (rs1861972, narrow: P=0.0290, broad: P=0.0175; rs1861973, narrow: P=0.0073, broad: P=0.0107). The A-allele of rs1861972 and the C-allele of rs1861973 were significantly overtransmitted from heterozygous parents and were over-represented in affected sibs compared with unaffected sibs of DSPs (Table 3a). Analysis of the exonic SNPs (rs3735653 and rs2361689) again demonstrated a lack of association under both the narrow and broad diagnostic classifications (Table 3a).

Table 3 (a) rs3735653, rs1861972, rs1861973 and rs2361689 PDT results in 167 extended pedigrees and (b) rs1861972–rs1861973 haplotype PDT results for 158 extended pedigrees

Next, we performed haplotype TDT analysis in the extended pedigrees. Because these markers are in tight LD and are separated by less than 4 kb of DNA, recombination between them is highly unlikely in a sample of this size. Assuming no recombination, unambiguous haplotypes could be assigned to all individuals in 158 of the 167 pedigrees for the two-marker intronic analysis and in 146 of the 167 pedigrees for the four-marker haplotypes. Pedigrees in which parental haplotype phase could not be resolved were omitted from further analysis. Each haplotype was recoded as a single allele so that its transmissions could be analyzed by PDT. The A–C haplotype is again overtransmitted under both diagnostic classifications and over-represented in affected siblings of DSPs at statistically significant levels (narrow, P=0.0018; broad, P=0.0035) (Table 3b). Conversely, the A–T, G–C and G–T haplotypes are all undertransmitted and are not observed more frequently in affected siblings of DSPs (Table 3b). Global χ2 tests across all haplotypes also yielded significant P-values (narrow, P=0.0009; broad, P=0.0024) (Table 3b).

Including the flanking rs3735653 and rs2361689 SNPs in the full four-marker haplotype analysis once again failed to provide significant evidence for association of any specific haplotype with ASD under either diagnostic scheme (Table 4). Three-SNP haplotype analyses combining the two intronic markers with each exonic marker in turn (rs3735653–rs1861972–rs1861973 and rs1861972–rs1861973–rs2361689) were also carried out to further localize the association signal in this region. However, these analyses also failed to identify any significantly predisposing haplotype (data not shown). Overall, these results are consistent with the weaker LD observed between the rare alleles of the exonic markers (rs3735653 and rs2361689) and the common predisposing alleles of the intronic markers rs1861972 and rs1861973 (Table 1b).

Table 4 rs3735653–rs1861972–rs1861973–rs2361689 haplotype PDT results in 146 extended pedigrees analyzed under the narrow and broad diagnoses

Discussion

In this study, we investigated ENGRAILED2 (EN2) as a candidate susceptibility gene for ASD. The gene was selected on the basis of functional data from mouse mutant studies and the localization of human EN2 to a region on chromosome 7q that has previously shown suggestive linkage to ASD.6,7,8,9,10,36,38,39,40,37 Four SNPs were examined for evidence of excess transmission from parents to affected offspring. Our data demonstrate significant association between the intronic rs1861972–rs1861973 A–C haplotype and ASD in both the 138 triads and the 167 extended pedigrees. These data represent one of the more significant associations between a candidate gene and ASD in the publicly available AGRE data set (http://www.agre.org).52,53,54,55,56,57,58,59,60,61 The triad data also define a minimal population for future family-based association studies that is a powerful and cost-efficient alternative to using extended nuclear families.

No evidence for association was observed for the two exonic markers. Furthermore, four-SNP haplotype analysis in the extended pedigrees revealed slight undertransmission of one of the four haplotypes containing the core intronic A–C haplotype (C–A–C–C) (Table 4), indicating that only a subset of the common rs1861972–rs1861973 A–C haplotypes is associated with ASD. Three-SNP haplotype analysis of the two intronic markers with each exonic marker (rs3735653–rs1861972–rs1861973 and rs1861972–rs1861973–rs2361689) also failed to identify any significantly predisposing haplotype (data not shown). These results argue against a functional role for the A- and C-alleles of rs1861972 and rs1861973 and suggest that they are likely to be nonfunctional polymorphisms in LD with other variant(s) that contribute to autism.

In this study, we used a family-based association method to demonstrate association between ASD and the EN2 locus on chromosome 7q36.3. Previously, three separate studies from two independent genome scans yielded only suggestive evidence for linkage between ASD and 7q36. However, only one of these studies used markers spanning the EN2 locus: in this Finnish report, suggestive linkage to a combined phenotype of ASD and dysphasia was obtained at marker D7S550 (LOD=2.02), located about 170 kb distal of EN2.40 The other two studies used subsets of the AGRE families that largely overlap with our data set.38,39 Liu et al38 carried out fine-mapping analysis of the region proximal to EN2 in the original 110 AGRE families and reported a LOD score of 2.13 for D7S483, located approximately 5.5 Mb proximal of EN2.38 In a further study, Alarcon and co-workers used the same set of microsatellite markers and 152 AGRE families to map quantitative trait loci implicated in language deficits to distal chromosome 7, less than 1 Mb from the EN2 locus (P=0.001).39 Recently, Yonan et al performed a genome scan on the complete set of 345 AGRE families and observed only minimal linkage at distal chromosome 7 (LOD<1.3).62 However, once again only markers proximal to the EN2 locus were used. To investigate linkage of this genomic region to ASD further, additional markers spanning the EN2 locus will therefore be analyzed for linkage and association with both qualitative and quantitative variables of the ASD phenotype in the complete set of AGRE samples.

Two previous studies have investigated EN2 as an autism susceptibility locus. In a case–control study of a Northern French population, significant association (P<0.01) was observed for a PvuII polymorphism located 5′ of the EN2 promoter,63 supporting a role for this gene in autism. Future analysis will investigate whether this PvuII polymorphism is associated with ASD in the AGRE population and whether it is in LD with the rs1861972–rs1861973 A–C haplotype. More recently, Zhong et al64 examined rs3735653 in exon 1 using a subset of families from the AGRE data set and found no association between rs3735653 and autism (P=0.58). We likewise observed no association between rs3735653 and ASD in the AGRE data set, consistent with the weaker LD between alleles of this marker and the ASD-associated intronic markers.

In summary, these data demonstrate significant association of a cerebellar patterning gene with ASD, suggesting a role for EN2 as a susceptibility locus and supporting the hypothesis that genetic alterations affecting cerebellar development could predispose individuals to autism and related ASDs. Further LD mapping of the region spanning the EN2 gene from 5′ to 3′ is required to elucidate the role of the ASD-associated intronic markers and to identify the putative functional variant(s). Because cis-regulatory sequences are often located within the first intron of genes,65,66 the functional variant associated with ASD may map to this region and cause misregulation of EN2 during cerebellar development. As expected, when the intron is scanned for cis-regulatory sequences using computer prediction programs (http://www.genomatix.de:80), multiple potential binding sites for developmentally regulated transcription factors are observed. Consistent with the idea that functional variants can affect the expression of associated genes, causative alleles responsible for bipolar disorder and rheumatoid arthritis have recently been identified in the promoter of XBP1 and an intron of SLC22A4, respectively.67,68

During mouse cerebellar development, En2 expression is tightly regulated in space and time. For example, from E17.5 to postnatal day 4 (P4), En2 is expressed in spatially restricted ‘stripes’, within which it is expressed in all primary cerebellar cell types (granule, deep nuclei and Purkinje cells). By P4, however, a developmental switch occurs so that En2 is no longer expressed in Purkinje or deep nuclei cells but is restricted to differentiating granule cells. In an En2 transgenic line that prolongs En2 expression in Purkinje cells past P4, adult mice exhibit an ‘autistic-like’ cerebellar phenotype: a decreased number of Purkinje cells and hypoplasia. Interestingly, the En2 knockout mouse also displays hypoplasia and a reduced number of Purkinje cells, indicating that decreased levels of En2 could produce a similar phenotype. Both mutants disrupt the topographic mapping of spinocerebellar mossy fibers,36,37 which could in turn affect the electrophysiological function of the cerebellum. Together, these data suggest that functional variants in human EN2 that affect either the level or the spatial/temporal expression of the gene during human cerebellar development might contribute to the anatomical cerebellar phenotypes observed in autism, and thereby to its underlying etiology. These data make EN2 an excellent candidate for an autism susceptibility locus. Future work will include further LD mapping to determine whether other variants within EN2 are associated with ASD, and analysis of additional data sets to investigate whether the association between EN2 and ASD is observed in other populations.