Introduction

Alzheimer's disease (AD), especially the late-onset form, is the leading cause of dementia among older persons. Twin studies have shown that genetic factors have an important role in explaining AD with up to 79% heritability.1 Beyond the well-established apolipoprotein E association, several susceptibility loci have been identified by large-scale genome-wide association studies.2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 However, single-nucleotide polymorphisms (SNPs) at all those loci explain <50% of AD heritability.

To identify additional genetic factors related to AD, genome-wide scans for copy-number variants (CNVs) have been conducted recently.16, 17, 18, 19, 20, 21, 22, 23, 24 Swaminathan et al.18 reported a trend for reduced levels of deletions and duplications in AD cases compared with controls; whereas Ghani et al.20 found no significant differences between cases and controls in CNV rate, distribution of deletions or duplications, total or average CNV size or number of genes affected by CNVs, except for a nominal association between AD and a duplication on 15q11.2. Chapman et al.22 reported a limited contribution of rare CNVs to AD risk, but proposed that healthy elderly individuals have a reduced rate of large deletions. The inconsistent genome-wide CNV findings in AD may partially be due to its heterogeneous nature.

About 40–60% of late-onset AD patients will develop psychosis (AD+P). Compared with AD without psychosis (AD−P), AD+P is characterized by more severe cognitive and functional deficits and more rapid deterioration, and it represents a distinct phenotype with a genetic basis. The estimated heritability of AD+P is about 61%;25 however, the genetic variants that explain this high heritability are unknown. The first genome-wide association study of AD+P was performed recently by Hollingworth et al.,26 who identified several suggestive loci associated with AD+P when compared with AD−P and/or the general population, although none reached genome-wide significance. To systematically identify the genetic variants that account for additional heritability of AD+P, we conducted a genome-wide CNV study of AD+P using the Illumina HumanOmni1-Quad BeadChip (San Diego, CA, USA).

Materials and methods

Subjects

Subjects were participants in a study on the genetics of AD, which recruited 1440 AD cases (mean age-at-onset 72.6±6.4 years) through the University of Pittsburgh Alzheimer's Disease Research Center and 1000 older controls (mean age 74.07±6.20 years). All the participants were examined with detailed clinical, laboratory and neuroimaging evaluations.27, 28 Data from 1291 AD patients who passed stringent quality control criteria in our previous genome-wide study2 were used in this study. Subjects with AD met criteria for either probable or definite AD according to criteria set by the National Institute of Neurological and Communicative Disorders and Stroke–Alzheimer’s disease and Related Disorders Association and the Consortium to Establish a Registry for Alzheimer’s disease.29 Subjects with evidence of persistent and/or recurrent delusions or hallucinations, defined by the Consortium to Establish a Registry for Alzheimer’s disease behavior rating scale30 items for psychotic features (items 33–35) rated as occurring three or more times in the past month at any visit, were classified as AD+P, as previously described.31 Subjects with evidence of infrequent distortions of thought or perception, defined by any of these items rated as occurring one to two times in the past month, were classified as ‘indeterminate psychosis’ because such symptoms can result from a variety of sources. Subjects rated as having no occurrence of possible psychotic symptoms at all visits were classified as AD−P. Subjects were excluded if they had a previous history of schizophrenia (SCZ) or bipolar disorder. Details on the study population characteristics can be found elsewhere.2, 26 Among the 1291 AD subjects, 496 were classified as AD+P, 639 as AD with ‘indeterminate’ psychosis and 156 as AD−P. Because the ‘indeterminate’ group is actually an ‘intermediate’ group and not an ‘unknown’ group, we included it in this study to increase statistical power.

Illumina assay for CNV discovery

All subjects were genotyped using the high-throughput genome-wide SNP Illumina HumanOmni1-Quad BeadChip (containing 1 016 423 SNP and/or CNV markers).2 We generated CNV calls using the PennCNV software (2009Aug27 version).32 PennCNV is a Hidden Markov Model (HMM)-based method. It uses the log R ratio (LRR) and B allele frequency (BAF) measures computed from the signal intensity files by BeadStudio to detect CNVs. The simultaneous analysis of intensity data and genotype data in the same experimental setting yields a highly accurate definition of the normal diploid state and of any deviation from it.

CNV quality control

Details on the quality control parameters applied to the samples and markers have been described previously.2, 26 Criteria for exclusion of samples included the consent and diagnostic criteria, high genotyping failure rate and/or cryptic relatedness. Briefly, only samples with call rates >98% were retained. Samples were filtered if mean X-chromosome heterozygosity was >0.02 for males or outside the range of 0.25–0.4 for females. Genetic outliers and those with evidence of relatedness (IBD estimate >0.4) or non-European ancestry based on genotype data were also excluded. After filtering, 496 AD+P cases, 639 AD intermediate P subjects and 156 AD−P controls were retained. Markers were excluded if they had a genotype missing rate of >0.02. Markers were examined to determine whether missingness depended on case/control status or the genotyping batch. To call CNVs, we used the GC model wave adjustment procedure in PennCNV to smooth out genomic waving, since wave artifacts correlating with GC content and resulting from hybridization bias of low full-length DNA quantity can interfere with the accurate inference of CNVs. After GC model adjustment, we filtered out the samples that met the criteria of LRR standard deviation >0.3 and |GC base pair wave factor| >0.05. About 10% of all samples (122 subjects) were removed at this step. A total of 440 AD+P cases, 593 AD intermediate P subjects and 136 AD−P controls were used in the final analysis. Additional QC was applied at the CNV level by excluding CNV calls <5 kb in length and spanning <5 probes.33 All procedures followed the user guidelines of PennCNV. Table 1 summarizes the number of samples in different categories after quality control measures.
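The sample-level and call-level filters described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the class, function and field names are hypothetical, the direction of the LRR standard deviation comparison (greater than 0.3) is an assumption, and the sketch assumes that failing either criterion is enough to drop a sample or a call.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; comparison directions and the
# "either criterion fails" logic are assumptions of this sketch.
MAX_LRR_SD = 0.3
MAX_ABS_WAVE_FACTOR = 0.05
MIN_CNV_LENGTH_BP = 5_000
MIN_CNV_PROBES = 5


@dataclass
class CnvCall:
    """One CNV call for one sample (hypothetical record layout)."""
    chrom: str
    start: int        # first base pair covered (inclusive)
    end: int          # last base pair covered (inclusive)
    n_probes: int     # number of markers supporting the call
    copy_number: int  # inferred copy number (2 = normal)

    @property
    def length(self) -> int:
        # Inclusive coordinates, so a call from 100 to 199 spans 100 bp.
        return self.end - self.start + 1


def sample_passes_qc(lrr_sd: float, wave_factor: float) -> bool:
    """Keep a sample only if its intensity noise and GC waviness are low."""
    return lrr_sd <= MAX_LRR_SD and abs(wave_factor) <= MAX_ABS_WAVE_FACTOR


def call_passes_qc(call: CnvCall) -> bool:
    """Keep a call only if it is long enough and supported by enough probes."""
    return call.length >= MIN_CNV_LENGTH_BP and call.n_probes >= MIN_CNV_PROBES
```

Under these assumptions, a sample with LRR standard deviation 0.35 is dropped regardless of its wave factor, and a 3 kb call is dropped even if it spans ten probes.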

Table 1 Descriptive characteristics for all samples

Statistical analysis of CNVs

Complex CNV overlap is simplified by producing SNP-based statistics. The CNV frequencies between groups were compared at each SNP using an ordered logistic regression model. Psychosis status was treated as an ordered categorical outcome: AD−P, AD with intermediate psychosis and AD+P. The predictor was CNV states. Copy number=2 per individual was considered normal. Deletion and duplication CNVs were defined as CNV calls with copy number <2 and >2, respectively. Deletion and duplication CNVs and normal copy number were coded as separate variables (that is, categorical model with three categories). To adjust for the confounding effect of population stratification, the first four principal components of population structure were included as covariates in the regression model. Age was also adjusted for.
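One way to write the model described above is the standard proportional-odds (ordered logistic) form. The notation is an assumption introduced here for illustration, since the text does not give an explicit parameterization: Y_i is the psychosis category of subject i (1 = AD−P, 2 = AD with intermediate psychosis, 3 = AD+P), D_i^del and D_i^dup are indicator variables for a deletion or duplication at the SNP being tested, and z_i collects age and the first four principal components.

```latex
\operatorname{logit} P(Y_i \le j)
  = \alpha_j - \left( \beta_{\mathrm{del}} D_i^{\mathrm{del}}
                    + \beta_{\mathrm{dup}} D_i^{\mathrm{dup}}
                    + \boldsymbol{\gamma}^{\top} \mathbf{z}_i \right),
  \qquad j = 1, 2
```

With this sign convention, exp(β_dup) is the odds ratio for falling in a higher psychosis category among duplication carriers relative to subjects with normal copy number, so a value below 1 corresponds to a protective association.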

Multiple testing was corrected using the Benjamini–Hochberg false discovery rate. We reported associations with a false discovery rate <0.01. We reported statistical local minima to narrow the association in reference to a region of nominal significance. Regions of significance ranging within a power of 10 are reported, as previously described.34 The resulting nominally significant CNV regions were excluded if they met either of the following criteria: (i) residing on telomere- or centromere-proximal cytobands; (ii) located in a 'peninsula' of common CNVs arising from variation in boundary truncation of CNV calling. Statistical comparisons were performed only for CNV events observed on autosomes, owing to the complexity of analyzing the X and Y chromosomes. All data analyses were conducted in R (version 3.0.1).35 Manhattan plots were used to show the genome-wide P-values at each marker. As a final step to confirm the validity of events with significant frequency differences, LRR and BAF plots were visually inspected to ensure that called copy number gains and losses were visually evident and not simply artifacts of the calling algorithm. Human NCBI Build 36 (hg18) was used for assigning chromosome locations.
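The Benjamini–Hochberg adjustment mentioned above can be illustrated with a short sketch. The analyses themselves were run in R; the Python function below is a hypothetical stand-in written for illustration, and the P-values in the usage note are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_at(pvalues, fdr=0.01):
    """Indices of tests whose adjusted P-value falls below the FDR threshold."""
    adjusted = benjamini_hochberg(pvalues)
    return [i for i, q in enumerate(adjusted) if q < fdr]
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` gives adjusted values of approximately 0.02, 0.04, 0.04 and 0.02, none of which would pass a false discovery rate threshold of 0.01.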

CNV load

For each individual, we also defined several measures of autosomal genomic CNV burden and compared mean values between phenotypic groups. CNV burden for each individual was defined in three distinct ways: (1) the total length of genomic DNA involved in identified CNV events; (2) the average length of CNVs; and (3) the total number of CNVs. Comparison of CNV burden among the groups was conducted using linear regression in R. The outcomes were the CNV burden measures and the predictor was AD group (AD−P, AD with intermediate psychosis or AD+P), coded as separate variables (that is, a categorical model with three categories).
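The three burden measures can be computed per individual as in the following sketch. The function name and the representation of calls as (start, end) pairs are hypothetical, and the use of inclusive base-pair coordinates is an assumption.

```python
def cnv_burden(calls):
    """Summarize one individual's autosomal CNV calls.

    `calls` is a list of (start, end) base-pair positions, assumed inclusive.
    Returns the total CNV length, the mean CNV length and the number of CNVs.
    """
    lengths = [end - start + 1 for start, end in calls]
    n_cnvs = len(lengths)
    total_length = sum(lengths)
    # An individual with no calls has zero burden on every measure.
    mean_length = total_length / n_cnvs if n_cnvs else 0.0
    return {
        "total_length": total_length,
        "mean_length": mean_length,
        "n_cnvs": n_cnvs,
    }
```

Each of the three values would then serve as the outcome of a separate linear regression on AD group, with duplications and deletions summarized separately when the burden is stratified by CNV type.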

Results

Overall CNV burden

Clinical characteristics of the sample can be found in Table 1. We calculated the overall CNV burden, including total CNV length, average CNV length and number of observed CNV events, in each group of samples, and also stratified CNVs into duplications and deletions. Comparison of total and average CNV length and CNV number revealed no significant difference in the AD+P or AD intermediate P groups compared with the AD−P group (reference level), as seen in Table 2. In the univariate model, the average number of duplication events was 19.7 in AD−P subjects and was lower, at marginal significance, in AD+P subjects (beta coefficient=−3.0, P=0.072). The average number of duplications remained lower in AD+P than in AD−P subjects (beta coefficient=−3.2, P=0.059) in the multivariable model after adjusting for age, sex and the first four principal components of population substructure.

Table 2 Overall CNV burden

CNV association analyses

Duplication CNVs

We conducted a genome-wide scan for CNVs associated with AD+P. The genome-wide P-values for duplication CNVs are shown in a Manhattan plot (Figure 1). We identified a duplication CNV on chromosome 19 that showed a genome-wide significant protective association with AD+P (odds ratio=0.42, P=7.20E−10 for the test based on ordered logistic regression). Forty-three out of 440 AD+P subjects carried this duplication (frequency=9.8%), while 172 out of 593 AD intermediate P and 33 out of 136 AD−P subjects had this duplication (frequency=29.0 and 24.3%, respectively). Although the duplication frequency is slightly higher in the intermediate group than in the AD−P subjects, the two groups are very similar and we interpret this departure from an ordered model as a function of the small sample size. It is worth noting that this result was highly significant even under the ordered model that we fit. This was a hemizygous duplication (copy number of 3). It was located on chromosome 19p13.3 and affected only one gene, APC2 (adenomatosis polyposis coli 2). Examples of this duplication in one CNV carrier and one subject without CNV based on LRR and BAF are shown in Figure 2. In addition, we detected four other duplications whose frequency was significantly lower (P<1.0E−05) in AD+P cases compared with non-AD+P subjects. One was located on chromosome 16 and affected the ZFPM1 (zinc finger protein, FOG family member 1) gene; for those who carried the duplication, the odds of being AD+P versus non-AD+P were 0.44 times the odds for those without this duplication (P=2.13E−07). The second duplication was located on chromosome 14 and affected the JAG2 (jagged 2) gene; the odds ratio of having AD+P in CNV carriers versus non-carriers was 0.43 (P=5.01E−07). The third duplication was on chromosome 9 and affected the SET (SET nuclear oncogene) gene; the AD+P odds ratio was 0.44 (P=1.95E−06) in CNV carriers. The fourth duplication was on chromosome 17 with an AD+P odds ratio of 0.49 (P=4.25E−06) and affected no known genes. Detailed information on all five AD+P associated duplications is provided in Table 3.
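The carrier frequencies quoted for the chromosome 19 duplication follow directly from the carrier counts and the final group sizes given in the quality-control section (440 AD+P, 593 AD intermediate P and 136 AD−P subjects). The sketch below is simple arithmetic for checking those percentages; the variable and function names are hypothetical.

```python
# (carriers, group size) for the chromosome 19 duplication, taken from the text.
group_counts = {
    "AD+P": (43, 440),
    "AD intermediate P": (172, 593),
    "AD-P": (33, 136),
}


def carrier_frequency(carriers, total):
    """Percentage of subjects in a group who carry the duplication."""
    return 100.0 * carriers / total


# Frequencies rounded to one decimal place, as reported.
frequencies = {
    group: round(carrier_frequency(carriers, total), 1)
    for group, (carriers, total) in group_counts.items()
}
```

These are raw frequencies only; the reported odds ratio comes from the ordered logistic regression with covariates and cannot be recovered from the counts alone.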

Figure 1

Manhattan plot shows the genome-wide P-values for duplication CNVs associated with AD+P. This Manhattan plot demonstrates the locations across the chromosomes of the human genome (horizontal axis) where associations between duplication calls at markers (dots) and AD+P are shown using −log10 P-values (vertical axis). The higher the dots are, the stronger the genetic associations are. The strongest signal is seen on chromosome 19. AD, Alzheimer's disease; CNV, copy-number variant; P, psychosis.

Figure 2

Examples of duplication on chr 19 in one CNV carrier (a) and one non-CNV subject (b), and examples of deletion on chr 9 in one CNV carrier (c) and one non-CNV subject (d) based on LRR and BAF. Each data point in the plots is a single marker (SNP or CNV marker). The x axis shows the marker’s base pair position on the chromosome (chromosome 19 in a and b, chromosome 9 in c and d), based on Human NCBI Build 36 (hg18). For each subject, the y axis on the top panel is the intensity data termed LRR and the bottom panel is the genotype data termed BAF. (a) Chromosome 19 duplication carrier. In the LRR plot, the expected LRR is zero for normal copy number on autosomes; increases in LRR were observed in the arrowed region. In the BAF plot, the values at 0, 1/2 and 1 represent the expected positions of disomic AA, AB and BB genotypes, respectively. A hemizygous duplication with a copy number of 3 would be expected to show a wider BAF split (at 1/3 and 2/3). However, BAF values at 1/3 and 2/3 were not observed in this case because this region is composed of homozygous SNP markers and many CNV markers. (b) Subject without duplication. In the LRR plot, the arrowed region has LRR around zero; in the BAF plot, the values were clustered around 0, 1/2 and 1 as expected for normal copy number. (c) Chromosome 9 deletion carrier. In the LRR plot, the expected LRR is zero for normal copy number on autosomes; decreases in LRR were observed in the arrowed region. In the BAF plot, the values at 0, 1/2 and 1 represent the expected positions of disomic AA, AB and BB genotypes, respectively. For a hemizygous deletion with a copy number of 1, the BAF cluster at 1/2 disappears. (d) Subject without chr 9 deletion. In the LRR plot, the arrowed region has LRR around zero; in the BAF plot, the values were clustered around 0, 1/2 and 1 as expected for normal copy number. BAF, B allele frequency; CNV, copy-number variant; LRR, log R ratio; SNP, single-nucleotide polymorphism.
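The BAF positions described in this legend follow from counting B alleles: with n copies of a region, a marker can carry 0 to n B alleles, so the idealized BAF clusters sit at k/n. The sketch below is an idealized illustration under that assumption; the function name is hypothetical, and real data are noisier than these exact fractions.

```python
from fractions import Fraction


def expected_baf_clusters(copy_number):
    """Idealized BAF cluster positions for a region with the given copy number."""
    if copy_number < 1:
        raise ValueError("BAF is undefined when no copies are present")
    # A marker carries between 0 and copy_number B alleles.
    return [Fraction(b, copy_number) for b in range(copy_number + 1)]
```

With this function, a copy number of 2 gives clusters at 0, 1/2 and 1; a copy number of 3 gives 0, 1/3, 2/3 and 1; and a copy number of 1 gives only 0 and 1, which is why the 1/2 cluster is absent in a hemizygous deletion.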

Table 3 Summary of the CNVs for which frequency was significantly different among AD+P, AD intermediate P and AD−P

Deletion CNVs

A Manhattan plot showing genome-wide P-values for association of AD+P with deletion CNVs is presented in Figure 3. None of the deletions is genome-wide significantly associated with AD+P. However, we identified two deletions that showed a marginal association with AD+P (P<1.0E−3). One was located on chromosome 4, which was more frequent in AD+P subjects (26.1%) than in AD intermediate P (19.6%) and AD−P (12.5%) subjects. There were no known genes in this deletion region. Another deletion we identified was located on chromosome 9p22.2 and was present only in AD+P subjects (P=8.87E−04); none of AD intermediate P or AD−P subjects had this deletion. This was a hemizygous deletion (copy number=1) and it affected only one gene, CNTLN (centlein, centrosomal protein). Detailed information on these two deletions is given in Table 3. Examples of this deletion in one CNV carrier and one subject without CNV based on LRR and BAF are shown in Figure 2.

Figure 3

Manhattan plot shows the genome-wide P-values for deletion CNVs associated with AD+P. This Manhattan plot demonstrates the locations across the chromosomes of the human genome (horizontal axis) where associations between deletion calls at markers (dots) and AD+P are shown using −log10P-values (vertical axis). The higher the dots are, the stronger the genetic associations are. The strongest signals are seen on chromosome 4 and chromosome 9. AD, Alzheimer's disease; CNV, copy-number variant; P, psychosis.

Discussion

Here we report the first genome-wide CNV study of AD+P. We detected a genome-wide significant association between a duplication and AD+P. This hemizygous duplication (copy number of 3) covers the APC2 (adenomatous polyposis coli 2) gene on chr 19p13.3 and is protective against developing AD+P (odds ratio=0.42, P=7.2E−10). The frequency of this duplication was 10% in AD+P compared with 29% in AD subjects with intermediate P and 24% in AD−P. Previously, a large duplication in this region on chr 19 containing APC2 was reported at a frequency of 26% in healthy white males,36 which is similar to what we observed in subjects with intermediate P and in AD−P.

APC2, also called APCL, closely resembles APC in its overall domain structure and function.37 It spans a genomic region of over 20 kb and is expressed specifically and abundantly in the brain.38 The APC2 protein can bind to β-catenin and deplete the intracellular β-catenin pool,38 suggesting that APC2 may participate in the regulation of neuronal functions in association with β-catenin. The Wnt (wingless)/β-catenin signaling pathway is involved in a multitude of neuronal processes, including regulation of synaptogenesis, synapse specificity, axon guidance, dendrite development and overall brain development.39 The Wnt/β-catenin signaling pathway has also been implicated in the pathogenesis of several neurodevelopmental disorders, including SCZ, bipolar disorder and autism spectrum disorders;40 psychosis is a key feature of SCZ and is present in many cases of bipolar disorder. It has been demonstrated that altered levels and/or phosphorylation states of β-catenin and GSK-3 in the brain are related to SCZ and may represent susceptibility loci for SCZ.41, 42 Furthermore, it has been suggested that changes in β-catenin and GSK-3 may represent one of the mechanisms through which antipsychotics are able to exert behavioral changes.43 It is possible that an altered expression level of APC2 affects the development of psychosis through the Wnt/β-catenin signaling pathway.

Besides the interaction with β-catenin, animal models suggest that APC2 could have a role in AD. Kuczera et al. (2010)44 analyzed the effect of APC2 deletion on AD using a mouse model. They found that although loss of APC2 did not affect the course of pathology in the AD model, APC2 cKO mice displayed impaired spatial memory and severely impaired extinction of fear memories. They further proposed that amyloid-β pathology and loss of APC2 might affect memory function through a similar mechanism of action. It is unclear whether this mechanism is related to the critical role of APC2 in axonal projection and in cytoskeletal regulation at leading edges in response to extracellular signals.45, 46 It is interesting to note that APC2 is associated with susceptibility to SCZ.47 Accumulated evidence has shown that many common SNPs associated with SCZ are shared with bipolar disorder and autism, and recently some overlap between SCZ and psychosis in AD was also observed in a large-scale genome-wide association study.26 It is plausible that the genetic sharing could extend from SNPs to CNVs.

We also identified duplications affecting three other genes, including SET (chr 9), JAG2 (chr14) and ZFPM1 (chr16), which were tentatively associated with AD+P as they did not achieve genome-wide significance.

The protein encoded by SET, also named I2PP2A, is an endogenous inhibitor of protein phosphatase-2A (PP2A), which is a key protein phosphatase in dephosphorylating tau.48 SET is widely expressed in different tissues and localizes primarily in the nucleus.49 In AD brains, the compromised PP2A activity in the cytoplasm was shown to be caused by increased levels of SET and its translocation from the nucleus to the cytoplasm in neurons.50, 51, 52 Transfection in PC12 cells53 demonstrated that SET induces hyperphosphorylation of tau, suggesting that SET may be involved in AD through the regulation of tau hyperphosphorylation. There is substantial evidence54 that AD+P is associated with increased phosphorylated microtubule-associated protein tau in the dorsolateral prefrontal cortex. This is not necessarily inconsistent with our finding that the frequency of this duplication in AD+P (6.1%) was marginally lower (odds ratio=0.44, P=1.95E−06) than in AD intermediate P (16%) and AD−P (17.6%). Although full gene duplications are usually expected to increase gene expression or function, partial duplications may lead to gene products with new functions or to dysfunctional gene products that compete with and negatively regulate the normal gene products.

JAG2, located on chr 14q32, was first identified as the human brain expressed sequence tags homologous to Delta and Serrate.55 Delta and Serrate are ligands of Notch receptor in Drosophila, which can activate Notch and related receptors. Luo et al.55 found that JAG2 was involved in human Notch1 pathway of signal transduction. The Notch pathway is often regarded as a developmental pathway, but Notch proteins and ligands are also expressed and active in the adult brain. Notch is a key regulator of adult neural stem cells, and involved in the regulation of migration, morphology, synaptic plasticity and survival of immature and ageing neurons. Accumulated evidence shows that aberrant Notch signaling contributes to the development of AD.56 Notch may be implicated in familial AD through interaction with presenilin and APP in adult brain. Findings that Notch1 expression was markedly elevated in AD brains in comparison with controls raise the possibility that Notch1 has a role in terminally differentiated neurons and AD.57, 58, 59

ZFPM1 (Zinc finger protein, FOG family member 1) is expressed in human hematopoietic tissues as well as in adult cerebellum, stomach, testis, lymph node, liver and pancreas.60 It has a role in erythroid differentiation. The relationship between ZFPM1 and psychiatric disorders is unknown.

We did not identify deletions that were genome-wide significantly associated with AD+P. The most strongly associated deletions were located on chromosomes 4 and 9. The deletion on chromosome 4 did not cover any known gene, while the deletion on 9p22.2 was near CNTLN (also named C9orf39). The latter was an uncommon deletion that was present in 7 out of 440 AD+P patients, but in none of the 136 AD−P or 593 AD intermediate P subjects. CNTLN encodes a protein, called centlein, which was first identified in rat brain.61 An aphidicolin-inducible common fragile site, FRA9G, has been mapped to 9p22.2, within the C9orf39 gene.62 Fragile sites are regions of the genome that are prone to mutation and epigenetic changes; hence, they are hot spots for genomic instability. Common fragile sites are a component of normal chromosome structure and are composed of unstable DNA stretches that form gaps and breaks on metaphase chromosomes after partial inhibition of DNA synthesis. Increasing evidence links genomic and epigenomic instability, including multiple fragile-site regions, to neuropsychiatric diseases such as SCZ and autism.63 It has been observed that fragile sites are more frequent in SCZ and co-localize with SCZ-linked genes. Overlapping deletions in the general population have been previously demonstrated at hg18 genomic position 17 250 170–19 296 630, with a frequency of 10 in 6533 subjects,64 and at hg18 genomic position 17 226 376–17 367 629, with a frequency of 1 in 1557 controls.65 More detailed information on these deletions can be found in the database of genomic variants66 (http://projects.tcag.ca/variation/).

In conclusion, using the Illumina HumanOmni1-Quad BeadChip, we have identified a genome-wide significant duplication in the APC2 gene on chromosome 19 associated with AD+P (P=7.2E−10). We also found suggestive associations (P<5.0E−6) of three other duplication loci in SET, JAG2 and ZFPM1. None of the observed deletions were genome-wide significant. Our study has identified novel susceptibility genes for psychosis in AD and suggests the possibility of extending the genetic sharing among psychiatric disorders (for example, SCZ, autism and AD+P) from SNPs to CNVs. However, the relationship between these CNVs and the functional activity of the affected genes needs to be examined experimentally. The interpretation of this study is also limited by the lack of replication data; large-scale independent studies are warranted to replicate the findings reported here.