Introduction

Mental retardation (MR) or developmental delay is estimated to affect 2–3% of the population.1 However, in a significant proportion of cases, the etiology remains uncertain. Hunter2 reviewed 411 clinical cases of MR and reported that a specific genetic/syndrome diagnosis was made in 19.9% of them. Patients with MR often have congenital anomalies, and the presence of more than three minor anomalies can be useful in the diagnosis of syndromic MR.2, 3 Chromosomal aberrations are well-known causes of MR, and their frequency as determined by conventional karyotyping has been reported to range from 7.9 to 36% in patients with MR.4, 5, 6, 7, 8 Although the diagnostic yield depends on the study population and clinical conditions, such studies suggest that at least three quarters of patients with MR remain undiagnosed after assessment of clinical dysmorphic features and karyotyping.

In the past two decades, a number of rapidly developed cytogenetic and molecular approaches have been applied to the screening or diagnosis of various conditions, including MR, congenital anomalies and recurrent abortion, and to the study of cancer pathogenesis. Among them, array-based comparative genomic hybridization (aCGH) is used to detect copy-number changes rapidly, genome-wide and at high resolution. The target and resolution of aCGH depend on the type and/or design of the mounted probes, and many types of microarray have been used for the screening of patients with MR and other congenital disorders: bacterial artificial chromosome (BAC)-based arrays covering whole genomes,9, 10 BAC arrays covering chromosome X,11, 12 a BAC array covering all subtelomeric regions,13 oligonucleotide arrays covering whole genomes,14, 15 an oligonucleotide array for clinical diagnosis16 and a single nucleotide polymorphism array covering the whole genome.17 Because genome-wide aCGH has led to an appreciation of widespread copy-number variants (CNVs) not only in affected patients but also in healthy populations,18, 19, 20 clinical cytogeneticists need to discriminate between CNVs likely to be pathogenic (pathogenic CNVs, pCNVs) and CNVs less likely to be relevant to a patient's clinical phenotypes (benign CNVs, bCNVs).21 Higher-resolution microarrays detect more CNVs, each of which must be assessed, and this can cause more confusion in a clinical setting.

We have applied aCGH to the diagnosis and investigation of patients with multiple congenital anomalies and MR (MCA/MR) of unknown etiology. We constructed a consortium of 23 medical institutes and hospitals in Japan, and recruited 536 clinically uncharacterized patients with a normal karyotype in conventional cytogenetic tests. Two-stage screening of copy-number changes was performed using two types of BAC-based microarray: the first screening used a targeted array and the second used an array covering the whole genome. In this study, we diagnosed well-known genomic disorders effectively in the first screening, assessed the pathogenicity of detected CNVs to investigate etiology in the second screening and discuss the clinical significance of aCGH in the screening of congenital disorders.

Materials and methods

Subjects

We constructed a consortium of 23 medical institutes and hospitals in Japan, and recruited 536 Japanese patients with MCA/MR of unknown etiology from July 2005 to January 2010. All the patients were physically examined by an expert in medical genetics or a dysmorphologist. All showed a normal karyotype by conventional G-banding at a resolution of approximately 400–550 bands. Genomic DNA and metaphase chromosomes were prepared from peripheral blood lymphocytes using standard methods. Genomic DNA from lymphoblastoid cell lines of one healthy man and one healthy woman was used as a normal control for male and female cases, respectively. All samples were obtained with prior written informed consent from the parents and approval by the local ethics committee and all the institutions involved in this project. For subjects in whom a CNV was detected in the first or second screening, we analyzed as many of their parents as possible using aCGH or fluorescence in situ hybridization (FISH).

Array-CGH analysis

Among our recently constructed in-house BAC-based arrays,22 we used two arrays for this two-stage survey. In the first screening we applied a targeted array, ‘MCG Genome Disorder Array’ (GDA). Initially, GDA version 2, which contains 550 BACs corresponding to subtelomeric regions of all chromosomes except 13p, 14p, 15p, 21p and 22p and to causative regions of about 30 previously reported diseases, was applied to 396 cases; then GDA version 3, which contains 660 BACs corresponding to those of GDA version 2 plus pericentromeric regions of all chromosomes, was applied to 140 cases. This design means that a CNV detected by GDA is very likely to be relevant to the patient's phenotypes. Subsequently, in the second screening we applied the ‘MCG Whole Genome Array-4500’ (WGA-4500), which covers all 24 human chromosomes with 4523 BACs at intervals of approximately 0.7 Mb, to analyze subjects in whom no CNV was detected in the first screening. WGA-4500 contains no BACs spotted on GDA. If necessary, we also used the ‘MCG X-tiling array’ (X-array), containing 1001 BAC/PACs throughout the X chromosome other than the pseudoautosomal regions.12 The array-CGH analysis was performed as previously described.12, 23

For several subjects we applied an oligonucleotide array (Agilent Human Genome CGH Microarray 244K; Agilent Technologies, Santa Clara, CA, USA) to confirm the boundaries of CNV identified by our in-house BAC arrays. DNA labeling, hybridization and washing of the array were performed according to the directions provided by the manufacturer. The hybridized arrays were scanned using an Agilent scanner (G2565BA), and the CGH Analytics program version 3.4.40 (Agilent Technologies) was used to analyze copy-number alterations after data extraction, filtering and normalization by Feature Extraction software (Agilent Technologies).

Fluorescence in situ hybridization

Fluorescence in situ hybridization was performed as described elsewhere23 using BACs located around the region of interest as probes.

Results

CNVs detected in the first screening

In the first screening, of 536 cases subjected to our GDA analysis, 54 (10.1%) were determined to have a CNV (Figure 1; Tables 1 and 2). All the CNVs detected in the first screening were confirmed by FISH. In 24 of the positive cases, one CNV was detected, and all of these CNVs corresponded to well-established syndromes or already described disorders (Table 1). In 16 cases two CNVs, one deletion and one duplication, were detected at two subtelomeric regions, suggesting that one of the parents might carry a reciprocal translocation involving the corresponding subtelomeric regions; in each case at least one of the two CNVs corresponded to the disorders. We also performed parental analysis by FISH for the three cases whose parental samples were available, and confirmed that in two cases the subtelomeric aberrations were inherited from a paternal balanced translocation and in one case they were de novo (Table 1). In the other 14 cases (25.9%), CNVs were detected in regions corresponding to known disorders (Table 2).
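The breakdown of the first screening can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the paragraph above; the variable names are illustrative and are not part of the study.

```python
# Counts reported in the text for the first (GDA) screening.
TOTAL_CASES = 536
single_subtelomeric = 24   # cases with one CNV (Table 1)
paired_subtelomeric = 16   # cases with one deletion plus one duplication (Table 1)
other_known = 14           # cases with CNVs in other known-disorder regions (Table 2)

positive = single_subtelomeric + paired_subtelomeric + other_known


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


detection_rate = pct(positive, TOTAL_CASES)
other_share = pct(other_known, positive)
subtelomeric_share = pct(single_subtelomeric + paired_subtelomeric, positive)
print(positive, detection_rate, other_share, subtelomeric_share)
```

The three groups add up to the 54 positive cases, and the subtelomeric share works out to roughly three quarters of the positive cases.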

Figure 1 Percentages of each screening in the current study.

Table 1 A total of 40 cases with CNV at subtelomeric region(s) among 54 positive cases in the first screening
Table 2 Other cases among 54 positive cases in the first screening

CNVs detected in the second screening and assessment of the CNVs

Cases with no CNV detected in the first screening were subjected to the second screening in turn, and to date we have analyzed 349 of the 482 cases that were negative in the first screening. Beforehand, we excluded from the present results CNVs observed at high frequency in healthy individuals and/or in multiple patients showing disparate phenotypes, based on an internal database containing the results of all aCGH analyses we have performed using WGA-4500 and on other available online databases, for example, the Database of Genomic Variants (http://projects.tcag.ca/variation/). As a result, we detected 66 CNVs in 63 cases (Figure 1; Table 3). Among them, three patients (cases 36, 42 and 44) showed two CNVs. All the CNVs detected in the second screening were confirmed by other cytogenetic methods including FISH and/or X-array. For 60 cases, we performed FISH for confirmation and to determine the size of each CNV. For five cases (cases 13, 36, 48, 57 and 63) with CNVs on the X chromosome, we used the X-array instead of FISH. For cases 4, 6, 16–19 and 34, we also used the Agilent Human Genome CGH Microarray 244K to refine the sizes of the CNVs. The maximum and minimum sizes of each CNV determined by these analyses are described in Table 3.
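Excluding frequent CNVs before assessment amounts to an interval-overlap filter against a list of known common variants. The following is a minimal sketch of that idea, not the procedure used in the study: the text does not state an overlap criterion, so the 50% coverage threshold, the `Region` type, the function names and all coordinates are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # bp, inclusive
    end: int    # bp, exclusive


def overlap_fraction(query: Region, other: Region) -> float:
    """Fraction of `query` covered by `other` (0.0 on different chromosomes)."""
    if query.chrom != other.chrom:
        return 0.0
    shared = min(query.end, other.end) - max(query.start, other.start)
    return max(shared, 0) / (query.end - query.start)


def filter_common(candidates, common_variants, min_fraction=0.5):
    """Keep candidates not mostly covered by any known common variant.

    min_fraction is an assumed threshold, not one given in the text.
    """
    return [c for c in candidates
            if all(overlap_fraction(c, v) < min_fraction for v in common_variants)]


# Made-up coordinates, for illustration only.
common = [Region("chr1", 1_000_000, 1_500_000)]
calls = [
    Region("chr1", 1_100_000, 1_400_000),  # lies fully inside a common variant
    Region("chr1", 1_400_000, 3_000_000),  # mostly outside the common variant
    Region("chr2", 1_000_000, 1_500_000),  # different chromosome
]
kept = filter_common(calls, common)
print(kept)
```

In practice the list of common variants would be drawn from an internal database or a public resource such as the Database of Genomic Variants, and the threshold would need to be tuned to the array's resolution.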

Table 3 Sixty-three cases with CNV in the 2nd screening

Well-documented pCNVs emerged in the second screening

CNVs identified for recently established syndromes

We assessed the pathogenicity of the detected CNVs from several aspects (Figure 2).21, 37, 38 First, in nine cases, we identified well-documented pCNVs responsible for recently established syndromes. A heterozygous deletion at 1q41–q42.11 in case 2 was identical to that of the patients in the first report of 1q41q42 microdeletion syndrome.39 Likewise, the CNV in case 3 was identical to that of chromosome 1q43–q44 deletion syndrome (OMIM: #612337),40 the CNV in case 4 to that of 2q23.1 microdeletion syndrome,41 the CNV in case 5 to that of 14q12 microdeletion syndrome42 and the CNV in case 6 to that of chromosome 15q26-qter deletion syndrome (Drayer's syndrome) (OMIM: #612626).43 Cases 7, 8 and 9 involved CNVs of different sizes at 16p12.1-p11.2, the region responsible for 16p11.2-p12.2 microdeletion syndrome.44, 45 Although an interstitial deletion at 1p36.23-p36.22 observed in case 1 partially overlapped with a causative region of chromosome 1p36 deletion syndrome (OMIM: #607872), the deleted region was identical to a recently reported proximal interstitial 1p36 deletion.46 Because patients with the proximal 1p36 deletion, including case 1, showed clinical characteristics different from those of typical chromosome 1p36 deletion syndrome, their clinical features should be redefined as an independent syndrome in the near future.46

Figure 2 A flowchart of the assessment of CNVs detected in the second screening.

CNVs containing pathogenic gene(s)

In four cases we identified pCNVs that contained one or more genes probably responsible for the phenotypes. In case 10, the CNV was a deletion harboring GLI3 (OMIM: *165240), which accounts for Greig cephalopolysyndactyly syndrome (GCS; OMIM: 175700).47 Although some phenotypes of the patient, for example, pre-axial polydactyly of the hands and feet, were consistent with GCS, his severe features that are atypical of GCS, for example, MR or microcephaly, might be caused by other contiguous genes contained in the deletion.48 Heterozygous deletions of BMP4 (OMIM: *112262), found in case 11, and CASK (OMIM: *300172), found in case 13, have been reported previously.49, 50 In case 12, the CNV contained YWHAE (OMIM: *605066), whose haploinsufficiency may be involved in the MR and mild CNS dysmorphology of the patient: a previous report demonstrated that haploinsufficiency of Ywhae caused a defect of neuronal migration in mice,51 and a recent report described a microdeletion of YWHAE in a patient with brain malformation.52

Recurrent CNVs in the same regions

We also considered recurrent CNVs in the same region as pathogenic; three pairs of patients had overlapping CNVs that have not been reported previously. Case 16 had a 3.3-Mb heterozygous deletion at 10q24.31–q25.1 and case 17 had a 2.0-Mb deletion at 10q24.32–q25.1. The clinical and genetic information will be reported elsewhere. Likewise, cases 14 and 15 had overlapping CNVs at 6q12–q14.1 and 6q14.1, and cases 18 and 19 had overlapping CNVs at 10p12.1–p11.23. Additional cases with these recurrent CNVs would help to define new syndromes.

CNVs reported as pathogenic in previous studies

Five cases met this criterion. A deletion at 3p21.2 in case 20 overlapped with that in one recently reported case.53 The following four cases had CNVs reported as pathogenic in recent studies: a CNV at 7p22.1 in case 21 overlapped with that of patient 6545 in a study by Friedman et al.,14 a CNV at 14q11.2 in case 22 overlapped with those of patients 8326 and 5566 in Friedman et al.,14 a CNV at 17q24.1–q24.2 in case 23 overlapped with that in patient 99 in Buysse et al.54 and a CNV at 19p13.2 in case 24 overlapped with case P11 in Fan et al.55

Large or gene-rich CNVs, or CNVs containing morbid OMIM genes

For cases that met none of the above criteria, we assessed the CNVs from several aspects. A CNV that contains abundant genes or is large (>3 Mb) is more likely to be pathogenic.21 The CNVs in cases 25–30 probably correspond to such CNVs. We also judged a CNV containing a morbid OMIM gene as pathogenic:21 TBR1 (OMIM: *604616) in case 31,56 SUMF1 (OMIM: *607939) in case 32,57, 58 SEMA3A (OMIM: *603961) in case 33,59 EML1 (OMIM: *602033) and/or YY1 (OMIM: *600013) in case 34,60, 61 A2BP1 (OMIM: *605104) in case 3562 and IL1RAPL1 (OMIM: *300206) in case 36.63 Several previous reports suggest that these genes are likely to be pathogenic, although at present there is no evidence of a direct association between these genes and the phenotypes.

CNVs de novo or X maternally inherited

Among the remaining 27 cases, 12 had CNVs considered pathogenic because the CNVs were de novo (cases 37–47) or, in one case, a del(X)(p11.3) inherited from the mother (case 48). In the second screening we performed FISH for 36 CNVs of the 34 cases whose parental samples were available, and confirmed that 24 cases had de novo CNVs, which were probably pathogenic. The CNV in case 48, a boy with a nullizygous deletion at Xp11.3 inherited from his mother, was also probably relevant to his phenotype (Tables 3 and 4). In contrast, although case 57 was a boy with a deletion at Xp11.23 inherited from his mother, he was clinically diagnosed with Gillespie syndrome (OMIM: #206700), which has been reported to show an autosomal dominant or recessive pattern;64 we therefore judged that the deletion was not relevant to his phenotype. As a result, cases 49–57 had only CNVs inherited from one of their parents, which are likely to be unrelated to the phenotypes; that is, bCNVs (Table 4).

Table 4 Parental analysis of 34 cases in the second screening

As a result, we estimated that 48 of the 349 cases analyzed (13.8%) had pCNV(s) in the second screening (Table 3; Figure 2). The CNVs of the remaining six cases, cases 58–63, were not associated with previously reported pathogenicity and their inheritance could not be evaluated; we therefore classified them as variants of uncertain clinical significance (VOUS).38
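The assessment described in this section can be read as an ordered series of checks. The sketch below encodes that order as it is described in the text; it is an illustration rather than a reproduction of Figure 2, and the `CnvEvidence` fields, the inheritance labels and the `classify` function are hypothetical names introduced here. Only the 3-Mb size cut-off comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CnvEvidence:
    # All field names are illustrative; they are not taken from the study.
    matches_recent_syndrome: bool = False
    contains_causative_gene: bool = False
    recurrent_in_cohort: bool = False
    reported_pathogenic: bool = False
    size_mb: float = 0.0
    gene_rich: bool = False
    contains_morbid_omim_gene: bool = False
    # "de_novo", "inherited", "x_maternal_male", or None if parents were not tested
    inheritance: Optional[str] = None


LARGE_CNV_MB = 3.0  # the text treats CNVs larger than 3 Mb as large


def classify(ev: CnvEvidence) -> str:
    """Return 'pCNV', 'bCNV' or 'VOUS', checking criteria in the order described."""
    if (ev.matches_recent_syndrome or ev.contains_causative_gene
            or ev.recurrent_in_cohort or ev.reported_pathogenic):
        return "pCNV"
    if ev.size_mb > LARGE_CNV_MB or ev.gene_rich or ev.contains_morbid_omim_gene:
        return "pCNV"
    if ev.inheritance in ("de_novo", "x_maternal_male"):
        return "pCNV"
    if ev.inheritance == "inherited":
        return "bCNV"
    return "VOUS"


print(classify(CnvEvidence(size_mb=3.3)))              # large CNV
print(classify(CnvEvidence(inheritance="inherited")))  # inherited, no other evidence
print(classify(CnvEvidence()))                         # no evidence, parents untested
```

A rule-based pass of this kind cannot replace clinical judgment: case 57, for example, carried a maternally inherited X-linked deletion but was judged benign because his clinical diagnosis pointed to an unrelated cause.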

Discussion

Because aCGH is a high-throughput technique to detect CNVs rapidly and comprehensively, this technique has been commonly used for analyses of patients with MCA and/or MR.38, 65, 66, 67, 68 However, recent studies of human genomic variation have uncovered surprising properties of CNVs, which cover 3.5–12% of the human genome even in healthy populations.18, 19, 20, 69 Thus, in analyses of patients with uncertain clinical phenotypes, it is necessary to assess whether each CNV is pathogenic or unrelated to the phenotypes.21 However, such an assessment may diminish the speed and convenience of aCGH.

In this study, we evaluated whether our in-house GDA can work well as a diagnostic tool to detect CNVs responsible for well-established syndromes or those involved in subtelomeric aberrations in a clinical setting, and then explored candidate pCNVs in cases without any CNV in the first GDA screening. We recruited 536 cases that had been undiagnosed clinically and studied them in a two-stage screening using aCGH. In the first screening we detected CNVs in 54 cases (10.1%). Among them, 40 cases had CNV(s) at subtelomeric region(s) corresponding to well-established syndromes or already described disorders, and the other 14 cases had CNVs in regions corresponding to known disorders. Thus, about three quarters of the positive cases had genomic aberrations involving subtelomeric regions. All the subtelomeric deletions and some of the subtelomeric duplications corresponded to the disorders, indicating that subtelomeric deletions in particular had more clinical significance than subtelomeric duplications, although a duplication might result in milder phenotypes and/or function as a modifier of phenotypes.70 Moreover, parental analysis in three cases with two subtelomeric aberrations revealed that two of them were derived from parental balanced translocations, indicating that such subtelomeric aberrations were potentially recurrent and that parental analyses were worth performing. Recently, several similar studies analyzed patients with MCA/MR or developmental delay using a targeted array for subtelomeric regions and/or known genomic disorders and detected clinically relevant CNVs in 4.4–17.1% of the patients.28, 65, 70, 71 Our detection rate in the first screening fell within the range of these reports. Although such detection rates depend on the type of microarray, patient selection criteria and/or number of subjects, these results suggest that around 10% of cases with undiagnosed MCA/MR and a normal karyotype would be detectable by a targeted array.

Another interesting observation in the first screening was that subtelomeric rearrangements frequently occurred even in patients with MCA/MR of uncertain etiology whose karyotype had been diagnosed as normal. This result is consistent with the fact that rearrangements of subtelomeric regions can be missed in conventional karyotyping,72 and in fact other techniques such as subtelomeric FISH or multiplex ligation-dependent probe amplification (MLPA) also identified subtelomeric abnormalities in a number of patients with MCA and/or MR in previous reports.70, 73, 74 Our result may support the usefulness of prompt screening of subtelomeric regions for cases with congenital disorders of uncertain etiology.

In the second screening we applied WGA-4500 to 349 cases and detected 66 candidate pCNVs in 63 cases (18.1%), and subsequently assessed the pathogenicity of these CNVs. The pCNVs included nine CNVs overlapping identical regions of recently recognized syndromes (deletion at 1p36.23–p36.22, 1q41–q42.11, 1q43–q44, 2q23.1, 14q12 and 15q26-qter in cases 1–6, respectively, and at 16p11.2–p12.2 in cases 7–9), four CNVs containing disease-associated genes (cases 10–13; GLI3, BMP4, YWHAE and CASK, respectively), three pairs of recurrent deletions (cases 14 and 15: at 6q12–q14.1 and 6q14.1; cases 16 and 17: at 10p12.1–p11.23; and cases 18 and 19: at 10q24.31–q25.1 and 10q24.32–q25.1), five CNVs identical to pCNVs in previous studies (cases 20–24), six large and/or gene-rich CNVs (cases 25–30) and six CNVs containing a morbid OMIM gene (cases 31–36). For the remaining cases, we estimated the pathogenicity of the CNVs from a parental analysis (Table 4). We judged the 11 de novo CNVs (cases 37–47) and 1 CNV on chromosome Xp11.3 inherited from the mother (case 48) as probably pathogenic. Nine inherited CNVs (cases 49–57) were judged probably benign. The clinical significance of the CNVs in the other six cases, cases 58–63, remains uncertain (VOUS). As a result, we estimated CNVs as pathogenic in 48 of the 349 cases (13.8%) analyzed in the second screening. None of the pCNVs corresponded to loci of well-established syndromes. This may suggest that our two-stage screening achieved a good balance between rapid screening of known syndromes and investigation of CNVs of uncertain pathogenicity.
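The category counts in this summary can be tallied to confirm that they account for all 63 cases with a CNV. The sketch below uses only the case counts given in the paragraph above; the dictionary keys and variable names are shorthand labels chosen for illustration.

```python
# Cases per category as listed in the text (case ranges in comments).
pathogenic_by_evidence = {
    "recently recognized syndrome": 9,   # cases 1-9
    "disease-associated gene": 4,        # cases 10-13
    "recurrent in cohort": 6,            # cases 14-19 (three pairs)
    "reported in previous studies": 5,   # cases 20-24
    "large and/or gene-rich": 6,         # cases 25-30
    "morbid OMIM gene": 6,               # cases 31-36
}
pathogenic_by_inheritance = 12  # cases 37-48 (de novo or maternally inherited X)
benign = 9                      # cases 49-57
uncertain = 6                   # cases 58-63 (VOUS)
analyzed = 349

pathogenic = sum(pathogenic_by_evidence.values()) + pathogenic_by_inheritance
cases_with_cnv = pathogenic + benign + uncertain
pathogenic_yield = round(100 * pathogenic / analyzed, 1)
cnv_rate = round(100 * cases_with_cnv / analyzed, 1)
print(pathogenic, cases_with_cnv, pathogenic_yield, cnv_rate)
```

The tally gives 48 cases with pCNVs out of 63 cases with any CNV, matching the percentages quoted for the second screening.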

Among the cases with parental analyses, the 25 pCNVs were larger and contained more protein-coding genes (average size, 3.1 Mb at minimum to 4.4 Mb at maximum; average number of genes, 44) than the 11 inherited bCNVs that were probably unrelated to phenotypes (average size, 0.39 Mb at minimum to 1.5 Mb at maximum; average number of genes, 5) (Table 5). Whereas all but 2 of the 25 pCNVs were deletions, about three quarters (8 of 11) of the inherited bCNVs were duplications (Table 5). These findings are consistent with previously reported features of pCNVs and bCNVs.21, 38

Table 5 Summary of parental analyses

We also compared our current study with recent aCGH studies meeting the following conditions: (1) a microarray targeting the whole genome was applied; (2) the patients had MCA and/or MR of uncertain etiology and a normal karyotype, and the criteria for patient selection were clearly described; (3) the pathogenicity of identified CNVs was assessed. On the basis of these criteria, we summarized 13 studies reported in the past 5 years (Table 6).10, 14, 15, 17, 54, 55, 75, 76, 77, 78, 79, 80, 81 The diagnostic yield of pCNVs in these studies was 6.3–16.4%, and the diagnostic yield of our second screening was 13.8%. Although cases with subtelomeric aberrations detected in the first screening had been excluded, our diagnostic yield was comparable to those of the reported studies. A simple comparison of diagnostic yields between studies is of limited value because yields depend on the conditions of each study, for example, sample size or array resolution;38, 82 nevertheless, it is interesting that a higher-resolution microarray does not ensure a higher rate of detection of pCNVs. One recent study showed data that may explain the discrepancy between microarray resolution and diagnostic yield.54, 83 The authors analyzed 1001 patients with MCA and/or MR using one of two types of microarray, a BAC array or an oligonucleotide array. The BAC array was applied to 298 patients and detected 58 CNVs in 47 patients, among which 26 CNVs (8.7%) were determined to be causal (pathogenic). The oligonucleotide arrays were applied to 703 patients and detected 1538 CNVs in 603 patients, among which 74 CNVs (10.5%) were determined to be pathogenic.
These results suggest the following interpretation: a lower-resolution microarray detects a limited number of CNVs that are likely to be pathogenic, because such CNVs tend to be large, whereas a higher-resolution microarray detects an increasing number of bCNVs or VOUS.38 Indeed, in studies using a high-resolution microarray, most of the CNVs detected were smaller than 500 kb but almost all pCNVs were relatively large.54, 81, 83 Most of the small CNVs were judged not to be pathogenic, and the percentage of pCNVs stabilized at around 10%. This percentage may indicate the frequency of patients with MCA/MR caused by a CNV affecting one or more genes, other than known syndromes and subtelomeric aberrations. The other patients may be affected by causes undetectable by genomic microarray, for example, a point mutation or microdeletion/duplication of a single gene, aberration of microRNA, aberration of methylation states, epigenetic aberration or partial uniparental disomy.
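The contrast between the two array types in the cited study becomes clearer when the quoted counts are turned into ratios. The sketch below derives them from the figures given above; the quoted 8.7% and 10.5% match the number of pathogenic CNVs divided by the number of patients analyzed, which is the interpretation assumed here. The dictionary layout and key names are illustrative.

```python
# Figures quoted in the text for the two array types in the cited study.
arrays = {
    "BAC": {"patients": 298, "cnvs": 58, "patients_with_cnv": 47, "pathogenic": 26},
    "oligonucleotide": {"patients": 703, "cnvs": 1538, "patients_with_cnv": 603, "pathogenic": 74},
}


def summarize(a):
    """Derive per-array ratios (in percent, one decimal place) from the raw counts."""
    return {
        # Pathogenic CNVs per 100 patients analyzed (the percentage quoted in the text).
        "pathogenic_per_100_patients": round(100 * a["pathogenic"] / a["patients"], 1),
        # Share of all detected CNVs that were judged pathogenic.
        "pathogenic_share_of_cnvs": round(100 * a["pathogenic"] / a["cnvs"], 1),
        # Share of patients in whom at least one CNV was detected.
        "patients_with_cnv_pct": round(100 * a["patients_with_cnv"] / a["patients"], 1),
    }


summary = {name: summarize(a) for name, a in arrays.items()}
for name, s in summary.items():
    print(name, s)
```

On these numbers the oligonucleotide arrays flag a CNV in most patients, yet only a small share of the detected CNVs is judged pathogenic, while the yield per patient stays close to that of the BAC array.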

Table 6 Previous studies analyzing patients with MCA and/or MR using aCGH targeting the whole genome

As recently hypothesized, a secondary insult, which is potentially another CNV, a mutation in a phenotypically related gene or an environmental event influencing the phenotype, may result in clinical manifestation.84 In particular, for two-hit CNVs, two models have been hypothesized: (1) the additive model of two co-occurring CNVs affecting independent functional modules and (2) the epistatic model of two CNVs affecting the same functional module.85 This also suggests the difficulty of selecting an optimal platform for clinical screening. Nevertheless, information on both pCNVs and bCNVs detected through studies using several types of microarray is clearly valuable, because the accumulation of such CNVs will create a map of genotype–phenotype correlations that would determine the clinical significance of each CNV, illuminate gene function or establish new syndromes.