Main

The extent to which copy-number variants (CNVs) might contribute to the missing heritability of common disorders is currently under debate2. Because most common simple CNVs are well tagged by single nucleotide polymorphisms (SNPs), it has recently been suggested that common CNVs are unlikely to contribute substantially to the missing heritability5. However, rare variants or recurring CNVs that have arisen on multiple independent occasions are unlikely to be captured by SNP tagging, and their identification will require alternative approaches.

We have previously proposed that cohorts with extreme phenotypes that include obesity may be enriched for rare but very potent risk variants4,6. Here we investigate 312 subjects, from three centres in the UK and France, presenting with congenital malformations and/or developmental delay in addition to obesity as defined previously6,7 (see Methods). Known syndromes (for example, Prader–Willi and fragile X) were excluded. A combination of array comparative genomic hybridization (aCGH), genotyping arrays, quantitative PCR (qPCR) and multiplex ligation-dependent probe amplification (MLPA) was used to identify and confirm the presence of a heterozygous deletion on 16p11.2 in nine individuals (2.9%). These deletions, estimated to be a total of 740 kilobases (kb) in size (one copy of a segmental duplication plus 593 kb of unique sequences; Fig. 1a), have previously been associated to varying extents with autism, schizophrenia and developmental delay8,9,10,11; however, the observed frequency of deletions in our cohort is appreciably higher than the reported frequencies in the cohorts from the previous studies (less than 1%), which did not include obesity as an inclusion criterion.

Figure 1: Identification and validation of deletions at 16p11.2.

a, aCGH data showing the location of the 16p11.2 deletion. The data show the log2 intensity ratio for a deletion carrier compared with that for an undeleted control sample. Grey bars connected by a broken line denote the segmental duplication flanking the deletion region. Vertical bars indicate the positions of the probe pairs used for MLPA validation. Note that CGH and genotyping array probes targeted against segmental duplications may not accurately report copy number as a result of the increased number of homologous sequences in the diploid state. Genome coordinates are in accordance with the hg18 build of the reference genome. b, MLPA validation of 16p11.2 deletions. Representative MLPA results are shown, illustrating one instance of maternal transmission and two instances of de novo deletions. Genotyping data excluded the possibility of non-paternity. Full results for MLPA validation and inheritance analysis are shown in Supplementary Fig. 1. Each panel shows the relative magnitude of the normalized, integrated signal at each probe location, in order of chromosomal position of the MLPA probe pairs as indicated in a. Each panel corresponds to its respective position on the associated pedigree, as shown.

A parallel, independent survey of aCGH and SNP-CGH data from eight cytogenetic centres in France, Switzerland and Estonia, involving 3,947 patients with developmental delay and/or malformations but this time without selection for obesity, revealed 22 unrelated cases with similar deletions (0.6%). This is a frequency consistent with those found in the previous studies8,9,10,11, but is significantly lower than for the above cohort, which included only obese subjects (P = 2.2 × 10⁻⁴, Fisher’s exact test).
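The comparison of the two cohort frequencies can be illustrated with a minimal sketch of a two-sided Fisher's exact test. The counts (9 of 312 and 22 of 3,947) are taken from the text; the function name and the convention of summing all tables no more probable than the observed one are assumptions made for illustration, and the exact procedure used for the published value is not stated here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers falling in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed table
    # (a small relative tolerance guards against floating-point ties).
    total = sum(p for p in map(prob, range(lowest, highest + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Rows: cohort selected for obesity, cohort not selected for obesity.
# Columns: deletion carriers, non-carriers.
p_value = fisher_exact_two_sided(9, 312 - 9, 22, 3947 - 22)
print(f"P = {p_value:.2g}")
```

Exact integer arithmetic via `math.comb` avoids any dependence on external libraries, at the cost of speed for very large tables.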

Analysis of the available clinical data for these 22 new carriers indicated that, in addition to the ascertained cognitive deficits or behavioural abnormalities (including hyperphagia, specifically identified in at least nine cases; see Supplementary Table 1), a 16p11.2 deletion gave rise to a strongly expressed obesity phenotype in adults, with a more variable phenotype in childhood. All four teenagers and adults carrying a deletion were obese, whereas child carriers were also frequently either obese (4 of 15) or overweight (2 of 15), a tendency that has previously been noted11; the very young (under 2 years old) were of normal weight. This age-dependent penetrance was observed in all instances of deletions for which phenotypic data were available, whether from this study or from previously published reports10,11,12,13,14,15, and regardless of ascertainment (Fig. 2; see Supplementary Tables 2 and 3).

Figure 2: Dependence of BMI on age in subjects having a deletion at 16p11.2.

Data are shown for all individuals carrying a deletion for whom phenotypic data were available. Similar data from this study only are shown in Supplementary Figs 2 and 3. Lines denote the thresholds corrected for age and gender (solid, male; broken, female) for obesity and morbid obesity. Squares, male; circles, female; black, ascertained for developmental delay; grey, not ascertained for developmental delay; filled, ascertained for obesity; open, not ascertained for obesity; diamonds, first-degree relative of proband; crosses, previously published data10,11,12,13,14,15. The 31-year-old male with a BMI of about 20 kg m⁻² was diabetic, as determined by a fasting blood glucose of more than 7 mM.

Taken together, the data from these parallel studies suggest a possible direct association of deletions at 16p11.2 with obesity, distinct from their cognitive phenotype. Also identified in these cohorts were instances of the reciprocal duplication, which has also been implicated in neurodevelopmental disorders, but with a variable phenotype and lower penetrance9,10,12. The frequency of the duplication in the two cohorts (12 of 4,259 (0.3%)) was consistent with previous reports for patients with cognitive deficits (0.3–0.7%)10,12. Carriers of the duplication neither were obese nor had reported hyperphagia.

To investigate further the association of 16p11.2 deletions with obesity, and to estimate the extent to which it is observed independently of ascertainment for neurodevelopmental symptoms, we performed algorithmic and statistical analyses of genome-wide SNP genotyping data (see Table 1) from Swiss (CoLaus16), Finnish (NFBC1966 (ref. 17)) and Estonian (EGPUT18) general population cohorts (11,856 subjects in total), from child obesity and adult morbid obesity case-control cohorts6,19,20 (1,224 and 1,548 subjects, respectively), from an extreme early-onset obesity cohort (SCOOP, 931 subjects) and from 141 patients undergoing bariatric weight-loss surgery (see Methods); in total, we identified 17 instances of deletions (and four duplications) with no significant gender bias (Table 1). In addition, we identified two further unrelated carriers of a deletion from 353 members of 149 families with sibling pairs discordant for obesity (SOS Sib Pair Study21). When DNA was available for further analysis (15 of 19 samples), the presence of a deletion was validated by using MLPA (Fig. 1b) or qPCR; the remaining deletions were validated by applying a second independent algorithm to the data. With the exception of a single individual who is apparently diabetic (fasting blood glucose more than 7 mM), all adult carriers of such deletions were obese, the majority being morbidly obese; similarly, each of the seven child or adolescent carriers had a BMI in the top 0.1% of the population range for their age and gender. None of the individuals ascertained on the basis of their obesity had any reported developmental delay or cognitive deficit; four subjects were reported as having hyperphagia.

Table 1 Frequency of detected 16p11.2 deletions in multiple cohorts

To enable sufficient statistical power to give robust conclusions, we combined data from the population and obesity cohorts in an overall case-control association analysis (the samples from sib-pair families were excluded to avoid complications due to their relatedness). In comparison with lean or normal weight subjects (see Table 1 and Methods), 16p11.2 deletions were associated with obesity (P = 5.8 × 10⁻⁷, Fisher’s exact test; odds ratio 29.8, 95% confidence limits 3.9 and 225) and morbid obesity (P = 6.4 × 10⁻⁸; odds ratio 43.0, 95% confidence limits 5.6 and 329) at or near genome-wide levels of significance. Expanding the control group to include all non-obese individuals increased the significance to P = 4.2 × 10⁻⁹ (obese) and P = 6.1 × 10⁻¹⁰ (morbidly obese).

Previous reports have indicated that these deletions are frequently not inherited from either parent but arise de novo, possibly by non-allelic homologous recombination between the more than 99% sequence-identical segmental duplications flanking the deleted region11,14. Therefore, where possible we investigated the parents of carriers of deletions, identifying 11 cases of maternal transmission and 4 of paternal transmission. The available data showed that all first-degree relatives carrying a deletion were also obese (Supplementary Table 1). In ten instances the deletion was apparently de novo (see Fig. 1b). Extrapolation to our full data set indicates that about 0.4% of all morbidly obese cases are due to an inherited 16p11.2 deletion. The frequency of de novo events is consistent with a previous report, in which ascertainment was for developmental delay and/or congenital anomalies11; by contrast, deletions are reported to be almost exclusively de novo in autistic subjects8,9,10.

Although they may be heterogeneous in nature, these deletions are highly likely to be the causal variants, representing the second most frequent genetic cause of obesity after point mutations in MC4R22,23. Their repeated de novo occurrence is likely to result in a lack of linkage disequilibrium with any other flanking variant—no consistent haplotype has been identified by analysis of the available surrounding genotypes. To assess the effect of a deletion on the expression of nearby genes (for example, the obesity GWAS-associated SH2B1 locus 800 kb distant24), we analysed available transcript data for subcutaneous adipose tissue samples from the discordant sibling cohort. Comparisons of the two subjects carrying a deletion with their corresponding non-obese siblings, and with other obese and non-obese subjects (Supplementary Fig. 4 and Supplementary Tables 4 and 5), showed that many, although not all, transcripts from within the deletion had a markedly decreased abundance (0.4–0.7-fold). In contrast, no clear evidence was found for consistent cis effects of the deletion on the abundance of messenger RNAs encoded by genes flanking the deletion. In addition, global analysis of this data set has not identified any trans-acting expression quantitative trait loci either within or nearby the deletion.

Thus, although we cannot completely exclude the possibility that a 16p11.2 deletion affects the expression of nearby genes (for instance, its impact may be different in other tissues), the expression analysis described strongly indicates that the observed phenotypes are likely to be due to haploinsufficiency of one or more of the about 30 genes within the deleted region. Indeed, rather than being due to a single haploinsufficiency, the phenotype may well result from the deletion of multiple genes with an impact on pathways central to the development of obesity (see Supplementary Table 5). Functional network analysis of the deleted genes has led to the suggestion of a similar multigene effect for the cognitive phenotype8. The extent to which there is overlap between the genes involved in the obesity and cognitive phenotypes remains to be elucidated.

There is a strong correlation between developmental and cognitive disabilities and the prevalence of obesity: patients with autism or who have learning disabilities have a greatly increased risk of obesity25, and the severely obese exhibit significant cognitive impairment26. Possible explanations include a direct causal relationship between obesity and developmental delay, the involvement of the same or related regulatory pathways, or different outcomes of the same set of behavioural disorders with complex pleiotropic effects and variable ages of onset and expressivities. The higher frequency of 16p11.2 deletions in the cohort ascertained for both phenotypes (2.9%), compared with cohorts ascertained for either phenotype alone (0.4% and 0.6%, respectively), confirms their impact on both obesity and developmental delay, adding to the evidence that these two phenotypes may be fundamentally interrelated.

Methods Summary

Obesity

Definitions for overweight, obesity and morbid obesity were based on previous studies6,7: for adults, BMI ≥ 25, 30 and 40 kg m⁻² respectively; for children, BMI respectively above the 90th, 97th centiles and at least four standard deviations above the mean, calculated according to their age and gender from a French reference population27,28.

Statistics

All reported statistical tests used Fisher’s exact test29, performed on contingency tables constructed for the number of subjects carrying or lacking a 16p11.2 deletion versus the obesity status or ascertainment of the individual. Because no homozygous deletions were observed, it was unnecessary to make a prior distinction between recessive, additive and dominant models of disease risk. Odds ratios and 95% confidence limits were calculated as described30.

CNV discovery

Subjects ascertained for cognitive deficit/malformations with or without obesity were selected from those clinically referred for genetic testing; 16p11.2 deletions were identified in these individuals by standard clinical diagnostic procedures. Algorithmic analyses of GWAS data were performed variously using the cnvHap algorithm, a moving-window average-intensity procedure, a Gaussian mixture model, QuantiSNP, PennCNV, BeadStudio GT module, and Birdseed. When experimental validation was not possible, at least two independent algorithms were used for each data set.

Online Methods

Obesity phenotype

We used previously defined criteria to define overweight, obesity, and morbid (class III) obesity6,7. In adults, the thresholds were BMI ≥ 25, 30 and 40 kg m⁻², respectively. In children and adolescents, we used age-specific and sex-specific centiles of BMI, calculated from a French reference population27,28, that approximately corresponded to these thresholds: overweight and obesity were defined by thresholds at the 90th and 97th centiles, respectively. Childhood morbid obesity was defined as BMI ≥ 4 standard deviations above the age-specific and sex-specific mean, which corresponds to a BMI of 40 kg m⁻² between the ages of 20 and 30 years for both men and women; this threshold was used in the recruitment of the SCOOP severe early-onset obesity cohorts7. The age-specific and sex-specific thresholds used to define obesity and morbid obesity are shown in Fig. 2 and Supplementary Figs 2 and 3. No carriers of a 16p11.2 deletion were reported to be taking atypical antipsychotics (known to be associated with weight gain).
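The adult thresholds and the childhood standard-deviation criterion can be expressed as a short sketch. The function names are hypothetical, and the childhood reference mean and standard deviation must be supplied by the caller from an age-specific and sex-specific reference table, which is not reproduced here; treating the criterion as a simple z-score is an assumption.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg per square metre."""
    return weight_kg / height_m ** 2


def classify_adult_bmi(value):
    """Classify an adult BMI using the thresholds of 25, 30 and 40."""
    if value >= 40:
        return "morbidly obese"
    if value >= 30:
        return "obese"
    if value >= 25:
        return "overweight"
    return "not overweight"


def is_morbidly_obese_child(value, ref_mean, ref_sd):
    """True if BMI is at least 4 SD above the supplied reference mean.

    ref_mean and ref_sd are hypothetical inputs standing in for the
    age-specific and sex-specific reference values.
    """
    return (value - ref_mean) / ref_sd >= 4


print(classify_adult_bmi(bmi(120.0, 1.70)))
```

The categories are nested in the text (a morbidly obese adult is also obese); the sketch returns only the most severe label that applies.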

Patient and population cohorts

Patients referred for cognitive delay and obesity: a group of 33 patients was selected from those referred for genetic testing at the North West Thames Regional Genetics Service, based at Northwick Park Hospital in Harrow, UK, with approval from the Harrow Research Ethics Committee. Inclusion was based on three criteria: mental retardation, dysmorphology, and a weight greater than the 97th centile for age and gender. Abnormal karyotype, fragile X and Prader–Willi syndrome had previously been excluded.

A second group of 279 French children were selected from those referred to two centres (Laboratoire de Diagnostic Génétique, Nouvel Hôpital Civil, Strasbourg, France, and Centre de Génétique Chromosomique, Hôpital Saint-Vincent de Paul, GHICL, Lille, France). Inclusion was based on obesity plus at least one Prader–Willi-like syndromic feature (neonatal hypotonia and difficulty to thrive, mental retardation, developmental delay, behavioural problems, skin picking, facial dysmorphism, hypogenitalism or hypogonadism). Chromosomal abnormalities and Prader–Willi syndrome were excluded by karyotyping and DNA methylation analysis.

Patients referred for cognitive delay: patients with cognitive deficits are routinely referred to clinical genetics for aetiological work-ups including aCGH. We surveyed seven cytogenetic centres in France and Switzerland, identifying 3,870 patients ascertained for developmental delay and/or malformations. Also included in the study were a further 77 patients, ascertained on similar criteria, who were referred to the Department of Genetics, University of Tartu, Tartu, Estonia. These analyses were performed for clinical diagnostic purposes, all available phenotypic data (weight and height) being those provided anonymously by the clinician ordering the analysis. Consequently, research-based informed consent was not required by the institutional review board that approved the study.

CoLaus: this prospective population cohort was described previously16; 6,188 white individuals aged 35–75 years were randomly selected from the general population in Lausanne, Switzerland. These individuals underwent a detailed phenotypic assessment and were genotyped with the Affymetrix Mapping 500K array; 5,612 samples passed genotyping quality control. This study was approved by the institutional review boards of the University of Lausanne, and written consent was obtained from all participants. Because recruitment of this cohort required the ability to give informed consent, it is possible that the (statistically non-significant) lack of 16p11.2 deletions or duplications is due to an ascertainment bias. However, any such bias, if it exists, is very small and affects the identification of only one or two subjects carrying a deletion.

NFBC1966: the Northern Finland Birth Cohort 1966 is a prospective birth cohort of almost all individuals born in 1966 in the two northernmost provinces of Finland. Expectant mothers were enrolled, and clinical data collection took place prenatally, at birth, and at ages 6 months, 1 year, 14 years and 31 years. Biochemical and DNA samples were collected with informed consent at age 31 years. Genotyping with the Illumina Infinium 370cnvDuo array and phenotypic characteristics of the cohort were as described previously17. Phenotypic and genotyping data were available for 5,246 subjects after quality control.

EGPUT: the Estonian Genome Project is a biobank coordinated by the University of Tartu (EGPUT)18. The project is conducted in accordance with the Estonian Gene Research Act, and all participants gave written informed consent. The cohort includes more than 39,000 individuals older than 18 years of age and reflects closely the age distribution in the Estonian population (33% male, 67% female; 83% Estonians, 14% Russians, 3% other). Subjects are recruited by general practitioners and hospital physicians and are then randomly selected. A computer-assisted personal interview (CAPI) was completed during a 1–2 h visit at the doctor’s office. The data included personal data (such as place of birth, place(s) of residence and nationality), family history (four generations), educational and occupational history, lifestyle and anthropometric data. A total of 1,090 randomly selected subjects were genotyped with the Illumina 370cnvDuo array, 998 passing the required criteria (nationality, genotyping call rate and phenotype availability).

Case-control familial obesity: the adult-obesity and child-obesity case-control groups were as published previously6, and were genotyped with the Illumina Human CNV370-duo array. In all, 643 children with familial obesity (BMI ≥ 97th centile corrected for gender and age, at least one obese first-degree relative, age less than 18 years), 581 non-obese children (BMI ≤ 90th centile), 705 morbidly obese adults with familial obesity (BMI ≥ 40 kg m⁻², at least one obese first-degree relative with BMI ≥ 35 kg m⁻², age ≥ 18 years) and 197 lean adults (BMI ≤ 25 kg m⁻²) passed quality control; this cohort included a further 646 control subjects from the DESIR prospective cohort19 (age at examination ≥ 45 years, normal fasting glucose in accordance with 1997 ADA criteria, BMI < 27 kg m⁻²) genotyped with the Illumina Hap300 array20. All participants or their legal guardians gave written informed consent, and all local ethics committees approved the study protocol.

Severe early-onset obesity cohort: the Genetics of Obesity Study (GOOS) cohort consists of more than 3,000 patients ascertained for severe obesity, defined as a BMI ≥ 4 standard deviations above the age-specific and sex-specific mean, and onset of obesity before 10 years of age. In this study we selected a discovery set of 1,000 UK Caucasian patients from this cohort in whom developmental delay had been excluded by routine clinical examination by experienced physicians (this cohort is referred to as SCOOP). Mutations in LEPR, POMC and MC4R were excluded by direct nucleotide sequencing and a karyotype was performed. DNA samples were analysed with Affymetrix Genome-Wide Human SNP Array 6.0 by Aros, of which 931 passed quality control.

Bariatric surgery cohort: patients undergoing elective bariatric weight-loss surgery were recruited for the ABOS study at Lille Regional University Hospital. Genotyping was performed with the Illumina Human 1M-duo array, and data from 141 adults passed quality control. All participants gave written informed consent, and the study protocol was approved by the local ethics committee.

Swedish discordant sibling cohort: the SOS Sib Pair Study cohort was as published previously21. It includes 154 nuclear families, each with BMI-discordant sibling pairs (BMI difference > 10 kg m⁻²), giving a total of 732 subjects. Genotyping data from the Illumina 610K-Quad array were available for 353 siblings from 149 families. Expression data from subcutaneous adipose tissue (sampled after overnight fasting) were available for 360 siblings from 151 families. Subjects received written and oral information before giving written informed consent. The Regional Ethics Committee in Gothenburg approved the studies.

Statistical methods

In view of the low frequency of the 16p11.2 deletions, all reported statistical tests were conducted with Fisher’s exact test29. This was applied to comparisons of separately ascertained cohorts or categories and was performed on contingency tables constructed for the number of subjects carrying or lacking a 16p11.2 deletion (zero or one copies, because no homozygous deletions were observed) versus the obesity status or ascertainment of the individual. Because no homozygous deletions were observed, it was unnecessary to make a prior distinction between recessive, additive and dominant models of disease risk. For overall analysis of the obesity risk resulting from a deletion, cohorts were pooled in accordance with their obesity status determined according to the criteria described above, and the described tests were then applied to the pooled data. Odds ratios and 95% confidence limits were calculated as described30.
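As an illustration of how an odds ratio and its 95% confidence limits can be derived from a pooled contingency table, the sketch below uses the logit (Woolf) interval. This choice of method is an assumption, because the cited procedure (ref. 30) is not reproduced here. The counts in the usage lines are hypothetical and are not taken from Table 1.

```python
from math import exp, log, sqrt


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and confidence limits for the table [[a, b], [c, d]].

    a, b: cases carrying / lacking the deletion
    c, d: controls carrying / lacking the deletion
    Uses the logit (Woolf) interval, an assumed method.
    """
    if 0 in (a, b, c, d):
        # Continuity correction so that the ratio is defined (assumption).
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ratio, exp(log(ratio) - z * se), exp(log(ratio) + z * se)


# Hypothetical counts, for illustration only.
ratio, lower, upper = odds_ratio_with_ci(10, 990, 1, 1999)
print(f"odds ratio {ratio:.1f}, 95% confidence limits {lower:.1f} and {upper:.0f}")
```

With very few carriers among controls, the standard error is dominated by the smallest cell, which is why intervals of this kind are wide.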

CNV discovery and validation

Clinical identification of 16p11.2 deletions: all diagnostic procedures (aCGH, qPCR, QMPSF and FISH) were conducted in accordance with the relevant guidelines of good clinical laboratory practice for the respective countries. All rearrangements in probands were confirmed by a second technique, and karyotyping was performed in all cases to exclude a complex rearrangement.

cnvHap: CNVs were detected in the child/adult case-control, bariatric surgery, SOS sibpair and NFBC cohorts using the cnvHap algorithm (L.J.M.C., J. E. Asher, R.G.W., J.S.E.-S.M., A.J.d.S., R.S., D. J. Balding, P.F. and A.I.F.B., unpublished observations); this method is based on a hidden Markov model that models transitions between copy-number states at the haplotype level, improving sensitivity and accuracy by capturing linkage disequilibrium information between CNVs and SNPs. The compiled JAR and associated parameter files can be downloaded from http://www.imperial.ac.uk/medicine/people/l.coin/. Sample data from the algorithm applied to the NFBC cohort are illustrated in Supplementary Fig. 5a.

After clustering of genotyping data with the internal Illumina BeadStudio cluster files, values for logR ratio (LRR) and B-allele frequency (BAF) were exported from each project and normalized: effects of percentage GC content on LRR were removed by regressing on GC and GC², and wave effects31 were removed by fitting a Loess function. Normalized data for probes within 2.5 megabases of the 16p11.2 deletion were analysed with cnvHap, and CNV calls intersecting the single-copy sequences within the deletion (chr16:29514353–30107356, build hg18) were extracted. 16p11.2 deletions were identified by a minimum 90% of probes within the deleted region being called as having a decreased copy number.
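The final calling rule (at least 90% of probes in the single-copy region called with decreased copy number) can be sketched as follows. The region coordinates come from the text; the input format of `(position, copy_number)` pairs and the function name are hypothetical, and this is not the cnvHap implementation.

```python
# Single-copy sequence within the deletion (chr16, build hg18), from the text.
DELETION_START = 29_514_353
DELETION_END = 30_107_356


def carries_16p11_deletion(probe_calls, min_fraction=0.90):
    """Apply the 90% rule to per-probe copy-number calls on chr16.

    probe_calls: iterable of (position, copy_number) pairs (assumed format).
    """
    in_region = [cn for pos, cn in probe_calls
                 if DELETION_START <= pos <= DELETION_END]
    if not in_region:
        return False  # no probes in the region, so no call can be made
    decreased = sum(1 for cn in in_region if cn < 2)
    return decreased / len(in_region) >= min_fraction


# Hypothetical sample: 20 probes in the region, 19 called with one copy.
example = [(DELETION_START + i * 1000, 1 if i < 19 else 2) for i in range(20)]
print(carries_16p11_deletion(example))
```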

All called 16p11.2 deletions were validated by direct analysis of LRR. Data for each probe were normalized by first subtracting the median value across all samples (so that the distribution of LRR for each probe was centred on zero), and then dividing by the variance across all samples (to correct for variation in the sensitivity of different probes to copy-number variation). The normalized data were then smoothed by application of a nine-point moving average and visualized graphically (see Supplementary Fig. 6); putative deletions were checked by subsequent manual confirmation of loss of heterozygosity across the entire region. Equally, all deletions called by this method were confirmed by cnvHap.
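A minimal sketch of this per-probe normalization and smoothing is given below. The use of the population variance and the shortening of the window at the ends of the probe series are assumptions, as neither detail is specified; the function names are hypothetical.

```python
from statistics import median, pvariance


def normalise_lrr(lrr):
    """Centre and scale LRR per probe; lrr[sample][probe] is a list of lists."""
    n_probes = len(lrr[0])
    out = [[0.0] * n_probes for _ in lrr]
    for j in range(n_probes):
        column = [row[j] for row in lrr]
        centre = median(column)      # median across all samples
        spread = pvariance(column)   # variance across all samples (assumed population form)
        for i, value in enumerate(column):
            out[i][j] = (value - centre) / spread if spread > 0 else 0.0
    return out


def moving_average(values, window=9):
    """Nine-point moving average; the window shrinks at the edges (assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical LRR values for three samples at two probes.
normalised = normalise_lrr([[0.0, 1.0], [0.2, 1.0], [-0.2, 1.0]])
print(normalised[1][0])
```

A carrier of a deletion would then appear as a sustained run of negative smoothed values across the region when each sample's row is passed through `moving_average`.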

Gaussian mixture model: for the CoLaus cohort, raw genotyping data were normalized using the aroma.affymetrix framework32. Normalization steps included allelic cross-talk calibration33,34, intensity summarization using robust median average, and correction for any PCR amplification bias. Copy number (CN) ratios for a given sample, at a given SNP or CN probe, were computed as the log2 ratio of the normalized intensity of this probe divided by the median across all the samples. CN ratios were subsequently smoothed by fitting a Loess function31. CNV calling was performed with a new method based on a Gaussian mixture model (A.V., Z. Kutalik, T. Johnson, B. J. Stevenson, C. V. Jongeneel, D.W., V.M., P.V., G.W., J.S.B. and S.B., unpublished observations). This Gaussian mixture model fits four components (deletion, copy neutral, one additional copy and two additional copies) to CN ratios. The final copy number at each probe location is determined as the expected (dosage) copy number. The method has been validated by comparing test data sets with results from the CNAT35 and CBS36,37 algorithms and by replicating a subset of CoLaus subjects on Illumina arrays. All calls at the 16p11.2 locus made by the highly stringent CBS algorithm were replicated by the Gaussian mixture model. Principal components analysis detected no significant batch effects. Sample data from the algorithm applied to the CoLaus cohort are illustrated in Supplementary Fig. 5b.
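The idea of an expected (dosage) copy number can be illustrated with a four-component mixture evaluated at a single probe. This is a sketch only and not the unpublished method: the component means, standard deviations and weights below are hypothetical, and mapping the deletion component to one copy is an assumption.

```python
from math import exp, log2, pi, sqrt

# Deletion, copy neutral, one additional copy, two additional copies.
COPY_NUMBERS = (1, 2, 3, 4)


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))


def expected_copy_number(cn_ratio, means, sds, weights):
    """Posterior-weighted mean copy number for one log2 CN ratio."""
    weighted = [w * normal_pdf(cn_ratio, m, s)
                for m, s, w in zip(means, sds, weights)]
    total = sum(weighted)
    posteriors = [v / total for v in weighted]
    return sum(cn * p for cn, p in zip(COPY_NUMBERS, posteriors))


# Hypothetical parameters: idealised log2(copies / 2) means, equal spread.
MEANS = [log2(cn / 2) for cn in COPY_NUMBERS]
SDS = [0.15] * 4
WEIGHTS = [0.01, 0.97, 0.01, 0.01]

print(expected_copy_number(-1.0, MEANS, SDS, WEIGHTS))
```

In practice the mixture parameters would be fitted to the data rather than fixed, and observed ratios are usually compressed relative to the idealised means used here.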

PennCNV, QuantiSNP and Birdsuite: CNV discovery in the EGPUT cohort was performed with QuantiSNP38, PennCNV39 and BeadStudio GT module (Illumina). All analyses were conducted with the recommended settings, except changing EMiters to 25 and L to 1,000,000 in QuantiSNP. For PennCNV, the Estonian population-specific BAF file was used. Data from the SCOOP cohort were analysed with Affymetrix Power Tools and Birdsuite software40.

Multiplex ligation-dependent probe amplification (MLPA): MLPA was performed with standard methods41 using reagents obtained from MRC-Holland. The SALSA MLPA kit P343-B1 Autism-1 probe mix was used, which contained nine probes within the deleted region on 16p11.2, plus one probe upstream and one downstream of this locus (see Fig. 1a). MLPA products were separated with an AB3130 Genetic Analyser (Applied Biosystems) and outputs were analysed with GeneMarker software (Soft Genetics) and Microsoft Excel. Data normalization was performed by dividing the peak areas for each of the 11 test probes by the mean of 9 control probe peak areas. Normalized peak area data were then compared across the tested samples to determine which of them carried the 16p11.2 deletion.
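The normalization step can be sketched as below. Dividing by the mean of the control probe areas follows the text; comparing each sample with a reference sample and the cut-off of 0.75 for flagging a probe as deleted are assumptions added for illustration, and all peak areas are hypothetical.

```python
from statistics import mean


def normalise_mlpa(test_areas, control_areas):
    """Divide each test probe peak area by the mean control probe peak area."""
    control_mean = mean(control_areas)
    return [area / control_mean for area in test_areas]


def dosage_quotients(sample_norm, reference_norm):
    """Ratio of a sample's normalized areas to those of a reference sample."""
    return [s / r for s, r in zip(sample_norm, reference_norm)]


def deleted_probes(quotients, threshold=0.75):
    """Indices of probes whose quotient falls below an assumed cut-off."""
    return [i for i, q in enumerate(quotients) if q < threshold]


# Hypothetical peak areas: 11 test probes (first and last flank the region).
controls = [1000.0] * 9
reference = normalise_mlpa([1000.0] * 11, controls)
carrier = normalise_mlpa([1000.0] + [500.0] * 9 + [1000.0], controls)
print(deleted_probes(dosage_quotients(carrier, reference)))
```

A heterozygous deletion is expected to give quotients near 0.5 for the probes inside the region and near 1 for the flanking probes.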