Main

Genomic aneuploidy, defined as an abnormal number of copies of a genomic region, is a common cause of human genetic disorders. Classically, the term aneuploidy was restricted to the presence of supernumerary copies of whole chromosomes (trisomy), or absence of chromosomes (monosomy), but we extend this definition to include deletions or duplications of subchromosomal regions.

Trisomy 21 is a model of all human disorders that are the result of supernumerary copies of a genomic region. In this review, we focus on Down syndrome (DS) and human chromosome 21 (HSA21) to show the effect that genomics has had on our understanding of the 'disorders of the genome'. We discuss the recent advances in genome sequencing, comparative genome analysis, functional genome exploration, use of model organisms and lessons from models of gene overexpression. We also discuss the consequences of genomic dosage imbalance owing to an extra copy of a genomic segment (trisomy).

These are exciting times for genomic disorders, such as trisomy 21, because we now have the necessary tools to understand how three copies of a functional genomic element result in abnormal phenotypes.

Trisomies

According to the size of the triplicated genomic region, trisomies can be divided into four categories: complete, or whole-chromosome, trisomies; partial trisomies; microtrisomies and triplication of single genes or single functional genomic elements.

Whole-chromosome trisomies. Whole-chromosome trisomies that result from meiotic or mitotic non-disjunction events are common in humans; they account for 0.3–0.5% of live births. Trisomy for HSA21, which results in Down syndrome and occurs at 1 in 750 live births, is the most frequent event. Trisomies are often observed in a significant proportion of spontaneous abortions; for example, trisomy 16 is found in 1 out of 13, and trisomy 21 in 1 out of 43 such abortions1.

Partial trisomies. Partial (or segmental) trisomies that involve a genomic region of more than one chromosomal band (usually larger than 5 Mb) are much less frequent than whole-chromosome trisomies. They usually result from abnormal meiosis and segregation in individuals with balanced chromosomal rearrangements. About one in 1,800 newborns has an unbalanced, non-Robertsonian rearrangement, and approximately half of these are partial trisomies. Unbalanced ROBERTSONIAN TRANSLOCATIONS with trisomies of the long arms of ACROCENTRIC CHROMOSOMES occur in about 1 in 14,000 newborns1.

Microtrisomies. This type of trisomy is defined here as the partial trisomy of a genomic segment that is shorter than 3–5 Mb and that is not detectable by routine high-quality cytogenetic analysis. It is also known as segmental duplication. The incidence of microtrisomies is, at present, unknown. Most are due to unequal crossovers in meiosis, mediated by the presence of interchromosomal duplicons or low copy repeats (LCRs; 10–100 Kb each). These duplicons, which make up 5% of the human genome2, promote unequal recombination events that lead to microtrisomies and micromonosomies. Microduplications are seen, for example, in many cases of Charcot–Marie–Tooth disease, type 1A (CMT1A). This neurological disorder is caused by a 1.4 Mb duplication of chromosome 17p12, the result of recurrent non-allelic homologous recombination between duplicons that flank the duplicated segment3. Another more recent example of a microtrisomy is that of SHFM3 (a form of ECTRODACTYLY), which is caused by a duplication of 0.5 Mb on chromosome 10q24 (Ref. 4). New pathogenic or polymorphic microtrisomies will probably be identified using diagnostic methods such as BAC ARRAY CGH (comparative genomic hybridization)5.

Single-gene duplications. Duplications of only one gene or one functional genomic element can also be pathogenic; for example, duplicated PLP1 causes Pelizaeus–Merzbacher disease, and duplicated PMP22 causes CMT1A6,7. The genomic duplications that include these genes also encompass additional coding sequences, but studies of transgenic mice that carry single gene duplications (either of Plp1 or Pmp22) indicate that the abnormal phenotype is caused by duplication of these two genes8,9.

Evidence from experiments using transgenic mice indicates that single gene dosage imbalance might cause abnormal phenotypes. One example is strain B6;SJL-Tg(EPO33)72Ptc/J, which carries a single copy of human EPO.

Finally, the duplication of a non-genic, but functional, genomic element might also be pathogenic. A recent example of this comes from the duplication of 30 Kb containing a conserved regulatory element that maps 1 Mb from the sonic hedgehog (Shh) locus and causes PREAXIAL POLYDACTYLY in the sasquatch (Ssq) mouse mutant10.

Genomic dosage imbalance frequently occurs in somatic cancer cells. Chromosomal instability of tumour cells, resulting in various trisomies and monosomies, is associated with tumour progression11; this topic deserves a separate discussion and we do not return to it here. Below, we focus on constitutional trisomies, that is, those that are present in all cells, and in particular on trisomy 21 as it is the most frequent example and serves as a model for all constitutional trisomies.

Origin of the extra HSA21

The parental origin of the supernumerary HSA21 in trisomy 21 was determined using highly informative polymorphic markers in DNA from parents and DS offspring. DNA markers near the centromere indicate the stage of meiosis during which the segregation error occurred. Homozygosity for all polymorphic markers throughout 21q (the long arm of HSA21) is indicative of a mitotic, post-zygotic error. Supplementary information S1 (table) lists the various origins of human trisomy 21.
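The marker-based reasoning can be illustrated with a minimal sketch. It assumes the commonly used rule that pericentromeric markers retaining both alleles of the contributing parent ("non-reduced") point to a meiosis I error, whereas markers reduced to homozygosity point to meiosis II. The state codes and the function name are hypothetical and are not taken from the studies cited here.

```python
from typing import List


def classify_nondisjunction(marker_states: List[str]) -> str:
    """Suggest the stage of the segregation error from marker states.

    marker_states is ordered from the centromere towards the telomere of 21q.
    Each entry is one of (codes are an assumption of this sketch):
      'N'  non-reduced: both alleles of the contributing parent are present
      'R'  reduced: one allele of the contributing parent is present twice
      'U'  uninformative marker
    """
    informative = [s for s in marker_states if s in ("N", "R")]
    if not informative:
        return "undetermined"
    # Homozygosity for every informative marker along 21q suggests
    # a mitotic, post-zygotic error.
    if all(s == "R" for s in informative):
        return "mitotic"
    # Otherwise the informative marker nearest the centromere decides.
    return "meiosis I" if informative[0] == "N" else "meiosis II"


if __name__ == "__main__":
    print(classify_nondisjunction(["N", "N", "R"]))
    print(classify_nondisjunction(["U", "R", "N", "N"]))
    print(classify_nondisjunction(["R", "R", "R"]))
```

In practice, marker informativeness and genotyping error would also need to be considered; the sketch ignores both.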

For an extensive discussion of the origin of aneuploidy in all human trisomies, and the link between recombination and non-disjunction, the reader is referred to the excellent recent review by Hassold and Hunt12. The origin of translocation trisomy 21 is discussed in Refs 13,14.

Phenotypic variability of DS

There are two types of phenotypes that are observed in trisomy 21: those seen in every patient and those that occur only in a fraction of affected individuals (see Supplementary information S2 (table))15,16. For example, cognitive impairment is present in all patients with DS, whereas congenital heart defect occurs in 40% and ATRIOVENTRICULAR CANAL in 16% of patients. DUODENAL STENOSIS/ATRESIA, Hirschsprung disease and acute megakaryocytic leukemia occur 250-, 30- and 300-times more frequently, respectively, in patients with DS than in the general population. In addition, for any given phenotype there is considerable variability (severity) in expression. For example, the extent of cognitive impairment varies widely in individuals with DS17.

There are several working hypotheses that attempt to explain the phenotypic variability in trisomy 21 (or other trisomies; see Box 1). The gene (or genomic functional unit) dosage imbalance is the main molecular mechanism that requires further investigation.

Genomic content of HSA21

The almost complete, high-quality sequence of 21q was published in May 2000 (Ref. 18). There are three cloning gaps on 21q, each of which is only 20–30 Kb long. There are, therefore, 4 contigs for 21q (from centromere to telomere): 28,602; 229; 1,378; 3,432 Kb for a total length of 33,642,989 nucleotides (NCBI Build 34). Only a region of 281 Kb has been sequenced from the short arm of HSA21, 21p. Figure 1 provides a comparison of chromosome 21 features with those of other chromosomes. The comparisons are based on the recent NCBI builds, and on published sequences19,20.

Figure 1: Features of human chromosome 21 (HSA21).

Each line describes a particular feature shown on the left. The position of HSA21 in each scale is shown as a red triangle. All other chromosomes are shown as blue triangles. The identity of chromosomes with extreme values per feature is also shown. The sources of information are: *ENSEMBL 34; GALA 34; §NCBI 34; Ref. 20. The high SNP density is probably a biased estimate owing to the updated estimate provided by Ref. 41. CNGs, conserved non-genic sequences; LINEs, long interspersed nuclear elements (such as L1 repeats) are retroelements present in over 100,000 copies in the mammalian genome; SINEs, short interspersed-repeat transposable elements.

HSA21 is among the smallest of human chromosomes and its 33.6-Mb long arm represents about 1% of the total sequences obtained (3020.3 Mb of NCBI Build 34 version 1). Table 1 lists other characteristics of HSA21. The total number of genes (protein-coding and non-coding RNAs (ncRNAs)) on 21q has not yet been conclusively determined. A total of 225 genes was estimated when the initial sequence of 21q was published18; 35% of those are homologous to Drosophila melanogaster, 35% to Caenorhabditis elegans and 17% to Saccharomyces cerevisiae genes. Subsequent, ongoing analysis based on computational methods, EST sequencing, laboratory verification and comparative genome analysis has resulted in an estimated 261–364 protein-coding genes (Table 1, references therein and Refs 18,21–28). These include potential transcripts that are supported by only one 'spliced' EST.

Table 1 Characteristics of human chromosome 21

There are gene-rich and gene-poor regions on 21q, which correlate with the G+C content. The so-called L isochores (G+C <43%) contain few genes (1 per 300 Kb), whereas the H3 isochores in GIEMSA light chromosomal bands (G+C >48%) contain most of the genes (1 per 58 Kb, according to the first annotation).

More research is needed for the complete and correct genic annotation of HSA21. This is particularly true for single exon genes, ncRNAs and rare transcripts. An international collaborative effort to re-annotate HSA21 is in progress. Computational analysis has provided initial evidence for 5 microRNAs (miRNAs) on 21q, although their function and potential involvement in DS remain unknown.

Comparative sequence analysis of human and other genomes, particularly that of the mouse, has resulted in the discovery of novel genes and verification of putative transcripts25,29. The availability of other mammalian genome sequences will further increase our ability to recognize additional genes and validate initial gene predictions. The 33.5 Mb chimpanzee chromosome 22, which is homologous to HSA21, has recently been sequenced30. There are 1.44% single nucleotide substitutions between the human and chimpanzee sequences, and nearly 68,000 insertions and deletions (indels); more than 99% of the indels are shorter than 300 bp. Remarkably, 86% of the 231 unambiguous coding sequences in both species show amino-acid differences. Indels within coding regions represent one of the main mechanisms that lead to protein diversity.

An interesting approach to improving the functional annotation of HSA21 involves studying transcriptional activity of the entire chromosome. In one study, oligonucleotide arrays that contain probes spaced, on average, 35 bp apart and that cover the entire 21q were used to estimate mRNA expression from 11 different human cancer cell lines31. As many as 9.7% of the probes showed positive hybridization signals in 5 of the 11 cell lines31, which indicates that the potentially transcribed genome is 10-fold larger than the current genic annotation. This transcription potential could be due to additional unidentified genes, RNA transcripts without protein-coding capacity, alternative RNA isoforms of previously annotated genes, or 'illegitimate' non-functional transcription. Subsequent analysis of these data for chromosomes 21 and 22 showed that 49% of the observed hybridization signal (transcription) was outside the known annotation32; 65% of these transcripts were verified by reverse transcriptase PCR (RT-PCR). These results emphasize that the annotation of the HSA21 sequence is far from complete.

Another study combined chromatin immunoprecipitation and high-density oligonucleotide arrays to map more than 300 binding sites of the three transcription factors, Sp1, cMyc, and p53 on HSA21 (Ref. 33). As expected, these sites clustered near 5′ promoters of protein-coding genes and CPG ISLANDS but many were found in 3′ ends of known protein-coding genes. Most of the binding sites identified are, in fact, associated with ncRNAs of as yet unknown function. Intriguingly, this study provided initial evidence for widespread antisense transcription, the functional significance of which remains unknown.

Functional analysis of the HSA21-encoded proteins is one of the research priorities for the understanding of DS. Analysis of Swiss-Prot-listed HSA21-encoded proteins with Interpro for protein families, domains and functional sites, gave 207 entries (see Online links box). Analysis of the same proteins using the Gene Ontology Annotation (see Online links box) shows that they are involved in 87 different biological processes, have 81 different molecular functions and are localized in 26 different cellular components. The most frequent molecular function is DNA binding and transcription factor activity (15 proteins); the most common cellular localizations are in the nucleus and the plasma membrane (19 and 15 proteins, respectively); and the most common biological process that they are involved in is signal transduction (11 proteins).

The completion of the sequencing of HSA21 (Ref. 18) and of the mouse genome29,34 provided the first opportunity to compare DNA sequences of entire chromosomes of two mammalian species, so that conserved genomic elements that are likely to be functional can be identified. The comparison of the 33.5 Mb of HSA 21q with the orthologous mouse genomic regions on MMU16, MMU17 and MMU10 revealed 3,491 sequences of ≥100 bp and ≥70% identity without gaps. Unexpectedly, only 1,229 of those corresponded to exons of previously known genes35. Remarkably, the remaining 2,262 conserved sequences mapped preferentially to the gene-poor regions of 21q. About 80% of these sequences are located in intergenic regions, with the remaining 20% in introns. Experimental, bioinformatic and evolutionary analysis strongly suggested that these sequences are not 'functionally' transcribed, and do not correspond to protein-coding genes35,36. These sequences, which we call conserved non-genic (CNG), account for 1% of the HSA21 sequence. The CNGs are highly conserved in mammals; from primates to monotremes, to marsupials37,38,39,40. The remarkable conservation, over more than 150 million years of evolution since the common mammalian ancestor, strongly indicates that most CNGs are functional, although their function is unknown.
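The conservation criterion (at least 100 bp, at least 70% identity, no gaps) can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length, with '-' marking gaps and both sequences in the same letter case; the function name and the sliding-window approach are illustrative and are not the pipeline used in Ref. 35.

```python
from typing import List


def conserved_windows(human: str, mouse: str,
                      window: int = 100,
                      min_identity: float = 0.70) -> List[int]:
    """Return start positions of ungapped windows that meet the identity cutoff."""
    if len(human) != len(mouse):
        raise ValueError("sequences must be pre-aligned to equal length")
    n = len(human)
    if n < window:
        return []
    # Per-column flags: identical bases, and columns that contain a gap.
    match = [int(a == b and a != "-") for a, b in zip(human, mouse)]
    gap = [int(a == "-" or b == "-") for a, b in zip(human, mouse)]
    m = sum(match[:window])
    g = sum(gap[:window])
    hits = []
    for start in range(n - window + 1):
        if start > 0:
            end = start + window - 1
            # Slide the window by one column and update the running counts.
            m += match[end] - match[start - 1]
            g += gap[end] - gap[start - 1]
        if g == 0 and m / window >= min_identity:
            hits.append(start)
    return hits


if __name__ == "__main__":
    human = "A" * 100
    mouse = "A" * 70 + "C" * 30   # exactly 70% identity, no gaps
    print(conserved_windows(human, mouse))
```

Overlapping hits would normally be merged into maximal conserved blocks before counting; that step is omitted here.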

Genomic variability

The variability of HSA21 might be partially responsible for the different DS phenotypes. It is therefore necessary to determine the common and rare DNA variants on this chromosome. Approximately 1.26% of HSA21 consists of short sequence repeats that might be polymorphic18. In addition, 105,334 probable SNPs have been identified (see Online links box).

The initial assessment of LINKAGE DISEQUILIBRIUM structure of polymorphic sites involved re-sequencing 20 haploid human genomes from different ethnic groups using tiled oligonucleotide arrays of the non-repetitive fraction of 21q (Ref. 41). For this study, HSA21 from each individual was separated in somatic cell hybrids and haploid DNA hybridization probes were generated by long-range PCR. A total of 24,386 SNPs were identified with the rare allele observed at least twice (in addition to 11,603 nucleotide variants that were seen only once). There were 4,135 inferred HAPLOTYPE BLOCKS with an average length per block of 7.83 Kb (Ref. 41). Such studies describing the chromosome-wide linkage disequilibrium maps provide the genomic infrastructure for the determination of SNPs involved in gene expression variation, and in the predisposition to different variable phenotypic manifestations of DS and to common, complex phenotypes.

Mouse models and phenotypes

HSA21 is homologous to mouse chromosomal regions that map to three different chromosomes. From 21cen to 21qter, about 23.2 Mb are homologous to MMU16, 1.1 Mb to MMU17, and 2.3 Mb to MMU10 (Fig. 2). Importantly, none of the existing mouse models perfectly mimic the chromosomal abnormality that is observed in DS.

Figure 2: Regions of synteny between human chromosome 21 (HSA21) and mouse chromosomes (MMUs) 16, 17 and 10.

There are three partial trisomy mouse models of human trisomy 21, all trisomic for a portion of MMU16. The gene content of these partial trisomies is shown on the right. The list of gene names in Ts65Dn is from Ref. 43. Sequence and gene data are taken from ENSEMBL.

The best studied model is that of a partial (segmental) trisomy 16, named Ts65Dn (Ref. 42). The trisomy in this mouse mutant extends for about 16 Mb, from the Mrpl39 to the Znf295 genes43 (predicted to contain 132 genes that are homologous to HSA21). Detailed morphological and behavioural characterization of the Ts65Dn mouse model revealed several abnormal phenotypes that are similar to those seen in human trisomy 21 (Refs 44–51). Additional partial trisomies for MMU16 that extend from Sod1 to Znf295 (85 known genes) (Ts1Cje)52 and from App to Sod1 (46 known genes) (Ms1Cje/Ts65Dn)53 have been reported, and their phenotypic characterization is in progress. Neurological phenotypes in mouse models of DS are shown in Table 2. There is also a chimeric mouse model that has substantial numbers of cells that carry portions of HSA21 (Ref. 54).

Table 2 Mouse models of Down syndrome

The contribution of triplicated individual genes to the mouse phenotype could be further evaluated by the sequential deletion of one copy of the gene from the partial trisomy mouse. This could be achieved by crossing mice with a particular HSA21 gene knocked out with the partial trisomy mouse. One such example has been published recently for Ifngr and Ifnar2 (Ref. 55), which encode the interferon-γ receptor and the interferon-α receptor 2, respectively. The resulting mouse with trisomy 16, but only 2 instead of 3 copies of these genes, showed significantly improved fetal growth and cortical neuron viability. A more direct approach involves creating transgenic mice that overexpress one gene that is orthologous to HSA21. Several such transgenic mice have been developed, following the first such mouse that carried SOD1 transgenes56 (Table 2). To achieve results that are biologically relevant to DS, the transgene (ideally the mouse gene) needs to be 'physiologically' regulated from its own regulatory elements and only one extra copy needs to be expressed. Ideally, the total transcript level (endogenous and from the transgene) should be 1.5 times that of the normal expression level. One such example has been described for Sim2; the mice that carry three copies of Sim2 are characterized by abnormal spatial exploration and social interactions, and reduced NOCICEPTION57.

Multicopy overexpression of a transgene from an exogenous promoter can also provide insights into gene function58, although not necessarily into its role in trisomy. The crossing of these animal models into different genetic backgrounds might uncover contributions of modifier genes that affect certain DS phenotypes.

Gene expression in trisomy 21

Characterizing the spatial and temporal expression of HSA21 genes is an important step towards understanding how gene dosage imbalance affects DS phenotypes.

Since the discovery of trisomy 21 in 1959 (Ref. 59), it has been hypothesized that the genes that are present in 3 copies are overexpressed 1.5-fold relative to the euploid state. This hypothesis has only recently been tested in the Ts65Dn partial trisomy mouse model of DS (Refs 43,60). In one study, a total of 78 genes present in 3 copies on MMU16 were tested by quantitative RT-PCR in 6 mouse tissues at 2 developmental stages (postnatal day 30 and 11 months old)60. Surprisingly, only 37% of genes were expressed at the theoretical value of 1.5-fold; 45% of the genes were expressed at levels significantly lower than 1.5-fold (including 9% that were not significantly overexpressed) and 18% had expression levels greater than 1.5-fold (Fig. 3). Similarly, a second study using cDNA arrays on nylon filters and quantitative RT-PCR to examine nine adult tissues found that nearly all triplicated genes had elevated transcript levels in most tissues where they were expressed, whereas a few showed downregulation, compensation or strong overexpression in a tissue-specific manner43,60. These data not only provide candidate genes for contribution to DS phenotypes, but also highlight the complex regulation of gene expression related to genomic dosage imbalance.
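The binning of expression ratios relative to 1.0 and 1.5 can be illustrated with a minimal sketch. The study cited above relied on statistical significance tests; the fixed tolerance used here is a simplifying assumption, and the function names and category labels are hypothetical.

```python
from collections import Counter
from typing import Iterable


def expected_ratio(copies: int, euploid_copies: int = 2) -> float:
    """Theoretical expression ratio if transcript level scales with copy number."""
    return copies / euploid_copies


def classify_ratio(ratio: float, tol: float = 0.1) -> str:
    """Bin a trisomic/euploid expression ratio relative to 1.0 and 1.5.

    tol is an arbitrary tolerance standing in for a significance test.
    """
    if ratio > 1.5 + tol:
        return "above 1.5"
    if ratio >= 1.5 - tol:
        return "about 1.5"
    if ratio > 1.0 + tol:
        return "between 1.0 and 1.5"
    return "about 1.0 or lower"


def summarize(ratios: Iterable[float], tol: float = 0.1) -> Counter:
    """Count how many genes fall into each expression category."""
    return Counter(classify_ratio(r, tol) for r in ratios)


if __name__ == "__main__":
    # Made-up ratios for illustration only; not data from Ref. 60.
    print(expected_ratio(3))
    print(summarize([1.5, 1.45, 2.3, 1.25, 1.02]))
```

A real analysis would replace the tolerance with a per-gene test across replicates, tissues and developmental stages.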

Figure 3: Gene expression levels of trisomic genes in the Ts65Dn mouse model of Down syndrome.

a | The Ts65Dn mouse is trisomic for a 16 Mb region of mouse chromosome 16 containing 132 genes. b | Summary of gene expression data of trisomic genes in Ts65Dn. Total data from 78 genes, 2 developmental stages and 6 tissues are shown. The percentages of genes with expression relative to the values 1.0 and 1.5 are also shown. c | Examples of expression data for the heart. The majority of genes present in 3 copies are expressed close to 1.5. A notable exception is the gene Ankrd3, which is highly overexpressed in both developmental stages. P30, postnatal day 30; Ts, the trisomy state; Eu, the euploid state. Data are from Ref. 60.

The detailed and systematic gene expression profiles of HSA21 genes (and, for that matter, all human genes) should be one of the research priorities of the functional analysis of our genome. First attempts towards this goal have been recently published. A considerable number of mouse orthologues of HSA21 genes (160) have been studied by RNA in situ hybridization, by RT-PCR on adult and fetal mouse tissues, and by in silico mining of ESTs (Refs 61,62).

Initial attempts to study the global transcriptome differences between trisomy 21 and euploid state using microarrays or serial analysis of gene expression (SAGE) have also been published63,64,65,66 (Table 3). A meta-analysis of the existing data, and additional data collection, are needed to draw biologically meaningful conclusions and to develop further hypotheses about the aetiology of DS.

Table 3 Transcriptome studies of trisomy 21 (T21), or mouse model, versus euploid state

Finding candidate genes for DS phenotypes

Given the extent of sequencing data and the efforts to gather functional genomic data, how can we identify and prioritize candidate genes for their contribution to DS phenotypes? Several criteria could be used on the basis of present and future functional analysis of HSA21. The spatial and temporal pattern of expression is one criterion. Another is the potential functional classification of the predicted protein. Genes in which the expression varies widely between individuals might contribute to the variable phenotypes, whereas genes with no expression variation might underlie the phenotypes that are present in all individuals with DS. Abnormal phenotypes in transgenic mice with 1.5-fold overexpression would provide strong evidence for the role of such candidate genes.

The map position of a gene in a HSA21 interval that is associated with a given phenotype is another strong criterion. The study of rare patients with partial trisomy 21 defined the genomic regions that harbour genes associated with some DS phenotypes. For example, a region that is crucial for the heart defect (see Supplementary information S2) was identified in this way67; and so searches for genes and genetic variation that contribute to this phenotype should focus on this region.

A number of investigators have described a 'DS critical region' that specifically contains genes that contribute to cognitive defects or other DS features. However, the definition of these regions has been controversial as there are patients with partial triplications outside this region who, nevertheless, manifest some features of DS (Refs 67–69). This issue should be re-examined now that the complete sequence of HSA21 and better diagnostic tools are available.

Finally, non-HSA21 transcripts with significant expression differences, as measured by microarray analysis in cells, tissues and organs with trisomy 21 versus normal organs, constitute a class of candidates that not only are likely to contribute to DS phenotypes, but could also be targets for potential therapeutic interventions.

Identifying and prioritizing CNGs that might contribute to DS phenotypes will pose a particular challenge. CNGs that are cis- or trans- regulators, or that harbour variation with functional consequences, are obvious candidates for dosage-related phenotypic abnormalities.

Diagnostic methods: old and new

Cytogenetic analysis of metaphase karyotypes remains the standard practice to identify not only trisomy 21, but also all other aneuploidies and balanced translocations. Over the past 10 years, however, several other methods have been developed and used for the rapid detection of trisomy 21, either in fetal life or after birth (Fig. 4).

Figure 4: Trisomy 21 diagnostic methods: old and new.

a | G-banded karyotype of a trisomy 21 female, showing three copies of human chromosome 21 (HSA21). b | Fluorescent in situ hybridization (FISH) of interphase nuclei of a trisomy 21 fetus. In each cell there are two green spots (LSI 13 SpectrumGreen probe, Vysis) and three red spots (LSI 21 SpectrumOrange probe, Vysis) marking the 13q14 and 21q22.13–q22.2 chromosomal regions, respectively. c | Quantitative fluorescence PCR (QF-PCR) of marker D21S1270 of a trisomy 21 proband (bottom line) and his mother (top line). There are three different alleles in the patient DNA (alleles of sizes 320, 325 and 330 bp), but only two in the patient's mother (alleles 320 and 325 bp). d | Paralogous sequence quantification (PSQ). Two paralogous sequences, one mapping to HSA21 and the other to HSA5, which differ by a single nucleotide (in the example shown here, T on HSA21 and C on HSA5), are amplified and sequenced quantitatively (by PYROSEQUENCING) to determine the quantity of the variable nucleotide in a trisomy 21 proband (bottom line) and a parent (top line). Assuming similar amplification of both paralogous sequences, an HSA21/HSA5 ratio of 1.0 for the two variable nucleotides is expected in a normal control, and a ratio of 1.5 in a trisomy 21 sample. In this example, we observe 51% T and 49% C content in the parent, and 59% T and 41% C content in the trisomy 21 child.

The most widely used is fluorescent in situ hybridization (FISH) of interphase nuclei, using HSA21-specific probes or whole-HSA21 PAINTING70. An alternative method that is now widely used in some countries is quantitative fluorescence PCR (QF-PCR), in which DNA polymorphic markers (microsatellites) on HSA21 are used to determine the presence of three different alleles70. This method relies on informative markers and the availability of parental DNA. Additional methods to measure copy number of DNA sequences include multiplex amplifiable probe hybridization (MAPH)71 and multiplex ligation-dependent probe amplification (MLPA)72. A recent method, termed paralogous sequence quantification (PSQ), uses paralogous sequences to quantify the HSA21 copy number73. Finally, comparative genomic hybridization (CGH) on BAC chips can be used for the diagnosis of full trisomy or monosomy, and for partial (segmental) aneuploidies5,74.
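The arithmetic behind PSQ can be illustrated with a minimal sketch. It assumes equal amplification of the HSA21 and reference paralogues and two copies of the reference chromosome, so that two HSA21 copies give an expected HSA21-specific nucleotide fraction of 50% and three copies give 60%. The function names and the nearest-expectation calling rule are hypothetical and are not the published analysis of Ref. 73.

```python
def expected_fraction(hsa21_copies: int, reference_copies: int = 2) -> float:
    """Expected fraction of the HSA21-specific nucleotide in the amplicon pool."""
    return hsa21_copies / (hsa21_copies + reference_copies)


def call_copy_number(hsa21_percent: float, reference_percent: float,
                     reference_copies: int = 2) -> int:
    """Pick the HSA21 copy number (1, 2 or 3) whose expectation is closest."""
    observed = hsa21_percent / (hsa21_percent + reference_percent)
    candidates = (1, 2, 3)
    return min(
        candidates,
        key=lambda n: abs(observed - expected_fraction(n, reference_copies)),
    )


if __name__ == "__main__":
    # Percentages quoted in the Figure 4d legend: T marks HSA21, C marks HSA5.
    print(call_copy_number(51, 49))   # parent
    print(call_copy_number(59, 41))   # trisomy 21 child
```

With the percentages quoted for Figure 4d, the parent is closest to the two-copy expectation and the child to the three-copy expectation. A diagnostic assay would average several paralogous pairs and set explicit decision thresholds rather than rely on a single measurement.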

Examples of protein dosage imbalance

How does a supernumerary (third) copy of normal (wild-type) alleles result in an abnormal phenotype? This question has, for a long time, interested researchers in this area75,76,77. Below, we provide some examples (mostly from model organisms) of potential molecular mechanisms that deal with this question.

Subunits of multimeric proteins. Many protein complexes consist of different subunits that are encoded by genes on different chromosomes. The stoichiometry of subunits is usually well controlled for the normal function of the complex. Disturbance of the normal stoichiometry of subunits, owing to a supernumerary copy of the gene that encodes one subunit, might result in protein complexes of abnormal composition and function. The nucleosome, for example, consists of a central core of eight histone proteins (two of each H2A, H2B, H3 and H4) with 146 bp of dsDNA coiled around it. The ratio of H2A and H2B over H3 and H4 is important for the fidelity of chromosomal replications and/or segregation in yeast78.

Another example is that of nicotinic acetylcholine receptors (nAChRs), which are usually composed of 2 α- and 3 β-subunits. Experiments in Xenopus laevis oocytes and human embryonic kidney cells have shown that alteration of the stoichiometry of the subunits of nAChR results in alterations of functional properties of the receptors79,80.

Transcription regulators. Three or more copies of genes that encode transcription regulators might result in abnormal expression of downstream target genes, which in turn could have phenotypic consequences. A characteristic example is provided by the mouse BAC transgenic experiments using the clock gene, which encodes a basic helix-loop-helix-PAS domain transcription factor, a component of the circadian pacemaker system. Normal mice have mean circadian periods of 23.48 hours, whereas transgenic mice harbouring just 1–2 supernumerary but normal copies of the clock transgene on a BAC had significantly shorter circadian periods with a mean of 22.89 hours81.

A further example is the protein product of the mouse Bmi1 locus. Bmi1 is a component of the chromatin-associated polycomb complex and is involved in maintaining the transcriptionally repressed state of genes; it modifies chromatin in a heritable way. Transgenic mice that overexpress Bmi1 have dose-dependent skeletal anomalies, namely anterior transformation of vertebral identity82. In both examples, the phenotypic abnormalities are a consequence of the increased amount of a specific protein.

Cell surface receptors and ligands. Triplication of genes that encode receptors and ligands might also result in phenotypic differences. An illustrative example comes from Notch signalling in cell-fate determination in the epidermis of D. melanogaster. Cells that contained three copies of the wild-type Notch, which encodes a transmembrane receptor, always adopted an epidermal fate, whereas cells with two copies of Notch had a 50% chance of adopting neural or epidermal fate83.

A good example of dosage effects of genes encoding ligands is that of erythropoietin (EPO), which is the primary protein that regulates mammalian erythropoiesis. Transgenic mice with an extra copy of EPO develop POLYCYTHAEMIA with HAEMATOCRIT of 80% (normal 45%)84.

Transporter molecules. Transporters such as members of the transmembrane ABC transporter family could also have gene dosage effects. ABCA1 transporter affects intracellular cholesterol transport, and pathogenic mutations in this locus cause high-density lipoprotein deficiency type 1. BAC transgenic mice with an extra copy of ABCA1 have significantly increased cholesterol efflux in different tissues, and elevated high-density lipoprotein (HDL) cholesterol85.

Cell adhesion molecules. Cell adhesion molecules might also cause dosage-sensitive phenomena. For example, the rate of aggregation of synthetic vesicles carrying the neural cell adhesion molecule (NCAM) is proportional to the NCAM concentration raised to the power of 3.5. Therefore, a 50% increase or decrease in NCAM concentration causes a 4-fold increase or 90% decrease in cellular adhesiveness, respectively86.
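The quoted fold changes follow directly from a power law with exponent 3.5, as the short check below shows. The function name is hypothetical, and the only inputs are the exponent and the 50% changes stated in the text.

```python
def relative_aggregation_rate(fold_change: float, exponent: float = 3.5) -> float:
    """Aggregation rate relative to baseline when the rate scales as concentration**exponent."""
    return fold_change ** exponent


if __name__ == "__main__":
    up = relative_aggregation_rate(1.5)     # 50% more NCAM
    down = relative_aggregation_rate(0.5)   # 50% less NCAM
    print(f"1.5x NCAM -> {up:.2f}-fold rate")
    print(f"0.5x NCAM -> {down:.3f}-fold rate ({(1 - down) * 100:.0f}% decrease)")
```

A 1.5-fold concentration gives roughly a 4.1-fold rate, and a 0.5-fold concentration gives about 0.09 of the baseline rate, consistent with the approximately 4-fold increase and 90% decrease stated above.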

Morphogens. Alterations in the concentration of morphogens can have considerable effects on development. One of the first examples to be described is bicoid (bcd) in D. melanogaster. Embryos derived from mothers that carry one, two, four or six copies of bcd generated threshold concentrations of the homeodomain-containing BCD protein at progressively more posterior positions87. These threshold concentrations of a morphogen dictate distinct developmental outcomes as a function of distance from the source; in the case of the BCD protein gradient, the outcome is the development of the anterior body pattern of the fly.

Regulatory elements. Duplication of regulatory elements could result in altered gene expression and specific pathological phenotypes. For example, tandem duplication (owing to a transgene insertion) of a regulatory element 1 Mb upstream of the Shh locus in the Ssq mouse causes ectopic Shh expression, resulting in preaxial polydactyly10. Therefore, three copies of any functional element in the genome might be responsible for specific trisomy-related phenotypes.

Future directions of research

The dissection of the genomic infrastructure of HSA21, the appreciation of its genomic variability and the conservation of functional sequences in model organisms, provide new opportunities for understanding the function of the HSA21-encoded genes, and for explaining the molecular pathogenesis of different phenotypic manifestations of DS.

There are many important future research goals; those listed below are by no means a complete set. Functional analysis of all HSA21 genes is a priority, particularly in the context of development (timing and cellular specificity of expression). Studying HSA21 orthologues in model organisms would facilitate such analysis. The results might lead to the identification of candidate genes for specific DS phenotypes and for gene dosage sensitivity.

Determining the variation of gene expression in different tissues is crucial. Genes with considerable allelic variation of expression in the population will probably be identified. These genes are probable candidates for allele-specific contribution to specific DS phenotypes. At the same time, genes without polymorphic variability of expression are probably those that contribute in an allele-independent manner to DS phenotypes. Furthermore, genes that show considerable overlap of the RNA expression output between normal and trisomy tissues or cells could be excluded as candidates. However, alterations in gene expression measured at the transcript level might not always accurately reflect alterations in protein levels88. It is therefore necessary to validate the differences at the protein level, to study differences in the proteome, or both.
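The overlap criterion mentioned above, under which genes whose expression ranges overlap between normal and trisomy samples are weaker candidates, can be made concrete with a small sketch. The Python example below uses a deliberately crude measure (the fraction of trisomic measurements that fall within the observed euploid range) and entirely hypothetical expression values; a real analysis would use proper statistical tests and many more samples.

```python
def overlap_fraction(euploid, trisomic):
    """Fraction of trisomic measurements lying within the euploid range."""
    low, high = min(euploid), max(euploid)
    inside = sum(1 for value in trisomic if low <= value <= high)
    return inside / len(trisomic)


# Hypothetical expression levels (arbitrary units) for two genes.
# Gene A: little variation between individuals; trisomy gives ~1.5x.
gene_a_euploid = [95, 100, 105, 98, 102]
gene_a_trisomic = [148, 152, 150, 145, 155]

# Gene B: large variation between individuals; trisomy still gives ~1.5x.
gene_b_euploid = [60, 100, 140, 80, 120]
gene_b_trisomic = [90, 150, 210, 120, 180]

print("Gene A overlap:", overlap_fraction(gene_a_euploid, gene_a_trisomic))
print("Gene B overlap:", overlap_fraction(gene_b_euploid, gene_b_trisomic))
```

In this illustration both genes show the same 1.5-fold mean increase, but only the tightly regulated gene separates cleanly between the two groups; the variable gene shows substantial overlap.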

Genomic variation in cis that is responsible for gene expression variation will need to be uncovered. This variation could then be used in association studies to detect genes that are responsible for certain DS phenotypes. Initial association studies have been published for the heart defects that are associated with DS (Ref. 89).

Mouse models with trisomy of all HSA21 syntenic regions (that is, the syntenic segments of mouse chromosomes (MMU) 16, 17 and 10) should be created and their phenotypes characterized in detail. Mice that are trisomic for single genes should also be generated as they provide valuable information on certain candidate genes. Another informative approach involves 'subtracting' the third allele of a gene from the trisomy mouse using appropriate crosses with heterozygous or homozygous mice that carry a deletion of that gene.

The function of the numerous CNGs on HSA21 will need to be determined. Identification of CNGs with potential gene regulatory function will reveal candidate genomic elements for dosage imbalance. Trisomy for some of them or their genomic variation might be related to certain DS phenotypes. CNGs that are not regulatory might also be involved in trisomy 21-linked phenotypes, owing to as yet unknown mechanisms.

Determination of the non-HSA21 (trans-acting) genes or non-coding sequences that predispose to DS phenotypes will also be important. Association, transmission disequilibrium and other tests could be used with genome-wide markers to define genomic regions that harbour variability that contributes to the development of certain phenotypes. These genomic regions might also contribute to similar phenotypes in non-trisomy 21 individuals.
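As an illustration of one of the tests named above, the standard transmission disequilibrium test compares how often heterozygous parents transmit one allele rather than the other to affected offspring, using a McNemar-type statistic, (b - c)^2 / (b + c), with one degree of freedom. The Python sketch below assumes the simple biallelic, disomic case (applying it to markers on the trisomic chromosome itself would need further adaptation); the counts are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test for one biallelic marker.

    transmitted:   heterozygous parents who passed the test allele on
    untransmitted: heterozygous parents who passed the other allele on
    Returns the chi-square statistic (1 d.f.) and its p-value.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi_square = (transmitted - untransmitted) ** 2 / total
    # Upper-tail probability of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi_square, p_value = tdt(60, 40)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

In a genome-wide setting the same calculation would be repeated for every marker, so the resulting p-values would need correction for multiple testing.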

The extent of partial trisomy 21 and its correlation with the presence of certain phenotypic features of DS need to be re-evaluated. A complete tiling path of BAC or oligonucleotide array CGH, or other methods, could be used to determine precisely the extent of the trisomic region of HSA21, and therefore provide a comprehensive list of candidate genes (coding and non-coding) and conserved functional elements.
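To show how array CGH data could delimit a partial trisomy, the sketch below uses the fact that three copies against a two-copy reference give an expected log2 ratio of log2(3/2), about 0.585. It is a toy illustration in Python: the probe positions and ratios are hypothetical, and the fixed cutoff halfway between 0 and 0.585 is an assumption that stands in for the segmentation algorithms a real analysis would use.

```python
import math

EXPECTED_TRISOMY_LOG2 = math.log2(3 / 2)  # about 0.585


def trisomic_intervals(probes, cutoff=EXPECTED_TRISOMY_LOG2 / 2):
    """Return (start, end) positions of runs of consecutive probes whose
    log2 ratio exceeds the cutoff.

    probes: list of (position_mb, log2_ratio), sorted by position.
    """
    intervals = []
    start = end = None
    for position, ratio in probes:
        if ratio > cutoff:
            if start is None:
                start = position  # a new candidate trisomic run begins
            end = position
        elif start is not None:
            intervals.append((start, end))  # the run has ended
            start = None
    if start is not None:
        intervals.append((start, end))  # run extends to the last probe
    return intervals


# Hypothetical probes along a chromosome (position in Mb, log2 ratio).
probes = [(30.0, 0.02), (31.0, -0.05), (32.0, 0.55),
          (33.0, 0.61), (34.0, 0.58), (35.0, 0.03)]
print(trisomic_intervals(probes))
```

The resolution of the breakpoints is limited by probe spacing, which is why a complete tiling path matters for defining the candidate gene list.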

Relevant miRNAs and other non-coding RNAs and their targets need to be identified, and dosage sensitivity of the miRNAs and their potential involvement in certain DS phenotypes need to be evaluated.

Global gene expression differences between trisomy and eusomy states need to be evaluated in human cell lines and tissues, or in those of model organisms such as the mouse. Once defined, such differences could be used for diagnostic purposes, but they will be most important for the initial determination of biological processes that are dysfunctional in full or partial trisomy 21. The problem of variability of expression owing to polymorphisms could be avoided by studying tissues from discrepant identical twins (one twin with trisomy 21 and the other with a normal karyotype), or from clonal cell lines obtained from individuals with mosaic trisomy 21.

All functional genomic elements of HSA21 need to be identified. The exploration of the genomic sequences is far from complete. Multi-species comparative sequence analysis will provide initial evidence for additional protein-coding potential, ncRNAs and regulatory elements. Experimental analysis using whole chromosome approaches will hopefully provide convincing evidence for the diverse biological roles of a plethora of so far unknown functional regions.

Finally, the sequencing of the short arm of HSA21, or at least of the short arm of one acrocentric chromosome, awaits completion. A complete contig of these sequences might be difficult or impossible to obtain owing to the presence of multiple repeats in each short arm and the 'homogenization' of the short arms of the acrocentric chromosomes. It is likely, however, that important but as yet unknown functional elements (including genes) are present on these short arms.

Therapy for trisomy 21?

Our understanding of the molecular pathogenesis of DS is remarkably poor; it is therefore unlikely that an effective therapeutic intervention will be found within the next decade. However, attempts towards treatment could be made in model organisms or in cultured cells. For example, reducing the total transcript levels of the dosage-sensitive genes (perhaps by using small interfering RNA, siRNA) might alter cellular or whole-organism phenotypes. Another option could involve pharmacological interference with the dysregulated metabolic pathways or key target molecules that are crucial for phenotypic characteristics.

Ultimately, the elucidation of the phenotypic consequences of gene dosage imbalance in DS might provide new opportunities for therapeutic interventions.