Abstract
There has been an explosion of data describing newly recognized structural variants in the human genome. In the flurry of reporting, there has been no standard approach to collecting the data, assessing its quality or describing identified features. This risks becoming a rampant problem, in particular with respect to surveys of copy number variation and their application to disease studies. Here, we consider the challenges in characterizing and documenting genomic structural variants. From this, we derive recommendations for standards to be adopted, with the aim of ensuring the accurate presentation of this form of genetic variation to facilitate ongoing research.
Main
Structural variation in the genome refers to cytogenetically visible and (more commonly) submicroscopic variants, including deletions, insertions, duplications and large-scale copy number variants — collectively termed copy number variations (CNVs) — as well as inversions and translocations (Box 1)1,2,3. Genome scanning technologies are now commonplace in many laboratories, allowing new structural variation to be recognized from general population surveys4,5,6,7,8,9,10,11,12 or studies of diseases13,14,15,16,17,18,19,20,21. In fact, the Database of Genomic Variants4,22 (see list of databases in Table 1) already contains entries (mainly CNVs) covering some 538 Mb (18.8% of the euchromatic genome) derived from the study of fewer than 1,000 genomes from individuals with no obvious disease phenotype.
This first round of observations came from several studies, each using a different technology platform and data processing algorithms, with different degrees of pre- and postexperimental standardization and validation. As a result, the data vary in quality and often have both high false-positive and false-negative rates. There is the very real possibility of the entire human genome soon being presented as 'structurally variant' in one form or another, based solely on studies of nondisease samples, which would be a distortion. It will be important for all future applications of structural variation information that the scope and detail of variants in the general population be accurately cataloged. In particular, medical genetics research — investigating structural variation profiles in individuals or clinical cohorts — will need a reliable foundation against which to interpret possible pathogenic findings in cytogenomic (Fig. 1), linkage and genome-wide association studies21,23,24,25.
The field of genomic structural variation, however, is on the cusp of change. Pioneering approaches, often fragmented or fraught with technical limitations, are being supplanted by new technologies that afford much higher resolution screening of the genome at lower cost. We anticipate that, in the next year, the quantity of structural variation data will increase by orders of magnitude owing to microarray-based experiments alone, not to mention the plethora soon to flow from clone-end6,26 or whole-genome sequencing experiments27,28,29,30. Many of these studies will survey nondisease samples for structural variation discovery to create control databases. Moreover, in little more than two years from the first description of global CNV distribution4,5, the field is poised to make structural variation analyses standard in the design of all studies of the genetic basis of phenotypic variation. At this inflection point, we examine what is known about genomic structural variation, and consider perspectives and simple standards designed to safeguard integrity and maximize data utility for the immediate future.
Challenges in characterizing structural variants
Research into structural variation is currently at a state of development comparable to that of the earliest SNP studies. Initiatives to discover and characterize simpler structural variants — such as small insertions, deletions (indels) and balanced inversions — are likely to yield results in proportion to investment, as was the case for SNPs31,32,33. However, for larger and particularly for more complex structural variants, there are additional confounding factors. To provide a framework for discussion of prospective standards, we group into five categories the major issues currently curbing progress in this field. Data quality, which cuts across all of these issues, is discussed in the subsequent subsection. The majority of the discussion pertains to the variants classed as CNVs, as these represent the predominant form studied to date. Our comments also mostly target issues related to whole-genome discovery surveys.
Terminology. The newly recognized domain of structural variation is blurring the distinction between traditional cytogenetic and molecular analyses, as it fills the (albeit narrowing) gap between the limits of resolution of these earlier approaches to genetic variation (Fig. 1). Terminology established within each camp is sometimes unwieldy in the crossover (Box 1). Moreover, there is no standard nomenclature for structural variants that fall between those that can be classified by naming systems established from the cytogenetic34,35 or mutation literature36 (for example, indels). For some terms, such as CNV, there is added complication because they are being used regularly as a descriptor in both control and disease studies, but with different meaning. Different classes of CNVs are described in Redon et al.11 and in Supplementary Figure 1 online. Nomenclature for genes encompassed by structural variants also needs to be considered, but no rules have yet been established.
Annotating complex structural variants. Many structural variants are large and are flanked by, or encompass, complex repetitive DNA sequences. They may be unbalanced in content or highly polymorphic, characteristics that pose significant challenges for detection and analysis. There are many complexities associated with classifying and characterizing CNVs (Supplementary Figs. 1, 2 and 3 online). As the precise rearrangement breakpoints are usually not resolved (because of coincidence with large repeats or because of low resolution coverage of assays), it is typically not possible to determine whether the underlying variants are identical by descent or represent independent events in close proximity to one another. Regions of high sequence identity may also cause cross-hybridization on comparative genome hybridization (CGH) platforms, leading to CNV calls in regions that are not actually variable (Supplementary Fig. 3). Determining the meiotic and mitotic characteristics of these variants — such as the de novo mutation rate, stability and level of mosaicism — can also be confounded not only by the complex nature of the underlying sequences but by technical and comparative limitations, including the source of the DNA (described below).
Technological limitations. At present, no single approach identifies all types of structural variation. Current scans of genome-wide structural variation are screening or discovery assays, and not definitive tests. In our hands, the testing of a single sample by different platforms and 'call' algorithms can lead to substantially different CNV call rates, owing to differing sensitivity, specificity, probe density and type of probe used (Table 2 and Supplementary Table 1 online). This matter is underscored by the relatively small degree of overlap among published datasets2,37, even when assessing identical samples7,9,10,11. The progress on CNV discovery to date is largely due to the availability of numerous microarray platforms, which detect quantitative imbalances. In contrast, there is currently no high-throughput, cost-effective method to scan the genome for inversions or translocations. Short of comparing 'finished' sequence assemblies from independent sources38,39, it can take a multitude of approaches to identify, validate and sequence the compendium of structural variation comprehensively (Table 3 and Supplementary Table 2 online). Other issues, such as relative costs of arrays and reagents and availability of specialized equipment, often limit access to the most appropriate experiments.
Characteristics of reference and test samples. Identification of variation requires comparison to either a reference DNA source4,5,11,40,41, a reference dataset11 or a reference genome sequence6,39,42, which has implications for experimental design and interpretation of results43. For example, at present, no standardized 'reference' control DNA has been adopted for laboratory experiments, and in some cases, 'pools' of samples or datasets are used to represent an averaged genome (Table 2). This lack of standard reference genomes can complicate both the designation of relative copy-number differences among samples from different projects and the standardization of databases (Table 1) that contain information about structural variants. Specifically, if in a single experiment it is impossible to distinguish a loss in the test sample from a gain in the reference sample, then two different studies may report the same CNV as a relative gain or loss (duplication or deletion), respectively. Moreover, using pools of DNA or their intensity outputs as hybridization controls or in comparative intensity analysis (Table 2) may lead to a decreased power to detect variants in highly polymorphic regions of the genome. In these regions, the pool will represent an intermediate between the polymorphic and nonpolymorphic states, resulting in smaller relative difference in intensity than a nonpolymorphic single reference would yield. In terms of annotating variants, the relative nature of CNV determination can pose a problem, as it leads to an overestimation of regions with both apparent gains and losses.
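The gain-versus-loss ambiguity and the dilution effect of pooled references can both be illustrated with a toy log2-ratio calculation (the copy numbers below are hypothetical, not taken from any cited study):

```python
import math

def log2_ratio(test_cn, ref_cn):
    """Log2 intensity ratio of test vs reference copy number (diploid = 2)."""
    return math.log2(test_cn / ref_cn)

# A one-copy loss in the test sample against a diploid reference...
loss_in_test = log2_ratio(1, 2)   # -1.0
# ...is indistinguishable from a diploid test against a duplicated reference.
gain_in_ref = log2_ratio(2, 4)    # -1.0
assert loss_in_test == gain_in_ref

# Pooled references dilute the signal at polymorphic loci: if half the pool
# carries a one-copy deletion, the pool averages 1.5 copies, and a deleted
# test sample shows a smaller relative difference than against a single
# diploid reference.
pooled = log2_ratio(1, 1.5)       # ~ -0.585
single = log2_ratio(1, 2)         # -1.0
assert abs(pooled) < abs(single)
```

In a single experiment, only the ratio is observed, which is why two studies can legitimately report the same locus as a gain or a loss.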
Ultimately, the underlying sequence characteristics of any newly identified structural variant will be compared to the human genome reference assembly. The latest release from the US National Center for Biotechnology Information (NCBI), called Build 36, is a mosaic of some 708 different sources1, and covers mainly the euchromatic portion of the genome, with some 302 known gaps (http://www.ncbi.nlm.nih.gov/). Concomitance of incomplete or falsely merged regions of the reference assembly with the position of structural variants can confound comparisons of one against the other44,45. Moreover, as many technologies use the NCBI reference sequence to guide product development, structural variants residing in the unannotated segments of the human genome may be missed (Supplementary Fig. 2). Test samples can also be from a mix of untransformed or transformed tissues, all of which affects interpretation11,46. Finally, samples used to discover structural variants from control populations may have little or no genetic (for example, parent of origin) information or phenotypic assessment protocols attached to them. So, despite common presumptions, any variant described by such studies is not necessarily either neutral or benign.
Database issues. The main sources of information for human structural variation are the Database of Genomic Variants and the Human Structural Variation Database. Both are currently limited, in that variants are simply represented as they are described in publications and overlaid on the current reference assembly, without precise location of most breakpoints. There are some unpublished data at these sites, but so far there is no active effort to standardize CNV calling or characteristics through reexamination of the original primary data. Moreover, as the human reference assembly is updated in subsequent assemblies, sites of apparent structural variation can disappear and reappear, presenting a challenge for database management. Although Ensembl and UCSC Genome Browser display data from the Database of Genomic Variants, there is currently no standard requirement to submit published structural variants to any database. Further, there is no system for naming structural variants with unique accession numbers, and surprisingly, only a proportion of studies post their raw or underlying data, and full method of interpretation, for public access.
There are also many challenges in the layout and visualization of the data. For example, it is current practice to display structural variants using estimates of start- and end-points when the breakpoint(s) are suboptimally resolved. When there are two or more overlapping variants originating from the same study, they are sometimes grouped together even if they are not identical11, and misgrouping can occur, particularly near segmental duplications. Moreover, as the number of surveys continues to grow, the CNVs discovered will become more redundant.
Presenting structural variation data in relation to the reference assembly can also be problematic1,39 because the standard browsers were not designed to display these data. This issue notwithstanding, smaller variants (usually <10 kb) are present in NCBI's dbSNP, and a goal of the Human Structural Variation Database is to integrate structural variation data, such as fosmid paired-end sequences6, with the NCBI human reference sequence (including those regions not represented in the current assembly)26. The Database of Genomic Variants will continue to display structural variation data originating from nondisease-defined samples, but stricter criteria for inclusion, as well as assessment and annotation of the quality standards described below, will become critical aspects of the curatorial process.
Content and quality of early studies of structural variants
To assess current practices in collection and validation of discovery data, we review and comment on 12 experimentally diverse and highly cited studies, each undertaken to search for structural variation in the human genome. In Table 3 and Supplementary Table 2, we summarize selected parameters and the strengths and weaknesses of these studies.
Genomes surveyed and reference samples. The number of genomes investigated in each study ranged from one (in sequence comparisons to reference assemblies6,39) to 270 (in three studies of the HapMap collection9,10,11). Appropriate attention was given to samples being from unrelated individuals or from families, and ethnic diversity was usually noted. Tissue sources of DNA were heterogeneous, and whether or not they were transformed or cultured was inconsistently documented. Phenotypic information would generally have been unknown, or assumed to be unremarkable (from 'healthy volunteers'), although Iafrate et al. included samples with known karyotypic abnormalities as controls4, and Wong et al. used some material from cancer programs41. Each study used different reference sample(s) for genome comparison. One used pooled DNA4, three compared to the reference human genome assembly6,39,42, one made a variety of comparisons5 and the other CGH approaches each used a different single male reference sample. Future studies will increase the variety of genomes surveyed, and these would benefit from a consensus standard of documented information about their sources. In contrast, a smaller number of reference sequences would facilitate the process of collective documentation.
Primary discovery methods. Table 3 is organized according to the methods used to search for structural variants. The upper portion includes seven studies that employed CGH, each with a different array platform, encompassing a range of probe size, complexity and resolution. One approach9,40 targeted regions associated with segmental duplications, but the rest spanned the genome, with arrays carrying from 2,000 up to about 26,000 clones in genome tiling-path arrays11,41. Redon et al.11 added a second complementary screening strategy based on relative fluorescence intensities with arrays designed originally for SNP genotyping. The lower portion of Table 3 summarizes five studies with completely different strategies, based on genomic sequence comparisons. These studies used existing data from either the reference human genome sequence6,39,42 or the HapMap project7,10 to mine for deletions and other relatively small structural rearrangements. The fosmid-based approach6 and sequence comparison39 were able to discern orientational as well as quantitative variants.
Experimental quality controls. Before structural variants can be revealed by genome comparisons, positive data arising from other biological or technical causes need to be filtered. Biological differences that were variously accounted for among these studies include (i) male-female X and Y chromosome dosage differences9,11,40, (ii) somatic rearrangements of the immunoglobulin genes5,11, (iii) cell-culture artifacts such as mosaic trisomies46 and (iv) results of genomic instability of virus-transformed cell lines11. Similarly, any variation relative to a reference human genome sequence in the computational approaches must be interpreted in light of the known gaps and potential assembly artifacts1,6,39.
As these screening strategies are themselves biological, with associated technical artifacts, replication is the most important experimental tool for assessing the validity of observations, and it took many forms among these studies. Within each CGH array, clones were typically in duplicate or triplicate. Interexperimental replication involved ostensibly the same conditions and/or an experimental alternate, such as 'dye-swap' of the two fluorochrome labels between the test and reference samples. The means of dealing with discordant replicates was inconsistent among the studies, and sometimes difficult to discern from the publications. In most studies4,9,11,40, discordant dye-swap results were eliminated, but in Wong et al.41, only 20% of samples were assayed in both orientations. Within each study, experiments also showed variable background 'noise', and some studies repeated and/or deleted individual assays that did not meet a defined quality threshold. When sources of 'noise' are nonrandom, replication alone will reproducibly yield false positive calls, which argues for replication by diverse methods.
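As a sketch of how discordant dye-swap replicates might be filtered, the function below keeps only clones whose log2 ratio exceeds a cutoff in both orientations and flips sign between them (the 0.3 threshold and the clone names are hypothetical; each platform in practice sets its own cutoff):

```python
def concordant_calls(forward, swapped, threshold=0.3):
    """Keep only calls concordant across a dye-swap replicate pair.

    forward, swapped: dicts mapping clone ID -> log2 ratio. When the two
    fluorochromes are exchanged, a true variant's ratio should roughly
    negate; calls failing either the magnitude or the sign-flip test are
    discarded, as several of the reviewed studies did.
    """
    kept = []
    for clone, r1 in forward.items():
        r2 = swapped.get(clone)
        if r2 is None:
            continue  # clone not assayed in the swapped orientation
        if abs(r1) > threshold and abs(r2) > threshold and r1 * r2 < 0:
            kept.append(clone)
    return kept

calls = concordant_calls(
    {"RP11-1": 0.8, "RP11-2": 0.5, "RP11-3": 0.9},
    {"RP11-1": -0.7, "RP11-2": 0.6, "RP11-3": -0.1},
)
assert calls == ["RP11-1"]  # RP11-2 fails the sign flip; RP11-3 the threshold
```

Note that this filter trades sensitivity for specificity, and, as argued above, cannot remove nonrandom noise that reproduces across replicates.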
Other controls showed the effectiveness of the respective screening methods. Self-versus-self hybridization was used4,5,9,40 to estimate somatic effects and/or numbers of false positive calls. Two studies assayed samples with previously characterized imbalances4,40. Sharp et al.40 showed the enhanced (11-fold) effectiveness of their targeted 'hot spot' array relative to a genome-wide assay. Redon et al.11 evaluated concordance between their two primary platforms and undertook numerous technical replicates.
Each study defined its own algorithm for 'calling' differences between sample and reference as putative structural variants. As with all screening assays, they were driven to optimize both sensitivity and specificity of the ascertainment, but approaches to this balance differed. Redon et al.11 set parameters in their algorithm to allow fewer than 5% false positive 'calls' per experiment. Other studies set thresholds and assessed numbers of false positives retrospectively. Some reported these type I errors in relation to the number of clones in the array4,40,41 and others relative to the proportion of positive calls5,7, prohibiting a direct comparison of specificity among the various studies. Sensitivity was harder to assess, and arguably impossible without knowledge of the true (or at least gold standard–based data) underlying numbers of structural variants. Estimates ranged from 5% false negatives9 to 50% power to detect 25-kb deletions7, but sensitivity was generally compromised in favor of specificity.
Structural variants identified. Assay design had a strong impact on the type and size of structural variants detected (Fig. 1, Supplementary Fig. 2 and Table 2). All revealed quantitative variation (gains or losses), but three recognized only deletions7,8,10, and two could also detect evidence of inversions6,39. Sizes of variant segments could be as small as 1 bp with computational alignments39,42 (though many of these were smaller than our defining size threshold of 1 kb1). Small deletions were detected through haploid hybridization (70 bp–10 kb)8 or oligonucleotide (SNP) footprints (1–404 kb)7 (1–745 kb)10, and the fosmid approach revealed variants in the range of library inserts (40 kb)6. Array methods approached the larger end of the spectrum for CNVs (collectively, about 50 kb–1 Mb)4,5,9,11,40,41. BAC clone probes tend to initially overestimate the apparent size of variants, as the clones may be large relative to the variant segment(s) they harbor, and the more sensitive the platform, the greater the overestimation11,47. Oligonucleotide arrays, on the other hand, approach the boundaries of variable segments from within, and should provide more accurate size estimates as long as the region has sufficient probe density.
The architecture of a variant region can influence its apparent size. Independently discrete genomic segments whose borders overlap can form a variable region characterized as much larger than its component variants, or containing complex rearrangements of smaller independently variable elements (Supplementary Figs. 1 and 3). As a result, definitions of overlap, variants, variant regions, merged variants, locations and so forth have been discretionary and varied. The field is probably ready for functional consensus in this area.
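One simple, discretionary merging rule — join any calls that overlap — can be sketched as follows; it also shows how a merged region comes to look larger than any of its component variants (coordinates are invented for illustration, and this is only one of the possible definitions awaiting consensus):

```python
def merge_variant_regions(variants):
    """Merge overlapping variant calls into variant regions.

    variants: list of (chrom, start, end) tuples. Any overlap between
    sorted calls on the same chromosome extends the growing region.
    """
    regions = []
    for chrom, start, end in sorted(variants):
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            prev = regions[-1]
            # Overlaps the growing region: extend its right edge.
            regions[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            regions.append((chrom, start, end))
    return regions

calls = [("chr7", 100, 250), ("chr7", 200, 400), ("chr7", 900, 950)]
# Two overlapping 150-200 bp calls merge into one 300 bp region.
assert merge_variant_regions(calls) == [("chr7", 100, 400), ("chr7", 900, 950)]
```

Alternative rules (requiring reciprocal overlap above some fraction, say, or matching variant type) would partition the same calls into different regions, which is exactly why the definitions need to be stated explicitly in each study.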
The earliest surveys reported about 100 variants or regions4,5; more recently, Wong et al. reported a disproportionate 3,654 CNVs, of which only 800 were considered 'high frequency' and more likely to be true positives41. Sequence comparisons flagged many more thousands of sites39,42, albeit ones that were much smaller and often reflected sequence assembly artifacts. Each of the 12 studies in Table 3 added a majority of apparently new variant loci, though as the catalog of genomic structural variants accumulates, the number of such new additions will eventually plateau.
Validation of putative structural variants. We reemphasize that the discovery strategies in Table 3 are screening tests, which draw attention to genome segments with an increased probability of harboring true structural variation. Eventually, comprehensive sequence data will document the breadth and detail of each variable region and individual variant, as illustrated by fosmid insert sequence data6 and direct sequence assembly comparisons39. In the meantime, various validation strategies have been applied to subsets of putative variants in each of the discovery reports. These included (i) FISH of metaphase, interphase or fiber chromosomes using various clones or PCR-amplified molecules; (ii) PCR or quantitative PCR (qPCR) for allele loss or quantitative variation; (iii) multiple ascertainment, whereby considerable weight was given to whether or not a putative variant was seen in more than one individual or had been reported in previous studies; (iv) array CGH to validate computational screening results6,7 or for finer resolution of BAC-screening results by oligonucleotide arrays9,41; (v) sequence analysis of fosmid inserts to confirm calls and to assess some discordant ones6,9; (vi) allele-specific fluorescence intensities10 and (vii) familial clustering41.
These assays were variously applied to subsets of data, and outcomes were used effectively in some studies7,10,11 to further evaluate the sensitivity and specificity and/or error rates of the primary screening methods. The proportion of putative variant loci that have been individually validated by means other than multiple ascertainments remains small, presumably due to the technical challenges of the confirmatory tests. All studies provided some information about the frequency of each putative structural variant or region, both as an argument for validation and to characterize the findings. A growing consensus in the field is for more validation of variants using two or more technologies.
Recommendations for standards
Based on our enumeration of the challenges facing this new field and a thorough review of published experimental designs, we provide four broad guidelines that follow the natural progression of experimentation as an initial step toward the development of standards. As the field matures, these guidelines should serve as precursors to stricter standards that undergo regular and comprehensive vetting by the community48. We are struck by the resemblance to issues raised by the MIAME (minimum information about a microarray experiment) standards49, as well as by Lander and Kruglyak50, with recommendations to find the right balance of stringency and value judgment to avoid as much error as possible without delaying discovery. The latter paper's recommendations for modifiers (suggestive, significant, highly significant and confirmed) might well be adapted for the statistical annotation of structural variants in databases.
In their current form, the recommended standards could also serve as a checklist for reviewers and editors as they assess manuscripts that report structural variation data. Moreover, as more structural variation data are reported and the nature of the variants becomes better understood, curators of databases would be at greater liberty to accept or reject complete or partial datasets according to established quality thresholds.
1. Describing the sample. The study should report the origin of each sample (for example, new or from a repository) and all of its characteristics, including the source (for example, blood, cell line, tissue) and karyotypic status, as well as the age, sex, ethnicity and phenotype (disease or nondisease features) of the donor. For surveys aiming to capture structural variation from the general population for control databases, there should be particular emphasis on detailing the extent of phenotype investigation. The study should also accurately document the genetic relationship of samples and any manipulation of the samples such as cell-culturing conditions or whole genome amplification, including protocols for extracting and labeling samples. Previous publications using the sample and all associated aliases should be listed.
2. Reporting experiments. Upon publication, the researchers must declare all aspects of the experimental design and results, including the experimental platform (for example, all clone or sequence identifiers used in arrays), technical procedures, data extraction and processing protocols, the version of the reference genome sequence used for comparison or annotation, and all validation results. The information must be made available in a format that enables unambiguous interpretation, replication of the experiment and the opportunity for other researchers to reanalyze the data to verify the conclusions48,49. For example, many array CGH experiments are performed using different test and reference samples, a variable number of spot replicates and differential use of dye-swap replicates. These methodological details affect the interpretation of the data and inferences regarding the presence or absence of a particular structural variant. Most new structural variation data are currently being generated using microarrays; therefore, suitable repositories include the Gene Expression Omnibus (GEO)51, ArrayExpress52 and CIBEX53 databases. As more sequence data emerge in structural-variation discovery initiatives, it is important that the underlying sequences and traces be made publicly available. Similarly, methodological differences exist in alignment algorithms; in addition to simple lists of sequence differences between assemblies or traces, the underlying alignments from which these events were called should be available.
3. Quality control. All studies should apply stringent criteria to ensure an accurate empirical estimation of the performance of the detection protocol used. Ideally, the parameters of the detection should be calibrated using a limited set of test data to achieve an acceptable level of false positives among the called regions. There are several metrics for this estimation, for example, the false discovery rate54. Parameters should be set to maximize screening specificity (minimize false positive calls) without undue compromise to sensitivity. To simplify this process, we recommend that all studies include at least one (and preferably more) standard control sample to be used as a reference for comparison. Initially, we propose sample NA15510 from the US National Institute of General Medical Sciences (NIGMS) Human Genetic Cell Repository, as it has already been characterized using a number of platforms (Table 2), and is also now being sequenced. A second reference sample could be NA10851, as it has also been characterized extensively11.
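One crude way to estimate a per-experiment false discovery rate is to treat every call made in a self-versus-self hybridization as a false positive; the sketch below (with invented counts) shows this estimator, subject to the caveat in the text that nonrandom noise violates its assumptions:

```python
def empirical_fdr(null_calls, test_calls):
    """Estimate a per-experiment false discovery rate.

    null_calls: number of calls made when a sample is hybridized against
    itself; by construction, all are false positives.
    test_calls: number of calls in a real test-vs-reference experiment.
    This is only a crude estimator: study-wide error rates still need
    direct validation, and nonrandom noise inflates the true FDR.
    """
    if test_calls == 0:
        return 0.0
    return min(1.0, null_calls / test_calls)

# Hypothetical counts: 12 self-self calls vs. 300 calls in a real comparison.
fdr = empirical_fdr(12, 300)
assert abs(fdr - 0.04) < 1e-12  # under a 5% target, the threshold would be kept
```

If the estimated rate exceeded the target, calling parameters would be tightened and the calibration repeated, which is precisely why confirming the estimate with an independent technology matters.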
In addition to calibrating the parameters used for CNV calling, the quality of the total set of variants called across the entire sample set should be assessed. This requires unbiased sampling of the putative variants to be validated: that is, not just assessing those called most frequently, but ensuring representation of the entire frequency distribution. Good examples from the different experimental approaches outlined in Table 3 include validation of singleton and nonsingleton error rates11, estimation of fosmid read-pair error rates by sequencing the fosmid6 and estimation of error rates using a secondary technology such as oligonucleotide arrays7. It should no longer be considered sufficient to estimate the error rates by extrapolating from self-self experiments, without confirming that the estimated error rates were indeed correct and investigating how individual experimental error rates translate into study-wide error rates.
4. Describing structural variants. The study should thoroughly report characteristics of the structural variants, including sequence content (start and end points or complete sequence content with appropriate annotation), and population frequency and distribution (if known), including samples and assays used to determine these parameters. A future challenge will be to develop standards for defining CNV regions (CNVRs)—merging data from different individuals and different surveys into a single set of CNVRs. The ideal situation would be that each 'called' CNVR has an audit trail of both the experimental data and the processing of the data to the final call. Robust documentation of standardized CNVRs in databases will require specific rules to be established; although specifying those rules is beyond the scope of this Perspective, we hope it will stimulate future discussion. For CNVs and CNVRs, the definitions and criteria used by Redon et al.11 offer a good framework to build on (also see Supplementary Fig. 1). The current limitations in breakpoint resolution make it difficult to assign specific accession numbers to CNVs. However, once structural variants are described with boundaries mapped at nucleotide resolution, identifiers should be assigned using a nomenclature similar to that currently used for SNPs.
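A minimal record structure capturing the reporting items recommended above might look as follows (the field names and the accession format are hypothetical illustrations, not an established schema):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CNVRegion:
    """Sketch of a CNVR record carrying an audit trail of its evidence."""
    accession: str                  # SNP-style identifier, once breakpoints resolve
    chrom: str
    start: int                      # estimated boundary; see resolution field
    end: int
    breakpoint_resolution: str      # e.g. "BAC-clone", "oligonucleotide", "sequence"
    variant_type: str               # "gain", "loss" or "gain/loss"
    frequency: Optional[float]      # population frequency, if known
    evidence: List[str] = field(default_factory=list)  # assays behind the call

region = CNVRegion("cnvr-0001", "chr7", 100_000, 400_000,
                   "BAC-clone", "loss", 0.12, ["array CGH", "qPCR"])
assert region.end - region.start == 300_000
```

Keeping the discovery assay, validation assays and boundary resolution alongside each call is what would let curators later re-merge or re-annotate regions as standards firm up.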
Summary and the future
Many of the issues confronting the field of structural variation will be resolved as advances in technology allow robust and economical analysis of structural variants at the nucleotide level in multiple genomes. Such techniques will include 'tiling path'-coverage oligonucleotide arrays, paired-end sequence relationship comparisons, and partial or complete sequence assembly comparisons. The ultimate standard will be sequence resolution of all structural variation in a defined set of reference individuals to establish a benchmark for genotyping platforms. We do not foresee that any one approach will capture all genetic variation reliably, nor, for at least a few more years, will a single strategy predominate over microarray-based approaches. Therefore, the main challenges from this point onward will surely include managing a huge data volume, integrating information from various discovery platforms and discerning phenotypic implications. New issues will arise, such as how to best annotate structural variation data in individual diploid genome assemblies (arising from personalized sequencing projects), as well as how to put haplotypes of structural variants (with or without SNPs) into context with respect to the latest human reference sequence. Structural variation data should also assist SNP, linkage disequilibrium and gene expression determination, but new database tools will be required to fully interpret the data.
Structural variation discoveries offer the potential to bridge a longstanding gap between cytogenetic and sequence-based investigations and to unify our understanding of genetic variation. Interestingly, at the outset of writing, we tried to sidestep the topic of terminology (and nomenclature) but kept returning to it in one way or another as we worked to define and distill the breadth of issues before us. In fact, it was the issue of terminology that highlighted the extreme heterogeneity of the data being published, with the strengths, caveats and differences among studies attributable in part to the different backgrounds of the researchers involved.
An equally intricate issue for future data integration will be categorizing structural variants as 'normal', 'disease-causing' or 'phenotype-associated', as these designations can lie along a continuum1,24,55,56. In Table 4, we propose annotation modifiers that will help to maximize the utility of structural variation information. Molecular cytogeneticists have always faced this dilemma, with particular implications in the prenatal or diagnostic setting. Now, with the ability to readily recognize submicroscopic and sequence-level variation, the question of how to differentiate benign from disease-associated structural changes will become increasingly important. There are already well-defined examples in which the presence of a structural variant correlates directly with a syndrome or phenotype, such as the many dosage-related microdeletions and duplications that cause genomic disorders57,58,59,60,61,62,63 (see also the DECIPHER database). Family-based studies can demonstrate whether a change is de novo or inherited and, in the latter case, whether there are likely to be associated phenotypic consequences (noting that there are numerous examples of variable expression of phenotype and disease in inherited chromosomal rearrangements)1,21,55. Otherwise, large population studies and control and disease reference databases will provide the best source of information about a structural variant's frequency and its likelihood of causing a phenotypic outcome.
Notwithstanding the challenges, we believe that the recommendations presented here offer necessary first steps toward standardization of many of the variables that, if ignored, will impede progress. At the same time, we recognize that consensus is important and that standards require time to mature before adoption and implementation48. With some ground rules now set, we intend to continue discussions with the genomic structural variation research community at relevant meetings.
Note: Supplementary information is available on the Nature Genetics website.
References
Feuk, L., Carson, A.R. & Scherer, S.W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97 (2006).
Freeman, J.L. et al. Copy number variation: new insights in genome diversity. Genome Res. 16, 949–961 (2006).
Sharp, A.J., Cheng, Z. & Eichler, E.E. Structural variation of the human genome. Annu. Rev. Genomics Hum. Genet. 7, 407–442 (2006).
Iafrate, A.J. et al. Detection of large-scale variation in the human genome. Nat. Genet. 36, 949–951 (2004).
Sebat, J. et al. Large-scale copy number polymorphism in the human genome. Science 305, 525–528 (2004).
Tuzun, E. et al. Fine-scale structural variation of the human genome. Nat. Genet. 37, 727–732 (2005).
Conrad, D.F., Andrews, T.D., Carter, N.P., Hurles, M.E. & Pritchard, J.K. A high-resolution survey of deletion polymorphism in the human genome. Nat. Genet. 38, 75–81 (2006).
Hinds, D.A., Kloek, A.P., Jen, M., Chen, X. & Frazer, K.A. Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat. Genet. 38, 82–85 (2006).
Locke, D.P. et al. Linkage disequilibrium and heritability of copy-number polymorphisms within duplicated regions of the human genome. Am. J. Hum. Genet. 79, 275–290 (2006).
McCarroll, S.A. et al. Common deletion polymorphisms in the human genome. Nat. Genet. 38, 86–92 (2006).
Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444–454 (2006).
Simon-Sanchez, J. et al. Genome-wide SNP assay reveals structural genomic variation, extended homozygosity and cell-line induced alterations in normal individuals. Hum. Mol. Genet. 16, 1–14 (2007).
Vissers, L.E. et al. Array-based comparative genomic hybridization for the genomewide detection of submicroscopic chromosomal abnormalities. Am. J. Hum. Genet. 73, 1261–1270 (2003).
Locke, D.P. et al. BAC microarray analysis of 15q11-q13 rearrangements and the impact of segmental duplications. J. Med. Genet. 41, 175–182 (2004).
Shaw-Smith, C. et al. Microarray based comparative genomic hybridisation (array-CGH) detects submicroscopic chromosomal deletions and duplications in patients with learning disability/mental retardation and dysmorphic features. J. Med. Genet. 41, 241–248 (2004).
de Vries, B.B. et al. Diagnostic genome profiling in mental retardation. Am. J. Hum. Genet. 77, 606–616 (2005).
Koolen, D.A. et al. A new chromosome 17q21.31 microdeletion syndrome associated with a common inversion polymorphism. Nat. Genet. 38, 999–1001 (2006).
Sharp, A.J. et al. Discovery of previously unidentified genomic disorders from the duplication architecture of the human genome. Nat. Genet. 38, 1038–1042 (2006).
Shaw-Smith, C. et al. Microdeletion encompassing MAPT at chromosome 17q21.3 is associated with developmental delay and learning disability. Nat. Genet. 38, 1032–1037 (2006).
Urban, A.E. et al. High-resolution mapping of DNA copy alterations in human chromosome 22 using high-density tiling oligonucleotide arrays. Proc. Natl. Acad. Sci. USA 103, 4534–4539 (2006).
Szatmari, P. et al. Mapping autism risk loci using genetic linkage and chromosomal rearrangements. Nat. Genet. 39, 319–328 (2007).
Zhang, J., Feuk, L., Duggan, G.E., Khaja, R. & Scherer, S.W. Development of bioinformatics resources for display and analysis of copy number and other structural variants in the human genome. Cytogenet. Genome Res. 115, 205–214 (2006).
Cooper, G.M., Nickerson, D.A. & Eichler, E.E. Mutational and selective effects on copy-number variants in the human genome. Nat. Genet. 39, S22–S29 (2007).
Lee, C., Iafrate, A.J. & Brothman, A.R. Copy number variations and clinical cytogenetic diagnosis of constitutional disorders. Nat. Genet. 39, S48–S54 (2007).
McCarroll, S.A. & Altshuler, D.M. Copy-number variation and association studies of human disease. Nat. Genet. 39, S37–S42 (2007).
Eichler, E.E. et al. Completing the map of human genetic variation. Nature 447, 161–165 (2007).
Shendure, J., Mitra, R.D., Varma, C. & Church, G.M. Advanced sequencing technologies: methods and goals. Nat. Rev. Genet. 5, 335–344 (2004).
Bennett, S.T., Barnes, C., Cox, A., Davies, L. & Brown, C. Toward the $1,000 human genome. Pharmacogenomics 6, 373–382 (2005).
Bentley, D.R. Whole-genome re-sequencing. Curr. Opin. Genet. Dev. 16, 545–552 (2006).
Service, R.F. Gene sequencing. The race for the $1000 genome. Science 311, 1544–1546 (2006).
Altshuler, D. et al. An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature 407, 513–516 (2000).
Mullikin, J.C. et al. An SNP map of human chromosome 22. Nature 407, 516–520 (2000).
Sachidanandam, R. et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928–933 (2001).
Report of the Standing Committee on Human Cytogenetic Nomenclature, ISCN 1985. An International System for Human Cytogenetic Nomenclature. Birth Defects Orig. Artic. Ser. 21, 1–117 (1985).
Heim, S. Genetic nomenclature: ISCN and ISGN. Pediatr. Hematol. Oncol. 13, iii (1996).
den Dunnen, J.T. & Antonarakis, S.E. Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum. Mutat. 15, 7–12 (2000).
Eichler, E.E. Widening the spectrum of human genetic variation. Nat. Genet. 38, 9–11 (2006).
Istrail, S. et al. Whole-genome shotgun assembly and comparison of human genome assemblies. Proc. Natl. Acad. Sci. USA 101, 1916–1921 (2004).
Khaja, R. et al. Genome assembly comparison identifies structural variants in the human genome. Nat. Genet. 38, 1413–1418 (2006).
Sharp, A.J. et al. Segmental duplications and copy-number variation in the human genome. Am. J. Hum. Genet. 77, 78–88 (2005).
Wong, K.K. et al. A comprehensive analysis of common copy-number variations in the human genome. Am. J. Hum. Genet. 80, 91–104 (2007).
Mills, R.E. et al. An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Res. 16, 1182–1190 (2006).
Carter, N.P. Methods and strategies for analyzing copy number variation using DNA microarrays. Nat. Genet. 39, S16–S21 (2007).
Cheung, J. et al. Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence. Genome Biol. 4, R25 (2003).
Bailey, J.A. & Eichler, E.E. Primate segmental duplications: crucibles of evolution, diversity and disease. Nat. Rev. Genet. 7, 552–564 (2006).
Risin, S., Hopwood, V.L. & Pathak, S. Trisomy 12 in Epstein-Barr virus-transformed lymphoblastoid cell lines of normal individuals and patients with nonhematologic malignancies. Cancer Genet. Cytogenet. 60, 164–169 (1992).
Carson, A.R., Feuk, L., Mohammed, M. & Scherer, S.W. Strategies for the detection of copy number and other structural variants in the human genome. Hum. Genomics 2, 403–414 (2006).
Burgoon, L.D. The need for standards, not guidelines, in biological data reporting and sharing. Nat. Biotechnol. 24, 1369–1373 (2006).
Brazma, A. et al. Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nat. Genet. 29, 365–371 (2001).
Lander, E. & Kruglyak, L. Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat. Genet. 11, 241–247 (1995).
Barrett, T. et al. NCBI GEO: mining tens of millions of expression profiles—database and tools update. Nucleic Acids Res. 35, D760–D765 (2007).
Parkinson, H. et al. ArrayExpress—a public database of microarray experiments and gene expression profiles. Nucleic Acids Res. 35, D747–D750 (2007).
Ikeo, K., Ishi-i, J., Tamura, T., Gojobori, T. & Tateno, Y. CIBEX: center for information biology gene expression database. C. R. Biol. 326, 1079–1082 (2003).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodological) 57, 289–300 (1995).
Feuk, L., Marshall, C.R., Wintle, R.F. & Scherer, S.W. Structural variants: changing the landscape of chromosomes and design of disease studies. Hum. Mol. Genet. 15 (special no. 1), R57–R66 (2006).
Lee, J.A. & Lupski, J.R. Genomic rearrangements and gene copy-number alterations as a cause of nervous system disorders. Neuron 52, 103–121 (2006).
Lupski, J.R. et al. DNA duplication associated with Charcot-Marie-Tooth disease type 1A. Cell 66, 219–232 (1991).
Ewart, A.K. et al. Hemizygosity at the elastin locus in a developmental disorder, Williams syndrome. Nat. Genet. 5, 11–16 (1993).
Chance, P.F. et al. Two autosomal dominant neuropathies result from reciprocal DNA duplication/deletion of a region on chromosome 17. Hum. Mol. Genet. 3, 223–228 (1994).
Chen, K.S. et al. Homologous recombination of a flanking repeat gene cluster is a mechanism for a common contiguous gene deletion syndrome. Nat. Genet. 17, 154–163 (1997).
Small, K., Iber, J. & Warren, S.T. Emerin deletion reveals a common X-chromosome inversion mediated by inverted repeats. Nat. Genet. 16, 96–99 (1997).
Potocki, L. et al. Molecular mechanism for duplication 17p11.2—the homologous recombination reciprocal of the Smith-Magenis microdeletion. Nat. Genet. 24, 84–87 (2000).
Kurotaki, N. et al. Haploinsufficiency of NSD1 causes Sotos syndrome. Nat. Genet. 30, 365–366 (2002).
Bailey, J.A., Yavor, A.M., Massa, H.F., Trask, B.J. & Eichler, E.E. Segmental duplications: organization and impact within the current human genome project assembly. Genome Res. 11, 1005–1017 (2001).
Bailey, J.A. et al. Recent segmental duplications in the human genome. Science 297, 1003–1007 (2002).
International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945 (2004).
Budarf, M.L. & Emanuel, B.S. Progress in the autosomal segmental aneusomy syndromes (SASs): single or multi-locus disorders? Hum. Mol. Genet. 6, 1657–1665 (1997).
Fiegler, H. et al. Accurate and reliable high-throughput detection of copy number variation in the human genome. Genome Res. 16, 1566–1574 (2006).
Komura, D. et al. Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays. Genome Res. 16, 1575–1584 (2006).
Lin, M. et al. dChipSNP: significance curve and clustering of SNP-array-based loss-of-heterozygosity data. Bioinformatics 20, 1233–1240 (2004).
Nannya, Y. et al. A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays. Cancer Res. 65, 6071–6079 (2005).
Colella, S. et al. QuantiSNP: an objective Bayes hidden-Markov model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res. 35, 2013–2025 (2007).
Conrad, D.F. & Hurles, M.E. The population genetics of structural variation. Nat. Genet. 39, S30–S36 (2007).
Acknowledgements
We thank J. Buchanan for assistance in manuscript preparation and D. Pinto, C. Marshall, R. Redon, I. Ragoussis and A. Carson for sharing ideas and unpublished data. The work is supported by Genome Canada/Ontario Genomics Institute, The Centre for Applied Genomics, the Canadian Institutes of Health Research (CIHR), the McLaughlin Centre for Molecular Medicine, the Canadian Institute of Advanced Research and the Hospital for Sick Children Foundation. M.E.H. and N.P.C. are supported by the Wellcome Trust. L.F. is supported by CIHR, and S.W.S. is an Investigator of CIHR and holds the GlaxoSmithKline/CIHR Pathfinder Chair in Genetics and Genomics at the Hospital for Sick Children and the University of Toronto.
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Fig. 1
Complexities of assessing different classes of copy number variants (CNVs). (PDF 36 kb)
Supplementary Fig. 2
A hypothetical genomic region and the ability of different screening technology platforms to identify particular structural variants. (PDF 18 kb)
Supplementary Fig. 3
CNVs within regions of segmental duplication. (PDF 51 kb)
Supplementary Table 1
Comparison of CNVs detected with different platforms and analysis tools. (PDF 9 kb)
Supplementary Table 2
Summary of 12 published surveys (2004–2007) of structural variation content in human genomes. (PDF 89 kb)
Scherer, S., Lee, C., Birney, E. et al. Challenges and standards in integrating surveys of structural variation. Nat Genet 39 (Suppl 7), S7–S15 (2007). https://doi.org/10.1038/ng2093