Abstract
We report the 207-Mb genome sequence of the North American Arabidopsis lyrata strain MN47 based on 8.3× dideoxy sequence coverage. We predict 32,670 genes in this outcrossing species compared to the 27,025 genes in the selfing species Arabidopsis thaliana. The much smaller 125-Mb genome of A. thaliana, which diverged from A. lyrata 10 million years ago, likely constitutes the derived state for the family. We found evidence for DNA loss from large-scale rearrangements, but most of the difference in genome size can be attributed to hundreds of thousands of small deletions, mostly in noncoding DNA and transposons. Analysis of deletions and insertions still segregating in A. thaliana indicates that the process of DNA loss is ongoing, suggesting pervasive selection for a smaller genome. The high-quality reference genome sequence for A. lyrata will be an important resource for functional, evolutionary and ecological studies in the genus Arabidopsis.
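The short Python sketch below puts the abstract's figures in perspective. It computes the size and gene-count differences between the two genomes and the total raw sequence implied by 8.3× coverage of a 207-Mb genome. It also estimates the fraction of the genome expected to receive no reads under the idealized Lander–Waterman model, which assumes per-base read depth follows a Poisson distribution. That model is an assumption introduced here for illustration. It is not the assembly or coverage analysis used by the authors, and the constant and function names are hypothetical.

```python
import math

# Figures quoted in the abstract (genome sizes in megabases).
LYRATA_GENOME_MB = 207
THALIANA_GENOME_MB = 125
LYRATA_GENES = 32_670
THALIANA_GENES = 27_025
COVERAGE = 8.3  # fold dideoxy (Sanger) sequence coverage of the A. lyrata genome


def sequenced_bases_mb(genome_mb: float, coverage: float) -> float:
    """Total raw sequence (Mb) implied by a given fold coverage."""
    return genome_mb * coverage


def expected_uncovered_fraction(coverage: float) -> float:
    """Fraction of bases expected to have zero reads under the idealized
    Lander-Waterman model, where per-base depth is Poisson(coverage).
    Illustrative assumption only; not the authors' method."""
    return math.exp(-coverage)


if __name__ == "__main__":
    size_diff = LYRATA_GENOME_MB - THALIANA_GENOME_MB
    ratio = THALIANA_GENOME_MB / LYRATA_GENOME_MB
    print(f"Genome size difference: {size_diff} Mb "
          f"(A. thaliana is {ratio:.0%} the size of A. lyrata)")
    print(f"Gene count difference: {LYRATA_GENES - THALIANA_GENES:,}")
    raw = sequenced_bases_mb(LYRATA_GENOME_MB, COVERAGE)
    print(f"Raw sequence at {COVERAGE}x coverage: {raw:,.1f} Mb")
    gap = expected_uncovered_fraction(COVERAGE)
    print(f"Expected uncovered fraction (Poisson model): {gap:.2e}, "
          f"about {gap * LYRATA_GENOME_MB * 1e6:,.0f} bp")
```

Under this idealized model, 8.3× coverage leaves only about 0.025% of the 207-Mb genome (on the order of 50 kb) unsampled. Real shotgun coverage is uneven, especially across repeats and transposons, so actual assembly gaps are expected to be larger than this estimate.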
Acknowledgements
The US Department of Energy Joint Genome Institute (JGI) provided sequencing and analyses under the Community Sequencing Program supported by the Office of Science of the US Department of Energy under Contract No. DE-AC02-05CH11231. We are particularly grateful to D. Rokhsar and K. Barry for providing leadership for the project at JGI. We thank J. Borevitz, A. Hall, C. Langley, J. Nasrallah, B. Neuffer, O. Savolainen and S. Wright for contributing to the initial sequencing proposal submitted to the Community Sequencing Program at JGI, C. Lanz and K. Lett for technical assistance, and P. Andolfatto and R. Wing for comments on the manuscript. This work was supported by National Science Foundation (NSF) DEB-0723860 (B.S.G.), NSF DEB-0723935 (M.N.), NSF MCB-0618433 (J.C.C.), NSF IOS-0744579 (M.E.N.), NIH GM057994 (J.B.), grant GABI-DUPLO 0315055 of the German Federal Ministry of Education and Research (K.F.X.M.), ERA-NET on Plant Genomics (ERA-PG) grant ARelatives from the Deutsche Forschungsgemeinschaft (D.W.), the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT) and the Inter-University Network for Fundamental Research (P6/25, BioMaGNet) (Y.V.d.P.), a Gottfried Wilhelm Leibniz Award of the Deutsche Forschungsgemeinschaft (DFG) (D.W.), the Austrian Academy of Sciences (M.N.) and the Max Planck Society (D.W. and Y.-L.G.).
Author information
Contributions
J.B., J.C.C., B.S.G., I.V.G., Y.-L.G., K.F.X.M., M.N., Y.V.d.P. and D.W. conceived the study; M.E.N. provided the biological material; J.C., J.-F.C., R.M.C., N.F., J.G. and Y.-L.G. performed the experiments; E.G.B., J.A.F., N.F., H.G., Y.-L.G., G.H., J.D.H., T.T.H., R.P.O., S.O., P.P., A.A.S., J.S., K.S., M.S., X.W. and L.Y. analyzed the data; and Y.-L.G., T.T.H., M.N. and D.W. wrote the paper with contributions from all authors.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Note, Supplementary Tables 1–5 and Supplementary Figures 1–4 (PDF 1278 kb)
About this article
Cite this article
Hu, T., Pattyn, P., Bakker, E. et al. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change. Nat Genet 43, 476–481 (2011). https://doi.org/10.1038/ng.807
DOI: https://doi.org/10.1038/ng.807