Abstract
Plants intimately associate with diverse bacteria. Plant-associated bacteria have ostensibly evolved genes that enable them to adapt to plant environments. However, the identities of such genes are mostly unknown, and their functions are poorly characterized. We sequenced 484 genomes of bacterial isolates from roots of Brassicaceae, poplar, and maize. We then compared 3,837 bacterial genomes to identify thousands of plant-associated gene clusters. Genomes of plant-associated bacteria encode more carbohydrate metabolism functions and fewer mobile elements than related non-plant-associated genomes do. We experimentally validated candidates from two sets of plant-associated genes: one involved in plant colonization, and the other serving in microbe–microbe competition between plant-associated bacteria. We also identified 64 plant-associated protein domains that potentially mimic plant domains; some are shared with plant-associated fungi and oomycetes. This work expands the genome-based understanding of plant–microbe interactions and provides potential leads for efficient and sustainable agriculture through microbiome engineering.
Change history
05 April 2018
In the version of this article initially published, owing to technical errors during production Supplementary Tables 2–26 were linked to the incorrect legends, and replacement files posted were corrupted. The errors have been corrected in the HTML version of the paper.
References
Ley, R. E. et al. Evolution of mammals and their gut microbes. Science320, 1647–1651 (2008).
Baumann, P. Biology of bacteriocyte-associated endosymbionts of plant sap-sucking insects. Annu. Rev. Microbiol.59, 155–189 (2005).
Sprent, J. I. 60Ma of legume nodulation. What’s new? What’s changing? J. Exp. Bot.59, 1081–1084 (2008).
Pfeilmeier, S., Caly, D. L. & Malone, J. G. Bacterial pathogenesis of plants: future challenges from a microbial perspective: Challenges in Bacterial Molecular Plant Pathology. Mol. Plant Pathol.17, 1298–1313 (2016).
Chowdhury, S. P., Hartmann, A., Gao, X. & Borriss, R. Biocontrol mechanism by root-associated Bacillus amyloliquefaciens FZB42—a review. Front. Microbiol.6, 780 (2015).
Fibach-Paldi, S., Burdman, S. & Okon, Y. Key physiological properties contributing to rhizosphere adaptation and plant growth promotion abilities of Azospirillum brasilense. FEMS Microbiol. Lett.326, 99–108 (2012).
Santhanam, R. et al. Native root-associated bacteria rescue a plant from a sudden-wilt disease that emerged during continuous cropping. Proc. Natl. Acad. Sci. USA112, E5013–E5020 (2015).
Peters, N. K., Frost, J. W. & Long, S. R. A plant flavone, luteolin, induces expression of Rhizobium meliloti nodulation genes. Science233, 977–980 (1986).
Hiei, Y., Ohta, S., Komari, T. & Kumashiro, T. Efficient transformation of rice (Oryza sativa L.) mediated by Agrobacterium and sequence analysis of the boundaries of the T-DNA. Plant J.6, 271–282 (1994).
Hueck, C. J. Type III protein secretion systems in bacterial pathogens of animals and plants. Microbiol. Mol. Biol. Rev.62, 379–433 (1998).
Bulgarelli, D. et al. Revealing structure and assembly cues for Arabidopsis root-inhabiting bacterial microbiota. Nature488, 91–95 (2012).
Lundberg, D. S. et al. Defining the core Arabidopsis thaliana root microbiome. Nature488, 86–90 (2012).
Bulgarelli, D., Schlaeppi, K., Spaepen, S., Ver Loren van Themaat, E. & Schulze-Lefert, P. Structure and functions of the bacterial microbiota of plants. Annu. Rev. Plant Biol.64, 807–838 (2013).
Ofek-Lalzar, M. et al. Niche and host-associated functional signatures of the root surface microbiome. Nat. Commun.5, 4950 (2014).
Gottel, N. R. et al. Distinct microbial communities within the endosphere and rhizosphere of Populus deltoides roots across contrasting soil types. Appl. Environ. Microbiol.77, 5934–5944 (2011).
Bai, Y. et al. Functional overlap of the Arabidopsis leaf and root microbiota. Nature528, 364–369 (2015).
Hardoim, P. R. et al. The hidden world within plants: ecological and evolutionary considerations for defining functioning of microbial endophytes. Microbiol. Mol. Biol. Rev.79, 293–320 (2015).
Bulgarelli, D. et al. Structure and function of the bacterial root microbiota in wild and domesticated barley. Cell Host Microbe17, 392–403 (2015).
Hacquard, S. et al. Microbiota and host nutrition across plant and animal kingdoms. Cell Host Microbe17, 603–616 (2015).
Tatusov, R. L., Galperin, M. Y., Natale, D. A. & Koonin, E. V. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res.28, 33–36 (2000).
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res.44, D457–D462 (2016).
Haft, D. H., Selengut, J. D. & White, O. The TIGRFAMs database of protein families. Nucleic Acids Res.31, 371–373 (2003).
Huntemann, M. et al. The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4). Stand. Genomic Sci.10, 86 (2015).
Emms, D. M. & Kelly, S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol.16, 157 (2015).
Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res.44, D279–D285 (2016).
Ives, A. R. & Garland, T. Jr. Phylogenetic logistic regression for binary dependent variables. Syst. Biol.59, 9–26 (2010).
Brynildsrud, O., Bohlin, J., Scheffer, L. & Eldholm, V. Rapid scoring of genes in microbial pan-genome-wide association studies with Scoary. Genome Biol.17, 238 (2016).
Hultman, J. et al. Multi-omics of permafrost, active layer and thermokarst bog soil microbiomes. Nature521, 208–212 (2015).
Louca, S. et al. Integrating biogeochemistry with multiomic sequence information in a model oxygen minimum zone. Proc. Natl. Acad. Sci. USA113, E5925–E5933 (2016).
Coutinho, B. G., Licastro, D., Mendonça-Previato, L., Cámara, M. & Venturi, V. Plant-influenced gene expression in the rice endophyte Burkholderia kururiensis M130. Mol. Plant Microbe Interact.28, 10–21 (2015).
Long, S. R. Rhizobium-legume nodulation: life together in the underground. Cell56, 203–214 (1989).
Ruvkun, G. B., Sundaresan, V. & Ausubel, F. M. Directed transposon Tn5 mutagenesis and complementation analysis of Rhizobium meliloti symbiotic nitrogen fixation genes. Cell29, 551–559 (1982).
Hershey, D. M., Lu, X., Zi, J. & Peters, R. J. Functional conservation of the capacity for ent-kaurene biosynthesis and an associated operon in certain rhizobia. J. Bacteriol.196, 100–106 (2014).
Nett, R. S. et al. Elucidation of gibberellin biosynthesis in bacteria reveals convergent evolution. Nat. Chem. Biol.13, 69–74 (2017).
Scharf, B. E., Hynes, M. F. & Alexandre, G. M. Chemotaxis signaling systems in model beneficial plant-bacteria associations. Plant Mol. Biol.90, 549–559 (2016).
Büttner, D. & He, S. Y. Type III protein secretion in plant pathogenic bacteria. Plant Physiol.150, 1656–1664 (2009).
Gao, R. et al. Genome-wide RNA sequencing analysis of quorum sensing-controlled regulons in the plant-associated Burkholderia glumae PG1 strain. Appl. Environ. Microbiol.81, 7993–8007 (2015).
Weller-Stuart, T., Toth, I., De Maayer, P. & Coutinho, T. Swimming and twitching motility are essential for attachment and virulence of Pantoea ananatis in onion seedlings. Mol. Plant Pathol.18, 734–745 (2017).
De Weger, L. A. et al. Flagella of a plant-growth-stimulating Pseudomonas fluorescens strain are required for colonization of potato roots. J. Bacteriol.169, 2769–2773 (1987).
de Weert, S. et al. Flagella-driven chemotaxis towards exudate components is an important trait for tomato root colonization by Pseudomonas fluorescens. Mol. Plant Microbe Interact.15, 1173–1180 (2002).
Ravcheev, D. A. et al. Comparative genomics and evolution of regulons of the LacI-family transcription factors. Front. Microbiol.5, 294 (2014).
Yamauchi, Y., Hasegawa, A., Taninaka, A., Mizutani, M. & Sugimoto, Y. NADPH-dependent reductases involved in the detoxification of reactive carbonyls in plants. J. Biol. Chem.286, 6999–7009 (2011).
Burstein, D. et al. Genome-scale identification of Legionella pneumophila effectors using a machine learning approach. PLoS Pathog.5, e1000508 (2009).
Dean, P. Functional domains and motifs of bacterial type III effector proteins and their roles in infection. FEMS Microbiol. Rev.35, 1100–1125 (2011).
Stebbins, C. E. & Galán, J. E. Structural mimicry in bacterial virulence. Nature412, 701–705 (2001).
Price, C. T. et al. Molecular mimicry by an F-box effector of Legionella pneumophila hijacks a conserved polyubiquitination machinery within macrophages and protozoa. PLoS Pathog.5, e1000704 (2009).
Rothmeier, E. et al. Activation of Ran GTPase by a Legionella effector promotes microtubule polymerization, pathogen vacuole motility and infection. PLoS Pathog.9, e1003598 (2013).
Xu, R.-Q. et al. AvrAC(Xcc8004), a type III effector with a leucine-rich repeat domain from Xanthomonas campestris pathovar campestris confers avirulence in vascular tissues of Arabidopsis thaliana ecotype Col-0. J. Bacteriol.190, 343–355 (2008).
Shevchik, V. E., Robert-Baudouy, J. & Hugouvieux-Cotte-Pattat, N. Pectate lyase PelI of Erwinia chrysanthemi 3937 belongs to a new family. J. Bacteriol.179, 7321–7330 (1997).
Cesari, S., Bernoux, M., Moncuquet, P., Kroj, T. & Dodds, P. N. A novel conserved mechanism for plant NLR protein pairs: the “integrated decoy” hypothesis. Front. Plant Sci.5, 606 (2014).
Sarris, P. F. et al. A plant immune receptor detects pathogen effectors that target WRKY transcription factors. Cell161, 1089–1100 (2015).
Sarris, P. F., Cevik, V., Dagdas, G., Jones, J. D. & Krasileva, K. V. Comparative analysis of plant immune receptor architectures uncovers host proteins likely targeted by pathogens. BMC Biol.14, 8 (2016).
Le Roux, C. et al. A receptor pair with an integrated decoy converts pathogen disabling of transcription factors to immunity. Cell161, 1074–1088 (2015).
Brown, G. D. & Netea, M. G. (eds.). Immunology of Fungal Infections. (Springer, Dordrecht, The Netherlands, 2007).
Gadjeva, M., Takahashi, K. & Thiel, S. Mannan-binding lectin—a soluble pattern recognition molecule. Mol. Immunol.41, 113–121 (2004).
Ma, Q.-H., Tian, B. & Li, Y.-L. Overexpression of a wheat jasmonate-regulated lectin increases pathogen resistance. Biochimie92, 187–193 (2010).
Xiang, Y. et al. A jacalin-related lectin-like gene in wheat is a component of the plant defence system. J. Exp. Bot.62, 5471–5483 (2011).
Yamaji, Y. et al. Lectin-mediated resistance impairs plant virus infection at the cellular level. Plant Cell24, 778–793 (2012).
Weidenbach, D. et al. Polarized defense against fungal pathogens is mediated by the Jacalin-related lectin domain of modular Poaceae-specific proteins. Mol. Plant9, 514–527 (2016).
Sahly, H. et al. Surfactant protein D binds selectively to Klebsiella pneumoniae lipopolysaccharides containing mannose-rich O-antigens. J. Immunol.169, 3267–3274 (2002).
Osborn, M. J., Rosen, S. M., Rothfield, L., Zeleznick, L. D. & Horecker, B. L. Lipopolysaccharide of the gram-negative cell wall. Science145, 783–789 (1964).
Tans-Kersten, J., Huang, H. & Allen, C. Ralstonia solanacearum needs motility for invasive virulence on tomato. J. Bacteriol.183, 3597–3605 (2001).
Cole, B. J. et al. Genome-wide identification of bacterial plant colonization genes. PLoS Biol.15, e2002860 (2017).
Poggio, S. et al. A complete set of flagellar genes acquired by horizontal transfer coexists with the endogenous flagellar system in Rhodobacter sphaeroides. J. Bacteriol.189, 3208–3216 (2007).
Ho, B. T., Dong, T. G. & Mekalanos, J. J. A view to a kill: the bacterial type VI secretion system. Cell Host Microbe15, 9–21 (2014).
MacIntyre, D. L., Miyata, S. T., Kitaoka, M. & Pukatzki, S. The Vibrio cholerae type VI secretion system displays antimicrobial properties. Proc. Natl. Acad. Sci. USA107, 19520–19524 (2010).
Tian, Y. et al. The type VI protein secretion system contributes to biofilm formation and seed-to-seedling transmission of Acidovorax citrulli on melon. Mol. Plant Pathol.16, 38–47 (2015).
Peiffer, J. A. et al. Diversity and heritability of the maize rhizosphere microbiome under field conditions. Proc. Natl. Acad. Sci. USA110, 6548–6553 (2013).
Agler, M. T. et al. Microbial hub taxa link host and abiotic factors to plant microbiome variation. PLoS Biol.14, e1002352 (2016).
Bokulich, N. A., Thorngate, J. H., Richardson, P. M. & Mills, D. A. Microbial biogeography of wine grapes is conditioned by cultivar, vintage, and climate. Proc. Natl. Acad. Sci. USA111, E139–E148 (2014).
Coleman-Derr, D. et al. Plant compartment and biogeography affect microbiome composition in cultivated and native Agave species. New Phytol.209, 798–811 (2016).
Shade, A., McManus, P. S. & Handelsman, J. Unexpected diversity during community succession in the apple flower microbiome. MBio4, e00602–e00612 (2013).
Turner, T. R. et al. Comparative metatranscriptomics reveals kingdom level changes in the rhizosphere microbiome of plants. ISME J.7, 2248–2258 (2013).
Edwards, J. et al. Structure, variation, and assembly of the root-associated microbiomes of rice. Proc. Natl. Acad. Sci. USA112, E911–E920 (2015).
Kroj, T., Chanclud, E., Michel-Romiti, C., Grand, X. & Morel, J.-B. Integration of decoy domains derived from protein targets of pathogen effectors into plant immune receptors is widespread. New Phytol.210, 618–626 (2016).
Mukhtar, M. S. et al. Independently evolved virulence effectors converge onto hubs in a plant immune system network. Science333, 596–601 (2011).
Vimr, E. & Lichtensteiger, C. To sialylate, or not to sialylate: that is the question. Trends Microbiol.10, 254–257 (2002).
de Jonge, R. et al. Conserved fungal LysM effector Ecp6 prevents chitin-triggered immunity in plants. Science329, 953–955 (2010).
Doty, S. L. et al. Diazotrophic endophytes of native black cottonwood and willow. Symbiosis47, 23–33 (2009).
Weston, D. J. et al. Pseudomonas fluorescens induces strain-dependent and strain-independent host plant responses in defense networks, primary metabolism, photosynthesis, and fitness. Mol. Plant Microbe Interact.25, 765–778 (2012).
Rinke, C. et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature499, 431–437 (2013).
Beszteri, B., Temperton, B., Frickenhaus, S. & Giovannoni, S. J. Average genome size: a potential source of bias in comparative metagenomics. ISME J.4, 1075–1077 (2010).
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res.25, 1043–1055 (2015).
Varghese, N. J. et al. Microbial species delineation using whole genome sequences. Nucleic Acids Res.43, 6761–6771 (2015).
Kerepesi, C., Bánky, D. & Grolmusz, V. AmphoraNet: the webserver implementation of the AMPHORA2 metagenomic workflow suite. Gene533, 538–540 (2014).
Wu, M., Chatterji, S. & Eisen, J. A. Accounting for alignment uncertainty in phylogenomics. PLoS One7, e30288 (2012).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS One5, e9490 (2010).
Sen, A. et al. Phylogeny of the class Actinobacteria revisited in the light of complete genomes. The orders ‘Frankiales’ and Micrococcales should be split into coherent entities: proposal of Frankiales ord. nov., Geodermatophilales ord. nov., Acidothermales ord. nov. and Nakamurellales ord. nov. Int. J. Syst. Evol. Microbiol.64, 3821–3832 (2014).
Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics26, 2460–2461 (2010).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods12, 59–60 (2015).
Wang, Z. & Wu, M. A phylum-level bacterial phylogenetic marker database. Mol. Biol. Evol.30, 1258–1262 (2013).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Stat. Methodol.57, 289–300 (1995).
Finn, R. D. et al. HMMER web server: 2015 update. Nucleic Acids Res.43, W30–W38 (2015).
Alexeyev, M. F. The pKNOCK series of broad-host-range mobilizable suicide vectors for gene knockout and targeted DNA insertion into the chromosome of gram-negative bacteria. Biotechniques26, 824–826 (1999).
Hadjithomas, M. et al. IMG-ABC: a knowledge base to fuel discovery of biosynthetic gene clusters and novel secondary metabolites. MBio6, e00932 (2015).
Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res.30, 3059–3066 (2002).
Stamatakis, A., Hoover, P. & Rougemont, J. A rapid bootstrap algorithm for the RAxML Web servers. Syst. Biol.57, 758–771 (2008).
Finkel, O. M., Béjà, O. & Belkin, S. Global abundance of microbial rhodopsins. ISME J.7, 448–451 (2013).
Traore, S. M. Characterization of Type Three Effector Genes of A. citrulli, the Causal Agent of Bacterial Fruit Blotch of Cucurbits. (Virginia Polytechnic Institute and State University, Blacksburg, VA, 2014).
Basler, M., Ho, B. T. & Mekalanos, J. J. Tit-for-tat: type VI secretion system counterattack during bacterial cell-cell interactions. Cell152, 884–894 (2013).
Acknowledgements
The work conducted by the US Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported by the Office of Science of the US Department of Energy under contract no. DE-AC02-05CH11231. J.L.D. and S.G.T. were supported by NSF INSPIRE grant IOS-1343020, and J.L.D. was also supported by DOE–USDA Feedstock Award DE-SC001043 and by the Office of Science (BER), US Department of Energy, grant no. DE-SC0014395. S.H.P. was supported by NIH Training Grant T32 GM067553-06 and was a Howard Hughes Medical Institute (HHMI) International Student Research Fellow. D.S.L. was supported by NIH Training Grant T32 GM07092-34. J.L.D. is an Investigator of the HHMI, supported by the HHMI and the Gordon and Betty Moore Foundation (GBMF3030). M.E.F. was supported by NIH Dr. Ruth L. Kirschstein NRSA Fellowship F32-GM112345. D.A.P. and T.-Y.L. were supported by the Genomic Science Program, US Department of Energy, Office of Science, Biological and Environmental Research as part of the Oak Ridge National Laboratory Plant Microbe Interfaces Scientific Focus Area (http://pmi.ornl.gov) and Plant Feedstock Genomics Award DE-SC001043. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the US Department of Energy under contract DE-AC05-00OR22725. J.A.V. was supported by a SystemsX.ch grant (Micro2X) and a European Research Council (ERC) advanced grant (PhyMo). We thank I. Bertani, C. Bez, R. Bowers, D. Burstein, A. Chun Chen, D. Chiniquy, B. Cole, O. Cohen, A. Copeland, J. Eisen, E. Eloe-Fadrosh, M. Hadjithomas, O. Finkel, H. Schnitzel Meule Fux, N. Ivanova, J. Knelman, R. Malmstrom, R. Perez-Torres, D. Salomon, R. Sorek, T. Mucyn, R. Seshadri, T.K. Reddy, L. Ryan, and H. Sberro Livnat for general help, text editing, and ideas for this work. We thank R. Walcott (University of Georgia, Athens, GA, USA) for providing the Acidovorax citrulli VasD mutant strain.
Author information
Contributions
A.L. performed most data analysis and wrote the paper. I.S.G. performed phylogenetic inference, performed phylogenetically aware analyses, analyzed the data, provided the supporting website, and contributed to manuscript writing. M. Mittelviefhaus and J.A.V. designed and performed experiments related to Hyde1 gene function and contributed to manuscript writing. S.C. isolated single bacterial cells and prepared metadata for data analysis. F.M. analyzed data. S.H.P. analyzed data and contributed to manuscript writing. J.M. produced a mutant strain for Hyde1. K.W. tested Hyde1 toxicity in E. coli. G.D. and V.V. produced deletion mutants and designed and performed rice root colonization experiments. K.S. helped in data analysis. B.R.A. prepared metadata for data analysis. D.S.L., T.-Y.L., S.L., Z.J., M. McDonald, A.P.K., M.E.F., and S.L.D. isolated bacteria from different plants or managed this process. T.G.d.R. managed the sequencing project. S.R.G., D.A.P., and R.E.L. managed bacterial isolation efforts and contributed to manuscript writing. B.Z. managed Hyde1 deletion and toxicity testing. S.G.T. contributed to manuscript writing. T.W. managed single-cell isolation efforts and contributed to manuscript writing. J.L.D. directed the overall project and contributed to manuscript writing.
Ethics declarations
Competing interests
J.L.D. is a cofounder of and shareholder in, and S.H.P. collaborates with, AgBiome LLC, a corporation that aims to use plant-associated microbes to improve plant productivity.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–29 and Supplementary Note 1.
Supplementary Table 1
All genomes used. Lists of all genomes used from nine taxa (pre-filtration). Cells filled with yellow are Brassicaceae root isolates from the USA; cells filled with green are single cells isolated from Arabidopsis thaliana; cells filled with pink are poplar isolates; cells filled with blue are recently published leaf and root Arabidopsis and soil isolates from Europe; cells filled with purple are maize root isolates. The “Filtered out?” column is ‘N’ if the genome was retained for analysis after the QA process. “Representative genome taxid” is the taxon id of another genome (a different row in the same tab) that represents at least two redundant genomes. Completeness and contamination values were calculated with CheckM. The full genome sequence, gene annotation, and metadata of each genome used can be found on the IMG website, https://img.jgi.doe.gov/. For example, the metadata of taxon id 2558860101 can be found at https://img.jgi.doe.gov/cgi-bin/mer/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=2558860101.
Supplementary Table 2
Statistics of genomes in the taxa used
Supplementary Table 3
Sequencing and assembly information of new genomes
Supplementary Table 4
Abundance of the nine taxa in 16S marker gene surveys. The relative abundances of the taxa composing a specific taxon were taken from the different publications and summed to yield the relative abundance of that taxon. In cases with biological replicates (e.g., Lundberg et al., Nature 2012), we used the median value.
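The aggregation described in this legend can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, the order names, and the numbers are hypothetical, and the order of operations (sum the member taxa within each replicate, then take the median across replicates) is an assumption, since the legend does not state it.

```python
from statistics import median


def taxon_relative_abundance(replicates, member_taxa):
    """Sum the member taxa within each replicate, then take the median.

    replicates  -- list of dicts mapping a taxon name to its relative abundance
    member_taxa -- names of the lower-rank taxa that compose the taxon of interest
    """
    per_replicate = [
        sum(rep.get(name, 0.0) for name in member_taxa)
        for rep in replicates
    ]
    return median(per_replicate)


# Hypothetical relative abundances from three biological replicates.
replicates = [
    {"Rhizobiales": 0.10, "Sphingomonadales": 0.05, "Bacillales": 0.02},
    {"Rhizobiales": 0.12, "Sphingomonadales": 0.03, "Bacillales": 0.04},
    {"Rhizobiales": 0.08, "Sphingomonadales": 0.06, "Bacillales": 0.01},
]

# Treat the first two orders as members of one higher-rank taxon.
alpha = taxon_relative_abundance(replicates, ["Rhizobiales", "Sphingomonadales"])
print(f"median relative abundance: {alpha:.2f}")
```

The median is less sensitive than the mean to a single outlying replicate, which may be why it was preferred for surveys with few replicates.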
Supplementary Table 5
Genome size comparison. Genome sizes were compared between the different isolation sites by t-test and PhyloGLM. Each cell denotes the group with the largest genomes if the difference is significant (P < 0.05). N.S., not significant. The PhyloGLM test takes into account the phylogenetic structure of the taxon.
Supplementary Table 6
COG-to-COG category mapping
Supplementary Table 7
Acinetobacter PA/NPA/RA/soil genes/domains. Phylogenetic diversity is the median pairwise distance between the genomes hosting the genes in the cluster. Values for each test are "Y", "N", or "Untested" (clusters were untested when there was insufficient phylogenetic signal, when they were too small, or when they were found in all genomes). Clusters were considered significant in pfam/COG/TIGRFAM/KO + hypergbin/hypergcn at q-value < 0.05 (Benjamini-Hochberg FDR corrected), and in OrthoFinder + hypergbin/hypergcn at Bonferroni-corrected P < 0.1. PA/RA clusters were considered significant in phyloglmbin/phyloglmcn at q-value < 0.05 (Benjamini-Hochberg FDR corrected) with an estimate > 0 (estimate < 0 for significant NPA/soil). PA/RA clusters were considered significant in Scoary at P < 0.05 in three tests (Fisher exact test, Benjamini-Hochberg FDR corrected; worst pairing scenario test; and empirical test) with a Fisher exact test odds ratio > 1 (odds ratio < 1 for NPA/soil).
Supplementary Table 8
Actinobacteria1 PA/NPA/RA/soil genes/domains. Phylogenetic diversity is the median pairwise distance between the genomes hosting the genes in the cluster. Values for each test are "Y", "N", or "Untested" (clusters were untested when there was insufficient phylogenetic signal, when they were too small, or when they were found in all genomes). Clusters were considered significant in pfam/COG/TIGRFAM/KO + hypergbin/hypergcn at q-value < 0.05 (Benjamini-Hochberg FDR corrected), and in OrthoFinder + hypergbin/hypergcn at Bonferroni-corrected P < 0.1. PA/RA clusters were considered significant in phyloglmbin/phyloglmcn at q-value < 0.05 (Benjamini-Hochberg FDR corrected) with an estimate > 0 (estimate < 0 for significant NPA/soil). PA/RA clusters were considered significant in Scoary at P < 0.05 in three tests (Fisher exact test, Benjamini-Hochberg FDR corrected; worst pairing scenario test; and empirical test) with a Fisher exact test odds ratio > 1 (odds ratio < 1 for NPA/soil).
Supplementary Table 9
Actinobacteria2 PA/NPA/RA/soil genes/domains. Phylogenetic diversity is the median pairwise distance between the genomes hosting the genes in the cluster. Values for each test are "Y", "N", or "Untested" (clusters were untested when there was insufficient phylogenetic signal, when they were too small, or when they were found in all genomes). Clusters were considered significant in pfam/COG/TIGRFAM/KO + hypergbin/hypergcn at q-value < 0.05 (Benjamini-Hochberg FDR corrected), and in OrthoFinder + hypergbin/hypergcn at Bonferroni-corrected P < 0.1. PA/RA clusters were considered significant in phyloglmbin/phyloglmcn at q-value < 0.05 (Benjamini-Hochberg FDR corrected) with an estimate > 0 (estimate < 0 for significant NPA/soil). PA/RA clusters were considered significant in Scoary at P < 0.05 in three tests (Fisher exact test, Benjamini-Hochberg FDR corrected; worst pairing scenario test; and empirical test) with a Fisher exact test odds ratio > 1 (odds ratio < 1 for NPA/soil).
Supplementary Table 10
Alphaproteobacteria PA/NPA/RA/soil genes/domains. Phylogenetic diversity is the median pairwise distance between the genomes hosting the genes in the cluster. Values for each test are "Y", "N", or "Untested" (clusters were untested when there was insufficient phylogenetic signal, when they were too small, or when they were found in all genomes). Clusters were considered significant in pfam/COG/TIGRFAM/KO + hypergbin/hypergcn at q-value < 0.05 (Benjamini-Hochberg FDR corrected), and in OrthoFinder + hypergbin/hypergcn at Bonferroni-corrected P < 0.1. PA/RA clusters were considered significant in phyloglmbin/phyloglmcn at q-value < 0.05 (Benjamini-Hochberg FDR corrected) with an estimate > 0 (estimate < 0 for significant NPA/soil). PA/RA clusters were considered significant in Scoary at P < 0.05 in three tests (Fisher exact test, Benjamini-Hochberg FDR corrected; worst pairing scenario test; and empirical test) with a Fisher exact test odds ratio > 1 (odds ratio < 1 for NPA/soil).
Supplementary Table 11
Bacillales PA/NPA/RA/soil genes/domains. Phylogenetic diversity is the median pairwise distance between the genomes hosting the genes in the cluster. Values for each test are "Y", "N", or "Untested" (clusters were untested when there was insufficient phylogenetic signal, when they were too small, or when they were found in all genomes). Clusters were considered significant in pfam/COG/TIGRFAM/KO + hypergbin/hypergcn at q-value < 0.05 (Benjamini-Hochberg FDR corrected), and in OrthoFinder + hypergbin/hypergcn at Bonferroni-corrected P < 0.1. PA/RA clusters were considered significant in phyloglmbin/phyloglmcn at q-value < 0.05 (Benjamini-Hochberg FDR corrected) with an estimate > 0 (estimate < 0 for significant NPA/soil). PA/RA clusters were considered significant in Scoary at P < 0.05 in three tests (Fisher exact test, Benjamini-Hochberg FDR corrected; worst pairing scenario test; and empirical test) with a Fisher exact test odds ratio > 1 (odds ratio < 1 for NPA/soil).
Supplementary Table 12
Bacteroidetes PA/NPA/RA/soil genes/domains. Phylogenetic diversity is the median pairwise distance between the genomes hosting the genes in the cluster. Values for each test are "Y", "N", or "Untested" (clusters were untested when there was insufficient phylogenetic signal, when they were too small, or when they were found in all genomes). Clusters were considered significant in pfam/COG/TIGRFAM/KO + hypergbin/hypergcn at q-value < 0.05 (Benjamini-Hochberg FDR corrected), and in OrthoFinder + hypergbin/hypergcn at Bonferroni-corrected P < 0.1. PA/RA clusters were considered significant in phyloglmbin/phyloglmcn at q-value < 0.05 (Benjamini-Hochberg FDR corrected) with an estimate > 0 (estimate < 0 for significant NPA/soil). PA/RA clusters were considered significant in Scoary at P < 0.05 in three tests (Fisher exact test, Benjamini-Hochberg FDR corrected; worst pairing scenario test; and empirical test) with a Fisher exact test odds ratio > 1 (odds ratio < 1 for NPA/soil).
Supplementary Table 13
Burkholderiales PA/NPA/RA/soil genes/domains. Phylogenetic diversity is the median pairwise distance between the genomes hosting the genes in the cluster. Values for each test are "Y", "N", or "Untested" (clusters were untested when there was insufficient phylogenetic signal, when they were too small, or when they were found in all genomes). Clusters were considered significant in pfam/COG/TIGRFAM/KO + hypergbin/hypergcn at q-value < 0.05 (Benjamini-Hochberg FDR corrected), and in OrthoFinder + hypergbin/hypergcn at Bonferroni-corrected P < 0.1. PA/RA clusters were considered significant in phyloglmbin/phyloglmcn at q-value < 0.05 (Benjamini-Hochberg FDR corrected) with an estimate > 0 (estimate < 0 for significant NPA/soil). PA/RA clusters were considered significant in Scoary at P < 0.05 in three tests (Fisher exact test, Benjamini-Hochberg FDR corrected; worst pairing scenario test; and empirical test) with a Fisher exact test odds ratio > 1 (odds ratio < 1 for NPA/soil).
Supplementary Table 14
Pseudomonas PA/NPA/RA/soil genes/domains. Phylogenetic diversity is the median pairwise distance between the genomes hosting the genes in the cluster. Values for each test are "Y", "N", or "Untested" (clusters were untested when there was insufficient phylogenetic signal, when they were too small, or when they were found in all genomes). Clusters were considered significant in pfam/COG/TIGRFAM/KO + hypergbin/hypergcn at q-value < 0.05 (Benjamini-Hochberg FDR corrected), and in OrthoFinder + hypergbin/hypergcn at Bonferroni-corrected P < 0.1. PA/RA clusters were considered significant in phyloglmbin/phyloglmcn at q-value < 0.05 (Benjamini-Hochberg FDR corrected) with an estimate > 0 (estimate < 0 for significant NPA/soil). PA/RA clusters were considered significant in Scoary at P < 0.05 in three tests (Fisher exact test, Benjamini-Hochberg FDR corrected; worst pairing scenario test; and empirical test) with a Fisher exact test odds ratio > 1 (odds ratio < 1 for NPA/soil).
Supplementary Table 15
Xanthomonadaceae PA/NPA/RA/soil genes/domains. Phylogenetic diversity is the median pairwise distance between the genomes hosting the genes in the cluster. Values for each test are "Y", "N", or "Untested" (clusters were untested when there was insufficient phylogenetic signal, or when they were too small or were found in all genomes). A cluster was considered significant in Pfam/COG/TIGRFAM/KO + hypergbin/hypergcn if q-value < 0.05 (Benjamini–Hochberg FDR corrected). A cluster was considered significant in OrthoFinder + hypergbin/hypergcn if Bonferroni-corrected P < 0.1. A cluster was considered a significant PA/RA cluster in phyloglmbin/phyloglmcn if q-value < 0.05 (Benjamini–Hochberg FDR corrected) and the estimate was > 0 (or estimate < 0 for significant NPA/soil). A cluster was considered a significant PA/RA cluster in Scoary if P < 0.05 for three tests (Fisher exact test (Benjamini–Hochberg FDR corrected), worst pairing scenario test, and empirical test) and the odds ratio of the Fisher exact test was > 1 (odds ratio < 1 for NPA/soil).
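The multiple-testing thresholds named in the legend above can be sketched as follows. This is a minimal standard-library illustration of Benjamini–Hochberg and Bonferroni adjustment combined with the sign of a model estimate; it is not the authors' pipeline, and the function names, p-values, and estimates are invented for the example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def bonferroni(pvalues):
    """Return Bonferroni-adjusted p-values, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def classify_by_estimate(qvalue, estimate, alpha=0.05):
    """Label a cluster from its q-value and the sign of its estimate."""
    if qvalue < alpha and estimate > 0:
        return "PA/RA"
    if qvalue < alpha and estimate < 0:
        return "NPA/soil"
    return "NS"


# Invented p-values and estimates for five hypothetical clusters.
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]
estimates = [1.2, -0.8, 2.0, 0.5, -0.1]

qvals = benjamini_hochberg(pvals)
labels = [classify_by_estimate(q, e) for q, e in zip(qvals, estimates)]
bonferroni_significant = [p < 0.1 for p in bonferroni(pvals)]
print(labels)
print(bonferroni_significant)
```

In this toy input the third and fourth clusters have raw P < 0.05 but q-values just above 0.05, so they are labeled non-significant after correction.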
Supplementary Table 16
Validation of PA/NPA/RA/soil genes through metagenomes. a. Samples used (n = 38), b. Summary of results based on a two-sided t-test.
Supplementary Table 17
Validation of PA genes in Paraburkholderia kururiensis M130. a. Mutant used and statistical test results, b. Raw data: cfu/g root, c. Primers used.
Supplementary Table 18
The number of operons predicted by different approaches.
Supplementary Table 19
Reproducible PA domains. a. Protein domains that are significantly PA in at least three taxa by at least two tests. NA – test results are not available (untested), NS – non-significant result. b. Fractions for LacI proteins within genomes, c. Fraction of pfam00248 domain within genomes.
Supplementary Table 20
DNA motifs predicted to be bound by LacI transcription factors. Predicted promoter sequences are intergenic sequences, at least 25 bp long, located upstream of carbohydrate metabolism and transport genes that are found directly adjacent to LacI genes. The most abundant k-mers of different lengths were detected using wordcount (EMBOSS package). The most abundant motifs found in multiple taxa were compared against their distribution in random intergenic sequences using the Fisher exact test.
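The motif analysis in this legend (k-mer counting followed by a Fisher exact test against random intergenic sequences) can be sketched in plain Python. This does not reproduce EMBOSS wordcount or the authors' exact contingency table; the sequences, the function names, and the 2x2 layout (sequences with and without the motif, in promoters versus random intergenic regions) are assumptions for illustration only.

```python
from collections import Counter
from math import comb


def count_kmers(sequences, k, min_length=25):
    """Count overlapping k-mers in sequences of at least `min_length` bp."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        if len(seq) < min_length:
            continue
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    low, high = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(
        prob(x) for x in range(low, high + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Two made-up "promoter" sequences; min_length lowered for the toy input.
toy_promoters = ["ACGTACGT", "TTACGTAA"]
kmer_counts = count_kmers(toy_promoters, k=4, min_length=8)
top_kmer, top_count = kmer_counts.most_common(1)[0]

# Hypothetical table: rows = promoters / random intergenic sequences,
# columns = sequences containing the motif / sequences lacking it.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(top_kmer, top_count, p_value)
```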
Supplementary Table 21
PREPARADOs. Pfam domains that are both significant PA/RA domains (reproducibly found as such in multiple taxa or by multiple approaches) and more abundant in plants than in bacteria according to Pfam (PREPARADOs). Pfams labeled in yellow are carbohydrate-related and are part of proteins found in eukaryotes and bacteria with full-length sequence similarity, having an N-terminal signal peptide, and lacking a transmembrane domain. Cells marked in green are domains that are predicted to be secreted by Sec or T3SS (more than 50% of the bacterial proteins having the domain are predicted to be secreted by these secretion systems).
Supplementary Table 22
Full-length proteins conserved between PA bacterial genes and eukaryotic genes. LAST alignment results of PREPARADO-containing proteins from bacteria (query) against plant, fungal, oomycete, and protist proteins from RefSeq (target). Only alignments with over 40% identity that span at least 90% of both the query and the target length are shown.
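The filter described in this legend (over 40% identity, at least 90% of both query and target length) can be expressed as a short predicate. The record fields below are hypothetical and do not correspond to the LAST output format; they only name the quantities the filter needs.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical alignment record; field names are assumptions."""
    query_id: str
    target_id: str
    identity_pct: float
    query_aligned_len: int
    query_len: int
    target_aligned_len: int
    target_len: int


def is_full_length_hit(aln, min_identity=40.0, min_coverage=0.90):
    """Keep alignments with >40% identity covering >=90% of both proteins."""
    query_cov = aln.query_aligned_len / aln.query_len
    target_cov = aln.target_aligned_len / aln.target_len
    return (
        aln.identity_pct > min_identity
        and query_cov >= min_coverage
        and target_cov >= min_coverage
    )


# Invented records: only the first passes all three conditions.
toy_alignments = [
    Alignment("bact_1", "plant_1", 55.0, 290, 300, 285, 310),
    Alignment("bact_2", "fungus_1", 38.0, 300, 300, 300, 300),  # low identity
    Alignment("bact_3", "plant_2", 70.0, 150, 300, 150, 160),   # partial query
]
kept = [aln.query_id for aln in toy_alignments if is_full_length_hit(aln)]
print(kept)
```

Requiring coverage on both sides is what distinguishes full-length conservation from a shared domain embedded in otherwise unrelated proteins.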
Supplementary Table 23
Jekyll and Hyde. Gene homologs of Jekyll and Hyde proteins based on protein homologs in IMG. To find all homologs and paralogs of Jekyll and Hyde genes (a-d), we used an IMG BLAST search with an E-value threshold of 1e-5 against all IMG isolates, some of which were not included in the original comparative analysis, and hence their genes are not part of any cluster. Since Hyde1 proteins are rapidly evolving, they are scattered across multiple OrthoFinder orthogroups. Metadata in a-d was retrieved from the IMG website. a. Jekyll protein homologs of Acidovorax gene Ga0102403_10160, b. Hyde1 protein homologs of Acidovorax protein Aave_1071, c. Hyde1-like protein homologs of Pseudomonas protein A243_06583, d. Hyde2 homologs of Ga0078621_123530, e. Hyde1-like-Hyde2 loci in representative Proteobacteria, one per genus, and their location adjacent to T6SS genes and within genomes that encode T6SS. Hyde2 was found based on a BLAST search against the nr database with Acav_4635 as the query.
Supplementary Table 24
Divergence of Jekyll gene operon. An analysis of the Jekyll gene cluster that is presented in Figure 6b. Control genes are shown in Figure S26c. The table summarizes a comparison between multiple sequence alignments of the Jekyll locus (Figure S24b) and the control genes (Figure S24c).
Supplementary Table 25
Toxicity of Hyde proteins and recovery of prey cells confronted with Hyde-encoding Acidovorax and different mutants. Includes primers used to make Acidovorax deletion strains, strains used as prey and their antibiotic resistance, and raw results for cell toxicity and competition assays.
Supplementary Table 26
Significant orthogroups (OrthoFinder clusters) supported by three statistical approaches: either hypergbin, phyloglmbin, and Scoary, or hypergcn, phyloglmcn, and Scoary.
Cite this article
Levy, A., Salas Gonzalez, I., Mittelviefhaus, M. et al. Genomic features of bacterial adaptation to plants. Nat Genet 50, 138–150 (2018). https://doi.org/10.1038/s41588-017-0012-9
DOI: https://doi.org/10.1038/s41588-017-0012-9