Main

For most of the twentieth century, our understanding of microbial diversity relied on microscopy, cultivation and experimentation to identify the genetic and phenotypic characteristics of isolated microorganisms. Although these approaches led to many foundational discoveries, the magnitude of microbial diversity and the phylogenetic relationships among microorganisms were poorly understood. Extending from the groundbreaking work of researchers such as Woese and Fox1, small subunit ribosomal RNA (SSU rRNA) was established as a marker to survey the diversity of natural microbial communities2,3, originally by a comparison to 25 published full-length reference sequences4. This new molecular view of the microbial world led Norman Pace to recommend that “we undertake a representative survey of microbial diversity in the environment” (Ref. 5) on the grounds that this would answer many questions about the microbial biosphere.

Cost and labour considerations limited the scope of early molecular studies of microbial communities (Fig. 1), and the sample sizes involved were inadequate6. Nevertheless, traditional assessments of microbial diversity were identified as being analogous to sampling of community members in traditional (macro)ecological analyses7, enabling the adoption of non-parametric estimators of diversity. This provided an empirical target for the sequencing effort that was required to evaluate and compare community diversity, which was well timed for the introduction of Sanger-based 'tag' sequencing8,9,10 and the subsequent advent of high-throughput sequencing (HTS) technologies.
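
As an illustration of the non-parametric estimators mentioned above, the following minimal sketch computes the bias-corrected Chao1 richness estimate from per-OTU sequence counts. Chao1 is chosen here only as a widely used example of this estimator class; the text does not name a specific estimator, and the function name and example counts are hypothetical.

```python
from collections import Counter

def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-OTU sequence counts.

    Assumption: Chao1 is used as one example of a non-parametric estimator;
    the source text does not specify which estimators were adopted.
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)          # number of OTUs actually observed
    freq = Counter(observed)
    f1 = freq[1]                   # singletons: OTUs seen exactly once
    f2 = freq[2]                   # doubletons: OTUs seen exactly twice
    # Singletons relative to doubletons indicate how many OTUs were missed.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Hypothetical sample: 9 observed OTUs, 4 singletons, 2 doubletons.
example_counts = [50, 20, 5, 2, 2, 1, 1, 1, 1, 0]
estimated_richness = chao1(example_counts)
```

Because the estimate depends on the numbers of singletons and doubletons, it is sensitive to exactly the low-abundance sequences that make up the rare biosphere, and therefore to sequencing artefacts that masquerade as rare OTUs.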

Figure 1: Rank–abundance curves.

The graph on the left shows a schematic rank–abundance curve, which indicates the coverage depth ranking of different experimental techniques used in taxonomic surveys of microorganisms: gel fingerprinting (including denaturing gradient gel electrophoresis), clone libraries, metagenomics and high-throughput sequencing (HTS) of PCR amplicons. The approximate threshold for the detection of rare-biosphere organisms is indicated. The graphs on the right show typical rank–abundance curves for high-diversity environments (such as soil) and low-diversity environments (such as faeces), demonstrating the differences in the size of the long 'tail' that corresponds to the rare biosphere. OTU, operational taxonomic unit.

The initial discovery of vast sequence diversity at low relative abundance involved an analysis of marine water column and sediment samples by Sogin et al.11. They generated thousands of short rRNA gene sequences (tags) and showed that even when these reads were clustered into broad categories, grouping sequences with identities that varied by as much as 10%, the number of unique sequences in each sample remained in the thousands. Such astounding genetic diversity, and presumably corresponding phenotypic diversity, is consistent with the evolution of unicellular organisms over billions of years. The absence of suitable methods for sampling high-diversity microbial habitats explained why these organisms had largely eluded detection before this work.

The discovery of a rare biosphere implicated low-abundance taxa as important contributors to assessments of α-diversity (local species diversity within sites or habitats) and β-diversity (the differentiation of species diversity across sites or habitats; see Box 1). However, because little was known about the distributions of and controls on rare-biosphere taxa, the initial recognition of such high diversity in the rare biosphere left unanswered questions about the global distributions of rare microorganisms and whether rare-biosphere taxa distribute according to Baas Becking's dictum: “everything is everywhere, but, the environment selects” (Ref. 12; see also Ref. 13).
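
To make the contribution of rare taxa to these measures concrete, the sketch below computes richness (a simple α-diversity measure) and two β-diversity dissimilarities for a pair of hypothetical samples. The sample contents and function names are illustrative assumptions, and Jaccard and Bray–Curtis are chosen only as familiar examples of incidence-based and abundance-weighted metrics.

```python
def richness(sample):
    """Alpha-diversity as the number of OTUs with at least one sequence."""
    return sum(1 for n in sample.values() if n > 0)

def jaccard_dissimilarity(a, b):
    """Incidence-based beta-diversity: every OTU counts equally."""
    present_a = {otu for otu, n in a.items() if n > 0}
    present_b = {otu for otu, n in b.items() if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)

def bray_curtis(a, b):
    """Abundance-weighted beta-diversity: dominant OTUs drive the value."""
    otus = set(a) | set(b)
    shared = sum(min(a.get(o, 0), b.get(o, 0)) for o in otus)
    total = sum(a.values()) + sum(b.values())
    return 1 - 2 * shared / total

# Hypothetical sites that differ only in one rare OTU each.
site_a = {"otu1": 900, "otu2": 95, "otu3": 5}
site_b = {"otu1": 900, "otu2": 95, "otu4": 5}

jaccard = jaccard_dissimilarity(site_a, site_b)   # driven by rare OTUs
bray = bray_curtis(site_a, site_b)                # nearly blind to them
```

In this toy case the two sites look almost identical under the abundance-weighted metric but half dissimilar under the incidence-based one, which is why the choice of metric determines how much weight rare taxa receive.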

Many rare-biosphere taxa have important ecological roles, as described below, serving as nearly limitless reservoirs of genetic and functional diversity14,15. A greater understanding of species that are adapted to life at low relative abundance will improve our understanding of microbial ecology, help to identify and resolve taxonomic 'blind spots' in the tree of life and inform emerging applications, such as gene family discovery and bioprospecting. In this Review, we discuss the ecology of rare microbial populations, the factors influencing their abundance and distribution patterns, and the methods that are used to target and explore this largely unrecognized genomic and phylogenetic diversity.

Defining the rare biosphere

Several HTS methods that were originally developed for SSU rRNA gene sequencing11,16,17,18,19 have revealed an enormous complement of low-abundance microbial taxa (Fig. 1). Defining the rare biosphere in such data has been arbitrary, and the methods used include relative abundance cut-offs (approximately 0.1% (Ref. 20) or 0.01% (Ref. 21)), sequence counts in generated data sets (for example, two sequences per sample; Ref. 22) and empirical thresholds23,24. In practice, it is common to remove low-abundance sequences from analyses at a specific numerical or ecological threshold, such as the contribution to community dissimilarity23. Because rare species are substantial contributors to community ecology, especially temporal variability (for example, conditionally rare taxa25), and to ecological resilience, excluding some subsets of rare taxa from analyses can blunt the interpretation of the data set. Indeed, increased weighting of the rare biosphere in β-diversity analyses (for example, by including the incidence of conditionally rare taxa25 or by using unweighted UniFrac26) has provided novel insights into microbial communities25,27,28.
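
The arbitrariness of these definitions is easy to see when the cut-offs are applied side by side. The following sketch, using a hypothetical sample and hypothetical function names, labels OTUs as rare by relative abundance (0.1% or 0.01%) and by raw sequence count (two sequences per sample). Whether a cut-off is treated as strict or inclusive is an assumption made here for illustration.

```python
def rare_by_relative_abundance(counts, cutoff):
    """OTUs whose relative abundance is strictly below `cutoff` (a fraction)."""
    total = sum(counts.values())
    return {otu for otu, n in counts.items() if 0 < n / total < cutoff}

def rare_by_count(counts, max_count=2):
    """OTUs represented by at most `max_count` sequences in the sample."""
    return {otu for otu, n in counts.items() if 0 < n <= max_count}

# Hypothetical sample of 100,000 sequences.
sample = {
    "otu_a": 90000, "otu_b": 9939, "otu_c": 50,
    "otu_d": 8, "otu_e": 2, "otu_f": 1,
}

rare_at_0_1_percent = rare_by_relative_abundance(sample, 0.001)
rare_at_0_01_percent = rare_by_relative_abundance(sample, 0.0001)
rare_at_two_sequences = rare_by_count(sample, 2)
```

The three definitions return nested but different sets for the same sample, so comparisons between studies need to state which definition was used.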

Regardless of the threshold used for defining the rare biosphere, microbial community abundance distributions based on marker gene surveys (Fig. 1) typically show a long 'tail' of low-relative-abundance operational taxonomic units (OTUs). The length and shape of this tail differ depending on the diversity of the sampled community (see the graphs on the right in Fig. 1) and on the underlying species–abundance distribution29,30,31,32, which is poorly understood for most microbial communities33 (Box 1). The relative contributions of PCR and sequencing artefacts to assessments of rare microbial taxa have also been subject to debate34,35, because such artefacts can manifest, superficially, as rare OTUs, which also affects the length and shape of the long tail. Methods for filtering and analysing bona fide rare-biosphere sequences are required to understand how these artefacts manifest and to model these errors (Box 2).
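
A rank–abundance curve of the kind shown in Fig. 1 can be derived directly from an OTU count table. The sketch below is a minimal example with two invented communities (one with a long tail, one with a short tail); the counts, the 0.1% cut-off and the function names are assumptions for illustration and do not correspond to any data set discussed here.

```python
def rank_abundance(counts):
    """Return (rank, relative abundance) pairs, most abundant OTU first."""
    total = sum(counts)
    rel = sorted((c / total for c in counts if c > 0), reverse=True)
    return list(enumerate(rel, start=1))

def tail_length(counts, cutoff=0.001):
    """Number of OTUs whose relative abundance is below `cutoff`."""
    return sum(1 for _, r in rank_abundance(counts) if r < cutoff)

# Hypothetical high-diversity community: few dominants, 500 rare OTUs.
soil_like = [200] * 5 + [1] * 500
# Hypothetical low-diversity community: few dominants, 5 rare OTUs.
gut_like = [800, 600, 400, 195] + [1] * 5

soil_tail = tail_length(soil_like)
gut_tail = tail_length(gut_like)
top_gut_otu = rank_abundance(gut_like)[0]
```

Spurious OTUs produced by PCR or sequencing error would enter such a table as additional low-count entries and lengthen the tail, which is why filtering precedes any interpretation of tail shape.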

Distribution, ecology and phylogeny

Microbial abundance is subject to factors that influence growth and immigration versus those that influence death and emigration36,37 (Fig. 2). Previous descriptions of the ecological relevance of rare microbial taxa indicated that rare microorganisms are susceptible to local elimination through environmental or biological processes that select for the survival of abundant organisms (as reviewed by Pedrós-Alió in Refs 20, 38), but these selective pressures are in contrast to the potential role of rare taxa as a 'seed bank' for eventual colonization under optimal conditions11,20,25. Indeed, various biotic and abiotic factors influence the abundance of rare microorganisms over time (Fig. 2). In this section, we explore the factors that contribute to the known distributions and phylogenies of rare taxa.

Figure 2: Hypothetical temporal abundance profiles for rare-biosphere microorganisms.

The species that are present in the rare biosphere may have a specific ecology and exert stronger influences on their local environment than their relative abundance would suggest. The schematic shows hypothetical temporal abundance profiles for various different members of the rare biosphere. The r-selected microorganisms that are periodically recruited from the rare biosphere (blue line) can switch between abundant and rare, depending on periodic environmental conditions such as temperature and seasonality. Such periodic resource exploitation is associated with increased susceptibility to predation. The r-selected microorganisms that are occasionally recruited from the rare biosphere (red line) persist with relatively rare abundances, responding to occasional episodic or stochastic cues such as precipitation and stress. Again, such periodic resource exploitation is associated with increased susceptibility to predation. Microbial taxa that show periodic increases in abundance but are permanently rare (green line) are adapted to exist at low relative abundance and respond with K-selected shifts in abundance, persistently avoiding predation and occupying a narrow niche. This lifestyle is associated with increased susceptibility to starvation. Permanently rare taxa (yellow line), which include possible keystone species, show persistent low-abundance distributions, occupying a narrow niche, and can escape from predation, although there is increased susceptibility to starvation. Transiently rare taxa (purple line) are occasionally rare owing to immigration. Their persistence depends on suitable conditions for survival and reproduction. Taxa exhibiting periodic or occasional recruitment from the rare biosphere can be considered to be conditionally rare taxa, as recently proposed25. The grey shading represents seasonal variations.

Variable patterns of abundance, distribution and activity. The abundance of rare microbial taxa can reflect environmental selection, similarly to the situation for dominant taxa, such as dynamics observed during a disturbance from a source of off-shore oil39 and environmental filtering for factors such as season and water depth in the study of bacterioplankton populations40. Rare-biosphere organisms can also exhibit a specific biogeography, which is distinct from that of abundant taxa and contributes to the discrimination of environmental samples. For example, an analysis of fine-scale (1 mm) diversity of sulfidic sediment samples demonstrated that maximal β-diversity was associated with rare OTUs41. Similarly, Gobet et al.42 identified non-random distributions of rare bacterial taxa in coastal sands in response to nutrient stress, although these rare fractions were not predicted to contribute to key biogeochemical processes. Bowen et al.43 demonstrated consistent rare-biosphere composition within multiple replicate salt marsh sediment samples that were highly distinct from a nearby water column sample. Recently, rare marine microeukaryotic taxa were found to display unique biogeographical distributions and were probably active and responsive to environmental change44. Often, deviations of rare taxa from broad environmental filtering are related to a narrow ecological niche, such as that suitable for copiotrophic taxa, which may be adapted to less frequent disturbances40.

In addition to a unique biogeography, rare microbial taxa can be associated with unique abundance distributions and phylogenetic compositions. Galand et al.21 sampled marine bacterial and archaeal communities from the Arctic Ocean, which revealed biogeographical patterns, higher dissimilarity to database sequences and unique species–abundance distributions associated with the rare biosphere, comprising a log-normal distribution for abundant taxa and a log-series distribution for rare taxa. This study reinforced the finding that abundance distributions can vary with ecosystem diversity and α-diversity (Box 1), influencing the size and composition of the rare biosphere. Understanding this relationship, as well as the taxonomic profiles of rare-biosphere organisms, is fundamental for their ecological study. Although the relative abundance of rare microbial taxa will not necessarily correlate with phylogenetic novelty, several recent studies have demonstrated unique phylogenetic characteristics associated with low-abundance members of microbial communities15,21,45,46.

Abundance changes in dominant OTUs can obscure population dynamics of low-abundance taxa, even though differential distributions of rare taxa may be relevant to β-diversity analyses. Discriminating life-history strategies among persistent low-abundance OTUs and organisms undergoing cyclical changes in abundance can help to identify factors that contribute to population dynamics in the rare biosphere (Fig. 2). The study of such ecological patterns requires long-term and repeated HTS of samples, coincident with the collection of appropriate metadata47, and could be undertaken within existing human microbiome and environmental monitoring projects such as the Human Microbiome Project48, the Earth Microbiome Project49 and the International Census of Marine Microbes (ICoMM).

The importance of time-series sampling and large data sets was demonstrated in a recent study of rare-biosphere dynamics. Shade et al.25 analysed 3,237 samples across 42 time-series data sets, identifying conditionally rare taxa. Contributing up to 28% of community membership, these taxa were also largely responsible for the discrimination of communities at different time points. This study also observed that variations in the abundance of rare organisms correlated with specific events, such as disturbance. Rare-biosphere organisms, which are low contributors to total community abundance but important contributors to α-diversity, represent a substantial amount of ecological potential. To best identify the causes of rare-to-prevalent community dynamics, both deep sequencing and deep sampling are recommended for uncovering the controls on rare microbial taxon abundances, reinforcing the utility of long-term sample collections. Furthermore, investigations of conditionally rare taxa coupled with rich sample metadata will help to elucidate the dynamics of rare-biosphere members.
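
The idea of a conditionally rare taxon can be illustrated with a deliberately simplified rule: a taxon that is rare at most time points but exceeds some abundance at least once. The sketch below implements only this simplified rule. It is not the published detection procedure of Shade et al., and the thresholds, function name and example series are hypothetical.

```python
def conditionally_rare(series, rare_cutoff=0.001, bloom_cutoff=0.01,
                       min_rare_fraction=0.5):
    """Flag taxa that are usually rare but occasionally become prevalent.

    `series` maps each OTU to its relative abundances over time.
    Simplified illustrative rule with assumed thresholds, not the
    criteria used in the study cited in the text.
    """
    flagged = set()
    for otu, values in series.items():
        rare_points = sum(1 for v in values if v < rare_cutoff)
        mostly_rare = rare_points / len(values) >= min_rare_fraction
        ever_prevalent = max(values) >= bloom_cutoff
        if mostly_rare and ever_prevalent:
            flagged.add(otu)
    return flagged

# Hypothetical four-point time series of relative abundances.
time_series = {
    "otu_bloom": [0.0001, 0.0002, 0.05, 0.0001],
    "otu_dominant": [0.30, 0.25, 0.35, 0.30],
    "otu_always_rare": [0.0001, 0.0001, 0.0001, 0.0001],
}

flagged_otus = conditionally_rare(time_series)
```

A rule of this kind can only detect a bloom that falls on a sampled time point, which is one reason why deep sampling over time matters as much as deep sequencing.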

Among collected metadata, abiotic factors — such as experimental scale and stochastic events — can influence rare-biosphere dynamics. Species observed as rare may be abundant within their microenvironment, or microniche, but be rare overall owing to a paucity of suitable microenvironments50. Such an observation is consistent with a study of bacterioplankton40, which suggested that rare community members exploit fine variations in conditions. Rare-species dynamics can be disproportionately influenced by local random events owing to low abundance and low evenness caused by such microniches. This contrast between random and deterministic processes greatly influences the ecological development and potential of local ecosystems, largely affecting recruitment from the rare biosphere (Fig. 2). This influence on ecological resiliency is negatively proportional to the incidence of guilds (groups of organisms with ecological redundancy but differing optimal conditions50).

Rare taxa may also be disproportionately active in some ecosystems, as seen in a study of marine waters proximal to Delaware Bay51. In this study, deviations from ratios of equal rRNA to rDNA (that is, the transcript to the rRNA gene) suggested either increased (>1) or decreased (<1) activity. The majority of detected OTUs displayed an approximate ratio of 1/1. Nonetheless, threefold more rare OTUs contributed to the set of high-activity OTUs, whereas a similar proportion of abundant OTUs contributed to the low-activity set51. A previous study of lake ecosystems also demonstrated this inverse relationship between relative abundance and the ratio of rRNA transcripts to rRNA genes52. Combined with observations of highly active rare-biosphere members from stream biofilm communities53 and from benthic biofilms of glacier-fed streams54, it seems as though microorganisms represented by long tails in rank–abundance plots (Fig. 1) are more active and dynamic (for example, see Fig. 2) than expected based on their abundance in single time-point DNA-based analyses.
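
The ratio described above can be sketched as follows: each OTU's share of the rRNA (transcript) library is divided by its share of the rRNA gene library, and values above or below 1 are read as increased or decreased activity. The counts and names are hypothetical, and normalizing each library to relative abundance before taking the ratio is an assumption about how such data would typically be compared.

```python
def rrna_to_rdna_ratios(rna_counts, dna_counts):
    """Per-OTU ratio of relative abundance in rRNA versus rRNA gene libraries."""
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    ratios = {}
    for otu, dna in dna_counts.items():
        if dna == 0:
            continue  # ratio is undefined without a gene-level observation
        rna_share = rna_counts.get(otu, 0) / rna_total
        dna_share = dna / dna_total
        ratios[otu] = rna_share / dna_share
    return ratios

def activity_class(ratio):
    """Interpret the ratio as in the text: >1 increased, <1 decreased."""
    if ratio > 1:
        return "increased"
    if ratio < 1:
        return "decreased"
    return "equal"

# Hypothetical libraries: the rare OTU is over-represented among transcripts.
dna_library = {"otu_abundant": 900, "otu_rare": 100}
rna_library = {"otu_abundant": 500, "otu_rare": 500}

ratios = rrna_to_rdna_ratios(rna_library, dna_library)
classes = {otu: activity_class(r) for otu, r in ratios.items()}
```

For very rare OTUs the gene-level count in the denominator is small, so the ratio is noisy; this is a limitation of the sketch and of the approach whenever sequencing depth is low.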

Predation and prey avoidance. Predation and viral lysis are factors that benefit low-abundance organisms when these negative interactions are indiscriminate (Fig. 2). Viral lysis depends on encounter probabilities38,55, which are low for rare species. One suggested advantage to subsisting at low relative abundance has therefore been escape from predation, which is reflected by the killing-the-winner hypothesis56. This hypothesis also provides at least a partial explanation for the 'paradox of the plankton', which refers to the fact that plankton diversity is higher than expected given the comparatively uncomplicated resource limitations observed in aquatic environments57. Similarly, selective predation by bacterivores is a function of size, activity and abundance58,59. In such cases, organisms that are specifically adapted to persistent low abundance may have distinct strategies for predator avoidance, in contrast to transient low-abundance organisms. Predation is one factor that sustains high diversity in communities that would otherwise be shaped by competitive exclusion.

Host-specific predation and parasitism also contribute to increased bacterial species diversity. The phage–bacteria relationship is often highly specific. Analogous to the Janzen–Connell hypothesis60,61, which proposes predation as a mechanism for promoting coexistence and maintenance of high plant diversity in tropical ecosystems, phages may alter competition among bacterial species to increase coexistence and overall diversity62. However, the dynamics of this interaction are not well studied, and evidence suggests that environmental conditions are likely to play a substantial part44. Parallel studies of bacteria and the associated phage or phages will contribute to the understanding of microbial diversity and dynamics of the rare biosphere, which is important given the relationship between bacterial diversity and ecosystem function63.

Dormancy, the microbial seed bank and stored ecological potential. The existence of stable resting stages could have a substantial impact on measures of the rare biosphere52. The proportion of viable microorganisms with greatly reduced metabolic activity is variable, ranging from 20% in host-associated environments, such as the human gut or the rumen, to potentially the majority of microorganisms in environments such as soil64. The recovery of viable thermophilic organisms from Arctic tundra65 and Arctic marine sediment66 showed that there is a widespread oceanic distribution of bacterial endospores. Specifically, Hubert et al.66 sampled Arctic sediments and identified active fermentation and sulfate-reducing activity in sediment slurries incubated between 41 °C and 62 °C, which reflected germination and activity of endospores of the Firmicutes, including members of the sulfate-reducing genus Desulfotomaculum. The authors estimated that, based on known cell-specific sulfate-reduction rates and depth-specific activity profiles, the accumulation of thermophilic spores settling at the Svalbard sediment surface from oceanic dispersal was 2 × 10⁸ m⁻² year⁻¹, which is equivalent to approximately 55 spores cm⁻² day⁻¹. The authors of another study67 estimated similar rates for Aarhus Bay sediment and suggested that sulfate-reducing thermophiles have a viability half-life of 300 years. This indicates that these particular thermophilic members of the rare biosphere contribute to a taphonomic gradient of preservation and gradual decomposition (Box 2). These inactive rare-biosphere cells can be used as tracers of microbial dispersion by oceanic currents and show non-ubiquitous distributions resulting from passive dispersal68. Notably, in contrast to organisms in the Archaea or Bacteria domains, dormancy does not play as important a part in microbial eukaryotes, especially in planktonic species44.
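
The unit conversion behind the spore accumulation estimate, and the meaning of a 300-year viability half-life, can be checked with a few lines of arithmetic. The sketch below assumes a 365-day year and simple exponential loss of viability; both are assumptions made here for illustration, and the variable and function names are hypothetical.

```python
SPORES_PER_M2_PER_YEAR = 2e8   # accumulation estimate quoted in the text
CM2_PER_M2 = 100 ** 2          # 1 m^2 = 10,000 cm^2
DAYS_PER_YEAR = 365            # assumption: non-leap year

# Convert spores per square metre per year to per square centimetre per day.
spores_per_cm2_per_day = SPORES_PER_M2_PER_YEAR / CM2_PER_M2 / DAYS_PER_YEAR

HALF_LIFE_YEARS = 300          # viability half-life quoted in the text

def viable_fraction(years, half_life=HALF_LIFE_YEARS):
    """Fraction of spores still viable, assuming exponential decay."""
    return 0.5 ** (years / half_life)

fraction_after_one_half_life = viable_fraction(300)
fraction_after_1000_years = viable_fraction(1000)
```

Under these assumptions roughly one spore in ten would remain viable after a millennium of burial, which is consistent with the description of a gradual taphonomic gradient rather than rapid loss.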

Dormancy, when not part of a taphonomic gradient, is also an important factor in microbial seed bank dynamics69,70,71. Phases of low abundance or dormancy are often cyclical or occasional, persisting until populations are presented with suitable ecological conditions (Fig. 2). Rare-biosphere organisms can therefore respond to environmental change by acting as an active microbial seed bank with a pool of ecological potential. As part of the microbial seed bank, low-abundance organisms can contribute to myriad ecosystem dynamics and become dominant under favourable conditions25,72,73, sometimes undergoing rapid changes in abundance, such as those occurring in harmful algal blooms or during human disease.

One important implication for aquatic environments is the apparent cosmopolitan nature of phytoplankton that contribute to harmful algal blooms in response to nutrient supply. Phytoplankton populations can persist for long periods of time and initiate r-selected growth responses, or 'bloom' proliferation, under favourable conditions74. A similar assessment of rapid abundance changes, which are characteristic of algal blooms in response to specific environmental triggers, will be valuable in predicting recruitment from a rare-biosphere seed bank and will guide environmental management.

One of the most deeply sequenced global sites is the L4 marine station at the Western Channel Observatory, which has been subjected to both time-series (2003–2008) and single-site deep sequencing69,71,75. More than 95% of OTUs observed over 6 years of sampling were recovered at a single time point, when that single sample was sequenced to a depth of more than 10 million sequences71. The recent discovery of thaumarchaeotal vitamin B12 production in aquatic environments also demonstrated winter archaeal abundance at station L4, in contrast to the low relative abundance of related genes at other sampled time points181. Together, these results provided evidence that the microbial rare biosphere can serve as a recruitment pool for periodic population size increases. Furthermore, comparing a deep-sequenced L4 library to 356 diverse sample data sets from the globally distributed ICoMM data revealed substantial (44% of all taxa) cross-sample OTU overlap75, which is consistent with the idea that “everything is everywhere, but, the environment selects” (Ref. 12; see also Ref. 13). It is unclear whether this broad OTU recovery at a single site by extensive sequencing would be observed at other locations or be restricted to certain marine environments.

A persistent functional microbial seed bank was identified in studies of marine sponges of the phylum Porifera, wherein symbionts of the candidate phylum Poribacteria, among others, are abundant in host sponge tissues. HTS approaches identified Poribacteria and other putative sponge symbionts at low abundance in surrounding seawater76, implicating the rare biosphere in environmental transmission of these sponge microorganisms. In a subsequent survey of >12 million 16S rRNA gene sequences associated with ICoMM, almost half of 'sponge-specific' bacteria were detected in seawater and additional non-sponge environmental samples77. Importantly, the analysis also demonstrated the presence of sponge-specific bacteria in RNA-based sequence data, suggesting that some of these microorganisms survive and are viable outside their sponge host. Regardless, the presence of at least some bacterial species outside the host indicates possible host acquisition of the symbiont through colonization by rare-biosphere taxa rather than vertical transmission. Similarly, skin from amphibians (Rana catesbeiana and Notophthalmus viridescens) may recruit rare microorganisms from the surrounding environment in a host-specific manner78. Many skin organisms can be mutualistic in amphibians, suggesting a notable role for rare-biosphere organisms in ecosystem health.

Recruitment of rare taxa from the microbial seed bank has also been observed in freshwater environments. A microbial survey of Toolik Lake (Alaska, USA) identified a hydrological connection between upslope terrestrial communities and the lake, via headwater streams, as a source of microorganisms for surface-water communities79. In fact, four out of ten dominant taxa in Toolik Lake were present as rare-biosphere members in upslope soil samples (analogous to the red line in Fig. 2). The observation of abundant lake taxa that are potentially seeded from the surrounding catchment suggests that these taxa have at least transient roles in lake ecology and establishes a clear taxonomic interaction between terrestrial and freshwater ecosystems. As the survey's authors suggest, such terrestrial–aquatic links are perhaps not surprising given that soils contain aquatic microniches.

Despite evidence for the recruitment of rare-biosphere microorganisms from the microbial seed bank, other microorganisms may be persistent rare-biosphere members (for example, see Fig. 2). For example, a survey of Western Arctic Ocean bacteria revealed persistent interseasonal rarity of marine bacterial OTUs80. This persistent rarity suggests a K-selected growth strategy for organisms with low fecundity that occupy a narrow niche but contribute substantially to α-diversity. More recently, Hugoni et al.81 monitored archaeal communities in surface coastal waters and showed that several taxa exhibited a cyclical abundance with occasional dominance, similar to temporal distributions observed at station L4 (Refs 69, 70, 71). Other rare-biosphere archaeal taxa may have been inactive because these OTUs were less abundant, were distinct from database sequences and displayed low overall representation in corresponding RNA-based assessments. A third group of taxa were persistent rare-biosphere members that showed high activity based on RNA transcript abundances. The third group may be considered active rare-biosphere archaea, which are arguably adapted to low relative abundance subsistence. The taxonomic affiliations and seasonal dynamics of these persistently rare archaea were consistent with more abundant organisms81, suggesting that their life-history strategy is different from cyclical seed bank dynamics. The observation that rare microorganisms are phylogenetically similar to those that are abundant is intriguing.

Functional cache and ecosystem stability. Disturbance events are powerful structural influences on microbial ecosystems and community stability82. The resilience of an ecosystem — that is, the ability of a community to stabilize and regain its pre-disturbance properties — is largely influenced by species diversity. Greater biodiversity increases the likelihood that functions will be maintained by ensuring functional redundancy (that is, the insurance hypothesis83, which states that biodiversity maintains or enhances ecosystem function during environmental change). The more genetically diverse the community, the more likely it is to contain functions that counteract the disturbance event. This directly implicates low-abundance organisms in the maintenance of ecosystem health as they can act as a diverse source pool that responds to such disturbance events.

Community abundance profiles change over time as a response to disturbance25,82. A controlled water column mixing experiment, investigating oxygen and nutrient exchange in stratified water, demonstrated ecosystem resilience when communities experiencing changes in nutrient and oxygen levels recovered to control conditions within 10 days84. Similarly, in an ecosystem-level manipulation with artificial mixing of a temperate lake during peak summer stratification, microbial communities recovered in both the epilimnion and hypolimnion82. Importantly, the abundance of some rare groups (for example, Gammaproteobacteria in the hypolimnion) increased substantially in response to the disturbance event. In the same lake-level manipulation, conditionally rare taxa responded to a natural baseline disturbance (for example, seasonal mixing), a forced disturbance or both, further establishing that rare-biosphere organisms have a role in ecosystem resilience25. Some rare-biosphere members were able to exploit the forced manipulation only — that is, conditions that would not naturally occur in the lake (such as increased anoxic conditions in the hypolimnion).

The rare biosphere establishes a functional cache or resource pool for responding to disturbance events, thereby providing a mechanism for a diverse assemblage of organisms to persist in a community, independent of acquisition probabilities (that is, the reintroduction of organisms to the community). This reduces the lag in ecosystem recovery (resilience). The rare biosphere can harbour a persistent functional pool of ecological potential, which, through recruitment, may be broadly useful in promoting ecological stability. Indeed, dormancy and related seed banks are strategies for responding to temporally variable environments, contributing to community resilience52.

Keystone species. Microbial communities are core contributors to ecosystem function, with important roles in biogeochemical processes, including nutrient turnover and fixation85,86,87. Keystone species have a disproportionately large effect on their environment relative to their abundance, often serving as gatekeepers to ecosystem functions, as is the case for specialist degraders88. Inferring ecological roles from microbial community surveys poses a substantial challenge; however, co-occurrence networks show promise in identifying keystone species in HTS data89,90. The effects of keystone species can also be rooted in co-evolution through reductive genome evolution in free-living organisms, in which some members of the community maintain otherwise lost functions, supplementing the community as a whole, resulting in dependent and helper community members (described in the Black Queen hypothesis)91.

Species abundance is not always the best predictor of community contributions; rare populations can serve as keystone microbial species with disproportionately large effects on ecosystem services92. Many common or occasional members of the rare biosphere contribute to gatekeeper functions, including autotrophic organisms, such as nitrifiers; microorganisms that may have relatively narrow substrate preferences, such as methanotrophs and methylotrophs; and those that degrade low concentrations of specific chemical substrates93. Importantly, rare-biosphere members may go undetected when traditional low-throughput approaches, such as culturing and molecular fingerprinting (Fig. 1), are used to link rare-biosphere taxa with aspects of biogeochemistry94.

The biogeochemical niche of keystone organisms, which is defined by specific biotic and abiotic characteristics, can be extremely rare owing to microenvironmental heterogeneity, especially in soils. For example, sulfate-reducing Desulfosporosinus sp. represented only a small proportion (0.006%) of the community in a peatland soil in Bavaria95. Nevertheless, high cell-specific reduction rates and carbon assimilation, implicated through DNA stable-isotope probing (DNA-SIP), suggested that this rare bacterium is responsible for a substantial proportion of the observed sulfate reduction and carbon flow95. These observations were consistent with previous DNA-SIP research demonstrating that presumed low-abundance Methylophaga spp. and novel uncultured Gammaproteobacteria were responsible for methanol assimilation in marine surface-water samples96.

Rare-biosphere members may also act as keystone species, with important contributions to biogeochemical cycling, but contribute substantial biomass to an environmental sample. Specifically, giant sulfur bacteria of the Beggiatoa97, Thioploca98 and Thiomargarita99 genera represent a substantial proportion of sediment biomass and sulfur cycling because of their large-diameter (0.1–0.7 mm) cells, but they have a relatively low numerical abundance100.
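
The gap between numerical abundance and biomass follows from simple geometry, because volume scales with the cube of cell diameter. The sketch below compares the volume of a cell at each end of the quoted diameter range with that of a 1 µm cell. Treating cells as spheres and taking 1 µm as a typical bacterial diameter are both simplifying assumptions made here, and volume is only a rough proxy for biomass.

```python
import math

def sphere_volume_um3(diameter_um):
    """Volume of a sphere in cubic micrometres (idealized cell shape)."""
    return math.pi * diameter_um ** 3 / 6

def volume_ratio(large_diameter_um, small_diameter_um=1.0):
    """How many small cells match the volume of one large cell."""
    return sphere_volume_um3(large_diameter_um) / sphere_volume_um3(small_diameter_um)

# Diameter range from the text: 0.1 mm = 100 um, 0.7 mm = 700 um.
ratio_low_end = volume_ratio(100)    # about one million
ratio_high_end = volume_ratio(700)   # several hundred million
```

On this idealized basis a single giant cell can occupy the volume of a million or more ordinary cells, so a taxon can be numerically rare in a sequence-based or count-based survey while still dominating sediment biovolume.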

Microbial keystone species also have interactions that are not related to nutrient dynamics. In one experiment, the reduction of rare species in soil increased plant biomass and plant quality, as measured by nutrient concentration101. This suggests that some rare species may have a detrimental effect on plant production. However, the effect may not be exclusively negative, as there was some evidence that when all of the microorganisms were present, the levels of plant defensive compounds were higher and the nutrient concentration was lower, making the plant less suitable for herbivore predation and less prone to disease. Additionally, when rare microorganisms were present, aphid body size was decreased101.

Keystone species can also influence β-diversity disproportionately. Researchers observed that two moderately abundant species, Bacteroides fragilis and Bacteroides stercoris, had a disproportionate influence on the structure of the human gut microbiome102. They hypothesized that the abundance of keystone species contributed substantially to individual human gut microbiome uniqueness, which is consistent with contributions of conditionally rare taxa to community dissimilarity25.
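To make the idea of a disproportionate influence on β-diversity concrete, the following minimal sketch splits the Bray–Curtis dissimilarity between two samples into per-taxon contributions. Bray–Curtis is used here only as an illustrative dissimilarity measure; it is an assumption of this sketch and not necessarily the metric used in the studies cited above. The function name, taxon labels and counts are all hypothetical.

```python
def bray_curtis_contributions(sample_a, sample_b):
    """Return (total dissimilarity, per-taxon contributions).

    sample_a and sample_b map taxon name -> count. The per-taxon
    contributions sum to the total Bray-Curtis dissimilarity.
    """
    taxa = set(sample_a) | set(sample_b)
    total_counts = sum(sample_a.values()) + sum(sample_b.values())
    if total_counts == 0:
        raise ValueError("both samples are empty")
    contributions = {
        taxon: abs(sample_a.get(taxon, 0) - sample_b.get(taxon, 0)) / total_counts
        for taxon in taxa
    }
    return sum(contributions.values()), contributions


# Hypothetical counts: two taxa shift in abundance, the rest are stable.
gut_1 = {"taxon_A": 30, "taxon_B": 10, "other": 60}
gut_2 = {"taxon_A": 5, "taxon_B": 35, "other": 60}
dissimilarity, parts = bray_curtis_contributions(gut_1, gut_2)
```

In this toy case the two shifting taxa account for all of the dissimilarity even though together they make up less than half of either community.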

The ecological potential present in species that are persistent in the rare biosphere, especially those that fill keystone roles, enables a flexible and responsive taxonomic profile that minimizes lag in ecosystems: gatekeeper species can be recruited from the rare biosphere rather than the ecosystem relying on immigration. It is easy to envision how this behaviour might be selected at the community level. Theoretically, rare microbial organisms have multiple potential strategies — some of which are unique — for responding to their environments (Fig. 2). Studying these strategies and related ecological factors will improve our understanding of microbial ecology.

Targeted exploration of the rare biosphere

Capturing microbial diversity through sequencing and culturing necessarily abstracts the underlying community, because both approaches are subject to bias and artefacts. Given that rare microbial species are difficult to distinguish from the background noise of sequencing error and false discovery103, what techniques can we use to survey the long tail of diversity?

Many different techniques can be used to filter signal from noise in marker gene surveys (Box 2), including interpreting error profiles in amplicon libraries, removing incorrect sequences such as chimaeras, and understanding the relationship between sequence diversity and OTUs. Much like the search for truffles104, indiscriminate surveys offer little benefit and the use of tools for directed targeting increases the probability of finding a specimen. In this section, we discuss approaches for identifying and studying the 'truffles' of microbial ecology — namely, rare-biosphere species that represent unexplored branches of microbial life ('dark matter')105, taxonomic blind spots (specific taxonomic novelty)45,46 and the potential fourth domain of life106.

Cultivation. Although HTS has supplanted clone libraries and Sanger sequencing for microbial community surveys, cultivation remains highly relevant for recovering rare microbial taxa. In contrast to direct sequencing, recovering taxa by culturing is not solely dictated by community abundance; traditional culture techniques can be useful for enriching and recovering low-abundance microorganisms. Recently, combinations of culture-dependent and culture-independent analyses uncovered members of the rare biosphere from a deep-sea coral reef107 and an apple orchard soil108. All of the isolates obtained from the reef sample were rare-biosphere members, based on the corresponding HTS data. From the apple orchard, only 40% of the 1,054 culture-based OTUs were recovered by pyrosequencing and most culture-based OTUs corresponded to rare members of the pyrosequencing library. Note that culturing studies do not necessarily disproportionately recover rare-biosphere members; cultivation also recovers abundant organisms109,110. Generally, the recovery of rare species by cultivation is a function of numbers. If it is correct that most bacterial species cannot be readily cultured and the rare fraction has many more microbial taxa, then cultures would be more likely to be derived from the rare fraction. As a result, cultivating novel rare-biosphere microorganisms can be particularly valuable for investigating highly diverse environments, such as soils111,112.
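The 'function of numbers' argument can be written out as simple arithmetic. The sketch below assumes that every taxon in a fraction has the same probability of yielding an isolate, which is a deliberate simplification; the function name and all values are hypothetical and are not taken from the cited studies.

```python
def expected_rare_share(n_rare, n_abundant, p_rare, p_abundant):
    """Expected share of distinct cultured taxa that come from the rare fraction.

    n_rare, n_abundant: number of taxa in each fraction.
    p_rare, p_abundant: assumed per-taxon probability of obtaining an isolate.
    """
    cultured_rare = n_rare * p_rare
    cultured_abundant = n_abundant * p_abundant
    total = cultured_rare + cultured_abundant
    if total == 0:
        raise ValueError("no taxa are expected to be cultured")
    return cultured_rare / total


# Hypothetical community: 950 rare taxa, 50 abundant taxa, and the same
# low culturability (1%) in both fractions.
share = expected_rare_share(950, 50, 0.01, 0.01)
```

With equal culturability, the expected share of isolates from the rare fraction simply mirrors its share of taxa (95% in this toy example). Abundant taxa would need to be far more culturable per taxon to dominate a culture collection.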

Although culturing provides access to rare microbial species, the specific benefit of a culture-based approach is that rare species obtained in culture enable specific physiological experiments113,114. As shown by the cultivation and characterization of Chthonomonas calidirosea of the phylum Armatimonadetes (formerly known as OP10)115, much can be learned about the physiology and biochemistry of rare-biosphere taxa from pure-culture studies. Recovering representatives of prominent bacterial divisions and clades without cultivated representatives will benefit from focusing culturing efforts on environments harbouring those taxa; this is analogous to recent targeted single-cell sorting efforts that expanded the taxonomic and genomic breadth of the bacterial tree from selected aquatic habitat samples105. These approaches to studying rare species converge on the practice of microbial screening for the recovery of medical compounds.

Taxonomic blind spots: phylogenetic novelty and taxonomic dark matter. Rare and novel microorganisms provide challenging targets but are an exciting source of ecological, taxonomic and genomic discoveries. In addition to expanding the known breadth of microbial diversity, directed investigations into phylogenetic novelty will affect gene discovery and bioprospecting. A large fraction of HTS marker gene libraries is unclassified16, and most OTUs belong to the rare-biosphere fraction. Targeting taxonomic novelty in HTS data sets has uncovered new bacterial lineages45,46 and characterized microbial dark matter105. The Eukarya rare biosphere also contains substantial phylogenetic novelty. Logares et al.44 identified several distantly related clades that seemed to be exclusively or transiently rare. Similarly, targeting divergent sequences in existing data helped to identify a novel, uncultured, plastid-bearing eukaryotic lineage, the rappemonads28. These novel clades, potentially even phyla, are drawn from both poorly characterized and well-studied lineages in the tree of life. The high sequence divergence observed in some of these clades implies considerable genomic novelty, indicating that we are only scratching the surface of existing genomic diversity105. Furthermore, novel deep branches in phylogenetic analysis of metagenomic and marker gene surveys inform our understanding of early evolution and investigations into the potential fourth domain of life106. Only a small fraction of existing HTS libraries have been investigated for phylogenetic novelty and this is therefore a fruitful avenue for further taxonomic and phylogenetic research.

HTS marker gene surveys require PCR amplification, and the primers used are biased towards common or well-studied taxa. This results in taxonomic blind spots, where groups are either not amplified or amplified to artefactually low levels16. Even though analysing the rare biosphere in its entirety poses substantial challenges for discriminating signal from noise, individual low-abundance clades and phylogenetic novelty can be investigated directly. Many sequences recovered in HTS marker gene studies cannot be attributed to known taxa. Maintaining these sequences as 'unclassified' can blunt ecological inferences. Some novel and unclassified short-sequence fragments do represent legitimate microbial species that can, in some cases, significantly diverge from known lineages. Such taxa may have influential roles in an ecosystem, functioning as keystone species, or represent uncharacterized evolutionary novelty as part of biological dark matter105, despite not being major contributors to α-diversity and β-diversity.

Targeted analysis of such unclassified members of HTS surveys, which are often members of the rare biosphere, is challenging because primers must be sufficiently lineage-specific. The SSU rRNA regions targeted in HTS marker gene studies are highly variable and can contain short synapomorphic nucleotide segments, especially in unclassified groups. These regions can be exploited when targeting rare or novel lineages, which would expand on early work using data sets from serial analysis of ribosomal sequence tags116. In a pilot study using an Arctic tundra soil sample (from Alert, Nunavut), appreciable phylogenetic novelty was recovered using primers that exploit this specific mapping46. Phylogenetic targeting of this tundra V3 SSU rRNA gene library recovered near full-length sequences corresponding to seven novel taxonomic lineages. Two lineages were especially notable: a previously unknown clade within the bacterial rice cluster 1 (BRC1) candidate phylum117, and two clades that were both sisters to the very early branching cyanobacterium Gloeobacter, a genus containing multiple plesiomorphic traits. One observed novel cyanobacterial clade diverged by 10% from Gloeobacter spp. across 70% of the 16S rRNA gene and was even more distantly related to remaining Cyanobacteria, suggesting that Gloeobacteraceae are much more diverse than previously recognized. Interestingly, the sequence was nearly identical to only one other sequence in public databases, a cyanobacterium observed in Antarctica118, implying a bipolar distribution or anthropogenic dispersal. Combined with the recent description of Candidatus Gloeomargarita lithophora119 and the non-photosynthetic candidate phylum Melainabacteria120, which is a sister group to Cyanobacteria, a more complicated evolutionary picture of the Cyanobacteria is emerging through targeted rare-biosphere analyses.
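The use of short synapomorphic segments as lineage-specific priming sites can be illustrated with a toy search for signature k-mers, that is, subsequences shared by every sequence in a target group and absent from all background sequences. This is a minimal sketch and not the procedure used in the pilot study; the function names and sequences are hypothetical, and real primer design would also need to consider factors such as melting temperature, mismatch tolerance and strand orientation.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_kmers(targets, background, k):
    """Return sorted k-mers found in every target and in no background sequence."""
    if not targets:
        raise ValueError("at least one target sequence is required")
    shared = set.intersection(*(kmers(s, k) for s in targets))
    seen_elsewhere = set().union(*(kmers(s, k) for s in background))
    return sorted(shared - seen_elsewhere)


# Hypothetical short reads: two from an unclassified group, one from a known taxon.
novel_group = ["ACGTTGCA", "TACGTTGC"]
known_taxa = ["GGACGTTA"]
candidates = signature_kmers(novel_group, known_taxa, k=5)
```

Here three 5-mers are shared by both target reads, but one of them also occurs in the background read, which leaves two candidate signature sites.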

PCR introduces amplification biases121, which obscure the representation of microorganisms in the environment122,123. One way to circumvent PCR biases is by filtering marker genes from randomly sequenced environmental DNA that is generated in metagenomic studies124,125. The increasing capacity of HTS also facilitates genome assembly from metagenomic data. This is relatively straightforward for common organisms with highly developed, pre-existing scaffolds, although noise would still exist in the data depending on experimental procedures (for example, there can be chimaerism among closely related organisms). Efficient algorithms for genome assembly from metagenomes without closely related reference genomes will enable more thorough study of uncultivated and rare organisms. One such approach, differential coverage binning126, used multiple metagenomes generated from a single site to assemble high-quality genomes. The study describing this approach assembled 31 bacterial genomes from a sequencing batch reactor, 12 of which were essentially complete, containing ≥99% of essential genes. Notably, four of the complete genomes were from the candidate phylum TM7, representing low relative abundances of 0.06–1.58% of the original metagenomic reads. As sequencing depth continues to increase, this technique may be increasingly relevant for assembling genomes from truly rare organisms.
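The intuition behind differential coverage binning is that contigs from the same genome should have similar sequencing coverage in each metagenome, so contigs that lie close together in coverage space are likely to belong together. The sketch below groups contigs by a simple greedy, seed-based rule in log-coverage space for two samples. It illustrates the principle only and is not the published implementation, which may draw on additional information; the function name, distance threshold and coverage values are hypothetical.

```python
import math


def bin_by_coverage(coverage, max_log_dist=0.3):
    """Group contigs whose coverage is similar in two metagenomes.

    coverage: dict mapping contig name -> (coverage_in_sample_1, coverage_in_sample_2),
              with all coverage values > 0.
    A contig joins the first bin whose seed (first member) lies within
    max_log_dist in log10 coverage space; otherwise it starts a new bin.
    """
    points = {
        name: (math.log10(c1), math.log10(c2))
        for name, (c1, c2) in coverage.items()
    }
    bins = []
    for name in sorted(points):
        x, y = points[name]
        for members in bins:
            seed_x, seed_y = points[members[0]]
            if math.hypot(x - seed_x, y - seed_y) <= max_log_dist:
                members.append(name)
                break
        else:
            bins.append([name])
    return bins


# Hypothetical contigs: one population abundant in sample 1, one in sample 2.
contig_coverage = {
    "c1": (100, 10), "c2": (110, 9), "c5": (95, 11),
    "c3": (2, 40), "c4": (2.2, 45),
}
genome_bins = bin_by_coverage(contig_coverage)
```

A population that is rare in one sample but better covered in another can still be separated from the rest of the community in this way, which is why multiple metagenomes from one site are useful.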

Although it is challenging to remove artefacts from legitimate sequence data, it is clear that the rare biosphere contains sequences that represent genuine taxonomic blind spots. Mining of existing sequence data sets for such novel OTUs may reveal a wealth of taxonomic and phylogenetic information. Not only can such data-mining efforts provide useful insights into ecological phenomena, but expanding the taxonomic breadth of the tree of life will also improve sequence alignment and taxonomic assignment algorithms, thus providing increased accuracy in future studies.

Future of the rare biosphere

Studying the microbial ecology of the rare biosphere requires clear and consistent models of baseline dynamics across multiple ecosystems and time points. Correlating specific taxonomic characteristics of the rare biosphere with comprehensive sample metadata will go a long way towards identifying keystone species and understanding microbial seed bank dynamics, dormancy and biogeography. Large-scale studies involving deep sequencing75 and time-series designs72, combined with targeting novelty45,46,105, provide a useful framework for studying low-abundance organisms.

Although Baas Becking's dictum that “everything is everywhere, but, the environment selects” (Ref. 12) is not confirmed by existing data, and cannot be falsified, HTS studies have demonstrated that rare-biosphere taxa are both broadly distributed and under environmental selection. For example, thermophilic endospores in marine sediments show a broad but not ubiquitous distribution, with evidence of extensive worldwide passive dispersal68. Additionally, this dictum suggests that broadly distributed and highly selective environments would be expected to have more similar microbial communities than adjacent sites. Indeed, Caron and Countway50 observed that deep-water (>150 m) communities of microbial eukaryotes in the North Pacific and North Atlantic were more similar across basins than to adjacent euphotic zone samples. A rare species of Gloeobacterales observed at only Arctic (Alert, NU) and Antarctic sites46 provides another example from recent research. In light of these contrasting findings from HTS surveys, perhaps it would be fitting to modify Baas Becking's dictum in moving forward: everything can distribute everywhere, especially as rare microbial taxa, but the environment selects.

Our understanding of the metabolic and genomic potential of rare and uncharacterized organisms is limited by technical challenges: most notably, the extent of environmental metagenomes127 and the estimated sequencing coverage required to access rare organisms in some habitats128,129. Such large data sets require substantial investment in computational infrastructure and analysis techniques. Supplementing HTS technologies with existing and developing methods — such as cell sorting130,131, genome recovery132, multiple displacement amplification133, stable-isotope probing134, functional metagenomics135, targeted amplification of rare taxa136, cultivation111, co-occurrence networks137 and indicator species analysis138 — can provide a more comprehensive understanding of the roles and distributions of rare-biosphere organisms within microbial communities. Such investigations hold promise for future groundbreaking discoveries akin to pursuits of the fourth domain of life106 or to the sequencing of the large number of unknown lineages representing microbial dark matter105, in which rare-biosphere organisms have a considerable role.

Recent developments in genomics have demonstrated the feasibility of sequencing genomes from single cells139 as well as assembling rare genomes from metagenomes126. Focusing such efforts on understudied areas of the bacterial tree has increased our understanding of microbial genomic diversity105 and improved anchoring of metagenomic reads. Additionally, targeted genomics enables the comparison of persistently rare, conditionally rare and abundant taxa. Are there genomic features that delineate each group — for example, dormancy, rapid growth or predator avoidance in persistently rare species? Phylogenetic novelty correlates with the rare biosphere; therefore, our understanding of the taxonomy, genomics and ecology of rare organisms will continue to progress. We are still only scratching the surface of the rare biosphere, which is very likely to contain much more diversity than has been accessed so far. Furthermore, cultivation-based approaches have led to the recovery of many taxa140 and these approaches will continue to serve as useful tools for the recovery of organisms, genes and products for human benefit from many different environments. Indeed, the long tail of microbial diversity is more active and dynamic than previously recognized and harbours a substantial amount of ecological, taxonomic and genomic potential.