Substrate-restricted methanogenesis and limited volatile organic compound degradation in highly diverse and heterogeneous municipal landfill microbial communities

Sauk, Alexandra H.; Hug, Laura A.

doi:10.1038/s43705-022-00141-4

Published: 13 July 2022

ISME Communications volume 2, Article number: 58 (2022)

Abstract

Microbial communities in landfills transform waste and generate methane in an environment unique from other built and natural environments. Landfill microbial diversity has predominantly been observed at the phylum level, without examining the extent of shared organismal diversity across space or time. We used 16S rRNA gene amplicon and shotgun metagenomic sequencing to examine the taxonomic and functional diversity of the microbial communities inhabiting a Southern Ontario landfill. The microbial capacity for volatile organic compound degradation in leachate and groundwater samples was correlated with geochemical conditions. Across the landfill, 25 bacterial and archaeal phyla were present at >1% relative abundance within at least one landfill sample, with Patescibacteria, Bacteroidota, Firmicutes, and Proteobacteria dominating. Methanogens were neither numerous nor particularly abundant, and were predominantly constrained to either acetoclastic or methylotrophic methanogenesis. The landfill microbial community was highly heterogeneous, with 90.7% of organisms present at only one or two sites within this interconnected system. Based on diversity measures, the landfill is a microbial system undergoing a constant state of disturbance and change, driving the extreme heterogeneity observed. Significant differences in geochemistry occurred across the leachate and groundwater wells sampled, with calcium, iron, magnesium, boron, meta and para xylenes, ortho xylenes, and ethylbenzene concentrations contributing most strongly to observed site differences. Predicted microbial degradation capacities indicated a heterogeneous community response to contaminants, including identification of novel proteins implicated in anaerobic degradation of key volatile organic compounds.

Introduction

The environmental impact and monetary cost of municipal solid waste (MSW) storage and management are growing concerns for municipalities and countries around the world. MSW generation has increased exponentially with rising populations, increased development, and urbanization [1, 2]. By 2025, the global annual production of waste will reach an estimated 2.2 billion tons, and is not predicted to hit a maximum in this century, reaching 4 billion tons annually in 2100 [1, 3]. This is an unsustainable rate of increase.

Landfills are the most common end point for MSW in many countries, including Canada, the United States, and China. Landfills are the third largest source of anthropogenic methane, contributing 11% of annual global methane emissions, which makes them a focus area for mitigating climate change [3,4,5]. Waste degradation in landfills is controlled by the microbial communities within the landfill and the built characteristics of the landfill, such as leachate collection systems and cover soils [4, 6]. The first three steps of the general waste decomposition process are reliant on bacteria: hydrolysis; acidogenesis, including both fermentation and beta oxidation; and acetogenesis [6]. The last step, methanogenesis, is dependent on methanogenic archaea [6]. Landfill deposits are diverse, both chemically and physically, which can inhibit or prevent these microbial degradation processes [7]. Volatile organic compounds (VOCs) like chlorinated ethenes and hydrocarbons commonly contaminate landfill waste, due to improper dumping or as legacy waste deposited prior to regulations on disposal. Microbially mediated VOC degradation in landfills affects landfill emissions as well as contaminant fate when VOCs leak into the surrounding groundwater and terrestrial environment. Understanding the ecology and diversity of the bacterial and archaeal community structure in landfills will strengthen our understanding of the MSW decomposition process, allowing for better control of methane production and more efficient waste management and contaminant mitigation strategies.

Despite recent interest in landfill microbial diversity [7,8,9,10,11,12], much is still unknown about landfill-associated microbial communities and their distributed functions. Most early research focused on specific aspects of waste degradation in landfills and the microbes responsible, with particular interest in methane cycling [13, 14] and cellulose degradation [15, 16] (See [17] and references within for a review of direct landfill surveys and bioreactor-based examinations). With the advent of high-throughput sequencing techniques like 16S rRNA gene amplicon sequencing, overall landfill microbial diversity and community composition have also been examined [7, 11]. The most abundant phyla consistently identified from landfills include the Firmicutes, Bacteroidota, Campylobacterota (formerly Epsilonproteobacteria [18]) and Proteobacteria [6, 7, 10, 11]. Methanogenic archaea in landfills are typically predominantly hydrogenotrophs, with Methanobacteriales and Methanomicrobiales frequently at high abundance earlier in the landfill lifecycle [19]. Methanogens with the capacity for acetoclastic, hydrogenotrophic, or multiple methanogenic pathways are also common in landfills [19,20,21]. A number of rare and/or unclassified microorganisms have also been found in recent landfill studies, with some at high abundance [6, 7]. These studies allow for functions to be inferred for microorganisms with well-characterized relatives, but the 16S rRNA gene cannot be used to infer functions for unclassified microorganisms that are uncultivated or newly described [6, 11]. There is a recognized need for metagenome-level community profiling from landfills [20].

Several environmental or geochemical factors that influence microbial community composition and heterogeneity have been identified. The age of landfilled waste has been correlated with microbial community composition characteristics [6, 7, 11, 22, 23]. Community composition was also correlated with moisture [10, 11] and ammonium concentration [6, 10]. Other chemicals that showed a link to microbial community composition included barium, chloride, sulfate, and copper [7, 10]. Other chemical factors seem to affect microbial communities in a site-specific manner, and their effects will depend on the types of waste deposited and other geochemical conditions at each site of interest [7, 22, 24, 25].

The study site for this research was a municipal waste landfill in southern Ontario, Canada that opened in 1972. This landfill is a conventional sanitary landfill with onsite waste sorting, compacting, and daily soil covers. The landfill is well-instrumented, with over 100 leachate wells (LW) across the site as well as three composite leachate cisterns (CLC). The leachate wells are routinely sampled by regional waste management staff to determine the chemical composition of the leachate. There are also groundwater wells (GW) bordering the landfill for monitoring the conditions of the adjacent aquifer and any leachate leaks (e.g., there is on-going leachate infiltration from the area near LW3 into the aquifer near GW1, Fig. 1). In order to understand waste degradation processes, methane emission profiles, and the transformation and movement of contaminants within the site, it is important to understand microbial community heterogeneity as well as biodegradation capacity for contaminants of concern across the landfill. Here, we combine metagenomic and 16S rRNA gene amplicon sequencing techniques to characterize the distribution, heterogeneity, and diversity of the microbial communities in a Southern Ontario municipal landfill. We additionally investigate how the predicted microbial degradative capacities connect with geochemical conditions across the site.

**Fig. 1: Map of landfill sampling locations at the Southern Ontario landfill.**

Materials and methods

Sample collection

In the initial sampling event on July 14, 2016, a sample was collected from the composite leachate cistern by filtering the leachate through a 0.2 μm polyethersulfone filter followed by a 0.1 μm polyethersulfone filter in series (CLC_T1_0.2 and CLC_T1_0.1, respectively). Both filters were kept for DNA extractions. On July 20, 2016, a larger-scale sampling was conducted, sampling the composite leachate cistern (CLC_T2), three leachate wells (LW1, LW2, LW3), and two groundwater wells (GW1, GW2). Leachate and groundwater samples were collected by pumping liquid through a filter apparatus with a 3 μm glass fiber pre-filter in series with a 0.1 μm polyethersulfone filter until the filters clogged. The pre-filters were discarded, while the 0.1 μm filters with microbial biomass were kept. All filters were frozen on dry ice in the field and transferred to a −80 °C freezer until processed. DNA was extracted from the biomass using the PowerSoil DNA extraction kit (MoBio) following the manufacturer’s instructions with one modification: filters were sliced into pieces and added to the bead tube in place of a soil sample.

Relevant measurements for volatile and non-volatile compound concentrations at the leachate and groundwater wells are conducted each year in October and April by a contracted consulting company. For 2016, the average values for these two sampling points were used to estimate compound concentrations in July, the time of microbial biomass sampling (Supplementary Table 1). The impacted groundwater well, GW1, did not have current non-volatile concentration measurements available. For this well, measurements from 2011 were included for comparison purposes only (Supplementary Table 1). No geochemical measurements were available for the composite leachate cistern.

Sequencing

All eight samples were sent to the US Department of Energy’s Joint Genome Institute (JGI) for 16S rRNA gene amplicon sequencing: LW1, LW2, LW3, CLC_T1 0.1 μm and 0.2 μm filters, CLC_T2, GW1, and GW2. The JGI amplified the V4 region of the 16S rRNA gene with the forward primer 515F (Parada) (5’-GTGYCAGCMGCCGCGGTAA-3’) and the reverse primer 806R (Apprill) (5’-GGACTACNVGGGTWTCTAA-3’), following in-house protocols (as described at https://jgi.doe.gov/wp-content/uploads/2016/06/DOE-JGI-iTagger-methods.pdf, but with the primers listed above). Amplicons were sequenced on the MiSeq platform (Illumina) alongside extraction negative controls, amplification negative controls, and positive controls, and reads were quality checked using the iTagger pipeline [26].

Six DNA samples were sent to the JGI for metagenomic sequencing, assembly, and annotation: LW1, LW2, LW3, CLC_T1 (0.2 μm filter), CLC_T2, and GW1. The 0.1 μm filters from CLC_T1 and GW2 yielded insufficient DNA and were not shotgun sequenced. Metagenomes were sequenced as paired-end 150 bp reads using the HiSeq platform (Illumina) and annotated using the DOE-JGI Metagenome Annotation Pipeline (MAP v.4) [27].

Phylogenetic trees

All assembled and annotated 16S rRNA genes in the landfill metagenomes were downloaded from the JGI Integrated Microbial Genomes (IMG) server (IMG Genome IDs: CLC_T1: 3300014203, CLC_T2: 3300014206, LW1: 3300014204, LW2: 3300015214, LW3: 3300014205, GW1: 3300014208). Genes were sorted by length in Geneious 11.0.5 (https://www.geneious.com) and curated to a minimum length of 600 bp. The landfill metagenome-derived 16S rRNA genes as well as a reference set of 16S rRNA genes from known organisms were aligned with the SILVA SINA algorithm [28]. Unaligned bases at the ends of the genes were removed and sequences below 70% identity to a reference sequence were automatically removed from the dataset by SINA. To curate the SINA alignment, columns containing 97% or more gaps were removed, a region of poor alignment was manually trimmed from the 3’ end, and sequences falling below 600 bp post-trimming were removed. A phylogenetic tree was inferred using FastTree in Geneious to check for poorly aligned or divergent sequences. During this processing, 195 sequences that did not meet quality standards were removed. The final 16S rRNA gene alignment included 1903 reference sequences and 2306 sequences from the metagenome samples, and had 1521 positions. A high-quality phylogenetic tree was inferred from the curated final alignment using RAxML-HPC2 8.2.12 [29] on CIPRES [30] under model GTRCAT, with 100 alternative bootstrap iterations run from 100 starting trees. The full tree topology is presented in Supplementary File 1.
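The two alignment curation rules described above (dropping columns with 97% or more gaps, then dropping sequences shorter than 600 bp) can be sketched in a few lines. This is only an illustration, not the Geneious workflow used in the study: the function names, the dictionary representation of the alignment, and the treatment of both `-` and `.` as gap characters are assumptions.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.97, gap_chars="-."):
    """Remove columns whose gap fraction is >= max_gap_fraction.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    if not alignment:
        return {}
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for col in range(length):
        gaps = sum(alignment[n][col] in gap_chars for n in names)
        if gaps / len(names) < max_gap_fraction:
            keep.append(col)
    return {n: "".join(alignment[n][c] for c in keep) for n in names}


def drop_short_sequences(alignment, min_length=600, gap_chars="-."):
    """Keep sequences with at least min_length non-gap characters."""
    return {
        name: seq
        for name, seq in alignment.items()
        if sum(ch not in gap_chars for ch in seq) >= min_length
    }
```

The thresholds are left as parameters so the same helpers could apply to the ≥95% gap cut-off used for the ribosomal protein alignments.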

All amino acid sequences for 16 syntenic, universally-present, single copy ribosomal protein genes (RpL2, L3, L4, L5, L6, L14, L15, L16, L18, L22, L24 and RpS3, S8, S10, S17, S19) for the landfill metagenomes were downloaded from the JGI IMG server using annotation keyword-based identification [31]. Ribosomal protein datasets were screened for the Archaeal/Eukaryotic type, which were removed, as were short (<45 aa) sequences. Each individual protein set was aligned with a reference set of genes [32] using MAFFT 7.402 [33] on CIPRES. Alignment columns containing ≥95% gaps were removed using Geneious. IMG-derived sequence names were trimmed to 8 digits after the metagenome code (e.g., Ga0172377_100004578 → Ga0172377_10000457) to remove gene-specific identifiers and allow for concatenation by scaffold name. The protein gene alignments were concatenated in numeric order (L2 → L24, followed by S3 → S19). Concatenated sequences that contained less than 50% of the total expected number of aligned amino acids were removed. The final alignment was 3452 columns long and contained 2914 reference organisms and 1265 scaffolds from the metagenome samples. A phylogenetic tree was inferred using RAxML-HPC Blackbox on CIPRES using the following parameters: sequence type - protein; protein substitution matrix – LG; and estimate proportion of invariable sites (GTRGAMMA + I) – yes [29, 30]. The full tree topology is presented in Supplementary File 1.
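The name trimming and concatenation steps above can be illustrated with a short sketch. It assumes IMG gene identifiers of the form `Ga<digits>_<digits>`, as in the example given, and it interprets the 50% rule as the fraction of non-gap positions over the full concatenated alignment length; both the helper names and that interpretation are assumptions, not the study's actual scripts.

```python
import re


def scaffold_id(gene_id, digits=8):
    """Trim an IMG gene ID to the metagenome code plus the first `digits` digits."""
    match = re.fullmatch(r"(Ga\d+)_(\d+)", gene_id)
    if match is None:
        raise ValueError(f"unexpected identifier format: {gene_id!r}")
    prefix, number = match.groups()
    return f"{prefix}_{number[:digits]}"


def concatenate(protein_alignments, min_fraction=0.5):
    """Concatenate per-protein alignments by scaffold name.

    protein_alignments: list of non-empty dicts (scaffold -> aligned sequence),
    given in the desired concatenation order. Scaffolds missing a protein are
    padded with gaps; scaffolds with fewer than min_fraction non-gap positions
    are dropped.
    """
    scaffolds = sorted({s for aln in protein_alignments for s in aln})
    widths = [len(next(iter(aln.values()))) for aln in protein_alignments]
    total = sum(widths)
    result = {}
    for scaffold in scaffolds:
        joined = "".join(
            aln.get(scaffold, "-" * width)
            for aln, width in zip(protein_alignments, widths)
        )
        if sum(ch != "-" for ch in joined) >= min_fraction * total:
            result[scaffold] = joined
    return result
```

For the identifier quoted in the text, `scaffold_id("Ga0172377_100004578")` returns `"Ga0172377_10000457"`.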

16S rRNA amplicon sequence analyses

The demultiplexed and barcode-trimmed 16S rRNA gene amplicons from the JGI were analyzed using QIIME2 [34]. Forward and reverse reads were separated using khmer [35]. Primers were trimmed from the forward and reverse reads using cutadapt in QIIME2 [36]. The forward reads were truncated at 231 base pairs and the reverse reads at 230 base pairs based on the quality score visualization produced by QIIME2 in the demux summary step. Reads were denoised using paired denoising in DADA2 within the QIIME2 platform which also merges the reads [37]. Sequence variants were determined using DADA2 and summarized using feature-table summarize in QIIME2. Taxonomic assignment of the 16S rRNA gene amplicons was based on a phylogenetic tree produced by QIIME2 in which the taxonomy classifier was trained with the SILVA 99% taxonomy classification for the 16S rRNA gene from the April 2018 SILVA 132 release [38]. Phylum names were updated as per the GTDB database taxonomy changes by Parks et al. (2018) for diversity comparisons.

Metagenomic binning

All scaffolds >2500 bp were included in the binning process. The binning algorithm CONCOCT [39] was used in Anvi’o [40] to automatically cluster each metagenome’s scaffolds using a combination of scaffold tetranucleotide frequencies and read-mapped coverage data from all six metagenomes. Gene annotations were imported from the JGI annotations, overriding the automated annotation pipeline in Anvi’o. The bins were manually refined for the six metagenomes using Anvi’o, focusing on completion and quality metrics to guide bin refinements. High quality bins were considered those with greater than 70% completion and less than 10% redundancy. Read mapping was used to calculate coverage for McrA and VOC-degradation gene-encoding scaffolds, restricted to scaffolds 2.5 kb or longer.
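As a minimal sketch of the two numeric filters stated above (scaffold length and bin quality), assuming hypothetical helper names and plain Python inputs rather than Anvi’o's own data structures:

```python
def select_scaffolds(lengths, min_length=2500):
    """Return names of scaffolds longer than min_length bp.

    lengths: dict mapping scaffold name -> length in bp.
    """
    return [name for name, n in lengths.items() if n > min_length]


def is_high_quality(completion, redundancy, min_completion=70.0, max_redundancy=10.0):
    """Apply the stated bin thresholds: >70% completion and <10% redundancy."""
    return completion > min_completion and redundancy < max_redundancy
```

Both comparisons are strict, following the wording "greater than 70%" and "less than 10%".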

Diversity analyses

Diversity analyses on the 16S rRNA gene amplicon sequence variants (ASVs) identified by QIIME2 [34] included the alpha diversity metrics Faith’s phylogenetic diversity [41] and Pielou’s evenness [42], calculated based on four sample types: impacted groundwater well, unimpacted groundwater well, leachate well, and composite leachate cistern. A Shannon diversity index analysis with a rarefied sequence depth of 53,518 was conducted using QIIME2 and visualized using phyloseq [43] in R. A Chao1 statistic was not calculated, as data processing with QIIME2 and DADA2 removes all singleton ASVs, which the Chao1 statistic requires. For beta diversity measures, full ASV and taxonomy tables were used as input to principal coordinates analyses of unweighted and weighted UniFrac distances, calculated using phyloseq and visualized in R for all samples. The prevalence across samples of ASVs with a count of 2 or more and belonging to phyla with relative abundance greater than 1% or present in multiple sites was determined using phyloseq and visualized in R. The phyla with ASVs present in five or more sites were visualized using ggplot2 in R.
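For readers unfamiliar with the two count-based alpha diversity metrics named above, the following sketch shows how the Shannon index and Pielou's evenness are computed from a vector of ASV counts. It is not the QIIME2 or phyloseq implementation; the function names are hypothetical, and the logarithm base is exposed as a parameter because different tools use different defaults.

```python
import math


def shannon(counts, base=math.e):
    """Shannon index H' = -sum(p_i * log(p_i)) over ASVs with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in props)


def pielou(counts):
    """Pielou's evenness J' = H' / ln(S), where S is the number of observed ASVs."""
    richness = sum(c > 0 for c in counts)
    if richness < 2:
        raise ValueError("evenness is undefined for fewer than two ASVs")
    return shannon(counts) / math.log(richness)
```

A perfectly even sample gives J' = 1, and J' falls towards 0 as a few ASVs come to dominate the read counts.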

A principal component analysis (PCA) was conducted using vegan [44] in R for all 16S rRNA gene amplicon ASVs and for the 16S rRNA gene amplicon ASVs present at five or more sites. The ASV count data were Hellinger transformed to reduce the weight of ASVs with low counts and zeros. The leachate wells and the two groundwater well samples were included in the PCA to allow for comparison with environmental parameters, which are available for those sites. Environmental data were not available for the composite leachate cistern site, and so CLC samples were included in the analysis only for comparison with the other samples.
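The Hellinger transformation mentioned above takes the square root of each ASV's relative abundance within a sample. The sketch below shows the arithmetic on a samples-by-ASVs table held as nested lists; the study used vegan in R, so this Python helper and its name are illustrative assumptions only.

```python
import math


def hellinger(table):
    """Hellinger-transform a samples x ASVs count table.

    Each count is divided by its sample (row) total and square-rooted.
    Rows with no reads are returned as all zeros.
    """
    transformed = []
    for row in table:
        total = sum(row)
        transformed.append(
            [math.sqrt(count / total) if total else 0.0 for count in row]
        )
    return transformed
```

After the transformation the squared values in each non-empty row sum to 1, which is what damps the influence of rare ASVs and shared zeros in the subsequent PCA.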

Metagenome-derived sequences were classified at the phylum level based on their placement within reference clades on the 16S rRNA and concatenated ribosomal protein phylogenetic trees. Metagenome sequences placing outside of or between phyla were assigned to either “Unclassified Archaea” or “Unclassified Bacteria” as appropriate. Phylum names were updated from the NCBI taxonomy to conform to the GTDB database taxonomy by Parks et al. (2018). Bins were identified at the phylum level using the scaffold assignments from the 16S rRNA gene and concatenated ribosomal protein phylogenetic trees. Bin abundances were determined using the average fold coverage data for all scaffolds in the bin. Phylum abundance per sample was calculated by summing the average fold coverage data for each scaffold on the tree assigned to the phylum, where the scaffold acts as a proxy for the underlying microbial population. Microbial diversity comparisons were visualized using stacked bar plots produced using ggplot2 in R [45].
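The coverage-based phylum abundance calculation described above amounts to summing scaffold coverages per phylum and normalizing by the sample total. A minimal sketch, assuming a hypothetical input of (phylum, average fold coverage) pairs for one sample:

```python
from collections import defaultdict


def phylum_relative_abundance(scaffolds):
    """Sum per-scaffold coverage by phylum and express it as a percentage.

    scaffolds: iterable of (phylum, average_fold_coverage) pairs for one sample.
    """
    totals = defaultdict(float)
    for phylum, coverage in scaffolds:
        totals[phylum] += coverage
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no coverage recorded for this sample")
    return {phylum: 100.0 * cov / grand_total for phylum, cov in totals.items()}
```

Each scaffold here stands in for one microbial population, matching the proxy assumption stated in the text.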

Chemical data analyses

Chemical measurements provided by the Southern Ontario landfill 2016 annual report were used to determine variance of non-volatile and volatile compounds over time for the three leachate wells and the unimpacted groundwater well. GW1 only has non-volatile compound measurements for one time point in 2011 and so variance could not be calculated. Non-detects, where a compound, if present, is below the detection limit, were treated as zeros. The measurements were log transformed and visualized in a heatmap using heatmap3 [46] in R. Metal and volatile compounds detected in a majority of samples were used for further analysis. The measurements from April and October of 2016 were averaged to estimate the concentrations at the time of microbial biomass sampling.

PCA for the metals and volatile compounds were conducted using vegan [44] in R. The metal and volatile compound concentrations were square root transformed to reduce the range of the values as different compounds differed in concentration by orders of magnitude (Supplementary Table 2). Data for leachate wells and the two groundwater well samples were included for the volatile analysis, but GW1 was excluded from the non-volatile compound analysis as no data were available for that site in 2016. A PCA was also conducted using vegan in R for the other geochemical parameters measured at the sites that are not characterized as non-volatile or volatile compounds (e.g., total dissolved solids (TDS)).
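The preprocessing of the chemical data can be summarized as three small steps: treat non-detects as zero, average the April and October values to approximate July, and square-root transform before PCA. The sketch below assumes non-detects are represented as `None`; that encoding and the function names are illustrative assumptions.

```python
import math


def estimate_july(april, october):
    """Average the April and October measurements, treating non-detects (None) as 0."""
    a = 0.0 if april is None else april
    o = 0.0 if october is None else october
    return (a + o) / 2.0


def sqrt_transform(values):
    """Square-root transform concentrations to compress their range."""
    return [math.sqrt(v) for v in values]
```

For example, a compound measured at 8.0 in October but not detected in April would be assigned an estimated July concentration of 4.0 under this scheme.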

Methanogenesis and VOC degradation capacity

KEGG KO numbers for mcrA and key anaerobic degradation enzymes for the dominant VOCs detected at the Southern Ontario landfill were searched from the annotations for all six metagenomes. Reductive dehalogenases’ catalytic subunits (RdhA), responsible for chlorinated ethene, ethane, and benzene degradation, were annotated by pfam13486 instead of a KO. AbcA, the carboxylase associated with anaerobic benzene degradation [47] does not belong to a KO, and instead was searched using manual BLASTp [48] using three characterized enzymes as queries (ADJ94002.1, WP_011237597, GI10697123) with an initial threshold of e < 1e⁻³⁰.

Annotated proteins were screened through a combination of phylogenetic placement and/or in-depth annotation using BlastKoala [49] and NCBI’s conserved domains feature [50]. All hits were required to have a minimum length of 250 amino acids or a length at least 50% that of the reference sequences if that minimum was below 250 aa (i.e., >200 aa for RdhA, >204 aa for AbcA). Outgroup proteins were derived from literature for each protein of interest (see Table 1 for outgroup protein names and references).

Table 1 Detection of anaerobic volatile organic compound degradation proteins.

Proteins were aligned to reference and outgroup sequences using Muscle SRC v. 3.8.1551, columns containing >97% gaps were trimmed using Geneious Prime v. 2021.2.2, and phylogenetic trees inferred using FastTree2 [51]. Metagenome-derived protein sequences that passed the length threshold, affiliated with the correct clade in phylogenetic trees, and had consistent annotations to the functions of interest from BlastKoala or the Conserved Domains Database were kept (Table 1). Connection to high quality MAGs was determined based on scaffold IDs. MAG taxonomy was based on GTDB-tk [52]. Relative distribution compared to VOC concentration at each sampling site was assessed (Table 1).

RdhA and AbcA were selected for deeper examination. Using the CIPRES phylogenomics webserver [30], alignments were tested for the best model of evolution under ModelTest-NG v.0.1.5 [53] and inference of maximum likelihood trees conducted using RAxML-HPC v. 8.2.12 [54] under the best-fit model (LG + G + I for AbcA; VT + G + I for RdhA), and with automatic bootstopping to identify the appropriate number of bootstrap resamplings. For reductive dehalogenases, a reference set from [55] was used to confirm reductive dehalogenase protein annotations, phylogenetic affiliation, and potential substrate specificities. For AbcA, the three proteins associated with tigrfam TIGR02723 were included as positive controls, with reference sequences for UbiD carboxylase (pfam01977) included as an outgroup [56].

Results

Phylum level diversity

Solid waste sampling of the landfill was not possible, as disruption to the landfill cover was not permitted. Instead, we sampled leachate from monitoring wells to gain insight to the planktonic microbial community circulating within the landfill. Samples were collected from three leachate wells (LW1, LW2, and LW3), two samples from a composite leachate cistern at time points separated by one week (CLC_T1 and CLC_T2), and samples from two groundwater wells (GW1 and GW2) adjacent to the landfill (Fig. 1). 16S rRNA amplicons and shotgun metagenomes were generated and processed as discussed in the methods.

The 16S rRNA amplicon sequences were taxonomically classified and relative abundances were determined using QIIME2 [34]. From the 16S rRNA gene analysis, 8030 amplicon sequence variants (ASVs) were identified across the sampled sites with an average of 1147 ASVs per site. In tandem, metagenomic scaffolds were identified to the phylum level via placement on phylogenetic trees inferred based on the 16S rRNA gene and a suite of sixteen concatenated ribosomal proteins. Phylogenetic trees included 1265 and 2306 metagenome-derived sequences for the ribosomal protein and 16S rRNA gene trees, respectively. The total number of medium or higher quality metagenome assembled genomes (MAGs, >70% completeness, <10% contamination) resolved from the six metagenomes was 503. Taxonomy information was combined with scaffold coverage data to determine the relative abundances of phyla present in the landfill metagenomes. Twenty-five phyla were present at greater than 1% relative abundance in at least one landfill sample (Fig. 2). Phylum level profiles were relatively consistent between the 16S rRNA gene amplicon and metagenomic sequencing data (Fig. 2). A notable exception was the Patescibacteria (Candidate Phylum Radiation), which make up a comparatively reduced proportion of the 16S rRNA gene amplicon results (max relative abundance of 30.78%, in GW1) but exhibit the highest relative abundances in the metagenomic data (mean relative abundance of 34% and max relative abundance of 79%, in GW1, based on the coverage of the ribosomal protein-encoding scaffolds). The Bacteroidota (mean: 16%, max: 31.89% in CLC_T1), Firmicutes (mean: 10.19%, max: 28.74% in CLC_T1), and Proteobacteria (mean: 10%, max: 28% in LW2) were also highly abundant across the landfill sites.

**Fig. 2: Relative abundances for phyla present at greater than 1% abundance in at least one sample.**

Alpha and beta diversity metrics

Alpha and beta diversity metrics were calculated based on the 16S rRNA gene amplicon sequences using QIIME2 and the phyloseq package in R [43]. All of the landfill samples had a Shannon index above 5.0 for the 16S rRNA gene amplicon data (Fig. 3). There was no significant difference between the sample types (groundwater, leachate wells, leachate cisterns) when considering Faith’s phylogenetic diversity (Supplementary Fig. 1A). The eight samples also exhibited high Pielou’s evenness (J’ > 0.74) with no significant differences between sample types (Supplementary Fig. 1B).

**Fig. 3: Observed diversity and Shannon index for the eight landfill-associated samples.**

Principal coordinates analysis (PCoA) plots using weighted and unweighted UniFrac distances based on 16S rRNA gene amplicon ASVs showed separation of the samples by type (Supplementary Fig. 2). The inclusion of abundance data in the weighted UniFrac analysis increased the explained variation on axes 1 and 2 by a combined 24.1%, suggesting that presence/absence and phylogenetic distance data implemented in the unweighted UniFrac are not sufficient to resolve the differences in beta diversity between sites in two dimensions. The inclusion of differences in abundance and overlap of ASVs between sites increased separation of the samples by type.

Diversity of ASVs

The prevalence of 16S rRNA ASVs was determined using phyloseq and visualized in ggplot2 [45] in R (Fig. 4). The abundance of ASVs present at 5 or more sites was summarized by phylum (Supplementary Fig. 3). Although phylum level diversity was relatively consistent across the composite leachate cistern, leachate wells, and GW1 sample, the diversity at the ASV level is nearly entirely non-overlapping. The majority of ASVs identified from the top 25 phyla are present in only a single sample (Fig. 4), with only 121 of 8030 ASVs present across five or more samples (Fig. 4 and Supplementary Fig. 3). In addition to the top 25 phyla, ASVs belonging to LCP-89, Micrarchaeota, and an unclassified group of Deltaproteobacteria were also present in five or more sites. The abundance of phyla with populations across 5 or more sites ranges over several orders of magnitude, from 134 total ASV counts for Elusimicrobiota to 83,545 total ASV counts for Bacteroidota (Supplementary Fig. 3). Of the 8030 ASVs, 73.82% were found in only one sample, and at most 1165 ASVs are shared between any two sites (Fig. 4 and Supplementary Table 1). Principal component analysis (PCA) for all ASVs showed that the separation of composite leachate cisterns, leachate wells, and groundwater wells is driven by highly abundant ASVs (Supplementary Fig. 4A). When considering only ASVs present at five or more sites, LW2 is separated from LW1 and LW3 along PC2 and GW1 is separated from all other sites along PC1 (Supplementary Fig. 4B).
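ASV prevalence, as used above, is simply the number of samples in which an ASV is detected. The sketch below shows one way to compute it and the fraction of ASVs confined to a single sample. It assumes a hypothetical nested-dictionary count table and applies the count threshold of 2 described in the methods; it is not the phyloseq code used in the study.

```python
def prevalence(table, min_count=2):
    """Number of samples in which each ASV reaches min_count reads.

    table: dict mapping ASV id -> dict mapping sample name -> read count.
    """
    return {
        asv: sum(count >= min_count for count in counts.values())
        for asv, counts in table.items()
    }


def fraction_single_sample(prev):
    """Fraction of detected ASVs that occur in exactly one sample."""
    detected = [n for n in prev.values() if n > 0]
    if not detected:
        raise ValueError("no ASVs detected in any sample")
    return sum(n == 1 for n in detected) / len(detected)
```

With the sample names used here, the inner dictionaries would be keyed by identifiers such as LW1, LW2, LW3, CLC_T2, GW1, and GW2.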

**Fig. 4: Prevalence of 16S rRNA amplicon ASVs across sampling sites.**

Microbial diversity at groundwater wells

There are marked differences in the groundwater microbial communities from GW1 and GW2, the leachate-impacted and unimpacted wells, respectively. GW1 has a high abundance of Patescibacteria, and its phylum-level profile is more similar to those of the leachate wells than to that of GW2 (Fig. 2). The sample from GW2 had insufficient microbial biomass for metagenomic sequencing, but 16S rRNA gene amplicon sequencing showed that GW2 has a distinct microbial community compared to all other sites, including a higher relative abundance of Nanoarchaeota (20.7%) and Omnitrophota (14.1%) (Fig. 2). The difference in microbial community composition between GW1 and GW2 is also reflected in their alpha diversity metrics. GW1 has the lowest Shannon index of the eight samples (Fig. 3) as well as the lowest Faith’s phylogenetic diversity (Supplementary Fig. 1A). A lower richness and evenness are expected in GW1, as the mixing of leachate and groundwater creates a suboptimal environment for microorganisms adapted to either environment [57].

Analysis of geochemical parameters

Geochemical parameters, including concentrations of volatile and non-volatile compounds, are measured quarterly by a contracted monitoring company. The statistical power available for analysis of geochemical parameters in the landfill was limited by the availability of data. Non-volatile compound measurements were only available for four sites and volatile compound measurements for five sites (Supplementary Table 2). Non-volatile and volatile compound concentrations varied significantly between sites when compared using an ANOVA (p < 9.14e⁻¹⁴ and p < 2e⁻¹⁶, respectively), with a large range between sites for several non-volatile and volatile compounds (Fig. 5). The date of measurement was not significant for either volatiles or non-volatiles when compared using an ANOVA (p = 0.56 and p = 0.73, respectively). The April and October 2016 measurements were averaged for the PCA to estimate conditions during the July sampling for microbial biomass. Sodium and potassium were removed as outliers because their excessively high concentrations in LW2 (Supplementary Table 2) caused their variance to mask any differences in other compounds in the analyses. From the PCA, calcium, iron, magnesium, and to a lesser degree, boron contributed to the differences between the leachate wells and GW2 (Fig. 5C). For the volatile compounds, nearly all of the observed variation is explained by PC1 (97.4%), largely due to the punctuated presence of m- and p-xylenes in LW1 and LW3, and of o-xylene and ethylbenzene in LW1 (Fig. 5B, D).

**Fig. 5: Environmental variation between landfill and aquifer sites for non-volatile (mg/L) and volatile (µg/L) compounds.**

Methanogen populations

The potential for methanogenesis was determined using annotations for the alpha subunit of methyl-coenzyme M reductase (McrA; K00399). A total of 94 McrA protein-coding sequences were identified from the six metagenomes, ranging from 1 (GW1) to 25 (CLC_T2) per metagenome (Table 1). Of these, 31 were encoded on scaffolds >2.5 kb, and 17 were binned into high quality MAGs (Table 2). The taxonomic affiliations of the mcrA-encoding MAGs include nine methylotrophic members of the Methanomethylphilaceae as well as seven acetoclastic MAGs from the Methanoregulaceae (3), Methanotrichaceae (3), and Methanoculleaceae (1) (Table 2). The final mcrA-encoding MAG is classified as a member of the Methanofastidiosaceae, predicted to use methylated thiols as input to the methanogenesis pathway. McrA-encoding scaffolds and MAGs were moderately abundant, with an average scaffold coverage of 17.72 (dataset average: 13.5, median = 7.4), and MAG average coverages ranging from 6.36 to 42.32 (Table 2; average coverage of all MAGs: 14.95–31.02 across the six metagenomes).

Table 2 Genome information for MAGs carrying VOC degradation genes or McrA as a marker for methanogenesis.

VOC degradation capacity

An annotation-based screen was conducted to assess the potential capacity for volatile organic compound degradation, focusing on anaerobic degradation of chlorinated solvents (ethenes, ethanes, benzenes), BTEX compounds, 1,4-dichlorobenzene and chlorobenzene, and 1,4-dioxane as the predominant VOCs impacting the site. Following curation of annotated proteins for phylogenetic consistency and homology to characterized VOC degrading proteins, 111 protein-coding genes with VOC-degradation relevance were identified (Table 1), 12 of which were associated with high quality MAGs (Table 2).

For the reductive dehalogenases, 76 genes were detected, but only 22 passed the length threshold for phylogenetic placement and substrate specificity examination. All reductive dehalogenase genes, including ones too short for placement, were identified from the landfill metagenome samples. The metagenome for GW1 did not contain any reductive dehalogenase genes, despite this being the only site where chlorinated solvents were detected in the geochemical analyses (68 µg/L total concentration, Supplementary Data 2). Reductive dehalogenases have been identified from a diverse suite of organisms with organohalide respiration capacity [55]. All RdhA genes detected at the southern Ontario landfill are most closely related to those from the Chloroflexota organisms Dehalococcoides and Dehalogenimonas (Fig. 6). One partial gene from LW3 has high homology to TceA, the Dehalococcoides-encoded RdhA involved in degradation of trichloroethene (TCE) to dichloroethene (DCE) [58]. In our screen, there were no homologs of VcrA or BvcA, the two known proteins that can dechlorinate vinyl chloride (VC) to non-toxic ethene [59, 60], indicating VC degradation may be limited or absent. No other landfill-derived sequences were associated with reference sequences with known substrate specificities (Fig. 6).

**Fig. 6: Maximum likelihood trees placing landfill-derived reductive dehalogenases (RdhA) and anaerobic benzene decarboxylases (AbcA) within their respective gene families.**

For anaerobic benzene degradation, 183 AbcA hits were identified via BLASTp, with 177 passing the length requirement. Based on the phylogeny containing AbcA and UbiD representative proteins, 7 of these sequences place within or next to the AbcA clade, and were scored as potential AbcA in the landfill metagenomes (Table 1, Fig. 6). AbcA genes were identified from LW2 and LW3, while benzene was detected at LW1, LW3, and GW1. One AbcA gene was associated with a MAG from the gammaproteobacterial genus Sterolibacterium (LW3_68). Currently characterized benzene degraders with AbcA are from the genera Thauera and Aromatoleum, making this an expansion of the taxonomic as well as sequence diversity of this recently described remediation-relevant protein family.

Anaerobic xylene and toluene degradation was screened based on presence of benzylsuccinate synthase (BssA) [61, 62]. An initial 147 annotated proteins were curated to 26 based on length requirements and phylogenetic affiliations, with BlastKoala confirming annotation for 22 of these as well as an additional 14 proteins (Supplementary Data 2). The 40 proteins passing tree-based curation and/or BlastKoala annotation are reported in Table 1. Unlike for chlorinated solvents, xylene and toluene degradative capacity tracks with contaminant concentrations: LW1 and LW3 have the highest concentrations of xylene and toluene as well as the highest counts of predicted benzylsuccinate synthases (LW1: 1307 µg/L, 10 BssA; LW3: 211.2 µg/L, 26 BssA). LW2 and GW1 have lower concentrations of xylene and toluene, and fewer detected BssA (LW2: 43.3 µg/L, 3 BssA; GW1: 0.1 µg/L, 1 BssA). No bssA genes were detected in the CLC metagenomes.
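As an illustrative check on how closely BssA counts track contaminant concentrations, the four site values quoted above can be rank-correlated. This is a minimal sketch and not an analysis reported in the study: the choice of Spearman's rho, the helper names, and the no-ties shortcut formula are assumptions made here, and with only four sites the result is descriptive and carries no statistical weight.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman rank correlation using the no-ties shortcut formula."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Values as quoted in the text for the sites with geochemical data.
sites = ["LW1", "LW3", "LW2", "GW1"]
xylene_toluene_ug_per_l = [1307.0, 211.2, 43.3, 0.1]
bssa_counts = [10, 26, 3, 1]

rho = spearman_rho(xylene_toluene_ug_per_l, bssa_counts)
print(f"Spearman rho across {len(sites)} sites: {rho:.2f}")
```

The ranks agree for LW2 and GW1 and swap for LW1 and LW3 (LW1 has the highest concentration, LW3 the most BssA), so the correlation is positive but not perfect.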

Ethylbenzene degradation capacity was examined through ethylbenzene dehydrogenase (EbdA), a member of the dimethylsulfoxide (DMSO) reductase family [63]. An initial 112 sequences were cut to 41 following length filtration, with 40 placing as putative EbdAs on a tree rooted with nitrate reductases, dimethylsulfoxide dehydrogenases, selenate reductases, perchlorate reductases, and chlorate reductases, following the tree in [64]. Of these, ten were associated with two MAGs: one member of the Rhodocyclaceae encoded seven EbdA proteins (LW3_42), and one member of the genus Sterolibacterium encoded three (LW3_67; not the same MAG that encoded the AbcA). Ethylbenzene was detected at all four sites with geochemical data. LW1 had the highest concentration (275 µg/L), with no detected EbdA from the corresponding metagenome. LW2 and LW3 had moderate levels of ethylbenzene, and strong EbdA counts (LW2: 8.25 µg/L, 19 EbdA; LW3: 17.5 µg/L, 19 EbdA). The remaining two EbdA were identified from CLC_T2, while GW1, which had trace ethylbenzene (0.1 µg/L), and CLC_T1 had no EbdA detected.

Anaerobic degradation of chlorobenzene and 1,4-dichlorobenzene is catalyzed by chlorobenzene dihydrodiol dehydrogenase (TcbB). Only 4 hits were identified based on KO annotations, all from LW2. Of these, only 1 had >75% amino acid identity to chlorobenzene dihydrodiol dehydrogenase as its closest database homolog. The other three were more closely related to cis-2,3-dihydrobiphenyl-2,3-diol dehydrogenases. None were assigned to the correct annotation using BlastKoala. The one putative TcbB is reported here. LW1 and LW3 have detectable chlorobenzenes in the leachate (6.0 and 15.3 µg/L net chlorobenzenes, respectively), while LW2, the sample with a putative TcbB, did not have any detectable chlorobenzenes.

1,4-Dioxane degradation was included in this screen as a contaminant of interest for the site engineers. There are currently no known anaerobic 1,4-dioxane degradation pathways. To examine the latent potential for aerobic dioxane degradation, we focused on DxmA and PrmA, the two enzymes capable of 1,4-dioxane degradation without requiring induction from a co-contaminant (e.g., toluene, propane) [65, 66, 67, 68, 69]. From an initial set of 20 putative DxmA and 33 PrmA, only one DxmA, from LW2, passed length requirements and phylogenetic tree-based curation. This protein clusters with the DxmA from Pseudonocardia dioxanivorans, is annotated with an aromatic and alkene monooxygenase hydroxylase domain by NCBI’s conserved domain database, and was encoded on a high quality MAG (LW2_26). Notably, LW2_26, from an unclassified genus within the family Solirubrobacteraceae, is the 14th most abundant MAG across all metagenomes (average coverage = 118.64; Table 2). 1,4-Dioxane was only detected at GW1 (26 µg/L), whose paired metagenome contained no identified dioxane degradative capacity.
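The per-family counts reported in the preceding paragraphs can be tallied as a bookkeeping check against the 111 curated VOC-degradation genes stated at the start of this section. The sketch below assumes that the 22 length-passing RdhA sequences are the ones counted toward that total; this is an inference from the arithmetic, not something the text states explicitly. The dictionary names are illustrative.

```python
# Curated gene counts per protein family, as quoted in the text.
# Assumption: the 22 RdhA that passed the length threshold are the ones
# included in the overall total.
curated_counts = {
    "RdhA": 22,  # reductive dehalogenases passing the length threshold
    "AbcA": 7,   # putative anaerobic benzene carboxylases
    "BssA": 40,  # benzylsuccinate synthases
    "EbdA": 40,  # ethylbenzene dehydrogenases
    "TcbB": 1,   # chlorobenzene dihydrodiol dehydrogenase
    "DxmA": 1,   # dioxane monooxygenase
}

# Per-site counts for the two families with a full site breakdown in the text.
bssa_by_site = {"LW1": 10, "LW3": 26, "LW2": 3, "GW1": 1}
ebda_by_site = {"LW2": 19, "LW3": 19, "CLC_T2": 2}

total_curated = sum(curated_counts.values())
print("Total curated VOC-degradation genes:", total_curated)
print("BssA across sites:", sum(bssa_by_site.values()))
print("EbdA across sites:", sum(ebda_by_site.values()))
```

Under that assumption the family counts sum to 111, and the per-site BssA and EbdA counts each sum to the 40 reported for their family.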

The curated VOC degradation proteins were moderately abundant, with average scaffold coverages for genes on scaffolds over 2.5 kb ranging from 10.2 to 26.9 (BssA = 10.2; RdhA = 11.6; TcbB (one gene) = 17.4; DxmA (one gene) = 19.4; EbdA = 24.6; AbcA = 26.9; all scaffolds’ average coverage = 13.5, median = 7.4).
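To make "moderately abundant" concrete, each family's average scaffold coverage can be expressed relative to the dataset-wide mean and median quoted above. This is a minimal sketch using only the figures in the text; the function and variable names are illustrative, and the ratios are a descriptive summary and not a statistic reported in the study.

```python
DATASET_MEAN_COVERAGE = 13.5
DATASET_MEDIAN_COVERAGE = 7.4

# Average scaffold coverage per protein family (scaffolds over 2.5 kb).
family_coverage = {
    "BssA": 10.2,
    "RdhA": 11.6,
    "TcbB": 17.4,
    "DxmA": 19.4,
    "EbdA": 24.6,
    "AbcA": 26.9,
}

def relative_coverage(coverage, baseline):
    """Return each family's coverage as a multiple of the given baseline."""
    return {name: value / baseline for name, value in coverage.items()}

vs_mean = relative_coverage(family_coverage, DATASET_MEAN_COVERAGE)
vs_median = relative_coverage(family_coverage, DATASET_MEDIAN_COVERAGE)

# Families whose coverage falls below the dataset-wide mean.
below_mean = sorted(name for name, ratio in vs_mean.items() if ratio < 1)

for name in family_coverage:
    print(f"{name}: {vs_mean[name]:.2f}x mean, {vs_median[name]:.2f}x median")
print("Below dataset mean:", below_mean)
```

All six families sit above the dataset median, while BssA and RdhA fall below the dataset mean.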