Main

We developed a protocol for mapping m1A at single-nucleotide resolution (m1A-seq), exploiting the tendency of m1A to produce characteristic misincorporation and truncation profiles upon reverse transcription3. Key steps in the protocol include (1) antibody-based enrichment of m1A-containing mRNA fragments1,2; (2, optional) Dimroth rearrangement, which converts m1A to N6-methyladenosine (m6A)2,4 and thereby eliminates the m1A misincorporation and truncation patterns; and (3) reverse transcription using either a highly processive reverse transcriptase (TGIRT) that tends to misincorporate at m1A sites5,6, or a less processive one (SuperScript III (SS)) that tends to truncate prematurely (Fig. 1a and Supplementary Note 1).

Figure 1: Establishment of m1A-seq and characterization of 205 putative m1A-containing sites.

a, Scheme depicting the m1A-seq pipeline. Following poly(A) selection, mRNA is fragmented into approximately 100-nucleotide-long fragments. m1A-containing fragments are enriched using an anti-m1A antibody and then reverse transcribed using either TGIRT or SuperScript, leading predominantly to misincorporation and premature truncation, respectively. As controls, the immunoprecipitation is either omitted (input) or followed by a Dimroth rearrangement, which converts m1A to m6A (IP + Dimroth). b, Venn diagram depicting the overlap between sites detected across the four datasets analysed here. The number of putative m1A sites within all classes of RNA is indicated in black; sites within tRNA molecules are indicated in parentheses in red. AMV, avian myeloblastosis virus reverse transcriptase. c, Pie chart summarizing the classes of RNAs in which the 205 putative m1A-containing sites were observed. Within tRNAs, the sites are further classified by the tRNA position harbouring them. d, Table presenting all putative m1A sites detected in this study, excluding those in tRNA molecules. For each site, we present the percentage of misincorporation in input and IP experiments (Mis. (input) (%) and Mis. (IP) (%)), calculated as the mean of the respective values across the datasets in which the site was detected; the percentage truncation (Trunc. (%)) as estimated from m1A-seq-SS upon m1A-IP; the number of samples and experiments (Samp. and Exp.) in which the site was independently detected; and the sequence surrounding the putative m1A site. Two sequence motifs reproducibly found within these sequences are highlighted in red and green, and the putative m1A site is highlighted in yellow. e, Misincorporation, truncation and coverage plots for putative m1A sites identified in this study. The graphical representation, inspired by ref. 3, depicts the truncation rate (black line, left y axis), misincorporation rate (stacked barplot, left y axis) and overall coverage (grey shade, right y axis) in a sequence window surrounding the putative m1A site.


We applied m1A-seq-SS and m1A-seq-TGIRT to RNA derived from HEK293T cells. Each dataset included samples sequenced directly (‘input’), following m1A immunoprecipitation (‘IP’), or following both m1A-IP and Dimroth rearrangement (‘IP + Dimroth’). We supplemented these two datasets with 8 input and IP sample pairs from ref. 2 and 14 sample pairs from ref. 1. Together, these experimental datasets comprised an unprecedented depth of more than 2 billion reads (Supplementary Table 1). We developed a single, common analytical pipeline to identify m1A-specific misincorporation profiles, relying on statistical tests assessing whether misincorporation rates were significantly higher in (1) IP samples compared with input, or (2) IP samples compared with IP + Dimroth. In addition, we required putatively modified sites to display at least two distinct types of misincorporation (for example, A→T and A→G), and used the reverse transcriptase truncation rate as an optional filter (Extended Data Fig. 1a and Methods). Collectively, we identified 205 high-confidence putative m1A sites and 72 additional lower-ranking ones (Fig. 1b–e and Supplementary Table 2). Quantifications of reverse transcriptase truncation and misincorporation levels were highly reproducible among replicates (Extended Data Fig. 1b–d). Each site was detected, on average, in approximately 11 independent samples (interquartile range 8–16), and 115 sites were independently identified in at least 2 datasets (Fig. 1b, c). As expected, TGIRT-based m1A-seq yielded higher misincorporation and lower truncation rates at the detected sites than the SuperScript-based libraries did (Extended Data Fig. 1e, f); immunoprecipitation strongly increased misincorporation rates, and Dimroth rearrangement reduced them (Extended Data Fig. 1g, h).
Our approach proved highly sensitive and specific: it allowed de novo discovery of almost all known classes of m1A sites in cytosolic tRNAs and rRNA, with very few false positives (Supplementary Note 2a). Misincorporation rates also provide a quantitative, relative estimate of m1A levels, setting a lower bound on m1A stoichiometry (Extended Data Fig. 1i and Supplementary Note 2b).

Surprisingly, our dataset comprised predominantly well-established sites in rRNA and tRNA (Fig. 1c), and only 15 sites in mRNA and long non-coding RNAs (lncRNAs), 10 of which were in cytosolic transcripts and 5 in mitochondrial transcripts (Fig. 1d). Most sites had very low misincorporation rates (less than 2.5%) in the absence of antibody-mediated enrichment (Fig. 1d), suggesting that they are modified at low stoichiometries. Two notable exceptions, both previously shown to harbour RNA:DNA sequence differences of unknown nature, were in (1) the tRNA-like mascRNA (MALAT1-associated small cytoplasmic RNA)7 and (2) the mitochondrially encoded ND5 transcript8,9 (Supplementary Note 3). Analysis of thousands of samples from more than 50 tissues in the Genotype-Tissue Expression (GTEx) collection10 similarly revealed very low misincorporation levels at the detected cytosolic mRNA sites, but high levels in rRNAs and in mascRNA, which served as controls (Extended Data Fig. 2 and Supplementary Note 4), suggesting that low m1A levels are not restricted to tumour-derived cell lines. We further excluded the possibility that the low number of sites in mRNAs reflects reduced detection power for less highly expressed mRNAs (Supplementary Note 5 and Extended Data Fig. 3a, b). In addition, we observed the previously described 5′-biased distribution of m1A-seq peaks1,2 (Extended Data Fig. 4a–d and Supplementary Table 3), but found no m1A-specific misincorporation profiles within these regions (Supplementary Note 5). It remains to be established whether these enrichments originate from complex modification patterns at the first transcribed nucleotide or are experimental artefacts. Thus, with few exceptions, m1A is rarely observed at internal sites in mRNA, and when present it is typically at low stoichiometry.

All sites in cytosolic mRNAs and in MALAT1 mascRNA shared a single motif, or slight deviations thereof, consisting of the sequence GUUCNANNC (A = m1A) within a strong hairpin structure typically comprising a 5-base pair (bp) stem and a 7-bp loop (Fig. 2a; see also Supplementary Note 6 and Extended Data Fig. 5a, b). This sequence and structural motif is identical to the T-loop of tRNAs, where m1A at position 58 is catalysed by the TRMT6/TRMT61A complex at precisely the same relative position11,12. Consistently, knockdown of TRMT6/TRMT61A eliminated m1A from T-loop-harbouring mRNAs (Extended Data Fig. 5c). Conversely, overexpression of TRMT6/TRMT61A followed by m1A-seq-TGIRT markedly increased misincorporation rates at the detected sites (Fig. 2b) and led to accumulation of m1A at 384 cytosolic mRNA and lncRNA sites (Fig. 2c and Supplementary Table 4), a massive increase relative to the 10 originally identified sites. The GUUCNANNC motif was highly enriched at these sites (present at 193 of the 384 sites; Fig. 2d), which were further enriched for a stable T-loop-like structure (Fig. 2e), typically consisting of a 7 bp loop (Extended Data Fig. 5d) and a 6–7 bp stem (Extended Data Fig. 5e). A subset of the peaks could be further validated by detecting m1A sites that had been converted to m6A following Dimroth treatment of mRNA extracted from TRMT6/TRMT61A-overexpressing cells (Supplementary Note 7 and Extended Data Fig. 6a–e). To directly explore the determinants of TRMT6/TRMT61A specificity, we used a plasmid library comprising thousands of T-loop sequences and systematically mutated counterparts13, all cloned as 3′ untranslated region (UTR) elements downstream of a reporter. Co-transfection of this plasmid pool with TRMT6/TRMT61A into HEK293T cells reconstituted the m1A misincorporation signal precisely at the predicted position (Fig. 2f).
Systematic point mutation of each position in the GUUCNANNC motif allowed functional reconstruction of the consensus required for modification by the TRMT6/TRMT61A complex (Fig. 2g), highlighting requirements for G–C base pairing at positions −5 and +3, and for a pyrimidine, a cytosine and a purine at positions −3, −2 and −1, respectively. Systematic structural mutants and compensatory mutations demonstrated a direct dependency of misincorporation on stem stability (Fig. 2h). These analyses demonstrate that within the cytosol, m1A is catalysed at T-loop-like elements by TRMT6/TRMT61A.
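The sequence half of this motif can be illustrated with a short scan. The sketch below is a hypothetical helper (`find_candidate_m1a` is not part of the authors' pipeline) that reports the position of the central A in every match of the literal GUUCNANNC consensus. It assumes N means any nucleotide, so it does not encode the purine preference at −1, and it ignores the hairpin requirement entirely; its hits are therefore sequence candidates only, and structure would still need to be assessed separately (for example with RNAfold).

```python
import re

# Literal consensus from the text: G U U C N A N N C, with the putative m1A at
# 0-based offset 5 of the 9-mer. The lookahead lets overlapping matches be found.
T_LOOP_MOTIF = re.compile(r"(?=(GUUC[ACGU]A[ACGU]{2}C))")
M1A_OFFSET = 5

def find_candidate_m1a(seq: str) -> list:
    """Return 0-based positions of the motif A in every (possibly overlapping) match."""
    rna = seq.upper().replace("T", "U")  # accept DNA-alphabet input as well
    return [m.start() + M1A_OFFSET for m in T_LOOP_MOTIF.finditer(rna)]

print(find_candidate_m1a("AAGUUCGAAUCAA"))  # [7]
```

A hit only says that the primary sequence fits the consensus; the stem-stability dependence shown in Fig. 2h means many such hits would not be modified in practice.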

Figure 2: Cytosolic m1A sites share a sequence and structural motif and are strongly induced upon overexpression of TRMT6/TRMT61A.

a, Predicted secondary structures of the sequence environment surrounding putative m1A-containing sites in cytoplasmic mRNA and lncRNAs (see also Extended Data Fig. 5a); note the 7 bp loop stabilized by a relatively strong stem with a terminal G–C base pair and a UUCNANY loop. b, Misincorporation rates in input versus IP RNA fractions, in wild-type (WT) cells versus upon overexpression of TRMT6/TRMT61A, shown for three putative m1A sites. Error bars reflect binomial confidence intervals, following read aggregation from two biological replicates. c, Pie chart as in Fig. 1c, depicting the distribution across RNA classes of 495 putative m1A sites detected upon overexpression of TRMT6/TRMT61A, of which 384 are within mRNA and lncRNA molecules. Mito., mitochondrial; Cyto., cytosolic. d, Sequence motif obtained by unbiased analysis of the sequences surrounding the putative sites in mRNA upon overexpression. e, Distributions of predicted free energies for the 384 putative m1A sites in mRNA + lncRNAs, compared with randomly shuffled controls. These values were derived by parsing the predicted secondary structures obtained using RNAfold; lower free energies indicate more thermodynamically stable structures. P value is based on a Mann–Whitney U-test. f, Misincorporation percentages across three regions that were monitored in the massively parallel reporter assay. Positions are numbered with respect to the predicted m1A site, and the ‘0’ position with predicted m1A is highlighted in yellow. g, Functionally reconstructed sequence motif, based on misincorporation rates measured upon mutating each displayed position to every other nucleotide, across each of the 74 T-loops (Methods). h, Misincorporation rates following abolition and gradual restoration of the stem (thick line, median; box boundaries, 25th and 75th percentiles; whiskers, 1.5-fold interquartile range). The x axis indicates the number of consecutive complementary bases in the stem region, beginning with the T-loop-proximal bases.


We next focused on the site in the mitochondrially encoded ND5 gene, which harboured the highest level of modification in our dataset, with misincorporation rates of approximately 25% in polyadenylated RNA (poly(A) RNA) from HEK293T cells (Supplementary Note 8). The absence of misincorporations in reads from the antisense strand (Extended Data Fig. 7a) and in DNA (Extended Data Fig. 7b) ruled out DNA heteroplasmy as their source. Misincorporation levels in ND5 RNA across thousands of GTEx samples were highly tissue specific: essentially absent in muscle and heart, but approximately 30% (median) in ovary and pituitary gland (Fig. 3a). Targeted sequencing of the ND5 locus in human muscle and ovary samples confirmed these findings (Fig. 3b). The relatively high m1A levels observed in human ovary samples prompted us to explore ND5 methylation during development. Strikingly, single-cell RNA-seq (scRNA-seq) data from 1,529 individual cells14 revealed misincorporation levels greater than 75% at the eight-cell stage (Fig. 3c), roughly equivalent to those observed in 16S rRNA, which was previously shown to be methylated at nearly 100% stoichiometry15. Misincorporation levels decreased with developmental progression, reaching approximately 12.5% by day 7 (Fig. 3c). scRNA-seq analysis of 124 single cells spanning a wider developmental range16 extended these findings, revealing that ND5 methylation levels are probably close to 100% from the metaphase II oocyte to the four-cell embryo (Fig. 3d and Supplementary Note 9), followed by a precipitous decrease (Fig. 3d). As zygotic mitochondrial transcription begins around the eight-cell stage17, we speculated that m1A might mark particularly stable maternal transcripts that persist until zygotic mitochondrial transcription begins. Indeed, transcriptional arrest in HEK293T cells using actinomycin D or ethidium bromide led to an approximately fourfold increase in ND5 misincorporation levels (Fig. 3e and Extended Data Fig. 7c, respectively), supporting an association between ND5 transcript stability and methylation. Thus, m1A in ND5 is highly tissue- and developmentally specific, and may mark a subset of ND5 transcripts that are maternally inherited and dominate until zygotic mitochondrial transcription begins at roughly the eight-cell stage.

Figure 3: Tissue- and development-specific methylation in ND5 catalysed via TRMT10C.

a, Highly tissue-specific distribution of misincorporation rates at ND5:1374 across 29 tissues, based on more than 9,000 RNA-seq datasets from the GTEx collection; particularly high levels are observed in the pituitary gland and ovary. For comparison, misincorporation levels pooled across all tissues are presented for chromosome M (chrM):2617, a position in the mitochondrial 16S rRNA that was previously shown to be methylated at close to 100% (ref. 15) and thus allows rough calibration of the readout. Boxplots represent median and interquartile range, as indicated in Fig. 2h. b, Misincorporation levels at ND5:1374 in human muscle and ovary samples, based on targeted, strand-specific sequencing of the ND5 locus in poly(A) RNA. Error bars, binomial confidence interval from a single sample. c, Distributions of misincorporation levels across 1,529 individual cells from 88 human preimplantation embryos ranging from developmental day 3 (corresponding to the eight-cell stage) to day 7 (ref. 14); in this study the authors used SuperScript II for reverse transcription. 16S rRNA methylation levels are shown for comparison as in a. d, Distributions of misincorporation levels across 124 cells spanning a developmental range from metaphase II oocytes to late blastocyst16; in this study the authors used SuperScript III for reverse transcription. e, Misincorporation levels at ND5:1374 measured at the indicated time points following actinomycin-D-mediated transcriptional arrest (n = 3). f, Misincorporation levels at ND5:1374 in HEK293T cells after overexpression (OE) of TRMT10C, treatment with short interfering RNAs (siRNAs) directed against TRMT10C (siTRMT10C), or treatment with a control siRNA. Error bars, binomial confidence interval based on a single, representative replicate. g, Misincorporation levels at ND5:1374 across 102 samples from human ovaries, colour-coded by the presence of a SNP at position 13708 (red, wild type; blue, SNP).


We hypothesized that m1A in ND5 is catalysed by TRMT10C, which catalyses methylation at position 9 of mitochondrial tRNAs18. Indeed, TRMT10C knockdown led to almost complete abolition of methylation at ND5:1374, whereas TRMT10C overexpression increased methylation levels by 50% (Fig. 3f). Analysis of the GTEx data revealed that ND5 methylation levels are under genetic control, as they are strongly correlated across different tissues from the same individual (Extended Data Fig. 7d, e). Further analysis identified a relatively common single nucleotide polymorphism (SNP), G13708A, two bases upstream of the ND5 site (at position 13710), that severely reduces the ability of ND5 to undergo methylation (Fig. 3g). Targeted sequencing of the ND5 locus in lymphoblastoid cell lines from individuals harbouring this SNP, compared with controls, confirmed these results (Extended Data Fig. 7f). G13708A is among the defining SNPs of the Eurasian haplogroup J, and is thought to affect the clinical expression of Leber hereditary optic neuropathy in western Eurasians19,20,21,22. Our results indicate that this haplogroup has lost the ability to undergo efficient ND5 methylation.

We speculated that m1A, by disrupting Watson–Crick base pairing, would prevent effective translation of modified codons. Polysome fractionation experiments revealed a substantial and highly significant reduction in ND5 misincorporation levels in the heavier fractions relative to lighter ones (Fig. 4a and Extended Data Fig. 8a), suggesting repressed translation of m1A-harbouring transcripts. Consistently, upon overexpression of TRMT6/TRMT61A, we found a pronounced and highly significant depletion of m1A-modified mRNA in the ribosome-heavy fractions relative to the ribosome-poor fraction (Fig. 4b). This was observed for cytosolic sites located either in the 5′ UTR or within the coding sequence (CDS), but not for a site in the 3′ UTR (Fig. 4b). We next cloned a 60 bp region harbouring the m1A site in the PRUNE gene in-frame and upstream of a firefly luciferase coding region (Fig. 4c). Co-transfection of this construct with TRMT6/TRMT61A led to high m1A levels (Fig. 4d), which were eliminated upon disrupting the sequence or structure, but restored by a compensatory structural mutation (Fig. 4c, d). Point mutation of the m1A site in PRUNE, or structural disruption, led to an approximately twofold increase in luciferase levels compared with the constructs harbouring an intact consensus sequence and T-loop structure (Fig. 4e). Conversely, no decrease (and even a slight increase, P = 0.03) was observed when the wild-type element was introduced into the 3′ UTR upon overexpression of TRMT6/TRMT61A, compared with controls (Extended Data Fig. 8b–d). The consistent translational repression associated with m1A sites within the 5′ UTR or CDS, but not in the 3′ UTR, suggests that it may depend on ribosomal scanning or translation (Supplementary Note 10 and Extended Data Fig. 9).

Figure 4: m1A-containing mRNAs are inefficiently translated.

a, Misincorporation rates at the ND5 locus across the polysomal fractions, measured using strand-specific targeted sequencing (n = 3). Dots, measurements; red bar, mean. b, Misincorporation rates at four selected loci following overexpression of TRMT6/TRMT61A, measured across four polysomal fractions (n = 3), displayed as in a. c, Depiction of the four designed variants, perturbing either the modified site or the secondary structure, based on the methylated site in the PRUNE gene. The methylated position is plotted in red, the perturbed position in magenta. d, Misincorporation percentages at the designed site, quantified across the four constructs using targeted sequencing (n = 3). Dots, measurements; red bar, mean. Note that for the ‘site mutation’ variant the misincorporation rate reflects the fraction of reads not harbouring a ‘T’, whereas for all remaining variants it reflects the fraction not harbouring an ‘A’. e, Renilla-normalized firefly luciferase levels in TRMT6/TRMT61A-overexpressing cells divided by the corresponding ratio in non-overexpressing controls, for the four tested constructs. Dots, measurements; red bars, mean. P values are based on t-tests.
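The normalization in panel e can be written out explicitly. This is a minimal sketch assuming per-replicate firefly (Fluc) and Renilla (Rluc) readings; the helper name `relative_luciferase`, and the choice to divide each overexpression replicate by the mean control ratio, are assumptions, since the caption does not state how replicates were paired.

```python
from statistics import mean
from typing import List, Sequence

def relative_luciferase(fluc_oe: Sequence[float], rluc_oe: Sequence[float],
                        fluc_ctrl: Sequence[float], rluc_ctrl: Sequence[float]) -> List[float]:
    """Fluc/Rluc per overexpression replicate, divided by the mean Fluc/Rluc of controls.

    Hypothetical helper: replicate pairing (mean of controls) is an assumption.
    """
    control_ratio = mean(f / r for f, r in zip(fluc_ctrl, rluc_ctrl))
    return [(f / r) / control_ratio for f, r in zip(fluc_oe, rluc_oe)]

print(relative_luciferase([1.0], [1.0], [2.0], [1.0]))  # [0.5]
```

Values below 1 correspond to lower Renilla-normalized reporter output in overexpressing cells than in controls.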


Collectively, the ability to map and quantify m1A at single-nucleotide resolution allowed us to redefine its transcriptome-wide distribution, which diverges substantially from previous reports1,2, and to address its biogenesis, functions and potential mechanisms of action. Like pseudouridine13,23,24,25 and 5-methylcytosine26, m1A is catalysed by co-opting the tRNA/rRNA-modifying machineries. The repressive impact of m1A on translation probably underlies its scarcity in cytosolic mRNAs. The marked reduction in ND5 methylation in mitochondria following the eight-cell stage coincides with the activation of zygotic transcription at this stage, an increase in oxygen consumption27,28,29 and changes in mitochondrial morphology30, collectively suggesting a developmental, regulatory role for m1A methylation at this position.

Methods

No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

Cell culture for knockdown and overexpression experiments

Human HEK293T cells (American Type Culture Collection (ATCC); passage numbers 5–15; cell line identity was not further verified, but the cells tested negative for mycoplasma) were plated in six-well plates at 20% confluency. siRNAs targeting TRMT10C (Thermo Fisher s29784), TRMT6 (s28400) and TRMT61A (s41859) were transfected using Lipofectamine RNAiMAX (Life Technologies AB4427037) following the manufacturer’s protocols, with two siRNA boosts 48 h apart; for combined TRMT6/TRMT61A knockdown, cells were co-transfected with half of the recommended amount of siRNA targeting TRMT6 and half targeting TRMT61A. Ambion In Vivo Negative Control #2 siRNA (catalogue number 4390846) was used as a negative control. Cells were harvested at 96 h. For overexpression, plasmids encoding full-length TRMT6/TRMT61A under a cytomegalovirus promoter were obtained from OriGene. The plasmids were transfected into HEK293T cells using PolyJet (SignaGene), with one boost of the plasmid at 24 h. Cells were harvested 48 h after transfection.

Blocking transcription or translation

Transcription was blocked using actinomycin D (Sigma) at a concentration of 10 μg ml−1 or ethidium bromide (Amresco) at a concentration of 0.4 μg ml−1. Translation was blocked using cycloheximide (Sigma) at a concentration of 100 μg ml−1.

Human RNA

Total RNA extracted from ovary and muscles from a human donor was obtained from Takara. Lymphoblastoid cell lines were obtained from Coriell.

RNA preparation for m1A-seq

RNA was extracted from cells using NucleoZOL (Macherey-Nagel). Poly(A)+ RNA was enriched from total RNA using Oligo(dT) Dynabeads (Invitrogen) according to the manufacturer’s protocol. The mRNA was chemically fragmented into approximately 100-nt fragments using RNA fragmentation reagent (Ambion). The sample was cleaned using Dynabeads (Life Technologies) and resuspended in 20 μl IPP buffer (150 mM NaCl, 0.1% NP-40, 10 mM Tris-HCl, pH 7.5).

m1A-seq and m6A-seq

The protocol we developed for transcriptome-wide mapping of m1A is based on our previously published protocol for mapping N6-methyladenosine (m6A)31. Briefly, 40 μl of protein-G magnetic beads were washed, resuspended in 200 μl of IPP buffer and tumbled with 5 μl of affinity-purified anti-m1A polyclonal antibody (MBL) at room temperature for 30 min. RNA was added to the antibody–bead mixture and incubated for 2 h at 4 °C. The beads were then washed twice in 200 μl of IPP buffer, twice in low-salt IPP buffer (50 mM NaCl, 0.1% NP-40, 10 mM Tris-HCl, pH 7.5) and twice in high-salt IPP buffer (500 mM NaCl, 0.1% NP-40, 10 mM Tris-HCl, pH 7.5), and the RNA was eluted in 30 μl RLT (Qiagen). To purify the RNA, 20 μl of MyOne Silane Dynabeads (Life Technologies) were washed in 100 μl RLT, resuspended in 30 μl RLT and added to the eluted RNA. Sixty microlitres of 100% ethanol were added, the tube was placed on the magnet, and the supernatant was discarded. After two washes in 100 μl of 70% ethanol, the RNA was eluted from the beads in 10 μl of H2O. Dimroth rearrangements were performed as described in ref. 2. For mapping m6A following Dimroth conversion of m1A, RNA extracted from TRMT6/TRMT61A-overexpressing cells was first subjected to Dimroth rearrangement as described in ref. 2, and then to m6A-seq as previously published31.

Library preparation

Strand-specific m1A RNA-seq libraries were generated on the basis of the protocol described in refs 32, 33. Briefly, RNA was first treated with FastAP Thermosensitive Alkaline Phosphatase (Thermo Scientific), followed by 3′ ligation of an RNA adaptor using T4 ligase (New England Biolabs). Ligated RNA was reverse transcribed using either SuperScript III (Invitrogen) or TGIRT-III (InGex), and the cDNA was subjected to 3′ ligation of a second adaptor using T4 ligase. The single-stranded cDNA product was then PCR-amplified for 9–12 cycles. Libraries were sequenced on Illumina NextSeq platforms, generating short paired-end reads of 25 to 55 bp from each end.

Identification of putative m1A sites

A human reference genome was generated on the basis of the hg19 assembly of the human genome, supplemented with tRNA, rRNA and snRNA sequences obtained from the MODOMICS database34. Non-enriched (input), m1A-enriched (IP) and Dimroth-treated m1A-enriched (IP + Dimroth) samples were aligned to this genome using the STAR aligner35 with increased stringency, allowing at most three mismatches per read pair (‘--outFilterMismatchNmax 3’). Duplicates were marked using Picard tools (‘MarkDuplicates.jar’), and non-primary alignments were removed. The identity of the nucleotide at each genomic position was extracted using ‘samtools mpileup’ with a maximum per-file depth setting of ‘-d 100000’. A custom script was used to parse the pileup output into a table summarizing the abundance of each nucleotide at each position. All positions harbouring an ‘A’ on the annotated sense strand with at least two mismatches, occurring in at least 10% of the reads overlapping the position, were recorded.
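The recording step can be illustrated with a small parser for the read-base column of `samtools mpileup` output. This is a minimal sketch, not the authors' custom script: it follows the documented mpileup encoding (`.` and `,` for reference matches on either strand, `^` plus a mapping-quality character at read starts, `$` at read ends, `+n`/`-n` followed by n inserted or deleted bases), and it reads the recording rule as "at least two mismatching reads that together make up at least 10% of coverage", which is an interpretation. The names `count_bases` and `record_site` are hypothetical.

```python
import re
from collections import Counter

INDEL = re.compile(r"[+-](\d+)")

def count_bases(ref_base: str, bases: str) -> Counter:
    """Tally A/C/G/T calls in the read-base column of one mpileup line."""
    ref = ref_base.upper()
    counts: Counter = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":                  # read start; next char is its mapping quality
            i += 2
        elif c == "$":                # read end
            i += 1
        elif c in "+-":               # indel: skip sign, length digits and the bases
            m = INDEL.match(bases, i)
            i = m.end() + int(m.group(1))
        else:
            if c in ".,":             # match to the reference (forward or reverse)
                counts[ref] += 1
            elif c.upper() in "ACGT":
                counts[c.upper()] += 1
            # '*', '#', '>', '<' and N are not counted
            i += 1
    return counts

def record_site(ref_base: str, counts: Counter,
                min_mismatches: int = 2, min_fraction: float = 0.10) -> bool:
    """Keep 'A' positions with >= 2 mismatching reads forming >= 10% of coverage."""
    if ref_base.upper() != "A":
        return False
    depth = sum(counts.values())
    mismatches = depth - counts["A"]
    return depth > 0 and mismatches >= min_mismatches and mismatches / depth >= min_fraction

counts = count_bases("A", "..,,T^Ig.+2AC$")
print(counts["A"], counts["T"], counts["G"], record_site("A", counts))  # 5 1 1 True
```

For a pileup line split on tabs, the reference base is field 3 and the read bases are field 5, so a call would look like `count_bases(fields[2], fields[4])`.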

All sites recorded across any replicate and any sample were pooled into a single data set. Misincorporation rates at each of these pooled sites were then re-extracted for each experiment, allowing analysis of each site in each experiment even if it had not passed the initial thresholds in that particular experiment. A bona fide m1A site is expected to have higher mismatch rates in IP than in input, and lower levels in IP + Dimroth than in IP. Hence, we used χ2 tests on misincorporation counts aggregated across replicates to test the hypotheses that (1) the number of mismatches in the IP sample was higher than in input, and (2) the number of mismatches in IP was higher than in IP + Dimroth. The following criteria were then used to identify putative m1A sites: (1) at least one of the two P values was significant (P < 0.05); (2) the product of the two P values was less than 0.01; (3) the difference in misincorporation rate between the samples with the lowest and the highest misincorporation levels was at least 0.2; (4) the site was covered by at least ten reads in at least two samples; and (5) across all replicates and samples, at least 1% of all reads mapping to the site had to be ‘A’, at least 1% had to be ‘T’, and at least 1% had to be ‘C’ or ‘G’. These last criteria were set to help discriminate SNPs (where only one alternative to an ‘A’ is expected) from m1A sites (where typically more than one type of misincorporation is observed). Among sites harbouring identical sequences in a 24 bp window (12 bp upstream and 11 bp downstream of the putative site), only one was retained. Nonetheless, owing to the merging of sites from different data sets and to the multiple loci from which identical or nearly identical tRNAs are transcribed, some duplicates were retained within tRNA entries in Supplementary Table 2, where they are flagged as such.
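The statistical tests and filters above can be sketched as follows. This is an illustrative reading under stated assumptions, not the authors' code: the χ2 test is a standard 2×2 test (1 degree of freedom, no continuity correction) whose P value is kept only when the difference goes in the expected direction; criterion (3) treats misincorporation rates as fractions, because the Methods give no unit for 0.2; and criterion (5) sums C and G. The function names `chi2_higher` and `passes_filters` are hypothetical.

```python
import math
from typing import Dict, Sequence

def chi2_higher(mm_a: int, n_a: int, mm_b: int, n_b: int) -> float:
    """2x2 chi-squared test (1 df, no continuity correction) on mismatch counts.

    mm_* are mismatching reads and n_* total reads, aggregated across replicates.
    Returns the P value only if sample a has the higher mismatch fraction,
    otherwise 1.0 (this directional handling is an assumption).
    """
    if n_a == 0 or n_b == 0 or mm_a / n_a <= mm_b / n_b:
        return 1.0
    observed = [[mm_a, n_a - mm_a], [mm_b, n_b - mm_b]]
    rows = [n_a, n_b]
    cols = [mm_a + mm_b, (n_a - mm_a) + (n_b - mm_b)]
    total = n_a + n_b
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return math.erfc(math.sqrt(stat / 2))  # chi-squared survival function, 1 df

def passes_filters(p_ip_vs_input: float, p_ip_vs_dimroth: float,
                   rates: Sequence[float], depths: Sequence[int],
                   pooled_counts: Dict[str, int], min_rate_diff: float = 0.2) -> bool:
    """Apply criteria (1)-(5); rates are assumed to be fractions for criterion (3)."""
    p1, p2 = p_ip_vs_input, p_ip_vs_dimroth
    if min(p1, p2) >= 0.05:                        # (1) at least one significant test
        return False
    if p1 * p2 >= 0.01:                            # (2) product of the two P values
        return False
    if max(rates) - min(rates) < min_rate_diff:    # (3) lowest vs highest sample
        return False
    if sum(d >= 10 for d in depths) < 2:           # (4) >= 10 reads in >= 2 samples
        return False
    total = sum(pooled_counts.values())
    if total == 0:
        return False
    frac = {b: pooled_counts.get(b, 0) / total for b in "ACGT"}
    # (5) A, T and (C or G) each >= 1% of pooled reads, i.e. more than one
    # misincorporation type, which argues against a simple A>T SNP
    return frac["A"] >= 0.01 and frac["T"] >= 0.01 and frac["C"] + frac["G"] >= 0.01

p1 = chi2_higher(60, 200, 5, 200)   # IP vs input
p2 = chi2_higher(60, 200, 4, 200)   # IP vs IP + Dimroth
print(passes_filters(p1, p2, [0.30, 0.025, 0.02], [200, 200, 200],
                     {"A": 500, "T": 40, "G": 20}))  # True
```

Replacing the pooled counts with `{"A": 500, "T": 60}` makes the same site fail criterion (5), which is the SNP-like pattern these filters are meant to exclude.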

This pipeline was applied to four batches of samples: (1) input, IP and IP + Dimroth (two replicates each), to which we applied m1A-seq-SS; (2) input, IP and IP + Dimroth (two replicates each), to which we applied m1A-seq-TGIRT; (3) 16 samples generated in ref. 2 with RNA-seq readouts from HepG2 cells in m6A-IP or input, with and without Dimroth rearrangement, downloaded from Gene Expression Omnibus (GEO; accession number GSE70485); and (4) 28 samples generated in ref. 1, with m1A mapped under different genetic perturbations and upon different stimuli, downloaded from GEO (accession number GSE73941). The ‘high-confidence’ data set of 205 sites comprised all sites for which at least two significant P values were obtained across any of the comparisons performed in any of the data sets; the ‘low-confidence’ sites comprised all sites associated with a single significant P value. Of note, to accommodate the distinct experimental designs of the data sets obtained from refs 1 and 2, we adapted the precise sets of comparisons performed by the analytical pipeline. Specifically, in addition to assessing whether IP differed from input and from IP + Dimroth, in the ref. 2 data set we further derived χ2-based P values to assess whether Dimroth rearrangement of the input samples reduced mismatch levels relative to untreated input. In the data set produced in ref. 1, the authors did not use Dimroth rearrangement, but instead treated the RNA with AlkB, an Escherichia coli-derived demethylase that eliminates m1A. Reference 1 further compared measurements upon knockout of ALKBH3, which they found to demethylate m1A. Accordingly, we performed four statistical tests, examining differences in mutation rates between (1) IP and input in wild-type samples, (2) IP and input across stress conditions (H2O2, starvation), (3) IP and IP + AlkB, and (4) IP in wild-type cells and IP in ALKBH3 knockout cells.
Of note, given that any site passing any of the statistical tests was considered a putative site, effectively our criteria for identifying putative m1A sites in the data sets of refs 1, 2 are more lenient than the criteria we applied for the two data sets we generated. Finally, in the data set generated upon overexpression of TRMT6/TRMT61A, we did not perform Dimroth rearrangements; instead we performed two tests, examining differences in misincorporation rates between (1) input samples and IP samples, and (2) input samples following overexpression of TRMT6/TRMT61A compared with controls. All analyses were performed using the identical computational pipeline, into which we fed, as parameters, the precise comparisons to be made.
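Because one pipeline was run with dataset-specific comparisons, the setup can be pictured as a configuration table plus a confidence rule. The sketch below is hypothetical: the dataset keys and sample labels are illustrative names rather than the authors' identifiers, each pair is written as (sample expected to show more mismatches, sample expected to show fewer), and the directions assumed for the stress and ALKBH3-knockout comparisons are assumptions, since the text only says that differences were examined. The confidence rule follows the text: at least two significant P values give a high-confidence call, exactly one gives a low-confidence call.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the per-dataset comparisons:
# (sample expected to have MORE mismatches, sample expected to have FEWER)
COMPARISONS: Dict[str, List[Tuple[str, str]]] = {
    "this_study_SS":    [("IP", "input"), ("IP", "IP+Dimroth")],
    "this_study_TGIRT": [("IP", "input"), ("IP", "IP+Dimroth")],
    "ref2":             [("IP", "input"), ("IP", "IP+Dimroth"),
                         ("input", "input+Dimroth")],
    "ref1":             [("IP_WT", "input_WT"),
                         ("IP_stress", "input_stress"),   # direction assumed
                         ("IP", "IP+AlkB"),
                         ("IP_ALKBH3_KO", "IP_WT")],      # direction assumed
    "TRMT6_61A_OE":     [("IP", "input"), ("input_OE", "input_control")],
}

def classify_site(p_values: List[float], alpha: float = 0.05) -> str:
    """Label a site from all P values it received across all comparisons and data sets."""
    n_significant = sum(p < alpha for p in p_values)
    if n_significant >= 2:
        return "high-confidence"
    if n_significant == 1:
        return "low-confidence"
    return "not called"

print(classify_site([0.01, 0.2, 0.03]))  # high-confidence
```

Keeping the comparisons as data rather than code mirrors the statement that the same pipeline received the comparisons as parameters.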

Although our m1A-seq approach provides strand-specific data, in the initial analyses in Figs 1 and 2 strand specificity was inferred from the genomic annotation rather than the read. This allowed application of an identical pipeline to the data generated in this study and the two previously published data sets. To analyse m1A at the ND5 locus, we subsequently called mutations in a strand-specific manner (separately inferring mutations on the + and − strands), to prevent dilution of misincorporation signal from the antisense strand.

Quantification of reverse transcriptase truncations

For each of the putative m1A sites, we calculated the rate of reverse transcription termination at position +1 with respect to the site. To this end, we first merged each read pair into a single artificial read extending from the beginning of one read to the end of its mate, and then used bedtools to count the number of reads beginning at, and overlapping, each position. The ratio between the two was defined as the stop rate at that position, as in ref. 23.
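The stop-rate calculation can be sketched in Python as follows, using half-open (start, end) intervals for merged read pairs instead of bedtools. The sketch assumes coordinates oriented so that a cDNA truncated at an m1A site yields a fragment beginning at the +1 position (for example, plus-strand transcripts); for a site at position `site`, the relevant value is `stop_rates(fragments, [site + 1])`. Function names are illustrative.

```python
from collections import Counter


def merge_pair(read1, read2):
    """Merge two mates, each a (start, end) half-open interval, into one fragment."""
    return (min(read1[0], read2[0]), max(read1[1], read2[1]))


def stop_rates(fragments, positions):
    """Stop rate = fragments beginning at p / fragments overlapping p.

    Coordinates are assumed to be oriented so that a truncated cDNA yields a
    fragment beginning at the stop position (e.g. plus-strand transcripts).
    """
    starts = Counter(start for start, _end in fragments)
    rates = {}
    for p in positions:
        overlapping = sum(1 for start, end in fragments if start <= p < end)
        rates[p] = starts[p] / overlapping if overlapping else float("nan")
    return rates
```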

Identification of m1A peaks

Peak detection was performed on the basis of our previously published approach for detecting peaks in m6A-seq data31,36. Specifically, an in-house script was first used to project all reads aligning to the genome onto the human transcriptome. Only reads fully matching a transcript structure, as defined by the ‘UCSC Known Genes’ transcriptome annotation, were retained. These reads were computationally extended in transcriptome space from the beginning of the first read to the end of its mate, and coverage in transcriptome space was calculated for each nucleotide across all transcripts.
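The projection into transcriptome space and the per-nucleotide coverage can be sketched as below, for a single transcript whose exons are given as 0-based, half-open genomic intervals. This Python sketch does not reproduce the in-house script: checking that every block of a spliced read matches the annotated exon structure, and parsing of the UCSC Known Genes files, are omitted.

```python
def genome_to_transcript(pos, exons, strand):
    """Map a 0-based genomic position to a 0-based transcript position.

    `exons` are (start, end) half-open genomic intervals sorted by start.
    Returns None for intronic or out-of-transcript positions.
    """
    offset = 0
    for start, end in exons:
        if start <= pos < end:
            tx_pos = offset + (pos - start)
            break
        offset += end - start
    else:
        return None
    length = sum(end - start for start, end in exons)
    # On the minus strand, transcript coordinates run opposite to the genome.
    return tx_pos if strand == "+" else length - 1 - tx_pos


def transcript_coverage(fragments, length):
    """Per-nucleotide coverage from (start, end) half-open transcript intervals."""
    diff = [0] * (length + 1)
    for start, end in fragments:
        # Difference array: +1 where a fragment begins, -1 just past its end.
        diff[min(max(start, 0), length)] += 1
        diff[min(max(end, 0), length)] -= 1
    coverage, running = [], 0
    for i in range(length):
        running += diff[i]
        coverage.append(running)
    return coverage
```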

Putative m1A peaks were identified using a three-step approach, as follows. (1) Peak detection within genes. To search for enriched peaks in the IP samples, we scanned each gene using sliding windows of 100 nt with 50-nt overlaps. Each window was assigned a peak over median (POM) score, defined as the mean coverage in the window divided by the median coverage across the gene. Windows with POM scores greater than 4 (that is, greater than fourfold enrichment) and with a mean coverage of more than ten reads were retained. Overlapping windows were merged, and for each disjoint set of windows in transcriptome space we recorded its start, end, and peak position, corresponding to the position with the maximal coverage across the window. (2) Ensuring that peaks were absent in input. We repeated the procedure in step 1 for the input sample and eliminated from all subsequent analyses any window detected in both steps 1 and 2. (3) Comparison of multiple samples. To search for consistently occurring peaks across different samples, we first merged the coordinates of all windows from all samples passing steps 1 and 2, to define a set of disjoint windows passing these filters in at least one of the samples. For each such window, we recalculated the peak start, end, peak position, and POM score (as defined above) in each of the samples using the approach in step 1. In addition, for each window we calculated a peak over input (POI) score, corresponding to the fold change of coverage across the window in the IP sample over that in the input sample. To account for differences in sequencing depth, we estimated the mean difference between IP and input samples across the 500 most highly expressed genes, and subtracted this background estimate from the POI score.
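Steps 1 and 2 can be sketched in Python as follows, using the window size, step, and thresholds stated above. Two details are assumptions rather than taken from the text: genes with a median coverage of zero are skipped (the POM score is undefined for them), and a window is treated as detected in input when it overlaps any input peak. The POI score and its background correction are not shown.

```python
from statistics import median


def pom_peaks(coverage, window=100, step=50, min_pom=4.0, min_mean=10.0):
    """Windows with POM > min_pom and mean coverage > min_mean, merged into peaks.

    POM = mean coverage in the window / median coverage across the gene.
    Returns (start, end, peak_position) tuples in transcript coordinates.
    """
    if not coverage:
        return []
    gene_median = median(coverage)
    if gene_median == 0:
        return []  # assumption: POM is undefined when the gene median is zero
    hits = []
    for start in range(0, max(len(coverage) - window, 0) + 1, step):
        win = coverage[start:start + window]
        mean_cov = sum(win) / len(win)
        if mean_cov > min_mean and mean_cov / gene_median > min_pom:
            hits.append((start, start + len(win)))
    merged = []
    for start, end in hits:  # hits are already sorted by start
        if merged and start < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Peak position = first position of maximal coverage within the merged region.
    return [(s, e, max(range(s, e), key=coverage.__getitem__)) for s, e in merged]


def drop_input_peaks(ip_peaks, input_peaks):
    """Remove IP peaks that overlap any peak also called in the input sample."""
    return [p for p in ip_peaks
            if not any(p[0] < q[1] and q[0] < p[1] for q in input_peaks)]
```

With this window layout, the last few nucleotides of a gene whose length is not a multiple of the step are not scanned; a full implementation might add a final window ending at the transcript end.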

Careful examination of peaks at the beginning of transcripts revealed that in many cases they originated at the first transcribed nucleotide; to analyse these peaks, we used the approach described in ref. 36. Briefly, this approach relies on calculating the fold change upon m1A-IP, compared with input, in the number of reads beginning at each of the first 50 annotated positions of each transcript. For the analysis displayed in Extended Data Fig. 4d, we first calculated these ratios across the set of SuperScript IP and input samples. We then integrated the fold-change quantifications by extracting the maximum fold change per position per transcript. Finally, we quantified the proportion of pileups beginning with an ‘A’ as a function of this fold change, revealing that pileups beginning with an ‘A’ were more frequent among the higher-confidence sites, which harboured stronger fold changes.
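The 5′-end analysis can be sketched as follows: per-position fold changes in read starts (IP over input) are computed for each IP/input sample pair, the maximum is kept per transcript position, and the fraction of pileups beginning with 'A' is then tabulated above a series of fold-change thresholds. The pseudocount and the absence of library-size normalization in this Python sketch are assumptions, as are the function names.

```python
def max_start_fold_changes(sample_pairs, n_positions=50, pseudocount=1.0):
    """Maximum IP/input fold change in read starts per (transcript, position).

    sample_pairs: list of (ip, inp) dicts, each mapping transcript -> list of
    read-start counts per annotated position (index 0 = first nucleotide).
    Only the first `n_positions` positions are considered.
    """
    best = {}
    for ip, inp in sample_pairs:
        for tx in ip.keys() & inp.keys():
            for pos in range(min(n_positions, len(ip[tx]), len(inp[tx]))):
                fc = (ip[tx][pos] + pseudocount) / (inp[tx][pos] + pseudocount)
                best[(tx, pos)] = max(best.get((tx, pos), fc), fc)
    return best


def fraction_starting_with_a(best, sequences, thresholds):
    """Fraction of pileups with fold change >= threshold whose first base is 'A'."""
    result = {}
    for t in thresholds:
        bases = [sequences[tx][pos] for (tx, pos), fc in best.items() if fc >= t]
        result[t] = sum(b == "A" for b in bases) / len(bases) if bases else float("nan")
    return result
```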

mRNA expression analysis

To estimate expression levels, reads were aligned against the human genome using RSEM (version 1.2.31) with default parameters37. For robust comparison between different samples, we used trimmed mean of M values (TMM) normalization of the RSEM read counts38 as implemented by the edgeR package39 in R.

Prediction of RNA secondary structure

For predicting secondary structure in the region surrounding putative m1A sites, we extracted a 24-nt sequence window comprising 12 nt upstream of the modified site, the site itself, and 11 nt downstream. Minimum free energies and secondary structures were predicted using RNAfold version 2.1.5. The predicted structures were subsequently parsed, using an in-house script, to quantify stem and loop lengths (Fig. 2).
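The window extraction and the parsing of RNAfold's dot-bracket output can be sketched as below. In this Python sketch the stem length is counted as the number of consecutive stacked base pairs closing each hairpin loop, so a bulge terminates the stem; this definition, and all function names, are assumptions, since the in-house script is not described in detail. The input line is assumed to be RNAfold's structure line, for example from `RNAfold --noPS`.

```python
def extract_window(seq, site, upstream=12, downstream=11):
    """24-nt window: 12 nt upstream, the site itself, 11 nt downstream (0-based site)."""
    if site - upstream < 0 or site + downstream >= len(seq):
        return None  # window would run off the sequence
    return seq[site - upstream:site + downstream + 1]


def parse_rnafold_line(line):
    """Split an RNAfold structure line such as '((...)) ( -1.20)' into parts."""
    structure = line.split(" ", 1)[0]
    energy = float(line[len(structure):].strip().strip("()"))
    return structure, energy


def pair_table(structure):
    """Map each paired position to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return pairs


def hairpins(structure):
    """(stem_length, loop_length) for each hairpin loop.

    The stem is counted as consecutive stacked pairs closing the loop; a bulge
    or interior loop ends the stem.
    """
    pairs = pair_table(structure)
    result = []
    for i, ch in enumerate(structure):
        if ch != "(":
            continue
        j = pairs[i]
        if all(c == "." for c in structure[i + 1:j]):  # only unpaired bases inside
            stem = 0
            while i - stem >= 0 and pairs.get(i - stem) == j + stem:
                stem += 1
            result.append((stem, j - i - 1))
    return result
```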

Massively parallel reporter assay

The design and cloning of the massively parallel reporter assay library into a plasmid were described in ref. 13. A 10-cm plate of HEK293T cells was transiently transfected with equal amounts of the TRMT6, TRMT61A, and library plasmids using PolyJet (SignaGene). RNA was purified using NucleoZOL reagent (Macherey-Nagel). Sequence-specific m1A-seq-TGIRT was performed on total RNA essentially as described in ref. 13, except that reverse transcription, primed from the constant sequence stemming from the library plasmid (AGCATTAACCCTCACTAAAGGGAAAGG), was performed using TGIRT-III (InGex). Adaptor ligation and PCR enrichment with an inner-plasmid-specific primer (GGTCCGATATCGAATGGCGC) were performed as previously described13.

Alignment of the massively parallel reporter assay data was performed as previously described13. To quantify misincorporations, we used ‘samtools mpileup’, as described above. For the sequence logo depicted in Fig. 2j, we first calculated, for each of the indicated positions and each of the four nucleotides, the 75th percentile of the misincorporation rates measured when that position was point-mutated to that nucleotide, across the 74 assayed T-loops. For each position, this value was then divided by the sum of the corresponding values across all four nucleotides, to yield a ‘relative misincorporation rate’ (summing to 1 at each position). The height of each nucleotide at each position was plotted in direct proportion to its relative misincorporation rate.
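The normalization used for the sequence logo can be sketched in Python as follows. Percentiles are computed by linear interpolation between order statistics (the NumPy default); whether the original analysis used this interpolation is an assumption, as is the input layout.

```python
import math


def percentile(values, q):
    """Percentile by linear interpolation between order statistics (0 <= q <= 100)."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile of an empty sequence")
    k = (len(data) - 1) * q / 100
    lo = math.floor(k)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (k - lo)


def relative_misincorporation(rates, q=75.0):
    """Normalize per-nucleotide percentiles so that they sum to 1 at each position.

    rates[position][nucleotide] is a list of misincorporation rates, one per
    assayed T-loop variant carrying that nucleotide at that position.
    """
    out = {}
    for pos, by_nt in rates.items():
        pct = {nt: percentile(vals, q) for nt, vals in by_nt.items()}
        total = sum(pct.values())
        # If every percentile is zero the relative rate is undefined; report 0.
        out[pos] = {nt: (v / total if total else 0.0) for nt, v in pct.items()}
    return out
```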

Annotation of mitochondrial and tRNA sites

All reads were aligned to the chrM sequence included in the human hg19 assembly, and positions in the Supplementary Tables are given with respect to it. For consistency with the mitochondrial research community, within the manuscript we refer to positions with respect to the more recent chrM_rCRS assembly. For tRNAs, all positions in the figures follow the standard tRNA nomenclature (so that the anticodon nucleotides are always numbered 34–36, and the T-loop spans positions 54–60).

Quantification of misincorporation in GTEx and single-cell RNA-seq data

Raw FASTQ files were obtained for all samples in these data sets and aligned using STAR (as above). We used samtools mpileup to quantify misincorporation levels at the positions detected in this study. For Fig. 4c, d, we filtered out all single cells in which an SNP was observed at position 13708, as this SNP severely reduces methylation. An SNP was called at this position, on the basis of the RNA-seq data, if more than 80% of the reads corresponded to the SNP nucleotide.
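Misincorporation levels and the SNP rule can be sketched from the base column of `samtools mpileup` output. The Python sketch below assumes mpileup was run with a reference FASTA (`-f`), so that reference matches appear as '.' or ','; it counts only A/C/G/T calls and assumes that the SNP allele differs from the reference base. Function names are illustrative.

```python
import re
from collections import Counter


def parse_pileup_bases(bases):
    """Tally reference matches and mismatching calls in an mpileup base column.

    Assumes mpileup was run with a reference (-f), so matches appear as '.'/','.
    """
    counts = Counter()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":      # read start; the next character is the mapping quality
            i += 2
        elif ch == "$":    # read end
            i += 1
        elif ch in "+-":   # indel following this position: +<len><seq> or -<len><seq>
            digits = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
        else:
            if ch in ".,":
                counts["ref"] += 1
            elif ch.upper() in "ACGT":
                counts[ch.upper()] += 1
            # '*', '#', '>', '<' and 'N' are not counted.
            i += 1
    return counts


def misincorporation_rate(counts):
    """Fraction of A/C/G/T calls that disagree with the reference."""
    mismatches = sum(counts[b] for b in "ACGT")
    total = counts["ref"] + mismatches
    return mismatches / total if total else float("nan")


def has_snp(counts, snp_base, threshold=0.8):
    """Call an SNP if more than `threshold` of the reads carry `snp_base`."""
    total = counts["ref"] + sum(counts[b] for b in "ACGT")
    return total > 0 and counts[snp_base] / total > threshold
```

Cells for which `has_snp` returns True at position 13708 would then be excluded before computing misincorporation rates.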

Polysome fractionation

Polysome fractionation was performed as specified in ref. 40, with one exception: we used a 10–50% sucrose gradient instead of a 5–50% gradient.

Targeted sequencing of m1A amplicons

For targeted measurement of m1A levels at specific loci, reverse transcription was performed on 1 μg of RNA using random hexamers and TGIRT-III (InGex) reverse transcriptase. Amplicons were generated using a nested PCR approach, involving a first amplification step with gene-specific primers carrying a partial Illumina adaptor tail, and a second amplification step that incorporated the full-length Illumina adaptors. For ND5 amplicons, reverse transcription was performed using a strand-specific primer instead of random hexamers, to avoid contamination by the ND5 antisense transcript; the resulting product was amplified in a single step with primers carrying the full-length Illumina adaptors. All primers are listed in Supplementary Table 5.

Luciferase assay

For the luciferase experiments, two plasmids were used: (1) pGL4.73, expressing Renilla luciferase under an SV40 promoter, and (2) a plasmid encoding an ATG start codon followed by 60 bp surrounding the PRUNE m1A site (GCGGAGGCCGATTCGCCGTGTGGCGGGTTCGAGTCCCGCCTCCTGACTCTGGCCTCTAGTC) and firefly luciferase, all driven by a cytomegalovirus promoter. We constructed three derivatives of this plasmid carrying point mutations in the sequence and structure, as described in the text. These plasmids were pooled and transfected into cells together with either control DNA or TRMT6/61A overexpression plasmids. The luciferase assay was performed with a Promega kit according to the manufacturer’s instructions.

Code availability

Code for the analyses described in this paper is available from the corresponding author upon request.

Data availability

All m1A-seq data sets generated in this manuscript have been deposited in the GEO under accession number GSE97419. All other data are available from the corresponding author upon reasonable request.