Main

Nucleic acid bases exist predominantly in one neutral tautomeric form. This in turn gives rise to the strict Watson–Crick (WC) pairing rules (Fig. 1a) that govern how genetic information is replicated, transcribed, and translated. However, if bases adopt alternative energetically disfavoured tautomeric or anionic forms (Fig. 1a), pairing rules can be violated and new functions can emerge. For example, although rarely observed, minor tautomeric and anionic bases can form WC-like dG•dT1,2,3, dA•dC4, and rG•rU5,6 mispairs that are believed to contribute to spontaneous mutations1,7,8,9 and translational errors10. Chemical modifications that stabilize or lock bases in their anionic or enol-like forms can be mutagenic11,12 or expand the decoding capacity of transfer RNAs13,14. In addition, anionic and tautomeric forms of the bases are believed to play crucial roles in nucleic acid catalysis15,16, RNA–ligand recognition17,18, and in the therapeutic mechanisms of nucleic acid base analogues19.

Figure 1: Chemical exchange in dG•dT mispairs.

a, WC dG•dC, sterically prohibited WC dG•dT, and WB dG•dT (R1ρ-measured nuclei highlighted in red ovals). Below are four WC-like tautomeric and anionic (implied charge delocalization) bp. b, DNA duplex with a 13C/15N site-labelled dG•dT mispair. c, RD profiles for dG•dT (25 °C and pH 6.9) showing R2 + Rex as a function of the spin lock offset (Ωeff/2π) and power (ωSL/2π, in insets) with global fits to dG-N1, dG-C8, dT-N3, and dT-C6. Error bars represent experimental uncertainty (one s.d., see Methods). P < 0.0001. d, RD profiles showing three-state exchange (25 °C and pH 8.4) and global three-state fit to dG-N1 and dT-N3. Error bars represent experimental uncertainty (one s.d.). P < 0.0001.

Despite growing evidence that rare tautomeric and anionic bases have important roles in nucleic acids, their occurrence, stabilities, and biological significance have remained elusive. Characterizing rare tautomeric and anionic bases in polynucleotides is a longstanding problem because such energetically unfavourable species typically exist in low abundance, for short periods of time, and involve movements of protons that are difficult to visualize at the atomic level. NMR relaxation dispersion (RD) techniques20,21,22 enable characterization of low-populated (populations of 0.1–10%) transient (lifetimes of micro- to milliseconds) states of nucleic acids23,24,25 that are often referred to as ‘excited states’ (ES). Here, we use NMR RD to characterize transient WC-like dG•dT and rG•rU mispairs in DNA and RNA that are stabilized by rare tautomeric and anionic bases and obtain evidence that they have universal roles in misincorporation during replication and translation.

Transient WC-like dG•dT tautomer mispair

dG•dT mispairs generally adopt a distinct ‘wobble’ (WB) geometry (Fig. 1a) since a WC geometry results in a steric clash between imino protons (Fig. 1a). However, enol tautomers of dG or dT, or their anionic form, can alleviate this steric clash and allow formation of WC-like dG•dT mispairs (Fig. 1a). Soon after the discovery of the DNA double helix, Watson and Crick hypothesized that such WC-like mispairs could provide a basis for spontaneous mutations7. We used NMR rotating frame spin relaxation (R1ρ) RD20,26,27 to examine whether WB dG•dT mispairs can transiently morph into such WC-like dG•dT mispairs in canonical DNA duplexes. For these studies we used a hairpin DNA duplex (hp-GT DNA) containing a site-specifically 13C/15N-labelled dG•dT WB mispair (Fig. 1b and Extended Data Fig. 1). Exchange between WB and WC dG•dT mispairs entails deprotonation of either dG-N1 or dT-N3 via tautomerization (neutral) or ionization (charged), both of which would induce large changes in N1/3 chemical shifts (CSs) and therefore give rise to substantial 15N RD. In contrast, because a WB-to-WC transition preserves an anti base and C2′-endo sugar pucker, it is expected to induce smaller changes in the sugar (dG-C1′ and dT-C1′) and base (dG-C8 and dT-C6) carbon CSs and therefore induce more limited 13C RD. We observed very substantial 15N RD at base imino dG-N1 and to a lesser extent at dT-N3, much less substantial 13C RD at base dG-C8 and dT-C6, and essentially no 13C RD at sugar dG-C1′ and dT-C1′ at pH 6.9 and 25 °C (Fig. 1c and Extended Data Figs 2, 3). This unique pattern of RD is consistent with exchange directed towards a transient WC-like mispair (Fig. 1a). It is inconsistent with exchange directed towards other base pair (bp) geometries such as Hoogsteen24 or base opened states (Supplementary Discussion 1 and Extended Data Fig. 4). A second exchange process that was apparent at pH 8.4 (Fig. 1d) will be discussed further below.
Similar RD profiles were observed in a different DNA duplex (Extended Data Figs 1, 2), indicating that the observed dG•dT exchange occurs robustly in DNA duplexes.

The RD data measured at dG-N1, dT-N3, dG-C8 and dT-C6 could be globally fitted (Supplementary Table 1 and Extended Data Figs 2, 3) to a single exchange process directed towards an excited state (ES1) that has a population (pES1) of 0.17% and a lifetime (τES1) of 0.38 ms (Fig. 2a). ES1 is characterized by 15N CSs that are considerably downfield shifted for dG-N1 (ΔωN1 = +36 p.p.m.) and to a lesser extent dT-N3 (ΔωN3 = +18 p.p.m.) (Fig. 2b, Extended Data Fig. 4). The downfield-shifted imino nitrogen CSs are unprecedented for nucleic acids, and are directed towards the CSs of bases (dG and dT) that have been deprotonated owing to ionization or modifications that lock an enol-like form (Δω ≈ + 50–60 p.p.m.) (Extended Data Fig. 5)28,29,30. On the other hand, ES1 features much smaller changes in carbon CSs (Fig. 2b), consistent with a WB-to-WC transition.

Figure 2: Characterizing WC-like transient states.

a, Population and lifetime of dG•dT ES1/ES2 measured in hp-GT DNA as a function of temperature (at pH 6.9) and pH (at 25 °C). Error bars in fitted parameters reflect experimental uncertainty (one s.d.). b, Differences between the GS (referenced to 0 p.p.m.) and ES CSs (Δω = ωES − ωGS) for hp-GT DNA. Error bars reflect experimental uncertainty (one s.d.). c, Measured CSs for ES1 are plotted against DFT-predicted values (P = 0.003).

It would be highly energetically disfavoured to simultaneously deprotonate dG-N1 and dT-N3 when forming ES1. Moreover, although the magnitude of the ES1 N1/N3 downfield CSs strongly suggests deprotonation, it is not as far downfield shifted as expected based on deprotonation of nucleotides in free solution (Extended Data Fig. 5). Therefore, the strongly but incompletely downfield-shifted dG-N1 and dT-N3 CSs suggest that ES1 consists of at least two WC-like species in rapid exchange on the NMR timescale in which either dG-N1 (dGenol•dT or dG−•dT) or dT-N3 (dG•dTenol or dG•dT−) is deprotonated (Fig. 1a). The ES1 population and CSs are largely independent of pH within the pH range of 6.0–7.9 (Fig. 2a and Supplementary Discussion 2). This is inconsistent with exchange directed towards ionic dG−•dT and dG•dT− (Extended Data Fig. 5). Rather, the population of ES1 increases with temperature (Fig. 2a) as expected for tautomeric species dGenol•dT and dG•dTenol.

Based on the measured pES1 and τES1, the free energy difference (ΔG) between GS and ES1 is 3.8 kcal mol−1 and the forward free energy barrier (ΔG‡) is 16.4 kcal mol−1 (Extended Data Fig. 6). These values are in good agreement with computationally predicted parameters (2.8–5.6 kcal mol−1 (ref. 31) and 17–21 kcal mol−1 (refs 31, 32), respectively) for dG•dT WB-to-WC tautomer transitions.
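As a rough cross-check of these numbers, the sketch below converts the reported ES1 population (0.17%) and lifetime (0.38 ms) at 25 °C into a free energy difference and a forward barrier. It assumes a simple two-state Boltzmann relation and an Eyring expression with a transmission coefficient of 1; the function names and the choice of Eyring theory are illustrative assumptions and not the authors' procedure, which relies on a temperature-dependent analysis.

```python
import math

R_KCAL = 1.987204e-3   # gas constant, kcal mol^-1 K^-1
K_B = 1.380649e-23     # Boltzmann constant, J K^-1
H = 6.62607015e-34     # Planck constant, J s


def delta_g(p_es, temp_k):
    """Free energy difference GS -> ES (kcal/mol) from the ES fractional population."""
    return -R_KCAL * temp_k * math.log(p_es / (1.0 - p_es))


def forward_barrier(p_es, tau_es, temp_k, kappa=1.0):
    """Eyring estimate of the GS -> ES barrier (kcal/mol).

    Assumption: the reverse rate is 1/tau_es and the forward rate follows
    from detailed balance; kappa is the transmission coefficient.
    """
    k_reverse = 1.0 / tau_es
    k_forward = k_reverse * p_es / (1.0 - p_es)
    return -R_KCAL * temp_k * math.log(k_forward * H / (kappa * K_B * temp_k))


dg_es1 = delta_g(0.0017, 298.15)
barrier_es1 = forward_barrier(0.0017, 0.38e-3, 298.15)
print(f"dG = {dg_es1:.2f} kcal/mol, barrier = {barrier_es1:.2f} kcal/mol")
```

Under these assumptions the free energy difference comes out at about 3.8 kcal mol−1, and the Eyring barrier lands within a few tenths of a kcal mol−1 of the reported 16.4 kcal mol−1; the small difference is expected because the reported barrier derives from a fit across temperatures rather than from a single-temperature estimate.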

These computational studies also predict that dGenol•dT probably exists in fast exchange on the NMR timescale (free energy barrier 5–6 kcal mol−1, ref. 31) with a minor (20%) dG•dTenol species (Fig. 1a and Supplementary Discussions 3, 4). Under these conditions, the measured ES1 CSs would represent a population-weighted average of the two tautomeric states (Methods). We find that all ES1 CSs (dG-N1, dT-N3, dG-C8 and dT-C6) are in quantitative agreement with values predicted by density functional theory (DFT)33 calculations for a weighted dGenol•dT (80%) ⇌ dG•dTenol (20%) equilibrium (Figs 2c and 3a).
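The population-weighted average invoked here can be written out explicitly. The expression below is the standard fast-exchange result, given as a sketch; the symbols δ (CS of an individual species), p1 and p2 are notation introduced for illustration.

```latex
\begin{aligned}
\delta_{\mathrm{ES1}} &= p_{1}\,\delta\!\left(\mathrm{dG^{enol}{\bullet}dT}\right)
                       + p_{2}\,\delta\!\left(\mathrm{dG{\bullet}dT^{enol}}\right),
\qquad p_{1} + p_{2} = 1,\\
\Delta\omega_{\mathrm{ES1}} &= \delta_{\mathrm{ES1}} - \delta_{\mathrm{GS}},
\end{aligned}
```

with p1 = 0.8 and p2 = 0.2 for the equilibrium considered in the text.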

Figure 3: Mutate-and-CS fingerprinting ES1 and ES2.

a, Multi-state equilibrium between WB and WC-like dG•dT mispairs. ES1/ES2 populations and weights are shown (25 °C and pH 6.9). pES2 estimated based on the observed apparent pKa. b, CS fingerprinting dG•dT ES1/ES2 using chemical modifications and structure-based DFT predictions of CSs. ES1 DFT CSs are given for 80:20 dGenol•dT:dG•dTenol weighting. Error bars reflect experimental uncertainty (one s.d.). c, m6dG•dT structure12,30. d, dG•5BrdU ES2 stabilized relative to dG•dT ES2. Populations and weights (10 °C and pH 6.9) are shown.

Transient WC-like dG•dT anionic mispair

Interestingly, upon increasing the pH to 8.4, we observed evidence for a second excited state (ES2), which is seen as a second peak in the off-resonance RD profile of dT-N3 (Fig. 1d). Global fitting of these RD data (Extended Data Figs 2 and 3) revealed two excited states (ES1 and ES2) that are most probably arranged in a linear topology (ES1 ⇌ dG•dT ⇌ ES2).

Compared to ES1, ES2 (ΔG‡ = 16.4 and ΔG = 4.59 kcal mol−1) has a considerably lower population (pES2 ≈ 0.04%) and shorter lifetime (τES2 ≈ 70 μs) at pH 8.4 (Fig. 2a). The dG-N1 and dT-N3 ES2 CSs are not only ‘swapped’ relative to ES1, such that dT-N3 experiences the larger downfield shift (ΔωN3 = +56 p.p.m.) while dG-N1 experiences a smaller downfield shift (ΔωN1 = +9 p.p.m.) (Fig. 2b), they are also more asymmetric in favour of a deprotonated dT species. In addition, unlike pES1, pES2 increases considerably with pH, consistent with ionization and the formation of dG•dT− (Fig. 3a). The ES2 CSs are in excellent agreement with values computed using DFT assuming a WC-like dG•dT− (100%) species (Fig. 3a, b). However, we cannot rule out that dG•dT− is in rapid equilibrium with a WC-like dG−•dT or an inverted wobble (iWB) geometry3,34,35 (Extended Data Fig. 4 and Supplementary Discussion 1) that falls outside detection limits (Fig. 3a).

Fingerprinting the dG•dT excited states

We adapted a mutate-and-CS fingerprint strategy24,25 to test the proposed ES1 and ES2 (Fig. 3a); chemical modifications are used to trap an ES, or induce specific perturbations to the GS ⇌ ES equilibrium. We trapped ES1 (dGenol•dT) using the mutagenic base O6-methyl-2′-deoxyguanosine (m6dG) (Fig. 3c and Extended Data Fig. 7), which is known to adopt a distorted WC dGenol•dT-like mispair12,30. Relative to the WB, this modification resulted in negligible changes in dG-C1′ (ΔωC1′ = −0.1 p.p.m.) and dT-C1′ (ΔωC1′ = +0.7 p.p.m.) CSs, and a small downfield shift in dG-C8 (ΔωC8 = +1 p.p.m.), consistent with the RD-derived ES1 CSs (Fig. 3b and Extended Data Fig. 7). The modification induced a small upfield shift in dT-C6 (ΔωC6 = −0.5 p.p.m.) that is inconsistent with the downfield dT-C6 CS (Fig. 3b) observed by RD. However, such a deviation is expected on the basis of DFT calculations (ΔωC6 = −2.3 p.p.m. for dT-C6 in the m6dG•dT pair) (Supplementary Discussion 5) and can be attributed to minor deviations from an ideal dGenol•dT WC-like mispair geometry (Fig. 3c)12,30. Severe line broadening did not permit measurement of the dT-N3 CSs in these non-isotopically enriched samples.

To test the proposed dG•dT− ES2, we measured the difference in dT-N3 CS between neutral and anionic dTTP (ΔωN3 = +55 p.p.m.) and found it to be in excellent agreement with the dT-N3 CS difference measured by RD (ΔωN3 = +56 p.p.m.) (Fig. 3b and Extended Data Fig. 5). In addition, we used the mutagenic thymidine analogue 5-bromo-2′-deoxyuridine (5BrdU) to push the equilibrium towards dG•5BrdU− (Fig. 3d). This modification lowers the pKa of 5BrdU-N3 (8.6) and favours a WC-like dG•5BrdU− geometry at high pH11. This modification increased the population of ES2 (ΔG‡ = 15.1 and ΔG = 3.37 kcal mol−1) by over two orders of magnitude at the expense of ES1 (ΔG‡ = 16.0 and ΔG = 4.67 kcal mol−1) while minimally affecting the ES1 and ES2 CSs (Fig. 3b). The consistencies in ES2 CSs between dG•dT and dG•5BrdU mispairs further support a dominant WC-like ES2, rather than an iWB bp, in naked DNA. The unaffected ES1 CSs indicate that 5BrdU does not appreciably impact the dGenol•5BrdU ⇌ dG•5BrdUenol equilibrium (Fig. 3d) relative to dG•dT (Fig. 3a), consistent with prior computational studies31,36.

Transient WC-like rG•rU mispairs in RNA

If the observed ESs correspond to WC-like dG•dT mispairs, it is predicted that similar ESs should arise in rG•rU mispairs in RNA where WC base pairs are also readily accommodated within the A-form helix. To test this hypothesis, we carried out analogous pH- and temperature-dependent RD measurements on two RNA duplexes. RD profiles measured for rG•rU in A-form RNA (Fig. 4a and Extended Data Fig. 8) are very similar to those measured for dG•dT in B-form DNA (Fig. 1c, d). Global analyses of the RD data (Supplementary Table 1) revealed an apparent three-state exchange process at pH ≥ 7.9 (Extended Data Fig. 3). The RD-derived CSs (Fig. 4b and Extended Data Fig. 9), together with the pH and temperature dependence of the populations and lifetimes (Fig. 4c), are consistent with rG•rUenol ⇌ rGenol•rU as ES1 and rG•rU− as ES2 (Fig. 4d) as observed in DNA. The rG•rU ES1 forward free energy barrier (ΔG‡ = 15.7 and ΔG = 3.86 kcal mol−1) is consistent with barriers measured for dG•dT ES1 (16.4 kcal mol−1) and computationally predicted barriers for G•U tautomerization (17.1 kcal mol−1)32.

Figure 4: Transient tautomeric and anionic WC-like mispairs in A-form RNA.

a, RNA duplex and R1ρ RD profiles with three-state global fits to rG-N1 and rU-N3 (20 °C and pH 7.9). Error bars represent experimental uncertainty (one s.d.). P < 0.0001. b, CSs for rG•rU ES1/ES2 compared to structure-based DFT predictions and rUTP ionization. Error bars reflect experimental uncertainty (one s.d.). c, Population and lifetime of rG•rU ES1/ES2 measured as a function of temperature (at pH 6.9) and pH (at 20 °C). dG•dT ES1 and ES2 shown (in grey) for comparison. Error bars reflect experimental uncertainty (one s.d.). d, Multi-state equilibrium between WB and WC-like rG•rU mispairs (20 °C and pH 7.9).

Computational studies36 show that dG•dUenol is stabilized relative to dG•dTenol. This is predicted to tilt the rapid rGenol•rU ⇌ rG•rUenol equilibrium in favour of rG•rUenol (40%) in RNA as compared to dG•dTenol (20%) in DNA (Methods). We find that in RNA, the ES1 rU-N3 CSs are slightly more downfield shifted (ΔωN3 = +30 p.p.m.) than rG-N1 (ΔωN1 = +26 p.p.m.) (Fig. 4b). Reweighting the DFT-predicted CSs assuming a 60:40 ratio of rGenol•rU:rG•rUenol gives an excellent fit to RD-derived values (Fig. 4b), and is in better agreement than an 80:20 ratio of rGenol•rU:rG•rUenol (Extended Data Fig. 9). Changes observed in the ES1 CSs at higher pH suggest a potentially more complex exchange process (Supplementary Discussion 6). As with dG•dT, the WC-like rG•rU− may exist in equilibrium with a WC-like rG−•rU and/or an iWB rG•rU (Fig. 4d).

rG•rU wobbles are widespread in RNA, where they play important structural and functional roles37. We therefore examined whether the ESs observed here would occur in more complex RNA structural contexts. Indeed, the rU-N3 15N RD profiles measured for two WB rG•rU mispairs in a 69-nucleotide Bacillus subtilis guanine riboswitch (Extended Data Fig. 8) provide initial evidence (ΔωN3 = +44–47 p.p.m. and pES ≈ 0.04% at pH 7.9) for WC-like rG•rU mispairs in more complex RNA structures (Extended Data Fig. 9 and Supplementary Discussion 7). Therefore, we expect that transient WC-like rG•rU mispairs exist robustly across the RNA transcriptome.

Biological implications

Initial selection of NTPs during replication, and tRNAs during translation, strongly relies on WC stereochemical geometry as a means of discriminating against mispairs38,39. The low error rate (10^−3 to 10^−6) during initial selection accounts for most of the overall fidelity of replication (10^−6 to 10^−10)40,41,42 and translation (10^−3 to 10^−5)43,44,45. By stereochemically mimicking the WC geometry, the ES WC-like dG•dT and rG•rU mispairs observed here can provide a mechanism for evading WC fidelity checks during initial substrate selection1,8,10. The intrinsic probabilities with which WC-like mispairs form have long been suspected as important determinants of substitution mutation probability8,10. By carrying out the first measurements of the intrinsic probabilities with which WC-like dG•dT and rG•rU mispairs form in native DNA/RNA systems, we are able to obtain unique insights into the mechanisms of misincorporation and the potential roles of ES1/ES2.

We find that the probabilities with which WC-like dG•dT ES1 and ES2 form in duplex DNA (10^−3 to 10^−5) span the dGTP•dT/dG•dTTP misincorporation and base substitution probabilities seen during replication using high-fidelity polymerases with little/no proofreading capabilities (Fig. 5a, Extended Data Fig. 10 and Supplementary Discussion 8)46,47,48. Similarly, the WC-like rG•rU ES1 and ES2 probabilities (10^−3 to 10^−4) span the majority of amino acid misincorporation probabilities arising specifically due to rG•rU pairing at any codon position (10^−3 to 10^−5)43,44 (Fig. 5b and Supplementary Discussion 8). Note that some of the amino acid misincorporation probabilities (10^−5) are lower than the measured rG•rU ES2 probability, which could be due to translational proofreading45 and/or lower pH conditions that destabilize ES2. These results, together with previous structural studies showing that WB and WC-like mispairs can exist within polymerase1,2,3 and ribosome5,6,13 active sites, strongly suggest that energetic competition between WB and WC-like mispairs is robust and is a key determinant of misincorporation probability during replication and translation (Supplementary Discussion 9). However, additional studies are needed to determine the probabilities with which WC-like mispairs form within the unique environment of polymerases and ribosomes. A recent MD study found that rGenol•rU tautomers can be stabilized in a ribosome context, but challenges their involvement in decoding errors based on predicted tRNA binding energies49.

Figure 5: Correlation between WC-like bp and misincorporation probabilities.

a, dG•dT ES1 (blue square) and ES2 (green triangle) probabilities; error bars reflect experimental uncertainty (one s.d.). pH-dependent dGTP•dT47. Averaged dGTP•dT/dG•dTTP misincorporation and base substitution probabilities; error bars reflect the s.d. (Supplementary Discussion 8). ES2 fit to Henderson–Hasselbalch equation shown as green line (see Methods). A blue line connects ES1 points for visualization purposes. b, rG•rU ES1 (blue square, n = 9) and ES2 (green triangle, n = 5) probabilities measured at varied conditions (Supplementary Table 1), and amino acid misincorporation probabilities (red ‘X’, n = 104) due to rG•rU mispairs as given in ref. 44. Error bars in ES1/ES2 reflect experimental uncertainty (one s.d.). c, The probabilities of dGTP•dT/dG•dTTP misincorporation (error as given) by avian myeloblastosis virus RT (ref. 47) versus the pKa-predicted probability of forming a WC-like dG•dT mispair as a function of pH; error bars reflect experimental uncertainty (one s.d.). dGTP•dT: P = 0.0001. dG•dTTP: P = 0.004.

The pH-dependent dG•dT misincorporation probability47,48,50 points to the involvement of an anionic species in misincorporation1,47. Our results strongly suggest that this species is most likely to be ES2 dG•dT− and not the energetically disfavoured dG−•dT. We observe excellent agreement between the pKa-predicted probability of forming ES2 dG•dT− and pH-dependent dGTP•dT/dG•dTTP misincorporation probabilities measured for a reverse transcriptase47 that lacks any proofreading ability (Fig. 5c and Extended Data Fig. 10). We note that the correlation is reduced at more extreme pH, near the pKa values of other ionizable groups in proteins and DNA (Extended Data Fig. 10 and Supplementary Discussion 10). We also find that dG•5BrdU enhances the probability of forming a WC-like ES2 (Fig. 3d) and correspondingly results in an approximately eightfold increase in dG•5BrdU misincorporation47. These data indicate that, for this polymerase, misincorporation proceeds predominantly via a WC-like dG•dT− as proposed47. It is very likely that parameters such as polymerase type, DNA sequence, and the chemical environment can affect the relative stabilities and lifetimes of the anionic, tautomeric, and WB mispairs3. We therefore expect these parameters to affect the flux through distinct misincorporation pathways involving different WC and WB conformations, which may help to explain the broad range of misincorporation probabilities.
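The pKa-predicted probability referred to above follows the Henderson–Hasselbalch relation for a single titratable site. The sketch below is illustrative only: the function name is hypothetical and the pKa value used in the example is a placeholder, not the apparent pKa determined in this work.

```python
def anionic_fraction(ph, pka):
    """Fraction of a single titratable site that is deprotonated at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


PLACEHOLDER_PKA = 11.0  # assumption for illustration only

# Well below the pKa, each pH unit changes the anionic fraction roughly tenfold.
low = anionic_fraction(7.4, PLACEHOLDER_PKA)
high = anionic_fraction(8.4, PLACEHOLDER_PKA)
print(f"pH 7.4: {low:.2e}, pH 8.4: {high:.2e}, ratio: {high / low:.2f}")
```

This near-tenfold change per pH unit is what makes a pH-dependent misincorporation probability diagnostic of an anionic intermediate, whereas a tautomeric intermediate would show little pH dependence.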

Our findings suggest that unconventional enol tautomeric and anionic bases exist robustly in genomes. We anticipate that these rare tautomeric and anionic bases play unique roles in DNA damage induction and repair, nucleic acid recognition, chemical modifications of nucleic acids, and catalysis. The NMR methods outlined here can immediately be applied to characterize tautomeric and anionic species, which we believe will not be restricted to dG•dT and rG•rU mispairs, but rather will be widespread across diverse nucleic acid motifs.

Methods

Sample preparation

NMR buffer

All duplex DNA and RNA samples were buffer-exchanged using a centrifugal concentrator (EMD Millipore) into a solution containing 25 mM sodium chloride (100 mM sodium chloride for Dickerson-GT DNA), 15 mM sodium phosphate, 0.1 mM EDTA, and 10% D2O with variable pH (6.0, 6.4, 6.8, 6.9, 7.9). pH values of 8.4 were obtained for hp-GT DNA and hp-GU-20 RNA samples by direct titration of pH 7.9 samples with filtered 0.5 M NaOH solution. Monovalent ion concentration subsequently increased by a small amount proportional to the NaOH titrated in but did not affect DNA/RNA conformation as confirmed by NMR. Natural isotopic abundance oligonucleotide sample concentrations ranged from 2 to 3.5 mM. 13C/15N-labelled oligonucleotide sample concentrations ranged from 0.7 to 3.5 mM. xpt-G riboswitch sample was diluted to 30 μM in a solution containing saturated guanine, denatured, and annealed on ice. Sample was buffer exchanged against either potassium acetate (pH 6.7) or sodium phosphate (pH 7.9) buffer and concentrated to 0.7–1.7 mM. Mg2+ was titrated in until total concentration was 5 mM.

Site-specifically 13C/15N-labelled DNA samples

Selectively 13C/15N-labelled DNA samples (hp-GT DNA, 5BrU5-hp-GT and 8BrG15-hp-GT DNA) were purchased from the Yale Keck Oligonucleotide Synthesis Facility and were synthesized using commercially available 2′-deoxyguanosine DMT-phosphoramidite (98% 13C10, 98% 15N5) and 2′-deoxythymidine phosphoramidite (98% 13C10, 98% 15N2) purchased from Cambridge Isotope Labs. hp-GT DNA was selectively 13C/15N-labelled at dT5 and dG15, 5BrU5-hp-GT DNA was 13C/15N-labelled at dG15, and 8BrG15-hp-GT was 13C/15N-labelled at dT5. Samples were purified using RP-HPLC before buffer exchange. NMR experiments were used to confirm native folding of hp-GT, 5BrU5-hp-GT, and 8BrG15-hp-GT DNA constructs.

Enzymatic synthesis of 13C/15N-labelled DNA samples

The Dickerson-GT DNA sample was synthesized in vitro using uniformly enriched 13C/15N dGTP and dTTP (Silantes GmbH) as previously described51. Reaction mixture was centrifuged and filtered to remove excess pyrophosphate and concentrated down to 1 ml in a 3 kDa molecular weight cut-off centrifugal concentrator (EMD Millipore). Sample was mixed with 1 ml of a formamide-based denaturing loading dye, denatured at 95 °C for 5 min, and loaded onto a 33 × 102 cm sequencing gel (20% polyacrylamide/8M urea) and run for 12 h to resolve target oligonucleotide from template and other nucleic acid species. Target band was shadowed briefly using a UV hand-lamp and excised before gel electroelution (Whatman, GE Healthcare), followed by ethanol precipitation. Sample purity was confirmed using gel electrophoresis (20% polyacrylamide/8M urea) stained with SybrGOLD before buffer exchange.

Enzymatic synthesis of 13C/15N-labelled RNA samples

The hp-GU-20, hp-GU-24 and xpt-G riboswitch RNA samples were prepared using in vitro transcription as previously described25 using uniformly enriched 13C/15N ribonucleotide triphosphates (hp-GU-20 RNA: rGTP & rUTP only, hp-GU-24: All and xpt-G: All). Purification was carried out as described above for 13C/15N-labelled DNA construct.

Unlabelled and unlabelled-modified DNA samples

hp-GT and Dickerson-GT DNA constructs at natural isotopic abundance were purchased from Integrated DNA Technologies. The O6-methyl-2′-deoxyguanosine mismatch constructs (m6G15-hp-GT and m6G4-Dickerson-GT DNA) were purchased from the Yale Keck Oligonucleotide Synthesis Facility. hp-GT and Dickerson-GT DNA constructs were desalted before buffer exchange. Unlabelled-modified samples were purified using RP-HPLC before buffer exchange.

Isotopically enriched dNTP and rNTP samples

Uniformly 13C/15N enriched dGTP, dTTP, rGTP and rUTP samples were purchased (Silantes GmbH) and added to an NMR buffer (25 mM sodium chloride, 15 mM sodium phosphate, 0.1 mM EDTA and 10% D2O at pH 6.9). Sample was adjusted to pH ≈ 12.5 directly using 5 M NaOH.

NMR experiments

Resonance assignment. The CS assignments for DNA and RNA constructs were obtained using aromatic [13C, 1H], aliphatic [13C, 1H], imino [15N, 1H] heteronuclear and [1H, 1H] nuclear Overhauser effect spectroscopy (NOESY) homonuclear correlation experiments. The data for all DNA constructs were acquired on an 18.8 T Agilent spectrometer equipped with a triple resonance HCN cryogenic probe, for the uniformly 13C/15N-labelled hp-GU-24 RNA construct on a 14.1 T Bruker Avance spectrometer equipped with a triple-resonance HCN cryogenic probe, and for the xpt-G RNA riboswitch on a 14.1 T Agilent spectrometer equipped with a Bruker HCPN cryogenic probe. All data were processed and analysed using the software NMRpipe52 and SPARKY (T. D. Goddard and D. G. Kneller, SPARKY 3, University of California, San Francisco), respectively. Resonance assignment for exchangeable and non-exchangeable 1H was performed using the 2D [1H, 1H] WATERGATE NOESY experiment53 (mixing time 250 ms) as described previously27,54, and their covalently bonded 13C/15N nuclei were assigned using heteronuclear single/multiple quantum coherence correlation experiments (HSQC or HMQC). For the labelled RNA constructs, conventional HSQC experiments53 were acquired for all spins, while for unlabelled DNA constructs conventional HSQC was used for the aliphatic C1′ spins and band-selective optimized flip angle short-transient (SOFAST) HMQC55,56 was employed for the imino and aromatic spins.

15N R1ρ relaxation dispersion. 1D 15N R1ρ RD experiments27,57 targeting imino nitrogen resonances of interest were carried out at 14.1 T (hp-GT, 5BrU5-hp-GT, Dickerson-GT DNA and hp-GU-24 RNA) and 16.4 T (hp-GU-20 RNA) as previously described57. Raw data were processed using NMRpipe52 to generate a series of peak intensities. On- and off-resonance R1ρ RD profiles were recorded using spinlock powers (ωSL/2π) ranging from 100 to 2,000 Hz, with the absolute offset frequencies (Ω/2π, Hz) ranging from 0 to 3.5× the applied spinlock power (Supplementary Table 1). Offset frequencies greater than 3.5× the given spinlock power were not used owing to substantial R1 relaxation contributions27. Magnetization of the spins of interest was allowed to relax under an applied spinlock for the following durations: 0–120 ms for N1/N3 of hp-GT, 5BrU5-hp-GT, Dickerson-GT DNA and hp-GU-24 RNA; 0–100 ms for N1/N3 of hp-GU-20 RNA; and 0–80 ms, 0–74 ms, 0–68 ms for N3 of the xpt-G riboswitch.

13C R1ρ relaxation dispersion. 13C R1ρ RD experiments targeting carbon resonances of interest were carried out at 14.1 T as previously described24,27. On- and off-resonance R1ρ RD profiles were measured using spinlock powers (ωSL/2π) ranging from 150 to 3,500 Hz, with the absolute offset frequencies (Ω/2π, Hz) ranging from 0 to 3.5× the applied spinlock power (Supplementary Table 1). Magnetization of the spins of interest was allowed to relax under an applied spinlock for the following durations: 0–60 ms for C1′/C6/C8 of hp-GT DNA and C1′ of Dickerson-GT DNA, and 0–50 ms for C6/C8 of Dickerson-GT DNA.

13C/15N CSs of ionized dNTPs and rNTPs. Carbon and nitrogen CSs of neutral and deprotonated dNTPs and rNTPs were collected at 25 °C and pH ≈ 6.9 and ≈ 12.5 using a 2D [15N, 13C] HMQC experiment on a 14.1 T Agilent spectrometer equipped with a Bruker HNCP cryoprobe.

Analysis of R1ρ data

Fitting of 13C and 15N R1ρ data. R1ρ values were calculated by fitting the decay of peak intensity versus relaxation delay to a monoexponential58. Errors in R1ρ were estimated using spectral noise and duplicate R1ρ data points using a suite of Mathematica notebooks58. Measured on- and off-resonance R1ρ data were globally fit to algebraic equations describing N-site chemical exchange using a Levenberg–Marquardt method, weighted to the experimental error in the R1ρ data. For two-state exchange, data were fit to the Laguerre equation (1)59 under the valid assumption that the populations are highly asymmetric, such that pGS ≫ pES where pES < 0.3. For the three-state chemical exchange model where kBC = kCB = 0, data were fit to both the three-state Laguerre equation (2) and the general three-state equation (3)59, where pGS ≫ pES and pES < 0.1. Fitted parameters derived from equation (2) and equation (3) are in excellent agreement with one another (Supplementary Table 1). Statistical tests, F-test and Akaike information criterion (AIC), were used to select the best-fit N-state exchange model24,25 (Extended Data Fig. 3). 13C and 15N RD data from the dG•dT or rG•rU mispair resonances for each construct, at each temperature and pH condition, were fitted globally (where kex and pB are the shared parameters) when possible. In the cases where 15N RD data were available but 13C RD data showed no chemical exchange, did not exhibit three-state exchange, or were not collected, the 15N N1/N3 RD data were globally fitted as described above. Bloch–McConnell60 (B-M) numerical simulations were used to validate the algebraic approximations for two- and three-state exchange scenarios.
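The first step described above, extracting R1ρ from a monoexponential intensity decay, can be sketched as follows. This is a minimal, unweighted log-linear least-squares fit written in plain Python; it is not the Mathematica workflow used here, it ignores the error estimation from spectral noise and duplicates, and the function name and synthetic data are assumptions for illustration.

```python
import math


def fit_r1rho(delays_s, intensities):
    """Fit I(t) = I0 * exp(-R1rho * t) by linear regression of ln(I) against t.

    Returns (r1rho in s^-1, i0). All intensities must be positive.
    """
    ys = [math.log(i) for i in intensities]
    n = len(delays_s)
    mean_t = sum(delays_s) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in delays_s)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(delays_s, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return -slope, math.exp(intercept)


# Synthetic, noise-free decay over 0-120 ms with a known rate of 25 s^-1.
delays = [0.0, 0.02, 0.04, 0.06, 0.08, 0.10, 0.12]
peaks = [1000.0 * math.exp(-25.0 * t) for t in delays]
r1rho_fit, i0_fit = fit_r1rho(delays, peaks)
print(f"R1rho = {r1rho_fit:.3f} s^-1, I0 = {i0_fit:.1f}")
```

A log-linear fit weights noisy low-intensity points too heavily, so for real data a weighted nonlinear fit of the exponential is generally preferable.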

Two-state Laguerre equation59,61:

Three-state Laguerre equation with no minor exchange25:

Three-state general equation with no minor exchange59,62:

in which R1 and R2 are the intrinsic longitudinal and transverse relaxation rates (s−1). The exchange rates are defined as kex,i = kGS→ESi + kESi→GS, where kGS→ESi = pESi kex,i and kESi→GS = pGS kex,i, and where i = 1 or 2. The CS difference between the GS and ESs is given by ΔωESi = ΩESi − ΩGS, where Ω = Ωobs − ωrf defines the resonance offset from the carrier frequency (ωrf), Ωobs = (ΩGS pGS + ΩES1 pES1) or Ωobs = (ΩGS pGS + ΩES1 pES1 + ΩES2 pES2), and where Ωobs ≈ ΩGS when pGS ≫ pESi, as is the case in the ESs measured. The average effective spinlock field in the rotating frame is given by ωeff² = Ω² + ωSL², with ωGS² = (ΩGS − ωrf)² + ωSL² and ωESi² = (ΩESi − ωrf)² + ωSL², and ωSL is the spinlock power. The tilt angle in the rotating frame is given by θ = arctan(ωSL/Ω).
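To make these definitions concrete, the sketch below evaluates a two-state R1ρ value using the Laguerre approximation in the form commonly given in the literature (ref. 59). The exact expression coded here is an assumption based on that literature form, and the function and argument names are hypothetical; all frequencies are angular (rad s−1), with the carrier expressed through the GS offset ΩGS − ωrf.

```python
import math


def r1rho_laguerre_two_state(r1, r2, p_es, kex, dw, off_gs, w_sl):
    """Two-state R1rho (s^-1) in the Laguerre approximation.

    r1, r2 : intrinsic relaxation rates (s^-1)
    p_es   : excited-state population (fraction)
    kex    : exchange rate constant (s^-1)
    dw     : CS difference, Omega_ES - Omega_GS (rad/s)
    off_gs : GS offset from the carrier, Omega_GS - omega_rf (rad/s)
    w_sl   : spinlock power (rad/s)
    """
    p_gs = 1.0 - p_es
    off_es = off_gs + dw
    off_avg = p_gs * off_gs + p_es * off_es      # Omega = Omega_obs - omega_rf
    w_eff2 = off_avg ** 2 + w_sl ** 2
    w_gs2 = off_gs ** 2 + w_sl ** 2
    w_es2 = off_es ** 2 + w_sl ** 2
    theta = math.atan2(w_sl, off_avg)            # tilt angle
    sin2 = math.sin(theta) ** 2
    cos2 = 1.0 - sin2
    pp_dw2 = p_gs * p_es * dw ** 2
    correction = 1.0 + 2.0 * kex ** 2 * (p_gs * w_gs2 + p_es * w_es2) / (
        w_gs2 * w_es2 + w_eff2 * kex ** 2)
    denom = w_gs2 * w_es2 / w_eff2 + kex ** 2 - sin2 * pp_dw2 * correction
    return r1 * cos2 + r2 * sin2 + sin2 * pp_dw2 * kex / denom


# Illustrative inputs loosely based on dG-N1 ES1 (36 p.p.m. at an assumed
# 60.8 MHz 15N frequency); r1, r2 and the 200 Hz spinlock are placeholders.
DW = 2.0 * math.pi * 36.0 * 60.8
W_SL = 2.0 * math.pi * 200.0
near_es = r1rho_laguerre_two_state(1.5, 10.0, 0.0017, 2636.0, DW, -DW, W_SL)
far_side = r1rho_laguerre_two_state(1.5, 10.0, 0.0017, 2636.0, DW, +DW, W_SL)
print(near_es > far_side)
```

The comparison at the end reflects the feature exploited in off-resonance profiles: for highly asymmetric populations the exchange contribution is largest when the carrier sits on the ES resonance, which is how the sign and magnitude of Δω are read out.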

Analysis of the three-state exchange model

We repeated measurements of RD profiles for hp-GT dG-N1 and dT-N3 at pH 8.4 and 25 °C on a different spectrometer and also obtained data that is consistent with three-state exchange (Extended Data Fig. 2 and Supplementary Table 1). In addition, we collected one additional spinlock power (800 Hz) for both resonances, and find that the inclusion of this extra spinlock power has negligible effect on the fitted exchange parameters indicating that they are robustly determined by the measured data. We note that differences in the RD profiles and fitted parameters between spectrometers are largely within error, with minor differences likely arising due to small differences in temperature and/or spinlock calibrations.

The dT-N3 resonance of hp-GT DNA at 25 °C and pH 7.9 was also individually refit assuming both two-state and three-state exchange. The individual dT-N3 three-state fit gave fitted parameters very similar to those obtained for dT-N3 in the three-state global fit with dG-N1. The F-test (at the 0.05 significance level) favoured the three-state individual fit model over the two-state individual fit model. AIC also favoured the three-state model, estimating it to be 3.1 × 10¹⁵ times more likely to be the correct model than the two-state model. The individual two-state and three-state fits to dG-N1 RD profiles give similar agreement when the ES2 CS is fixed based on the globally fitted value; however, statistical tests slightly favour the simpler model (AIC favours two-state by approximately twofold).
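The model-selection statistics quoted above can be sketched as follows. The text does not state which form of AIC was used, so the least-squares form AIC = χ² + 2k is an assumption, and all function names are hypothetical. Under that form, a relative likelihood of 3.1 × 10¹⁵ corresponds to an AIC difference of about 71.

```python
import math

def aic_from_chi2(chi2, n_params):
    """AIC for a weighted least-squares fit (assumed form: chi2 + 2k)."""
    return chi2 + 2.0 * n_params

def relative_likelihood(aic_a, aic_b):
    """How many times more likely model B is than model A, given their AICs."""
    return math.exp((aic_a - aic_b) / 2.0)

def f_statistic(chi2_simple, dof_simple, chi2_complex, dof_complex):
    """F statistic for nested models; compare with an F distribution
    with (dof_simple - dof_complex, dof_complex) degrees of freedom."""
    numerator = (chi2_simple - chi2_complex) / (dof_simple - dof_complex)
    return numerator / (chi2_complex / dof_complex)
```

The F statistic still has to be converted to a P value with an F-distribution routine, which is left out here to keep the sketch free of external dependencies.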

The three-state model is supported by statistical tests (F-test and AIC favour the three-state over the two-state exchange model) and also by B-M simulations (data not shown) for the hp-GT dG-N1 and dT-N3 data at pH 7.9. Here, R1ρ values were simulated, corrupted with noise, and evaluated for their ability to report on three-state exchange following the procedure reported in Bothe et al.63.

Bloch–McConnell three-state numerical simulations

Parameters derived from the algebraic three-state fits, along with the ωrf and ωSL, were used to simulate numerical solutions to the three-state B-M equations60 (Extended Data Fig. 3). The B-M simulations were carried out using a procedure similar to that described recently for two-state exchange63. Simulations were carried out assuming a 0.25 s relaxation delay under the applied ωSL.

Thermodynamic analysis of R1ρ RD-derived parameters

Temperature-dependent analysis of forward and reverse exchange rates in the site-labelled (dG15•dT5 13C/15N) hp-GT DNA and G/U labelled hp-GU-20 RNA samples was carried out as previously described24. This analysis employed the ES populations and exchange rates obtained from two-state global fitting of dG15-N1 and dT5-N3 RD data measured in hp-GT DNA at 10, 20, 25 and 30 °C. For hp-GU-20 RNA, the analysis employed populations and exchange rates obtained from two-state and three-state global fitting of rG16-N1 and rU5-N3 RD data measured at 10, 20, 25 and 30 °C at pH 6.9. Errors in the fitted thermodynamic parameters are given by weighted fits of the modified van’t Hoff equation24 to the RD-derived forward and reverse exchange rates and their errors. The NMR sample temperature was calibrated using 99.8% methanol-d4 (Cambridge Isotope Laboratories) using the equation T = −16.7467(Δδ)² − 52.5130(Δδ) + 419.1381, where Δδ is the difference in CS (p.p.m.) between the hydroxyl and methyl protons64.
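The temperature calibration is a direct evaluation of the quadratic given above. A minimal sketch, with a hypothetical function name and assuming the result is in kelvin (the units are not stated in the text):

```python
def methanol_d4_temperature(delta_ppm):
    """Sample temperature from the methanol-d4 hydroxyl/methyl shift difference.

    delta_ppm: CS difference (p.p.m.) between hydroxyl and methyl protons.
    Returns T, assumed here to be in kelvin.
    """
    return -16.7467 * delta_ppm ** 2 - 52.5130 * delta_ppm + 419.1381
```

For example, a shift difference of 1.5 p.p.m. evaluates to roughly 302.7, which would be about 29.5 °C under the kelvin assumption.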

Density functional theory geometry optimizations and CS calculations

All DFT calculations33 were performed using Gaussian 09c (Gaussian, Inc.)65 and carried out on the University of Michigan’s Advanced Research Computing HPC cluster, Flux, as previously described24 with minor modifications to the DFT method and basis set used. Geometry optimizations were carried out using the empirical exchange-correlation functional, M06-2X66, with the 6-31+G(d,p) basis set. The 13C and 15N isotropic magnetic shieldings (σ13C and σ15N) were calculated using the GIAO method with M06-2X/6-31+G(d,p). CSs of the nucleobases (δ13C and δ15N) were calculated by δ13C/15N = σ13C/15N – σTMS/NH3, where σTMS and σNH3 are the isotropic magnetic shieldings calculated for the reference compounds tetramethylsilane (13C) and NH3 (15N), respectively.

Input structures for DFT calculations

We generated idealized B-/A-form helices corresponding to our sequence contexts (hp-GT DNA and hp-GU-24 RNA) using make-NA (J. Stroud, make-NA, http://structure.usc.edu/make-na/server.html; 2011). The duplexes were truncated to the trinucleotide step centred on the target mispair (GTG/CGC for hp-GT DNA and GUA/UGC for hp-GU-24 RNA). The sugar and phosphate moieties were removed and replaced with a methyl group for the i (dG•dT or rG•rU mispair), i + 1, and i − 1 base pairs to save computational time. Although the structures lack 2′-deoxyribose or ribose sugar moieties, they are denoted as dG•dT or rG•rU to avoid confusion. All heavy atoms were frozen for the i + 1 and i − 1 base pairs, while geometry optimizations were carried out for protons and heavy atoms of the central dG•dT or rG•rU base pair as well as the protons of the i + 1/i − 1 pairs. We performed full geometry optimizations on: dG•dT and rG•rU GS WB pairs, dGenol•dT, dG•dTenol, rGenol•rU, rG•rUenol, and dG•dT and rG•rU ES WC-like pairs. All converged to the expected WB or WC-like geometries. In the instance where the geometry optimizations of dG•dT and rG•rU starting states failed to converge to a stable WC-like dG•dT/rG•rU geometry (and instead converged to an iWB geometry, in vacuum), the WC-like dG•dTenol/rG•rUenol states were converted to dG•dT/rG•rU and geometry optimizations were carried out on the protons only. CSs for each state were calculated and later used in calculating population-weighted CSs assuming different populations of these mispair species. In addition, while the sugar moieties were truncated to methyl groups to save computation time, it should be noted that previous DFT studies of nucleotides have shown that tautomerization, primarily of pyrimidines, can have an effect on the sugar conformation67. However, we can rule out large changes in sugar pucker arising in ES1 based on the negligibly small chemical exchange contributions to both dG-C1′ and dT-C1′ (see Extended Data Fig. 2) and only very small changes in dG-C1′ and dT-C1′ CSs upon locking the enol-like form with m6dG•dT (|ΔωC1′| ≤ 0.7 p.p.m., see Fig. 3b and Extended Data Fig. 7). Finally, while the CSs of the anionic pairs are predicted assuming planar pair geometry, prior computational studies of G•T⁻ and G⁻•T nucleobase pairs in isolation have shown that they can favour non-planar and non-WC geometry35.

dGenol•dT distance dependent DFT calculations

We carried out distance-dependent DFT calculations68 on a pair of WC-like dGenol•dT N1/N9-methyl nucleobases in vacuum using the M06-2X method and 6-31+G(d,p) basis set, as described above. The geometries of a dG•dT wobble pair and a dGenol•dT WC-like pair were optimized with no constraints before CS calculations. The N1-N3 distance of the dGenol•dT WC-like mispair was then manually varied from its ideal value, in increments of 0.1 Å from 2.44 Å to 3.04 Å, with a final point at 3.8 Å (Extended Data Fig. 4). At every increment the proton positions alone were optimized and CSs were calculated relative to an optimized dG•dT wobble base pair.

Population-weighted average DFT-predicted CS calculations

Based on the computationally predicted energetic differences between interconverting dGenol•dT and dG•dTenol base mispairs in water (ΔG = 0.7–0.8 kcal mol⁻¹)31 and in a weakly polar medium (ΔG = 0.99 kcal mol⁻¹)69, we can predict that the dG•dT ES1 CSs represent a population-weighted average between interconverting dGenol•dT (80%) ⇌ dG•dTenol (20%) states. Thus, the DFT-predicted CSs for dGenol•dT and dG•dTenol were summed in a population-weighted manner. We note that the computationally predicted energetic stabilities of the tautomeric states differ when calculated in water versus vacuum or a weakly polar medium, with the values predicted in water giving the greatest agreement with our experimental results.

In the case of rG•rU ES1, computational studies have shown that dG•dUenol is 1 kcal mol⁻¹ more stable than dG•dTenol in a DNA fragment36, suggesting that the rGenol•rU ⇌ rG•rUenol equilibrium should be tilted slightly more towards rG•rUenol than the corresponding equilibrium is towards dG•dTenol in DNA. We can qualitatively estimate the relative populations of rGenol•rU and rG•rUenol to be 60:40 based on a best fit to the RD-derived CSs.
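The link between a free-energy difference and the tautomer populations used for weighting can be sketched with a two-state Boltzmann distribution. The function names, the temperature of 298.15 K, and the example shift values in the usage note are assumptions introduced here, not values from the calculations above.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal mol^-1 K^-1

def two_state_populations(delta_g, temperature=298.15):
    """Populations (lower-energy state, higher-energy state) for a two-state
    equilibrium with free-energy difference delta_g (kcal mol^-1, >= 0)."""
    ratio = math.exp(-delta_g / (R_KCAL * temperature))  # p_high / p_low
    p_low = 1.0 / (1.0 + ratio)
    return p_low, 1.0 - p_low

def weighted_shift(shifts, populations):
    """Population-weighted average chemical shift (p.p.m.)."""
    total = sum(populations)
    return sum(s * p for s, p in zip(shifts, populations)) / total
```

With ΔG = 0.7–0.8 kcal mol⁻¹ this gives a major-state population of roughly 77–79%, in line with the approximately 80:20 split quoted above. Two hypothetical shifts of 150 and 160 p.p.m. weighted 80:20 would average to 152 p.p.m.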

pKa fitting and probability estimation

The apparent pKas for hp-GT ES2, 5BrU-hp-GT ES2, and dGTP•dT misincorporation (pH 6.5–8.6) were fit to the Henderson–Hasselbalch equation using a Monte Carlo (MC) approach. Here, 10⁶ pBs at pH 7.9 and/or 8.4 were selected from a Gaussian distribution with mean pB value and standard deviation representing the uncertainty in pB based on fitting of the RD data. 10⁶ fits to equation (4) were then carried out assuming these pB values to generate 10⁶ pKas.

In equation (4), pB is the probability of forming ES2, or the dGTP•dT misincorporation probability, at a given pH. The resulting pKa values were fitted to a Gaussian distribution. The mean of the Gaussian distribution is the reported pKa value and the standard deviation is taken as the error. An analogous approach was used to back-calculate predicted pBs at a given pH using the pKa derived by the above method.
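The Monte Carlo procedure can be sketched as below for a single pH point. Two assumptions are made here: that the Henderson–Hasselbalch relation takes the form pB = 1/(1 + 10^(pKa − pH)), and that with one data point the fit reduces to an analytic inversion. With several pH points, a least-squares fit per draw would replace the inversion. All function names and the example numbers in the usage note are hypothetical.

```python
import math
import random
import statistics

def hh_fraction(ph, pka):
    """Fraction in the deprotonated form (assumed Henderson-Hasselbalch form)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

def pka_from_fraction(ph, pb):
    """Invert hh_fraction for pKa, given a fraction pb measured at one pH."""
    return ph + math.log10((1.0 - pb) / pb)

def monte_carlo_pka(ph, pb_mean, pb_sd, n_draws=100000, seed=0):
    """Propagate Gaussian uncertainty in pb into a pKa mean and standard deviation."""
    rng = random.Random(seed)
    pkas = []
    while len(pkas) < n_draws:
        pb = rng.gauss(pb_mean, pb_sd)
        if 0.0 < pb < 1.0:              # discard unphysical draws
            pkas.append(pka_from_fraction(ph, pb))
    return statistics.mean(pkas), statistics.stdev(pkas)
```

For instance, `monte_carlo_pka(8.4, 0.01, 0.001)` returns a mean pKa near 10.4 with a spread of a few hundredths of a unit. The sample mean and standard deviation are used here in place of an explicit Gaussian fit to the pKa histogram.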