Abstract
Analogous to alternative splicing, alternative polyadenylation (APA) has long been thought to occur independently at proximal and distal polyA sites. Using fractionation-seq, we unexpectedly identified several hundred APA genes in human cells whose distal polyA isoforms are retained in chromatin/nuclear matrix and whose proximal polyA isoforms are released into the cytoplasm. Global metabolic PAS-seq and Nanopore long-read RNA-sequencing provide further evidence that the strong distal polyA sites are processed first and the resulting transcripts are subsequently anchored in chromatin/nuclear matrix to serve as precursors for further processing at proximal polyA sites. Inserting an autocleavable ribozyme between the proximal and distal polyA sites, coupled with a Cleave-seq approach that we describe here, confirms that the distal polyA isoform is indeed the precursor to the proximal polyA isoform. Therefore, unlike alternative splicing, APA sites are recognized independently, and in many cases, in a sequential manner. This provides a versatile strategy to regulate gene expression in mammalian cells.
Data availability
All deep sequencing data from this study have been deposited in the Gene Expression Omnibus under series accession number GSE165742. Source data are provided with this paper. Other data are available upon reasonable request.
Acknowledgements
This study was supported by grants from the China NSFC projects (grant nos. 31922039, 31670827 and 31871316) and the National Key R&D Program of China (grant no. 2017YFA0504400) to Y.Z., the China NSFC projects (grant no. 31800689) to P.T., and Hubei Provincial Natural Science Foundation of China (grant nos. 2020CFA057 to Y.Z. and 2020CFA017 to Y.Z. and Y.X.). Part of the computation in this work was done on the supercomputing system in the Supercomputing Center of Wuhan University.
Author information
Contributions
P.T. and Y.Z. conceived the study. P.T., G.L., L.H., W.R. and C.Z. performed the experiments. Y.Y., M.W., X.G. and Y.Z. performed the analysis of the sequencing data. X.Z., D.L. and Y.X. contributed to critical experimental information. P.T. and Y.Z. wrote the manuscript with input from Y.Y. X.-D.F. participated in the project from 2016 to 2018 as a visiting professor and then contributed to manuscript packaging and revision in 2021. All authors discussed the results and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review information
Nature Structural and Molecular Biology thanks Nicholas Proudfoot and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Anke Sparmann and Beth Moorefield were the primary editors on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Nascent RNAs are tightly associated with the nuclear matrix.
a, Relative RNA amount extracted from different cellular fractions. Data are presented as mean values ± SD (n = 3). b, Western blotting of the extracts from different fractions with the antibody N20 targeting the largest subunit of RNA polymerase II. Data were from n = 1 independent experiment. c, Histogram of coSI (completed Splicing Index) values of exons. d, Boxplots of coSI values in bins of exons by their distances to the annotated polyA site. e, UCSC genome browser tracks showing the rRNA-depleted RNA-seq signals in the nuclear matrix across the genes ADK and GPHN, respectively. The lower and upper hinges of the boxplots correspond to the first and third quartiles (the 25th and 75th percentiles). The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the interquartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR from the hinge. The centre line represents the median value.
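The box-plot convention described in this legend can be sketched in Python as follows. This is an illustrative sketch only, not the plotting code used for the figures: the function name `box_stats` is hypothetical, and the quartile interpolation method is an assumption (plotting packages differ in how they compute quartiles).

```python
import statistics

def box_stats(values):
    """Summarise values using the hinge/whisker rule stated in the legend.

    Assumption: quartiles use the 'inclusive' linear-interpolation method;
    the figures themselves may have used a different quantile definition.
    """
    data = sorted(values)
    # Hinges (first and third quartiles) and the centre line (median).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    # Upper whisker: largest value no further than 1.5 * IQR above the upper hinge.
    upper = max(v for v in data if v <= q3 + 1.5 * iqr)
    # Lower whisker: smallest value no further than 1.5 * IQR below the lower hinge.
    lower = min(v for v in data if v >= q1 - 1.5 * iqr)
    # Values beyond the whiskers would be drawn as individual points.
    outliers = [v for v in data if v < lower or v > upper]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": lower,
        "upper_whisker": upper,
        "outliers": outliers,
    }

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

In this toy input the value 100 lies beyond 1.5 * IQR from the upper hinge, so the upper whisker stops at 8 and 100 is reported as an outlier.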
Extended Data Fig. 2 The distribution of long 3′UTR isoform in three cellular compartments.
a, UCSC genome browser tracks showing the genes with a higher percentage of longer 3′UTR isoforms (dotted box) in both NM and NP, compared with CY, from polyA+ RNA-seq. NM: nuclear matrix, NP: nucleoplasm, CY: cytoplasm. b, Gene examples with a higher percentage of long 3′UTR isoforms observed mainly in NM, and similar levels in NP and CY. c, Illustration of Percentage of dPAS Usage Index (PDUI) computation. d, Distribution of the PDUI in three cellular compartments. PDUI values are from the 654 genes identified in Fig. 1f.
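As a rough illustration of what an index such as PDUI measures, the sketch below computes the fraction of a gene's transcripts that use the distal polyA site. The exact formula shown in panel c is not reproduced in this text, so the definition used here (long isoform abundance divided by the sum of long and short isoform abundances) is an assumption based on common usage of the term, and the function name `pdui` and its parameters are hypothetical.

```python
def pdui(long_abundance, short_abundance):
    """Fraction of transcripts ending at the distal polyA site (dPAS).

    Assumed definition: long / (long + short), where 'long' is the abundance
    of the dPAS isoform and 'short' that of the proximal (pPAS) isoform.
    """
    if long_abundance < 0 or short_abundance < 0:
        raise ValueError("abundances must be non-negative")
    total = long_abundance + short_abundance
    if total == 0:
        raise ValueError("gene has no detectable isoform; PDUI is undefined")
    return long_abundance / total

# Hypothetical values for one gene in two compartments.
nm_pdui = pdui(long_abundance=30.0, short_abundance=10.0)  # nuclear matrix
cy_pdui = pdui(long_abundance=5.0, short_abundance=15.0)   # cytoplasm
```

Under this definition a value near 1 means mostly distal-site usage and a value near 0 means mostly proximal-site usage, so a gene whose long isoform is retained in the nuclear matrix would show a higher value in NM than in CY.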
Extended Data Fig. 3 Characterization of nuclear matrix polyA- RNA.
a, Percentages of exonic and intronic reads in CY, NP, NM polyA+ and NM polyA− RNA-seq data. b,c, UCSC genome browser view of NM polyA+ and polyA− RNA signals in the region from the TSS (transcription start site) to 10 kb downstream of the TES (transcription end site) for FUBP1 (b) and hnRNPA2B1 (c). d, Meta profiles of NM polyA+ and polyA− RNA signals in the ±250 nt region flanking the pPAS (left) and dPAS (right). The mean read densities were normalized and shown in arbitrary units (a.u.).
Extended Data Fig. 4 The distal polyA sites are stronger than the proximal polyA sites.
a, Reporter constructed to test the strengths of polyA sites. Fluc and Rluc are Firefly luciferase and Renilla luciferase, respectively. The polyA site (PAS) to be tested is the region ±200 nt flanking the cleavage site. IRES, internal ribosomal entry site. b, Relative luciferase activity of different polyA sites represented by mean ± SD (n = 5). The p-values are based on a two-tailed unpaired t test: ***p < 0.001. c, Constructs with the paired proximal and distal PASs in two different orders. The primers used for RT-PCR are indicated. d, RT-PCR results of two different constructs (C1 and C2) are shown for PASs from 4 genes. Data were from n = 1 independent experiment.
Extended Data Fig. 5 Quality control for the SLAM PAS-seq.
a, Rates of nucleotide substitutions with or without DRB treatment. b, Relative T-C nucleotide conversion of mRNA and mitochondrial RNA w/o DRB treatment. Data are presented as mean values ± SD (n = 2). c, Correlation coefficients of expression values in CPM between pairs of samples. total: total reads; new: nascent reads with T-C conversion. d, Difference of d/p ratios between the nascent RNA and steady state RNA. NM, NM&NP, NP, and others are different groups of genes with the longer 3′UTR isoform enriched in the NM only, in both NM and NP, in the NP only (groups as in Fig. 1g), or enriched in neither NM nor NP (gray points in Fig. 1e,f), respectively. NP: nucleoplasm; NM: nuclear matrix. The p-values are based on a two-tailed unpaired t test. e, Scatter plot showing the d/p ratios of total RNA-seq with or without 4sU treatment in cell culture. f, Gene counts with different numbers of APA pairs using the most distal PAS as reference. g, Summary of genes classified by the d/p ratio of the nascent over steady state RNA of their APA pairs. h, Box plot showing that the d/p ratios of total RNA are altered only slightly upon transcription inhibition with DRB treatment. The dotted lines represent the value ±1. The lower and upper hinges of the boxplots correspond to the first and third quartiles (the 25th and 75th percentiles). The upper whisker extends from the hinge to the largest value no further than 1.5 * IQR from the hinge (where IQR is the interquartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5 * IQR from the hinge. The centre line represents the median value.
Extended Data Fig. 6 Sequence features around the proximal and distal polyA sites.
a, Enrichment score (log2[frequency in PAS / frequency in random sequence]) for AAUAAA, UGUA, and GU/U-rich motifs around the cleavage sites (±100 nt) of all polyA sites. b, Nucleotide frequency around (±100 nt) pPAS and dPAS in 4 different groups as in Fig. 3c. c, Distribution of U-rich 6-mer motifs (described in the Methods) in the region −100–0 nt upstream of the pPAS (blue) and dPAS (yellow) from 4 different groups.
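The enrichment score in panel a is defined as log2[frequency in PAS / frequency in random sequence]. A minimal sketch of that calculation is given below. How "frequency" was counted is not stated in this legend, so the sketch assumes it is the fraction of sequences containing the motif at least once; the function names and the toy sequences are hypothetical.

```python
import math

def motif_frequency(sequences, motif):
    """Fraction of sequences that contain the motif at least once (assumed definition)."""
    if not sequences:
        raise ValueError("no sequences supplied")
    return sum(motif in seq for seq in sequences) / len(sequences)

def enrichment_score(pas_sequences, random_sequences, motif):
    """log2(frequency in PAS-flanking sequences / frequency in random sequences)."""
    fg = motif_frequency(pas_sequences, motif)
    bg = motif_frequency(random_sequences, motif)
    if fg == 0 or bg == 0:
        # log2 is undefined for a zero ratio or a zero denominator; no pseudocount
        # is applied here because the legend does not mention one.
        raise ValueError("motif absent from one set; score is undefined")
    return math.log2(fg / bg)

# Toy RNA sequences: the hexamer is present in 4/4 PAS windows and 1/4 random windows.
pas = ["GGAAUAAAGG", "CCAAUAAACC", "AAUAAAUUUU", "UUAAUAAAGC"]
rand = ["AAUAAAGGCC", "CCCCCCCCCC", "GGGGGGGGGG", "UUUUUUUUUU"]
score = enrichment_score(pas, rand, "AAUAAA")
```

With these toy inputs the ratio of frequencies is 4, giving a score of 2. Positive scores indicate that a motif is more common around cleavage sites than in the background, and negative scores indicate depletion.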
Extended Data Fig. 7 Isoforms with extended 3′UTR do not produce proteins.
a, Illustration of alternative UTR (aUTR) and its insertion in the 3′UTR of the GFP reporter gene. b, The effect of the aUTR from CPSF6 and FUBP1 on reporter expression, detected by western blotting (WB). A separate Flag expression plasmid was used as transfection control. Data were from n = 2 independent experiments. c, Two models for the processing of the proximal polyA sites. Model 1: the proximal and distal polyA sites are processed independently of each other (left); Model 2: the distal polyA site is processed before the proximal one and the dPAS isoform acts as the intermediate for the proximal one (right). d, Validation of the inserted ribozyme by Sanger sequencing.
Extended Data Fig. 8 Little impact of ribozyme insertion downstream of the distal PAS on gene expression.
a, Diagram of the reporters with ribozyme insertion downstream of dPAS in the FUBP1-based or hnRNPA2B1-based reporter. b,c, The effects of ribozyme insertion downstream of dPAS in FUBP1 (b) or hnRNPA2B1 (c) on the exogenous reporter at the protein (left) and RNA (right) levels. The GFP plasmid served as control for transfection. Data were from n = 3 independent experiments and are presented as mean values ± SD for the two genes. d, Diagram of ribozyme insertion upstream (up) or downstream (down) of dPAS in endogenous hnRNPA2B1. e, PCR validation of ribozyme insertion downstream of dPAS in hnRNPA2B1. Data were from n = 1 independent experiment. f, Effects of ribozyme insertion downstream of dPAS in endogenous hnRNPA2B1 at the protein (left) and RNA (right) levels. Bar graph represents mean ± SD (n = 3). The p-values are based on a two-tailed unpaired t test: *p < 0.05, n.s. represents non-significant.
Extended Data Fig. 9 Features of the 5′ end signals of Cleave-seq.
a, UCSC genome browser view of the 5′ end signals of polyA+ RNA from Cleave-seq on the DGCR8 gene. The zoom-in view of the 20 nt region flanking the Drosha cleavage site (left dotted) is shown in the dotted box on the right (* marks the 5′ end base). b, Upset plot of the identified cleavage sites from 4 Cleave-seq libraries. Only the peaks found in at least 2 samples were included. For the 654 genes with longer 3′UTR enriched in NM, the numbers of peaks located in the 10 nt window downstream of their pPAS and dPAS are shown in the upper right box. c, Violin plot depicting the ratio of the number of Cleave-seq reads in the 10 nt window downstream of the pPAS over that of the dPAS (n = 160). The dotted grey line represents a 10-fold ratio. The red line represents the median. The lower and upper hinges of the box plots correspond to the first and third quartiles (the 25th and 75th percentiles). The p-value was determined using the two-sided Wilcoxon test.
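The ratio plotted in panel c (Cleave-seq reads in the 10 nt window downstream of the pPAS over those downstream of the dPAS) can be sketched as below. This is a simplified illustration, not the authors' pipeline: it assumes plus-strand genes, 1-based 5′ end coordinates, and a window covering positions site+1 to site+10. The function names and the toy coordinates are hypothetical.

```python
from bisect import bisect_right

def count_downstream(sorted_ends, site, window=10):
    """Count read 5' ends at positions site+1 .. site+window (plus strand assumed)."""
    return bisect_right(sorted_ends, site + window) - bisect_right(sorted_ends, site)

def ppas_over_dpas(read_5p_ends, ppas, dpas, window=10):
    """Ratio of Cleave-seq 5' end counts downstream of pPAS to those downstream of dPAS.

    Returns None when no reads fall downstream of the dPAS, since the ratio
    is then undefined (no pseudocount is assumed).
    """
    ends = sorted(read_5p_ends)
    p = count_downstream(ends, ppas, window)
    d = count_downstream(ends, dpas, window)
    if d == 0:
        return None
    return p / d

# Toy 5' end positions for one gene with pPAS at 100 and dPAS at 500.
ends = [101, 102, 105, 110, 111, 501, 505, 520]
ratio = ppas_over_dpas(ends, ppas=100, dpas=500)
```

Here four 5′ ends fall in the window after the pPAS (101, 102, 105, 110) and two after the dPAS (501, 505), so the ratio is 2. For minus-strand genes the window would need to be mirrored to positions site−10 to site−1.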
Extended Data Fig. 10 Regulation of splicing through progressive polyadenylation.
a,b, Splicing analysis of the rescue reporters by RT-PCR for construct C5 (a) and C6 (b) in Fig. 6b. Data were from n = 1 independent experiment. c, Illustration of the formula to calculate the PIR score. d, Proposed model for progressive polyadenylation mediated splicing regulation.
Supplementary information
Supplementary Table 1
Supplementary Table 1 Nuclear matrix polyA+ RNA interacting proteins.
Supplementary Table 2 Primers used in this study.
Source data
Source Data Fig. 1
Unprocessed western blots.
Source Data Fig. 2
Unprocessed western blots/gels.
Source Data Fig. 2
Source data for NM RNA interactome mass spectrometry.
Source Data Fig. 4
Unprocessed western blots/gels.
Source Data Fig. 4
Statistical source data.
Source Data Fig. 5
Unprocessed western blots.
Source Data Fig. 5
Statistical source data.
Source Data Fig. 6
Unprocessed western blots/gels.
Source Data Extended Data Fig. 1
Unprocessed western blots.
Source Data Extended Data Fig. 1
Statistical source data.
Source Data Extended Data Fig. 4
Unprocessed gels.
Source Data Extended Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 7
Unprocessed western blots.
Source Data Extended Data Fig. 8
Unprocessed western blots/gels.
Source Data Extended Data Fig. 8
Statistical source data.
Source Data Extended Data Fig. 10
Unprocessed gels.
About this article
Cite this article
Tang, P., Yang, Y., Li, G. et al. Alternative polyadenylation by sequential activation of distal and proximal PolyA sites. Nat Struct Mol Biol 29, 21–31 (2022). https://doi.org/10.1038/s41594-021-00709-z
DOI: https://doi.org/10.1038/s41594-021-00709-z