Abstract
The RIKEN Mouse Gene Encyclopaedia Project, a systematic approach to determining the full coding potential of the mouse genome, involves collection and sequencing of full-length complementary DNAs and physical mapping of the corresponding genes to the mouse genome. We organized an international functional annotation meeting (FANTOM) to annotate the first 21,076 cDNAs to be analysed in this project. Here we describe the first RIKEN clone collection, which is one of the largest described for any organism. Analysis of these cDNAs extends known gene families and identifies new ones.
This is a preview of subscription content, access via your institution
Access options
Subscribe to this journal
Receive 51 print issues and online access
$199.00 per year
only $3.90 per issue
Buy this article
- Purchase on Springer Link
- Instant access to full article PDF
Prices may be subject to local taxes which are calculated during checkout
Similar content being viewed by others
References
Roest Crollius, H. et al. Estimate of human gene number provided by genome-wide analysis using Tetraodon nigroviridis DNA sequence. Nature Genet. 25, 235–238 (2000).
Ewing, B. & Green, P. Analysis of expressed sequence tags indicates 35,000 human genes. Nature Genet. 25, 232–234 (2000).
Liang, F. et al. Gene index analysis of the human genome estimates approximately 120,000 genes. Nature Genet. 25, 239–240 (2000).
Carninci, P. & Hayashizaki, Y. High-efficiency full-length cDNA cloning. Methods Enzymol. 303, 19–44 (1999).
Carninci, P. et al. High-efficiency full-length cDNA cloning by biotinylated CAP trapper. Genomics 37, 327–336 (1996).
Carninci, P. et al. Normalization and subtraction of cap-trapper-selected cDNAs to prepare full-length cDNA libraries for rapid discovery of new genes. Genome Res. 10, 1617–1630 (2000).
Ewing, B., Hillier, L., Wendl, M. C. & Green, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998).
Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998).
Gautheret, D., Poirot, O., Lopez, F., Audic, S. & Claverie, J. M. Alternate polyadenylation in human mRNAs: a large-scale analysis by EST clustering. Genome Res. 8, 524–530 (1998).
Huang, X., Adams, M. D., Zhou, H. & Kerlavage, A. R. A tool for analyzing and annotating genomic sequences. Genomics 46, 37–45 (1997).
Huang, X. & Madan, A. CAP3: A DNA sequence assembly program. Genome Res. 9, 868–877 (1999).
Thompson, J. D., Higgins, D. G. & Gibson, T. J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680 (1994).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genet. 25, 25–29 (2000).
Croft, L. et al. ISIS, the intron information system, reveals the high frequency of alternative splicing in the human genome. Nature Genet. 24, 340–341 (2000).
Hanke, J. et al. Alternative splicing of human genes: more the rule than the exception? Trends Genet. 15, 389–390 (1999).
Rubin, G. M. et al. Comparative genomics of the eukaryotes. Science 287, 2204–2215 (2000).
Adams, M. D. et al. The genome sequence of Drosophila melanogaster. Science 287, 2185–2195 (2000).
Aravind, L. & Koonin, E. V. SAP- a putative DNA-binding motif involved in chromosomal organization. Trends Biochem. Sci. 25, 112–114 (2000).
Matsuda, H. Detection of conserved domains in protein sequences using a maximum-density subgraph algorithm. IEICE Trans. Fundamentals Electron. Commun. Comput. Sci. E83-A, 713–721 (2000).
Pesole, G., Liuni, S. & D'Souza, M. PatSearch: a pattern matcher software that finds functional elements in nucleotide and protein sequences and assesses their statistical significance. Bioinformatics 16, 439–450 (2000).
Carninci, P. et al. Thermostabilization and thermoactivation of thermolabile enzymes by trehalose and its application for the synthesis of full length cDNA. Proc. Natl Acad. Sci. USA 95, 520–524 (1998).
Batzoglou, S., Pachter, L., Mesirov, J. P., Berger, B. & Lander, E. S. Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Res. 10, 950–958 (2000).
Itoh, M. et al. Automated filtration-based high-throughput plasmid preparation system. Genome Res. 9, 463–470 (1999).
Shibata, K. et al. RIKEN integrated sequence analysis (RISA) system-384-format sequencing pipline with 384 multicapillary sequencer. Genome Res. 10, 1757–1771 (2000).
Gordon, D., Abajian, C. & Green, P. Consed: a graphical tool for sequence finishing. Genome Res. 8, 195–202 (1998).
Fukunishi, Y. & Hayashizaki, Y. Amino-acid translation program for full-length cDNA sequences with frame-shift error. Physiol. Genomics. (in the press).
Acknowledgements
We thank the following (in alphabetical order) for discussion, encouragement and technical assistance: R. Abagyan, T. Akimura, K. Arakawa, M. Boguski, L. Corbani, T. A. Dragani, J. T. Eppig, S. Fujimori, G. Grillo, T. Haga, T. Hanagaki, S. Hanaoka, S. Hatta, N. Hayatsu, K. Hiramoto, T. Hiraoka, T. Hirozane, Y. Hodoyama, F. Hori, T. Hubbard, R. Hynes, K. Ikeda, K. Ikeo, C. Imamura, K. Imotani, S. Inoue, H. Kato, N. Kikuchi, Y. Kojima, A. Konagaya, M. Kouda, S. Koya, M. Kubota, S. Kumagai, C. Kurihara, M. Kusakabe, F. Licciulli, S. Liuni, L. Maltais, T. Matsuyama, L. McKenzie, A. Miyazaki, K. Mori, M. Muramatsu, M. Nakamura, K. Nomura, N. Nukina, K. Numata, R. Numazaki, M. Ohno, Y. Okuma, H. Ono, C. Owa, Y. Ozawa, G. Pertea, S. Ramachandran, E. M. Rubin, N. Saga, H. Saitou, H. Sakai, C. Sakai, A. Sakurai, H. Sano, D. Sasaki, L. Sato, C. Schneider, J. Schug, T. Shiraki, M. B. Soares, Y. Sogabe, C. Stoeckert, H. Sugawara, R. Sultana, H. Suzuki, M. Tagami, A. Tagawa, F. Takahashi, S. Takaku-Akahira, M. Takeuchi, T. Tanaka, Y. Tateno, Y. Tejima, J. Todd, A. Tomaru, S. Tonegawa, T. Toya, A. Wada, L. Wagner, A. Watahiki, T. Yamamura, T. Yamashita, T. Yao, A. Yasunishi, T. Yokota, S. Yokoyama, A. Yoshiki and K. Yotsutani. We also thank N. Kazuta, Y. Sigemoto, H. Torigoe and T. Washida for secretarial assistance. This study has been mainly supported by a grant for the RIKEN Genome Exploration Research Project and CREST (Core Research for Evolutional Science and Technology) to Y.H. Further support came from ACT-JST (Research and Development for Applying Advanced Computational Science and Technology) of Japan Science and Technology Corporation (JST) to Y.H. and H.M., and the Science and Technology Agency of the Japanese Government to Y.H. and Y.O. (All funds from the Science Technology Agency of the Japanese Government.) This work was also supported by a Grant-in-Aid for Scientific Research on Priority Areas and Human Genome Program, from the Ministry of Education, Science and Culture, and by a Grant-in-Aid for a Second Term Comprehensive 10-Year Strategy for Cancer Control from the Ministry of Health and Welfare to Y.H. Authors’ contributions: J. Kawai and Y. Okazaki contributed as organizers in phase II team and FANTOM, respectively. A. Shinagawa and H. Bono contributed as managers in sequence data production system and computing system, respectively. J. Quackenbush, P. Carninci, M. J. Brownstein, D. A. Hume, C. Schönbach, H. Suzuki and C. Weitz acted as senior managers of the annotation project.
Author information
Authors and Affiliations
Consortia
Corresponding author
Supplementary information
Supplementary Figure 1
A Distribution of the length of 21,076 insert DNAs (DOC 34 kb)
B Sequence accuracy
Supplementary Figure 2
A SAP domain containing RIKEN clones (DOC 41 kb)
B Phylogenetic tree of the known OATPs and RIKEN clones
Supplementary Figure 3
Alignment of the amino acid sequences of the known OATPs (DOC 135 kb)
Supplementary Figure 4
Mapping of RIKEN clones using Radiation Hybrid (DOC 31 kb)
Supplementary Table 1
DDBJ accsession number and MGI ID for RIKEN ID (TXT 801 kb)
Supplementary Table 2
A Full-length evaluation (DOC 80 kb)
B Analysis of Alternative Splicing in Redundant Clone Set
Supplementary Table 4
A RIKEN clones containing a DSP domain (DOC 997 kb)
B RIKEN clones containing a consensus kinase signature motif
C InterPro Motifs in RIKEN clones
Supplementary Table 5
A Motifs identified by maximum density subgraph analysis (DOC 64 kb)
B UTR Functional Elements
Supplementary Table 6
RH Map of RIKEN Clones using RH databases (DOC 56 kb)
Supplementary Table 7
A Databases that were used for the functional annotation (DOC 82 kb)
B Software that was used during full-length sequencing and the functional annotation
Supplementary Table 8
Strategies of Gene Ontology Assignment (DOC 32 kb)
Supplementary methods
This information is also available at: http://www.gsc.riken.go.jp/e/FANTOM/supplement/ (DOC 32 kb)
Rights and permissions
About this article
Cite this article
The RIKEN Genome Exploration Research Group Phase II Team and the FANTOM Consortium. Functional annotation of a full-length mouse cDNA collection. Nature 409, 685–690 (2001). https://doi.org/10.1038/35055500
Received:
Accepted:
Issue Date:
DOI: https://doi.org/10.1038/35055500
This article is cited by
-
Computational approaches and challenges for identification and annotation of non-coding RNAs using RNA-Seq
Functional & Integrative Genomics (2022)
-
Functional signatures of evolutionarily young CTCF binding sites
BMC Biology (2020)
-
The Rab5 activator RME-6 is required for amyloid precursor protein endocytosis depending on the YTSI motif
Cellular and Molecular Life Sciences (2020)
-
Bridging the gap between reference and real transcriptomes
Genome Biology (2019)
-
The FANTOM5 collection, a data series underpinning mammalian transcriptome atlases in diverse cell types
Scientific Data (2017)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.