Abstract
Both the generation and the analysis of proteome data are becoming increasingly widespread, and the field of proteomics is moving incrementally toward high-throughput approaches. Techniques are also becoming more complex as the relevant technologies evolve. Transcriptomics has the MIAME (minimum information about a microarray experiment) guidelines, together with the associated MAGE (microarray gene expression) object model and its XML (extensible markup language) implementation; no comparable standard representation of the methods used and the data generated in proteomics experiments has yet emerged. This hinders the handling, exchange, and dissemination of proteomics data. Here, we present a UML (unified modeling language) model of proteomics experimental data, describe XML and SQL (structured query language) implementations of that model, and discuss strategies for capturing, storing, and disseminating such data. Together, the model and its implementations make explicit which data might be most usefully captured about proteomics experiments and provide complementary routes toward the implementation of a proteome repository.
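The abstract describes a single conceptual (UML) model realized in two concrete forms, SQL and XML. The following is a minimal sketch of that idea in miniature, under stated assumptions: the two classes (`Experiment`, `PeptideHit`), the two-table SQLite schema, and the XML element names are all hypothetical and chosen only for illustration; they are not the published PEDRo schema. Only Python's standard `sqlite3`, `dataclasses`, and `xml.etree.ElementTree` modules are used.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical, heavily simplified classes standing in for a UML model.
# They are illustrative assumptions, NOT the published PEDRo schema.
@dataclass
class PeptideHit:
    sequence: str   # peptide sequence reported by a database search
    score: float    # search-engine score (meaning depends on the engine)

@dataclass
class Experiment:
    name: str
    sample: str
    hits: list[PeptideHit] = field(default_factory=list)

# SQL rendering: one table per class; the one-to-many association
# Experiment -> PeptideHit becomes a foreign key column.
SCHEMA = """
CREATE TABLE experiment (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    sample TEXT NOT NULL
);
CREATE TABLE peptide_hit (
    id            INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    sequence      TEXT NOT NULL,
    score         REAL NOT NULL
);
"""

def to_sql(conn: sqlite3.Connection, exp: Experiment) -> int:
    """Insert an Experiment and its hits; return the new experiment id."""
    cur = conn.execute(
        "INSERT INTO experiment (name, sample) VALUES (?, ?)",
        (exp.name, exp.sample),
    )
    exp_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO peptide_hit (experiment_id, sequence, score) VALUES (?, ?, ?)",
        [(exp_id, h.sequence, h.score) for h in exp.hits],
    )
    conn.commit()
    return exp_id

def to_xml(exp: Experiment) -> str:
    """Serialize the same Experiment as XML; the association becomes nesting."""
    root = ET.Element("Experiment", name=exp.name)
    ET.SubElement(root, "Sample").text = exp.sample
    hits = ET.SubElement(root, "PeptideHits")
    for h in exp.hits:
        ET.SubElement(hits, "PeptideHit", sequence=h.sequence, score=str(h.score))
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    exp = Experiment("demo", "yeast lysate", [PeptideHit("LVNELTEFAK", 42.0)])
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    exp_id = to_sql(conn, exp)
    print(conn.execute(
        "SELECT COUNT(*) FROM peptide_hit WHERE experiment_id = ?", (exp_id,)
    ).fetchone()[0])
    print(to_xml(exp))
```

Driving both serializations from one set of classes is what keeps a relational store and an XML exchange format consistent with each other. A fuller model would presumably need many more classes and associations, for example covering sample preparation, separation, mass spectrometry, and database searching.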
Acknowledgements
Special thanks go to Francesco Brancia, Jenny Ho, and Sandy Yates for their critical appraisal of the schema at various stages. This work was supported by a grant from the Investigating Gene Function (IGF) Initiative of the Biotechnology & Biological Sciences Research Council to S.G.O., N.W.P., A.B., S.G., S.H., P.C., and A.J.P.B. for the COGEME (Consortium for the Functional Genomics of Microbial Eukaryotes) program. D.B.K. thanks the BBSRC for financial support, also under the IGF initiative. K.L.G. is supported by the North West Regional e-Science centre (ESNW), within the UK e-Science Programme. Many people have contributed their advice and expertise to the design of PEDRo, at various meetings formal and otherwise, notably attendees at the 2002 Proteomics Standards Initiative meeting of the Human Proteome Organisation at the European Bioinformatics Institute.
Cite this article
Taylor, C., Paton, N., Garwood, K. et al. A systematic approach to modeling, capturing, and disseminating proteomics experimental data. Nat Biotechnol 21, 247–254 (2003). https://doi.org/10.1038/nbt0303-247
DOI: https://doi.org/10.1038/nbt0303-247
This article is cited by
- An integrative top-down and bottom-up qualitative model construction framework for exploration of biochemical systems. Soft Computing (2015)
- SILEC: a protocol for generating and using isotopically labeled coenzyme A mass spectrometry standards. Nature Protocols (2012)
- A unified framework for managing provenance information in translational research. BMC Bioinformatics (2011)
- Assembling proteomics data as a prerequisite for the analysis of large scale experiments. Chemistry Central Journal (2009)
- An open-source representation for 2-DE-centric proteomics and support infrastructure for data storage and analysis. BMC Bioinformatics (2008)