
"Barbarism is the absence of standards to which appeal can be made." (José Ortega y Gasset)

Historically, standards emerged from the need for recommended practices in the manufacturing of products. The main purpose of standards is to help fit many pieces together into a useful whole. In information technology, standards are needed to exchange large volumes of information and to carry out the many smaller transactions that this entails. Examples of established standards include the Hypertext Transfer Protocol (HTTP) for exchanging documents and the Extensible Markup Language (XML) for structuring data.

In the life sciences, the advances in high-throughput experimental technologies have allowed the acquisition of data on a large scale, and at least some level of standardization is needed to manage and use these data (Box 1). Examples of data standards relevant to systems biology include Gene Ontology (GO) for describing gene function1, Minimum Information About a Microarray Experiment (MIAME) for describing microarray experiments2, Systems Biology Markup Language (SBML)3 and Cell Markup Language (CellML) for describing biomolecular simulations4.

Standardization has not only become a popular topic in bioinformatics, but has almost developed into a field of its own. New standards-related acronyms, such as MAGE, MO, MIAPE, MISFISHIE, MIRIAM and MIACA, are appearing almost monthly (Tables 1a,b,2), and just keeping track of them requires some effort. Most of these standards-development initiatives are community-based and involve close collaboration between biologists, bioinformaticians and information technologists. Nevertheless, the sheer number of different standards and the pace at which the field is changing make it difficult, even for professionals, to keep track of all the new developments5,6.

Table 1a Life sciences standards: acronyms and their definitions
Table 1b Life sciences standards: more acronyms and their definitions
Table 2 Classification of systems biology standards in two dimensions*

What is the aim of these standardization initiatives? How many of these standards are used and how many have contributed to advances in biology? The goal of this review is to help systems biologists to navigate the rapidly changing maze of data standards in life sciences, and to encourage biologists to provide feedback on these developments. Such feedback is necessary to ensure that the developed standards are as close as possible to addressing the real needs of biology.

It is important to distinguish between standards that specify how to actually do experiments and standards that specify how to describe experiments. Recommendations such as what standard reporters (probes) should be printed on microarrays or what quality control steps should be used in an experiment belong to the first category. Here we focus on the standards that specify how to describe and communicate data and information.

Four steps in developing standards

By information or data communication standard we mean a convention on how to encode data or information about a particular domain (such as gene function) that enables unambiguous transfer and interpretation of this information or data (in this article, we treat the terms 'data' and 'information' as synonyms).

There are several international and national bodies, such as OMG, W3C, IEEE, ANSI and IETF (see Table 1b for definitions), that formally approve standards or provide a framework for standards development. Although some systems biology standards have been certified (for example, SBML is officially adopted by IETF), in the life sciences this procedure is generally not particularly important — many of the most successful standards, such as GO, have not undergone any official approval procedure but have instead become de facto standards. In fact, many of the most successful standards in other domains are also de facto standards.

There are four steps involved in developing a complete and self-contained standard: conceptual model design, model formalization, development of a data exchange format (see Fig. 1 for an example) and implementation of the supporting tools. Box 2 shows how a standard can be developed for a specific simplified example.

Figure 1: Development of components of data exchange standards: the MGED example.

The first result of the microarray data standardization effort was a community agreement about the level of detail necessary to make data exchange meaningful (MIAME). The work then proceeded in parallel to: first, formalize the domain model and use this formalization as a basis for a data exchange language; and second, create an ontology that contained the necessary controlled vocabularies, as well as structures for linking to external ontologies. These two components enabled data exchange. MGED, Microarray Gene Expression Data.

The order in which these four steps occur is not fixed. The first two steps often happen in parallel — formalizing the domain helps us to understand it better. Sometimes a specific format can itself serve as the formalization: for example, XML-based formats are often developed without any explicit object model or ontology (that is, without explicitly implementing step 2). The reverse is also possible: the formalism used to model the domain can define the format automatically. Some successful standards are defined by successful software that becomes a market leader and thereby sets the de facto standard (for example, the binary format of Affymetrix .cel files, or the NetCDF format used in Bioconductor for microarrays7). Nevertheless, for a standard to be complete it is vital that it contains all the components described above.

Informal description and delimitation of the domain. In the microarray domain the first step of conceptualization has been implemented explicitly: MIAME provides an informal means of modelling a microarray experiment2. Two competing goals should be balanced when developing the conceptual model of a domain: domain specificity and the need to find common ground for all related applications. Arguably, the most useful standards are those that consist of the minimal number of the most informative parameters. Keeping the list short makes the standard simple and practicable, while selecting the most informative features ensures accuracy and efficiency. This need for minimalism is reflected in the titles of many such standards — Minimum Information About XYZ.

Inclusiveness and establishment of a dialogue between all, or at least most, of the domain leaders in academia and industry is one of the key issues at this stage. In the microarray field the consensus was achieved by establishing the Microarray Gene Expression Data (MGED) Society8,9, which started as a grass-roots movement comprising all the main microarray players at the time, and has continued to be inclusive since. Regardless of whether the conceptualization step is done explicitly or in parallel with the next step (formalization), it is important to understand the domain well before developing any specific format.

Building the formal model. A model can be formalized using an ontology, an object model or some other well-defined formalism (Box 3). Often a combination of formalisms is used. For instance, the Microarray Gene Expression Object Model (MAGE-OM) is used to formalize basic objects and their relationships10, whereas the MGED Ontology (MO) is used to define the structured nomenclature describing the experiments11. Additionally, guidelines on how to encode information in the model (for example, how to encode MIAME in MAGE-OM for microarrays; see the MIAME and MAGE-OM Terms Explained web site) might be needed, although such guidelines could be part of the formalism itself. In fact, in the life sciences models often consist of two parts — an object model and an ontology. This is because the first is mostly used by software developers, and the second by biologists. Both are required for a complete description of a study.
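
To make this division of labour concrete, here is a minimal Python sketch of how an object model can fix the structure of a record while a controlled vocabulary constrains its values. All class names and terms are invented for illustration and do not belong to MAGE-OM or MO.

    from dataclasses import dataclass

    # A toy controlled vocabulary, standing in for ontology terms.
    SAMPLE_TREATMENTS = {"none", "heat_shock", "drug_exposure"}

    @dataclass
    class Sample:
        identifier: str
        organism: str
        treatment: str

        def __post_init__(self):
            # The vocabulary constrains the values a model attribute may take.
            if self.treatment not in SAMPLE_TREATMENTS:
                raise ValueError(f"unknown treatment term: {self.treatment!r}")

    sample = Sample(identifier="S1", organism="Homo sapiens", treatment="heat_shock")

The design point of the separation is that the vocabulary can evolve quickly under curator control, whereas the class structure changes slowly, so ontology growth need not break model-based software.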

It is important to ensure the uniqueness of the object identifiers of all the instances and objects used at the domain formalization level. For example, the Human Genome Organisation (HUGO) is working on standards for naming human genes12. A more generic approach to identifying objects is taken by the Life Sciences Identifier (LSID) convention (see below). Object identification is an important and highly complex issue in the life sciences, as the domain is dynamic. For example, how can predicted genes be identified in the human genome in a durable way when not only predictions but also genome assemblies continue to be updated? Assigning persistent identifiers to objects such as exons is even more daunting because the genome 'coordinate systems' are rapidly changing. This is one of the most important general problems in building standards for biology — our understanding of living systems is constantly developing.

A formal model is the core of any standard. The challenge is to build a model that represents the whole domain without being excessively complex. The key question is how granular the model should be. At one extreme, the model contains a set of data files and free text that describes them. At the other extreme, the model formally specifies each elementary step in the experiment and each numerical measurement obtained from each step. The tendency to include every detail and to develop a perfect model of the domain, rather than a practical data communication standard, can easily ruin the usability of the model. A good design should aim to deal with the majority of typical cases first, if necessary at the expense of the minority of less important cases. On the other hand, if a model is too simplistic, it might not fulfil its function as a way to describe the domain unambiguously.

Platform-specific implementations define particular formats. The so-called platform-specific implementation of a standard can be generated automatically or derived manually from a formal model. The efficiency of a standard, as of any product, depends not only on the concept behind it, but also on its actual technical solution. For example, XML has been widely used for data exchange in life sciences and has proved to be one of the most popular formats. In systems biology, XML-based formats provide links not only between various data storage systems, but also between modelling tools.
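
As an illustration of how a platform-specific format can be derived from a formal model, the sketch below serializes a sample record into a small XML document using Python's standard library. The element and attribute names are invented for this example and do not follow MAGE-ML or any other real schema.

    import xml.etree.ElementTree as ET

    def sample_to_xml(sample: dict) -> str:
        # Element and attribute names are invented for this illustration.
        root = ET.Element("Sample", attrib={"id": sample["identifier"]})
        ET.SubElement(root, "Organism").text = sample["organism"]
        ET.SubElement(root, "Treatment").text = sample["treatment"]
        return ET.tostring(root, encoding="unicode")

    print(sample_to_xml({"identifier": "S1",
                         "organism": "Homo sapiens",
                         "treatment": "heat_shock"}))
    # prints: <Sample id="S1"><Organism>Homo sapiens</Organism><Treatment>heat_shock</Treatment></Sample>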

Another example is the LSID convention that can be used for resolving queries automatically. LSID has the following format: 'urn:lsid:authority:namespace:identifier:revision'; for example, urn:lsid:ebi.ac.uk:MIAMExpress/Experiment:E-MEXP-1, where ebi.ac.uk is the 'authority', MIAMExpress is the 'namespace' (assigned to the software tool MIAMExpress), and E-MEXP-1 is the 'identifier'. There is no revision in this example. If used correctly, this convention ensures that there is only one object with this identifier anywhere in the World Wide Web.
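
Because the LSID convention has a fixed, colon-separated structure, resolving an LSID into its components is straightforward. The following is a minimal Python sketch based only on the format quoted above; a real LSID resolver also implements the network protocol for retrieving the identified object itself.

    def parse_lsid(lsid: str) -> dict:
        parts = lsid.split(":")
        if len(parts) < 5 or parts[0] != "urn" or parts[1] != "lsid":
            raise ValueError(f"not an LSID: {lsid!r}")
        return {
            "authority": parts[2],
            "namespace": parts[3],
            "identifier": parts[4],
            # The revision component is optional.
            "revision": parts[5] if len(parts) > 5 else None,
        }

    print(parse_lsid("urn:lsid:ebi.ac.uk:MIAMExpress/Experiment:E-MEXP-1"))
    # {'authority': 'ebi.ac.uk', 'namespace': 'MIAMExpress/Experiment',
    #  'identifier': 'E-MEXP-1', 'revision': None}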

Implementing the supporting tools. For a standard to be successful, laboratory information management systems (LIMS), databases, and data analysis and modelling tools should comply with it. One way of fostering this is to develop easy-to-use 'middleware' — software components that hide the technical complexities of the standard and make manipulating the standard format easy. For instance, in the microarray domain, the role of such middleware is played by the microarray gene expression toolkit, MAGE-stk.

Although to develop a complete and self-contained systems biology standard all four steps described above have to be implemented, most of the existing standards that are described in the following section implement only some of these steps. For this reason a combination of several 'partial' standards is needed to define the complete standard. For instance, in the microarray domain each of the four parts is covered by a separate partial standard — the informal conceptualization is covered by MIAME guidelines2, the formalization of the domain is covered by MAGE-OM and MO, the platform-specific implementation is covered by the XML-based Microarray Gene Expression Mark-up Language (MAGE-ML)10 (automatically derived from MAGE-OM), and the supporting tools are covered by MAGE-stk (MAGE-OM, MAGE-ML and MAGE-stk are jointly known as MAGE). Note that in a way each of the four steps represents a level of formality.

What are we standardizing?

Standardization initiatives can be classified along two orthogonal dimensions: first, by which of the four standardization steps (discussed above) the particular standard covers; and second, by the particular biological or technological domain it addresses (Table 2). Within the second dimension there are two fundamental classes of data standards: those that aim to communicate the actual biological knowledge and those that relate to the supporting experimental evidence (usually to a particular technology). Typical examples of the first are GO and SBML; examples of the second are MIAME and MAGE.

Standards for describing biology. Because our biological knowledge is incomplete and developing fast, standards that describe biological knowledge have to be flexible and must contain some mechanism for change. There are at least three levels (layers) of increasing detail that such standards might aim to capture: first, the properties of the objects of the system (the parts list); second, the topology and logic of the system; and third, the system's dynamics13,14.

A parts list is a collection, naming, description and systematization of the elements (for example, genes, proteins, cell types and anatomical structures) of a particular biological system. A basic standard that addresses the parts list at an even more elementary level is the HUGO gene-naming standard. A semantically richer example of a standard describing the properties of objects is GO1. GO provides controlled vocabularies for describing gene products and their properties in three orthogonal areas: molecular function, biological process and cellular component. Each of these is organized in a directed acyclic graph (DAG), which is a generalization of a hierarchical tree (in this way GO uses a formalism that is simpler than DAML+OIL or OWL (Web Ontology Language); see Box 3). GO has been a true success story; it has been taken up by the entire scientific community as the main means of annotating gene products. Analogous unified nomenclatures for anatomy or phenotypes are being developed. Several cell-type ontologies are both publicly available and used internally in laboratories in industry and academia15,16. An attempt to collect all the existing ontologies and initiate a discussion on unification is currently being made by the Open Biomedical Ontologies (OBO)17 initiative. OBO is an umbrella for well-structured controlled vocabularies for shared use across different biological and medical domains. The main criteria for an ontology's inclusion in OBO are that it should be open, should share a common syntax, and should be orthogonal to other OBO ontologies.
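
To illustrate what the DAG structure buys, the sketch below (in Python, with invented term names) encodes a three-term fragment of a GO-like vocabulary and computes the ancestor closure of a term. Annotating a gene product with a term implicitly annotates it with every ancestor (the 'true path' rule), which is what makes queries at different levels of granularity possible.

    # Each term maps to its parent terms; a term may have several parents.
    TOY_DAG = {
        "kinase activity": ["catalytic activity"],
        "catalytic activity": ["molecular function"],
        "molecular function": [],           # a root term
    }

    def ancestors(term: str, dag: dict) -> set:
        """Return the term itself plus all terms reachable via parent links."""
        result = {term}
        for parent in dag.get(term, []):
            result |= ancestors(parent, dag)
        return result

    # A gene annotated with 'kinase activity' is implicitly annotated with
    # 'catalytic activity' and 'molecular function' as well.
    print(ancestors("kinase activity", TOY_DAG))
    # {'kinase activity', 'catalytic activity', 'molecular function'} (order may vary)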

Systems topology and logic (or pathway diagrams) can be thought of as a connection (interaction) diagram between the parts. This can be viewed as a graph: the nodes represent the objects (for example, genes, proteins or other interacting objects), whereas the edges or arcs — connections between nodes, which can be directed or undirected — represent interactions between these objects (as in metabolic, signalling and protein interaction networks). Examples of such topologies include the protein interaction network of a cell, or the network of regulatory interactions between genes. A standard known as the Proteomics Standards Initiative: Molecular Interactions (PSI-MI)18 has been developed to describe protein–protein interactions (Fig. 2).
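
As a minimal sketch of this second layer, the snippet below stores a toy undirected protein-interaction network (protein names invented) as an adjacency mapping — the graph structure that formats such as PSI-MI ultimately describe, stripped of all experimental metadata.

    from collections import defaultdict

    def build_network(interactions):
        """Store an undirected interaction graph as an adjacency mapping."""
        network = defaultdict(set)
        for a, b in interactions:
            network[a].add(b)
            network[b].add(a)  # undirected: each edge is stored both ways
        return network

    ppi = build_network([("ProtA", "ProtB"), ("ProtB", "ProtC")])
    print(sorted(ppi["ProtB"]))  # ['ProtA', 'ProtC']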

Figure 2: Proteomics standards.

a | In addition to the general MIAPE (Minimum Information About a Proteomics Experiment) guidelines, domain-specific recommendations for each of the stages of protein quantification techniques have been developed: MIAPE-MS, MIAPE-MSI, MIAPE-Gel and MIAPE-GelI. The need for a separate set of rules for publishing proteomics experiments has led to the Publication Guidelines for the Analysis and Documentation of Peptide and Protein Identifications. These guidelines are currently being developed by the journal Molecular and Cellular Proteomics47 and are taken into consideration by the MIAPE community. b | Despite the success in achieving consensus on experiment reporting requirements and the availability of a centralized protein sequence resource, the UniProtKB knowledgebase, the creation of a complete formalism (that is, object models for each of the related domains) might be more difficult than anticipated. PEDRo (the Proteomics Experiment Data Repository), with its XML (Extensible Markup Language) implementation PEML, represents a model for the most common workflows in proteomics and was used by the Proteomics Standards Initiative (PSI) as a prototype for the standard in proteomics34. It covers such areas as sample information, protein separation, mass spectrometry protocols and data analysis. Other attempts to formalize or model processes of protein separation, such as two-dimensional gels, affinity columns and chemical treatments, resulted in models that either lack some classes or are too specific to a particular technique48. PEDRo was an inspiration for the creation of mzData — an XML format for capturing peak-list information. mzData complements the MIAPE guidelines for mass spectrometry and for the analysis of mass spectrometry data (MIAPE-MS and MIAPE-MSI). Analogous and effectively equivalent formats created earlier are mzXML49, pepXML and protXML. The PEDRo–mzXML–mzData evolution demonstrates how arduous it can be to create a model for a specific platform, even when general agreement has been achieved.

Another example of a standard for describing the overall structure of interactions and information about the nature of each interaction is BioPAX19. BioPAX is a collaborative project to create a data exchange format for biological pathway data. It uses the ontology language OWL and consists of several so-called 'levels' (releases). BioPAX Level 2 covers metabolic pathways, molecular interactions and post-translational protein modifications, and is backwards compatible with Level 1. Future levels will expand support for signalling pathways, gene regulatory networks and genetic interactions. Data from pathway databases, such as Reactome, are now available in BioPAX format20. BioPAX does not contain quantitative parameters and cannot describe the dynamics of interactions, but this is not its goal.

Systems dynamics simulates the real-time behaviour of a system, such as the cell cycle, and predicts its response to various environmental changes — for example, to stress. Differential equations and hybrid systems are among the formalisms (languages) used to describe network dynamics and simulations (for example, see Refs 21,22). Standards for describing network simulations include SBML3 and CellML4, and are relatively well established. For instance, SBML release 2 (termed Level 2) consists of lists of the various components that are needed to describe and simulate the dynamics of chemical reactions and their interaction networks. These include so-called compartments (containers of substances), species (for example, chemical substances), reactions (descriptions of processes), parameters, unit definitions and rules (mathematical expressions). SBML has been described using a Unified Modeling Language (UML) model, and XML has been used as the framework for the format.
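
As a hedged sketch of what lies behind such a standard, consider a single reaction S -> P with mass-action kinetics. An SBML document would declare two species, one reaction and a rate parameter k; the Python snippet below stands in for the simulation tool that consumes such a model, using a deliberately naive Euler integrator (real SBML tools parse the XML and use far more robust solvers).

    def simulate(s0=10.0, p0=0.0, k=0.1, dt=0.01, steps=1000):
        """Integrate the reaction S -> P with rate k*S by Euler's method."""
        s, p = s0, p0
        for _ in range(steps):
            rate = k * s            # mass-action rate of S -> P
            s -= rate * dt
            p += rate * dt
        return s, p

    # After 10 time units most of S has been converted to P.
    print(simulate())               # roughly (3.68, 6.32)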

The main difference between CellML and SBML is that CellML is modular, whereas SBML is hierarchical. Arguably, CellML is conceptually more satisfying, but SBML is more usable. CellML was originally developed by a company, and although it is now in the academic domain only a few groups are developing tools for it; SBML is much more of a community effort. For reviews of these standards the reader should consult Refs 4,23.

Minimum Information Requested In the Annotation of Biochemical Models (MIRIAM) is a human-orientated counterpart of SBML, CellML or any other machine-readable format for describing models24. The goal is to encode models in a systematic way and to annotate them sufficiently and consistently. The need for these human-orientated guidelines became apparent because of the sometimes inconsistent use of SBML. By analogy with the microarray domain, MIRIAM corresponds to MIAME in the same way as SBML or CellML correspond to MAGE-ML. Unlike MIAME, MIRIAM was developed after the formal languages had been introduced, partly as a result of the realization that the formalisms were not being used consistently and that often not enough information was provided.

The standardization of the syntax (SBML, CellML) and semantics (MIRIAM) of computational models has enabled their storage, exchange and integration. The BioModels Database is a data resource that allows biologists to store, search and retrieve published mathematical models of biological interest25. Models accepted into the BioModels Database comply with MIRIAM, the standard for model curation. The syntax of the models and their results are carefully verified. Each model is annotated and linked to relevant data resources, such as publications, databases of compounds and pathways, and controlled vocabularies. Although the internal format of the models is SBML, users can retrieve them in many formats (including SBML, CellML, XPP, SciLab and BioPAX).

Technology-related standards for experimental evidence. The lifespan of each particular technology is limited, which means that if an investment in developing a particular standard is to pay off, either the development and deployment of this standard should be relatively inexpensive, or the lifespan of the technology should be sufficiently long. The state of development of technology-related standards varies from domain to domain — in some areas there is already general agreement on the minimum information to be reported; in others there is less coordination and several XML formats or data models are being developed.

Arguably, standards related to microarray technology are the most advanced. MIAME is now accepted by most scientific journals as a guideline for making publication-supporting data available in a usable format26,27. This has facilitated a flow of data from laboratories into public repositories such as ArrayExpress28 and the Gene Expression Omnibus (GEO)29. In turn, this has given a boost to the other components of microarray data standards, most importantly MAGE-ML10. For instance, MAGE-ML files generated by more than 20 software systems, comprising over 20,000 hybridizations, account for half the data in ArrayExpress (May 2006). There are fully automated MIAME-compliant data submission pipelines from the Stanford Microarray Database (SMD)30 and The Institute for Genomic Research database to ArrayExpress that are based on the MAGE-ML standard. The ArrayExpress repository itself is based on the MAGE-OM model31.

Minimum Information Specification For in situ Hybridization and Immunohistochemistry Experiments (MISFISHIE) is a proposal for a reporting standard for all studies on gene expression localization (see the MISFISHIE Standard Working Group web page). This specification was jointly developed by members of the US National Institutes of Health/National Institute of Diabetes and Digestive and Kidney Diseases Stem Cell Genome Anatomy Projects Consortium, providing a list of what must be submitted along with experimental results.

Modern proteomics technologies allow information on the expression levels, modifications, localization, interactions and structure of proteins to be gathered on the scale of the entire proteome. One of the working groups of the Human Proteome Organisation (HUPO) — PSI — was founded in 2002 to develop standards for representing mass spectrometry and protein interaction data32. General guidelines, MIAPE33 (Minimum Information About a Proteomics Experiment, analogous to MIAME for microarrays), have been adopted, and domain-specific recommendations for each of the stages of protein quantification techniques have been developed (MIAPE-MS, MIAPE-MSI and MIAPE-Gel), reflecting the complexity of the data. Despite the success in achieving a consensus on experiment reporting requirements and the availability of the UniProtKB knowledgebase — the centralized protein sequence resource — the creation of a complete formalism (that is, an object model for each of the related domains) might not be trivial. PEDRo (the Proteomics Experiment Data Repository), with its XML implementation, PEML, is a model for the most common workflows in proteomics and was used by PSI as a prototype for a proteomics standard34 (for more detail on proteomics standards see Fig. 2).

Metabolomics and metabonomics provide an insight into the metabolic status of a biological system by profiling small molecules. Standardization initiatives in these fields are relatively recent by comparison with those of transcriptomics and proteomics. Several attempts have been made to conceptualize the output and metadata of metabolomics experiments, including stand-alone models and formats such as ArMet35 (a standard for plant data) and CCPN (Collaborative Computing Project for the Nuclear-Magnetic Resonance Community)36. Despite tremendous technological progress, the main focus of the field remains the evaluation of the performance of certain platforms, rather than high-throughput interpretation of the results. A recent publication on a standard metabolic reporting structure (SMRS)37 outlines three areas that potentially could benefit from standards: origin of sample, analytical platform and interpretation of the results. Although the guidelines will have to stay flexible for the near future, the motivation to start a discussion is high, for even if the standards cannot be developed at this stage, the metabolomics field will benefit from establishing a community.

Among emerging and less well-developed initiatives is the Open Microscopy Environment (OME)38, a framework for data sharing and data storage in the field of biological microscopy. Minimal Information About Cellular Assays (MIACA) and Minimum Information About an RNAi Experiment (MIARE) are also in this group; the former aims to delineate the minimal descriptors for experiments involving cellular assays, the latter aims to develop a standard to unambiguously describe RNAi experiments.

The experimental areas described above share many common steps; for example, sample processing can be described similarly for transcriptomics, proteomics and metabonomics experiments. There have been several attempts to integrate these domains, such as FGE-OM39 and SysBio-OM40. Although both were derived from MAGE-OM, the two approaches differ — SysBio-OM is 'flat', with all model classes introduced at the same level of abstraction, whereas FGE-OM hierarchically separates domain-independent constructs from domain-specific structures. The Functional Genomics Experiment Model (FuGE) is an ongoing project to develop a general model for representing functional genomics investigations that can be extended for specific domains. This effort will provide a ready-made framework for domain-specific modelling, guiding and simplifying the development of more detailed models. As in the microarray domain (MAGE-OM versus the MGED Ontology), a stable representation of the domain structure (the object model, FuGE) and evolving vocabularies (the ontology, FuGO) are being developed as separate projects. These initiatives are important because they help to prevent the development of competing or incompatible standards that describe the same domain. The MGED working group Reporting Structure of Biological Experimentation (RSBE) is working to facilitate interactions between different communities and to collect use cases.

It is worth pointing out that the separation between standards that describe biology and standards that describe the experimental evidence is not always clear cut. For instance, GO includes the concept of the 'evidence code' for capturing information on how the knowledge of a particular biological fact was obtained, whereas microarray experiments can provide information about the expression levels of particular genes in particular biological conditions. Some standards, such as PSI-MI, which describes protein interactions, can be placed equally well in both categories — the biological facts and the experimental evidence. Note that whereas one of the main difficulties in developing standards for describing biology is the potentially infinite richness of the underlying subject and our limited and changing understanding of it, for technology-related standards an important factor is the limited lifespan of each particular technology.

Secrets of successful standards

GO is an example of a successful standard that is widely accepted and used, and is helping to advance our knowledge of biology. GO can be used not only to query databases for genes of a particular function, it can also provide an 'executive summary' for the characterization of groups of genes found by some independent means, such as gene expression data clustering. GO is even used for the computational analysis of large groups of genes or genomes41. But not all standards succeed. For instance, the attempts to introduce the Common Object Request Broker Architecture (CORBA) in bioinformatics have largely failed; more lightweight approaches to software integration, such as those based on XML, have prevailed instead. So why has GO succeeded where CORBA has not?
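
The 'executive summary' use is easy to sketch. The Python snippet below (with invented gene and term names) simply counts which terms annotate the most genes in a cluster; real tools additionally walk the GO DAG and attach a statistical test, such as a hypergeometric P value, to each term.

    from collections import Counter

    # A toy annotation table: gene -> set of terms (all names invented).
    ANNOTATION = {
        "geneA": {"kinase activity"},
        "geneB": {"kinase activity", "membrane"},
        "geneC": {"membrane"},
        "geneD": {"transport"},
    }

    def term_summary(cluster, annotation):
        """Rank terms by how many genes in the cluster carry them."""
        counts = Counter(t for g in cluster for t in annotation.get(g, ()))
        return counts.most_common()

    print(term_summary({"geneA", "geneB"}, ANNOTATION))
    # [('kinase activity', 2), ('membrane', 1)]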

Standards are only a means. Despite the growing standardization 'industry', standards will always remain only a means for advancing our knowledge of biology. Standardization itself is not a scientific discipline, at least not a biologically interesting one. Introducing standards should not be a goal in itself, but should help biologists to solve problems.

Well-defined scope and goals. It is important that the goal of a standardization effort is well defined and limited. Instead of trying to standardize everything, it is better to rely on other related standards and be consistent with them (GO uses other ontologies, such as ontologies for cell types and chemical compounds). In fact, it is detrimental to the community not to use relevant standards developed by others. One should ask oneself: suppose my standard is developed and adopted by the community; how exactly will it help to advance our knowledge of biology?

Community involvement: bottom-up rather than top-down approach. Attempts to impose standards unilaterally usually fail; the need for a standard should come from the community. Developing and using a standard is an investment that often will not pay off immediately, so there is a much better chance of success if the user community decides that the standard is needed. There is a notable difference between, for instance, the success story of GO, which involved the wide community right from the beginning, and the failure of CORBA, which largely ignored the established practices already in use. It is therefore worrying that calls have recently been made42 for the introduction of standards in systems biology in a top-down manner (see also Ref. 43 for the response).

Cost effectiveness. Developing and deploying a standard creates an overhead, which can be expensive. Standards help only if a particular type of information has to be exchanged often enough to repay the costs of developing, implementing and using the standard during its lifespan. Standards designed to describe biological knowledge have to evolve and change as our knowledge of biology grows, and standards related to a particular technology have a lifespan no longer than that of the technology itself. This means that there is only a limited period of time in which the overheads can be paid off. In some cases it might be cheaper to deal with data communication problems on a case-by-case basis rather than to develop a standard. High complexity and a lack of cost effectiveness were arguably among the main reasons that the attempts to introduce CORBA as a standard in bioinformatics failed. Because the market in bioinformatics is considerably smaller than in the banking or telecommunications domains, the approaches that work there might not necessarily work in bioinformatics.

Lack of ambiguity. Standards that allow different ways of encoding the same semantic information are difficult to use. MAGE initially suffered from this problem, which was later addressed by recommendations on how to encode MIAME in MAGE (see above); ideally this should be defined by the model itself. It is even more important to realize that there is a considerable difference between building a 'perfect ontology' for knowledge representation and building a practical standard that can be taken up by the entire community as a means of information exchange. If an ontology is complex, it is unlikely that the wider community will use it consistently, if at all. Even in GO, where a relatively limited community does the actual annotation, consistency of use is an issue44. For ontologies such as MO, which is used by experimentalists who are not trained in its use to describe and submit experiment descriptions to public databases, this problem is even more acute.

Simplicity. Simplicity relates to all the issues noted above and, in particular, to cost effectiveness and lack of ambiguity. But the need for simplicity goes further. An information communication standard can apply to communicating information between humans and computers (for example, GO), or between computers (for example, FASTA). But in the end, to advance our knowledge, information must be communicated between humans, a computer being only a tool. This consideration dictates that a useful standard must not introduce a higher granularity (and therefore complexity) than a human can use. For instance, does it help to record every step of a laboratory protocol as a separate action, if this protocol is recorded for a biologist to read rather than for a robot to execute? (The latter is possible and could be useful, but in this case a programming language not an information communication standard is used.) This, however, does not necessarily mean that only the minimum granularity of free text can be useful. For instance, although a simple human voice recording on a magnetic tape might be sufficient to communicate information from person to person, a finer granularity might allow for automatic translation of the recording from one language to another, by parsing the speech into an appropriate model.

To demonstrate the importance of these points, let us consider for a moment how these principles worked in a totally different domain. In the 1970s two new high-level programming languages were created — Ada and C. Although Ada was arguably better designed and was supported by the US Department of Defense, C gained much wider popularity and is still used today, along with C++ and other C derivatives. Ada was developed with the aim of providing a single universal programming language, whereas C was built to make writing programs easier and evolved naturally (standards are only a means). Ada was meant to satisfy all US Department of Defense needs; C's designers saw the language as a tool for UNIX programming (well-defined scope and goals). Ada was the result of a contract for programming-language development, whereas C evolved initially as a two-man effort, with standardization and committees coming later (community involvement: bottom-up rather than top-down approach), which also made it cost effective (cost effectiveness). Finally, Ada supported parallel programming, generics and so on, whereas one of the main C design criteria was that a simple, single-pass compiler should be possible (simplicity). This example is relevant for standards in systems biology; after all, standardization is nothing more than mutual agreement about a model, be it a programming model or a data representation model.

The future

The role of ontologies and their use is likely to grow in the future, for instance in the context of the Semantic Web. Research on the Semantic Web concentrates on designing languages, methods and tools that would enable the building of distributed knowledge resources, similar to the World Wide Web but precise enough for some level of automated reasoning. The life sciences are one of the areas that would benefit from the Semantic Web; to facilitate computer communication and reasoning, humans have to design and adopt languages for knowledge exchange (for example, see the W3C Semantic Web Health Care and Life Sciences Interest Group web site). Although there are no practical demonstrations of the usefulness of the Semantic Web for the life sciences so far, this is an active area of research (see the summary report).

GO has already been mentioned as an example of a standards success story. MIAME is another. It has helped software developers to design databases and laboratory information management systems for managing microarray data26,45, which is reflected in the growing number of citations of the original MIAME paper (Table 3). The adoption of MIAME by scientific journals has made the microarray gene expression data that support publications accessible to the community in a usable format and in the necessary detail. The impact of MIAME can be seen by comparing the availability of data from gene expression and array-CGH (comparative genomic hybridization) experiments. In array-CGH experiments, where MIAME has not been widely adopted by journals, important data sets supporting publications are sometimes unavailable (B. Ylstra, personal communication).

Table 3 Standardization activities in systems biology

The data exchange formats PSI-MI and MAGE-ML have helped to bring many of the high-throughput data sets into the public domain. Nevertheless, from a bench biologist's point of view the benefits of adopting standards are not yet overwhelming. Most standardization initiatives are still mainly an investment for biologists. So how many of the growing number of standardization initiatives will pay off?

We think that the answer is related to the Law of Standards — “Whenever a major organization develops a new system as an official standard for X, the primary result is the widespread adoption of some simpler system as a de facto standard for X” — formulated by the well-known computer scientist John F. Sowa. Many of the current standardization initiatives are useful in that they help biologists and bioinformaticians to understand particular types of data and domains of systems biology better, and create a common language with which biologists and information technologists can communicate10. It is possible that some of the more complex standards will never develop into data exchange formats that are accepted and used by everybody. Nevertheless, they will create a better understanding of the domain, which is needed to develop the next generation of standards, if and when they are required. For example, a draft of a simpler standard for microarray data, MAGE-TAB, has recently been presented (see the Microarray and Gene Expression web page); its development was based on the experience with the first-generation standard MAGE. Similarly, XML had a predecessor, SGML, which was too complex to become widely adopted (although it is still used in particular applications).
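
Part of the appeal of spreadsheet-style formats such as MAGE-TAB is how little machinery they demand. The sketch below reads a MAGE-TAB-like sample table (the column names are invented for illustration) using nothing beyond Python's standard library, whereas consuming a full XML-based format typically requires dedicated middleware.

    import csv, io

    # A MAGE-TAB-like tab-delimited sample table; column names are invented.
    TABLE = "Sample Name\tOrganism\tTreatment\nS1\tHomo sapiens\theat_shock\n"

    rows = list(csv.DictReader(io.StringIO(TABLE), delimiter="\t"))
    print(rows[0]["Organism"])  # Homo sapiens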

Most of the successful standards development initiatives have started as grass-roots movements without any dedicated funding. This is certainly true of GO and MGED. Only after GO became successful was dedicated funding provided to the GO Consortium, and some funding has recently been awarded to some of the MGED members for developing the next-generation microarray standards, including MAGE-TAB. Nevertheless, there are no established mechanisms for funding standards development, and funding agencies are reluctant to provide funding for such 'unglamorous, blue-collar' work. Creating a mechanism for funding standards development does not contradict the need to develop standards in a bottom-up fashion — the funding should be awarded to multi-institutional groups that have proved their ability to work together successfully.

Although standardization is not a goal in itself, its importance is growing in the high-throughput era, much as it did in manufacturing during industrialization. Data from high-throughput technologies are being generated at a rate that makes managing and using these data sets impossible on a case-by-case basis. Although some of the data generated by the newest technologies might have too low a signal-to-noise ratio to be re-usable, data quality is improving as the technologies mature, and it is a waste of resources not to share and re-use these expensive data sets. This is only possible, however, if the instrumentation that generates the data, laboratory information management systems and databases, data analysis tools, and systems modelling software can talk to each other easily. This is the purpose of standardization.

One of the main goals of systems biology research is ultimately to improve health care. Health-care standards, as facilitated, for example, by Health Level Seven (HL7), an organization accredited by the American National Standards Institute (ANSI), are beyond the scope of this review; for more information the interested reader should consult Ref. 46. Interactions between genomics and bioinformatics on the one hand, and medical informatics on the other, are on the increase: there are large projects to investigate the effects of specific molecules on a patient's health (for example, the Molecular Phenotyping to Accelerate Genomic Epidemiology project, MolPAGE), organizations that coordinate such projects (for example, the UK National Cancer Research Institute), and the US Food and Drug Administration is exploring voluntary genomics data submissions (with MIAME noted among the possibilities). From the data standards perspective, integrating patient data with bioinformatics data will be the next big challenge.

Conclusions

Several standards are already contributing to the advancement of knowledge in biology, although most standardization initiatives are still at the investment stage for biologists. In the life sciences, standards development is typically carried out by grass-roots movements, and it is difficult to persuade funding agencies to fund such activities. Although it might be faster for a single organization to develop its own standards, a bottom-up community consensus approach is the key to the long-term acceptance and usefulness of standards. To avoid a proliferation of standards, common features of existing standards should be re-used wherever possible. Simplicity, but not oversimplification, is the key to success.