Imagine a world where genomic and clinical data travel seamlessly between repositories at different institutions around the world; where harmonized, standardized data formats and consent processes enable pooling of sequence data; and where standardized guidelines exist for informing patients and their families of the pathogenic significance of variations in their genome sequences. Making such a world a reality is the goal of the Global Alliance, an international initiative that aims to create universal interoperability standards and guidelines for genomic and medical data.

In recent weeks, the alliance (http://www.broadinstitute.org/news/globalalliance) published a white paper outlining its draft mission, goals and core principles. This was accompanied by a letter of intent with >70 signatories from medical, research and advocacy organizations in 41 countries, including such funders as the US National Institutes of Health, the UK's Wellcome Trust and Genome Canada. Since that time, another 20 organizations have signed the letter and the alliance has received numerous other expressions of interest.

The initiative is important because it recognizes a major impediment to human genomics research—the sequestration of data within individual institutions and the difficulty of sharing data securely in the public domain. Both databases and the literature remain replete with descriptions of sequence variants that are ill-defined or inappropriate, despite the existence of standardized nomenclatures. These issues, and the lack of international data formatting and exchange standards, not only make sharing and pooling of data problematic, but also impede progress in interpreting what human genetic variation means.

The ability to share data is important because it is becoming clear that sequence data from tens of thousands of people will be required to unravel the clinical significance of common genetic variation; indeed, millions of study participants will likely be required to identify interactions between genes and lifestyle risk factors and to tease apart additive and epistatic interactions. One need only look at the recent COGS (Collaborative Oncological Gene-environment Study) collaboration (http://www.nature.com/icogs/), which gathered sequence data from over 200,000 people to identify 74 new susceptibility loci for breast, ovarian and prostate cancer.

Currently, COGS—and studies like it—take years to complete because no one institution has the resources to gather data on this scale, and pooling nonstandardized genetic data from different institutions is long and grueling work. These difficulties also impair progress in rare disease research, which could benefit from access to sequence data worldwide.

The problem of data silos, if left unattended, is only going to get worse. More and more sequencing is taking place in academic medical centers, blurring the line between the research setting and the clinic. Clinical work requires much more stringent quality criteria and attention to ethical norms (confidentiality/anonymity of patient information, consent procedures and patient consultation as to the clinical significance of findings). As a result, much of the data obtained from clinical work in the past has been off-limits to research. But this doesn't make much sense going forward, as potentially tens, perhaps hundreds, of thousands of exomes are sequenced in the clinic. The Global Alliance emphasizes the importance of autonomy: patients should decide whether and how their data are shared. Even today, diagnostic laboratories have more clinical information on sequence variants than is present in the literature; the question remains how to incentivize them to spend the time and effort needed to post variant data in open repositories like ClinVar.

Finally, and perhaps most importantly, the forces of commercial balkanization and annexation provide even greater urgency to efforts to encourage genome data exchange in the public domain. The past year has witnessed extensive consolidation in the sequencing and molecular diagnostics sectors, with a handful of companies now dominating the market. In 2012, Life Technologies acquired direct-to-consumer (DTC) genomics provider Navigenics for its clinical testing services and 23andMe consolidated its stranglehold on the DTC market through lowball pricing. Earlier this year, Illumina bought the diagnostic startup Verinata and has continued to undercut competitors through its Illumina Genome Network for clinical sequencing. At the same time, there has been a boom in startups, such as GenomeQuest, Knome, Omicia, Personalis, Real Time Genomics, Strand Life Sciences, SV Bio and SimulConsult, that offer software or services for clinical interpretation of genetic variation.

There's nothing wrong with this in and of itself, of course. But many of these businesses are currently being built around in-house databases that sequester data surrounding genetic variants and claim it as a trade secret.

One need look no further than Myriad Genetics. Since the company stopped sharing its BRCA1/BRCA2 variant information with the Breast Cancer Information Core database in 2006, it has been building a secret database that contains information on >14,300 variants of the two genes. And despite the recent establishment of the open Sharing Clinical Reports Project for BRCA1/BRCA2 data (see p. 713), many believe it will take years to accumulate sufficient data in the public domain to rival the power of Myriad's repository.

Another salutary lesson comes from electronic health records (EHRs). Spurred by the US Health Information Technology for Economic and Clinical Health Act, the adoption of EHRs has increased from 10% to 40% in the past four years. The problem, though, is that most commercial EHR software cannot interface with that of competitors. Why? Because interoperability standards simply came too late to the conversation.

For all these reasons, it is imperative that the global research and clinical communities embrace and participate in the Global Alliance, engage in its working groups and find a means to fund it sustainably. The effort is necessary, and the time to act is now. Anything less, and human genomics research and clinical translation may be held back for years, even decades. Let's make sure it is not only industry that reaps the profits from interpreting our genomic heritage.