High frequency of shared clonotypes in human B cell receptor repertoires

Soto, Cinque; Bombardi, Robin G.; Branchizio, Andre; Kose, Nurgun; Matta, Pranathi; Sevy, Alexander M.; Sinkovits, Robert S.; Gilchuk, Pavlo; Finn, Jessica A.; Crowe, James E.

doi:10.1038/s41586-019-0934-8

Letter
Published: 13 February 2019

Nature volume 566, pages 398–402 (2019)

Abstract

The human genome contains approximately 20 thousand protein-coding genes¹, but the size of the collection of antigen receptors of the adaptive immune system that is generated by the recombination of gene segments with non-templated junctional additions (on B cells) is unknown—although it is certainly orders of magnitude larger. It has not been established whether individuals possess unique (or private) repertoires or substantial components of shared (or public) repertoires. Here we sequence recombined and expressed B cell receptor genes in several individuals to determine the size of their B cell receptor repertoires, and the extent to which these are shared between individuals. Our experiments revealed that the circulating repertoire of each individual contained between 9 and 17 million B cell clonotypes. The three individuals that we studied shared many clonotypes, including between 1 and 6% of B cell heavy-chain clonotypes shared between two subjects (0.3% of clonotypes shared by all three) and 20 to 34% of λ or κ light chains shared between two subjects (16 or 22% of λ or κ light chains, respectively, were shared by all three). Some of the B cell clonotypes had thousands of clones, or somatic variants, within the clonotype lineage. Although some of these shared lineages might be driven by exposure to common antigens, previous exposure to foreign antigens was not the only force that shaped the shared repertoires, as we also identified shared clonotypes in umbilical cord blood samples and all adult repertoires. The unexpectedly high prevalence of shared clonotypes in B cell repertoires, and identification of the sequences of these shared clonotypes, should enable better understanding of the role of B cell immune repertoires in health and disease.

**Fig. 1: Estimates of the diversity of V3J clonotypes from three healthy adult subjects.**

**Fig. 2: Shared clonotypes between three healthy adult subjects.**

**Fig. 3: Occurrence of public V3J clonotypes that are shared in adult and cord blood repertoires.**

Data availability

Sequencing data for HIP and CORD datasets have been deposited in the NCBI Sequence Read Archive under project number PRJNA511481. FASTA files for Adaptive Biotechnologies datasets used for analyses are available from https://github.com/crowelab/PyIR. Any other relevant data are available from the corresponding author upon reasonable request.

References

1. Ezkurdia, I. et al. Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes. Hum. Mol. Genet. 23, 5866–5878 (2014).
2. Zalocusky, K. A. et al. The 10,000 immunomes project: building a resource for human immunology. Cell Rep. 25, 513–522 (2018).
3. Ye, J., Ma, N., Madden, T. L. & Ostell, J. M. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 41, W34–W40 (2013).
4. Hsieh, T. C., Ma, K. H. & Chao, A. iNEXT: an R package for rarefaction and extrapolation of species diversity (Hill numbers). Methods Ecol. Evol. 7, 1451–1456 (2016).
5. Kaplinsky, J. & Arnaout, R. Robust estimates of overall immune-repertoire diversity from high-throughput measurements on samples. Nat. Commun. 7, 11881 (2016).
6. Trepel, F. Number and distribution of lymphocytes in man. A critical analysis. Klin. Wochenschr. 52, 511–515 (1974).
7. DeWitt, W. S. et al. A public database of memory and naive B-cell receptor sequences. PLoS ONE 11, e0160853 (2016).
8. Arnaout, R. et al. High-resolution description of antibody heavy-chain repertoires in humans. PLoS ONE 6, e22365 (2011).
9. Boyd, S. D. et al. Measurement and clinical monitoring of human lymphocyte clonality by massively parallel V-D-J pyrosequencing. Sci. Transl. Med. 1, 12ra23 (2009).
10. Correia, B. E. et al. Proof of principle for epitope-focused vaccine design. Nature 507, 201–206 (2014).
11. Jardine, J. G. et al. HIV-1 broadly neutralizing antibody precursor B cells revealed by germline-targeting immunogen. Science 351, 1458–1463 (2016).
12. Briney, B. et al. Tailored immunogens direct affinity maturation toward HIV neutralizing antibodies. Cell 166, 1459–1470 (2016).
13. Crowe, J. E. Jr. Principles of broad and potent antiviral human antibodies: insights for vaccine design. Cell Host Microbe 22, 193–206 (2017).
14. Krause, J. C. et al. Epitope-specific human influenza antibody repertoires diversify by B cell intraclonal sequence divergence and interclonal convergence. J. Immunol. 187, 3704–3711 (2011).
15. Xu, R. et al. A recurring motif for antibody recognition of the receptor-binding site of influenza hemagglutinin. Nat. Struct. Mol. Biol. 20, 363–370 (2013).
16. de Bourcy, C. F. A., Dekker, C. L., Davis, M. M., Nicolls, M. R. & Quake, S. R. Dynamics of the human antibody repertoire after B cell depletion in systemic sclerosis. Sci. Immunol. 2, eaan8289 (2017).
17. Pederson, T. The immunome. Mol. Immunol. 36, 1127–1128 (1999).
18. Briney, B. S., Willis, J. R., Finn, J. A., McKinney, B. A. & Crowe, J. E. Jr. Tissue-specific expressed antibody variable gene repertoires. PLoS ONE 9, e100839 (2014).
19. DeKosky, B. J. et al. High-throughput sequencing of the paired human immunoglobulin heavy and light chain repertoire. Nat. Biotechnol. 31, 166–169 (2013).
20. DeKosky, B. J. et al. In-depth determination and analysis of the human paired heavy- and light-chain antibody repertoire. Nat. Med. 21, 86–91 (2015).
21. Crooks, G. E., Hon, G., Chandonia, J. M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004).
22. Diss, T. C., Liu, H. X., Du, M. Q. & Isaacson, P. G. Improvements to B cell clonality analysis using PCR amplification of immunoglobulin light chain genes. Mol. Pathol. 55, 98–101 (2002).
23. Smith, K. et al. Rapid generation of fully human monoclonal antibodies specific to a vaccinating antigen. Nat. Protoc. 4, 372–384 (2009).
24. van Dongen, J. J. et al. Design and standardization of PCR primers and protocols for detection of clonal immunoglobulin and T-cell receptor gene recombinations in suspect lymphoproliferations: report of the BIOMED-2 Concerted Action BMH4-CT98-3936. Leukemia 17, 2257–2317 (2003).
25. Khan, T. A. et al. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting. Sci. Adv. 2, e1501371 (2016).
26. Andrews, S. FastQC: a quality control tool for high throughput sequence data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
27. Edgar, R. C. & Flyvbjerg, H. Error filtering, pair assembly and error correction for next-generation sequencing reads. Bioinformatics 31, 3476–3482 (2015).
28. Roehr, J. T., Dieterich, C. & Reinert, K. Flexbar 3.0 – SIMD and multicore parallelization. Bioinformatics 33, 2941–2942 (2017).
29. Rognes, T., Flouri, T., Nichols, B., Quince, C. & Mahé, F. VSEARCH: a versatile open source tool for metagenomics. PeerJ 4, e2584 (2016).

Acknowledgements

We thank M. Mayo and A. Pruijssers for regulatory and human subjects support; G. Sapparapu and O. Koues for technical help; Y. Umareddy for assistance with R; S. B. Day for assistance with artwork; scientists at the VANTAGE core of Vanderbilt University Medical Center (VUMC), Adaptive Biotechnologies, the Genomic Services Laboratory at the Hudson Alpha Institute for Biotechnology, and D. Zhang and team at Abhelix; New England BioLabs for early access to pre-release Abseq reagents; K. Trochez and J. Janssen of the Clinical Trials Center at VUMC and staff and physicians of the Vanderbilt University Medical Center leukapheresis clinic for assistance with large-scale human cell collections; and S. Mallal and M. Pilkinton (Vanderbilt), R. Scheuermann (JCVI), and W. Koff, T. Schenkelberg and the Advisory Board of the Human Vaccines Project for helpful discussions. This work was conducted in part using the resources of the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University and the San Diego Supercomputer Center at the University of California, San Diego. We acknowledge the use of cord blood cells procured by the National Disease Research Interchange (NDRI) with support from NIH grant U42 OD11158. This work was supported by a grant from the Human Vaccines Project, and institutional funding from Vanderbilt University Medical Center.

Reviewer information

Nature thanks R. Arnaout, F. Breden, A. McHardy and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

These authors contributed equally: Cinque Soto, Robin G. Bombardi

Authors and Affiliations

The Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, TN, USA
Cinque Soto, Robin G. Bombardi, Andre Branchizio, Nurgun Kose, Pranathi Matta, Pavlo Gilchuk & James E. Crowe Jr
Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
Cinque Soto & James E. Crowe Jr
Chemical and Physical Biology Program, Vanderbilt University, Nashville, TN, USA
Alexander M. Sevy & Jessica A. Finn
San Diego Supercomputer Center, University of California, San Diego, San Diego, CA, USA
Robert S. Sinkovits
Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
James E. Crowe Jr

Contributions

R.G.B., C.S. and J.E.C. planned the studies. C.S., R.G.B., A.B., R.S.S., N.K., P.M., P.G., J.A.F. and A.M.S. conducted experiments. R.G.B., C.S., A.B., R.S.S., A.M.S. and J.E.C. interpreted the studies. C.S., R.G.B. and J.E.C. wrote the first draft of the paper. All authors reviewed, edited and approved the paper. J.E.C. obtained funding.

Corresponding author

Correspondence to James E. Crowe Jr.

Ethics declarations

Competing interests

J.E.C. has served as a consultant for Sanofi and Pfizer, is on the Scientific Advisory Boards of CompuVax and Meissa Vaccines, is a recipient of research grants from Takeda, Sanofi and Moderna, and is founder of IDBiologics. All other authors declare no conflicts of interest.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Repertoire properties for immunoglobulin V3J clonotype data belonging to HIP1–HIP3.

a, Normalized frequency histogram of HCDR3 sequence lengths belonging to immunoglobulin heavy-chain V3J clonotypes for HIP1 (left, n = 8,623,076 unique HCDR3s, with a median length of 16 amino acids), HIP2 (middle, n = 15,413,214 unique HCDR3s, with a median length of 16 amino acids) and HIP3 (right, n = 7,081,314 unique HCDR3s, with a median length of 15 amino acids). b, Normalized frequency histogram of germline divergence values for HIP1 (left), HIP2 (middle) and HIP3 (right). Germline divergence was defined as 100 per cent minus the per cent nucleotide identity that a read had with its closest matching germline variable (V) gene sequence. Median per cent germline divergence values for HIP1, HIP2 and HIP3 were 3, 0 and 2, respectively. c, Normalized frequency histogram of germline divergence values by isotype for HIP1 (left), HIP2 (middle) and HIP3 (right). The median germline divergence was 0 for all IgM datasets. All isotype data were obtained from the AbHelix sequencing method. d, Heat map representation of unique V_H + J_H recombinations in HIP1, HIP2 and HIP3. The data from each set were transformed to obtain z-scores, using the mean and s.d. In this figure, the IGH prefix is omitted from the gene symbols for V and J genes.
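
The germline divergence definition given for panel b can be illustrated with a short Python sketch. This is not the authors' pipeline: the function name `germline_divergence` is hypothetical, and the sketch assumes that the read and its closest matching germline V gene have already been aligned to equal length without gaps.

```python
def germline_divergence(read: str, germline: str) -> float:
    """Return 100 minus the per cent nucleotide identity of two aligned sequences.

    Assumption: both sequences are pre-aligned, gap-free and of equal length.
    """
    if not read or len(read) != len(germline):
        raise ValueError("sequences must be non-empty and of equal length")
    # Count positions at which the read matches the germline nucleotide.
    matches = sum(a == b for a, b in zip(read.upper(), germline.upper()))
    identity = 100.0 * matches / len(read)
    return 100.0 - identity
```

Under this definition, a read identical to its germline V gene has a divergence of 0, which is the median value reported for the IgM datasets.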

Extended Data Fig. 2 Extent of sharing between immunoglobulin clonotypes belonging to HIP1–HIP3.

a, Normalized frequency histogram of HCDR3 sequence lengths belonging to V3J clonotypes from HIP1+2+3_all (blue filled curve, n = 30,156,947 unique HCDR3s, with a median length of 16 amino acids) and HIP1+2+3_shared (grey bins, n = 22,934 unique HCDR3s, with a median length of 13 amino acids). Medians were statistically different, based on a two-tailed Mann–Whitney U-test with a P < 2.2 × 10⁻¹⁶ (at an α = 0.05). b, Normalized frequency histogram of HCDR3 lengths belonging to all V3DJ clonotypes from HIP1 (n = 1,750,325 unique HCDR3s, with a median length of 19 amino acids), HIP2 (n = 3,889,527 unique HCDR3s, with a median length of 19 amino acids) and HIP3 (n = 1,437,339 unique HCDR3s, with a median length of 19 amino acids). c, Cumulative distribution of normalized VDJ triple frequencies used for simulation. HIP1, n = 4,371 unique VDJ triples; HIP2, n = 4,346 unique VDJ triples; and HIP3, n = 4,370 unique VDJ triples. d, log–log frequency plot between experimental and synthetic HCDR3 lengths. The Pearson correlation coefficient r = 1.00 with a P < 2.2 × 10⁻¹⁶ (at an α = 0.05) (n = 26 CDR3 length bins for each set). e, Normalized frequency histogram of V3DJ overlap counts between all three synthetic HIP distributions (n = 3,641 common clonotypes between sequenced repertoires). f, V3J clonotypes with the largest numbers of somatic variants. Numbers in parentheses denote counts for the number of unique somatic variants associated with a V3J clonotype for HIP1, HIP2 and HIP3. g, Percentage overlaps for the Igκ V3J clonotypes from the experimentally determined repertoires belonging to HIP1–HIP3. h, Percentage overlaps for Igλ V3J clonotypes from the experimentally determined repertoires belonging to HIP1–HIP3.
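
The normalized HCDR3 length histograms and median lengths reported in panels a and b can be sketched in Python as follows. The function name `hcdr3_length_profile` is hypothetical, and the sketch assumes one amino acid string per unique HCDR3; it does not reproduce the statistical tests used in the study.

```python
from collections import Counter
from statistics import median


def hcdr3_length_profile(hcdr3s):
    """Return (normalized length frequencies, median length) for HCDR3 strings."""
    lengths = [len(seq) for seq in hcdr3s]
    if not lengths:
        raise ValueError("at least one HCDR3 sequence is required")
    counts = Counter(lengths)
    total = len(lengths)
    # Normalize counts so that the frequencies sum to 1.
    freqs = {length: counts[length] / total for length in sorted(counts)}
    return freqs, median(lengths)
```

Comparing the profile of a shared set with that of the full set is one way to see the shift towards shorter HCDR3s described in panel a.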

Extended Data Fig. 3 Shared immunoglobulin heavy-chain clonotypes for three cord blood samples.

a, V3DJ clonotype overlaps from three cord blood samples, CORD1 (n = 40,480 unique V3DJ clonotypes), CORD2 (n = 66,718 unique V3DJ clonotypes) and CORD3 (n = 105,555 unique V3DJ clonotypes). b, Cumulative distribution of normalized VDJ triple frequencies for CORD1 (n = 2,273 unique VDJ triples), CORD2 (n = 2,788 unique VDJ triples) and CORD3 (n = 3,002 unique VDJ triples). c, log–log frequency plot between experimental and synthetic CDR3 lengths. The Pearson correlation coefficient r = 1.00 with a P < 2.2 × 10⁻¹⁶ (at an α = 0.05) (n = 21 bins for each set). It should be noted that there were no V3DJ clonotypes with HCDR3s that were less than eight amino acids in length. d, Normalized frequency histogram of V3DJ overlap counts between all three synthetic CORD distributions (n = 45 common clonotypes between all three sequenced repertoires). e, V3J clonotypes identified in HIP1, HIP2 and HIP3 (HIP1+2+3_all) were combined with an independently derived set of immunoglobulin heavy-chain V3J clonotypes for which sequences were publicly available⁷. Starting from the combined set of 59,193,994 clonotypes from six adult immunoglobulin heavy-chain repertoires, each of the three cord blood sets was scanned in a serial fashion, and only the common clonotypes were kept. A total of 130 shared V3J clonotypes was identified.
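
The serial scan described in panel e amounts to a repeated set intersection, which can be sketched in Python as follows. This is an illustration only: the function name `scan_for_shared`, the representation of a clonotype as a (V gene, CDR3 amino acid sequence, J gene) tuple, and the toy data are all assumptions and are not taken from the study.

```python
def scan_for_shared(reference, samples):
    """Keep only clonotypes from `reference` that occur in every sample.

    Each sample is scanned in turn and the running set of common
    clonotypes is narrowed after every scan.
    """
    shared = set(reference)
    for sample in samples:
        shared &= set(sample)
    return shared


# Toy clonotype keys (hypothetical values, for illustration only).
a = ("IGHV3-23", "ARDYW", "IGHJ4")
b = ("IGHV1-69", "ARGGYW", "IGHJ6")
c = ("IGHV4-34", "ARDY", "IGHJ5")

adult_combined = {a, b, c}
cord_sets = [{a, b}, {a, c}, {a}]
shared = scan_for_shared(adult_combined, cord_sets)
```

Because set intersection is commutative, the order in which the cord blood sets are scanned does not change the final result.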

Extended Data Fig. 4 Schematic showing bioinformatics sequence processing.

The flow chart shows how a typical sequencing run using paired-end reads from Illumina was processed using the bioinformatics pipeline. Detailed descriptions of each of the programs used in the pipeline can be found in the Supplementary Methods.

Extended Data Fig. 5 Schematic showing placement of primers.

Annotated example of a biological sequence obtained from the two-step barcoded library preparation protocol. The red and yellow regions show the placement of the first and second steps of PCR amplification. The cyan region shows the location of the RID-tagged reverse transcription gene-specific primer.

Extended Data Table 1 Research subject demographics

Extended Data Table 2 Summary of sequencing methods and cell counts

Extended Data Table 3 One-step RT–PCR primers used in this study

Extended Data Table 4 Two-step RT–PCR primers used in this study

Supplementary information

Supplementary Information

This file contains Supplementary Methods and References

Reporting Summary

Source data

Source Data Fig. 1

Source Data Fig. 2

Source Data Fig. 3

Source Data Extended Data Fig. 1

Source Data Extended Data Fig. 2

Source Data Extended Data Fig. 3

About this article

Cite this article

Soto, C., Bombardi, R.G., Branchizio, A. et al. High frequency of shared clonotypes in human B cell receptor repertoires. Nature 566, 398–402 (2019). https://doi.org/10.1038/s41586-019-0934-8

Received: 06 November 2017
Accepted: 14 January 2019
Published: 13 February 2019
Issue Date: 21 February 2019
DOI: https://doi.org/10.1038/s41586-019-0934-8
