Broad immune activation underlies shared set point signatures for vaccine responsiveness in healthy individuals and disease activity in patients with lupus

Kotliarov, Yuri; Sparks, Rachel; Martins, Andrew J.; Mulè, Matthew P.; Lu, Yong; Goswami, Meghali; Kardava, Lela; Banchereau, Romain; Pascual, Virginia; Biancotto, Angélique; Chen, Jinguo; Schwartzberg, Pamela L.; Bansal, Neha; Liu, Candace C.; Cheung, Foo; Moir, Susan; Tsang, John S.

doi:10.1038/s41591-020-0769-8

Article
Published: 24 February 2020

Broad immune activation underlies shared set point signatures for vaccine responsiveness in healthy individuals and disease activity in patients with lupus

Yuri Kotliarov¹^na1^nAff7,
Rachel Sparks²^na1,
Andrew J. Martins²,
Matthew P. Mulè^2,3,
Yong Lu²,
Meghali Goswami²^nAff8,
Lela Kardava⁴,
Romain Banchereau⁵^nAff9,
Virginia Pascual⁵^nAff10,
Angélique Biancotto¹^nAff11,
Jinguo Chen¹,
Pamela L. Schwartzberg^1,6,
Neha Bansal²,
Candace C. Liu²,
Foo Cheung¹,
Susan Moir⁴ &
…
John S. Tsang ORCID: orcid.org/0000-0003-3186-3047^1,2

Nature Medicine volume 26, pages 618–629 (2020)Cite this article

18k Accesses
97 Citations
53 Altmetric
Metrics details

Subjects

Abstract

Responses to vaccination and to diseases vary widely across individuals, which may be partly due to baseline immune variations. Identifying such baseline predictors of immune responses and their biological basis is of broad interest, given their potential importance for cancer immunotherapy, disease outcomes, vaccination and infection responses. Here we uncover baseline blood transcriptional signatures predictive of antibody responses to both influenza and yellow fever vaccinations in healthy subjects. These same signatures evaluated at clinical quiescence are correlated with disease activity in patients with systemic lupus erythematosus with plasmablast-associated flares. CITE-seq profiling of 82 surface proteins and transcriptomes of 53,201 single cells from healthy high and low influenza vaccination responders revealed that our signatures reflect the extent of activation in a plasmacytoid dendritic cell–type I IFN–T/B lymphocyte network. Our findings raise the prospect that modulating such immune baseline states may improve vaccine responsiveness and mitigate undesirable autoimmune disease activity.

Access through your institution

Buy or subscribe

This is a preview of subscription content, access via your institution

Access options

Access through your institution

Buy this article

Purchase on Springer Link
Instant access to full article PDF

Buy now

Prices may be subject to local taxes which are calculated during checkout

**Fig. 1: Study questions and the derivation of a baseline, pre-vaccination signature predictive of response using an influenza vaccination cohort.**

**Fig. 2: Assessing TGSig in independent influenza vaccination, yellow fever vaccination and SLE datasets.**

**Fig. 3: A transcriptional correlate of plasmablast-associated disease activity in SLE is also associated with influenza vaccination responses and functionally related to TGSig.**

**Fig. 4: CITE-seq (simultaneous protein and transcriptome expression profiling in single cells) analysis of high and low influenza vaccine responders.**

**Fig. 5: Dissecting the cellular origin of baseline signatures.**

**Fig. 6: The correlative relationship among the single-cell-cluster-based and bulk (PBMC) signatures and a proposed cellular circuit whose pre-perturbation status determines future responsiveness.**

An autoantibody signature predictive for multiple sclerosis

Article 19 April 2024

Bispecific T cell engager therapy for refractory rheumatoid arthritis

Article 26 April 2024

FOXO1 is a master regulator of memory programming in CAR T cells

Article Open access 10 April 2024

Data availability

All data used in this study, including CITE-seq and flow data, are available at figshare (https://doi.org/10.35092/yhjc.c.4753772). The original/raw public gene expression data are available in the National Center for Biotechnology Information Gene Expression Omnibus (GEO) under accession numbers: GSE47353, GSE41080, GSE59654, GSE59743, GSE29619, GSE74817, GSE13486 and GSE65391 (see also Supplementary Table 2).

Code availability

The source code and software pipeline to reproduce our analyses can be assessed at https://github.com/kotliary/baseline.

References

Kotas, M. E. & Medzhitov, R. Homeostasis, inflammation, and disease susceptibility. Cell 160, 816–827 (2015).
CAS PubMed PubMed Central Google Scholar
Chen, D. S. & Mellman, I. Elements of cancer immunity and the cancer-immune set point. Nature 541, 321–330 (2017).
CAS PubMed Google Scholar
Tsang, J. S. Utilizing population variation, vaccination, and systems biology to study human immunology. Trends Immunol. 36, 479–493 (2015).
CAS PubMed PubMed Central Google Scholar
Willis, J. C. D. & Lord, G. M. Immune biomarkers: the promises and pitfalls of personalized medicine. Nat. Rev. Immunol. 15, 323 (2015).
CAS PubMed Google Scholar
Brodin, P. & Davis, M. M. Human immune system variation. Nat. Rev. Immunol. 17, 21–29 (2017).
CAS PubMed Google Scholar
Hagan, T. et al. Antibiotics-driven gut microbiome perturbation alters immunity to vaccines in humans. Cell 178, 1313–1328.e13 (2019).
CAS PubMed PubMed Central Google Scholar
Brodin, P. et al. Variation in the human immune system is largely driven by non-heritable influences. Cell 160, 37–47 (2015).
CAS PubMed PubMed Central Google Scholar
Marchant, A. et al. Predominant influence of environmental determinants on the persistence and avidity maturation of antibody responses to vaccines in infants. J. Infect. Dis. 193, 1598–1605 (2006).
PubMed Google Scholar
Krieg, C. et al. High-dimensional single-cell analysis predicts response to anti-PD-1 immunotherapy. Nat. Med. 24, 144 (2018).
CAS PubMed Google Scholar
HIPC-CHI Signatures Project Team & HIPC-I Consortium. Multicohort analysis reveals baseline transcriptional predictors of influenza vaccination responses. Sci. Immunol. 2, eaal4656 (2017).
Fourati, S. et al. Pre-vaccination inflammation and B-cell signalling predict age-related hyporesponse to hepatitis B vaccination. Nat. Commun. 7, 10369 (2016).
CAS PubMed PubMed Central Google Scholar
Tsang, J. S. et al. Global analyses of human immune variation reveal baseline predictors of postvaccination responses. Cell 157, 499–513 (2014).
CAS PubMed PubMed Central Google Scholar
Banchereau, R. et al. Personalized immunomonitoring uncovers molecular networks that stratify lupus patients. Cell 165, 551–565 (2016).
CAS PubMed PubMed Central Google Scholar
Tsokos, G. C., Lo, M. S., Reis, P. C. & Sullivan, K. E. New insights into the immunopathogenesis of systemic lupus erythematosus. Nat. Rev. Rheumatol. 12, 716–730 (2016).
CAS PubMed Google Scholar
Lee, J. C. & Smith, K. G. C. Prognosis in autoimmune and infectious disease: new insights from genetics. Clin. Transl. Immunol. 3, e15 (2014).
Google Scholar
Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865–868 (2017).
CAS PubMed PubMed Central Google Scholar
Lu, Y. et al. Systematic analysis of cell-to-cell expression variation of T lymphocytes in a human cohort identifies aging and genetic associations. Immunity 45, 1162–1175 (2016).
CAS PubMed PubMed Central Google Scholar
Nakaya, H. I. et al. Systems analysis of immunity to influenza vaccination across multiple years and in diverse populations reveals shared molecular signatures. Immunity 43, 1186–1198 (2015).
CAS PubMed PubMed Central Google Scholar
Querec, T. D. et al. Systems biology approach predicts immunogenicity of the yellow fever vaccine in humans. Nat. Immunol. 10, 116–125 (2009).
CAS PubMed Google Scholar
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinf. 9, 559 (2008).
Google Scholar
Li, S. et al. Molecular signatures of antibody responses derived from a systems biology study of five human vaccines. Nat. Immunol. 15, 195–204 (2014).
PubMed Google Scholar
Muskardin, T. L. W. & Niewold, T. B. Type I interferon in rheumatic diseases. Nat. Rev. Rheumatol. 14, 214–228 (2018).
CAS PubMed PubMed Central Google Scholar
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).
CAS PubMed PubMed Central Google Scholar
Swiecki, M. & Colonna, M. The multifaceted biology of plasmacytoid dendritic cells. Nat. Rev. Immunol. 15, 471–485 (2015).
CAS PubMed PubMed Central Google Scholar
Jego, G. et al. Plasmacytoid dendritic cells induce plasma cell differentiation through type I interferon and interleukin 6. Immunity 19, 225–234 (2003).
CAS PubMed Google Scholar
Gricks, C. S. et al. Differential regulation of gene expression following CD40 activation of leukemic compared to healthy B cells. Blood 104, 4002–4009 (2004).
CAS PubMed Google Scholar
Shimabukuro-Vornhagen, A. et al. Inhibition of protein geranylgeranylation specifically interferes with CD40-dependent B cell activation, resulting in a reduced capacity to induce T cell immunity. J. Immunol. 193, 5294–5305 (2014).
CAS PubMed Google Scholar
Elgueta, R. et al. Molecular mechanism and function of CD40/CD40L engagement in the immune system. Immunol. Rev. 229, 152–172 (2009).
CAS PubMed Google Scholar
Arriens, C., Wren, J. D., Munroe, M. E. & Mohan, C. Systemic lupus erythematosus biomarkers: the challenging quest. Rheumatology 56, i32–i45 (2017).
CAS PubMed Google Scholar
Verheul, M. K., Fearon, U., Trouw, L. A. & Veale, D. J. Biomarkers for rheumatoid and psoriatic arthritis. Clin. Immunol. 161, 2–10 (2015).
CAS PubMed Google Scholar
Banchereau, J. & Pascual, V. Type I interferon in systemic lupus erythematosus and other autoimmune diseases. Immunity 25, 383–392 (2006).
CAS PubMed Google Scholar
Quách, T. D. et al. Distinctions among circulating antibody-secreting cell populations, including B-1 cells, in human adult peripheral blood. J. Immunol. 196, 1060–1069 (2016).
PubMed Google Scholar
Andrews, S. F. et al. High preexisting serological antibody levels correlate with diversification of the influenza vaccine response. J. Virol. 89, 3308–3317 (2015).
CAS PubMed PubMed Central Google Scholar
He, X.-S. et al. Baseline levels of influenza-specific CD4 memory T-cells affect T-cell responses to influenza vaccines. PLoS One 3, e2574 (2008).
PubMed PubMed Central Google Scholar
Patin, E. et al. Natural variation in the parameters of innate immune cells is preferentially driven by genetic factors. Nat. Immunol. 19, 302–314 (2018).
CAS PubMed Google Scholar
Westra, H. J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).
CAS PubMed PubMed Central Google Scholar
Klein, S. L. & Flanagan, K. L. Sex differences in immune responses. Nat. Rev. Immunol. 16, 626–638 (2016).
CAS PubMed Google Scholar
Rees, F., Doherty, M., Grainge, M. J., Lanyon, P. & Zhang, W. The worldwide incidence and prevalence of systemic lupus erythematosus: a systematic review of epidemiological studies. Rheumatology 56, 1945–1961 (2017).
PubMed Google Scholar
Zyla, J. et al. Gene set enrichment for reproducible science: comparison of CERNO and eight other algorithms. Bioinformatics 35, 5146–5154 (2019).
PubMed PubMed Central Google Scholar
Korotkevich, G., Sukhov, V. & Sergushichev, A. Fast gene set enrichment analysis. Preprint at bioRxiv https://doi.org/10.1101/060012 (2019).
van Lochem, E. G. et al. Immunophenotypic differentiation patterns of normal hematopoiesis in human bone marrow: reference patterns for age-related changes and disease-induced shifts. Cytometry B Clin. Cytom. 60, 1–13 (2004).
PubMed Google Scholar
Finak, G. et al. ImmuneSpaceR: A Thin Wrapper around the ImmuneSpace Database. https://rdrr.io/bioc/ImmuneSpaceR/ (2014).
Bhattacharya, S. et al. ImmPort: disseminating data to the public for the future of immunology. Immunol. Res. 58, 234–239 (2014).
CAS PubMed Google Scholar
Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinf. 12, 77 (2011).
Google Scholar
Bates, D., Bolker, B. M., Maechler, M. & Walker, S. C. Fitting linear mixed-effects models using lme4. J. Stat. Softw. https://doi.org/10.18637/jss.v067.i01 (2015).
Chaussabel, D. et al. A modular analysis framework for blood genomics studies: application to systemic lupus erythematosus. Immunity 29, 150–164 (2008).
CAS PubMed PubMed Central Google Scholar
Weiner, J. tmod: Feature Set Enrichment Analysis for Metabolomics and Transcriptomics. R package version 0.40 (2018); https://CRAN.R-project.org/package=tmod
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300 (1995).
Google Scholar
Choi, J. K., Yu, U., Kim, S. & Yoo, O. J. Combining multiple microarray studies and modeling interstudy variation. Bioinformatics 19 (Suppl. 1), i84–i90 (2003).
PubMed Google Scholar
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
CAS PubMed PubMed Central Google Scholar
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
CAS PubMed Google Scholar
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
PubMed PubMed Central Google Scholar
Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag, 2016).
Torchiano, M. effsize: Efficient Effect Size Computation. R package version 0.7.6 (2019); https://CRAN.R-project.org/package=effsize
Wang, X. et al. An R package suite for microarray meta-analysis in quality control, differentially expressed gene analysis and pathway enrichment detection. Bioinformatics 28, 2534–2536 (2012).
CAS PubMed PubMed Central Google Scholar
Seshan, V. clinfun: Clinical Trial Design and Data Analysis Functions. R package version 1.0.15 (2018); https://CRAN.R-project.org/package=clinfun
Yan, Y. MLmetrics: Machine Learning Evaluation Metrics. R package version 1.1.1 (2016); https://CRAN.R-project.org/package=MLmetrics
Stoeckius, M. et al. Cell Hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics. Genome Biol. 19, 224 (2018).
CAS PubMed PubMed Central Google Scholar
Stoeckius, M. et al. CITE-seq protocols (2017); https://cite-seq.com/protocol/
Kang, H. M. et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 36, 89–94 (2018).
CAS PubMed Google Scholar
Roelli, P. et al. Hoohm/CITE-seq-Count: 1.4.2. (Zenodo, 2019); https://doi.org/10.5281/zenodo.2590196
Lun, A. T. L. & Risso, D. SingleCellExperiment: S4 Classes for Single Cell Data. R package version 1.4.1. (2019); https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html
Mule, M. P., Martins, A. J. & Tsang, J. S. Normalizing and denoising protein expression data from droplet-based single cell profiling. bioRxiv (in the press).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
PubMed PubMed Central Google Scholar
Lun, A. T., Bach, K. & Marioni, J. C. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17, 75 (2016).
PubMed Google Scholar
Waltman, L. & Eck, N. Jvan A smart local moving algorithm for large-scale modularity-based community detection. Eur. Phys. J. B 86, 471 (2013).
Google Scholar
Zappia, L. & Oshlack, A. Clustering trees: a visualization for evaluating clusterings at multiple resolutions. GigaScience 7, giy083 (2018).
PubMed Central Google Scholar
Pederson, T. L. ggraph: An Implementation of Grammar of Graphics for Graphs and Networks. R package version 1.0.2. (2018); https://CRAN.R-project.org/package=ggraph
Carlson, M. hgu95av2.db: Affymetrix Human Genome U95 Set annotation data (chip hgu95av2). R package version 3.2.3. (2016); http://bioconductor.org/packages/release/data/annotation/html/hgu95av2.db.html
Chen, J., Bardes, E. E., Aronow, B. J. & Jegga, A. G. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res. 37, W305–W311 (2009).
CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We thank members of the Tsang Lab and the NIH Center for Human Immunology (CHI) for inputs and discussions; L. Failla for help with figures; D. Hirsch for testing the R code; and R. Germain and R. Apps for comments on the manuscript. This research was supported by the Intramural Research Program of the National Institute of Allergy and Infectious Diseases (NIAID) and Intramural Programs of the NIH Institutes supporting the Center for Human Immunology. Figure 6 was created with BioRender.com.

Author information

Yuri Kotliarov
Present address: Biometric Research Program, Division of Cancer Treatment and Diagnosis, NCI, NIH, Rockville, MD, USA
Meghali Goswami
Present address: Laboratory of Myeloid Malignancies, NHLBI, NIH, Bethesda, MD, USA
Romain Banchereau
Present address: Oncology Biomarker Development, Genentech Inc., South San Francisco, CA, USA
Virginia Pascual
Present address: Drukier Institute for Children’s Health at Weill Cornell Medicine, New York, NY, USA
Angélique Biancotto
Present address: Sanofi Precision Immunology, Immunology and Inflammation Therapeutic Area, Cambridge, MA, USA
These authors contributed equally: Yuri Kotliarov, Rachel Sparks.

Authors and Affiliations

NIH Center for Human Immunology, NIAID, NIH, Bethesda, MD, USA
Yuri Kotliarov, Angélique Biancotto, Jinguo Chen, Pamela L. Schwartzberg, Foo Cheung & John S. Tsang
Multiscale Systems Biology Section, Laboratory of Immune System Biology, NIAID, NIH, Bethesda, MD, USA
Rachel Sparks, Andrew J. Martins, Matthew P. Mulè, Yong Lu, Meghali Goswami, Neha Bansal, Candace C. Liu & John S. Tsang
NIH-Oxford-Cambridge Scholars Program; Cambridge Institute for Medical Research and Department of Medicine, University of Cambridge, Cambridge, UK
Matthew P. Mulè
B Cell Immunology Unit, Laboratory of Immunoregulation, NIAID, NIH, Bethesda, MD, USA
Lela Kardava & Susan Moir
Baylor Institute for Immunology Research, Dallas, TX, USA
Romain Banchereau & Virginia Pascual
Cell Signaling and Immunity Section, Laboratory of Immune System Biology, NIAID, NIH, Bethesda, MD, USA
Pamela L. Schwartzberg

Authors

Yuri Kotliarov
View author publications
You can also search for this author in PubMed Google Scholar
Rachel Sparks
View author publications
You can also search for this author in PubMed Google Scholar
Andrew J. Martins
View author publications
You can also search for this author in PubMed Google Scholar
Matthew P. Mulè
View author publications
You can also search for this author in PubMed Google Scholar
Yong Lu
View author publications
You can also search for this author in PubMed Google Scholar
Meghali Goswami
View author publications
You can also search for this author in PubMed Google Scholar
Lela Kardava
View author publications
You can also search for this author in PubMed Google Scholar
Romain Banchereau
View author publications
You can also search for this author in PubMed Google Scholar
Virginia Pascual
View author publications
You can also search for this author in PubMed Google Scholar
Angélique Biancotto
View author publications
You can also search for this author in PubMed Google Scholar
Jinguo Chen
View author publications
You can also search for this author in PubMed Google Scholar
Pamela L. Schwartzberg
View author publications
You can also search for this author in PubMed Google Scholar
Neha Bansal
View author publications
You can also search for this author in PubMed Google Scholar
Candace C. Liu
View author publications
You can also search for this author in PubMed Google Scholar
Foo Cheung
View author publications
You can also search for this author in PubMed Google Scholar
Susan Moir
View author publications
You can also search for this author in PubMed Google Scholar
John S. Tsang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

J.S.T conceived and designed the study; R.S. and Y.K. collected and evaluated public datasets; Y.K., Y.L., A.J.M, M.P.M. and J.S.T designed data analysis strategies with inputs from R.S.; Y.K., Y.L., M.P.M. and A.J.M. performed data analysis with help from R.S., C.C.L. and F.C.; J.C. and P.L.S. contributed samples for CITE-seq; A.J.M, M.P.M. and N.B. developed protocols and generated CITE-seq data; M.G. and A.J.M. analyzed flow cytometry data prepared by A.B.; R.B. and V.P. provided SLE data; L.K. and S.M. sorted cells from healthy donors for RNA-seq analysis performed by A.J.M.; J.S.T. and R.S. wrote the manuscript with contributions from Y.K.; A.J.M., M.P.M. and Y.L. are co-second authors and contributed equally; J.S.T. supervised the study; all authors approved the manuscript.

Corresponding author

Correspondence to John S. Tsang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Saheli Sadanand was the primary editor on this article and managed its editorial process in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Identification and characterization of the CD19+CD20+CD38++B cell population, a baseline, pre-vaccination cell frequency-based signature (CBSig) of antibody responses to influenza vaccination.

a, Flow cytometric gating strategy for the CD19+CD20+CD38++ B cell population. Populations 1–4 are further described in (Fig. 1b). b, Box plots (top) showing the frequency of CD19+CD20+CD38++ cells (CBSig; y axis) at the three baseline time points from ref. ¹² (days −7 and 0 are prior to vaccination and day 70 is after vaccination) in low and high responders (x axis) to the seasonal and pandemic H1N1 influenza vaccines as defined by the Adjusted Maximum Fold Change (adjMFC) metric (see ref. ¹²). There are 11 low and 12 high responders for day −7 and 0, and 10 low and 11 high responders for day 70. P values from the Wilcoxon one-tailed test results are shown on the box plots (based on results from ref. ¹² our hypothesis was that the high responders have higher frequencies of these cells than low responders). Boxplots’ center line corresponds to the median value, lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles); lower and upper whiskers extend from the box to the smallest or largest value correspondingly, but no further than 1.5x inter-quantile range. (Bottom) Corresponding receiver operator curves (ROC) for vaccine response at each of the above baseline time points and their AUC (area under the curve) and corresponding permutation-based one-tailed p value are shown. c, Dot plots (CD38 vs. CD10 of CD19+CD20+B cells) for example high and low responders. d, Glossary of major abbreviations used in the study.

Extended Data Fig. 2 Derivation of TGSig, a transcriptional surrogate signature for CBSig.

a, Step 1: Identification of genes with high temporal stability across the three baseline time points (days −7 and 0 are prior to vaccination and day 70 is after vaccination) in the NIH influenza study¹². The middle box shows the distribution of the temporal stability metric (TSM) across all the genes. The boxes on the right and left show examples of genes with high and low temporal stability, respectively; each line corresponds to an individual. Genes with high temporal stability (≥0.75) across the three baseline time points (depicted to the right of the red dashed line) were subsequently evaluated for correlation with CD19+CD20+CD38++ B cell frequency. b, Step 2: 726 temporally stable genes were ranked by their ‘robust’ correlation with CD19+CD20+CD38++ B cell frequency. Robustness is evaluated using all 231 random samplings of 20 subjects out of the cohort of 22 subjects (that is, two random subjects were dropped out from each sampling); the mean Spearman correlation coefficient divided by the standard deviation across the samplings (x axis left panel) was used to rank the genes. Top genes are shown together with the predictive performance of each gene evaluated at day 0 (AUC; right panel). The red dashed line in the right panel corresponds to AUC=0.50 (prediction performance as expected by chance); the top 10 genes were selected in TGSig (the black dashed line; see Supplementary Table 1 for full list of ranked genes) based on (c). c, Performance (AUC; y axis) of the gene signature by baseline time point (different lines) and number of top genes included in computing the signature score (x axis). The vertical dashed line corresponds to a gene signature (TGSig) containing the top 10 genes achieving the best AUC across all three time points. d, Schema for gene signature score calculation. Gene expression data is standardized through calculation of Z-scores for each gene (that is, each gene would have mean 0 and standard deviation 1). The Z-scores for each gene in the signature are then averaged to generate the gene signature score. e, Distribution of predictive performance (AUC; x axis) of 500 random top-10 gene signatures generated from subject-label shuffled gene expression data using the robust correlation metric approach described above. The dashed red line indicates the observed AUC of TGSig at each of the baseline time points. One-tailed empirical p-values are shown. f, Relative rank (rank position divided by the total number of genes) of the top 20 genes from (b) and Supplementary Table 1 at different TSM thresholds. The black line shows TSM cutoff=0.75, the value used in selecting the top 10 genes for inclusion in TGSig (boxed). g, Change in AUC (x axis) of TGSig score following removal of the indicated gene in the signature (y axis) at each of the three baseline time points in the NIH influenza study.

Extended Data Fig. 3 Comparison among response classes by including middle responders and evaluating TGSig in the influenza datasets from Emory 18.

a, similar to Fig. 1e, f but including middle responders (for day 0 n=11/19/13 for low/middle/high responders, respectively; for day -7 n = 10/22/14 and for day 70 n = 11/21/12); p values from the Jonckheere trend test (with an a priori alternative hypothesis that the high responders >= middle responders >= low responders.) For all box plots the center line corresponds to the median value, lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles); lower and upper whiskers extend from the box to the smallest or largest value correspondingly, but no further than 1.5x inter-quantile range. b, Similar to (a) but related to Fig. 2a; note that Yale 2012 does not have middle responders based on data retrieved from ImmuneSpace (n = 9/10/10 (Stanford 2008), 7/3/6 (Yale 2010), 7/0/8 (Yale 2012)). c, Similar to Fig. 2a but testing TGSig using influenza datasets from Emory University over four years (n = 14/8 low/high responders for year 2008, 8/8 for 2009, 11/11 for 2010, 12/10 for 2011). Box plots (top) showing the TGSig score (y axis) in low and high responders (x axis) as defined by adjMFC in the indicated season. P values shown on the box plots were obtained from the Wilcoxon one-tailed test. (bottom) Corresponding receiver operator curves (ROC) for vaccine response and the AUC (area under the curve) and corresponding permutation-based one-tailed p values are shown. d, Similar to (a) but for yellow fever and related to Fig. 2b.

Extended Data Fig. 4 Further evaluation of TGSig in yellow fever and influenza datasets.

a, (top) Box plots (top) showing the TGSig score applied to pre-vaccination PBMC expression data (y axis) between low and high responders (x axis) to yellow fever vaccine in trial #2 (x axis; 4 high vs. 3 low responders). (bottom) Corresponding receiver operator curves (ROC) for vaccine response and the AUC (area under the curve) and corresponding one-tailed permutation-based p value are shown. This vaccination cohort included 10 subjects (see Fig. 2b for results on a first, larger trial with 15 subjects). Boxplots’ center line corresponds to the median value, lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles); lower and upper whiskers extend from the box to the smallest or largest value correspondingly, but no further than 1.5x inter-quantile range. P-value is from Wilcoxon one-tailed test. b, Forest plot showing the meta-effect sizes (Hedge’s g reflecting correlation strength with adjMFC) of the TGSig genes from a meta-analysis of four influenza datasets (Stanford 2008, NIH 2009, Yale 2011, Yale 2012; genes in gray were not present in all datasets); the bars represent the 95% CI. c, Similar to (b) but for yellow fever vaccination (computed from trial #1).

Extended Data Fig. 5 SLE patient phenotypes based on DA-associated transcriptional signatures and assessing the association between TGSig (at low DA) and DaCP after the removal of patients with lower DA-associated plasmablast signature scores.

a, Recreation of Figure 6a from ref. ¹³ but as a robustness check here we used a different method: for each patient we ranked genes based on the difference between their average expression at high/middle and low DA time points and then performed GSEA analysis using the blood transcriptomic modules (columns) to fingerprint the patient. The colors on the heatmap denote the statistical significance (-log10 of BH-adjusted p-values from CERNO³⁹) of the gene set enrichment test. Here we kept the order of patients (rows) and modules (columns) the same as in the original heatmap, although some patients from the original heatmap were removed due to the absence of low DA time points. The original patient groups were used to annotate the patients (rows). The overall fingerprinting pattern is visually highly consistent with the original heatmap. This heatmap/matrix was then used to construct Fig. 2c by averaging individual patient values within each patient and phenotype group combination. b, DaCP versus average plasmablast score difference between high (or middle if no high was available) and low DA time points in SLE patients from patient groups 2, 3 (in blue, n = 19) or groups 2, 3, and 4 (group 4 in grey, n = 12) from ref. ¹³ (see patient groups in Fig. 2c). Pearson correlation coefficients and two-tailed P values are shown. Spearman rho for groups 2 and 3 is 0.87 (two-tailed p = 2.2 × 10⁻¹⁶); and for groups 2, 3 and 4 is 0.85 (two-tailed p = 4.9 × 10⁻⁷). c, Similar to Fig. 2e, but here patients are from patient groups 2 and 3 only (n = 22) and are shaded based on their DaCP. Note that there are more patients here for analysis from groups 2 and 3 than those shown in (a) and (b) because there we required that every patient has at least one low DA and one high/mid DA sample, while here (and in Fig. 2) the DaCP was estimated using all patients with at least one sample including those without high/mid DA time points (see Methods). Pearson correlation coefficient and two-tailed p values are shown. Spearman correlation coefficient is 0.47 (p = 0.029 two-tailed). d, Evaluating the correlation between DaCP and mean TGSig score at low DA time points (as in (c)) by removing patients with DaCP below the indicated cutoff (y axis). The first panel shows the number of patients in the evaluation given the threshold; the second and third panels show the corresponding Pearson r and two-tailed p value (shown as -log10(p)).

Extended Data Fig. 6 Evaluating the predictive capacity and information overlap among the TGSig, SLE-Sig, and IFN-I-DCact signatures.

a–d, The predictive profile of SLE-Sig (Fig. 3e) (a), TGSig (b), IFN-I-DCact (Fig. 3f) (c), and the non-leading edge genes from the brown module (Fig. 3e) (d) used as the sole predictor in logistic regression models of high versus low influenza vaccination responder status. Influenza vaccination data pooled from four datasets (Stanford 2008, NIH 2009, Yale 2011, Yale 2012) were used (n = 71 high and low responders). Note that for the brown module (Fig. 3a–d), most of the predictive information come from the leading-edge genes since the signature score of genes outside of the leading edge is not predictive (shown in (d)). e, f when both TGSig and SLE-Sig were used as predictors in the logistic regression (e) or when both TGSig and IFN-I-DCact score were used as predictors (f). In these graphs, the predictor scores are shown on the x axis and the probability that a high responder falls within the predictor score bin is shown on the y axis. The error bars correspond to 95% CIs. The two-tailed p value indicates the probability that the coefficient (“effect”) of the term (for example, TGSig score) in the logistic regression is 0.

Extended Data Fig. 7 RNA-seq analysis of CD19+CD20+CD38++ B cells sorted from healthy individuals.

a, Sorting strategy and approach: CD19+CD20+and CD20+CD38++ B cell populations were isolated by FACS from peripheral blood samples of six healthy donors. RNA-seq libraries were prepared for each isolated sample using the Nugen Ovation SoLo low-input RNA-seq library preparation kit. b, Differential gene expression between the CD20+CD38++ B cells and parental CD19+CD20+B cells from six paired samples was compared using DESeq2. Plot shows the log2 fold change versus the log2 of the mean normalized counts across all samples for each gene. TGSig genes are shown in red; genes from the differentially expressed gene set (BH-adjusted two-tailed p-value <1%, log of mean normalized counts >1; total genes in set: 105) that fall into the top enriched Gene Ontology Biological Processes category “Cell Activation” (enrichment analysis done using ToppGene⁷⁰) are shown in cyan (21 genes). c, Enrichment of the 87 SLE-Sig genes in genes ranked by differential expression between CD19+CD20+CD38++ versus CD19+CD20+cells. The p value shown was computed from the GSEA test. d, Similar to (c) but instead of the SLE-Sig genes here the top k (k = 10 (TGSig), 30, 50) genes correlated with the frequency of CD20+CD38++ B cells is assessed (only 713 temporally stable genes with TSM≥ 0.75 were included in the analysis); also see Extended Data Fig. 2. e, Similar to (d) but using 7731 genes with TSM ≥ 0.5. A lower/more relaxed TSM cutoff was used to evaluate whether by starting with more genes (therefore potentially more statistical power for enrichment analysis) an enrichment signal can be detected.

Extended Data Fig. 8 Supporting data for CITE-seq single cell analysis to dissect the cellular origin of baseline signatures.

a, Single cell scatterplots of key markers in the CD4+T cell clusters. C0.0.0 (Naïve), C1.0.0 (Central/Transitional Memory), and C1.1.0 (TEMRA/Effector memory) clusters show different distributions of CD62L, CD45RA, CD27, and CD28. b, Distributions of key markers in the CD8+Memory T cell clusters. CD45RA vs. CD62L expression are shown in the top panels, ridge plots of CD45RO and CD28 are shown in lower panels. Similar to CD4+cells, these CD8+clusters show differences in CD62L and CD45RA distributions; C4.0.1 (TEMRA/Effector Memory) are mainly CD62L negative and CD28 negative, with CD45RA+(TEMRA) and CD45RO+(Effector Memory) subsets within this cluster. C4.0.0 and C4.0.3 show highly similar protein expression, with C4.0.3 being defined by high CD103 expression (upper right panel). c, Unconventional T cells (C7 clusters) show variable CD161, CD8, and CD56 expression. C7.0.0 and C7.0.1 are both CD3+/CD161+, with C7.0.1 being CD8 positive while C7.0.0 is CD8 negative. C4.0.2 (NKT-like) are also CD3 positive, but express CD57/56 and are CD8/CD161 negative or low. The C4.0.2 cluster also showed distinctly low CD27 and high in CD45RA compared to the C7 clusters, making it more similar to the C4.0.1 TEMRA/Effector memory CD8+subset, except expressing CD56/CD57; this could also be consistent with a Terminal Effector phenotype. d, Hand gating strategy for CD20+CD38++ B cells using CITE-seq data (number of cells in the gate shown in red). e, Boxplot comparing the hand gated frequency of CD20+CD38 ++ B cells between 10 high and 10 low responders; p value from Wilcoxon one-tailed test. Boxplots’ center line corresponds to the median value, lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles); lower and upper whiskers extend from the box to the smallest or largest value correspondingly, but no further than 1.5x inter-quantile range. f, Similar to Fig. 5b (10 high versus 10 low responders) but for an independently obtained Type I IFN signature gene set (Supplementary Table 6: IFN gene set; see Methods). One or two asterisks denote significance with p < 0.05 or p < 0.01, respectively (Wilcoxon one-tailed test because we are interested in assessing whether the high responders are higher than the low responders; see also Supplementary Table 7). g, Results of the “drop out” analysis using CITE-seq data. The goal was to assess which cell clusters (or combination of clusters) that were individually significant delineators of high versus low responders were essential for prediction using the baseline signatures in bulk (i.e., simulating transcriptional data from PBMCs). The “pseudo bulk” results (average across all single cells for every subject in 10 low and 10 high responders) of the three signatures tested are shown on the first row. For subsequent rows cells from the indicated cell cluster(s) were dropped before repeating the pseudo-bulk analysis as in row 1. P values obtained from one-tailed Wilcoxon test: *: p < 0.05; **: p < 0.01.

Extended Data Fig. 9 Evaluating cytomegalovirus (CMV) correlates and pDC surface expression phenotypes strongly influenced by genetics from ref. 35 in high versus low responders.

Since CMV status is not available for the cohorts we evaluated, we evaluated whether CMV correlates are significantly different between high and low responders in the NIH influenza vaccination cohort. a, Boxplots comparing the frequency of CD4+ TEMRA cells between 10 low and 10 high responders using CITE-seq (left panel) and between 9 low and 8 high responders using flow cytometry (center panel) data. Wilcoxon two-tailed p values are shown. The third panel is a scatter plot of CITE-seq versus the flow cytometry cell frequencies (n = 17). Pearson correlation and two-tailed p values are shown. b, Same as (a) but for CD8+ TEMRA cells. c, Boxplots comparing the relative surface protein expression of CD86 and HLA-DR in pDCs (cluster C9) between 10 low and 10 high responders using CITE-seq data. Wilcoxon two-tailed p values are shown. For all box plots the center line corresponds to the median value, lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles); lower and upper whiskers extend from the box to the smallest or largest value correspondingly, but no further than 1.5x inter-quantile range.

Extended Data Fig. 10 Relationship between sex and baseline signatures.

a, Box plots comparing the TGSig and SLE-Sig scores in females versus males; here only subjects with CITE-seq data are included to indicate that sex was not a driver of the differences between 10 high and 10 low responders emerged from CITE-seq data analysis (see Fig. 4). Wilcoxon two-tailed p values comparing 11 females and 9 males are shown. For all box plots the center line corresponds to the median value, lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles); lower and upper whiskers extend from the box to the smallest or largest value correspondingly, but no further than 1.5x inter-quantile range. b, Same as (a) but including all high and low responders from original NIH study¹². (day 0: 12 females and 12 males; day -7: 13 females and 11 males; day 70: 11 females and 12 males) c, same as (b) but including middle responders (i.e., all subjects in the study: day 0: 27 females and 16 males; day -7: 30 females and 16 males; day 70: 27 females and 17 males). d, Box plots comparing TGSig scores among low, middle, and high responders in males only (all subjects in the original NIH study¹² used: day 0: 6/4/6 for low/middle/high responders, respectively; day -7: 5/5/6; day 70: 6/5/6). e, same as (d) but in females only (day 0: 5/15/17 for low/middle/high responders, respectively; day -7: 5/17/8; day 70: 5/16/6). All p values shown for two-group comparison were from the Wilcoxon two-tailed test; one-tailed p values shown for three-group comparison were from the Jonckheere trend test (with an a priori alternative hypothesis that the high responders >= middle responders >=low responders).

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–10.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kotliarov, Y., Sparks, R., Martins, A.J. et al. Broad immune activation underlies shared set point signatures for vaccine responsiveness in healthy individuals and disease activity in patients with lupus. Nat Med 26, 618–629 (2020). https://doi.org/10.1038/s41591-020-0769-8

Download citation

Received: 08 November 2019
Accepted: 14 January 2020
Published: 24 February 2020
Issue Date: April 2020
DOI: https://doi.org/10.1038/s41591-020-0769-8

This article is cited by

Immunological factors linked to geographical variation in vaccine responses
- Marloes M. A. R. van Dorst
- Jeremia J. Pyuza
- Maria Yazdanbakhsh
Nature Reviews Immunology (2024)
SampleQC: robust multivariate, multi-cell type, multi-sample quality control for single-cell data
- Will Macnair
- Mark Robinson
Genome Biology (2023)
Long-term in vivo vitamin D3 supplementation modulates bovine IL-1 and chemokine responses
- Cian Reid
- Susana Flores-Villalva
- Kieran G. Meade
Scientific Reports (2023)
Influenza vaccination reveals sex dimorphic imprints of prior mild COVID-19
- Rachel Sparks
- William W. Lau
- John S. Tsang
Nature (2023)
Adaptive immune responses to SARS-CoV-2 persist in the pharyngeal lymphoid tissue of children
- Qin Xu
- Pedro Milanez-Almeida
- Kalpana Manthiram
Nature Immunology (2023)