Bayesian cluster identification in single-molecule localization microscopy data

Rubin-Delanchy, Patrick; Burn, Garth L; Griffié, Juliette; Williamson, David J; Heard, Nicholas A; Cope, Andrew P; Owen, Dylan M

doi:10.1038/nmeth.3612

Article
Published: 05 October 2015

Nature Methods volume 12, pages 1072–1076 (2015)

Abstract

Single-molecule localization-based super-resolution microscopy techniques such as photoactivated localization microscopy (PALM) and stochastic optical reconstruction microscopy (STORM) produce pointillist data sets of molecular coordinates. Although many algorithms exist for the identification and localization of molecules from raw image data, methods for analyzing the resulting point patterns for properties such as clustering have remained relatively under-studied. Here we present a model-based Bayesian approach to evaluate molecular cluster assignment proposals, generated in this study by analysis based on Ripley's K function. The method takes full account of the individual localization precisions calculated for each emitter. We validate the approach using simulated data, as well as experimental data on the clustering behavior of CD3ζ, a subunit of the CD3 T cell receptor complex, in resting and activated primary human T cells.
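
As a rough illustration of the Ripley's K analysis mentioned in the abstract, the following is a minimal sketch of a naive K-function estimate and its variance-stabilised form L(r) − r. It is not the authors' supplementary R code: there is no edge correction, and the region size, point counts, cluster spread and the function names `ripley_k` and `l_minus_r` are assumptions chosen for the example.

```python
import math
import random


def ripley_k(points, r, area):
    """Naive Ripley's K estimate at radius r (no edge correction)."""
    n = len(points)
    pairs = 0
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            if math.hypot(xi - xj, yi - yj) <= r:
                pairs += 1
    # Each unordered pair appears twice in the ordered-pair sum.
    return area * 2 * pairs / (n * (n - 1))


def l_minus_r(points, r, area):
    """L(r) - r: close to 0 for a random pattern, positive when points cluster."""
    return math.sqrt(ripley_k(points, r, area) / math.pi) - r


# Hypothetical demo data in a 3000 x 3000 nm region (all values are assumptions).
rng = random.Random(0)
side = 3000.0
csr = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(300)]
centres = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(10)]
clustered = [(rng.gauss(cx, 20.0), rng.gauss(cy, 20.0))
             for cx, cy in centres for _ in range(30)]

csr_score = l_minus_r(csr, 50.0, side * side)
clustered_score = l_minus_r(clustered, 50.0, side * side)
print(csr_score, clustered_score)
```

In this sketch the clustered pattern gives a much larger L(r) − r than the spatially random one, which is the kind of signal a K-function-based proposal step could threshold on.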

**Figure 1: Workflow of the algorithm.**

**Figure 2: Four different clustering scenarios.**

**Figure 3: Comparison of the clustering behavior of CD3ζ-mEos3.2 in primary human T cells resting on poly-L-lysine (PLL) or forming synapses (activated).**

References

Huang, B. Super-resolution optical microscopy: multiple choices. Curr. Opin. Chem. Biol. 14, 10–14 (2010).
Hell, S.W. & Wichmann, J. Breaking the diffraction resolution limit by stimulated emission: stimulated-emission-depletion fluorescence microscopy. Opt. Lett. 19, 780–782 (1994).
Chmyrov, A. et al. Nanoscopy with more than 100,000 'doughnuts'. Nat. Methods 10, 737–740 (2013).
Betzig, E. et al. Imaging intracellular fluorescent proteins at nanometer resolution. Science 313, 1642–1645 (2006).
Rust, M.J., Bates, M. & Zhuang, X. Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nat. Methods 3, 793–795 (2006).
Heilemann, M. et al. Subdiffraction-resolution fluorescence imaging with conventional fluorescent probes. Angew. Chem. Int. Ed. Engl. 47, 6172–6176 (2008).
Hess, S.T., Girirajan, T.P.K. & Mason, M.D. Ultra-high resolution imaging by fluorescence photoactivation localization microscopy. Biophys. J. 91, 4258–4272 (2006).
Wolter, S. et al. rapidSTORM: accurate, fast open-source software for localization microscopy. Nat. Methods 9, 1040–1041 (2012).
Holden, S.J., Uphoff, S. & Kapanidis, A.N. DAOSTORM: an algorithm for high-density super-resolution microscopy. Nat. Methods 8, 279–280 (2011).
Henriques, R. et al. QuickPALM: 3D real-time photoactivation nanoscopy image processing in ImageJ. Nat. Methods 7, 339–340 (2010).
van de Linde, S. et al. Direct stochastic optical reconstruction microscopy with standard fluorescent probes. Nat. Protoc. 6, 991–1009 (2011).
Heilemann, M., van de Linde, S., Mukherjee, A. & Sauer, M. Super-resolution imaging with small organic fluorophores. Angew. Chem. Int. Ed. Engl. 48, 6903–6908 (2009).
Dempsey, G.T. et al. Photoswitching mechanism of cyanine dyes. J. Am. Chem. Soc. 131, 18192–18193 (2009).
Williamson, D.J. et al. Pre-existing clusters of the adaptor Lat do not participate in early T cell signaling events. Nat. Immunol. 12, 655–662 (2011).
Rossy, J., Owen, D.M., Williamson, D.J., Yang, Z. & Gaus, K. Conformational states of the kinase Lck regulate clustering in early T cell signaling. Nat. Immunol. 14, 82–89 (2013).
Ripley, B.D. Modelling spatial patterns. J. R. Stat. Soc. Series B Stat. Methodol. 39, 172–192 (1977).
Sengupta, P. et al. Probing protein heterogeneity in the plasma membrane using PALM and pair correlation analysis. Nat. Methods 8, 969–975 (2011).
Veatch, S.L. et al. Correlation functions quantify super-resolution images and estimate apparent clustering due to over-counting. PLoS ONE 7, e31457 (2012).
Owen, D.M. et al. PALM imaging and cluster analysis of protein heterogeneity at the cell surface. J. Biophotonics 3, 446–454 (2010).
Sherman, E. et al. Functional nanoscale organization of signaling molecules downstream of the T cell antigen receptor. Immunity 35, 705–720 (2011).
Lillemeier, B.F. et al. TCR and Lat are expressed on separate protein islands on T cell membranes and concatenate during activation. Nat. Immunol. 11, 90–96 (2010).
Annibale, P., Vanni, S., Scarselli, M., Rothlisberger, U. & Radenovic, A. Identification of clustering artifacts in photoactivated localization microscopy. Nat. Methods 8, 527–528 (2011).
Annibale, P., Vanni, S., Scarselli, M., Rothlisberger, U. & Radenovic, A. Quantitative photo activated localization microscopy: unraveling the effects of photoblinking. PLoS ONE 6, e22678 (2011).
Ovesný, M., Křížek, P., Borkovec, J., Švindrych, Z. & Hagen, G.M. ThunderSTORM: a comprehensive ImageJ plug-in for PALM and STORM data analysis and super-resolution imaging. Bioinformatics 30, 2389–2390 (2014).
Quan, T., Zeng, S. & Huang, Z.-L. Localization capability and limitation of electron-multiplying charge-coupled, scientific complementary metal-oxide semiconductor, and charge-coupled devices for superresolution imaging. J. Biomed. Opt. 15, 066005 (2010).
Ferguson, T.S. A Bayesian analysis of some nonparametric problems. Ann. Stat. 1, 209–230 (1973).
Getis, A. & Franklin, J. Second-order neighborhood analysis of mapped point patterns. Ecology 68, 473–477 (1987).
Lloyd, S. Least squares quantization in PCM. IEEE Trans. Inf. Theory 28, 129–137 (1982).
Hinneburg, A. & Gabriel, H.-H. in Advances in Intelligent Data Analysis VII (eds. Berthold, M.R., Shawe-Taylor, J. & Lavrač, N.) 70–80 (Springer, 2007).
Johnson, S.C. Hierarchical clustering schemes. Psychometrika 32, 241–254 (1967).
Ester, M., Kriegel, H.-P., Sander, J. & Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. KDD-96 226–231 (1996).
Neve-Oz, Y., Razvag, Y., Sajman, J. & Sherman, E. Mechanisms of localized activation of the T cell antigen receptor inside clusters. Biochim. Biophys. Acta 1853, 810–821 (2015).
Cox, S. et al. Bayesian localization microscopy reveals nanoscale podosome dynamics. Nat. Methods 9, 195–200 (2012).
Lee, S.-H., Shin, J.Y., Lee, A. & Bustamante, C. Counting single photoactivatable fluorescent molecules by photoactivated localization microscopy (PALM). Proc. Natl. Acad. Sci. USA 109, 17436–17441 (2012).
Gandy, A. Sequential implementation of Monte Carlo tests with uniformly bounded resampling risk. J. Am. Stat. Assoc. 104, 1504–1511 (2009).
Gandy, A. & Rubin-Delanchy, P. An algorithm to compute the power of Monte Carlo tests with guaranteed precision. Ann. Stat. 41, 125–142 (2013).
Green, P.J. & Richardson, S. Modelling heterogeneity with and without the Dirichlet process. Scand. J. Stat. 28, 355–375 (2001).

Acknowledgements

D.M.O. acknowledges funding from the European Research Council (FP7 starter grant 337187) and Marie Curie Career Integration grant 334303. A.P.C. is funded by Arthritis Research UK grants 19652 and 20525.

Author information

Authors and Affiliations

School of Mathematics, Heilbronn Institute for Mathematical Research, University of Bristol, Bristol, UK
Patrick Rubin-Delanchy
Department of Physics and Randall Division of Cell and Molecular Biophysics, King's College London, London, UK
Garth L Burn, Juliette Griffié & Dylan M Owen
Manchester Collaborative Centre for Inflammation Research, University of Manchester, Manchester, UK
David J Williamson
Department of Mathematics, Imperial College London, London, UK
Nicholas A Heard
Division of Immunology, Academic Department of Rheumatology, Infection and Inflammatory Disease, King's College London, London, UK
Andrew P Cope

Contributions

P.R.-D., N.A.H. and D.M.O. conceived the method. P.R.-D., J.G. and D.M.O. performed the analysis. P.R.-D. and D.M.O. wrote the manuscript. G.L.B. acquired cell data. G.L.B., D.J.W. and A.P.C. provided materials.

Corresponding authors

Correspondence to Patrick Rubin-Delanchy or Dylan M Owen.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Performance analysis under four different clustering scenarios.

Performance analysis under four different clustering scenarios. i) Standard Conditions, ii) a sparse data set with only 10% as many localisations, iii) clusters which are twice as large and iv) only 10 localisations per cluster and 90% of localisations in the background. a) Histograms of the number of clusters per region. b) Histograms of the cluster radii. c) Histograms of the number of localisations per cluster and d) Histograms of the percentage of localisations found in clusters. In all cases the blue dashed line represents the true value. Histograms are calculated from 100 simulated data sets for each scenario.

Supplementary Figure 2 Estimation of cluster descriptors as simulation parameters vary (n = 100 per point).

Estimation of cluster descriptors as simulation parameters vary (n = 100 per point). i) Measured localisations per cluster. ii) Measured cluster radii. iii) Measured percentage of localisations in clusters and iv) measured number of clusters per region. a) Simulated number of localisations per cluster, b) simulated cluster radii and c) simulated fraction of background localisations. Blue dashed lines represent simulated values.

Supplementary Figure 3 Comparison of our algorithm with DBSCAN.

Comparison of our algorithm with DBSCAN. a-d) Comparison of the proposal generating algorithm (I) with DBSCAN (II) when each method is allowed to optimise its analysis parameters based on our Bayesian scoring mechanism, run on simulated data in the Standard Conditions (n = 100). a) Number of clusters per region, b) percentage of localisations in clusters, c) number of localisations per cluster and d) cluster radii. e-h) Histograms of key cluster descriptors generated by DBSCAN with fixed r = 50 nm and T = 78 from simulated data in the Standard Conditions (n = 100). e) Number of clusters per region, f) number of localisations per cluster, g) percentage of localisations in clusters and h) cluster radii. Blue dashed lines represent simulated values.

Supplementary Figure 4 Analysis of simulated data from the standard and sparse conditions (n = 100) by alternative clustering techniques.

Analysis of simulated data from the Standard (a-f) and sparse (g-l) conditions (n = 100) by alternative clustering techniques. a and g) Cluster heat maps generated from Getis and Franklin’s Local Point Pattern Analysis. b and h) Binary maps generated from these heat maps. c and i) Number of clusters per region extracted from the binary maps. d and j) Percentage of localisations in clusters. Blue dashed lines represent simulated values. e and k) Ripley’s K function curves for example data sets together with 99% confidence intervals (C.I.) generated by simulating 100 CSR datasets. f and l) Pair Correlation curves for example data sets.

Supplementary Figure 5 Analysis of simulated data with 100 nm clusters and data with 90% of localizations in the background (n = 100) by alternative clustering techniques.

Analysis of simulated data with 100 nm clusters (a-f) and data with 90% of localisations in the background (g-l) (n = 100) by alternative clustering techniques. a and g) Cluster heat maps generated from Getis and Franklin’s Local Point Pattern Analysis. b and h) Binary maps generated from these heat maps. c and i) Number of clusters per region extracted from the binary maps. d and j) Percentage of localisations in clusters. Blue dashed lines represent simulated values. e and k) Ripley’s K function curves for example data sets together with 99% confidence intervals (C.I.) generated by simulating 100 CSR datasets. f and l) Pair Correlation curves for example data sets.

Supplementary Figure 6 Performance of the algorithm on data sets (n = 100) with an uneven background.

Performance of the algorithm on data sets (n = 100) with an uneven background, following a Beta(2,2) (i) or a Beta(5,1) (ii) distribution. a) Representative data. b) Log(posterior probability) heat maps. c) The highest scoring cluster proposal. d) Histograms of cluster radii. e) Histograms of the number of localisations per cluster. f) Histograms of the number of clusters. g) Histograms of the percentage of localisations found in clusters. Blue dashed lines represent true values.
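
One way to picture an uneven background of this kind is sketched below: background points whose x coordinate follows a scaled Beta distribution while y stays uniform. The caption does not say which coordinate follows the Beta distribution, so that choice, the 3000 nm region size, the point count and the function name `uneven_background` are all assumptions made for illustration.

```python
import random


def uneven_background(n, roi_size, alpha, beta, rng):
    """Draw n background points: x ~ roi_size * Beta(alpha, beta), y uniform.

    Applying the Beta distribution to x only is an assumption of this sketch.
    """
    return [(roi_size * rng.betavariate(alpha, beta), rng.uniform(0.0, roi_size))
            for _ in range(n)]


rng = random.Random(0)
roi = 3000.0  # hypothetical region size in nm
mild = uneven_background(2000, roi, 2, 2, rng)   # symmetric, denser in the middle
steep = uneven_background(2000, roi, 5, 1, rng)  # density rising towards one edge

mean_x_mild = sum(x for x, _ in mild) / len(mild)
mean_x_steep = sum(x for x, _ in steep) / len(steep)
print(mean_x_mild, mean_x_steep)
```

Beta(2,2) has mean 1/2 and Beta(5,1) has mean 5/6, so the second background is strongly skewed towards one side of the region.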

Supplementary Figure 7 Analysis of simulated data with an uneven background by alternative clustering techniques.

Analysis of simulated data with an uneven background, following a Beta(2,2) (a-f) or a Beta(5,1) (g-l) distribution (n = 100) by alternative clustering techniques. a and g) Cluster heat maps generated from Getis and Franklin’s Local Point Pattern Analysis. b and h) Binary maps generated from these heat maps. c and i) Number of clusters per region extracted from the binary maps. d and j) Percentage of localisations in clusters. Blue dashed lines represent simulated values. e and k) Ripley’s K function curves for example data sets together with 99% confidence intervals (C.I.) generated by simulating 100 CSR datasets. f and l) Pair Correlation curves for example data sets.

Supplementary Figure 8 Histograms of the measured number of localizations per cluster and cluster radii for simulated dimers, trimers and hexamers.

Histograms of the measured number of localisations per cluster and cluster radii for simulated dimers, trimers and hexamers in a dense or sparse distribution (n = 100 per condition). These were simulated by generating localisations with identical coordinates which were then independently scrambled according to each point’s localisation precision. The detection problem becomes harder as the overall density increases. We therefore performed the analysis at two different densities (2000 and 200 overall localisations).
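
The simulation described here, identical coordinates scrambled independently by each point's localisation precision, can be sketched roughly as follows. This is an illustrative reading of the caption rather than the authors' code: the uniform 10–40 nm precision range, the region size and the names `simulate_oligomers` and `example_precision` are hypothetical.

```python
import random


def simulate_oligomers(n_oligomers, copies, roi_size, draw_precision, rng):
    """Return (x, y, sigma) localisations for oligomers with `copies` subunits.

    All subunits of an oligomer share one true position; each localisation is
    then displaced by Gaussian noise with its own standard deviation sigma.
    """
    localisations = []
    for _oligomer in range(n_oligomers):
        cx = rng.uniform(0.0, roi_size)
        cy = rng.uniform(0.0, roi_size)
        for _copy in range(copies):
            sigma = draw_precision(rng)
            localisations.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma), sigma))
    return localisations


def example_precision(rng):
    """Hypothetical precision distribution: uniform between 10 and 40 nm."""
    return rng.uniform(10.0, 40.0)


# Example: 50 hexamers in a 3000 nm region (both values are assumptions).
hexamers = simulate_oligomers(50, 6, 3000.0, example_precision, random.Random(1))
print(len(hexamers))
```

Keeping the per-point sigma alongside each coordinate matters because a method that takes the individual localisation precisions into account needs them as input.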

Supplementary Figure 9 Performance of the algorithm on data sets (n = 100) with different cluster sizes within the same ROI.

Performance of the algorithm on data sets (n = 100) with different cluster sizes within the same ROI, 5x 50 nm and 5x 100 nm (i), 5x 10 nm and 5x 100 nm (ii) and a range of cluster sizes between 10 and 100 nm (iii). a) Representative data. b) Log(posterior probability) heat maps. c) The highest scoring cluster proposal and d) histograms of the measured cluster radii. Blue dashed lines represent simulated values.

Supplementary Figure 10 Side by side comparison of methods applied to three simulated conditions.

Side by side comparison of our method (I), Getis’s method (II) and DBSCAN (III) applied to three simulated conditions (n = 100 datasets each). The conditions are Standard Conditions (representative dataset shown in Fig. 2a i), Standard Conditions but with larger clusters (representative dataset shown in Fig. 2a iii) and Standard Conditions with uneven background (representative dataset shown in Supplementary Fig. 6a ii).

Supplementary Figure 11 Sensitivity of the measured clustering descriptors to the prior settings.

Sensitivity of the measured clustering descriptors (percentage of localisations in clusters, cluster radii, number of clusters per ROI and the number of localisations per cluster) to the prior settings. a) Illustration of the sensitivity of measured cluster descriptors to varying the Dirichlet process concentration coefficient, α. b) Sensitivity to the prior probability of any localisation being allocated to the background. c) Sensitivity to the prior distribution on the cluster radius. Two possible distributions are considered, one taken from experimental data (i) and a flat distribution between zero and half the size of the ROI (ii).

Supplementary Figure 12 Data preprocessing steps.

Data preprocessing steps. a) Determination of the optimal merge time. Following the method of Annibale et al., we plot the total number of localisations in a representative image against the merge time for CD3ζ-mEos3 in primary T cells. The average optimum merge time was found to be three frames (30 ms). This example is representative of four such plots. b and c) Representative histogram of the localisation precisions calculated for CD3ζ data in resting T cells (b) and in activated cells (c), using the method of Quan et al.

Supplementary Figure 13 Bayesian analysis of CD3ζ clustering in activated primary human T cells with localizations from the same ROI divided into two equally sized data sets.

Bayesian analysis of CD3ζ clustering in activated primary human T cells with localisations from the same ROI divided into two equally-sized data sets. a) and b) Highest scoring cluster proposals. c) and d) Log(posterior probability) heat maps.

Supplementary Figure 14 Data on CD3ζ clustering analysed by alternative approaches in resting and activated primary human T cells.

Data on CD3ζ clustering analysed by alternative approaches in resting and activated primary human T cells. a) Representative cluster maps (n = 30) generated by Getis and Franklin’s Local Point Pattern Analysis of a 3000 x 3000 nm area. b) Representative binary maps showing clustered areas. c) Ripley’s K function (average of n = 30 regions). d) Pair correlation curves (average of n = 30 regions) and e) Cluster statistics on the percentage of localisations found in clusters, number of clusters per region and cluster radii extracted from the binary maps in b).

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–14 (PDF 5755 kb)

Supplementary Software

R code for running Bayesian cluster analysis (ZIP 103 kb)

About this article

Cite this article

Rubin-Delanchy, P., Burn, G., Griffié, J. et al. Bayesian cluster identification in single-molecule localization microscopy data. Nat Methods 12, 1072–1076 (2015). https://doi.org/10.1038/nmeth.3612

Received: 05 March 2015
Accepted: 02 September 2015
Published: 05 October 2015
Issue Date: November 2015
DOI: https://doi.org/10.1038/nmeth.3612
