Abstract
Single-molecule localization-based super-resolution microscopy techniques such as photoactivated localization microscopy (PALM) and stochastic optical reconstruction microscopy (STORM) produce pointillist data sets of molecular coordinates. Although many algorithms exist for the identification and localization of molecules from raw image data, methods for analyzing the resulting point patterns for properties such as clustering have remained relatively under-studied. Here we present a model-based Bayesian approach to evaluate molecular cluster assignment proposals, generated in this study by analysis based on Ripley's K function. The method takes full account of the individual localization precisions calculated for each emitter. We validate the approach using simulated data, as well as experimental data on the clustering behavior of CD3ζ, a subunit of the CD3 T cell receptor complex, in resting and activated primary human T cells.
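The scoring step described above can be illustrated with a deliberately simplified sketch. It is not the authors' exact model. It assumes the following:

- Each cluster is an isotropic Gaussian of unknown width sigma_c around a centre that has a uniform prior over the region of interest (ROI).
- Each localization adds Gaussian error with its own precision s_i, so a clustered point has variance sigma_c² + s_i² per axis.
- Background points are uniform over the ROI.

The cluster centre is integrated out analytically, and sigma_c is averaged over a small grid. The Dirichlet-process prior on assignments and ROI edge effects are omitted. The names log_marginal_cluster and log_score are hypothetical.

```python
import math

def log_marginal_cluster(points, sds, sigma_grid, area):
    """Log marginal likelihood of one cluster (simplified sketch).

    Assumed model: true positions ~ isotropic Gaussian(centre, sigma_c);
    observed point i adds Gaussian error with sd sds[i]; the centre has a
    uniform prior over an ROI of the given area (edge effects ignored) and
    sigma_c has a uniform prior over the values in sigma_grid.
    """
    terms = []
    for sigma_c in sigma_grid:
        variances = [sigma_c ** 2 + s ** 2 for s in sds]
        weights = [1.0 / v for v in variances]
        big_v = 1.0 / sum(weights)  # variance of the precision-weighted mean
        log_lik = -math.log(area)   # uniform prior density of the centre
        for dim in (0, 1):
            xs = [p[dim] for p in points]
            m = big_v * sum(w * x for w, x in zip(weights, xs))
            # closed-form integral over the centre coordinate of prod_i N(x_i; mu, v_i)
            log_lik += (-0.5 * sum(math.log(2 * math.pi * v) for v in variances)
                        + 0.5 * math.log(2 * math.pi * big_v)
                        - 0.5 * sum(w * (x - m) ** 2 for w, x in zip(weights, xs)))
        terms.append(log_lik)
    top = max(terms)  # numerically stable log-mean-exp over the sigma grid
    return top + math.log(sum(math.exp(t - top) for t in terms) / len(terms))


def log_score(points, sds, labels, sigma_grid, area):
    """Score a proposal; labels[i] is a cluster id, or None for background.

    The prior over assignments (Dirichlet process, background probability)
    is deliberately left out of this sketch.
    """
    clusters = {}
    score = 0.0
    for p, s, lab in zip(points, sds, labels):
        if lab is None:
            score -= math.log(area)  # background points are uniform over the ROI
        else:
            pts, precs = clusters.setdefault(lab, ([], []))
            pts.append(p)
            precs.append(s)
    for pts, precs in clusters.values():
        score += log_marginal_cluster(pts, precs, sigma_grid, area)
    return score
```

Under this sketch, a higher log_score means the data support the proposal more strongly. Scoring proposals generated with different analysis parameters is one way such a score could be used to choose among them.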
References
Huang, B. Super-resolution optical microscopy: multiple choices. Curr. Opin. Chem. Biol. 14, 10–14 (2010).
Hell, S.W. & Wichmann, J. Breaking the diffraction resolution limit by stimulated emission: stimulated-emission-depletion fluorescence microscopy. Opt. Lett. 19, 780–782 (1994).
Chmyrov, A. et al. Nanoscopy with more than 100,000 'doughnuts'. Nat. Methods 10, 737–740 (2013).
Betzig, E. et al. Imaging intracellular fluorescent proteins at nanometer resolution. Science 313, 1642–1645 (2006).
Rust, M.J., Bates, M. & Zhuang, X. Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nat. Methods 3, 793–795 (2006).
Heilemann, M. et al. Subdiffraction-resolution fluorescence imaging with conventional fluorescent probes. Angew. Chem. Int. Ed. Engl. 47, 6172–6176 (2008).
Hess, S.T., Girirajan, T.P.K. & Mason, M.D. Ultra-high resolution imaging by fluorescence photoactivation localization microscopy. Biophys. J. 91, 4258–4272 (2006).
Wolter, S. et al. rapidSTORM: accurate, fast open-source software for localization microscopy. Nat. Methods 9, 1040–1041 (2012).
Holden, S.J., Uphoff, S. & Kapanidis, A.N. DAOSTORM: an algorithm for high-density super-resolution microscopy. Nat. Methods 8, 279–280 (2011).
Henriques, R. et al. QuickPALM: 3D real-time photoactivation nanoscopy image processing in ImageJ. Nat. Methods 7, 339–340 (2010).
van de Linde, S. et al. Direct stochastic optical reconstruction microscopy with standard fluorescent probes. Nat. Protoc. 6, 991–1009 (2011).
Heilemann, M., van de Linde, S., Mukherjee, A. & Sauer, M. Super-resolution imaging with small organic fluorophores. Angew. Chem. Int. Ed. Engl. 48, 6903–6908 (2009).
Dempsey, G.T. et al. Photoswitching mechanism of cyanine dyes. J. Am. Chem. Soc. 131, 18192–18193 (2009).
Williamson, D.J. et al. Pre-existing clusters of the adaptor Lat do not participate in early T cell signaling events. Nat. Immunol. 12, 655–662 (2011).
Rossy, J., Owen, D.M., Williamson, D.J., Yang, Z. & Gaus, K. Conformational states of the kinase Lck regulate clustering in early T cell signaling. Nat. Immunol. 14, 82–89 (2013).
Ripley, B.D. Modelling spatial patterns. J. R. Stat. Soc. Series B Stat. Methodol. 39, 172–192 (1977).
Sengupta, P. et al. Probing protein heterogeneity in the plasma membrane using PALM and pair correlation analysis. Nat. Methods 8, 969–975 (2011).
Veatch, S.L. et al. Correlation functions quantify super-resolution images and estimate apparent clustering due to over-counting. PLoS ONE 7, e31457 (2012).
Owen, D.M. et al. PALM imaging and cluster analysis of protein heterogeneity at the cell surface. J. Biophotonics 3, 446–454 (2010).
Sherman, E. et al. Functional nanoscale organization of signaling molecules downstream of the T cell antigen receptor. Immunity 35, 705–720 (2011).
Lillemeier, B.F. et al. TCR and Lat are expressed on separate protein islands on T cell membranes and concatenate during activation. Nat. Immunol. 11, 90–96 (2010).
Annibale, P., Vanni, S., Scarselli, M., Rothlisberger, U. & Radenovic, A. Identification of clustering artifacts in photoactivated localization microscopy. Nat. Methods 8, 527–528 (2011).
Annibale, P., Vanni, S., Scarselli, M., Rothlisberger, U. & Radenovic, A. Quantitative photo activated localization microscopy: unraveling the effects of photoblinking. PLoS ONE 6, e22678 (2011).
Ovesný, M., Křížek, P., Borkovec, J., Švindrych, Z. & Hagen, G.M. ThunderSTORM: a comprehensive ImageJ plug-in for PALM and STORM data analysis and super-resolution imaging. Bioinformatics 30, 2389–2390 (2014).
Quan, T., Zeng, S. & Huang, Z.-L. Localization capability and limitation of electron-multiplying charge-coupled, scientific complementary metal-oxide semiconductor, and charge-coupled devices for superresolution imaging. J. Biomed. Opt. 15, 066005 (2010).
Ferguson, T.S. A Bayesian analysis of some nonparametric problems. Ann. Stat. 1, 209–230 (1973).
Getis, A. & Franklin, J. Second-order neighborhood analysis of mapped point patterns. Ecology 68, 473–477 (1987).
Lloyd, S. Least squares quantization in PCM. IEEE Trans. Inf. Theory 28, 129–137 (1982).
Hinneburg, A. & Gabriel, H.-H. in Advances in Intelligent Data Analysis VII (eds. Berthold, M.R., Shawe-Taylor, J. & Lavrač, N.) 70–80 (Springer, 2007).
Johnson, S.C. Hierarchical clustering schemes. Psychometrika 32, 241–254 (1967).
Ester, M., Kriegel, H.-P., Sander, J. & Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. KDD-96 226–231 (1996).
Neve-Oz, Y., Razvag, Y., Sajman, J. & Sherman, E. Mechanisms of localized activation of the T cell antigen receptor inside clusters. Biochim. Biophys. Acta 1853, 810–821 (2015).
Cox, S. et al. Bayesian localization microscopy reveals nanoscale podosome dynamics. Nat. Methods 9, 195–200 (2012).
Lee, S.-H., Shin, J.Y., Lee, A. & Bustamante, C. Counting single photoactivatable fluorescent molecules by photoactivated localization microscopy (PALM). Proc. Natl. Acad. Sci. USA 109, 17436–17441 (2012).
Gandy, A. Sequential implementation of Monte Carlo tests with uniformly bounded resampling risk. J. Am. Stat. Assoc. 104, 1504–1511 (2009).
Gandy, A. & Rubin-Delanchy, P. An algorithm to compute the power of Monte Carlo tests with guaranteed precision. Ann. Stat. 41, 125–142 (2013).
Green, P.J. & Richardson, S. Modelling heterogeneity with and without the Dirichlet process. Scand. J. Stat. 28, 355–375 (2001).
Acknowledgements
D.M.O. acknowledges funding from the European Research Council (FP7 starter grant 337187) and Marie Curie Career Integration grant 334303. A.P.C. is funded by Arthritis Research UK grants 19652 and 20525.
Author information
Contributions
P.R.-D., N.A.H. and D.M.O. conceived the method. P.R.-D., J.G. and D.M.O. performed the analysis. P.R.-D. and D.M.O. wrote the manuscript. G.L.B. acquired cell data. G.L.B., D.J.W. and A.P.C. provided materials.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Performance analysis under four different clustering scenarios.
i) Standard Conditions, ii) a sparse data set with only 10% as many localisations, iii) clusters which are twice as large and iv) only 10 localisations per cluster and 90% of localisations in the background. a) Histograms of the number of clusters per region. b) Histograms of the cluster radii. c) Histograms of the number of localisations per cluster and d) Histograms of the percentage of localisations found in clusters. In all cases the blue dashed line represents the true value. Histograms are calculated from 100 simulated data sets for each scenario.
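Simulated test data of the kind summarised in this figure can be generated roughly as follows. This is a sketch under stated assumptions:

- The default values (10 clusters of 100 localisations, Gaussian clusters with a 50 nm standard deviation, 50% background, a 3,000 nm square ROI) are illustrative placeholders, not the settings used for the Standard Conditions.
- Treating the Gaussian standard deviation as the cluster size is also an assumption.
- simulate_roi is a hypothetical helper.

```python
import random

def simulate_roi(n_clusters=10, per_cluster=100, cluster_sd=50.0,
                 background_fraction=0.5, roi=3000.0, seed=None):
    """Gaussian clusters plus uniform background in a square ROI (nm).

    All default values are illustrative placeholders. Returns the points and
    a label per point (cluster index, or None for background). Clustered
    points near the border can fall slightly outside the ROI.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for c in range(n_clusters):
        cx, cy = rng.uniform(0, roi), rng.uniform(0, roi)
        for _ in range(per_cluster):
            points.append((rng.gauss(cx, cluster_sd), rng.gauss(cy, cluster_sd)))
            labels.append(c)
    n_clustered = n_clusters * per_cluster
    # background count chosen so that it forms background_fraction (< 1) of all points
    n_background = round(n_clustered * background_fraction / (1 - background_fraction))
    for _ in range(n_background):
        points.append((rng.uniform(0, roi), rng.uniform(0, roi)))
        labels.append(None)
    return points, labels
```

For example, a scenario like iv (10 localisations per cluster, 90% background) would correspond to simulate_roi(per_cluster=10, background_fraction=0.9).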
Supplementary Figure 2 Estimation of cluster descriptors as simulation parameters vary (n = 100 per point).
i) Measured localisations per cluster. ii) Measured cluster radii. iii) Measured percentage of localisations in clusters and iv) measured number of clusters per region. a) Simulated number of localisations per cluster, b) simulated cluster radii and c) simulated fraction of background localisations. Blue dashed lines represent simulated values.
Supplementary Figure 3 Comparison of our algorithm with DBSCAN.
a-d) Comparison of the proposal generating algorithm (I) with DBSCAN (II) when each method is allowed to optimise its analysis parameters based on our Bayesian scoring mechanism, run on simulated data in the Standard Conditions (n = 100). a) Number of clusters per region, b) percentage of localisations in clusters, c) number of localisations per cluster and d) cluster radii. e-h) Histograms of key cluster descriptors generated by DBSCAN with fixed r = 50 nm and T = 78 from simulated data in the Standard Conditions (n = 100). e) Number of clusters per region, f) number of localisations per cluster, g) percentage of localisations in clusters and h) cluster radii. Blue dashed lines represent simulated values.
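For reference, a minimal DBSCAN (Ester et al.) is sketched below in plain Python. It rests on two assumptions about how the comparison was parameterised:

- The caption's r corresponds to DBSCAN's neighbourhood radius eps.
- T corresponds to its minimum-points threshold, with each point counted as its own neighbour.

The O(n²) neighbour search is only suitable for small point sets.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(points)
    # neighbourhoods include the point itself
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # j is a core point: keep expanding
                queue.extend(neighbours[j])
    return labels
```

Border points join the first cluster that reaches them but do not expand it further, which is the standard DBSCAN behaviour.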
Supplementary Figure 4 Analysis of simulated data from the standard and sparse conditions (n = 100) by alternative clustering techniques.
Analysis of simulated data from the Standard (a-f) and sparse (g-l) conditions (n = 100) by alternative clustering techniques. a and g) Cluster heat maps generated from Getis and Franklin’s Local Point Pattern Analysis. b and h) Binary maps generated from these heat maps. c and i) Number of clusters per region extracted from the binary maps. d and j) Percentage of localisations in clusters. Blue dashed lines represent simulated values. e and k) Ripley’s K function curves for example data sets together with 99% confidence intervals (C.I.) generated by simulating 100 data sets under complete spatial randomness (CSR). f and l) Pair correlation curves for example data sets.
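Ripley's K curves with a CSR confidence band, as in panels e and k, can be computed along the following lines. The sketch makes these assumptions:

- It uses the naive estimator with no edge correction.
- It reports Besag's L(r) − r, which is about 0 under complete spatial randomness and positive when points cluster.
- The band is built from the most extreme simulated values.

The helper names are hypothetical.

```python
import math
import random

def ripley_k(points, r, area):
    """Naive Ripley's K estimate at radius r (no edge correction)."""
    n = len(points)
    pairs = sum(1 for i in range(n) for j in range(n)
                if i != j and math.dist(points[i], points[j]) < r)
    return area * pairs / (n * (n - 1))

def l_minus_r(points, r, area):
    """Besag's L(r) - r: about 0 under CSR, positive when points cluster."""
    return math.sqrt(ripley_k(points, r, area) / math.pi) - r

def csr_envelope(n, r, width, height, n_sim=100, level=0.99, seed=0):
    """Pointwise band for L(r) - r from n_sim CSR patterns of n points.

    With n_sim = 100 and level = 0.99 the band reduces to the minimum and
    maximum of the simulated values.
    """
    rng = random.Random(seed)
    sims = sorted(
        l_minus_r([(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)],
                  r, width * height)
        for _ in range(n_sim))
    k = int((1 - level) / 2 * n_sim)  # number of values trimmed from each tail
    return sims[k], sims[n_sim - 1 - k]
```

An observed L(r) − r above the upper bound at a given r is conventionally read as more clustering than expected under CSR at that scale.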
Supplementary Figure 5 Analysis of simulated data with 100 nm clusters and data with 90% of localizations in the background (n = 100) by alternative clustering techniques.
Analysis of simulated data with 100 nm clusters (a-f) and data with 90% of localisations in the background (g-l) (n = 100) by alternative clustering techniques. a and g) Cluster heat maps generated from Getis and Franklin’s Local Point Pattern Analysis. b and h) Binary maps generated from these heat maps. c and i) Number of clusters per region extracted from the binary maps. d and j) Percentage of localisations in clusters. Blue dashed lines represent simulated values. e and k) Ripley’s K function curves for example data sets together with 99% confidence intervals (C.I.) generated by simulating 100 data sets under complete spatial randomness (CSR). f and l) Pair correlation curves for example data sets.
Supplementary Figure 6 Performance of the algorithm on data sets (n = 100) with an uneven background.
Performance of the algorithm on data sets (n = 100) with an uneven background, following a Beta(2,2) (i) or a Beta(5,1) (ii) distribution. a) Representative data. b) Log(posterior probability) heat maps. c) The highest scoring cluster proposal. d) Histograms of cluster radii. e) Histograms of the number of localisations per cluster. f) Histograms of the number of clusters. g) Histograms of the percentage of localisations found in clusters. Blue dashed lines represent true values.
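A background with a Beta-shaped density, as in this figure, can be drawn with the Python standard library's random.betavariate. In this sketch the density gradient is assumed to run along x only, with y uniform. uneven_background is a hypothetical name.

```python
import random

def uneven_background(n, a, b, roi=3000.0, seed=None):
    """n background points whose x-coordinates follow a Beta(a, b) density
    scaled to the ROI width; y-coordinates are uniform (assumed layout)."""
    rng = random.Random(seed)
    return [(roi * rng.betavariate(a, b), rng.uniform(0, roi)) for _ in range(n)]
```

Beta(2,2) concentrates points towards the middle of the ROI. Beta(5,1) has density proportional to x⁴, so it piles points up towards one edge, with a mean at 5/6 of the width.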
Supplementary Figure 7 Analysis of simulated data with an uneven background by alternative clustering techniques.
Analysis of simulated data with an uneven background, following a Beta(2,2) (a-f) or a Beta(5,1) (g-l) distribution (n = 100) by alternative clustering techniques. a and g) Cluster heat maps generated from Getis and Franklin’s Local Point Pattern Analysis. b and h) Binary maps generated from these heat maps. c and i) Number of clusters per region extracted from the binary maps. d and j) Percentage of localisations in clusters. Blue dashed lines represent simulated values. e and k) Ripley’s K function curves for example data sets together with 99% confidence intervals (C.I.) generated by simulating 100 data sets under complete spatial randomness (CSR). f and l) Pair correlation curves for example data sets.
Supplementary Figure 8 Histograms of the measured number of localizations per cluster and cluster radii for simulated dimers, trimers and hexamers.
Histograms of the measured number of localisations per cluster and cluster radii for simulated dimers, trimers and hexamers in a dense or sparse distribution (n = 100 per condition). These were simulated by generating localisations with identical coordinates, which were then independently perturbed according to each point’s localisation precision. Because the detection problem becomes harder as the overall density increases, we performed the analysis at two different densities (2,000 and 200 localisations in total).
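The multimer simulation described in this caption can be sketched as follows. We assume that perturbing each localisation "according to its precision" means an independent isotropic Gaussian displacement whose standard deviation equals that precision. Precisions are drawn from a supplied list. simulate_multimers is a hypothetical helper.

```python
import random

def simulate_multimers(n_sites, copies, precisions, roi=3000.0, seed=None):
    """Each site contributes `copies` localisations that share one true
    coordinate; each copy is displaced independently by isotropic Gaussian
    noise whose sd (nm) is drawn from `precisions`."""
    rng = random.Random(seed)
    points, site_ids, sds = [], [], []
    for site in range(n_sites):
        x, y = rng.uniform(0, roi), rng.uniform(0, roi)
        for _ in range(copies):
            s = rng.choice(precisions)
            points.append((rng.gauss(x, s), rng.gauss(y, s)))
            site_ids.append(site)
            sds.append(s)
    return points, site_ids, sds
```

For dimers at the denser setting, 1,000 sites give 2,000 localisations in total. 100 sites give the sparse setting of 200.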
Supplementary Figure 9 Performance of the algorithm on data sets (n = 100) with different cluster sizes within the same ROI.
Performance of the algorithm on data sets (n = 100) with different cluster sizes within the same ROI: five 50 nm and five 100 nm clusters (i), five 10 nm and five 100 nm clusters (ii) and a range of cluster sizes between 10 and 100 nm (iii). a) Representative data. b) Log(posterior probability) heat maps. c) The highest scoring cluster proposal and d) histograms of the measured cluster radii. Blue dashed lines represent simulated values.
Supplementary Figure 10 Side by side comparison of methods applied to three simulated conditions.
Side by side comparison of our method (I), Getis’s method (II) and DBSCAN (III) applied to three simulated conditions (n = 100 datasets each). The conditions are Standard Conditions (representative dataset shown in Fig. 2a i), Standard Conditions but with larger clusters (representative dataset shown in Fig. 2a iii) and Standard Conditions with uneven background (representative dataset shown in Supplementary Fig. 6a ii).
Supplementary Figure 11 Sensitivity of the measured clustering descriptors to the prior settings.
Sensitivity of the measured clustering descriptors (percentage of localisations in clusters, cluster radii, number of clusters per ROI and the number of localisations per cluster) to the prior settings. a) Illustration of the sensitivity of measured cluster descriptors to varying the Dirichlet process concentration coefficient, α. b) Sensitivity to the prior probability of any localisation being allocated to the background. c) Sensitivity to the prior distribution on the cluster radius. Two possible distributions are considered, one taken from experimental data (i) and a flat distribution between zero and half the size of the ROI (ii).
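The concentration coefficient α in panel a controls how readily a Dirichlet process prior (Ferguson) opens new clusters. Under the standard Chinese restaurant process representation, the expected number of distinct clusters among n assigned points is the sum over i = 0, ..., n − 1 of α / (α + i). A draw of cluster labels from the prior can also be simulated directly. This is a generic sketch of the prior, not the authors' code, and both function names are hypothetical.

```python
import random

def expected_clusters(alpha, n):
    """Expected number of distinct clusters among n points under a
    Chinese restaurant process with concentration alpha."""
    return sum(alpha / (alpha + i) for i in range(n))

def sample_crp(alpha, n, seed=None):
    """Draw cluster labels for n points from a Chinese restaurant process."""
    rng = random.Random(seed)
    counts, labels = [], []
    for i in range(n):
        # join cluster k with probability counts[k] / (i + alpha),
        # open a new cluster with probability alpha / (i + alpha)
        u = rng.uniform(0, i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if u < acc:
                counts[k] += 1
                labels.append(k)
                break
        else:
            counts.append(1)
            labels.append(len(counts) - 1)
    return labels
```

Larger α raises the prior expected number of clusters, which is the effect whose influence on the measured descriptors panel a probes.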
Supplementary Figure 12 Data preprocessing steps.
a) Determination of the optimal merge time. Following the method of Annibale et al., we plot the total number of localisations in a representative image against the merge time for CD3ζ-mEos3 in primary T cells. The average optimum merge time was found to be three frames (30 ms). This example is representative of four such plots. b and c) Representative histograms of the localisation precisions calculated for CD3ζ data in resting T cells (b) and in activated cells (c), using the method of Quan et al.
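The merge-time analysis in panel a can be sketched as follows. Localisations that reappear close to an earlier one within a given number of dark frames are treated as blinks of the same molecule. The number of remaining localisations is then recorded for each merge time. The greedy first-match linking rule, the spatial radius and the name merge_blinks are our assumptions, not the exact procedure of Annibale et al. or of the authors.

```python
import math

def merge_blinks(locs, max_gap, radius):
    """Merge localisations given as (frame, x, y) tuples.

    A localisation is linked to an existing track if it appears at most
    `max_gap` dark frames after the track's last detection and within
    `radius` (nm) of its last position. Returns the number of tracks,
    i.e. the localisation count after merging.
    """
    tracks = []  # each track stores [last_frame, x, y]
    for frame, x, y in sorted(locs):
        for t in tracks:
            if 0 < frame - t[0] <= max_gap + 1 and math.hypot(x - t[1], y - t[2]) <= radius:
                t[0], t[1], t[2] = frame, x, y
                break
        else:
            tracks.append([frame, x, y])
    return len(tracks)
```

Sweeping max_gap, for example [merge_blinks(locs, g, 50.0) for g in range(20)], yields the count-versus-merge-time curve. At the 10 ms frame time implied by the caption (three frames = 30 ms), each frame of gap corresponds to 10 ms.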
Supplementary Figure 13 Bayesian analysis of CD3ζ clustering in activated primary human T cells with localizations from the same ROI divided into two equally sized data sets.
Bayesian analysis of CD3ζ clustering in activated primary human T cells with localisations from the same ROI divided into two equally-sized data sets. a) and b) Highest scoring cluster proposals. c) and d) Log(posterior probability) heat maps.
Supplementary Figure 14 Data on CD3ζ clustering analysed by alternative approaches in resting and activated primary human T cells.
a) Representative cluster maps (n = 30) generated by Getis and Franklin’s Local Point Pattern Analysis of a 3,000 × 3,000 nm area. b) Representative binary maps showing clustered areas. c) Ripley’s K function (average of n = 30 regions). d) Pair correlation curves (average of n = 30 regions) and e) Cluster statistics on the percentage of localisations found in clusters, number of clusters per region and cluster radii extracted from the binary maps in b).
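Getis and Franklin's local point pattern analysis gives each point a local value L_i(r) = sqrt(A · k_i(r) / (π (n − 1))). Here k_i(r) is the number of other points within r and A is the ROI area; under CSR, L_i(r) is about r.

The point-based sketch below thresholds L_i(r) and then groups flagged points by connectivity. The maps in this figure are interpolated onto a grid before thresholding, so this is a simplification. The threshold, the linking distance and the function names are hypothetical.

```python
import math

def local_l(points, r, area):
    """Getis-Franklin local L_i(r) for every point."""
    n = len(points)
    out = []
    for i, p in enumerate(points):
        k = sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= r)
        out.append(math.sqrt(area * k / (math.pi * (n - 1))))
    return out

def clustered_mask(points, r, area, threshold):
    """Flag points whose local L_i(r) exceeds the threshold."""
    return [v > threshold for v in local_l(points, r, area)]

def group_flagged(points, mask, link_dist):
    """Connected components of flagged points within link_dist; unflagged points get None."""
    labels = [None] * len(points)
    current = 0
    for i in range(len(points)):
        if not mask[i] or labels[i] is not None:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            a = stack.pop()
            for b in range(len(points)):
                if mask[b] and labels[b] is None and math.dist(points[a], points[b]) <= link_dist:
                    labels[b] = current
                    stack.append(b)
        current += 1
    return labels
```

From the resulting labels, cluster counts, the fraction of points in clusters and cluster sizes can be tabulated in the same spirit as panel e.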
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–14 (PDF 5755 kb)
Supplementary Software
R code for running Bayesian cluster analysis (ZIP 103 kb)
About this article
Cite this article
Rubin-Delanchy, P., Burn, G., Griffié, J. et al. Bayesian cluster identification in single-molecule localization microscopy data. Nat Methods 12, 1072–1076 (2015). https://doi.org/10.1038/nmeth.3612
This article is cited by
- A framework for evaluating the performance of SMLM cluster analysis algorithms. Nature Methods (2023)
- Multicolor lifetime imaging and its application to HIV-1 uptake. Nature Communications (2023)
- Acute Transcriptomic and Epigenetic Alterations at T12 After Rat T10 Spinal Cord Contusive Injury. Molecular Neurobiology (2023)
- In-situ study of the impact of temperature and architecture on the interfacial structure of microgels. Nature Communications (2022)
- Correction of multiple-blinking artifacts in photoactivated localization microscopy. Nature Methods (2022)