Main

The analysis of experimental data often necessitates statistical comparisons of similarity between observations that incorporate either systematic or purely random measurement errors. Systematic errors derived from the instrument, sample or data processing steps require experiment-dependent treatment, whereas random errors are taken into account using standard procedures1. However, it can be difficult at a practical level to obtain accurate error estimates required for valid statistical testing. Therefore, we developed an approach to quantitatively assess data-data comparisons and to evaluate model-data correspondence without the need to explicitly estimate experimental errors. We apply a correlation-based statistical test to evaluate small-angle X-ray scattering (SAXS) data obtained from biological macromolecules in solution. This method, CorMap, should also be generally applicable for experiments producing oversampled one-dimensional data sets of discrete data points.

For SAXS, scattering intensities I(q) are recorded as a function of angle, or momentum transfer q = 4πsinθ/λ, where λ is the X-ray wavelength and 2θ is the scattering angle (ref. 2). The successful interpretation of one-dimensional SAXS data is marred by a number of pitfalls that need addressing before reporting conclusions on the basis of these data3. In particular, when the statistical similarity between experimentally obtained intensities, Iexp(q), and those computed from a model, Icalc(q), is evaluated using the reduced χ2 statistic4

$$\chi^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left[ \frac{I_{\mathrm{exp}}(q_k) - I_{\mathrm{calc}}(q_k)}{\sigma(I_{\mathrm{exp}}(q_k))} \right]^2$$

at n experimental data points, it is necessary that the experimental errors, σ(Iexp(qk)), are correctly estimated in order for the test to be statistically valid. If this condition is met and the model adequately describes the experimental data, the resulting χ2 should be in the range 0.9 ≤ χ2 ≤ 1.1 (Supplementary Fig. 1). However, the true values of σ(Iexp(qk)), i.e., σ(I(qk)), are always unknown and have to be estimated from the data assuming Poisson statistics (Supplementary Fig. 2 and Online Methods). Accurate propagation of the recorded errors through data processing steps is also nontrivial2. Consequently, and especially at the modern high-throughput SAXS facilities, there is a risk of collecting thousands of data sets with poorly determined or incorrectly propagated errors that may invalidate the assessments of data-data or data-model fits. This problem is also evident in fields other than SAXS5.
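The sensitivity of the reduced χ2 statistic to the error estimates can be illustrated with a short sketch. The function below assumes the common form with n − 1 degrees of freedom and no fitted scale factor; both choices, the function name and all numbers are illustrative assumptions rather than details taken from this work.

```python
def reduced_chi2(i_exp, i_calc, sigma):
    """Reduced chi-square between measured and computed intensities.

    Assumes n - 1 degrees of freedom and no fitted scale factor
    (illustrative choices, not taken from the text).
    """
    n = len(i_exp)
    if not (n == len(i_calc) == len(sigma)) or n < 2:
        raise ValueError("need three equally long sequences with n >= 2")
    total = sum(((e - c) / s) ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    return total / (n - 1)


# Hypothetical data: the model is off by exactly one sigma at every point.
i_calc = [100.0, 80.0, 60.0, 40.0, 20.0]
sigma = [10.0, 9.0, 8.0, 6.0, 4.0]
i_exp = [c + s for c, s in zip(i_calc, sigma)]

chi2_ok = reduced_chi2(i_exp, i_calc, sigma)

# Same data, but every error is underestimated by a factor of 2:
# the statistic is inflated by a factor of 4.
chi2_bad = reduced_chi2(i_exp, i_calc, [s / 2 for s in sigma])
```

With unchanged intensities, halving the assumed errors quadruples the statistic, which is why a fit can appear to fail purely because the errors were mis-specified.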

We propose a statistically valid approach that uses only the experimental intensities. When a photon-counting detector is used, each radially averaged Iexp(qk) may be considered as a sample drawn from a normally distributed random variable with expected value I(qk) and s.d. σ(I(qk)), i.e., Iexp(qk) ∼ N(I(qk), σ(I(qk))) (Supplementary Fig. 2 and Online Methods). Therefore, an entire scattering profile, S, that is, a collection of n normally distributed Iexp(qk) data points, may be regarded as a single draw from an n-variate normal distribution, S ∼ N(J, Σ), where J is the vector of expected intensities of S and Σ is its variance-covariance matrix

$$\Sigma = \begin{pmatrix} \sigma^2(I(q_1)) & \cdots & \mathrm{cov}(I(q_1), I(q_n)) \\ \vdots & \ddots & \vdots \\ \mathrm{cov}(I(q_n), I(q_1)) & \cdots & \sigma^2(I(q_n)) \end{pmatrix}$$

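In synchrotron SAXS, data are usually recorded in multiple short frames to monitor for systematic deviations, for example, radiation damage. Inserting the variances estimated from the observed experimental intensities Iexp,i(qk) of the m frames (i = 1, ..., m) along the diagonal of Σ,

$$\hat{\sigma}^2(I(q_k)) = \frac{1}{m-1} \sum_{i=1}^{m} \left( I_{\mathrm{exp},i}(q_k) - \bar{I}(q_k) \right)^2$$

and the off-diagonal covariance estimates between all pairs of points qk and ql

$$\widehat{\mathrm{cov}}(I(q_k), I(q_l)) = \frac{1}{m-1} \sum_{i=1}^{m} \left( I_{\mathrm{exp},i}(q_k) - \bar{I}(q_k) \right) \left( I_{\mathrm{exp},i}(q_l) - \bar{I}(q_l) \right)$$

where

$$\bar{I}(q_k) = \frac{1}{m} \sum_{i=1}^{m} I_{\mathrm{exp},i}(q_k)$$

the corresponding correlations

$$r_{kl} = \frac{\widehat{\mathrm{cov}}(I(q_k), I(q_l))}{\hat{\sigma}(I(q_k)) \, \hat{\sigma}(I(q_l))}$$

may be computed, with −1 ≤ rkl ≤ +1. If there are no differences among the data sets, the Iexp(qk) values are normally distributed at all qk and are also jointly normally distributed and uncorrelated, and thus Iexp(qk) and Iexp(ql) are independent for all 1 ≤ k, l ≤ n and k ≠ l (Supplementary Fig. 2), i.e., the resulting correlations rkl are random values. This random property allows one to evaluate the similarity between data when comparing m ≥ 2 data frames, and also between modeled and experimental scattering intensities, without the need to explicitly estimate experimental errors.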
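As a rough illustration of the correlations described here, the following pure-Python sketch estimates rkl from m frames recorded on a common q grid. The function name `cormap` and the 1/(m − 1) normalization are assumptions of this sketch (the normalization cancels in rkl), and points with zero variance across frames are arbitrarily assigned a correlation of 0.

```python
import math


def cormap(frames):
    """Correlation matrix r[k][l] between q points, estimated over m frames.

    frames: list of m intensity lists, each with n values on the same q grid.
    """
    m, n = len(frames), len(frames[0])
    if m < 2 or any(len(f) != n for f in frames):
        raise ValueError("need m >= 2 frames on a common q grid")
    means = [sum(f[k] for f in frames) / m for k in range(n)]
    # dev[i][k]: deviation of frame i from the mean intensity at point k.
    dev = [[f[k] - means[k] for k in range(n)] for f in frames]
    cov = [[sum(d[k] * d[l] for d in dev) / (m - 1) for l in range(n)]
           for k in range(n)]
    sd = [math.sqrt(cov[k][k]) for k in range(n)]

    def corr(k, l):
        if sd[k] == 0.0 or sd[l] == 0.0:
            return 0.0  # arbitrary convention for constant points
        return cov[k][l] / (sd[k] * sd[l])

    return [[corr(k, l) for l in range(n)] for k in range(n)]


# Two hypothetical frames; point-wise differences are -1, +0.5, -0.5, +1.
r = cormap([[10.0, 8.0, 6.0, 4.0],
            [11.0, 7.5, 6.5, 3.0]])
```

For m = 2 every entry reduces to the product of the signs of the two point-wise differences, so a pairwise map contains only values of −1 and +1 (up to rounding).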

The numeric values of the rkl correlation matrix for successive frames may be visualized as a map, or CorMap, using gray levels ranging from −1 (black) to +1 (white), where the extent of the map corresponds to the selected q-range (Fig. 1). For example, the CorMap acquired from m = 20 × 50-ms frames of aqueous buffer solution (Fig. 1a,b) displays a random pattern without any obvious features, indicating no systematic differences between the scattering intensities. A similar random pattern is noted when analyzing the correlations acquired from a protein that is stable in the X-ray beam (Fig. 1d,e). Conversely, if differences arise within a multiple-frame data set, the features of the map change dramatically. The effects of severe radiation damage to a lysozyme sample (Fig. 1g,h) show long contiguous areas of both positive and negative correlations at low q, indicating systematic differences between data frames in this region of the profile. Less obvious features can also be revealed, for example, where scaling effects have been deliberately introduced into a data set to reflect poorly recorded sample transmissions (Fig. 1j,k). The CorMap shows nonrandom features, even though it is difficult to assess these inconsistencies by overlaying the individual one-dimensional scattering profiles.

Figure 1: CorMap visualization.

(a,d,g,j) One-dimensional SAXS data sets, each consisting of 20 data frames, obtained from aqueous buffer (a), glucose isomerase (solvent-subtracted data) (d), lysozyme with accumulating radiation damage (unsubtracted data) (g) and human serum albumin with scaling issues (j). (b,e,h,k) CorMaps corresponding to the data sets at left, computed from all 20 frames. (c,f,i,l) Selected pairwise comparisons of two frames each. The corresponding probabilities of similarity (P values) of the two-frame comparisons are 0.3400 (c), 0.097 (f), 0.0002 (i) and <10⁻⁵ (l).

Meaningful comparisons of two data frames are also possible. In the absence of systematic differences, the pairwise CorMaps display a randomized lattice pattern (Fig. 1c,f). If differences exist, nonrandom and contiguous areas—or 'patches'—of positive (+1) or negative (−1) correlations emerge (Fig. 1i,l). Such pairwise comparisons enable the identification of subtle changes between any two data frames or data sets, for example, the onset of radiation damage during data-frame collection, interparticle interference or concentration effects (Supplementary Figs. 3 and 4). Furthermore, CorMap can assess the quality of fits by the scattering computed from structural models against SAXS data; nonrandom patterns clearly and reliably point to systematic deviations for incorrect model fits. Shown in Figure 2 are examples of assessing the fits obtained during the course of ab initio bead modeling (Fig. 2a–c and Supplementary Video 1) and when employing rigid-body modeling (Fig. 2d–f).

Figure 2: CorMap applications.

(a) Fits to SAXS data of a starting (misfitting) and final (fitting) bead model of lysozyme obtained during ab initio modeling using the program DAMMIF (ref. 12). (b,c) Corresponding CorMaps for the initial (b; n = 449, C = 137, P < 10⁻⁶) and refined bead models (c; n = 449, C = 10, P = 0.3532). A spatial superposition of the final refined bead model and the lysozyme X-ray crystal structure is also displayed (black ribbons). (d) Comparisons of the experimental scattering from 3-phosphoglycerate kinase with the scattering calculated from a closed (liganded) or open X-ray crystal structure (top and bottom, respectively; ref. 13). (e,f) CorMap assessment of the rigid-body model fits (closed form (e): n = 2,039, C = 338, P < 10⁻⁶; open form (f): n = 2,039, C = 18, P = 0.011). (g,h) Zero-applied-field muon spin rotation (ZF-μSR) data obtained from a crystal of MnSi at 5 K with a fitted polarization function (ref. 10) (g) and corresponding assessment of the fit using CorMap (n = 200, C = 11, P = 0.0456) (h).

The probability of similarity in data-data or data-model comparisons may be quantified from pairwise CorMaps by evaluating whether the largest observed patch of contiguous −1 or +1 correlations is likely to occur by chance. The maximum edge length of the patches, C, follows the same distribution as that of the longest head-or-tail runs in coin-toss experiments as described by Schilling6 (Supplementary Fig. 5). The probability (P value) to obtain an edge length larger than C within an n-by-n correlation matrix is calculated from the Schilling distribution with parameters n and C (Supplementary Fig. 6 and Online Methods). If the P value is less than a predetermined significance level α, preferably α = 0.01 or less7, the observed differences between any two SAXS patterns may be considered statistically significant.
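For a pairwise comparison, the sample correlation between two points equals the product of the signs of the point-wise differences between the two profiles, so one way to obtain C is to find the longest run of consecutive points at which the difference keeps the same sign. The sketch below does this; the function name and the convention that an exact tie (zero difference) interrupts a run are assumptions made for illustration.

```python
def longest_same_sign_run(frame_a, frame_b):
    """Longest run of consecutive points where frame_a - frame_b keeps the
    same sign. Exact ties (zero difference) break a run."""
    best = run = 0
    prev = 0
    for a, b in zip(frame_a, frame_b):
        d = a - b
        sign = (d > 0) - (d < 0)  # +1, -1 or 0
        if sign != 0 and sign == prev:
            run += 1
        else:
            run = 1 if sign != 0 else 0
        prev = sign
        best = max(best, run)
    return best


# Hypothetical profiles: differences are +, +, -, -, -, -
c_patch = longest_same_sign_run([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                                [0.0, 1.0, 4.0, 5.0, 6.0, 7.5])
```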

To compare the statistical properties of CorMap with those of the reduced χ2 test, i.e., true and false positive rates, we simulated several thousand SAXS data sets to represent a number of experimental scenarios including systematic errors, random scaling errors and radiation damage (Supplementary Fig. 7). The false positive rates, i.e., the proportion of cases in which differences are flagged when there are none, were found to be 0.010 ± 0.003 for the reduced χ2 test and 0.019 ± 0.003 for CorMap (α = 0.01; Supplementary Table 1). The true positive rate of CorMap, i.e., the proportion of correctly identified differences, also known as the statistical power, was found to be similar to that of the reduced χ2 test (Supplementary Figs. 8, 9, 10, Supplementary Table 2 and Online Methods). However, CorMap is uncoupled from the requirement of explicitly defining correct experimental errors, so the test is widely applicable in situations where estimating errors is problematic.

The specification of experimental errors has a major effect on data analysis and model fitting in many physical experiments. For SAXS, there are no agreed-upon standards with respect to correct error estimation or propagation, thus prompting suggestions to find replacements for the reduced χ2 test. The recently proposed χ2free (ref. 8), a resampling-based adaptation of the reduced χ2 test, has the same limitation as the reduced χ2 test in that χ2free is valid only when correct error estimates are available. Moreover, under these valid-only circumstances, χ2free and the reduced χ2 test are equivalent (Supplementary Figs. 11 and 12); if the errors are randomly chosen, correspondingly randomized results may be observed (Supplementary Fig. 13). An alternative approach using the paired t-test9 cannot be applied to SAXS data analysis, as the t-test's requirement of identically distributed data is not met (Supplementary Fig. 2b).

CorMap offers a valid approach to evaluate discrepancies between SAXS data sets or data-model fits that overcomes the issue of correctly estimating experimental errors and identifies the q-range where the largest dissimilarity occurs. Beyond SAXS, we anticipate that CorMap may be applied to assess differences between discrete oversampled one-dimensional data from various scattering, reflectometry and other spectroscopic techniques as well as from other fields of physics. We demonstrate one example: a polarization function used to model experimental zero-applied-field muon spin rotation (ZF-μSR) data10 (Fig. 2g,h); here, the patch sizes appear larger in the ZF-μSR CorMap compared to in the SAXS examples because the ZF-μSR spectra have a lower number of experimental points. CorMap is implemented in the ATSAS software package11 as a command-line module and graphical user interface, and is freely available for academic use (http://www.embl-hamburg.de/biosaxs/download.html). In addition, the source code of the calculation of the P value is available to academic users upon request. The results of CorMap should be reported as: “The hypothesis of similarity of experimental data and model could [not] be rejected (CorMap test, n points, C = XXX, P = x.xxx).”

Methods

Experimental SAXS and sample details.

Continuous-flow sample injection was performed at 20 °C using a temperature-controlled EMBL/ESRF automated sample changer equipped with a 1.8-mm quartz capillary sample cell held under vacuum (ref. 14). SAXS data (20 × 50-ms frames) were collected from several protein samples and their associated matched solvent blanks: (i) glucose isomerase (Hampton Research): 5.8 mg/ml dialyzed against 200 mM Na2SO4, 50 mM K2SO4, 1 mM MgCl2, 50% (v/v) 2H2O, 20 mM HEPES, pH 7.0; (ii) chicken egg white lysozyme (USB Corporation): 2.6 mg/ml dialyzed against 20 mM CH3COONa, 20 mM HEPES, pH 6.8; (iii) bovine pancreatic RNase A, RNase (Sigma): 3.7 mg/ml, 7.5 mg/ml and 15 mg/ml in phosphate buffered saline, pH 7.0; (iv) human serum albumin, HSA (Sigma): 5 mg/ml, 10 mg/ml and 20 mg/ml in phosphate buffered saline, pH 7.0; (v) HSA (Sigma): 3.65 mg/ml dialyzed against 50 mM HEPES, pH 7.5.

The solvent scattering contributions were subtracted to obtain the scattering from each protein in solution. To test the onset of radiation damage (Supplementary Fig. 3), we collected lysozyme SAXS data as described above, without solvent subtraction, from protein samples prepared at 7.9 mg/ml in 40 mM NaCl, 20 mM CH3COONa, 20 mM HEPES, pH 4.5. Severe radiation damage was tested on lysozyme samples at 8.4 mg/ml in 150 mM NaCl, 40 mM CH3COONa, pH 3.8 (Fig. 1g). To simulate poorly recorded sample transmissions, SAXS data from HSA in HEPES buffer (v) were recorded, and deliberate scaling errors were applied to randomly selected data frames by multiplying the sample scattering data before solvent subtraction (Fig. 1j). The multiplication factors introduced were, in order: 1.00, 1.01, 1.02, 0.99, 0.96, 1.04, 1.00, 1.00, 1.00, 0.96, 1.00, 1.00, 1.02, 0.96, 1.00, 0.94, 1.04, 0.96, 1.00 and 1.00. In all instances, protein concentrations were estimated using the extinction coefficient calculated from the amino acid sequence (ProtParam; ref. 15).
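The effect of such scaling factors can be sketched in a few lines. The factors below are the ones listed in the text; the function name and the two-point sample and solvent intensities are hypothetical and serve only to show how a small scaling error on the sample propagates after solvent subtraction.

```python
# Multiplication factors quoted in the text, one per frame, in order.
FACTORS = [1.00, 1.01, 1.02, 0.99, 0.96, 1.04, 1.00, 1.00, 1.00, 0.96,
           1.00, 1.00, 1.02, 0.96, 1.00, 0.94, 1.04, 0.96, 1.00, 1.00]


def subtract_solvent(sample_frames, solvent, factors):
    """Scale each sample frame by its factor, then subtract the solvent."""
    if len(sample_frames) != len(factors):
        raise ValueError("one factor per frame is required")
    return [[f * s - b for s, b in zip(frame, solvent)]
            for frame, f in zip(sample_frames, factors)]


# Hypothetical intensities in which the solvent dominates the raw signal.
solvent = [100.0, 50.0]
sample = [105.0, 52.0]
frames = [sample] * len(FACTORS)

subtracted = subtract_solvent(frames, solvent, FACTORS)
```

In this made-up case a 2% scaling error on the raw sample scattering (third frame) changes the subtracted intensity at the first point from 5.0 to about 7.1, i.e., by roughly 40%.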

Experimental setup and data collection.

SAXS intensities, I(q) vs. q, where q = 4π sinθ/λ, λ = 0.124 nm is the X-ray wavelength and 2θ is the scattering angle, were collected at the EMBL BioSAXS P12 beam line (PETRA-III storage ring, DESY, Hamburg, Germany) equipped with a DECTRIS Pilatus 2M photon-counting detector. Radial averaging to produce 1D scattering profiles from the recorded 2D data was performed using RADAVER, which is integrated into the automated P12 data acquisition and analysis pipeline (ref. 16). All experimental data were recorded on a relative scale. The radial averaging process, which assumes error estimates based on Poisson counting statistics, is as follows (in Supplementary Fig. 14, the red square denotes the beam center):

1. To obtain the intensity for a single value of q, find all N pixels with constant distance (black) from the beam-center (red) (Supplementary Fig. 14), without applying anti-aliasing or 'pixel splitting', as this produces correlations in neighboring intensities.

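2. Compute the average intensity Ipoi at this distance

$$I_{\mathrm{poi}} = \frac{1}{N} \sum_{k=1}^{N} p_k$$

where pk is the kth pixel intensity.

3. Compute the standard error for intensity Ipoi

$$\sigma_{\mathrm{poi}} = \frac{\sqrt{I_{\mathrm{poi}}}}{\sqrt{N}}$$

with √Ipoi the s.d. of the Poisson variable Ipoi and √N the reduction factor of the s.e.m.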

4. Normalize the intensity and its standard error by the exposure time t and by the intensity of the transmitted beam d, and scale both to unit time T.

Scaling the intensity and the error term by the same value may seem counterintuitive; however, the standard error estimates the accuracy of the point estimate, not its variation, and the relative accuracy is unchanged when the intensity is rescaled.
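The averaging steps above can be sketched as follows. This is not the RADAVER implementation; the function name is hypothetical, and the single `scale` argument is an assumption of this sketch that stands in for whatever combination of t, d and T is applied in the normalization step.

```python
import math


def radial_bin(pixel_counts, scale=1.0):
    """Average intensity and Poisson-based standard error for the N pixels
    found at one distance from the beam center.

    `scale` is a hypothetical stand-in for the normalization step; it is
    applied to the intensity and to its standard error alike.
    """
    n = len(pixel_counts)
    if n == 0:
        raise ValueError("no pixels at this distance")
    i_poi = sum(pixel_counts) / n                   # step 2: mean count
    sigma_poi = math.sqrt(i_poi) / math.sqrt(n)     # step 3: s.d. / sqrt(N)
    return i_poi * scale, sigma_poi * scale         # step 4: same factor


# Hypothetical counts in four pixels at the same distance.
intensity, error = radial_bin([90, 110, 100, 100])
scaled_intensity, scaled_error = radial_bin([90, 110, 100, 100], scale=0.5)
```

Because both values are multiplied by the same factor, the ratio of error to intensity is the same before and after normalization.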

Scattering patterns as random variables.

For assessment of the statistical properties of experimentally recorded scattering intensities at each point in q, several thousand SAXS data sets from water were recorded spanning a momentum transfer of 0.03 < q < 4.4 nm⁻¹: (i) 10,000 consecutive frames of 0.1 s, (ii) 2,000 consecutive frames of 1 s and (iii) 500 consecutive frames of 10 s. The analysis of the statistical properties of SAXS intensities recorded from water (Supplementary Fig. 2) indicates that, for a data set consisting of K q points (k = 1, ..., K), the experimental Iexp(qk) at each qk follow Poisson counting statistics (ref. 1). With a sufficiently large number of counts Nk, the distribution of Iexp(qk) approaches a Gaussian distribution with mean Nk and s.d. σ(Iexp(qk)) = √Nk (cf. radial averaging above), i.e., the Iexp(qk) values follow a normal distribution in accordance with the central limit theorem (CLT), and the variances decrease with extended exposure time in accordance with the s.e.m. (Supplementary Fig. 2a,b).

An analysis of the pairwise joint distribution for any qk and ql where k ≠ l shows good agreement with a two-dimensional normal distribution (Supplementary Fig. 2c), and the correlation matrix (Supplementary Fig. 2d) shows that no two points are correlated with each other. Therefore, the scattering intensities recorded at each value of q are statistically independent, i.e., (pairwise) jointly normally distributed and uncorrelated.

Correlation Map: theoretical distribution, P value and approximation.

When analyzing 5,000 independent pairwise correlation matrices derived from 10,000 experimental water frames, we found that the distribution of the edge length C of the largest contiguous area of similar correlation, i.e., contiguous patches of −1 or +1, may be described by the same distribution that models the longest head-or-tail runs in coin-toss experiments, for example, as in Schilling (ref. 6; Supplementary Fig. 5). Consequently, the probability to obtain a run Rn of more than C consecutive data points with the same direction of correlation is given by

$$P(R_n > C) = 1 - \frac{A_{n-1}(C-1)}{2^{n-1}}$$

where An(C), the number of sequences of n coin tosses in which the longest run of heads does not exceed C, is defined by Schilling's equation (1) as

$$A_n(C) = \begin{cases} \sum_{j=0}^{C} A_{n-1-j}(C) & \text{for } n > C \\ 2^n & \text{for } n \le C \end{cases}$$

and the shift from (n, C) to (n − 1, C − 1) follows because “the distribution of the longest run of heads or tails for a fair coin is simply the distribution of the longest run of heads alone for a sequence containing one fewer coin toss, shifted to the right by one” (ref. 6).

The toss-a-coin principle directly applies to a sequence of one-dimensional discrete data. Indeed, if two data sets are identical up to noise, the difference between them is that of two random numbers. The difference of two normally distributed random variables is a normally distributed random variable. The mean is the difference of the two means, i.e., 0 in our case. Hence, owing to symmetry of the normal distribution, the probability of the variable being positive or negative is exactly as for tossing a fair coin, i.e., 0.5. And the two subsequent data points are independent, exactly as for toss-a-coin.
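The coin-toss argument can be checked numerically with a small simulation. The sketch below draws two noisy frames from the same expected intensities and records the sign of each point-wise difference; the decaying profile, the square-root error model, the seed and the function name are arbitrary assumptions chosen only for illustration.

```python
import random


def sign_sequence(expected, sigma, rng):
    """Signs (+1 or -1) of the point-wise difference between two noisy
    frames drawn from the same expected intensities."""
    signs = []
    for mu, s in zip(expected, sigma):
        d = rng.gauss(mu, s) - rng.gauss(mu, s)
        signs.append(1 if d > 0 else -1)
    return signs


rng = random.Random(0)  # fixed seed for reproducibility
n = 10000
expected = [1000.0 / (1 + k) for k in range(n)]  # hypothetical profile
sigma = [mu ** 0.5 for mu in expected]           # hypothetical errors

signs = sign_sequence(expected, sigma, rng)

# Fraction of positive differences: close to 0.5, as for a fair coin.
frac_positive = signs.count(1) / n

# Fraction of neighbouring points sharing a sign: also close to 0.5
# when successive points are independent.
same_neighbour = sum(a == b for a, b in zip(signs, signs[1:])) / (n - 1)
```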

The probability that two SAXS data sets are similar may thus be obtained from a pairwise CorMap consisting of n × n data points by calculating the exact Schilling distribution for n points and computing the probability of obtaining an edge length larger than that of the largest observed patch. If this probability is less than the predefined significance level α, the hypothesis of similarity between the frames has to be rejected. Examples of the Schilling distribution for various n are shown in Supplementary Figure 6, as well as the effects of SAXS data rebinning in q, for example, averaging intensities over selected Δq intervals, which is often employed as a data processing step.
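A minimal sketch of this calculation is given below. It is not the ATSAS implementation: it assumes the standard recursion for the number of head/tail sequences whose longest run of heads is bounded, together with the shift by one toss for head-or-tail runs, and all function names are hypothetical. A brute-force enumeration is included so that the result can be cross-checked for small n.

```python
import itertools


def schilling_count(n, x):
    """Number of length-n head/tail sequences whose longest run of heads is
    at most x: A_n(x) = sum_{j=0..x} A_{n-1-j}(x) for n > x, else 2**n."""
    a = []
    for i in range(n + 1):
        if i <= x:
            a.append(2 ** i)
        else:
            a.append(sum(a[i - 1 - j] for j in range(x + 1)))
    return a[n]


def cormap_p_value(n, c):
    """P(R_n > c): probability that the longest run of heads OR tails in
    n fair coin tosses exceeds c."""
    if n < 1 or c < 1:
        raise ValueError("n and c must be positive")
    total = 2 ** (n - 1)
    # Head-or-tail runs in n tosses map onto head runs in n - 1 tosses,
    # shifted by one; exact integers avoid rounding for tiny probabilities.
    return (total - schilling_count(n - 1, c - 1)) / total


def brute_force_p(n, c):
    """Same probability by enumerating all 2**n sequences (small n only)."""
    hits = 0
    for seq in itertools.product((0, 1), repeat=n):
        longest = max(len(list(g)) for _, g in itertools.groupby(seq))
        hits += longest > c
    return hits / 2 ** n
```

A call such as `cormap_p_value(n, C)` then yields the value to compare against α; the exact integer arithmetic keeps the result meaningful even when n is in the thousands.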

The P value has a clear statistical meaning and, as such, is a correct measure of similarity. The situation is similar to the case of reduced χ2 test, where, given the χ2 value, the goodness of fit should be judged on the basis of the associated P value computed from the χ2 distribution. In practice, χ2 itself is most often reported (cf. Supplementary Fig. 1) and not its P value. To have a similar shorthand measure for the CorMap, one may consider, as a rule of thumb, a z-score approximation directly related to the longest edge length

In practical terms, z values exceeding 3 indicate statistically significant differences between the data sets. However, use of the exact P value is strongly recommended.

Correlation Map: Bonferroni correction for multiple testing.

In instances when it is necessary to evaluate multiple pairwise tests derived from several comparisons (for example, Supplementary Figs. 3 and 4), the P values for the CorMaps are adjusted by the Bonferroni method

$$p_{\mathrm{adj}} = \min(m \cdot p, 1)$$

where p = P(Rn > C) and m is the number of tests. The adjusted P value is then compared to the predefined significance level α. Of note, other adjustments for multiple testing are possible and possibly more powerful (Bonferroni-Holm, Benjamini-Hochberg); however, in the context of this manuscript, the classical approach of Bonferroni was chosen for its simplicity.
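The Bonferroni adjustment amounts to multiplying each P value by the number of tests and capping the result at 1. A minimal sketch, with a hypothetical function name and made-up P values:

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P values: min(m * p, 1) for m tests."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


# Hypothetical example: all 45 pairwise comparisons among ten frames,
# one of which gave an unadjusted P value of 0.0002.
raw = [0.0002] + [0.5] * 44
adjusted = bonferroni(raw)

# The smallest adjusted value can serve as the global result.
global_p = min(adjusted)
significant = global_p < 0.01
```

Here the single small P value remains below α = 0.01 after adjustment (45 × 0.0002 = 0.009), whereas an unadjusted value of 0.001 would not.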

Correlation Map: visualization.

In Octave (http://www.octave.org/) or Matlab (http://www.mathworks.com/):

% Load two or three column data (s,I,err) as ascii

% without headers or footers

d1 = dlmread('file1.dat');

d2 = dlmread('file2.dat');

% Build data matrix: after the transpose, one frame per row and one q point per column

data = [d1(:,2), d2(:,2)]';

% Plot the correlation map

imagesc(corr(data, data))

Comparing statistical tests.

To assess the ability of the CorMap to hold the type I error (false positive rate) and evaluate its power (true positive rate) in comparison to the reduced χ2 test, we generated a large pool of data consisting of several thousand simulated scattering profiles of monomeric bovine serum albumin (BSA, PDB: 3V03) and tested several hypotheses. Simulated SAXS data are very useful for comparative testing in that they provide a standard frame of reference that consists of a sufficiently large sample set for significance testing without being influenced by unknowable experimental uncertainties in intensity and error estimates. Notably, this approach ensures that the CorMap is always compared to a reduced χ2 test with known errors. The simulated data were computed by taking the expected solution scattering intensities, I(qk), from the atomic structure of BSA using Crysol17 and introducing statistical variations based on the variation information obtained from real data (Supplementary Fig. 2). All calculations were done using Octave (http://www.octave.org/) and Matlab (http://www.mathworks.com/). Each of the simulated data sets is composed of m = 10 data frames to more realistically model frame-by-frame variations.

The hypotheses used to test the reduced χ2 test and the CorMap were formulated to reflect 'real-world' scenarios and were as follows:

H0: no systematic differences among SAXS data frames (10,000 times ten frames of simulated data to evaluate the type I error, i.e., the probability that a difference is detected although there is none).

H1: random shifts in I(q) simulating systematic errors, for example, incorrectly matched solvents or sample fluorescence (2,000 times ten simulated frames).

H2: random scaling to simulate systematic error due to incorrect sample transmissions (2,000 times ten simulated frames).

H3,H4,H5: examples of radiation damage (2,000 times ten simulated frames). Note, as it is very difficult to specifically characterize radiation damage at the molecular level, the H3–H5 scenarios were modeled using simple additive contributions to the scattering patterns, based on empirical estimates of radiation damage observed during experiments at P12 (ref. 18) (Supplementary Fig. 7).

Both the CorMap and reduced χ2 were applied to all pairs of frames within a data set, and the P value for each test was adjusted for multiple testing by the Bonferroni correction. The smallest P value after this adjustment was selected as the global result of all pairwise tests, and the number of statistically significant results across the simulated data sets was counted, i.e., where the adjusted P value was less than the significance level α = 0.01, to determine empirical values for type I error and power. Finally the 99% Clopper-Pearson confidence intervals19 were computed to facilitate the comparison of results; overlapping confidence intervals indicate equivalent tests at that effect size; fully separated intervals indicate significant differences between the tests. Both the reduced χ2 and the CorMap tests approximately hold the type I error level at α = 0.01 (0.010 ± 0.003 and 0.019 ± 0.003, respectively; Supplementary Table 1). The CorMap test is more powerful, according to the Clopper-Pearson confidence intervals, than the reduced χ2 test in four out of five alternatives (Supplementary Fig. 8b–e and Supplementary Table 2). Only when detecting random shifts (H1) does χ2 display slightly more power than the CorMap (Supplementary Fig. 8a).
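For readers who wish to reproduce this kind of interval, the sketch below computes a Clopper-Pearson interval by bisection on the binomial distribution function, using only the Python standard library. The function names and the counts in the example (100 significant results out of 10,000) are illustrative assumptions, not values taken from the Supplementary Tables.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms evaluated in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        total += math.exp(math.lgamma(n + 1) - math.lgamma(i + 1)
                          - math.lgamma(n - i + 1)
                          + i * log_p + (n - i) * log_q)
    return min(1.0, total)


def clopper_pearson(k, n, conf=0.99):
    """Exact (Clopper-Pearson) confidence interval for k successes in n."""
    alpha = 1.0 - conf

    def solve(kk, target):
        # binom_cdf(kk, n, p) decreases with p; bisect for the crossing.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if binom_cdf(kk, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k) = alpha/2, i.e. P(X <= k-1) = 1 - alpha/2.
    lower = 0.0 if k == 0 else solve(k - 1, 1.0 - alpha / 2)
    # Upper bound: P(X <= k) = alpha/2.
    upper = 1.0 if k == n else solve(k, alpha / 2)
    return lower, upper


# Hypothetical count: 100 significant results in 10,000 simulated data sets.
low, high = clopper_pearson(100, 10000)
```

Two tests can then be compared at a given scenario by checking whether their intervals overlap.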

For comparing the performance of the CorMap with the reduced χ2 test for assessing SAXS data model fits, an additional 10,000 simulated data sets of BSA were generated and compared to the scattering calculated for a set of BSA models (Supplementary Fig. 9). Fits against the data were determined for the native BSA monomer (Supplementary Fig. 10a,c) to estimate the type I error, as well as a set of 23 hypothetical structures where the Tyr496-to-Val497 bond angle was rotated in consecutive steps of 1° to introduce ever-increasing systematic differences relative to the native structure (Supplementary Fig. 10b,d). Both the CorMap and χ2 test hold the type I error level (0.012 ± 0.003 and 0.013 ± 0.003, respectively), and the CorMap is about as powerful to detect systematic differences in data fitting as the reduced χ2 test (Supplementary Fig. 10e).

Evaluation of χ2free.

After performing comparisons of the CorMap with the reduced χ2, we also intended to rigorously compare its statistical power for discerning data model fits with the recently developed χ2free test8. However, when 23,000 pairwise model-fit test comparisons of χ2 and χ2free are plotted, the corresponding values of χ2 and χ2free are, up to the sampling variation of χ2free, identical. A one-to-one correspondence between χ2 and χ2free is observed for all data model fits when the errors on the data have been correctly specified (Supplementary Fig. 11). To verify that this is not an artifact, we tested another example with correct, but different, error structure. We simulated 1,000 frames of BSA with the commonly assumed constant 3% relative error across the whole q-range instead of an empirical error structure (Supplementary Fig. 1b). Again, we found the values of χ2 and χ2free are, up to the sampling variation of χ2free, in essence identical (Supplementary Fig. 12). Therefore, the result of one-to-one correspondence, if the correct errors are available, is independent of the applied error structure. We note that the observed upwards shift from the diagonal, i.e., from a perfect 1:1 correlation between χ2 and χ2free (Supplementary Figs. 11 and 12b), may be attributed to the maximum particle dimension, Dmax, parameter of χ2free; changing the estimated Dmax modulates the outcome of the computation.

Four additional cases comparing χ2free and the reduced χ2 test were considered (Supplementary Fig. 13): (i) the general proportions of the errors are correct, but exactly half the magnitude; (ii) likewise, but twice the magnitude; (iii) random permutation of the previously correct error estimates to random positions; and (iv) assumption of a constant 75% relative error. A total of 23,000 model fit test comparisons of χ2 and χ2free were calculated and compared to the 23,000 cases with correct error structure (identical to Supplementary Fig. 11, marked with a circle in each panel of Supplementary Fig. 13 as a reference). It is notable that with incorrect errors, both tests report values that indicate that differences are present (i.e., outside the interval 0.9–1.1) even in the cases where there are no statistical differences. This corresponds to a false positive error rate of 100%; hence, both tests are invalid. Any previously reported improvements of stability of χ2free over χ2 (ref. 8) have thus to be attributed to coincidence. Consequently, χ2free affords no advantage over the reduced χ2 test when data-model fits are evaluated because both are equivalent under valid test conditions. Therefore, the statistical power of a valid χ2free test in comparison to the CorMap is the same as that of a valid reduced χ2 test, and thus χ2free was not considered in more detail. All results shown here were computed using the reference implementation of χ2free, which was provided on request by R.P. Rambo.

Accession codes.

Small Angle Scattering Biological Data Bank: SAXS data have been deposited under accession codes SASDAB6 (xylose isomerase), SASDAA6 (human serum albumin) and SASDA96 (chicken lysozyme).