Abstract
We present a broadly applicable, user-friendly protocol that incorporates sparse and hybrid experimental data to calculate quasi-atomic-resolution structures of molecular machines. The protocol uses the HADDOCK framework, accounts for extensive structural rearrangements both at the domain and atomic levels and accepts input from all structural and biochemical experiments whose data can be translated into interatomic distances and/or molecular shapes.
Acknowledgements
This work was supported by the EMBL, the EU FP7 ITN project RNPnet (contract number 289007) and the DFG grant CA294/3-2. E.K. acknowledges support from the Alexander von Humboldt Foundation through a Humboldt Research Fellowship for Postdoctoral Researchers. We thank J. Kirkpatrick for critical reading of the manuscript and B. Simon for discussion and support with CNS. A.M.J.J.B. acknowledges funding from the European H2020 e-Infrastructure grants West-Life (grant no. 675858) and BioExcel (grant no. 675728).
Author information
Contributions
E.K. designed the studies, developed software, performed structure calculations, analyzed and interpreted data and wrote the manuscript; J.P.G.L.M.R. developed software; A.G. analyzed experiments; A.M.J.J.B. provided software and assisted in software development; T.C. designed the studies, assisted in data interpretation, wrote the manuscript and supervised the project.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Sparse experimental data leads to a non-normal right (positive) skewed Eexp distribution.
a. When only sparse experimental data are available, the global search generates few structures with significantly low Eexp. b. Low-Eexp structures can be distinguished from the rest of the population by transforming the Eexp values into ln(Eexp). This transformation yields a left (negative) skewed distribution. c. Structures with significantly low Eexp can be isolated as outliers (green circles) by using a box-and-whisker plot in which the whiskers are extended by two IQRs. The green line indicates the median.
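The selection described in panels b and c can be sketched in a few lines of Python. This is a minimal illustration, not the M3 implementation: the function name `low_energy_outliers` is hypothetical, the lower fence is assumed to be Q1 minus two IQRs (one reading of "whiskers extended by two IQRs"), and the quartile convention (`method="inclusive"`) is an arbitrary choice that the caption does not specify.

```python
import math
import statistics


def low_energy_outliers(e_exp, whisker=2.0):
    """Return indices of structures whose ln(Eexp) lies below the lower whisker.

    Assumption: the lower fence is Q1 - whisker * IQR, computed on ln(Eexp).
    All Eexp values must be strictly positive for the logarithm to be defined.
    """
    log_e = [math.log(e) for e in e_exp]
    # Quartiles of the log-transformed energies (inclusive interpolation).
    q1, _median, q3 = statistics.quantiles(log_e, n=4, method="inclusive")
    lower_fence = q1 - whisker * (q3 - q1)
    return [i for i, value in enumerate(log_e) if value < lower_fence]


if __name__ == "__main__":
    # Hypothetical energies: one structure fits the data far better than the rest.
    energies = [1.0, 148.4, 164.0, 181.3, 200.3, 221.4, 244.7, 270.4, 298.9, 330.3]
    print(low_energy_outliers(energies))
```

Working on ln(Eexp) rather than on Eexp compresses the long right tail, so that the few well-fitting structures fall below the fence instead of being hidden near zero.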
Supplementary Figure 2 The completeness of the input data can be probed by box-and-whisker statistics.
The heptameric Arp2/3 protein complex was used to test the performance of the M3 protocol with respect to the number of restraints. a. Graphical representation of building block separation prior to global search; Arp2/3 monomers are named after their chain IDs (as given in 1k8k). The yellow dashes correspond to 30 inter-monomer NOE distances. b. Normalized ln(Eexp) distributions for global-search runs using 50 (blue), 30 (green) and 10 (grey) NOEs. The runs with 50/30/10 NOEs resulted in 119/58/0 outliers. The outliers of the runs with 50 and 30 NOEs have a precision of 2.1 ± 1.0 Å and 7.2 ± 3.0 Å, respectively. c. The top ten structures from the global search step using 30 NOEs (superimposed on chain A). The precision of the ensemble of the 10 lowest-energy structures is reported in the figure; the accuracy with respect to 1k8k is 2.5 ± 2.2 Å (Cα-RMSD).
Supplementary Figure 3 Use of complementary structural information leads to a converged ensemble.
The human U4 Sm proteins-RNA complex (4wzj) was used to test the performance of different types of restraints. a. Graphical representation of the positions between which distances can be measured either by NMR, i.e. methyl groups of the ILV residues (represented by spheres) and PRE label locations (pink pentagons), or by XL-MS, i.e. NZ atoms of the lysine side chains (metallic blue circles). b. XL-MS restraints during the global search step resulted in no outliers, whereas runs using mPREs generated 9 and 7 outliers for 100% and 50% assigned methyl groups, respectively. c. 70 local search structures, following the global search using mPRE data for 50% assigned methyl groups, were grouped into 7 clusters. The two best-scoring structures of cluster 2 (dark green circles) display a significantly better χ with respect to the SAXS curve. d. The precision of the final selected ensemble is reported in the figure; the accuracy with respect to 4wzj is 2.8 ± 0.8 Å (Cα- and P-RMSD).
Supplementary Figure 4 Sparse distance restraints result in a native-like ensemble for all but one monomer.
a. Graphical representation of the separation of the building blocks of RNA polymerase II prior to the global search with 50 inter-protein (yellow dashed lines) and 5 protein-nucleic acid (salmon dashed lines) restraints. Due to the small number of restraints, the interactions between Rpb1-Rpb3, Rpb2-Rpb7, Rpb2-Rpb10, Rpb2-Rpb11, Rpb3-Rpb7 and Rpb2-Rpb6 are each described by only one distance. b. Scoring by ln(Eexp) identified three conformers to be passed to the local search step. c. The 30 local search conformers were separated into two clusters. Cluster 1 contains the ensemble of 13 structures (dark green circles) with the best fitness to the EM map (mean ccor > 0.94). The precision of the ensemble of 13 structures, including Rpb11, is given in the figure (for clarity we depicted only the best 10 structures); the accuracy with respect to 1i6h is 7.7 ± 1.2 Å (Cα/P-RMSD). The orientation of all monomers but Rpb11 is similar to 1i6h (Supplementary Figure 5).
Supplementary Figure 5 The RNA pol II structures resulting from the local search step prior to the shape-driven selection differ in the orientation of Rpb5 and Rpb11.
a. Representative structures of cluster 1 and 2. Major differences are related to the orientation of Rpb5 (light gray) and Rpb11 (black). b. In cluster 1 the relative orientation of the monomers Rpb11 and Rpb3 is predicted incorrectly. As a result, one restraint is violated between two lysine side chains (dashed yellow line). c. The restraint #41 (shown in b) is violated in all structures of cluster 1 (distance >> 16.4 Å). In this panel, e on the x-axis indicates a structure that is selected for the final ensemble. The order of the structures represented on the x-axis is random.
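The violation check behind panel c amounts to comparing an interatomic distance with the restraint's upper bound. The sketch below assumes a simple upper-bound restraint with the 16.4 Å limit quoted above; the function name `is_violated` and the coordinates are hypothetical and are not taken from the M3 software or the deposited models.

```python
import math


def is_violated(coord_a, coord_b, upper_bound=16.4):
    """Return True if the distance between two atoms (in Å) exceeds the upper bound.

    Assumption: the restraint is a plain upper-distance limit with no tolerance.
    """
    return math.dist(coord_a, coord_b) > upper_bound


if __name__ == "__main__":
    # Hypothetical coordinates (Å) of two lysine side-chain atoms.
    print(is_violated((0.0, 0.0, 0.0), (0.0, 0.0, 20.0)))  # 20 Å apart
    print(is_violated((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))   # 5 Å apart
```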
Supplementary Figure 6 a-b. Evaluation of global conformational sampling for RNA Pol II.
Due to the limited number of degrees of freedom and experimental restraints, the energy surface could be sampled with only 500 structures (a); extension of the sampling to 1000 structures (b) did not generate any structure with a better fit to the experimental data or a significantly different geometry. c-d. A decrease in the Eexp values after local search indicates convergence of physical and restraint forces close to the native structure. For the U4 Sm proteins-RNA complex, Eexp decreases upon refinement of the interaction interfaces, as expected when searching the space close to the native structure (c); in contrast, for RNA Pol II the Eexp values increase upon refinement of the interfaces, indicating conflicting physical and restraint forces; this is expected when searching the space far from the native structure (d). e-f. Distribution of energy values for the structures of RNA Pol II calculated during local search. Restraint (e) and physical (force-field, f) energies are plotted with respect to the i-RMSD from the structure with the highest ccor for each structure generated during local search. The lack of correlation between ln(Eexp) and Eff is evident.
Supplementary Figure 7 Eexp analysis of global search solutions for the Box C/D RNP in its substrate-bound form.
The global search of the conformational space of the Box C/D enzyme in the substrate-bound form was driven by three restraint classes: PRE-derived distances, SANS-derived RNA shape and connectivity restraints. To ensure equal weighting of each term in the selection process, the Eexp terms, which span different value ranges, were individually normalized over [0,1] and then summed (Methods).
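The normalize-then-sum step can be illustrated as follows. This is a minimal sketch assuming plain min-max scaling of each Eexp term over the set of structures; the names `min_max` and `combined_score`, the term labels and the example values are hypothetical, and the exact procedure is the one given in the paper's Methods.

```python
def min_max(values):
    """Scale a list of energies linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant term carries no ranking information; map it to zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combined_score(terms):
    """Sum the individually normalized terms for each structure.

    `terms` maps a restraint class to its per-structure energies;
    all lists must have the same length and ordering.
    """
    normalized = [min_max(values) for values in terms.values()]
    return [sum(per_structure) for per_structure in zip(*normalized)]


if __name__ == "__main__":
    # Hypothetical energies for three structures and three restraint classes.
    terms = {
        "pre": [10.0, 20.0, 30.0],
        "sans_shape": [1.0, 0.0, 0.5],
        "connectivity": [5.0, 5.0, 5.0],
    }
    print(combined_score(terms))
```

Scaling each term separately keeps a class with a numerically large energy range from dominating the combined score.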
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–7, Supplementary Table 1 and Supplementary Note 1. (PDF 2518 kb)
Life Sciences Reporting Summary
Life Sciences Reporting Summary. (PDF 129 kb)
Supplementary Protocol
M3 manual. (PDF 338 kb)
Supplementary Software
HADDOCK-M3 software. (ZIP 2929 kb)
Supplementary Data
Restraint files, starting structures and final models. (ZIP 27585 kb)
About this article
Cite this article
Karaca, E., Rodrigues, J., Graziadei, A. et al. M3: an integrative framework for structure determination of molecular machines. Nat Methods 14, 897–902 (2017). https://doi.org/10.1038/nmeth.4392
DOI: https://doi.org/10.1038/nmeth.4392