M3: an integrative framework for structure determination of molecular machines

Karaca, Ezgi; Rodrigues, João P G L M; Graziadei, Andrea; Bonvin, Alexandre M J J; Carlomagno, Teresa

doi:10.1038/nmeth.4392

Article
Published: 14 August 2017

Nature Methods volume 14, pages 897–902 (2017)

Abstract

We present a broadly applicable, user-friendly protocol that incorporates sparse and hybrid experimental data to calculate quasi-atomic-resolution structures of molecular machines. The protocol uses the HADDOCK framework, accounts for extensive structural rearrangements both at the domain and atomic levels and accepts input from all structural and biochemical experiments whose data can be translated into interatomic distances and/or molecular shapes.

**Figure 1: Workflow of the integrative structure determination protocol M3.**

**Figure 2: Application to the yeast RNA polymerase (pol) II demonstrates M3's ability to translate sparse data into a structural model.**

**Figure 3: Structure determination of the Box C/D RNP underpins the robustness of the M3 protocol.**

Accession codes

Primary accessions

Protein Data Bank

Referenced accessions

Electron Microscopy Data Bank

2784

Acknowledgements

This work was supported by the EMBL, the EU FP7 ITN project RNPnet (contract number 289007) and the DFG grant CA294/3-2. E.K. acknowledges support from the Alexander von Humboldt Foundation through a Humboldt Research Fellowship for Postdoctoral Researchers. We thank J. Kirkpatrick for critical reading of the manuscript and B. Simon for discussion and support with CNS. A.M.J.J.B. acknowledges funding from the European H2020 e-Infrastructure grants West-Life (grant no. 675858) and BioExcel (grant no. 675728).

Author information

Ezgi Karaca
Present address: Izmir International Biomedicine and Genome Institute (iBG-izmir), Dokuz Eylül University Saglik Yerleskesi, Izmir, Turkey
João P G L M Rodrigues
Present address: Department of Structural Biology, Stanford University School of Medicine, Stanford, California, USA

Authors and Affiliations

European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
Ezgi Karaca, Andrea Graziadei & Teresa Carlomagno
Bijvoet Center for Biomolecular Research, Faculty of Science, Utrecht University, Utrecht, The Netherlands
João P G L M Rodrigues & Alexandre M J J Bonvin
Leibniz University Hannover, Centre for Biomolecular Drug Research, Hannover, Germany
Teresa Carlomagno
Helmholtz Centre for Infection Research, Group of Structural Chemistry, Braunschweig, Germany
Teresa Carlomagno

Contributions

E.K. designed the studies, developed software, performed structure calculations, analyzed and interpreted data and wrote the manuscript; J.P.G.L.M.R. developed software; A.G. analyzed experiments; A.M.J.J.B. provided software and assisted in software development; T.C. designed the studies, assisted in data interpretation, wrote the manuscript and supervised the project.

Corresponding author

Correspondence to Teresa Carlomagno.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Sparse experimental data leads to a non-normal right (positive) skewed E_exp distribution.

a. When only sparse experimental data is available, global search generates few structures with significantly low E_exp. b. Low E_exp structures can be distinguished from the rest of the population by transforming E_exp values into ln(E_exp). Such transformation leads to a left (negative) skewed distribution. c. Structures with significantly low E_exp can be isolated as outliers (green circles) by using a box-and-whisker plot, where whiskers are extended by two IQRs. The green line indicates the median.
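
The selection described in this legend can be sketched in a few lines of Python. This is a minimal illustration, not the code shipped with M3: the function name `low_energy_outliers`, the choice of `statistics.quantiles` for the quartiles, and the reading of "whiskers extended by two IQRs" as a lower fence at Q1 − 2·IQR are all assumptions.

```python
import math
import statistics


def low_energy_outliers(e_exp, whisker=2.0):
    """Return indices of structures whose ln(E_exp) lies below the lower whisker.

    e_exp   : list of positive restraint energies, one per structure
    whisker : whisker length in units of the IQR (2.0 follows the legend;
              the fence definition itself is an assumption of this sketch)
    """
    # Log-transform the right-skewed energies (requires E_exp > 0).
    log_e = [math.log(e) for e in e_exp]

    # Quartiles of the transformed values (default "exclusive" method).
    q1, _median, q3 = statistics.quantiles(log_e, n=4)
    iqr = q3 - q1

    # Structures below Q1 - whisker * IQR count as low-energy outliers.
    lower_fence = q1 - whisker * iqr
    return [i for i, value in enumerate(log_e) if value < lower_fence]


if __name__ == "__main__":
    # Made-up energies: twenty poor fits and one markedly better structure.
    energies = [900.0 + 10.0 * k for k in range(20)] + [1.0]
    print(low_energy_outliers(energies))
```

In this toy input only the last structure falls below the fence. With real data the number of outliers depends on how the quartiles are estimated, so the quartile method should match whatever plotting tool is used to draw the box-and-whisker plot.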

Supplementary Figure 2 The completeness of the input data can be probed by box-and-whisker statistics.

The heptameric Arp2/3 protein complex was used to test the performance of the M3 protocol with respect to the number of restraints. a. Graphical representation of building block separation prior to global search; Arp2/3 monomers are named after their chain IDs (as given in 1k8k). The yellow dashes correspond to 30 inter-monomer NOE distances. b. Normalized ln(E_exp) distributions for global-search runs using 50 (blue), 30 (green) and 10 (grey) NOEs. The runs with 50/30/10 NOEs resulted in 119/58/0 outliers. The outliers of the runs with 50 and 30 NOEs have a precision of 2.1 ± 1.0 Å and 7.2 ± 3.0 Å, respectively. c. The top ten structures from the global search step using 30 NOEs (superimposed on chain A). The precision of the ensemble of the 10 lowest-energy structures is reported in the figure; the accuracy with respect to 1k8k is 2.5 ± 2.2 Å (Cα-RMSD).

Supplementary Figure 3 Use of complementary structural information leads to a converged ensemble.

The human U4 Sm proteins-RNA complex (4wzj) was used to test the performance of different types of restraints. a. Graphical representation of the positions between which distances can be measured either by NMR, i.e. methyl groups of the ILV residues (represented by spheres) and PRE label locations (pink pentagons), or by XL-MS, i.e. NZ atoms of the lysine side chains (metallic blue circles). b. XL-MS restraints during the global search step resulted in no outliers, whereas runs using mPREs generated 9 and 7 outliers for 100% and 50% assigned methyl groups, respectively. c. 70 local search structures, following the global search using mPRE data for 50% assigned methyl groups, were grouped into 7 clusters. The two best-scoring structures of cluster 2 (dark green circles) display a significantly better χ with respect to the SAXS curve. d. The precision of the final selected ensemble is reported in the figure; the accuracy with respect to 4wzj is 2.8 ± 0.8 Å (Cα- and P-RMSD).

Supplementary Figure 4 Sparse distance restraints result in a native-like ensemble for all but one monomer.

a. Graphical representation of the separation of the building blocks of RNA polymerase II prior to the global search with 50 inter-protein (yellow dashed lines) and 5 protein-nucleic acid (salmon dashed lines) restraints. Due to the small number of restraints, the interactions between Rpb1-Rpb3, Rpb2-Rpb7, Rpb2-Rpb10, Rpb2-Rpb11, Rpb3-Rpb7 and Rpb2-Rpb6 are described by only one distance. b. Scoring by ln(E_exp) identified three conformers to be passed to the local search step. c. The 30 local search conformers were separated into two clusters. Cluster 1 contains the ensemble of 13 structures (dark green circles) with the best fitness to the EM map (mean ccor > 0.94). The precision of the ensemble of 13 structures, including Rpb11, is given in the figure (for clarity we depicted only the best 10 structures); the accuracy with respect to 1i6h is 7.7 ± 1.2 Å (Cα/P-RMSD). The orientation of all monomers but Rpb11 is similar to 1i6h (Supplementary Figure 5).

Supplementary Figure 5 The RNA pol II structures resulting from the local search step prior to the shape-driven selection differ in the orientation of Rpb5 and Rpb11.

a. Representative structures of cluster 1 and 2. Major differences are related to the orientation of Rpb5 (light gray) and Rpb11 (black). b. In cluster 1 the relative orientation of the monomers Rpb11 and Rpb3 is predicted incorrectly. As a result, one restraint is violated between two lysine side chains (dashed yellow line). c. The restraint #41 (shown in b) is violated in all structures of cluster 1 (distance >> 16.4 Å). In this panel, e on the x-axis indicates a structure that is selected for the final ensemble. The order of the structures represented on the x-axis is random.
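
A per-structure check of one distance restraint, as plotted in panel c, might look like the sketch below. Only the 16.4 Å upper bound comes from the legend; the function names, the plain `(x, y, z)` tuples used for atom positions, and the example coordinates are hypothetical.

```python
import math

# Upper bound from the legend (Å); all names and inputs below are illustrative.
UPPER_BOUND = 16.4


def restraint_distance(atom_a, atom_b):
    """Euclidean distance (Å) between two atoms given as (x, y, z) tuples."""
    return math.dist(atom_a, atom_b)


def violated_fraction(ensemble, upper_bound=UPPER_BOUND):
    """Fraction of structures in which the restraint exceeds its upper bound.

    ensemble : list of (atom_a, atom_b) coordinate pairs, one pair per structure
    """
    flags = [restraint_distance(a, b) > upper_bound for a, b in ensemble]
    return sum(flags) / len(flags)


if __name__ == "__main__":
    # Two made-up structures: one violates the restraint, one satisfies it.
    ensemble = [((0.0, 0.0, 0.0), (0.0, 0.0, 20.0)),
                ((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))]
    print(violated_fraction(ensemble))
```

A fraction of 1.0 for a cluster would correspond to the situation described for restraint #41, which is violated in every structure of that cluster.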

Supplementary Figure 6 a-b. Evaluation of global conformational sampling for RNA Pol II.

Due to the limited number of degrees of freedom and experimental restraints, the energy surface could be sampled with only 500 structures (a); extension of the sampling to 1000 structures (b) did not generate any structure with better fit to the experimental data or significantly different geometry. c-d. Decrease in the E_exp values after local search indicates convergence of physical and restraint forces close to the native structure. For the U4 Sm proteins-RNA complex, E_exp decreases upon refinement of the interaction interfaces, as expected when searching the space close to the native structure (c); in contrast, for RNA Pol II the E_exp values increase upon refinement of the interfaces, indicating conflicting physical and restraint forces; this is expected when searching the space far from the native structure (d). e-f. Distribution of energy values for the structures of RNA Pol II calculated during local search. Restraint (e) and physical (force-field, f) energies are plotted with respect to the i-RMSD from the structure with the highest ccor for each structure generated during local search. The lack of correlation between ln(E_exp) and E_ff is evident.

Supplementary Figure 7 E_exp analysis of global search solutions for the Box C/D RNP in its substrate-bound form.

The global search of the conformational space of the Box C/D enzyme in the substrate-bound form was driven by three restraint classes: PRE-derived distances, SANS-derived RNA shape and connectivity restraints. To ensure equal weighting of each term in the selection process, the E_exp terms, which span different value ranges, were individually normalized over [0,1] and then summed (Methods).
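
The equal-weighting step can be illustrated with a short Python sketch. It assumes plain min-max scaling for the normalization over [0,1]; the function names, the dictionary keys and the example values are hypothetical, and the exact procedure used by the authors is the one given in their Methods.

```python
def min_max(values):
    """Scale a list of energies linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant term cannot discriminate between structures.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combined_score(terms):
    """Sum the individually normalized energy terms for each structure.

    terms : dict mapping a restraint class to its list of energies,
            with one entry per structure and the same ordering in every list
    """
    normalized = [min_max(values) for values in terms.values()]
    return [sum(per_structure) for per_structure in zip(*normalized)]


if __name__ == "__main__":
    # Made-up energies for three structures and three restraint classes.
    terms = {
        "pre": [10.0, 20.0, 30.0],
        "sans_shape": [1.0, 0.5, 0.0],
        "connectivity": [5.0, 5.0, 5.0],
    }
    print(combined_score(terms))
```

Because every term is rescaled to the same range before summation, a restraint class with numerically large energies does not dominate the selection.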

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7, Supplementary Table 1 and Supplementary Note 1. (PDF 2518 kb)

Life Sciences Reporting Summary

Life Sciences Reporting Summary. (PDF 129 kb)

Supplementary Protocol

M3 manual. (PDF 338 kb)

Supplementary Software

HADDOCK-M3 software. (ZIP 2929 kb)

Supplementary Data

Restraint files, starting structures and final models. (ZIP 27585 kb)

About this article

Cite this article

Karaca, E., Rodrigues, J., Graziadei, A. et al. M3: an integrative framework for structure determination of molecular machines. Nat Methods 14, 897–902 (2017). https://doi.org/10.1038/nmeth.4392

Received: 21 April 2017
Accepted: 05 July 2017
Published: 14 August 2017
Issue Date: 01 September 2017
DOI: https://doi.org/10.1038/nmeth.4392
