
Expert Recommendation

The case for data science in experimental chemistry: examples and recommendations

Abstract

The physical sciences community is increasingly taking advantage of the possibilities offered by modern data science to solve problems in experimental chemistry and potentially to change the way we design and conduct experiments and interpret their results. Successfully exploiting these opportunities involves considerable challenges. In this Expert Recommendation, we focus on experimental co-design and its importance to experimental chemistry. We provide examples of how data science is changing the way we conduct experiments, and we outline opportunities for further integration of data science and experimental chemistry to advance both fields. Our recommendations include establishing stronger links between chemists and data scientists; developing chemistry-specific data science methods; integrating algorithms, software and hardware to ‘co-design’ chemistry experiments from inception; and combining diverse and disparate data sources into a data network for chemistry research.


Fig. 1: Role of data science in experimental processes.
Fig. 2: Artificial intelligence and machine learning deployed to accelerate, autonomously control and understand experiments, using state-of-the-art mathematics coupled to advances in data science.
Fig. 3: Application of machine learning to conduct new types of experiments at XFEL facilities.
Fig. 4: Visualizing a data network.
Fig. 5: Interplay of experiments, workflow and data.



Acknowledgements

This article evolved from presentations and discussions at the workshop ‘At the Tipping Point: A Future of Fused Chemical and Data Science’ held in September 2020, sponsored by the Council on Chemical Sciences, Geosciences, and Biosciences of the US Department of Energy, Office of Science, Office of Basic Energy Sciences. The authors thank the members of the Council for their encouragement and assistance in developing this workshop. In addition, the authors are indebted to the agencies responsible for funding their individual research efforts, without which this work would not have been possible.

Author information


Contributions

All authors contributed equally to all aspects of the article.

Corresponding authors

Correspondence to Junko Yano, Kelly J. Gaffney, John Gregoire, Linda Hung, Abbas Ourmazd, Joshua Schrier, James A. Sethian or Francesca M. Toma.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review


Nature Reviews Chemistry thanks Martin Green, Venkatasubramanian Viswanathan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Autoprotocol: https://autoprotocol.org/

Cambridge Structural Database: https://www.ccdc.cam.ac.uk/

CAMERA: https://camera.lbl.gov/

Chemotion Repository: https://www.chemotion-repository.net/welcome

FAIR principles: https://www.go-fair.org/fair-principles/

HardwareX: https://www.journals.elsevier.com/hardwarex

IBM RXN: https://rxn.res.ibm.com/

Inorganic Crystal Structure Database: https://www.psds.ac.uk/icsd

MaterialNet: https://maps.matr.io/

NMRShiftDB: https://nmrshiftdb.nmr.uni-koeln.de/

Open Reaction Database: http://open-reaction-database.org

Protein Data Bank: https://www.rcsb.org/

PuRe Data Resources: https://www.energy.gov/science/office-science-pure-data-resources

Reaxys: https://www.elsevier.com/solutions/reaxys


About this article


Cite this article

Yano, J., Gaffney, K.J., Gregoire, J. et al. The case for data science in experimental chemistry: examples and recommendations. Nat Rev Chem 6, 357–370 (2022). https://doi.org/10.1038/s41570-022-00382-w


  • DOI: https://doi.org/10.1038/s41570-022-00382-w
