Abstract
To automate the discovery of new scientific and engineering principles, artificial intelligence must distill explicit rules from experimental data. This has proven difficult because existing methods typically search through the enormous space of possible functions. Here we introduce deep distilling, a machine learning method that does not perform searches but instead learns from data using symbolic essence neural networks and then losslessly condenses the network parameters into a concise algorithm written in computer code. This distilled code, which can contain loops and nested logic, is equivalent to the neural network but is human-comprehensible and orders-of-magnitude more compact. On arithmetic, vision and optimization tasks, the distilled code is capable of out-of-distribution systematic generalization to solve cases orders-of-magnitude larger and more complex than the training data. The distilled algorithms can sometimes outperform human-designed algorithms, demonstrating that deep distilling is able to discover generalizable principles complementary to human expertise.
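The abstract's central idea, losslessly rewriting a trained network's parameters as compact, human-readable code, can be illustrated with a toy sketch. This is not the paper's implementation; the functions `neuron` and `distilled_rule` are hypothetical names for illustration only, assuming a single threshold unit whose learned weights admit an exact symbolic reading.

```python
from itertools import product

def neuron(x, weights=(1, 1, 1), bias=-2):
    """A trained McCulloch-Pitts-style threshold unit: fires iff the
    weighted sum plus bias is non-negative."""
    return int(sum(w * xi for w, xi in zip(weights, x)) + bias >= 0)

def distilled_rule(x):
    """Condensed, human-readable code equivalent to the neuron above:
    with these particular weights it is exactly a 2-of-3 majority vote."""
    return int(sum(x) >= 2)

# The distillation is lossless: both functions agree on every input.
assert all(neuron(x) == distilled_rule(x) for x in product((0, 1), repeat=3))
```

The actual method handles full networks and emits code with loops and nested logic, but the toy case conveys why the condensed program can be equivalent to the network yet far more compact and comprehensible.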
Data availability
The datasets used in this work are included with the code. Source Data are provided with this paper.
Code availability
The code used to distill ENNs has been deposited at Code Ocean (ref. 31) and at https://github.com/pauljblazek/deepdistilling.
References
Arrieta, A. B. et al. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 58, 82–115 (2020).
Gunning, D. & Aha, D. DARPA’s explainable artificial intelligence (XAI) program. AI Magazine 40, 44–58 (2019).
Marcus, G. Deep learning: a critical appraisal. Preprint at http://arxiv.org/abs/1801.00631 (2018).
Chen, M. et al. Evaluating large language models trained on code. Preprint at http://arxiv.org/abs/2107.03374 (2021).
Austin, J. et al. Program synthesis with large language models. Preprint at http://arxiv.org/abs/2108.07732 (2021).
Li, Y. et al. Competition-level code generation with AlphaCode. Science 378, 1092–1097 (2022).
Zelikman, E. et al. Parsel: algorithmic reasoning with language models by composing decompositions. Preprint at http://arxiv.org/abs/2212.10561 (2023).
Romera-Paredes, B. et al. Mathematical discoveries from program search with large language models. Nature 625, 468–475 (2023).
Fawzi, A. et al. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature 610, 47–53 (2022).
Gulwani, S. Automating string processing in spreadsheets using input–output examples. In POPL ’11: Proc. 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages Vol. 46, 317–330 (ACM, 2011).
Gulwani, S. et al. Inductive programming meets the real world. Commun. ACM 58, 90–99 (2015).
Raedt, L. D. et al. (eds) Approaches and Applications of Inductive Programming (Dagstuhl Seminar 19202) (Dagstuhl, 2019).
Kitzelmann, E. Inductive programming: a survey of program synthesis techniques. In Approaches and Applications of Inductive Programming (eds Schmid, U. et al.) 50–73 (Springer, 2010).
Balog, M., Gaunt, A. L., Brockschmidt, M., Nowozin, S. & Tarlow, D. DeepCoder: learning to write programs. In 5th Int. Conf. Learn. Represent. (2017).
Polozov, O. & Gulwani, S. FlashMeta: a framework for inductive program synthesis. In Proc. 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications 107–126 (ACM, 2015).
Blazek, P. J. & Lin, M. M. Explainable neural networks that simulate reasoning. Nat. Comput. Sci. 1, 607–618 (2021).
Kautz, H. A. The third AI summer: AAAI Robert S. Engelmore memorial lecture. AI Magazine 43, 105–125 (2022).
Besold, T. R. et al. in Neuro-Symbolic Artificial Intelligence: The State of the Art (eds Hitzler, P. & Sarker, M. K.) Ch. 1 (IOS Press, 2022).
McCulloch, W. & Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943).
Mitchell, M. in Non-standard Computation—Molecular Computation, Cellular Automata, Evolutionary Algorithms, Quantum Computers (eds Gramß, T. et al.) Ch. 4 (Wiley, 2005); https://doi.org/10.1002/3527602968.ch4
Wolfram, S. Statistical mechanics of cellular automata. Rev. Modern Phys. 55, 601–644 (1983).
Lake, B. M. & Baroni, M. Human-like systematic generalization through a meta-learning neural network. Nature 623, 115–121 (2023).
Fodor, J. A. & Pylyshyn, Z. W. Connectionism and cognitive architecture: a critical analysis. Cognition 28, 3–71 (1988).
Gardner, M. Mathematical games: the fantastic combinations of John Conway’s new solitaire game life. Scientific American 223, 120–123 (1970).
Rendell, P. A universal Turing machine in Conway’s Game of Life. In 2011 International Conference on High Performance Computing Simulation 764–772 (IEEE, 2011).
Karp, R. Reducibility among combinatorial problems. In Proc. Complexity of Computer Computations Vol. 40, 85–103 (Springer, 1972).
Poloczek, M., Schnitger, G., Williamson, D. P. & van Zuylen, A. Greedy algorithms for the maximum satisfiability problem: simple algorithms and inapproximability bounds. SIAM J. Comput. 46, 1029–1061 (2017).
Mukhopadhyay, P. & Chaudhuri, B. B. A survey of Hough transform. Pattern Recognition 48, 993–1010 (2015).
Adams, G. S., Converse, B. A., Hales, A. H. & Klotz, L. E. People systematically overlook subtractive changes. Nature 592, 258–261 (2021).
McCluskey, E. J. Minimization of Boolean functions. Bell Syst. Tech. J. 35, 1417–1444 (1956).
Blazek, P. J. & Lin, M. M. Deep distilling: automated algorithm discovery using explainable deep learning. Code Ocean https://doi.org/10.24433/CO.6047170.v1 (2024).
Acknowledgements
This work was supported by the UTSW High Risk/High Impact grant.
Author information
Authors and Affiliations
Contributions
P.J.B. and M.M.L. conceptualized the work, wrote the paper, and developed the methodology and visualizations. M.M.L. acquired funding. P.J.B. and K.V. performed investigations and co-wrote the software.
Corresponding author
Ethics declarations
Competing interests
P.J.B. and M.M.L. are co-authors on international patent applications related to ENNs (PCT/US2021/019470) and to deep distilling (PCT/US2022/040885). K.V. declares no competing interests.
Peer review
Peer review information
Nature Computational Science thanks Joseph Bakarji and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Ananya Rastogi, in collaboration with the Nature Computational Science team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–4, discussion and examples of output code generated by the method described.
Source data
Source Data Fig. 2
Source data files (.csv).
Source Data Fig. 4
Source data files (.csv).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Blazek, P.J., Venkatesh, K. & Lin, M.M. Automated discovery of algorithms from data. Nat Comput Sci 4, 110–118 (2024). https://doi.org/10.1038/s43588-024-00593-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s43588-024-00593-9
This article is cited by
- Distilling data into code. Nature Computational Science (2024).