Common-sense reasoning has recently emerged as an important test for artificial general intelligence, especially given the much-publicized successes of language representation models such as T5, BERT and GPT-3. Current benchmarks typically take the form of question answering tasks, but to probe the full complexity of common-sense reasoning, more comprehensive evaluation methods that are grounded in theory should be developed.
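To make the benchmarking setup concrete: common-sense question answering benchmarks typically present a question with labelled answer choices and grade a model's prediction by exact match, reporting accuracy over the test set. The sketch below illustrates this scoring scheme; the items and predictions are invented for illustration and are not drawn from any published benchmark.

```python
# Minimal sketch of multiple-choice common-sense QA scoring:
# each item pairs a question with answer choices and a labelled
# correct answer; a prediction is graded by exact match.
from dataclasses import dataclass


@dataclass
class QAItem:
    question: str
    choices: list
    answer: str  # the labelled correct choice


def accuracy(items, predictions):
    """Fraction of items whose predicted choice matches the label."""
    correct = sum(1 for item, pred in zip(items, predictions)
                  if pred == item.answer)
    return correct / len(items)


# Illustrative (invented) benchmark items.
items = [
    QAItem("Where would you keep milk to stop it spoiling?",
           ["cupboard", "refrigerator", "oven"], "refrigerator"),
    QAItem("If you drop a glass onto a tiled floor, what happens?",
           ["it bounces", "it shatters", "it melts"], "it shatters"),
]

# A hypothetical model's predictions, one per item.
predictions = ["refrigerator", "it bounces"]
print(accuracy(items, predictions))  # 0.5
```

Exact-match accuracy of this kind is easy to compute at scale, which is part of why question answering dominates current evaluations; the limitation the article highlights is that a single accuracy number reveals little about which facets of common sense a model actually possesses.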
Relevant articles

Open Access articles citing this article:

- A noise audit of human-labeled benchmarks for machine commonsense reasoning. Scientific Reports (14 April 2024)
- Evaluating deep generative models on cognitive tasks: a case study. Discover Artificial Intelligence (6 June 2023)
Acknowledgements
This work was funded under the DARPA Machine Common Sense (MCS) program under award number N660011924033.
Author information
Contributions
M.K. and D.M. conceived the ideas behind the manuscript and its outline. M.K., H.S. and A.M. co-wrote the manuscript and designed the figures, examples and supplementary material. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.
About this article
Cite this article
Kejriwal, M., Santos, H., Mulvehill, A.M. et al. Designing a strong test for measuring true common-sense reasoning. Nat Mach Intell 4, 318–322 (2022). https://doi.org/10.1038/s42256-022-00478-4