Common-sense reasoning has recently emerged as an important test for artificial general intelligence, especially given the much-publicized successes of language representation models such as T5, BERT and GPT-3. Current benchmarks typically take the form of question answering tasks, but to probe the full complexity of common-sense reasoning, more comprehensive evaluation methods that are grounded in theory should be developed.
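To make the benchmarking setup concrete: common-sense question answering benchmarks typically present a question with labelled answer choices and grade a model's prediction by exact match, reporting accuracy over the test set. The sketch below illustrates this scoring scheme; the items and predictions are invented for illustration and are not drawn from any published benchmark.

```python
# Minimal sketch of multiple-choice common-sense QA scoring:
# each item pairs a question with answer choices and a labelled
# correct answer; a prediction is graded by exact match.
from dataclasses import dataclass


@dataclass
class QAItem:
    question: str
    choices: list
    answer: str  # the labelled correct choice


def accuracy(items, predictions):
    """Fraction of items whose predicted choice matches the label."""
    correct = sum(1 for item, pred in zip(items, predictions)
                  if pred == item.answer)
    return correct / len(items)


# Illustrative (invented) benchmark items.
items = [
    QAItem("Where would you keep milk to stop it spoiling?",
           ["cupboard", "refrigerator", "oven"], "refrigerator"),
    QAItem("If you drop a glass onto a tiled floor, what happens?",
           ["it bounces", "it shatters", "it melts"], "it shatters"),
]

# A hypothetical model's predictions, one per item.
predictions = ["refrigerator", "it bounces"]
print(accuracy(items, predictions))  # 0.5
```

Exact-match accuracy of this kind is easy to compute at scale, which is part of why question answering dominates current evaluations; the limitation the article highlights is that a single accuracy number reveals little about which facets of common sense a model actually possesses.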
Relevant articles

Open Access articles citing this article:

- A noise audit of human-labeled benchmarks for machine commonsense reasoning. Scientific Reports (14 April 2024)
- Evaluating deep generative models on cognitive tasks: a case study. Discover Artificial Intelligence (6 June 2023)
Acknowledgements
This work was funded under the DARPA Machine Common Sense (MCS) program under award number N660011924033.
Author information
Contributions
M.K. and D.M. conceived the ideas behind the manuscript and its outline. M.K., H.S. and A.M. co-wrote the manuscript and designed the figures, examples and supplementary material. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks the anonymous reviewers for their contribution to the peer review of this work.
About this article
Cite this article
Kejriwal, M., Santos, H., Mulvehill, A.M. et al. Designing a strong test for measuring true common-sense reasoning. Nat Mach Intell 4, 318–322 (2022). https://doi.org/10.1038/s42256-022-00478-4