
Human action recognition with a large-scale brain-inspired photonic computer

Abstract

The recognition of human actions in video streams is a challenging task in computer vision, with important applications in, for example, brain–computer interfaces and surveillance. Recently, deep learning has produced remarkable results, but it can be hard to use in practice because its training requires large datasets and special-purpose, energy-consuming hardware. In this work, we propose a photonic hardware approach. Our experimental set-up comprises off-the-shelf components and implements an easy-to-train recurrent neural network with 16,384 nodes, scalable to hundreds of thousands of nodes. The system, based on the reservoir computing paradigm, is trained to recognize six human actions from the KTH video database using either raw frames as inputs or a set of features extracted with the histogram of oriented gradients (HOG) algorithm. We report a classification accuracy of 91.3%, comparable to state-of-the-art digital implementations, while promising higher processing speeds than existing hardware approaches. Because of the massively parallel processing capabilities offered by photonic architectures, we anticipate that this work will pave the way towards simply reconfigurable and energy-efficient solutions for real-time video processing.
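The reservoir computing paradigm described above trains only a linear readout on top of a fixed random recurrent network. The following is a minimal NumPy sketch of that pipeline, not the photonic implementation itself: the dimensions, spectral radius and ridge parameter are illustrative assumptions, and the `tanh` nonlinearity stands in for the optical response of the experimental set-up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (the experiment uses up to 16,384 nodes).
n_inputs, n_nodes, n_classes = 100, 500, 6

# Fixed random input and recurrent weights -- only the readout is trained.
W_in = rng.uniform(-1, 1, (n_nodes, n_inputs))
W = rng.uniform(-1, 1, (n_nodes, n_nodes))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))  # scale spectral radius below 1

def run_reservoir(frames):
    """Drive the reservoir with a sequence of feature vectors; return all states."""
    x = np.zeros(n_nodes)
    states = []
    for u in frames:
        x = np.tanh(W @ x + W_in @ u)  # stand-in for the optical nonlinearity
        states.append(x.copy())
    return np.array(states)

def train_readout(state_list, labels, ridge=1e-4):
    """Ridge regression from reservoir states to one-hot class targets."""
    X = np.vstack(state_list)
    Y = np.vstack([np.tile(np.eye(n_classes)[y], (len(s), 1))
                   for s, y in zip(state_list, labels)])
    return np.linalg.solve(X.T @ X + ridge * np.eye(n_nodes), X.T @ Y)

def classify(frames, W_out):
    """Average the readout over the whole video, then pick the strongest class."""
    return int(np.argmax((run_reservoir(frames) @ W_out).mean(axis=0)))
```

Because the recurrent weights stay fixed, training reduces to a single linear regression, which is what makes this approach "easy to train" compared with backpropagation through time.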


Fig. 1: Examples of KTH frames and HOG features.
Fig. 2: Principle of how our reservoir computer solves the human action classification task.
Fig. 3: Illustration of the experimental set-up, composed of an optical arm connected to a computer.
Fig. 4: Performance of our photonic neuro-inspired architecture on the human action classification task.
Fig. 5: Confusion matrices with the best performance.
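The HOG features shown in Fig. 1 can be illustrated with a simplified, pure-NumPy orientation-histogram descriptor. This is a sketch only: the standard HOG pipeline additionally uses block normalization and orientation interpolation, which are omitted here, and the cell size and bin count are illustrative assumptions.

```python
import numpy as np

def hog_descriptor(image, cell=8, bins=9):
    """Simplified HOG-style descriptor: one gradient-orientation
    histogram per cell, weighted by gradient magnitude."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    h, w = image.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            feats.append(hist / (np.linalg.norm(hist) + 1e-9))  # per-cell norm
    return np.concatenate(feats)
```

Descriptors of this kind condense each frame into a short feature vector, which reduces the input dimensionality fed to the reservoir compared with raw pixels.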


Data availability

The KTH dataset can be downloaded from http://www.nada.kth.se/cvap/actions/. The numerical and experimental data can be downloaded from the data folder in our GitHub repository: https://github.com/pantonik/rc_slm_kth/ (https://doi.org/10.5281/zenodo.3474559).

Code availability

The code used in this study can be downloaded from the scripts folder in our GitHub repository: https://github.com/pantonik/rc_slm_kth (https://doi.org/10.5281/zenodo.3474559).


Acknowledgements

The authors thank the creators of the KTH dataset for making the videos publicly available. This work was supported by AFOSR (grants nos. FA-9550-15-1-0279 and FA-9550-17-1-0072), Région Grand-Est and the Volkswagen Foundation via the NeuroQNet Project.

Author information


Contributions

D.B., N.M. and D.R. designed and managed the study. P.A., N.M. and D.R. realized the experimental set-up. P.A. performed the numerical simulations and the experimental campaigns. P.A., N.M. and D.R. prepared the manuscript. All authors discussed the results and reviewed the manuscript.

Corresponding authors

Correspondence to Piotr Antonik or Damien Rontani.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Antonik, P., Marsal, N., Brunner, D. et al. Human action recognition with a large-scale brain-inspired photonic computer. Nat Mach Intell 1, 530–537 (2019). https://doi.org/10.1038/s42256-019-0110-8

