Informatica


Key Frame-Based Skeleton Extraction for Lightweight Human Action Recognition Networks
Volume 36, Issue 4 (2025), pp. 985–1012
Leiyue Yao   Chao Zeng   Jianying Xiong   Keyun Xiong   Lei Zhang   Yucheng Wang  

https://doi.org/10.15388/25-INFOR613
Pub. online: 24 November 2025      Type: Research Article      Open Access

Received
1 May 2025
Accepted
1 November 2025
Published
24 November 2025

Abstract

Human Action Recognition (HAR) is an important task in computer vision with diverse applications. However, most existing methods rely on all frames of an action video for classification, which leads to high computational cost and low efficiency. In many cases, a compact set of key frames can effectively encode the essence of a complete action. Therefore, this study proposes an efficient HAR method that combines a new key-frame extraction algorithm with a lightweight neural network. Our contribution is three-fold. First, an accurate and efficient key-frame extraction algorithm is proposed to alleviate the frame-order confusion that arises in classical clustering methods. Second, a key-frame-based multi-feature fusion matrix is constructed to address the information loss caused by spatio-temporal trajectory overlap and the sensitivity to viewpoint changes in classical models. Third, a lightweight neural network is designed to converge effectively within a short training period. The proposed method was evaluated on two public datasets (UTKinect-Action3D and Florence-3D) and a self-collected dataset (HanYue-3D). The experimental results demonstrate the advantages of our method in both accuracy and efficiency.
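The frame-order confusion mentioned in the abstract can be illustrated with a generic clustering-based key-frame selector: clusters carry no temporal ordering, so the representative frames must be re-sorted by their original indices before being fed to a sequence model. The sketch below is a minimal illustration of that generic issue, not the paper's algorithm; `select_key_frames` and its parameters are hypothetical, and a simple k-means over per-frame skeleton features stands in for whichever clustering method is used.

```python
import numpy as np

def select_key_frames(frames, k, seed=0, iters=20):
    """Illustrative clustering-based key-frame selection.

    frames: (T, D) array of per-frame skeleton features.
    Returns indices of up to k key frames, restored to temporal order.
    """
    rng = np.random.default_rng(seed)
    T = frames.shape[0]
    # Initialise k centroids from distinct frames (plain k-means).
    centroids = frames[rng.choice(T, size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each frame to its nearest centroid.
        d = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            members = frames[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Pick, per cluster, the frame closest to the centroid ...
    key = []
    for c in range(k):
        idx = np.where(labels == c)[0]
        if idx.size:
            key.append(int(idx[d[idx, c].argmin()]))
    # ... then re-sort by original frame index: cluster labels are
    # unordered, so without this step the selected key frames may
    # come out time-shuffled and confuse any sequence-aware model.
    return sorted(key)
```

Note that the final `sorted` call is the whole point: dropping it leaves the key frames in arbitrary cluster order, which is exactly the frame-order confusion the paper's extraction algorithm is designed to avoid.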


Biographies

Yao Leiyue

L. Yao received the BE, ME, and PhD degrees in computer science from Nanchang University, China. He is currently a professor at the School of Intelligent Medicine and Information Engineering, Jiangxi University of Chinese Medicine. He has published several papers in international journals and conferences. His current research interests include vision-based human action recognition, image and video processing, massive data processing, distributed systems, and software engineering.

Zeng Chao

C. Zeng received the BE degree in network engineering from Hunan Institute of Technology, China, in 2023, and is currently pursuing the MS degree in Computer Science and Technology at Jiangxi University of Chinese Medicine. His research interests include key-frame extraction for human action recognition and vision-based behavior analysis.

Xiong Jianying
special8212@sohu.com

J. Xiong is an associate professor in the field of computer science. She received the ME degree from Zhejiang University of Technology, China, in 2006, and the PhD degree from Jiangxi University of Finance and Economics, China, in 2013. Her research interests include information systems, information management, and service computing.

Xiong Keyun

K. Xiong was born in September 1980 in Nanchang, China. He received the BE degree and is currently a lecturer. His research interests mainly include big data architecture and data mining.

Zhang Lei

L. Zhang received the BS degree in mechanical design and automation from China University of Petroleum (Beijing), China. He is currently a senior engineer in mechanical design and serves as the Deputy General Manager of R&D at Beijing Hanlin Hangyu Technology Development Co., Ltd. His research interests include pharmaceutical technology and the development of intelligent equipment based on computer vision and related technologies.

Wang Yucheng

Y. Wang received the BS degree in computing and software systems from the University of Melbourne, Australia. His research interests include computer vision, intelligent systems, and data-driven applications.



Copyright
© 2025 Vilnius University
Open access article under the CC BY license.

Keywords
key frame extraction; lightweight neural network; multi-scale learning; skeleton-based action recognition; CNN-based action recognition

Funding
This research was supported by the National Natural Science Foundation of China under Grant 62366023 and by the Scientific and Technological Projects of the Nanchang Science and Technology Bureau under Grant GJJ2202613.


INFORMATICA

  • Online ISSN: 1822-8844
  • Print ISSN: 0868-4952