Informatica


Key Frame-Based Skeleton Extraction for Lightweight Human Action Recognition Networks
Volume 36, Issue 4 (2025), pp. 985–1012
Leiyue Yao   Chao Zeng   Jianying Xiong   Keyun Xiong   Lei Zhang   Yucheng Wang  

https://doi.org/10.15388/25-INFOR613
Pub. online: 24 November 2025      Type: Research Article      Open Access

Received
1 May 2025
Accepted
1 November 2025
Published
24 November 2025

Abstract

Human Action Recognition (HAR) is an important task in computer vision with diverse applications. However, most existing methods rely on all frames of an action video for classification, which leads to high computational cost and low efficiency. In many cases, a compact set of key frames can effectively encode the essence of a complete action. Therefore, this study proposes an efficient HAR method that combines a new key-frame extraction algorithm with a lightweight neural network. Our contribution is three-fold. First, an accurate and efficient key-frame extraction algorithm is proposed to alleviate the frame-order confusion that arises in classical clustering methods. Second, a key-frame-based multi-feature fusion matrix is constructed to address the information loss caused by spatio-temporal trajectory overlap and the sensitivity to viewpoint changes in classical models. Third, a lightweight neural network is designed to converge effectively within a short training period. The proposed method was evaluated on two public datasets (UTKinect-Action3D and Florence-3D) and a self-collected dataset (HanYue-3D). The experimental results demonstrate the advantages of our method in both accuracy and efficiency.
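The frame-order confusion mentioned in the abstract can be illustrated with a generic clustering-based key-frame selector: clusters carry no temporal ordering, so the representative frames must be re-sorted by their original indices before being fed to a sequence model. The sketch below is a minimal illustration of that generic issue, not the paper's algorithm; `select_key_frames` and its parameters are hypothetical, and a simple k-means over per-frame skeleton features stands in for whichever clustering method is used.

```python
import numpy as np

def select_key_frames(frames, k, seed=0, iters=20):
    """Illustrative clustering-based key-frame selection.

    frames: (T, D) array of per-frame skeleton features.
    Returns indices of up to k key frames, restored to temporal order.
    """
    rng = np.random.default_rng(seed)
    T = frames.shape[0]
    # Initialise k centroids from distinct frames (plain k-means).
    centroids = frames[rng.choice(T, size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each frame to its nearest centroid.
        d = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            members = frames[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Pick, per cluster, the frame closest to the centroid ...
    key = []
    for c in range(k):
        idx = np.where(labels == c)[0]
        if idx.size:
            key.append(int(idx[d[idx, c].argmin()]))
    # ... then re-sort by original frame index: cluster labels are
    # unordered, so without this step the selected key frames may
    # come out time-shuffled and confuse any sequence-aware model.
    return sorted(key)
```

Note that the final `sorted` call is the whole point: dropping it leaves the key frames in arbitrary cluster order, which is exactly the frame-order confusion the paper's extraction algorithm is designed to avoid.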


Biographies

Yao Leiyue

L. Yao received the BE, ME, and PhD degrees in computer science from Nanchang University, China. He is currently a professor at the School of Intelligent Medicine and Information Engineering, Jiangxi University of Chinese Medicine. He has published several papers in international journals and conferences. His current research interests include vision-based human action recognition, image and video processing, massive data processing, distributed systems, and software engineering.

Zeng Chao

C. Zeng received the BE degree in network engineering from Hunan Institute of Technology, China, in 2023, and is currently pursuing the MS degree in Computer Science and Technology at Jiangxi University of Chinese Medicine. His research interests include key-frame extraction for human action recognition and vision-based behavior analysis.

Xiong Jianying
special8212@sohu.com

J. Xiong is an associate professor in the field of computer science. She received the ME degree from Zhejiang University of Technology, China, in 2006, and the PhD degree from Jiangxi University of Finance and Economics, China, in 2013. Her research interests include information systems, information management, and service computing.

Xiong Keyun

K. Xiong was born in September 1980 in Nanchang, China. He received the BE degree and is currently a lecturer. His research interests mainly include big data architecture and data mining.

Zhang Lei

L. Zhang received the BS degree in mechanical design and automation from China University of Petroleum (Beijing), China. He is currently a senior engineer in mechanical design and serves as the Deputy General Manager of R&D at Beijing Hanlin Hangyu Technology Development Co., Ltd. His research interests include pharmaceutical technology and the development of intelligent equipment based on computer vision and related technologies.

Wang Yucheng

Y. Wang received the BS degree in computing and software systems from the University of Melbourne, Australia. His research interests include computer vision, intelligent systems, and data-driven applications.



Copyright
© 2025 Vilnius University
Open access article under the CC BY license.

Keywords
key frame extraction; lightweight neural network; multi-scale learning; skeleton-based action recognition; CNN-based action recognition

Funding
This research was supported by the National Natural Science Foundation of China under Grant 62366023 and by the Scientific and Technological Projects of the Nanchang Science and Technology Bureau under Grant GJJ2202613.


INFORMATICA

  • Online ISSN: 1822-8844
  • Print ISSN: 0868-4952