Pub. online: 24 Nov 2025 | Type: Research Article | Open Access
Journal: Informatica
Volume 36, Issue 4 (2025), pp. 985–1012
Abstract
Human Action Recognition (HAR) is an important task in computer vision with diverse applications. However, most existing methods use every frame of an action video for classification, which incurs high computational cost and lowers efficiency. In many cases, a compact set of keyframes can capture the essence of a complete action. This study therefore proposes an efficient HAR method that combines a new keyframe extraction algorithm with a lightweight neural network. Our contribution is threefold. First, an accurate and efficient keyframe extraction algorithm is proposed to alleviate the frame-order confusion of classical clustering methods. Second, a keyframe-based multi-feature fusion matrix is constructed to address the information loss caused by overlapping spatio-temporal trajectories and the sensitivity of classical models to viewpoint changes. Third, a lightweight neural network model is designed to converge effectively within a short training period. The proposed method was evaluated on two public datasets (UTKinect-Action3D and Florence-3D) and a self-collected dataset (HanYue-3D). The experimental results show the advantages of our method in both accuracy and efficiency.
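The abstract does not describe how the proposed keyframe algorithm works, so the following is only an illustrative sketch of the problem it targets: selecting keyframes without scrambling temporal order. Classical k-means clustering of frames can group frames from distant parts of a sequence and returns cluster centres with no inherent time order. The sketch swaps in a different, well-known technique instead: dynamic programming that splits the sequence into `k` contiguous segments minimizing the total within-segment squared error, then picks the frame nearest each segment mean. The function name `extract_keyframes` and the assumption that each frame is a flat list of skeleton joint coordinates are hypothetical; this is not the authors' method.

```python
from typing import List, Sequence

# Hypothetical frame representation: flattened 3D joint coordinates.
Frame = Sequence[float]


def _segment_cost(prefix, prefix_sq, i: int, j: int) -> float:
    """Sum of squared deviations from the mean for frames[i:j] (half-open)."""
    n = j - i
    cost = 0.0
    for d in range(len(prefix[0])):
        s = prefix[j][d] - prefix[i][d]
        sq = prefix_sq[j][d] - prefix_sq[i][d]
        cost += sq - s * s / n
    return cost


def extract_keyframes(frames: List[Frame], k: int) -> List[int]:
    """Hypothetical order-preserving keyframe selector (not the paper's algorithm).

    Splits the sequence into k contiguous segments with minimal total
    within-segment squared error, then returns, for each segment, the index
    of the frame closest to that segment's mean.
    """
    t = len(frames)
    if not 1 <= k <= t:
        raise ValueError("k must be between 1 and the number of frames")
    dim = len(frames[0])

    # Per-dimension prefix sums of values and squared values.
    prefix = [[0.0] * dim]
    prefix_sq = [[0.0] * dim]
    for f in frames:
        prefix.append([p + x for p, x in zip(prefix[-1], f)])
        prefix_sq.append([p + x * x for p, x in zip(prefix_sq[-1], f)])

    inf = float("inf")
    # best[m][j]: minimal cost of splitting frames[0:j] into m segments.
    best = [[inf] * (t + 1) for _ in range(k + 1)]
    cut = [[0] * (t + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, t + 1):
            for i in range(m - 1, j):
                if best[m - 1][i] == inf:
                    continue
                c = best[m - 1][i] + _segment_cost(prefix, prefix_sq, i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    cut[m][j] = i

    # Backtrack the segment boundaries from the end of the sequence.
    bounds = []
    j = t
    for m in range(k, 0, -1):
        i = cut[m][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()

    # Representative frame per segment: the one closest to the segment mean.
    keyframes = []
    for i, j in bounds:
        n = j - i
        mean = [(prefix[j][d] - prefix[i][d]) / n for d in range(dim)]
        rep = min(
            range(i, j),
            key=lambda idx: sum((x - mu) ** 2 for x, mu in zip(frames[idx], mean)),
        )
        keyframes.append(rep)
    return keyframes
```

Because every segment is contiguous, the returned indices are already in temporal order, which is the property clustering-based selection lacks. The cost is O(k·T²·D) for T frames of dimension D. That is acceptable for short skeleton clips but would need optimization for long videos.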