Recognizing Actions Using Multi-Center Subspace Learning-Based Spatial-Temporal Information Fusion
Abstract: Human action recognition from depth map sequences generally improves recognition accuracy by extracting feature maps, but such feature maps usually suffer from a loss of temporal information. To address this problem, this paper proposes a new representation of depth map sequences, Depth Space Time Maps (DSTM), which reduces the redundancy of action features and compensates for the missing temporal information. High-accuracy action recognition is then performed by fusing Depth Motion Maps (DMM), in which spatial information dominates, with DSTM, in which temporal information dominates, based on a new multi-modal data fusion algorithm named Multi-Center Subspace Learning (MCSL). MCSL constructs multiple projection centers for each class, which enlarges the inter-class distance between samples and reduces the dimensionality of the projection target region. Experiments on the MSR-Action3D and UTD-MHAD depth datasets show that the proposed method achieves higher recognition rates than existing action recognition methods.
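For readers unfamiliar with the spatial branch of the fusion, a DMM accumulates the motion energy of a depth sequence: each depth frame is projected onto the front (x-y), side (y-z) and top (x-z) Cartesian planes, and the absolute differences between consecutive projections are summed per view. The sketch below is a minimal NumPy illustration of this standard construction, not the authors' implementation; the foreground test, the depth quantization and the depth_bins parameter are simplifying assumptions.

```python
import numpy as np

def depth_motion_maps(frames, depth_bins=256):
    """Accumulate motion energy of a depth sequence on the front (x-y),
    side (y-z) and top (x-z) projection planes."""
    frames = [np.asarray(f, dtype=np.float64) for f in frames]
    H, W = frames[0].shape
    z_max = max(f.max() for f in frames) + 1.0   # for depth quantization

    def project(frame):
        # Quantize depth values into a fixed number of bins.
        z = np.minimum((frame / z_max * depth_bins).astype(int),
                       depth_bins - 1)
        front = frame                             # (H, W) depth image
        side = np.zeros((H, depth_bins))          # (y, z) occupancy
        top = np.zeros((depth_bins, W))           # (z, x) occupancy
        ys, xs = np.nonzero(frame)                # foreground pixels only
        side[ys, z[ys, xs]] = 1.0
        top[z[ys, xs], xs] = 1.0
        return front, side, top

    prev = project(frames[0])
    acc = [np.zeros_like(p) for p in prev]
    for f in frames[1:]:
        curr = project(f)
        for a, c, p in zip(acc, curr, prev):
            a += np.abs(c - p)                    # motion energy per view
        prev = curr
    return dict(zip(("front", "side", "top"), acc))
```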
Key words:
- action recognition
- information fusion
- DSTM
- multi-center subspace learning
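The abstract only names the MCSL objective: DMM and DSTM features are projected into a shared subspace in which every class owns several projection centers, enlarging inter-class distances. Purely to make that idea concrete, the toy alternating scheme below assigns each sample to the nearest center of its own class, updates each modality's projection by ridge-regularized least squares, and re-estimates the centers; the function name, the regularizer and the assignment rule are illustrative choices, not the published algorithm.

```python
import numpy as np

def mcsl_fit(X_dmm, X_dstm, labels, dim=64, centers_per_class=3, iters=20):
    """Toy alternating scheme for the multi-center subspace idea:
    project both modalities into a shared space where each class owns
    several centers, then refine projections and centers in turn."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    rng = np.random.default_rng(0)
    # One linear projection per modality (small random init).
    W1 = rng.standard_normal((X_dmm.shape[1], dim)) * 0.01
    W2 = rng.standard_normal((X_dstm.shape[1], dim)) * 0.01
    # Several target centers per class in the shared subspace.
    C = {c: rng.standard_normal((centers_per_class, dim)) for c in classes}

    for _ in range(iters):
        Z1, Z2 = X_dmm @ W1, X_dstm @ W2
        Z = 0.5 * (Z1 + Z2)                  # fused projection per sample
        # Assign every sample to the nearest center of its own class.
        T = np.empty((len(labels), dim))
        for i, (z, y) in enumerate(zip(Z, labels)):
            k = ((C[y] - z) ** 2).sum(axis=1).argmin()
            T[i] = C[y][k]
        # Ridge-regularized least squares pulls each modality toward T.
        W1 = np.linalg.solve(X_dmm.T @ X_dmm + 1e-3 * np.eye(X_dmm.shape[1]),
                             X_dmm.T @ T)
        W2 = np.linalg.solve(X_dstm.T @ X_dstm + 1e-3 * np.eye(X_dstm.shape[1]),
                             X_dstm.T @ T)
        # Re-estimate each center as the mean of its assigned samples.
        for c in classes:
            Zc = Z[labels == c]
            assign = ((Zc[:, None, :] - C[c][None, :, :]) ** 2).sum(-1).argmin(1)
            for k in range(centers_per_class):
                if (assign == k).any():
                    C[c][k] = Zc[assign == k].mean(axis=0)
    return W1, W2, C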
Table 1 Human actions in the MSR-Action3D dataset
Action                     Samples    Action                     Samples
High arm wave (A01)        27         Two-hand wave (A11)        30
Horizontal arm wave (A02)  26         Side boxing (A12)          30
Hammer (A03)               27         Bend (A13)                 27
Hand catch (A04)           25         Forward kick (A14)         29
Forward punch (A05)        26         Side kick (A15)            20
High throw (A06)           26         Jogging (A16)              30
Draw X (A07)               27         Tennis swing (A17)         30
Draw tick (A08)            30         Tennis serve (A18)         30
Draw circle (A09)          30         Golf swing (A19)           30
Hand clap (A10)            30         Pick up and throw (A20)    27

Table 2 Human actions in the UTD-MHAD dataset
Action                          Samples    Action                     Samples
Swipe left (B01)                32         Tennis swing (B15)         32
Swipe right (B02)               32         Arm curl (B16)             32
Wave (B03)                      32         Tennis serve (B17)         32
Clap (B04)                      32         Push (B18)                 32
Throw (B05)                     32         Knock (B19)                32
Arm cross (B06)                 32         Catch (B20)                32
Basketball shoot (B07)          32         Pick up and throw (B21)    32
Draw X (B08)                    31         Jog (B22)                  31
Draw circle (B09)               32         Walk (B23)                 32
Draw circle continuously (B10)  32         Sit down (B24)             32
Draw triangle (B11)             32         Stand up (B25)             32
Bowling (B12)                   32         Lunge (B26)                32
Boxing (B13)                    32         Squat (B27)                32
Badminton swing (B14)           32

Table 3 Action subsets AS1, AS2 and AS3 of the MSR-Action3D dataset
AS1   AS2   AS3
A02   A01   A06
A03   A04   A14
A05   A07   A15
A06   A08   A16
A10   A09   A17
A13   A11   A18
A18   A14   A19
A20   A12   A20

Table 4 Recognition rates (%) of different features on the MSR-Action3D dataset
Method     | Test One                 | Test Two                  | Test Three
           | AS1   AS2   AS3   Avg    | AS1   AS2   AS3    Avg    | AS1    AS2   AS3   Avg
MEI-HOG    | 69.79 77.63 79.72 75.71  | 84.00 89.58 93.24  88.94  | 86.95  86.95 95.45 89.78
MEI-LBP    | 57.05 56.58 64.19 59.27  | 66.66 69.79 78.37  71.61  | 69.56  73.91 77.27 73.58
DSTM-HOG   | 83.22 71.71 87.83 80.92  | 94.66 84.37 88.23  89.80  | 91.30  82.61 95.95 89.95
DSTM-LBP   | 84.56 71.71 87.83 81.37  | 88.00 82.29 95.94  88.74  | 86.96  82.61 95.45 88.34
MHI-HOG    | 69.79 72.36 70.95 71.03  | 88.00 84.37 89.19  87.19  | 95.65  82.60 95.45 91.23
MHI-LBP    | 51.67 60.52 54.05 55.41  | 73.33 70.83 78.37  74.18  | 82.60  65.21 72.72 73.51
DMM-HOG    | 88.00 87.78 87.16 87.65  | 94.66 87.78 100.00 94.15  | 100.00 88.23 95.45 94.56
DMM-LBP    | 89.52 87.78 93.20 90.17  | 93.11 85.19 100.00 92.77  | 94.03  88.98 92.38 91.80

Table 5 Recognition rates (%) of different features on the UTD-MHAD dataset
Method     Test One   Test Two   Test Three
MEI-HOG    69.51      65.42      68.20
MEI-LBP    45.12      51.97      52.61
DSTM-HOG   71.08      80.28      89.54
DSTM-LBP   68.81      80.97      86.06
MHI-HOG    56.44      66.58      73.14
MHI-LBP    49.82      53.82      57.40
DMM-HOG    78.39      75.40      87.94
DMM-LBP    68.98      74.94      86.75

Table 6 Experimental results of DMM and DSTM
Method   D1      D2
DSTM     62.83   81.53
DMM      32.17   63.93

Table 7 Average processing time of DMM and DSTM
Method   D1 (s)   D2 (s)
DSTM     2.1059   3.4376
DMM      5.6014   8.6583

Table 8 Experimental results on MSR-Action3D$^1$

Table 9 Experimental results on MSR-Action3D$^2$
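Tables 4 and 5 pair each feature map (MEI, MHI, DSTM, DMM) with either a HOG or an LBP descriptor before classification. As a rough illustration of that descriptor stage, the following scikit-image sketch computes both descriptors for a single map; the cell sizes, the P/R values and the normalization are illustrative parameters, not the settings used in the paper.

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern

def describe_feature_map(fmap, lbp_points=8, lbp_radius=1):
    """Compute HOG and LBP descriptors for one feature map (e.g. a DMM
    or DSTM view), the two descriptor families compared in Tables 4-5."""
    fmap = np.asarray(fmap, dtype=np.float64)
    if fmap.max() > 0:                      # scale to [0, 1] for stability
        fmap = fmap / fmap.max()
    # Dense HOG descriptor over the whole map.
    hog_vec = hog(fmap, orientations=9, pixels_per_cell=(8, 8),
                  cells_per_block=(2, 2), feature_vector=True)
    # Uniform LBP codes summarized as a normalized histogram;
    # the 'uniform' method yields P + 2 distinct code values.
    codes = local_binary_pattern(fmap, P=lbp_points, R=lbp_radius,
                                 method="uniform")
    n_bins = lbp_points + 2
    lbp_vec, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins),
                              density=True)
    return {"hog": hog_vec, "lbp": lbp_vec}
```

Either descriptor vector can then be fed to the subspace learning and fusion stage; the dict return simply keeps the two families separate, mirroring the -HOG and -LBP rows of the tables.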
