Online Recognition of Human Actions Based on Temporal Deep Belief Neural Network

ZHOU Feng-Yu  YIN Jian-Qin  YANG Yang  ZHANG Hai-Ting  YUAN Xian-Feng

Citation: ZHOU Feng-Yu, YIN Jian-Qin, YANG Yang, ZHANG Hai-Ting, YUAN Xian-Feng. Online Recognition of Human Actions Based on Temporal Deep Belief Neural Network. ACTA AUTOMATICA SINICA, 2016, 42(7): 1030-1039. doi: 10.16383/j.aas.2016.c150629


doi: 10.16383/j.aas.2016.c150629
Funds: 

National Natural Science Foundation of China 61203341

Key Program of Natural Science Foundation of Shandong Province ZR2015QZ08

National Natural Science Foundation of China 61375084

More Information
    Author Bios:

    ZHOU Feng-Yu  Professor at the School of Control Science and Engineering, Shandong University. He received his Ph.D. degree from the School of Electrical Engineering and Automation, Tianjin University in 2008. His main research interest is intelligent robotics. E-mail: zhoufengyu@sdu.edu.cn

    YANG Yang  Lecturer at the School of Information Science and Engineering, Shandong University. He received his Ph.D. degree from the School of Information Science and Engineering, Shandong University in 2009. His research interests cover image processing and object tracking. E-mail: yangyang@mail.sdu.edu.cn

    ZHANG Hai-Ting  Master student at the School of Control Science and Engineering, Shandong University. She received her bachelor degree in engineering from Shandong University in 2011. Her research interests cover deep learning and image processing. E-mail: 546597163@qq.com

    YUAN Xian-Feng  Ph.D. candidate at the School of Control Science and Engineering, Shandong University. He received his bachelor degree in engineering from Shandong University in 2011. His research interests cover machine learning and service robots. E-mail: yuanxianfeng_sdu@126.com

    Corresponding author:

    YIN Jian-Qin  Associate professor at the School of Information Science and Engineering, University of Jinan. She received her Ph.D. degree from the School of Control Science and Engineering, Shandong University in 2013. Her research interests cover image processing and machine learning. Corresponding author of this paper. E-mail: ise_yinjq@ujn.edu.cn

  • Abstract: Online human action recognition is the ultimate goal of human action recognition. However, because how to segment an action sequence remains an open problem, most existing methods only recognize actions within pre-segmented sequences and do not address the online setting. To tackle this problem, this paper proposes a temporal deep belief network (TDBN) model capable of online human action recognition. The model fully exploits the contextual information provided by the frames before and after the current one, overcoming the limitation that existing deep belief network models can only recognize static images. It not only substantially improves recognition accuracy, but also requires no manual segmentation of the action sequence, so recognition can start at any moment while the action is being performed. This achieves online action recognition in the true sense and lays a solid theoretical foundation for practical applications.
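The TDBN described in the abstract is built from conditional restricted Boltzmann machines (CRBMs, Fig. 1), whose biases at time t depend on the previous n frames — this is the temporal context that removes the need to segment the sequence. As a rough illustration of that conditioning (the Gaussian visible units, layer sizes, learning rate, and the single CD-1 step below are assumptions for the sketch, not the paper's exact formulation):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CRBM:
    """Minimal sketch of an order-n conditional RBM: the visible and hidden
    biases at time t are linear functions of the previous n visible frames."""

    def __init__(self, n_visible, n_hidden, order, lr=0.01):
        self.order, self.lr = order, lr
        self.W = 0.01 * rng.standard_normal((n_hidden, n_visible))           # visible-hidden weights
        self.A = 0.01 * rng.standard_normal((n_visible, order * n_visible))  # past -> visible (autoregressive)
        self.B = 0.01 * rng.standard_normal((n_hidden, order * n_visible))   # past -> hidden
        self.a = np.zeros(n_visible)  # static visible bias
        self.b = np.zeros(n_hidden)   # static hidden bias

    def _dyn_biases(self, history):
        # history: the previous n frames flattened to shape (order * n_visible,)
        return self.a + self.A @ history, self.b + self.B @ history

    def hidden_probs(self, v, history):
        # posterior over hidden units given the current frame and its context
        _, b_dyn = self._dyn_biases(history)
        return sigmoid(self.W @ v + b_dyn)

    def cd1_update(self, v, history):
        # one contrastive-divergence (CD-1) step with Gaussian visible units
        a_dyn, _ = self._dyn_biases(history)
        h0 = self.hidden_probs(v, history)
        h_s = (rng.random(h0.shape) < h0).astype(float)  # sample hiddens
        v1 = a_dyn + self.W.T @ h_s                      # mean-field reconstruction
        h1 = self.hidden_probs(v1, history)
        self.W += self.lr * (np.outer(h0, v) - np.outer(h1, v1))
        self.a += self.lr * (v - v1)
        self.b += self.lr * (h0 - h1)
        return float(np.mean((v - v1) ** 2))             # reconstruction error

# Usage: condition on the 3 most recent (hypothetical 6-D) skeleton frames
crbm = CRBM(n_visible=6, n_hidden=8, order=3)
history = rng.standard_normal(3 * 6)
frame = rng.standard_normal(6)
features = crbm.hidden_probs(frame, history)  # shape (8,), values in (0, 1)
recon_err = crbm.cd1_update(frame, history)
```

Because `hidden_probs` needs only the current frame plus a sliding window of the last n frames, features can be computed at any instant of an ongoing action, which is what makes the online setting possible.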
  • Fig. 1  The structure of conditional restricted Boltzmann machines

    Fig. 2  The structure of the temporal deep belief network

    Fig. 3  Illustration of the skeleton of the MIT dataset

    Fig. 4  Flowchart of the learning of CRBM

    Fig. 5  Flowchart of the global weight adjustment

    Fig. 6  Recognition results on the MIT dataset

    Fig. 7  Confusion matrix on the MIT dataset

    Fig. 8  Illustration of the distribution of the weights of CRBM

    Fig. 9  Illustration of the actions of MSR Action 3D

    Fig. 10  Illustration of the skeleton of MSR Action 3D

    Fig. 11  Confusion matrix of MSR Action 3D on $AS1_{2}$

    Table 1  Recognition results on whole sequences in tests 1 and 2 (%)

                      $AS1_1$  $AS2_1$  $AS3_1$  $AS1_2$  $AS2_2$  $AS3_2$
    Ours (CRBM)       92.23    89.46    92.05    95.62    93.42    95.67
    Ours (TDBN)       96.67    92.81    96.68    99.33    97.44    99.87
    Li et al. [2]     89.5     89.0     96.3     93.4     92.9     96.3
    Xia et al. [19]   98.47    96.67    93.47    98.61    97.92    94.93
    Yang et al. [3]   97.3     92.2     98.0     98.7     94.7     98.7

    Table 2  Comparison between our method and other methods in test 3 (%)

                              $AS1_1$  $AS2_1$  $AS3_1$  Average
    Li et al. [2]             72.9     71.9     79.2     74.7
    Chen et al. [21]          96.2     83.2     92.0     90.47
    Gowayyed et al. [22]      92.39    90.18    91.43    91.26
    Vemulapalli et al. [23]   95.29    83.87    98.22    92.46
    Du et al. [13]            93.33    94.64    95.50    94.49
    Ours (TDBN)               97.01    94.22    98.34    96.52

    Table 3  Recognition results on the first 5 frames (%)

                      $AS1_1$  $AS2_1$  $AS3_1$  $AS1_2$  $AS2_2$  $AS3_2$
    Ours              79.84    79.35    82.93    90.78    92.76    94.66
    Yang et al. [3]   67±1     67±1     74±1     77±1     75±1     82±1

    Table 4  All recognition results (%)

               1 frame   5 frames   Whole sequence
    $AS1_1$    77.55     79.84      96.67
    $AS2_1$    78.01     79.35      92.81
    $AS3_1$    81.60     82.93      96.68
    Average    79.05     80.71      95.39
    $AS1_2$    89.74     90.78      99.33
    $AS2_2$    90.78     92.76      97.44
    $AS3_2$    93.00     94.66      99.87
    Average    91.17     92.73      98.88

    Table 5  Recognition time with different orders n (ms)

    Action                 n=0    n=1    n=2     n=3     n=4     n=5     n=6
    Horizontal arm wave    4.45   9.78   12.56   14.61   17.21   19.45   23.51
    Hammer                 3.67   9.89   11.89   14.39   17.12   19.78   22.13
    Forward punch          3.79   10.03  12.54   14.48   17.49   20.01   22.56
    High throw             3.96   9.92   12.48   14.68   17.73   19.21   22.67
    Hand clap              4.13   9.99   12.49   14.63   17.62   19.84   22.78
    Bend                   4.78   9.79   12.34   14.61   17.94   19.47   21.87
    Tennis serve           4.56   9.67   12.52   14.65   17.56   19.49   22.46
    Pickup and throw       3.71   9.97   12.67   14.51   17.83   19.92   22.81

    Table 6  Recognition rates with different orders n (%)

    Action                 n=0     n=1     n=2     n=3     n=4     n=5     n=6
    Horizontal arm wave    81.56   85.39   89.12   90.03   91.45   89.97   87.68
    Hammer                 82.56   86.48   85.34   87.50   86.84   88.10   87.98
    Forward punch          73.67   76.45   78.78   79.19   77.87   78.16   78.45
    High throw             72.78   73.46   76.98   79.92   79.23   78.89   76.75
    Hand clap              87.65   93.78   98.34   98.65   97.85   96.12   96.23
    Bend                   80.13   81.35   84.56   86.43   86.72   85.97   83.85
    Tennis serve           88.74   91.67   92.89   93.67   93.35   92.89   92.54
    Pickup and throw       83.81   86.34   86.94   88.34   87.13   87.67   97.56
    Average                81.36   84.37   86.62   87.97   87.56   87.22   87.63
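Tables 5 and 6 together expose the accuracy/latency trade-off in the model order n: recognition time grows roughly linearly with n, while the average recognition rate peaks around n = 3. A small helper (hypothetical, not from the paper, using the Table 6 averages and, as a representative timing row, the "Hand clap" times from Table 5) can make that selection explicit under a per-frame latency budget:

```python
# Average recognition rate (%) per order n (Table 6, "Average" row) and
# representative recognition times (ms) per order (Table 5, "Hand clap" row).
acc = {0: 81.36, 1: 84.37, 2: 86.62, 3: 87.97, 4: 87.56, 5: 87.22, 6: 87.63}
time_ms = {0: 4.13, 1: 9.99, 2: 12.49, 3: 14.63, 4: 17.62, 5: 19.84, 6: 22.78}

def pick_order(budget_ms):
    """Return the order with the best recognition rate among those meeting
    the latency budget, or None if no order qualifies."""
    feasible = [n for n in acc if time_ms[n] <= budget_ms]
    return max(feasible, key=lambda n: acc[n]) if feasible else None

print(pick_order(15.0))  # -> 3: feasible within 15 ms and the accuracy peak
```

With these numbers, any budget of at least 14.63 ms selects n = 3, matching the order at which Table 6's average rate is highest.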
  • [1] Tong Li-Na, Hou Zeng-Guang, Peng Liang, Wang Wei-Qun, Chen Yi-Xiong, Tan Min. Multi-channel sEMG time series analysis based human motion recognition method. Acta Automatica Sinica, 2014, 40(5): 810-821 http://www.aas.net.cn/CN/abstract/abstract18349.shtml
    [2] Li W Q, Zhang Z Y, Liu Z C. Action recognition based on a bag of 3D points. In: Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. San Francisco, CA: IEEE, 2010. 9-14
    [3] Yang X D, Zhang C Y, Tian Y L. Recognizing actions using depth motion maps-based histograms of oriented gradients. In: Proceedings of the 20th ACM International Conference on Multimedia. Nara, Japan: ACM, 2012. 1057-1060
    [4] Ofli F, Chaudhry R, Kurillo G, Vidal R, Bajcsy R. Sequence of the most informative joints (SMIJ): a new representation for human skeletal action recognition. Journal of Visual Communication and Image Representation, 2014, 25(1): 24-38
    [5] Theodorakopoulos I, Kastaniotis D, Economou G, Fotopoulos S. Pose-based human action recognition via sparse representation in dissimilarity space. Journal of Visual Communication and Image Representation, 2014, 25(1): 12-23. doi: 10.1016/j.jvcir.2013.03.008
    [6] Wang Bin, Wang Yuan-Yuan, Xiao Wen-Hua, Wang Wei, Zhang Mao-Jun. Human action recognition based on discriminative sparse coding video representation. Robot, 2012, 34(6): 745-750. doi: 10.3724/SP.J.1218.2012.00745
    [7] Tian Guo-Hui, Yin Jian-Qin, Han Xu, Yu Jing. A novel human activity recognition method using joint points information. Robot, 2014, 34(3): 285-292 http://www.cnki.com.cn/Article/CJFDTOTAL-JQRR201403005.htm
    [8] Qiao Jun-Fei, Pan Guang-Yuan, Han Hong-Gui. Design and application of continuous deep belief network. Acta Automatica Sinica, 2015, 41(12): 2138-2146 http://www.aas.net.cn/CN/Y2015/V41/I12/2138
    [9] Zhao S C, Liu Y B, Han Y H, Hong R C. Pooling the convolutional layers in deep convnets for action recognition [Online], available: http://120.52.73.77/arxiv.org/pdf/1511.02126v1.pdf, November 1, 2015
    [10] Liu C, Xu W S, Wu Q D, Yang G L. Learning motion and content-dependent features with convolutions for action recognition. Multimedia Tools and Applications, 2015. doi: 10.1007/s11042-015-2550-4
    [11] Baccouche M, Mamalet F, Wolf C, Garcia C, Baskurt A. Sequential deep learning for human action recognition. In: Human Behavior Understanding. Berlin: Springer, 2011. 29-39
    [12] Lefebvre G, Berlemont S, Mamalet F, Garcia C. BLSTM-RNN based 3D gesture classification. In: Artificial Neural Networks and Machine Learning. Berlin: Springer, 2013. 381-388
    [13] Du Y, Wang W, Wang L. Hierarchical recurrent neural network for skeleton based action recognition. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015. 1110-1118
    [14] Taylor G W, Hinton G E, Roweis S. Modeling human motion using binary latent variables. In: Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2007. 1345-1352
    [15] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7): 1527-1554. doi: 10.1162/neco.2006.18.7.1527
    [16] Bengio Y, Lamblin P, Popovici D, Larochelle H. Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2007
    [17] Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors. Nature, 1986, 323(6088): 533-536. doi: 10.1038/323533a0
    [18] Hsu E, Pulli K, Popović J. Style translation for human motion. ACM Transactions on Graphics, 2005, 24(3): 1082-1089. doi: 10.1145/1073204
    [19] Xia L, Chen C C, Aggarwal J K. View invariant human action recognition using histograms of 3D joints. In: Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Providence, USA: IEEE, 2012. 20-27
    [20] Ellis C, Masood S Z, Tappen M F, LaViola J J Jr, Sukthankar R. Exploring the trade-off between accuracy and observational latency in action recognition. International Journal of Computer Vision, 2013, 101(3): 420-436. doi: 10.1007/s11263-012-0550-7
    [21] Chen C, Liu K, Kehtarnavaz N. Real-time human action recognition based on depth motion maps. Journal of Real-Time Image Processing, 2016, 12(1): 155-163. doi: 10.1007/s11554-013-0370-1
    [22] Gowayyed M A, Torki M, Hussein M E, El-Saban M. Histogram of oriented displacements (HOD): describing trajectories of human joints for action recognition. In: Proceedings of the 2013 International Joint Conference on Artificial Intelligence. Beijing, China: AAAI Press, 2013. 1351-1357
    [23] Vemulapalli R, Arrate F, Chellappa R. Human action recognition by representing 3D skeletons as points in a lie group. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE, 2014. 588-595
Publication History
  • Received: 2015-10-20
  • Accepted: 2016-02-14
  • Published: 2016-07-01
