Online Recognition of Human Actions Based on Temporal Deep Belief Neural Network

ZHOU Feng-Yu  YIN Jian-Qin  YANG Yang  ZHANG Hai-Ting  YUAN Xian-Feng

Citation: ZHOU Feng-Yu, YIN Jian-Qin, YANG Yang, ZHANG Hai-Ting, YUAN Xian-Feng. Online Recognition of Human Actions Based on Temporal Deep Belief Neural Network. ACTA AUTOMATICA SINICA, 2016, 42(7): 1030-1039. doi: 10.16383/j.aas.2016.c150629


doi: 10.16383/j.aas.2016.c150629
Funds: 

National Natural Science Foundation of China 61203341

Key Program of Natural Science Foundation of Shandong Province ZR2015QZ08

National Natural Science Foundation of China 61375084

More Information
    Author Bios:

    ZHOU Feng-Yu  Professor at the School of Control Science and Engineering, Shandong University. He received his Ph.D. degree from the School of Electrical Engineering and Automation, Tianjin University in 2008. His main research interest is intelligent robotics. E-mail: zhoufengyu@sdu.edu.cn

    YANG Yang  Lecturer at the School of Information Science and Engineering, Shandong University. He received his Ph.D. degree from the School of Information Science and Engineering, Shandong University in 2009. His research interests cover image processing and object tracking. E-mail: yangyang@mail.sdu.edu.cn

    ZHANG Hai-Ting  Master student at the School of Control Science and Engineering, Shandong University. She received her bachelor degree in engineering from Shandong University in 2011. Her research interests cover deep learning and image processing. E-mail: 546597163@qq.com

    YUAN Xian-Feng  Ph.D. candidate at the School of Control Science and Engineering, Shandong University. He received his bachelor degree in engineering from Shandong University in 2011. His research interests cover machine learning and service robots. E-mail: yuanxianfeng_sdu@126.com

    Corresponding author:

    YIN Jian-Qin  Associate professor at the School of Information Science and Engineering, University of Jinan. She received her Ph.D. degree from the School of Control Science and Engineering, Shandong University in 2013. Her research interests cover image processing and machine learning. Corresponding author of this paper. E-mail: ise_yinjq@ujn.edu.cn

  • Abstract: Online human action recognition is the ultimate goal of human action recognition. However, because how to segment an action sequence remains an open problem, most existing methods only recognize actions within pre-segmented sequences and do not address the online setting. To tackle this problem, this paper proposes a temporal deep belief network (TDBN) model capable of online human action recognition. The model fully exploits the contextual information provided by the frames before and after the current one, overcoming the limitation that existing deep belief network models can only recognize static images. It not only substantially improves recognition accuracy, but also requires no manual segmentation of the action sequence, so recognition can start at any moment while the action is being performed. This achieves online action recognition in the true sense and lays a solid theoretical foundation for practical applications.
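The TDBN described in the abstract is built from conditional restricted Boltzmann machines (CRBMs, Fig. 1), whose biases at time t depend on the previous n frames — this is the temporal context that removes the need to segment the sequence. As a rough illustration of that conditioning (the Gaussian visible units, layer sizes, learning rate, and the single CD-1 step below are assumptions for the sketch, not the paper's exact formulation):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CRBM:
    """Minimal sketch of an order-n conditional RBM: the visible and hidden
    biases at time t are linear functions of the previous n visible frames."""

    def __init__(self, n_visible, n_hidden, order, lr=0.01):
        self.order, self.lr = order, lr
        self.W = 0.01 * rng.standard_normal((n_hidden, n_visible))           # visible-hidden weights
        self.A = 0.01 * rng.standard_normal((n_visible, order * n_visible))  # past -> visible (autoregressive)
        self.B = 0.01 * rng.standard_normal((n_hidden, order * n_visible))   # past -> hidden
        self.a = np.zeros(n_visible)  # static visible bias
        self.b = np.zeros(n_hidden)   # static hidden bias

    def _dyn_biases(self, history):
        # history: the previous n frames flattened to shape (order * n_visible,)
        return self.a + self.A @ history, self.b + self.B @ history

    def hidden_probs(self, v, history):
        # posterior over hidden units given the current frame and its context
        _, b_dyn = self._dyn_biases(history)
        return sigmoid(self.W @ v + b_dyn)

    def cd1_update(self, v, history):
        # one contrastive-divergence (CD-1) step with Gaussian visible units
        a_dyn, _ = self._dyn_biases(history)
        h0 = self.hidden_probs(v, history)
        h_s = (rng.random(h0.shape) < h0).astype(float)  # sample hiddens
        v1 = a_dyn + self.W.T @ h_s                      # mean-field reconstruction
        h1 = self.hidden_probs(v1, history)
        self.W += self.lr * (np.outer(h0, v) - np.outer(h1, v1))
        self.a += self.lr * (v - v1)
        self.b += self.lr * (h0 - h1)
        return float(np.mean((v - v1) ** 2))             # reconstruction error

# Usage: condition on the 3 most recent (hypothetical 6-D) skeleton frames
crbm = CRBM(n_visible=6, n_hidden=8, order=3)
history = rng.standard_normal(3 * 6)
frame = rng.standard_normal(6)
features = crbm.hidden_probs(frame, history)  # shape (8,), values in (0, 1)
recon_err = crbm.cd1_update(frame, history)
```

Because `hidden_probs` needs only the current frame plus a sliding window of the last n frames, features can be computed at any instant of an ongoing action, which is what makes the online setting possible.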
  • Fig. 1  The structure of conditional restricted Boltzmann machines

    Fig. 2  The structure of the temporal deep belief network

    Fig. 3  Illustration of the skeleton of the MIT dataset

    Fig. 4  Flowchart of the learning of CRBM

    Fig. 5  Flowchart of the global weight adjustment

    Fig. 6  Recognition results on the MIT dataset

    Fig. 7  Confusion matrix on the MIT dataset

    Fig. 8  Illustration of the distribution of the weights of CRBM

    Fig. 9  Illustration of the actions of MSR Action 3D

    Fig. 10  Illustration of the skeleton of MSR Action 3D

    Fig. 11  Confusion matrix of MSR Action 3D on $AS1_{2}$

    Table 1  Recognition results on whole sequences in tests 1 and 2 (%)

                      $AS1_1$  $AS2_1$  $AS3_1$  $AS1_2$  $AS2_2$  $AS3_2$
    Ours (CRBM)       92.23    89.46    92.05    95.62    93.42    95.67
    Ours (TDBN)       96.67    92.81    96.68    99.33    97.44    99.87
    Li et al. [2]     89.5     89.0     96.3     93.4     92.9     96.3
    Xia et al. [19]   98.47    96.67    93.47    98.61    97.92    94.93
    Yang et al. [3]   97.3     92.2     98.0     98.7     94.7     98.7

    Table 2  Comparison between our method and other methods in test 3 (%)

                              $AS1_1$  $AS2_1$  $AS3_1$  Average
    Li et al. [2]             72.9     71.9     79.2     74.7
    Chen et al. [21]          96.2     83.2     92.0     90.47
    Gowayyed et al. [22]      92.39    90.18    91.43    91.26
    Vemulapalli et al. [23]   95.29    83.87    98.22    92.46
    Du et al. [13]            93.33    94.64    95.50    94.49
    Ours (TDBN)               97.01    94.22    98.34    96.52

    Table 3  Recognition results on the first 5 frames (%)

                      $AS1_1$  $AS2_1$  $AS3_1$  $AS1_2$  $AS2_2$  $AS3_2$
    Ours              79.84    79.35    82.93    90.78    92.76    94.66
    Yang et al. [3]   67±1     67±1     74±1     77±1     75±1     82±1

    Table 4  All recognition results (%)

               1 frame   5 frames   Whole sequence
    $AS1_1$    77.55     79.84      96.67
    $AS2_1$    78.01     79.35      92.81
    $AS3_1$    81.60     82.93      96.68
    Average    79.05     80.71      95.39
    $AS1_2$    89.74     90.78      99.33
    $AS2_2$    90.78     92.76      97.44
    $AS3_2$    93.00     94.66      99.87
    Average    91.17     92.73      98.88

    Table 5  Recognition time with different orders n (ms)

    Action                 n=0    n=1    n=2     n=3     n=4     n=5     n=6
    Horizontal arm wave    4.45   9.78   12.56   14.61   17.21   19.45   23.51
    Hammer                 3.67   9.89   11.89   14.39   17.12   19.78   22.13
    Forward punch          3.79   10.03  12.54   14.48   17.49   20.01   22.56
    High throw             3.96   9.92   12.48   14.68   17.73   19.21   22.67
    Hand clap              4.13   9.99   12.49   14.63   17.62   19.84   22.78
    Bend                   4.78   9.79   12.34   14.61   17.94   19.47   21.87
    Tennis serve           4.56   9.67   12.52   14.65   17.56   19.49   22.46
    Pickup and throw       3.71   9.97   12.67   14.51   17.83   19.92   22.81

    Table 6  Recognition rates with different orders n (%)

    Action                 n=0     n=1     n=2     n=3     n=4     n=5     n=6
    Horizontal arm wave    81.56   85.39   89.12   90.03   91.45   89.97   87.68
    Hammer                 82.56   86.48   85.34   87.50   86.84   88.10   87.98
    Forward punch          73.67   76.45   78.78   79.19   77.87   78.16   78.45
    High throw             72.78   73.46   76.98   79.92   79.23   78.89   76.75
    Hand clap              87.65   93.78   98.34   98.65   97.85   96.12   96.23
    Bend                   80.13   81.35   84.56   86.43   86.72   85.97   83.85
    Tennis serve           88.74   91.67   92.89   93.67   93.35   92.89   92.54
    Pickup and throw       83.81   86.34   86.94   88.34   87.13   87.67   97.56
    Average                81.36   84.37   86.62   87.97   87.56   87.22   87.63
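Tables 5 and 6 together expose the accuracy/latency trade-off in the model order n: recognition time grows roughly linearly with n, while the average recognition rate peaks around n = 3. A small helper (hypothetical, not from the paper, using the Table 6 averages and, as a representative timing row, the "Hand clap" times from Table 5) can make that selection explicit under a per-frame latency budget:

```python
# Average recognition rate (%) per order n (Table 6, "Average" row) and
# representative recognition times (ms) per order (Table 5, "Hand clap" row).
acc = {0: 81.36, 1: 84.37, 2: 86.62, 3: 87.97, 4: 87.56, 5: 87.22, 6: 87.63}
time_ms = {0: 4.13, 1: 9.99, 2: 12.49, 3: 14.63, 4: 17.62, 5: 19.84, 6: 22.78}

def pick_order(budget_ms):
    """Return the order with the best recognition rate among those meeting
    the latency budget, or None if no order qualifies."""
    feasible = [n for n in acc if time_ms[n] <= budget_ms]
    return max(feasible, key=lambda n: acc[n]) if feasible else None

print(pick_order(15.0))  # -> 3: feasible within 15 ms and the accuracy peak
```

With these numbers, any budget of at least 14.63 ms selects n = 3, matching the order at which Table 6's average rate is highest.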
  • [1] Tong Li-Na, Hou Zeng-Guang, Peng Liang, Wang Wei-Qun, Chen Yi-Xiong, Tan Min. Multi-channel sEMG time series analysis based human motion recognition method. Acta Automatica Sinica, 2014, 40(5): 810-821 http://www.aas.net.cn/CN/abstract/abstract18349.shtml
    [2] Li W Q, Zhang Z Y, Liu Z C. Action recognition based on a bag of 3D points. In: Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. San Francisco, CA: IEEE, 2010. 9-14
    [3] Yang X D, Zhang C Y, Tian Y L. Recognizing actions using depth motion maps-based histograms of oriented gradients. In: Proceedings of the 20th ACM International Conference on Multimedia. Nara, Japan: ACM, 2012. 1057-1060
    [4] Ofli F, Chaudhry R, Kurillo G, Vidal R, Bajcsy R. Sequence of the most informative joints (SMIJ): a new representation for human skeletal action recognition. Journal of Visual Communication and Image Representation, 2014, 25(1): 24-38
    [5] Theodorakopoulos I, Kastaniotis D, Economou G, Fotopoulos S. Pose-based human action recognition via sparse representation in dissimilarity space. Journal of Visual Communication and Image Representation, 2014, 25(1): 12-23. doi: 10.1016/j.jvcir.2013.03.008
    [6] Wang Bin, Wang Yuan-Yuan, Xiao Wen-Hua, Wang Wei, Zhang Mao-Jun. Human action recognition based on discriminative sparse coding video representation. Robot, 2012, 34(6): 745-750. doi: 10.3724/SP.J.1218.2012.00745
    [7] Tian Guo-Hui, Yin Jian-Qin, Han Xu, Yu Jing. A novel human activity recognition method using joint points information. Robot, 2014, 34(3): 285-292 http://www.cnki.com.cn/Article/CJFDTOTAL-JQRR201403005.htm
    [8] Qiao Jun-Fei, Pan Guang-Yuan, Han Hong-Gui. Design and application of continuous deep belief network. Acta Automatica Sinica, 2015, 41(12): 2138-2146 http://www.aas.net.cn/CN/Y2015/V41/I12/2138
    [9] Zhao S C, Liu Y B, Han Y H, Hong R C. Pooling the convolutional layers in deep convnets for action recognition [Online], available: http://120.52.73.77/arxiv.org/pdf/1511.02126v1.pdf, November 1, 2015
    [10] Liu C, Xu W S, Wu Q D, Yang G L. Learning motion and content-dependent features with convolutions for action recognition. Multimedia Tools and Applications, 2015. doi: 10.1007/s11042-015-2550-4
    [11] Baccouche M, Mamalet F, Wolf C, Garcia C, Baskurt A. Sequential deep learning for human action recognition. In: Human Behavior Understanding. Berlin: Springer, 2011. 29-39
    [12] Lefebvre G, Berlemont S, Mamalet F, Garcia C. BLSTM-RNN based 3D gesture classification. In: Artificial Neural Networks and Machine Learning. Berlin: Springer, 2013. 381-388
    [13] Du Y, Wang W, Wang L. Hierarchical recurrent neural network for skeleton based action recognition. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015. 1110-1118
    [14] Taylor G W, Hinton G E, Roweis S. Modeling human motion using binary latent variables. In: Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2007. 1345-1352
    [15] Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7): 1527-1554. doi: 10.1162/neco.2006.18.7.1527
    [16] Bengio Y, Lamblin P, Popovici D, Larochelle H. Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2007
    [17] Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors. Nature, 1986, 323(6088): 533-536. doi: 10.1038/323533a0
    [18] Hsu E, Pulli K, Popović J. Style translation for human motion. ACM Transactions on Graphics, 2005, 24(3): 1082-1089. doi: 10.1145/1073204
    [19] Xia L, Chen C C, Aggarwal J K. View invariant human action recognition using histograms of 3D joints. In: Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Providence, USA: IEEE, 2012. 20-27
    [20] Ellis C, Masood S Z, Tappen M F, LaViola J J Jr, Sukthankar R. Exploring the trade-off between accuracy and observational latency in action recognition. International Journal of Computer Vision, 2013, 101(3): 420-436. doi: 10.1007/s11263-012-0550-7
    [21] Chen C, Liu K, Kehtarnavaz N. Real-time human action recognition based on depth motion maps. Journal of Real-Time Image Processing, 2016, 12(1): 155-163. doi: 10.1007/s11554-013-0370-1
    [22] Gowayyed M A, Torki M, Hussein M E, El-Saban M. Histogram of oriented displacements (HOD): describing trajectories of human joints for action recognition. In: Proceedings of the 2013 International Joint Conference on Artificial Intelligence. Beijing, China: AAAI Press, 2013. 1351-1357
    [23] Vemulapalli R, Arrate F, Chellappa R. Human action recognition by representing 3D skeletons as points in a lie group. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE, 2014. 588-595
Publication History
  • Received: 2015-10-20
  • Accepted: 2016-02-14
  • Published: 2016-07-01
