2.765

2022影响因子

(CJCR)

  • 中文核心
  • EI
  • 中国科技核心
  • Scopus
  • CSCD
  • 英国科学文摘

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

姿态特征与深度特征在图像动作识别中的混合应用

钱银中 沈一帆

钱银中, 沈一帆. 姿态特征与深度特征在图像动作识别中的混合应用. 自动化学报, 2019, 45(3): 626-636. doi: 10.16383/j.aas.2018.c170294
引用本文: 钱银中, 沈一帆. 姿态特征与深度特征在图像动作识别中的混合应用. 自动化学报, 2019, 45(3): 626-636. doi: 10.16383/j.aas.2018.c170294
QIAN Yin-Zhong, SHEN Yi-Fan. Hybrid of Pose Feature and Depth Feature for Action Recognition in Static Image. ACTA AUTOMATICA SINICA, 2019, 45(3): 626-636. doi: 10.16383/j.aas.2018.c170294
Citation: QIAN Yin-Zhong, SHEN Yi-Fan. Hybrid of Pose Feature and Depth Feature for Action Recognition in Static Image. ACTA AUTOMATICA SINICA, 2019, 45(3): 626-636. doi: 10.16383/j.aas.2018.c170294

姿态特征与深度特征在图像动作识别中的混合应用

doi: 10.16383/j.aas.2018.c170294
基金项目: 

常州信息职业技术学院自然科学项目 CXZK201803Z

江苏高校品牌专业建设工程资助项目 PPZY2015A090

详细信息
    作者简介:

    沈一帆    复旦大学计算机科学技术学院教授.研究方向为计算机图形学和科学计算的可视化.E-mail:yfshen@fudan.edu.cn

    通讯作者:

    钱银中    复旦大学计算机科学技术学院博士研究生, 常州信息职业技术学院副教授.主要研究方向为计算机视觉和机器学习.本文通信作者.E-mail:yinzhongqian10@fudan.edu.cn

Hybrid of Pose Feature and Depth Feature for Action Recognition in Static Image

Funds: 

Natural Science Project of Changzhou College of Information Technology CXZK201803Z

Top-notch Academic Programs Project of Jiangsu Higher Education Institutions PPZY2015A090

More Information
    Author Bio:

    Professor at the School of Computer Science, Fudan University. His research interest covers computer graphics and scientiflc computing visualization

    Corresponding author: QIAN Yin-Zhong Ph. D. candidate at the School of Computer Science, Fudan University. He is also an associate professor in Changzhou College of Information Technology. His research interest covers computer vision and machine learning. Corresponding author of this paper
  • 摘要: 人体姿态是动作识别的重要语义线索,而CNN能够从图像中提取有很强判别能力的深度特征,本文从图像局部区域提取姿态特征,从整体图像中提取深度特征,探索两者在动作识别中的互补作用.首先介绍了一种姿态表示方法,每个肢体部件的姿态由描述该部件姿态的一组Poselet检测得分表示.为了抑制检测错误,设计了基于部件的模型作为检测上下文.为了从数量有限的数据集中训练CNN网络,本文使用了预训练和精细调节的方法.在两个数据集中的实验表明,本文介绍的姿态特征与深度特征混合使用,动作识别性能得到了极大提升.
    1)  本文责任编委 赖剑煌
  • 图  6  静止图像数据集中的部分图像

    Fig.  6  Some images in static image data set

    图  1  打高尔夫球动作中部分胳膊Poselet训练实例

    Fig.  1  Instances for some arm poselets in playing golf

    图  2  层次部件树

    Fig.  2  Hierarchical part tree

    图  3  Poselet上下文模型

    Fig.  3  Poselet context

    图  4  在上下文环境中检测Poselet

    Fig.  4  Detecting Poselet in context

    图  5  提取深度特征

    Fig.  5  Extract deep features

    图  7  标注了关键点的图像

    Fig.  7  Some images with annotated key points

    图  8  视频截图数据集中的部分图像

    Fig.  8  ome images in video data set

    图  9  使用预备模型前后CNN训练过程top1错误率比较

    Fig.  9  Comparison of top1 error between whether using pre trained model

    图  10  姿态特征识别正确而深度特征识别错误的图像

    Fig.  10  Some images recognized accurately by pose feature but falsely by deep feature

    图  11  深度特征识别正确而姿态特征识别错误的图像

    Fig.  11  Some images recognized accurately by deep feature but falsely by pose feature

    表  1  静止图像数据集上的动作识别精度(%)

    Table  1  Precision on static image data set (%)

    方法 姿态特征平均精度 基准平均精度 与基准比较
    四节点星型[17] 61.07 56.45 + 4.62
    20节点模型[18] 65.15 62.8 + 2.35
    POSELETS[7] 61.33 56.41 + 4.92
    CNN黑盒特征[31] 67.20 56.41 + 10.79
    本文的姿态特征 66.40 56.41 + 9.99
    下载: 导出CSV

    表  2  视频截图数据集上的动作识别精度(%)

    Table  2  Precision on image form video data set (%)

    方法 姿态特征平均精度 基准平均精度 与基准比较
    四节点星型[17] 50.58 46.98 + 3.6
    multiLR NMF[11] 63.61 59.35 + 4.26
    POSELETS[7] 54.84 49.42 + 5.42
    CNN黑盒特征[31] 63.84 49.42 + 14.42
    本文的姿态特征 63.58 49.42 + 14.16
    下载: 导出CSV

    表  3  静止图像数据集姿态特征、CNN及混合后性能比较(%)

    Table  3  Precision comparison on static image data set (%)

    方法 舞蹈 打高尔夫球 跑步 行走 平均
    姿态特征 69 65 74 65 59 66.4
    CNN 72.4 76.4 70.2 73.9 65.6 71.7
    姿态特征+ C5 79.4 68.8 79.9 77.4 72.0 75.5
    姿态特征+ F6 78.5 70.2 77.9 78.3 74.5 75.8
    姿态特征+ F7 75.3 68.9 76.2 79.3 72.4 74.4
    下载: 导出CSV

    表  4  视频截图数据集姿态特征、CNN及混合后精度比较(%)

    Table  4  Precision comparision on video data set (%)

    方法 舞蹈 打高尔夫球 跑步 行走 平均
    姿态特征 52.1 91.5 83.6 39.4 51.4 63.6
    CNN 62.2 58.9 76.2 63.9 58.9 64.0
    姿态特征+ C5 63.4 65.7 82.3 61.5 65.5 67.6
    姿态特征+ F6 69.8 64.6 84.5 64.3 66.3 69.9
    姿态特征+ F7 67.5 64.1 82.5 63.7 65.8 68.7
    下载: 导出CSV
  • [1] Aggarwal J K, Ryoo M S. Human activity analysis:a review. ACM Computing Surveys, 2011, 43(3):Article No.16 http://d.old.wanfangdata.com.cn/Periodical/dlkxjz201407009
    [2] Jiang Y G, Wu Z X, Wang J, Xue X Y, Chang S F. Exploiting feature and class relationships in video categorization with regularized deep neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(2):352-364 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=Arxiv000001111697
    [3] Wu Z X, Jiang Y G, Wang X, Ye H, Xue X Y. Multi-stream multi-class fusion of deep networks for video classification. In:Proceedings of the 2016 ACM on Multimedia Conference. Amsterdam, The Netherlands:ACM, 2016. 791-800
    [4] 朱煜, 赵江坤, 王逸宁, 郑兵兵.基于深度学习的人体行为识别算法综述.自动化学报, 2016, 42(6):848-857 http://www.aas.net.cn/CN/abstract/abstract18875.shtml

    Zhu Yu, Zhao Jiang-Kun, Wang Yi-Ning, Zheng Bing-Bing. A review of human action recognition based on deep learning. Acta Automatica Sinica, 2016, 42(6):848-857 http://www.aas.net.cn/CN/abstract/abstract18875.shtml
    [5] 关秋菊, 罗晓牧, 郭雪梅, 王国利.基于隐马尔科夫模型的人体动作压缩红外分类.自动化学报, 2017, 43(3):398-406 http://www.aas.net.cn/CN/abstract/abstract19018.shtml

    Guan Qiu-Ju, Luo Xiao-Mu, Guo Xue-Mei, Wang Guo-Li. Compressive infrared classification of human motion using HMM. Acta Automatica Sinica, 2017, 43(3):398-406 http://www.aas.net.cn/CN/abstract/abstract19018.shtml
    [6] Guo G D, Lai A. A survey on still image based human action recognition. Pattern Recognition, 2014, 47(10):3343-3361 doi: 10.1016/j.patcog.2014.04.018
    [7] Maji S, Bourdev L, Malik J. Action recognition from a distributed representation of pose and appearance. In:Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, USA:IEEE, 2011. 3177-3184
    [8] Yao B P, Li F F. Action recognition with exemplar based 2.5D graph matching. In:Proceedings of the 12th European Conference on Computer Vision. Florence, Italy:Springer, 2012. 173-186
    [9] Yao B P, Jiang X Y, Khosla A, Lin A L, Guibas L, Li F F. Human action recognition by learning bases of action attributes and parts. In:Proceedings of the 2011 IEEE International Conference on Computer Vision. Barcelona, Spain:IEEE, 2011. 1331-1338
    [10] Delaitre V, Laptev I, Sivic J. Recognizing human actions in still images:a study of bag-of-features and part-based representations. In:Proceedings of the 21st British Machine Vision Conference. Aberystwyth, UK:BMVC Press, 2010. 1-11
    [11] Ikizler-Cinbis N, Cinbis R G, Sclaroff S. Learning actions from the Web. In:Proceedings of the 2009 IEEE 12th International Conference on Computer Vision. Kyoto, Japan:IEEE, 2009. 995-1002
    [12] Wang Y, Jiang H, Drew M S, Li Z N, Mori G. Unsupervised discovery of action classes. In:Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York, USA:IEEE, 2006. 1654-1661
    [13] Felzenszwalb P F, Huttenlocher D P. Pictorial structures for object recognition. International Journal of Computer Vision, 2005, 61(1):55-79 doi: 10.1023/B:VISI.0000042934.15159.49
    [14] Yang Y, Ramanan D. Articulated pose estimation with flexible mixtures-of-parts. In:Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition. Colorado Springs, USA:IEEE, 2011. 1385-1392
    [15] Bourdev L, Malik J. Poselets:body part detectors trained using 3D human pose annotations. In:Proceedings of IEEE the 12th International Conference on Computer Vision. Kyoto, Japan:IEEE, 2009. 1365-1372
    [16] Bourdev L, Maji S, Brox T, Malik J. Detecting people using mutually consistent poselet activations. In:Proceedings of the 11th European Conference on Computer Vision. Heraklion, Crete, Greece:Springer, 2010. 168-181
    [17] Yang W L, Wang Y, Mori G. Recognizing human actions from still images with latent poses. In:Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition. San Francisco, USA:IEEE, 2010. 2030-2037
    [18] Wang Y, Tran D, Liao Z C, Forsyth D. Discriminative hierarchical part-based models for human parsing and action recognition. Journal of Machine Learning Research, 2012, 13(1):3075-3102 http://cn.bing.com/academic/profile?id=b01a8d0fa4dfaa33e7d60d84dc664875&encoded=0&v=paper_preview&mkt=zh-cn
    [19] Ni B B, Moulin P, Yang X K, Yan S C. Motion part regularization:improving action recognition via trajectory group selection. In:Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA:IEEE, 2015. 3698-3706
    [20] Wang H, Schmid C. Action recognition with improved trajectories. In:Proceedings of the 2013 IEEE International Conference on Computer Vision. Sydney, Australia:IEEE, 2013. 3551-3558
    [21] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In:Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada, USA:ACM, 2012. 84-90
    [22] Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos. In:Proceedings of the 2014 Advances in Neural Information Processing Systems. Montrál, Canada:NIPS, 2014. 1-8
    [23] Goodale M A, Milner A D. Separate visual pathways for perception and action. Trends in Neurosciences, 1992, 15(1):20-25 doi: 10.1016/0166-2236(92)90344-8
    [24] Gkioxari G, Malik J. Finding action tubes. In:Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA:IEEE, 2015. 759-768
    [25] Cheron G, Laptev I, Schmid C. P-CNN:pose-based CNN features for action recognition. In:Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago, Chile:IEEE, 2015. 3218-3226
    [26] Chen J W, Wu J, Konrad J, Ishwar P. Semi-coupled two-stream fusion ConvNets for action recognition at extremely low resolutions. In:Proceedings of 2017 IEEE Winter Conference on Applications of Computer Vision. Santa Rosa, CA, USA:IEEE, 2017. 139-147
    [27] Shi Y M, Tian Y H, Wang Y W, Huang T J. Sequential deep trajectory descriptor for action recognition with three-stream CNN. IEEE Transactions on Multimedia, 2017, 19(7):1510-1520 doi: 10.1109/TMM.2017.2666540
    [28] Tu Z G, Cao J, Li Y K, Li B X. MSR-CNN:applying motion salient region based descriptors for action recognition. In:Proceedings of the 23rd International Conference on Pattern Recognition. Cancun, Mexico:IEEE, 2016. 3524-3529
    [29] Mikolajczyk K, Choudhury R, Schmid C. Face detection in a video sequence-a temporal approach. In:Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Kauai, HI, USA:IEEE, 2001. Ⅱ-96-Ⅱ-101
    [30] Ramanan D. Dual coordinate solvers for large-scale structural SVMs. USA:UC Irvine, 2013
    [31] Donahue J, Jia Y Q, Vinyals O, Hoffman J, Zhang N, Tzeng E, et al. DeCAF:a deep convolutional activation feature for generic visual recognition. In:Proceedings of the 31st International Conference on Machine Learning. Beijing, China:ICML, 2014. 647-655
    [32] Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In:Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Washington, USA:IEEE, 2014. 580-587
    [33] Niebles J C, Han B, Ferencz A, Li F F. Extracting moving people from internet videos. In:Proceedings of the 10th European Conference on Computer Vision. Marseille, France:Springer, 2008. 527-540
  • 加载中
图(11) / 表(4)
计量
  • 文章访问数:  2416
  • HTML全文浏览量:  589
  • PDF下载量:  829
  • 被引次数: 0
出版历程
  • 收稿日期:  2017-06-01
  • 录用日期:  2018-01-20
  • 刊出日期:  2019-03-20

目录

    /

    返回文章
    返回