Human Action Detection Based on Tracking Region of Maximum Mutual Information
Abstract: Human action detection must estimate not only the category of an action but also when and where it occurs, which makes it important for real-world applications. Its main difficulties are the high dimensionality of the parameter space and interference from background motion. To address these difficulties, this paper presents an action detection algorithm based on tracking the region of maximum mutual information. The action region is defined as the rectangular region of maximum mutual information. Dense trajectories serve as the low-level feature, and a random forest learns the mutual information function between trajectory features and action categories. The temporal continuity of trajectories is then used to predict and track the action region over long time spans. Experimental results show that the algorithm recognizes different action categories effectively and copes with background motion in real-world scenes, so that it detects and tracks the action region accurately.
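The abstract does not specify the scoring or search procedure, so the following is only a minimal sketch of the "maximum mutual information rectangle" idea under stated assumptions. It assumes that each trajectory receives a point-wise mutual information score log(P(c|d) / P(c)), where the posterior P(c|d) would come from a classifier such as a random forest. It also assumes that the scores for one frame are binned into a coarse spatial grid, and that the action region is the axis-aligned rectangle whose cells have the largest total score. That rectangle is found here by an exhaustive Kadane-style search. The names `pointwise_mi`, `score_grid` and `max_sum_rectangle` and the grid discretization are hypothetical. The paper's own search and its temporal tracking step are not reproduced.

```python
import math
from typing import List, Sequence, Tuple


def pointwise_mi(posterior: float, prior: float, eps: float = 1e-6) -> float:
    """Hypothetical per-trajectory score: log P(c|d) / P(c).

    Positive when a trajectory supports class c more than the prior does,
    negative for background-like trajectories. eps guards against log(0).
    """
    return math.log(max(posterior, eps) / prior)


def score_grid(points: Sequence[Tuple[int, int]], scores: Sequence[float],
               width: int, height: int) -> List[List[float]]:
    """Accumulate per-trajectory scores into a height x width grid (assumed binning)."""
    grid = [[0.0] * width for _ in range(height)]
    for (x, y), s in zip(points, scores):
        grid[y][x] += s
    return grid


def max_sum_rectangle(grid: List[List[float]]) -> Tuple[float, Tuple[int, int, int, int]]:
    """Return (best_sum, (left, top, right, bottom)) with inclusive cell bounds.

    For every pair of columns, the rows between them are collapsed into a 1-D
    array and Kadane's algorithm finds the best contiguous row range:
    O(width^2 * height) overall. Assumes a non-empty grid.
    """
    height, width = len(grid), len(grid[0])
    best: Tuple[float, Tuple[int, int, int, int]] = (-math.inf, (0, 0, 0, 0))
    for left in range(width):
        row_sums = [0.0] * height
        for right in range(left, width):
            for y in range(height):
                row_sums[y] += grid[y][right]
            # Kadane's algorithm over the collapsed rows.
            cur, start = 0.0, 0
            for y in range(height):
                if cur <= 0:
                    cur, start = row_sums[y], y  # restart the row range at y
                else:
                    cur += row_sums[y]           # extend the current row range
                if cur > best[0]:
                    best = (cur, (left, start, right, y))
    return best


if __name__ == "__main__":
    prior = 0.5                                 # assumed prior P(c) for two classes
    points = [(1, 1), (2, 1), (0, 3), (2, 2)]   # grid cells of trajectory points
    posteriors = [0.9, 0.8, 0.2, 0.95]          # assumed classifier outputs P(c|d)
    scores = [pointwise_mi(p, prior) for p in posteriors]
    total, box = max_sum_rectangle(score_grid(points, scores, width=4, height=4))
    print(f"score={total:.2f} box={box}")
```

Trajectories with a low posterior get negative scores, so the search excludes them from the rectangle rather than just ignoring them. Empty cells score zero and can tie, which may widen the returned box. The exhaustive search is adequate for coarse grids. Branch-and-bound subwindow search is the usual faster alternative when the grid is fine.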
Key words:
- Action detection
- Action recognition
- Random forest
- Dense trajectory
- Mutual information