Surveillance Event Detection Based on CNN

WANG Meng-Lai; LI Xiang; CHEN Qi; LI Lan-Bo; ZHAO Yan-Yun

doi:10.16383/j.aas.2016.c150729

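Multimedia Communication and Pattern Recognition Laboratory, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876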

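Author Bio:

LI Xiang, Master's student at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. His research interests cover computer vision and pattern recognition.

CHEN Qi, Master's student at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. His research interests cover computer vision and pattern recognition.

LI Lan-Bo, Master's student at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. His research interests cover computer vision and large-scale deep learning.

ZHAO Yan-Yun, Associate professor at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. Her research interests cover computer vision and pattern recognition.

Corresponding author: WANG Meng-Lai, Master's student at the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications. His research interests cover computer vision and deep learning. E-mail: wangmenglai@bupt.edu.cn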

Publication history
- Received: 2015-11-03
- Accepted: 2016-04-01
- Published: 2016-06-20

Abstract: Event detection in real-world surveillance video is a challenging task. The TRECVID-SED evaluation corpus was recorded by surveillance cameras at London Gatwick International Airport and is known for its difficulty. We propose a comprehensive event detection framework that combines trajectory analysis with an effective part-based deep network cascade, the head-shoulder network (HsNet). The deep network cascade detects pedestrians precisely even in crowded scenes, which lays the foundation for pedestrian tracking. Convolutional neural networks (CNNs) detect single-person events characterized by key poses, and trajectory analysis is used to detect group events. In the TRECVID-SED 2015 evaluation, our approach ranked first in 3 of the 6 events.
Keywords: convolutional neural network (CNN); event detection; pedestrian detection; target tracking; trajectory analysis

Fig. 1 The architecture of the CNN cascade for head-shoulder detection (HsNet) [26]
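
Fig. 1 only names the structure: a cascade of CNNs for head-shoulder detection. The sketch below illustrates the generic cascade idea under explicit assumptions. Each stage is a hypothetical scoring function paired with a rejection threshold, and the stages run in order. A candidate window survives only if every stage accepts it. The toy lambdas stand in for the CNN stages. They are placeholders, not the HsNet networks or their thresholds.

```python
from typing import Callable, List, Sequence, Tuple

# A candidate window: (x, y, width, height).
Box = Tuple[int, int, int, int]
# A hypothetical cascade stage: a scoring function and the threshold it must reach.
Stage = Tuple[Callable[[Box], float], float]


def run_cascade(candidates: Sequence[Box], stages: Sequence[Stage]) -> List[Tuple[Box, float]]:
    """Pass candidate windows through the stages in order.

    A window is dropped as soon as one stage scores it below that stage's threshold.
    Returns the surviving windows with the score given by the last stage.
    """
    survivors = [(box, 0.0) for box in candidates]
    for score_fn, threshold in stages:
        # Only windows accepted so far are scored by this (typically costlier) stage.
        survivors = [(box, score_fn(box)) for box, _ in survivors]
        survivors = [(box, s) for box, s in survivors if s >= threshold]
        if not survivors:  # nothing left: skip the remaining stages
            break
    return survivors


if __name__ == "__main__":
    # Toy stand-ins for CNN stages (hypothetical): a cheap width check, then a finer score.
    coarse = (lambda b: 1.0 if b[2] >= 16 else 0.0, 0.5)
    fine = (lambda b: b[3] / (b[2] + b[3]), 0.5)
    windows = [(0, 0, 8, 8), (10, 10, 20, 30), (40, 5, 30, 20)]
    print(run_cascade(windows, [coarse, fine]))  # only (10, 10, 20, 30) survives
```

A cascade is efficient because the early, cheaper stages reject most windows. The more expensive later stages then only see a small set of survivors.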

Fig. 2 Block diagram of multi-target tracking by online learning of non-linear motion patterns and robust appearance models [13]

Fig. 3 Sample frames of the Pointing and Embrace events

Fig. 4 Sample frames of the ObjectPut and PersonRuns events

Fig. 5 The CNN architecture for key-pose detection of the ObjectPut and PersonRuns events

Fig. 6 Block diagram of group event detection
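
The abstract states that group events are detected by trajectory analysis, and Fig. 6 shows only the block diagram. To make the idea concrete, the sketch below applies a hypothetical distance rule to two time-aligned pedestrian tracks. It reports a PeopleMeet-like pattern when the pair is far apart before it first comes close, and a PeopleSplitUp-like pattern for the reverse. The function names, the `near` and `far` thresholds, and the rule itself are illustrative assumptions, not the authors' detector.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def pair_distances(track_a: Sequence[Point], track_b: Sequence[Point]) -> List[float]:
    """Per-frame Euclidean distance between two time-aligned trajectories."""
    return [math.dist(p, q) for p, q in zip(track_a, track_b)]


def classify_pair(track_a: Sequence[Point], track_b: Sequence[Point],
                  near: float = 1.0, far: float = 3.0) -> Optional[str]:
    """Label a pair of tracks with a hypothetical meet/split rule.

    Returns 'meet' if the distance exceeds `far` before it first drops below `near`,
    'split' if it drops below `near` first and exceeds `far` later, and None if
    either threshold is never crossed.
    """
    d = pair_distances(track_a, track_b)
    first_far = next((i for i, v in enumerate(d) if v > far), None)
    first_near = next((i for i, v in enumerate(d) if v < near), None)
    if first_far is None or first_near is None:
        return None
    return "meet" if first_far < first_near else "split"


if __name__ == "__main__":
    # Hypothetical track coordinates, one point per frame.
    a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.5, 0.0)]
    b = [(5.0, 0.0), (4.0, 0.0), (3.0, 0.0), (3.0, 0.0)]
    print(classify_pair(a, b))              # distances 5, 3, 1, 0.5 -> "meet"
    print(classify_pair(a[::-1], b[::-1]))  # distances 0.5, 1, 3, 5 -> "split"
```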

Fig. 7 Examples of head-shoulder training samples

Fig. 8 Comparison of our results with several state-of-the-art methods on SED-PD [26] (legend entries are ordered by log-average miss rate; lower is better)
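
The log-average miss rate used to order the curves in Fig. 8 is the summary metric of [4]. The miss rate is sampled at nine false-positives-per-image (FPPI) values, evenly spaced in log-space between 10^-2 and 10^0, and then averaged in the log domain (a geometric mean). The minimal sketch below assumes the curve is given as parallel lists sorted by increasing FPPI. Assigning a miss rate of 1.0 to reference points that fall before the curve's first FPPI value is our assumption.

```python
import math
from typing import Sequence


def log_average_miss_rate(fppi: Sequence[float], miss_rate: Sequence[float]) -> float:
    """Geometric mean of the miss rate at nine FPPI values log-spaced over [1e-2, 1e0].

    `fppi` must be sorted in increasing order. At each reference FPPI we take the miss
    rate of the last curve point whose FPPI does not exceed it, so a curve that ends
    early contributes its final (lowest) miss rate. If the curve starts above a
    reference value, a miss rate of 1.0 is assumed there.
    """
    refs = [10 ** (-2 + 0.25 * k) for k in range(9)]
    sampled = []
    for ref in refs:
        below = [m for f, m in zip(fppi, miss_rate) if f <= ref]
        mr = below[-1] if below else 1.0
        sampled.append(max(mr, 1e-10))  # guard against log(0)
    return math.exp(sum(math.log(m) for m in sampled) / len(sampled))
```

Averaging in the log domain means that a reduction in miss rate from 0.2 to 0.1 counts as much as one from 0.02 to 0.01. As a result, no single region of the curve dominates the summary.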

Fig. 9 Sample detection results on SED-PD [26] (red: correct detection; blue: false alarm; green: missed detection)
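
Fig. 9 colors each box as a correct detection, a false alarm, or a missed detection, but the page does not state the matching rule. The sketch below assumes the PASCAL-style overlap criterion adopted in [4]. A detection counts as correct if its intersection-over-union (IoU) with a still-unmatched ground-truth box is at least 0.5, and detections are matched greedily in order of decreasing score. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(dets: Sequence[Tuple[Box, float]], gts: Sequence[Box],
                     thr: float = 0.5) -> Tuple[List[Box], List[Box], List[Box]]:
    """Greedily match scored detections to ground truth, highest score first.

    Returns (correct detections, false alarms, missed ground-truth boxes).
    """
    unmatched = list(gts)
    correct, false_alarms = [], []
    for box, _score in sorted(dets, key=lambda d: d[1], reverse=True):
        best = max(unmatched, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= thr:
            correct.append(box)
            unmatched.remove(best)  # each ground-truth box can be matched only once
        else:
            false_alarms.append(box)
    return correct, false_alarms, unmatched
```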

Fig. 10 Improvement obtained with Gaussian process regression
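
Fig. 10 reports an improvement from Gaussian process regression (GPR), but the page does not say which quantity is regressed. Purely as an illustration, the sketch assumes GPR smooths one coordinate of a noisy pedestrian track over time. It uses a squared-exponential kernel and the standard posterior-mean formula (see, e.g., [28]). The kernel choice and the hyperparameters `length`, `signal_var` and `noise_var` are hypothetical. The code is pure Python to stay dependency-free. In practice, a linear-algebra library would replace the hand-written Cholesky solve.

```python
import math
from typing import List, Sequence


def rbf(t1: float, t2: float, length: float, signal_var: float) -> float:
    """Squared-exponential (RBF) covariance between two time stamps."""
    return signal_var * math.exp(-0.5 * ((t1 - t2) / length) ** 2)


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Return lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def cho_solve(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = b by forward substitution, then back substitution."""
    n = len(low)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior_mean(t_train: Sequence[float], y_train: Sequence[float],
                      t_query: Sequence[float], length: float = 2.0,
                      signal_var: float = 1.0, noise_var: float = 0.1) -> List[float]:
    """GP posterior mean: mu + k_*^T (K + noise_var * I)^{-1} (y - mu).

    The data mean mu serves as a constant prior mean, so predictions far from the
    observations fall back to mu rather than 0.
    """
    n = len(t_train)
    mu = sum(y_train) / n
    K = [[rbf(t_train[i], t_train[j], length, signal_var) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = cho_solve(cholesky(K), [y - mu for y in y_train])
    return [mu + sum(rbf(tq, t_train[i], length, signal_var) * alpha[i] for i in range(n))
            for tq in t_query]


if __name__ == "__main__":
    # Hypothetical noisy x-coordinates of one track, one value per frame.
    frames = [0, 1, 2, 3, 4, 5]
    noisy_x = [10.2, 10.8, 12.1, 12.9, 14.2, 14.8]
    print(gp_posterior_mean(frames, noisy_x, frames))
```

A larger `noise_var` smooths more strongly, pulling the estimate toward the prior mean. As `noise_var` approaches 0, the posterior mean interpolates the observations.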

Table 1 Evaluation results of TRECVID-SED 2015

Event	Rank	Best ADCR of other teams	Our ADCR	#Targ	#CorDet	#FA	#Miss
Embrace	1	0.8680	0.7909	138	36	90	102
ObjectPut	1	1.0160	1.0120	289	2	33	287
PeopleMeet	4	0.8939	1.0426	256	30	278	226
PeopleSplitUp	2	0.8934	0.9387	152	24	168	128
PersonRuns	2	0.5768	0.9700	50	4	87	46
Pointing	1	1.0140	1.0040	794	16	42	778

(ADCR: actual detection cost rate, lower is better; #Targ: reference events; #CorDet: correct detections; #FA: false alarms; #Miss: missed detections)
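
Table 1 ranks teams by ADCR. As we read the SED evaluation plan [30], the detection cost rate is DCR = P_Miss + β·R_FA. Here P_Miss = #Miss/#Targ, R_FA is the number of false alarms per hour of evaluation video, and β = Cost_FA/(Cost_Miss·R_Target) = 1/(10·20) = 0.005. The sketch below recomputes ADCR from the table's counts. This page does not give the evaluation duration. Back-solving from the table, about 8.7 hours reproduces every reported ADCR when rounded to four decimals, so that value is an inference used for illustration, not an official figure.

```python
from typing import Dict, Tuple

# Cost constants as we read them from the SED evaluation plan [30] (assumption).
COST_MISS = 10.0
COST_FA = 1.0
RATE_TARGET = 20.0  # target events per hour
BETA = COST_FA / (COST_MISS * RATE_TARGET)  # = 0.005


def dcr(n_targ: int, n_fa: int, n_miss: int, hours: float) -> float:
    """Detection cost rate: P_Miss + BETA * R_FA, with R_FA in false alarms per hour."""
    p_miss = n_miss / n_targ
    r_fa = n_fa / hours
    return p_miss + BETA * r_fa


# (#Targ, #FA, #Miss) per event, copied from Table 1.
TABLE_1: Dict[str, Tuple[int, int, int]] = {
    "Embrace": (138, 90, 102),
    "ObjectPut": (289, 33, 287),
    "PeopleMeet": (256, 278, 226),
    "PeopleSplitUp": (152, 168, 128),
    "PersonRuns": (50, 87, 46),
    "Pointing": (794, 42, 778),
}

if __name__ == "__main__":
    EVAL_HOURS = 8.7  # inferred from the table, NOT an official figure
    for event, (targ, fa, miss) in TABLE_1.items():
        print(f"{event:14s} ADCR ~ {dcr(targ, fa, miss, EVAL_HOURS):.4f}")
```

The breakdown shows why ObjectPut and Pointing score above 1.0 despite ranking first. Almost every target is missed (P_Miss above 0.97), so even a modest false-alarm rate pushes the cost past 1.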

References

[1]	Text Retrieval Conference (TREC)[Online], available: http://trec.nist.gov/, April 5, 2016
[2]	National Institute of Standards and Technology (NIST)[Online], available: http://www.nist.gov/index.html, April 5, 2016
[3]	TREC Video Retrieval Evaluation (TRECVID)[Online], available: http://www-nlpir.nist.gov/projects/trecvid/, April 5, 2016
[4]	Dollar P, Wojek C, Schiele B, Perona P. Pedestrian detection: an evaluation of the state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(4): 743-761
[5]	Benenson R, Omran M, Hosang J, Schiele B. Ten years of pedestrian detection, what have we learned? In: Proceedings of the 12th European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014. 613-627
[6]	Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA: IEEE, 2005. 886-893
[7]	Felzenszwalb P, McAllester D, Ramanan D. A discriminatively trained, multiscale, deformable part model. In: Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, Alaska, USA: IEEE, 2008. 1-8
[8]	Ouyang W, Wang X. Joint deep learning for pedestrian detection. In: Proceedings of the 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE, 2013. 2056-2063
[9]	Luo P, Tian Y, Wang X, Tang X. Switchable deep network for pedestrian detection. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, Ohio, USA: IEEE, 2014. 899-906
[10]	Hosang J, Omran M, Benenson R, Schiele B. Taking a deeper look at pedestrians. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015. 4073-4082
[11]	Cuda-convnet. High-performance C++/CUDA implementation of convolutional neural networks[Online], available: https://code.google.com/p/cuda-convnet/, April 5, 2016
[12]	Huang C, Wu B, Nevatia R. Robust object tracking by hierarchical association of detection responses. In: Proceedings of the 10th European Conference on Computer Vision. Marseille, France: Springer, 2008. 788-801
[13]	Yang B, Nevatia R. Multi-target tracking by online learning of non-linear motion patterns and robust appearance models. In: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE, 2012. 1918-1925
[14]	Soomro K, Zamir A R, Shah M. UCF101: A Dataset of 101 Human Actions Classes from Videos in the Wild, Technical Report CRCV-TR-12-01, Center for Research in Computer Vision, University of Central Florida, USA, 2012.
[15]	Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T. HMDB: a large video database for human motion recognition. In: Proceedings of the 2011 IEEE International Conference on Computer Vision. Barcelona, Spain: IEEE, 2011. 2556-2563
[16]	Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Li F F. Large-scale video classification with convolutional neural networks. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, Ohio, USA: IEEE, 2014. 1725-1732
[17]	Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos. In: Proceedings of the 2014 Conference and Workshop on Neural Information Processing Systems. Montreal, Canada, 2014. 568-576
[18]	Over P, Awad G, Fiscus J, Michel M, Smeaton A F, Kraaij W. TRECVID 2009-goals, tasks, data, evaluation mechanisms and metrics. In: TRECVid Workshop 2009. Gaithersburg, MD, USA: NIST, 2010. 1-42
[19]	Du X Z, Cai Y, Zhao Y C, Li H, Yang Y, Hauptmann A. Informedia@trecvid 2014: surveillance event detection. TRECVid video retrieval evaluation workshop[Online], available: http://www-nlpir.nist.gov/projects/tvpubs/tv14.papers/cmu.pdf, April 5, 2016
[20]	Cheng Y, Brown L, Fan Q F, Liu J J, Feris R, Choudhary A, Pankanti S. IBM-Northwestern@TRECVID 2014: Surveillance Event Detection. TRECVid video retrieval evaluation workshop[Online], available: http://www.nlpir.nist.gov/projects/tvpubs/tv14.papers/ibm.pdf, April 5, 2016
[21]	Laptev I. On space-time interest points. International Journal of Computer Vision, 2005, 64(2-3): 107-123
[22]	Chen M Y, Hauptmann A. MoSIFT: Recognizing Human Actions in Surveillance Videos, Technical Report CMU-CS-09-161, Department of Computer Science, Carnegie Mellon University, USA, 2009.
[23]	Lawrence S, Giles C L, Tsoi A C, Back A D. Face recognition: a convolutional neural-network approach. IEEE Transactions on Neural Networks, 1997, 8(1): 98-113
[24]	Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 2012 Advances in Neural Information Processing Systems. Lake Tahoe, Nevada, USA: Curran Associates, Inc., 2012. 1097-1105
[25]	Jia Y Q, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T. Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM International Conference on Multimedia. Orlando, USA: ACM, 2014. 675-678
[26]	Chen Q, Jiang W H, Zhao Y Y, Su F. Part-based deep network for pedestrian detection in surveillance videos. In: Proceedings of the 2015 IEEE International Conference on Visual Communications and Image Processing. Singapore: IEEE, 2015. 1-4
[27]	Li Lan-Bo. Currency Recognition and Multi-Target Tracking Algorithm [Master dissertation], Beijing University of Posts and Telecommunications, China, 2015.
[28]	Prince S J D. Computer Vision: Models, Learning, and Inference. Cambridge: Cambridge University Press, 2012.
[29]	SED Pedestrian Dataset (SED-PD)[Online], available: http://www.bupt-mcprl.net/datadownload.php, April 5, 2016
[30]	TRECVID Surveillance Event Detection (SED) Evaluation Plan[Online], available: ftp://jaguar.ncsl.nist.gov/pub/SED15_EvaluationPlan.pdf, April 5, 2016
