Advances and Perspectives on Applications of Deep Learning in Visual Object Detection

ZHANG Hui; WANG Kun-Feng; WANG Fei-Yue

doi: 10.16383/j.aas.2017.c160822 cstr: 32138.14.j.aas.2017.c160822

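ZHANG Hui^{1,2}, WANG Kun-Feng^{1,3}, WANG Fei-Yue^{1,4}

1. State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
2. University of Chinese Academy of Sciences, Beijing 100049
3. Qingdao Academy of Intelligent Industries, Qingdao 266000
4. Research Center for Computational Experiments and Parallel Systems Technology, National University of Defense Technology, Changsha 410073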

Funds:

National Natural Science Foundation of China 61304200

China Scholarship Council 201504910397

National Natural Science Foundation of China 61533019

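Author Bio:
ZHANG Hui Ph.D. candidate at the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. Her research interests cover intelligent transportation systems, visual object detection, and deep learning. E-mail: zhanghui2015@ia.ac.cn

WANG Kun-Feng Associate professor at the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. His research interests cover intelligent transportation systems, intelligent vision computing, and machine learning. E-mail: kunfeng.wang@ia.ac.cn

Corresponding author:
WANG Fei-Yue Professor at the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. Director of the Research Center for Computational Experiments and Parallel Systems Technology, National University of Defense Technology. His research interests cover modeling, analysis, and control of intelligent systems and complex systems. Corresponding author of this paper. E-mail: feiyue.wang@ia.ac.cn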

Publication history
- Received: 2016-12-15
- Accepted: 2017-03-16
- Published: 2017-08-20

Abstract: Visual object detection is an important topic in computer vision, with significant research and practical value in applications such as video surveillance, autonomous driving, and human-machine interaction. In recent years, breakthroughs of deep learning in image classification have attracted wide attention and driven rapid progress in visual object detection. This paper reviews the advances and perspectives on applications of deep learning in visual object detection. We first summarize the basic procedure for visual object detection and introduce the public datasets commonly used in this research. We then detail the latest applications of deep learning methods in visual object detection. Finally, we discuss the difficulties and challenges that arise when deep learning is applied to visual object detection, and offer perspectives on future trends.

Keywords: visual object detection; deep learning; computer vision; parallel vision
Notes:

1) Associate editor responsible for this paper: ZHOU Tao

Fig. 1 Basic procedure for visual object detection

Fig. 2 Comparison of several public datasets

Fig. 3 Basic structure of a convolutional neural network^[59]

Fig. 4 Top-5 error rate (descending curve) and number of network layers (ascending curve) of the winning methods each year in the ILSVRC image classification task

Fig. 5 Calculation flow of R-CNN^[44]

Fig. 6 Calculation flow of Fast R-CNN^[58]

Fig. 7 Basic structure of the region proposal network^[7]

Fig. 8 Calculation flow of HyperNet^[73]

Fig. 9 Object detection framework based on DNN regression^[1]

Fig. 10 Performance comparison of some visual object detection methods on public datasets

Fig. 11 Basic framework of parallel vision^[85]
Table 1 Performance comparison of classical CNN models on the ILSVRC image classification task

| CNN model | Top-5 error rate (%) |
| --- | --- |
| AlexNet^[57] | 16.4 |
| ZFNet^[62] | 14.8 |
| VGG^[63] | 7.3 |
| GoogLeNet^[64] | 6.7 |
| ResNet^[8] | 3.57 |
| Inception-v4, Inception-ResNet^[65] | 3.08 |
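
Table 1 compares the models by Top-5 error rate. As a minimal sketch of how this metric is commonly defined, a prediction counts as correct when the ground-truth class is among the five highest-scoring classes. The function name `top_k_error` and the toy scores below are illustrative assumptions; they are not taken from the paper or from the ILSVRC evaluation toolkit.

```python
from typing import Sequence


def top_k_error(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 5,
) -> float:
    """Fraction of samples whose true label is NOT among the k top-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one sample is required")

    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Hypothetical toy data: 4 samples, 6 classes, one row of class scores per sample.
toy_scores = [
    [0.40, 0.25, 0.15, 0.10, 0.07, 0.03],
    [0.40, 0.25, 0.15, 0.10, 0.07, 0.03],
    [0.03, 0.07, 0.10, 0.15, 0.25, 0.40],
    [0.10, 0.40, 0.03, 0.25, 0.07, 0.15],
]
toy_labels = [0, 5, 2, 1]

print(top_k_error(toy_scores, toy_labels, k=5))  # 0.25
print(top_k_error(toy_scores, toy_labels, k=1))  # 0.5
```

Multiplying the returned fraction by 100 gives a percentage comparable to the values in Table 1. The gap between the `k=1` and `k=5` results in the toy data shows why Top-5 error is always less than or equal to Top-1 error for the same predictions.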

References (93)

[1]	Szegedy C, Toshev A, Erhan D. Deep neural networks for object detection. In:Proceedings of the 2013 Advances in Neural Information Processing Systems (NIPS). Harrahs and Harveys, Lake Tahoe, USA:MIT Press, 2013. 2553-2561
[2]	Felzenszwalb P F, Girshick R B, McAllester D, Ramanan D. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9):1627-1645 doi: 10.1109/TPAMI.2009.167
[3]	Huang Kai-Qi, Ren Wei-Qiang, Tan Tie-Niu. A review on image object classification and detection. Chinese Journal of Computers, 2014, 37(6):1225-1240 (in Chinese) http://www.cnki.com.cn/Article/CJFDTOTAL-JSJX201406001.htm
[4]	Zhang X, Yang Y H, Han Z G, Wang H, Gao C. Object class detection:a survey. ACM Computing Surveys (CSUR), 2013, 46(1):Article No. 10 http://dl.acm.org/citation.cfm?id=2522978
[5]	Dalal N, Triggs B. Histograms of oriented gradients for human detection. In:Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). San Diego, CA, USA:IEEE, 2005, 1:886-893
[6]	Uijlings J R R, van de Sande K E A, Gevers T, Smeulders A W M. Selective search for object recognition. International Journal of Computer Vision, 2013, 104(2):154-171 doi: 10.1007/s11263-013-0620-5
[7]	Ren S Q, He K M, Girshick R, Sun J. Faster R-CNN:towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6):1137-1149 doi: 10.1109/TPAMI.2016.2577031
[8]	He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In:Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, Nevada, USA:IEEE, 2016. 770-778
[9]	Lampert C H, Blaschko M B, Hofmann T. Beyond sliding windows:object localization by efficient subwindow search. In:Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Anchorage, Alaska, USA:IEEE, 2008. 1-8
[10]	An S J, Peursum P, Liu W Q, Venkatesh S. Efficient algorithms for subwindow search in object detection and localization. In:Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Miami, Florida, USA:IEEE, 2009. 264-271
[11]	Wei Y C, Tao L T. Efficient histogram-based sliding window. In:Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, CA, USA:IEEE, 2010. 3003-3010
[12]	Van de Sande K E A, Uijlings J R R, Gevers T, Smeulders A W M. Segmentation as selective search for object recognition. In:Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV). Barcelona, Spain:IEEE, 2011. 1879-1886
[13]	Shotton J, Blake A, Cipolla R. Multiscale categorical object recognition using contour fragments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(7):1270-1281 doi: 10.1109/TPAMI.2007.70772
[14]	Leibe B, Leonardis A, Schiele B. Robust object detection with interleaved categorization and segmentation. International Journal of Computer Vision, 2008, 77(1-3):259-289 doi: 10.1007/s11263-007-0095-3
[15]	Arbelaez P, Maire M, Fowlkes C, Malik J. Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(5):898-916 doi: 10.1109/TPAMI.2010.161
[16]	Shotton J, Winn J, Rother C, Criminisi A. TextonBoost:joint appearance, shape and context modeling for multi-class object recognition and segmentation. In:Proceedings of the 9th European Conference on Computer Vision (ECCV). Berlin, Heidelberg, Germany:Springer, 2006. 1-15
[17]	Verbeek J, Triggs B. Region classification with Markov field aspect models. In:Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Minneapolis, Minnesota, USA:IEEE, 2007. 1-8
[18]	Cheng M M, Zhang Z M, Lin W Y, Torr P. BING:binarized normed gradients for objectness estimation at 300fps. In:Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Columbus, USA:IEEE, 2014. 3286-3293
[19]	Zitnick C L, Dollár P. Edge boxes:locating object proposals from edges. In:Proceedings of the 13th European Conference on Computer Vision (ECCV). Zurich, Switzerland:Springer, 2014. 391-405
[20]	Hosang J, Benenson R, Schiele B. How good are detection proposals, really? arXiv:1406.6962, 2014.
[21]	Szegedy C, Reed S, Erhan D, Anguelov D, Ioffe S. Scalable, high-quality object detection. arXiv:1412.1441, 2014.
[22]	Erhan D, Szegedy C, Toshev A, Anguelov D. Scalable object detection using deep neural networks. In:Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Columbus, Ohio, USA:IEEE, 2014. 2155-2162
[23]	Kuo W C, Hariharan B, Malik J. Deepbox:learning objectness with convolutional networks. In:Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile:IEEE, 2015. 2479-2487
[24]	Ghodrati A, Diba A, Pedersoli M, Tuytelaars T, Van Gool L. Deepproposal:hunting objects by cascading deep convolutional layers. In:Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile:IEEE, 2015. 2578-2586
[25]	Gidaris S, Komodakis N. Locnet:improving localization accuracy for object detection. In:Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA:IEEE, 2016. 789-798
[26]	Roberts L G. Machine Perception of Three-dimensional Solids[Ph.D. dissertation], Massachusetts Institute of Technology, USA, 1963.
[27]	Canny J. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986, PAMI-8(6):679-698 doi: 10.1109/TPAMI.1986.4767851
[28]	Marr D, Hildreth E. Theory of edge detection. Proceedings of the Royal Society B:Biological Sciences, 1980, 207(1167):187-217 doi: 10.1098/rspb.1980.0020
[29]	Pellegrino F A, Vanzella W, Torre V. Edge detection revisited. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2004, 34(3):1500-1518 doi: 10.1109/TSMCB.2004.824147
[30]	Harris C, Stephens M. A combined corner and edge detector. In:Proceedings of the 4th Alvey Vision Conference. Manchester, UK:University of Sheffield Printing Unit, 1988. 147-151
[31]	Rosten E, Porter R, Drummond T. Faster and better:a machine learning approach to corner detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(1):105-119 doi: 10.1109/TPAMI.2008.275
[32]	Lowe D G. Object recognition from local scale-invariant features. In:Proceedings of the 7th IEEE International Conference on Computer Vision (ICCV). Kerkyra, Greece:IEEE, 1999, 2:1150-1157
[33]	Lowe D G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004, 60(2):91-110 doi: 10.1023/B:VISI.0000029664.99615.94
[34]	Papageorgiou C P, Oren M, Poggio T. A general framework for object detection. In:Proceedings of the 6th International Conference on Computer Vision (ICCV). Bombay, India:IEEE, 1998. 555-562
[35]	Ojala T, Pietikäinen M, Harwood D. Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. In:Proceedings of the 12th IAPR International Conference on Pattern Recognition, Conference A:Computer Vision and Image Processing. Jerusalem, Israel, Palestine:IEEE, 1994, 1:582-585
[36]	Ojala T, Pietikäinen M, Harwood D. A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 1996, 29(1):51-59 doi: 10.1016/0031-3203(95)00067-4
[37]	Yan J J, Lei Z, Yi D, Li S Z. Multi-pedestrian detection in crowded scenes:a global view. In:Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Providence, Rhode Island, USA:IEEE, 2012. 3124-3129
[38]	Yan J J, Zhang X C, Lei Z, Liao S C, Li S Z. Robust multi-resolution pedestrian detection in traffic scenes. In:Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Portland, Oregon, USA:IEEE, 2013. 3033-3040
[39]	Yan J J, Zhang X C, Lei Z, Yi D, Li S Z. Structural models for face detection. In:Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG). Shanghai, China:IEEE, 2013. 1-6
[40]	Zhu X X, Ramanan D. Face detection, pose estimation, and landmark localization in the wild. In:Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Providence, Rhode Island, USA:IEEE, 2012. 2879-2886
[41]	Yang Y, Ramanan D. Articulated pose estimation with flexible mixtures-of-parts. In:Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Providence, RI, USA:IEEE, 2011. 1385-1392
[42]	Yan J J, Lei Z, Wen L Y, Li S Z. The fastest deformable part model for object detection. In:Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Columbus, Ohio, USA:IEEE, 2014. 2497-2504
[43]	Lazebnik S, Schmid C, Ponce J. Beyond bags of features:spatial pyramid matching for recognizing natural scene categories. In:Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). New York, NY, USA:IEEE, 2006. 2169-2178
[44]	Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In:Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Columbus, Ohio, USA:IEEE, 2014. 580-587
[45]	Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z H, Karpathy A, Khosla A, Bernstein M, Berg A C, Fei-Fei L. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 2015, 115(3):211-252 doi: 10.1007/s11263-015-0816-y
[46]	Everingham M, Van Gool L, Williams C K I, Winn J, Zisserman A. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 2010, 88(2):303-338 doi: 10.1007/s11263-009-0275-4
[47]	Xiao J X, Hays J, Ehinger K A, Oliva A, Torralba A. Sun database:large-scale scene recognition from abbey to zoo. In:Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, CA, USA:IEEE, 2010. 3485-3492
[48]	Lin T Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick C L. Microsoft COCO:common objects in context. In:Proceedings of the 13th European Conference on Computer Vision (ECCV). Zurich, Switzerland:Springer, 2014. 740-755
[49]	Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors. Nature, 1986, 323(6088):533-536 doi: 10.1038/323533a0
[50]	LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553):436-444 doi: 10.1038/nature14539
[51]	Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786):504-507 doi: 10.1126/science.1127647
[52]	Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7):1527-1554 doi: 10.1162/neco.2006.18.7.1527
[53]	Bengio Y, Lamblin P, Popovici D, Larochelle H. Greedy layer-wise training of deep networks. In:Proceedings of the 19th International Conference on Neural Information Processing Systems. Cambridge, MA, USA:MIT Press, 2006. 153-160
[54]	LeCun Y, Chopra S, Hadsell R, Ranzato M, Huang F. A tutorial on energy-based learning. Predicting Structured Data. Cambridge, MA, USA:MIT Press, 2006.
[55]	Lee H, Ekanadham C, Ng A Y. Sparse deep belief net model for visual area V2. In:Proceedings of the 2007 Advances in Neural Information Processing Systems (NIPS). Vancouver, British Columbia, Canada:MIT Press, 2007. 873-880
[56]	Hinton G, Deng L, Yu D, Dahl G E, Mohamed A R, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath T N, Kingsbury B. Deep neural networks for acoustic modeling in speech recognition:the shared views of four research groups. IEEE Signal Processing Magazine, 2012, 29(6):82-97 doi: 10.1109/MSP.2012.2205597
[57]	Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In:Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada, USA:MIT Press, 2012. 1097-1105
[58]	Girshick R. Fast R-CNN. In:Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile:IEEE, 2015. 1440-1448
[59]	LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11):2278-2324 doi: 10.1109/5.726791
[60]	Vincent P, Larochelle H, Bengio Y, Manzagol P A. Extracting and composing robust features with denoising autoencoders. In:Proceedings of the 25th International Conference on Machine Learning (ICML). Helsinki, Finland:ACM, 2008. 1096-1103
[61]	Masci J, Meier U, Cireşan D, Schmidhuber J. Stacked convolutional auto-encoders for hierarchical feature extraction. In:Proceedings of the 21st International Conference on Artificial Neural Networks. Berlin, Heidelberg, Germany:Springer, 2011. 52-59
[62]	Zeiler M D, Fergus R. Visualizing and understanding convolutional networks. In:Proceedings of the 13th European Conference on Computer Vision (ECCV). Zurich, Switzerland:Springer, 2014. 818-833
[63]	Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2014.
[64]	Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In:Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, Massachusetts, USA:IEEE, 2015. 1-9
[65]	Szegedy C, Ioffe S, Vanhoucke V, Alemi A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv:1602.07261, 2016.
[66]	Ioffe S, Szegedy C. Batch normalization:accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015.
[67]	Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. arXiv:1512.00567, 2015.
[68]	He K M, Zhang X Y, Ren S Q, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. In:Proceedings of the 13th European Conference on Computer Vision (ECCV). Zurich, Switzerland:Springer, 2014. 346-361
[69]	Bell S, Zitnick C L, Bala K, Girshick R. Inside-outside net:detecting objects in context with skip pooling and recurrent neural networks. In:Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA:IEEE, 2016. 2874-2883
[70]	Yang F, Choi W, Lin Y Q. Exploit all the layers:fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers. In:Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA:IEEE, 2016. 2129-2137
[71]	Shrivastava A, Gupta A, Girshick R. Training region-based object detectors with online hard example mining. In:Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA:IEEE, 2016. 761-769
[72]	Sung K K. Learning and Example Selection for Object and Pattern Detection[Ph.D. dissertation], Massachusetts Institute of Technology, USA, 1996.
[73]	Kong T, Yao A B, Chen Y R, Sun F C. HyperNet:towards accurate region proposal generation and joint object detection. In:Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA:IEEE, 2016. 845-853
[74]	Dai J F, Li Y, He K M, Sun J. R-FCN:object detection via region-based fully convolutional networks. In:Proceedings of the 2016 Advances in Neural Information Processing Systems (NIPS). Barcelona, Spain:MIT Press, 2016. 379-387
[75]	Kim K H, Hong S, Roh B, Cheon Y, Park M. PVANET:deep but lightweight neural networks for real-time object detection. arXiv:1608.08021, 2016.
[76]	Shang W L, Sohn K, Almeida D, Lee H. Understanding and improving convolutional neural networks via concatenated rectified linear units. In:Proceedings of the 33rd International Conference on Machine Learning (ICML). New York, USA:JMLR.org, 2016. 2217-2225
[77]	Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y. Overfeat:integrated recognition, localization and detection using convolutional networks. arXiv:1312.6229, 2013.
[78]	Redmon J, Divvala S, Girshick R, Farhadi A. You only look once:unified, real-time object detection. In:Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA:IEEE, 2016. 779-788
[79]	Najibi M, Rastegari M, Davis L S. G-CNN:an iterative grid based object detector. In:Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA:IEEE, 2016. 2369-2377
[80]	Liu W, Anguelov D, Erhan D, Szegedy C, Reed S E, Fu C Y, Berg A C. SSD:single shot multibox detector. In:Proceedings of the 14th European Conference on Computer Vision (ECCV). Amsterdam, Netherlands:Springer, 2016. 21-37
[81]	Redmon J, Farhadi A. YOLO9000:better, faster, stronger. arXiv:1612.08242, 2016.
[82]	Pepik B, Benenson R, Ritschel T, Schiele B. What is holding back convnets for detection? In:Proceedings of the 2015 German Conference on Pattern Recognition. Cham, Switzerland:Springer, 2015. 517-528
[83]	Xiang Y, Mottaghi R, Savarese S. Beyond PASCAL:a benchmark for 3d object detection in the wild. In:Proceedings of the 2014 IEEE Winter Conference on Applications of Computer Vision (WACV). Steamboat Springs, Colorado, USA:IEEE, 2014. 75-82
[84]	Amazon Mechanical Turk[Online], available:https://www.mturk.com/, February 13, 2017
[85]	Wang Kun-Feng, Gou Chao, Wang Fei-Yue. Parallel vision:an ACP-based approach to intelligent vision computing. Acta Automatica Sinica, 2016, 42(10):1490-1500 (in Chinese) http://www.aas.net.cn/CN/abstract/abstract18936.shtml
[86]	Wang K F, Gou C, Zheng N N, Rehg J M, Wang F Y. Parallel vision for perception and understanding of complex scenes:methods, framework, and perspectives. Artificial Intelligence Review[Online], available:https://link.springer.com/article/10.1007/s10462-017-9569-z, July 18, 2017
[87]	Wang Fei-Yue. Parallel system methods for management and control of complex systems. Control and Decision, 2004, 19(5):485-489, 514 (in Chinese) http://www.cnki.com.cn/Article/CJFDTOTAL-KZYC200405001.htm
[88]	Wang F Y. Parallel control and management for intelligent transportation systems:concepts, architectures, and applications. IEEE Transactions on Intelligent Transportation Systems, 2010, 11(3):630-638 doi: 10.1109/TITS.2010.2060218
[89]	Wang Fei-Yue. Parallel control:a method for data-driven and computational control. Acta Automatica Sinica, 2013, 39(4):293-302 (in Chinese) http://www.aas.net.cn/CN/abstract/abstract17915.shtml
[90]	Peng X C, Sun B C, Ali K, Saenko K. Learning deep object detectors from 3D models. In:Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile:IEEE, 2015. 1278-1286
[91]	Johnson-Roberson M, Barto C, Mehta R, Sridhar S N, Rosaen K, Vasudevan R. Driving in the matrix:can virtual worlds replace human-generated annotations for real world tasks? arXiv:1610.01983, 2016.
[92]	Pan S J, Yang Q. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10):1345-1359 doi: 10.1109/TKDE.2009.191
[93]	Taylor M E, Stone P. Transfer learning for reinforcement learning domains:a survey. The Journal of Machine Learning Research, 2009, 10:1633-1685 http://dl.acm.org/citation.cfm?doid=1577069.1755839