Research on a Data-Driven Hierarchical Scene Sequence Recognition Model

FENG Wen-Gang

doi:10.3734/SP.J.1004.2014.00763

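FENG Wen-Gang^1,2

1. Department of Policing Intelligence, Chinese People's Public Security University, Beijing 100038, China;
2. Public Security Intelligence Research Center, Chinese People's Public Security University, Beijing 100038, China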

Publication history
- Received: 2012-06-15
- Revised: 2013-07-19
- Published: 2014-04-20

Data Driven Hierarchical Serial Scene Classification Framework

FENG Wen-Gang^1,2

1. Department of Policing Intelligence, Chinese People's Public Security University, Beijing 100038, China;
2. Public Security Intelligence Research Center, Chinese People's Public Security University, Beijing 100038, China

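Abstract

Abstract: For hierarchical scene image sequences, this paper proposes a data-driven scene recognition model based on the rapid serial visual presentation (RSVP) task. First, image patches at three scales are extracted with a pyramid model. Next, a visual vocabulary (codebook) containing both global and local features is built. The visual words are then trained with a generative model and a discriminative model, respectively. Finally, a neural network derives the scene category from the image patch labels. Experiments show that the algorithm obtains more accurate classification results.
- Spatial pyramid model
- Visual codebook
- Generative method
- Discriminative method
- Neural network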
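The first two steps of the abstract (patches at three pyramid scales, each described by codebook words) can be sketched as follows. This is only an illustration under stated assumptions: the `2**l x 2**l` grid per level follows the common spatial pyramid layout, and the function names `pyramid_cells` and `cell_histogram`, the toy `word_map`, and the three-word vocabulary are all hypothetical; the abstract does not give the paper's actual partition or features.

```python
from collections import Counter


def pyramid_cells(height, width, levels=3):
    """Return (level, top, bottom, left, right) boxes for a spatial pyramid.

    Assumed layout: level l splits the image into a 2**l x 2**l grid, so
    three levels give 1 + 4 + 16 = 21 cells. Bounds are half-open.
    """
    cells = []
    for level in range(levels):
        n = 2 ** level
        for i in range(n):
            for j in range(n):
                top, bottom = i * height // n, (i + 1) * height // n
                left, right = j * width // n, (j + 1) * width // n
                cells.append((level, top, bottom, left, right))
    return cells


def cell_histogram(word_map, cell, vocab_size):
    """Count the visual-word indices that fall inside one pyramid cell."""
    _, top, bottom, left, right = cell
    counts = Counter(
        word_map[r][c]
        for r in range(top, bottom)
        for c in range(left, right)
    )
    return [counts.get(k, 0) for k in range(vocab_size)]


# Toy 4x4 "image" whose pixels are already quantised to 3 visual words.
word_map = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 1, 0],
]
cells = pyramid_cells(4, 4)
histograms = [cell_histogram(word_map, c, vocab_size=3) for c in cells]
```

Each entry of `histograms` is a bag-of-visual-words descriptor for one patch; the coarsest cell summarises the global scene, while finer cells describe local regions.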
Abstract: Scene classification is a complicated task because a scene contains diverse content whose distribution is difficult to capture. This paper presents a novel hierarchical serial scene classification framework. First, hierarchical features represent both the global scene and the local patches that contain specific objects; the hierarchy is built by spatial pyramid matching, and the codebook is built from two different types of visual words. Second, the visual words are trained by generative and discriminative methods, respectively, on top of the spatial pyramid, which yields the local patch labels efficiently. Third, a neural network simulates the human decision process and derives the final scene category from the local labels. Experiments show that the hierarchical serial scene image representation and classification model obtains more accurate classification results.
- Spatial pyramid matching
- Visual codebook
- Generative method
- Discriminative method
- Neural network
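The final step, going from local patch labels to a scene category, can be illustrated with a very small network. The sketch below is a stand-in, not the paper's model: it assumes a single-layer softmax classifier over the normalised histogram of patch labels, trained by stochastic gradient descent on cross-entropy. The label and scene names, the toy `samples`, and the hyperparameters are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_histogram(patch_labels, num_labels):
    """Normalised frequency of each patch label in one image."""
    hist = [0.0] * num_labels
    for lab in patch_labels:
        hist[lab] += 1.0 / len(patch_labels)
    return hist


def scene_scores(W, b, x):
    """Linear score per scene: W[k] . x + b[k]."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def train(samples, num_labels, num_scenes, epochs=200, lr=0.5):
    """Fit a single softmax layer by SGD on the cross-entropy loss."""
    W = [[0.0] * num_labels for _ in range(num_scenes)]
    b = [0.0] * num_scenes
    for _ in range(epochs):
        for patch_labels, scene in samples:
            x = label_histogram(patch_labels, num_labels)
            p = softmax(scene_scores(W, b, x))
            for k in range(num_scenes):
                grad = p[k] - (1.0 if k == scene else 0.0)
                b[k] -= lr * grad
                for j in range(num_labels):
                    W[k][j] -= lr * grad * x[j]
    return W, b


def predict(W, b, patch_labels):
    """Return the index of the highest-scoring scene."""
    x = label_histogram(patch_labels, len(W[0]))
    scores = scene_scores(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical patch labels: 0 = sky, 1 = building, 2 = tree.
# Hypothetical scenes: 0 = city, 1 = forest.
samples = [
    ([0, 1, 1, 1], 0),
    ([1, 1, 0, 1], 0),
    ([0, 2, 2, 2], 1),
    ([2, 2, 0, 2], 1),
]
W, b = train(samples, num_labels=3, num_scenes=2)
```

Calling `predict(W, b, patch_labels)` on a new list of patch labels returns a scene index. A deeper network could replace the single layer without changing the interface.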
