Data-Driven Hierarchical Serial Scene Classification Framework
-
Chinese abstract (translated): For hierarchical scene image sequences, this paper proposes a data-driven scene recognition model based on the rapid serial visual presentation (RSVP) task. First, image patches at three scales are extracted with a pyramid model; then a visual codebook containing both global and local features is built; next, the visual words are trained with a generative model and a discriminative model respectively; finally, a neural network derives the scene category from the image patch labels. Experiments show that the algorithm obtains more accurate classification results.

Abstract: Scene classification is a complicated task because a scene contains diverse content whose distribution is difficult to capture. This paper presents a novel hierarchical serial scene classification framework. First, we use hierarchical features to represent both the global scene and local patches containing specific objects. The hierarchy is constructed by spatial pyramid matching, and our own codebook is built from two different types of words (global and local). Second, based on spatial pyramid matching, we train the visual words with generative and discriminative methods respectively, which yields the local patch labels efficiently. Then, we use a neural network to simulate the human decision process, deriving the final scene category from the local labels. Experiments show that the hierarchical serial scene image representation and classification model achieves superior classification accuracy.
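The representation step summarized in the abstract builds on spatial pyramid matching (Lazebnik et al.). The following is a minimal sketch of the standard three-level version of that technique, not the authors' own implementation: each local descriptor is assigned to its nearest codeword, visual-word histograms are counted on 1x1, 2x2 and 4x4 grids, and two images are compared with the level-weighted histogram-intersection kernel. The toy codebook, the `(x, y, descriptor)` feature format and all function names are hypothetical; the paper's codebook of global and local words and its generative/discriminative training are not modeled here.

```python
from typing import List, Sequence, Tuple

# Hypothetical local feature format: (x, y, descriptor), x and y normalized to [0, 1).
Feature = Tuple[float, float, Sequence[float]]


def nearest_word(desc: Sequence[float], codebook: List[Sequence[float]]) -> int:
    """Index of the codeword closest to desc in squared Euclidean distance."""
    best, best_d = 0, float("inf")
    for k, word in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(desc, word))
        if d < best_d:
            best, best_d = k, d
    return best


def pyramid_vector(features: List[Feature], codebook: List[Sequence[float]],
                   levels: int = 2) -> List[float]:
    """Concatenated, level-weighted visual-word count histograms over a grid pyramid.

    levels=2 gives three levels: 1x1, 2x2 and 4x4 grids.
    """
    m = len(codebook)
    words = [(x, y, nearest_word(d, codebook)) for x, y, d in features]
    vec: List[float] = []
    for lvl in range(levels + 1):
        n = 2 ** lvl  # n x n grid at this level
        # Level weights from Lazebnik et al.: 1/2^L at level 0, 1/2^(L-l+1) otherwise.
        w = 1 / 2 ** levels if lvl == 0 else 1 / 2 ** (levels - lvl + 1)
        hist = [0.0] * (n * n * m)
        for x, y, k in words:
            cx, cy = min(int(x * n), n - 1), min(int(y * n), n - 1)
            hist[(cy * n + cx) * m + k] += 1.0
        vec.extend(w * h for h in hist)
    return vec


def pyramid_match_kernel(u: List[float], v: List[float]) -> float:
    """Histogram intersection; on the weighted vectors it equals the pyramid match kernel."""
    return sum(min(a, b) for a, b in zip(u, v))


codebook = [[0.0], [1.0]]
img_a = [(0.1, 0.1, [0.1]), (0.9, 0.9, [0.9])]
img_b = [(0.1, 0.1, [0.1]), (0.1, 0.9, [0.9])]  # same words, second one moved
va, vb = pyramid_vector(img_a, codebook), pyramid_vector(img_b, codebook)
print(pyramid_match_kernel(va, va))  # 2.0
print(pyramid_match_kernel(va, vb))  # 1.25
```

Because the level weights sum to 1, an image matched against itself scores its number of features; moving one feature to another region keeps the coarse-level match but loses it at the finer levels, which is how the pyramid encodes spatial layout.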
-
Key words:
- Spatial pyramid matching /
- visual codebook /
- generative method /
- discriminative method /
- neural network
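The abstract's final stage, in which a neural network turns local patch labels into a scene category, does not specify the network architecture, so the following is only a minimal stand-in under stated assumptions: each image is summarized as the normalized histogram of its local patch labels, and a single-layer softmax network (the simplest neural network) trained by stochastic gradient descent on the cross-entropy loss maps that histogram to a scene. The label set, toy training data, learning rate, epoch count and the class and function names are all hypothetical.

```python
import math
from typing import List, Sequence


def label_histogram(patch_labels: Sequence[int], num_labels: int) -> List[float]:
    """Normalized frequency of each local patch label within one image."""
    hist = [0.0] * num_labels
    for lab in patch_labels:
        hist[lab] += 1.0
    total = len(patch_labels) or 1  # avoid division by zero for an empty image
    return [h / total for h in hist]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class SceneDecisionNet:
    """Hypothetical single-layer softmax network: label histogram -> scene probabilities."""

    def __init__(self, num_labels: int, num_scenes: int) -> None:
        self.w = [[0.0] * num_labels for _ in range(num_scenes)]
        self.b = [0.0] * num_scenes

    def predict_proba(self, x: List[float]) -> List[float]:
        logits = [sum(wi * xi for wi, xi in zip(row, x)) + bj
                  for row, bj in zip(self.w, self.b)]
        return softmax(logits)

    def predict(self, x: List[float]) -> int:
        p = self.predict_proba(x)
        return max(range(len(p)), key=p.__getitem__)

    def fit(self, xs: List[List[float]], ys: List[int],
            lr: float = 0.5, epochs: int = 200) -> None:
        """Plain per-sample SGD on the cross-entropy loss."""
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                p = self.predict_proba(x)
                for j in range(len(self.w)):
                    g = p[j] - (1.0 if j == y else 0.0)  # d(loss)/d(logit_j)
                    self.b[j] -= lr * g
                    for i, xi in enumerate(x):
                        self.w[j][i] -= lr * g * xi


# Toy example: patch labels 0 = sky, 1 = building, 2 = tree; scenes 0 = city, 1 = forest.
train_images = [[1, 1, 0, 1], [1, 0, 1, 1], [2, 2, 0, 2], [2, 2, 2, 0]]
train_scenes = [0, 0, 1, 1]
net = SceneDecisionNet(num_labels=3, num_scenes=2)
net.fit([label_histogram(img, 3) for img in train_images], train_scenes)
city_test = label_histogram([1, 1, 1, 1, 0], 3)
forest_test = label_histogram([2, 2, 2, 2, 0], 3)
print(net.predict(city_test), net.predict(forest_test))  # 0 1
```

A deeper multilayer network could replace the single layer without changing the interface; the histogram input simply makes the decision depend on which local labels occur and how often, regardless of patch order.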