[1] Csurka G, Dance C R, Fan L X, Willamowski J, Bray C. Visual categorization with bags of keypoints. In: Proceedings of the 2004 Workshop on Statistical Learning in Computer Vision, European Conference on Computer Vision. Berlin, Germany: Springer Berlin Heidelberg, 2004. 1-2
[2] Zhang Su-Lan, Guo Ping, Zhang Ji-Fu, Hu Li-Hua. Automatic semantic image annotation with granular analysis method. Acta Automatica Sinica, 2012, 38(5): 688-697 (张素兰, 郭平, 张继福, 胡立华. 图像语义自动标注及其粒度分析方法. 自动化学报, 2012, 38(5): 688-697)
[3] Qin J Z, Yung N H C. Scene categorization via contextual visual words. Pattern Recognition, 2010, 43(5): 1874-1888
[4] Elfiky N M, Khan F S, van de Weijer J, Gonzàlez J. Discriminative compact pyramids for object and scene recognition. Pattern Recognition, 2012, 45(4): 1627-1636
[5] Wang F, Jiang Y G, Ngo C W. Video event detection using motion relativity and visual relatedness. In: Proceedings of the 16th ACM International Conference on Multimedia. NY, USA: ACM, 2008. 239-248
[6] Liu J E, Yang Y, Saleemi I, Shah M. Learning semantic features for action recognition via diffusion maps. Computer Vision and Image Understanding, 2012, 116(3): 361-377
[7] Lazebnik S, Schmid C, Ponce J. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE, 2006. 2169-2178
[8] Yuan J S, Wu Y, Yang M. Discovery of collocation patterns: from visual words to visual phrases. In: Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, MN: IEEE, 2007. 1-8
[9] Du R, Wu Q, He X J, Yang J. Object categorization based on a supervised mean shift algorithm. In: Proceedings of the Computer Vision ECCV 2012 Workshops and Demonstrations. Berlin, Germany: Springer Berlin Heidelberg, 2012. 611-614
[10] Chai Y N, Rahtu E, Lempitsky V, van Gool L, Zisserman A. TriCoS: a tri-level class-discriminative co-segmentation method for image classification. In: Proceedings of the 2012 European Conference on Computer Vision. Berlin, Germany: Springer Berlin Heidelberg, 2012. 794-807
[11] Krapac J, Verbeek J, Jurie F. Modeling spatial layout with Fisher vectors for image categorization. In: Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV). Barcelona, Spain: IEEE, 2011. 1487-1494
[12] Bolovinou A, Pratikakis I, Perantonis S. Bag of spatio-visual words for context inference in scene classification. Pattern Recognition, 2013, 46(3): 1039-1053
[13] Wang J J, Yang J C, Yu K, Lv F J, Huang T, Gong Y H. Locality-constrained linear coding for image classification. In: Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, CA: IEEE, 2010. 3360-3367
[14] van Gemert J C, Veenman C J, Smeulders A W M, Geusebroek J M. Visual word ambiguity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(7): 1271-1283
[15] Liu J, Zhang C J, Tian Q, Xu C S, Lu H Q, Ma S D. One step beyond bags of features: visual categorization using components. In: Proceedings of the 18th IEEE International Conference on Image Processing (ICIP). Brussels, Belgium: IEEE, 2011. 2417-2420
[16] Avrithis Y, Kalantidis Y. Approximate Gaussian mixtures for large scale vocabularies. In: Proceedings of the 12th European Conference on Computer Vision. Berlin, Germany: Springer-Verlag Berlin, Heidelberg, 2012. 15-28
[17] Mikulík A, Perdoch M, Chum O, Matas J. Learning a fine vocabulary. In: Proceedings of the 11th European Conference on Computer Vision. Berlin, Germany: Springer-Verlag Berlin, Heidelberg, 2010. 1-14
[18] Tang J H, Zha Z J, Tao D C, Chua T S. Semantic-gap-oriented active learning for multilabel image annotation. IEEE Transactions on Image Processing, 2012, 21(4): 2354-2360
[19] Wu L, Hoi S C H, Yu N H. Semantics-preserving bag-of-words models and applications. IEEE Transactions on Image Processing, 2010, 19(7): 1908-1920
[20] Ji C J, Zhou X D, Lin L, Yang W D. Labeling images by integrating sparse multiple distance learning and semantic context modeling. In: Proceedings of the 12th European Conference on Computer Vision. Berlin, Germany: Springer-Verlag Berlin, Heidelberg, 2012. 688-701
[21] Liu J E, Yang Y, Shah M. Learning semantic visual vocabularies using diffusion distance. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL: IEEE, 2009. 461-468
[22] Penatti O A B, Silva F B, Valle E, Gouet-Brunet V, Torres R S. Visual word spatial arrangement for image retrieval and classification. Pattern Recognition, 2014, 47(2): 705-720
[23] Li L J, Wang C, Lim Y, Blei D M, Li F F. Building and using a semantivisual image hierarchy. In: Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, CA: IEEE, 2010. 3336-3343
[24] Bannour H, Hudelot C. Building semantic hierarchies faithful to image semantics. In: Proceedings of the 18th International Conference on Advances in Multimedia Modeling. Berlin, Germany: Springer-Verlag Berlin, Heidelberg, 2012. 4-15
[25] Bannour H, Hudelot C. Hierarchical image annotation using semantic hierarchies. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. New York, NY, USA: ACM, 2012. 2431-2434
[26] Deng J, Berg A C, Li K, Li F F. What does classifying more than 10000 image categories tell us? In: Proceedings of the 11th European Conference on Computer Vision. Berlin, Germany: Springer-Verlag Berlin, Heidelberg, 2010. 71-84
[27] Lorenza S, Jean-Daniel Z. Abstraction in Artificial Intelligence and Complex Systems. New York: Springer-Verlag New York Inc., 2013. 273-325
[28] Everingham M, Van Gool L, Williams C K I, Winn J, Zisserman A. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 2010, 88(2): 303-338
[29] Li F F, Fergus R, Perona P. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 2007, 106(1): 59-70
[30] Fan R E, Chang K W, Hsieh C J, Wang X R, Lin C J. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 2008, 9: 1871-1874
[31] Zhang Lin-Bo, Wang Chun-Heng, Xiao Bo-Hua, Shao Yun-Xue. Image representation using bag-of-phrases. Acta Automatica Sinica, 2012, 38(1): 46-54 (张琳波, 王春恒, 肖柏华, 邵允学. 基于Bag-of-phrases的图像表示方法. 自动化学报, 2012, 38(1): 46-54)
[32] Fernando B, Fromont E, Muselet D, Sebban M. Supervised learning of Gaussian mixture models for visual vocabulary generation. Pattern Recognition, 2012, 45(2): 897-907
[33] Su Y, Jurie F. Improving image classification using semantic attributes. International Journal of Computer Vision, 2012, 100(1): 59-77
[34] Perronnin F, Sánchez J, Mensink T. Improving the Fisher kernel for large-scale image classification. In: Proceedings of the 11th European Conference on Computer Vision. Berlin, Germany: Springer-Verlag Berlin, Heidelberg, 2010. 143-156
[35] Zhong J, Wang J, Su Y T, Song Z J, Xing S K. Balance between object and background: object-enhanced features for scene image classification. Neurocomputing, 2013, 120: 15-23
[36] Maji S, Berg A C, Malik J. Efficient classification for additive kernel SVMs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1): 66-77
[37] Bilen H, Namboodiri V P, Van Gool L J. Object and action classification with latent window parameters. International Journal of Computer Vision, 2014, 106(3): 237-251