A Scene Character Recognition Method Based on a Multi-scale Graph-matching Kernel

Shi Cunzhao (史存召); Wang Chunheng (王春恒); Xiao Baihua (肖柏华); Zhang Yang (张阳); Gao Song (高嵩)

doi: 10.3724/SP.J.1004.2014.00751 cstr: 32138.14.SP.J.1004.2014.00751

1. Institute of Automation, Chinese Academy of Sciences, Beijing 100190;
2. Beijing Kuyun Interactive Technology Co., Ltd., Beijing 100007

Publication history
- Received: 2012-05-22
- Revised: 2013-09-27
- Published: 2014-04-20

Multi-scale Graph-matching Based Kernel for Character Recognition from Natural Scenes

1. The State Key Laboratory of Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;
2. Kuyun Interactive Technology Limited, Beijing 100007, China

Funds: Supported by the National Natural Science Foundation of China (60933010, 61172103, 61271429)

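Abstract

Because characters in natural scenes exhibit large intra-class variation, recognizing scene text is highly challenging. This paper proposes a scene character recognition method based on a multi-scale graph-matching kernel. To exploit the structural characteristics specific to characters, each image is represented as undirected graphs built on different grid partitions, and the similarity between two images is obtained from the optimal energy of matching their undirected graphs. Because graph matching also enforces spatial-position constraints between adjacent nodes when it finds the best matching node for each node, the method can cope with characters that exhibit a certain degree of deformation. The similarity between two images obtained through graph matching is well suited to constructing the kernel matrix of a support vector machine. The kernel matrices obtained under grid partitions of different scales are fused by multiple kernel combination, which makes the final kernel matrix more robust. Experimental results on the public scene text recognition datasets Chars74k and ICDAR03-CH show that the proposed method achieves higher character recognition rates than other published methods.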
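- Character recognition
- Structural information
- Graph matching
- Energy function
- Kernel matrix
- Histogram of oriented gradients features
- Support vector machine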
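The graph-matching step described in the abstract can be sketched as below. This is a hedged illustration, not the authors' implementation, and every detail is an assumption rather than something the abstract states: each image is already a grid of per-cell feature vectors (for example HOG descriptors), both grids have the same shape, the unary cost is an L1 distance between matched cells, the smoothness term penalizes differing displacements of 4-connected neighbours with weight `lam`, and the energy is minimized by iterated conditional modes (ICM), a simple local optimizer swapped in for brevity. The names `match_energy`, `l1`, `radius`, `lam` and `iters` are hypothetical.

```python
import math
from typing import Iterator, List, Sequence, Tuple

# Hypothetical layout: a grid graph is an H x W nested list whose cells each
# hold one feature vector (for example a HOG descriptor of that grid cell).
Grid = List[List[Sequence[float]]]
Offset = Tuple[int, int]


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two cell descriptors (assumed unary cost)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_energy(g1: Grid, g2: Grid, radius: int = 1, lam: float = 0.5,
                 iters: int = 10) -> float:
    """Approximately minimise
        E(d) = sum_n l1(f1[n], f2[n + d_n]) + lam * sum_{m~n} |d_m - d_n|_1
    over per-node displacements d_n in a (2*radius+1)^2 window, using ICM.
    Both grids are assumed to have the same shape."""
    h, w = len(g1), len(g1[0])
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    disp: List[List[Offset]] = [[(0, 0)] * w for _ in range(h)]

    def unary(y: int, x: int, d: Offset) -> float:
        ty, tx = y + d[0], x + d[1]
        if not (0 <= ty < h and 0 <= tx < w):
            return math.inf  # displacement would leave the second grid
        return l1(g1[y][x], g2[ty][tx])

    def neighbours(y: int, x: int) -> Iterator[Offset]:
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w:
                yield ny, nx

    def pair(a: Offset, b: Offset) -> float:
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def local(y: int, x: int, d: Offset) -> float:
        # Unary cost plus the smoothness terms on edges touching node (y, x).
        return unary(y, x, d) + lam * sum(
            pair(d, disp[ny][nx]) for ny, nx in neighbours(y, x))

    for _ in range(iters):
        changed = False
        for y in range(h):
            for x in range(w):
                best = min(offsets, key=lambda d: local(y, x, d))
                if local(y, x, best) < local(y, x, disp[y][x]):
                    disp[y][x] = best
                    changed = True
        if not changed:
            break  # no node can improve: a local minimum of E

    data = sum(unary(y, x, disp[y][x]) for y in range(h) for x in range(w))
    smooth = sum(pair(disp[y][x], disp[ny][nx])
                 for y in range(h) for x in range(w)
                 for ny, nx in ((y + 1, x), (y, x + 1)) if ny < h and nx < w)
    return data + lam * smooth
```

Because ICM only accepts moves that lower a node's local cost, the returned energy never exceeds the cost of matching every cell in place, and two identical grids give an energy of 0. ICM can stop in a local minimum, however, so a stronger solver for this kind of pairwise energy would generally find lower values.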
Abstract: Recognizing characters extracted from natural scene images is challenging because of the high degree of intra-class variation. In this paper, we propose a multi-scale graph-matching based kernel for scene character recognition. To capture the distinctive structures inherent to characters, each image is represented by several graphs associated with multi-scale image grids. The similarity between two images is defined as the optimal energy obtained by matching their graphs; the matching finds the best match for each node while preserving spatial consistency across adjacent nodes. The computed similarity is suitable for constructing a kernel for a support vector machine (SVM). The kernels obtained by matching graphs at multiple grid scales are combined so that the final kernel is more robust. Experimental results on the challenging Chars74k and ICDAR03-CH datasets show that the proposed method outperforms state-of-the-art methods.
- Character recognition
- structure
- graph-matching
- energy
- kernel
- histograms of oriented gradients (HOG)
- support vector machine (SVM)
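The last two steps in the abstract, turning matching energies into an SVM kernel and combining the kernels from several grid scales, can be sketched as follows. This is a hedged sketch: the abstract gives neither the similarity function nor the fusion rule, so the exponential map `exp(-gamma * E)`, the symmetrisation of the two matching directions, the fixed fusion weights, and the names `energy_kernel` and `fuse_kernels` are all hypothetical choices.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
Matrix = List[List[float]]


def energy_kernel(samples: Sequence[T], energy: Callable[[T, T], float],
                  gamma: float) -> Matrix:
    """Gram matrix K[i][j] = exp(-gamma * (E(i, j) + E(j, i)) / 2).

    Matching energies are generally asymmetric, so both directions are
    averaged to get a symmetric matrix. exp(-gamma * E) is an assumed
    similarity; the result is not guaranteed to be positive semidefinite."""
    n = len(samples)
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            e = 0.5 * (energy(samples[i], samples[j])
                       + energy(samples[j], samples[i]))
            k[i][j] = k[j][i] = math.exp(-gamma * e)
    return k


def fuse_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of per-scale kernels; the weights are assumed to be
    non-negative and fixed in advance rather than learned."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels))
             for j in range(n)] for i in range(n)]
```

In practice each scale would supply its own energy function (for example graph matching on a coarser or finer grid), and the fused matrix would be passed to an SVM that accepts a precomputed kernel, such as LIBSVM with its precomputed-kernel option (`-t 4`) or scikit-learn's `SVC(kernel="precomputed")`. Fixed weights keep the sketch short; learning them with a multiple kernel learning method is the usual alternative.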

