基于边缘特征增强的任意形状文本检测网络

白鹤翔; 王浩然

doi:10.16383/j.aas.c220429

基于边缘特征增强的任意形状文本检测网络

doi: 10.16383/j.aas.c220429 cstr: 32138.14.j.aas.c220429

白鹤翔^1,,
王浩然^1,

1.
山西大学计算机与信息技术学院太原 030006

基金项目: 国家自然科学基金 (41871286, 62072291), 国家重点研发计划课题 (2017YFB0503501)资助

详细信息

作者简介:
白鹤翔：山西大学计算机与信息技术学院副教授. 主要研究方向为空间数据挖掘和图像处理. 本文通信作者. E-mail: baihx@sxu.edu.cn

王浩然：山西大学计算机与信息技术学院硕士研究生. 主要研究方向为深度学习和场景文本检测. E-mail: wanghr_sxu@163.com

计量
- 文章访问数: 745
- HTML全文浏览量: 395
- PDF下载量: 227
- 被引次数: 0
出版历程
- 收稿日期: 2022-05-23
- 录用日期: 2022-09-27
- 网络出版日期: 2023-04-06
- 刊出日期: 2023-05-20

A New Arbitrary-shaped Text Detection Network by Reinforcing Edge Features

BAI He-Xiang^1
,,
WANG Hao-Ran^1
,

1.
School of Computer and Information Technology, Shanxi University, Taiyuan 030006

Funds: Supported by National Natural Science Foundation of China (41871286, 62072291), and National Key Research and Development Plan (2017YFB0503501)

More Information

Author Bio:
BAI He-Xiang　Associate professor at the School of Computer and Information Technology, Shanxi University. His research interest covers spatial data mining and image processing. Corresponding author of this paper

WANG Hao-Ran　Master student at the School of Computer and Information Technology, Shanxi University. His research interest covers deep learning and scene text detection

摘要

摘要: 在场景文本检测方法中, 文本实例的边缘特征与其他特征在大多数模型中都是以同样的方式进行处理, 而准确检测相邻文本边缘区域是正确识别任意形状文本区域的关键之一. 如果对边缘特征进行增强并使用独立分支进行建模, 必能有效提高模型的标识准确率. 为此, 提出了三个用以增强边缘特征的网络模块. 其中, 浅层特征增强模块可有效增强包含更多边缘特征的浅层特征; 边缘区域检测分支将普通特征和边缘特征进行区分以对目标的边缘特征进行显式建模; 而分支特征融合模块可将两种特征在识别过程进行更好的融合. 在将这三个模块引入渐进尺度扩张网络 (Progressive scale expansion network, PSENet) 之后, 相关消融实验表明这三个模块的单独使用及其组合均可进一步增加网络的预测准确率. 此外, 在三个常用公开数据集上与其他十个最新模型的比较结果表明, 改进后得到边缘特征增强网络 (Edge-oriented feature reinforcing network, EFRNet) 的识别结果具有较高的F1值.
- 场景文本检测 /
- 任意形状 /
- 边缘区域 /
- 浅层特征 /
- 渐进尺度扩张网络
Abstract: In the detection of scene texts areas, the text instances＇ edge features are processed in the same way as other features. Nevertheless, the accurate detection of adjacent text edges is crucial in the correct identification of arbitrary-shaped text regions in natural scenes. Obviously, the identification accuracy increases if edge features can be enhanced and modeled through independent branches in the network. To this end, three network modules are proposed to enhance the edge features in this paper. These modules are the shallow feature enhancement module which effectively enhances the shallow features with more edge features, the edge region detection module which decouples the original features into edge features and text features to explicitly model the edge features of the object, and the branch feature fusion module which effectively fuses these two types of features in the recognition process. After the proposed modules are added to the progressive scale expansion network (PSENet), the ablation experiments show that both the independent application and the synthetic application of these modules increase the prediction accuracy. In addition, the comparison experiments on three commonly used public datasets with ten state-of-the-art methods show that the improved edge-oriented feature reinforcing network (EFRNet) has higher F1-measure accuracy.
- Scene text detection /
- arbitrary-shaped /
- edge region /
- shallow feature /
- progressive scale expansion network (PSENet)

HTML全文

图 1 EFRNet网络整体架构

Fig. 1 Overall pipeline of EFRNet

下载: 全尺寸图片幻灯片

图 2 浅层特征增强模块

Fig. 2 Shallow feature enhancement module

下载: 全尺寸图片幻灯片

图 3 分支特征融合模块

Fig. 3 Branch feature fusion module

下载: 全尺寸图片幻灯片

图 4 文本实例及边缘区域检测分支

Fig. 4 Detection branch of text instance and text edge region

下载: 全尺寸图片幻灯片

图 5 文本分割图空间梯度计算过程 ((a) 原始图像; (b) 初始检测结果; (c) 最大池化; (d) 探测到的边缘区域)

Fig. 5 Spatial gradient calculation for text segmentation graph ((a) Original images; (b) Initial detection results; (c) Max pool results; (d) Detected edges)

下载: 全尺寸图片幻灯片

图 6 在数据集CTW1500上的预测结果比较 ((a) 原始图像; (b) 真实值; (c) PSENet; (d) EFRNet; (d) 边缘预测结果)

Fig. 6 Predicted results comparison on dataset CTW1500 ((a) Original image; (b) Ground truth; (c) PSENet; (d) EFRNet; (e) Predicted edge)

下载: 全尺寸图片幻灯片

表 1 CTW1500、Total-Text和ICDAR 2015数据集上的消融实验结果, 其中P表示准确率, R表示召回率

Table 1 Ablation experimental on CTW1500, Total-Text and ICDAR 2015 datasets, P represents accuracy, R represents recall

浅层特征增强模块	分支特征融合模块	边缘检测分支	CTW1500			Total-Text			ICDAR 2015
浅层特征增强模块	分支特征融合模块	边缘检测分支	R (%)	P (%)	F1 (%)	R (%)	P (%)	F1 (%)	R (%)	P (%)	F1 (%)
—	—	—	77.8	83.2	80.4	78.8	88.5	83.4	75.8	84.2	79.8
√	—	—	78.1	83.9	80.8	79.3	88.6	83.7	75.8	85.3	80.3
—	—	√	83.4	85.3	84.3	81.9	87.7	84.7	83.1	87.6	85.2
√	—	√	83.8	86.3	85.0	82.0	87.9	84.9	83.3	87.8	85.5
—	√	√	84.1	86.6	85.2	83.1	88.3	85.6	84.0	88.1	86.0
√	√	√	85.9	86.8	86.3	84.0	88.9	86.4	85.7	89.5	87.6

下载: 导出CSV

表 2 CTW1500、Total-Text和ICDAR 2015数据集模型性能对比

Table 2 Performance comparison on CTW1500, Total-Text and ICDAR 2015 dataset with state-of-the-art models

方法	CTW1500			Total-Text			ICDAR 2015
方法	R (%)	P (%)	F1 (%)	R (%)	P (%)	F1 (%)	R (%)	P (%)	F1 (%)
TextSnake^[41]	85.3	67.9	75.6	74.5	82.7	78.4	80.4	84.9	82.6
PAN++^[42]	81.2	86.4	83.7	81.0	89.3	85.0	81.9	84.0	82.9
PSENet^[22]	75.6	80.6	78.0	75.1	81.8	78.3	79.7	81.5	80.6
DB^[30]	80.2	86.9	83.4	82.5	87.1	84.7	83.2	91.8	87.3
DRRG^[21]	83.0	85.9	84.5	84.9	86.5	85.7	84.7	88.5	86.6
ContourNet^[43]	84.1	83.7	83.9	83.9	86.9	85.4	86.1	87.6	86.9
FCENet^[18]	83.4	87.6	85.5	82.5	89.3	85.8	82.6	90.1	86.2
MOST^[11]	79.4	83.6	81.4	80.0	86.7	83.2	87.3	89.1	88.2
PCR^[12]	83.3	87.2	84.7	82.0	88.5	85.2	84.1	89.6	86.7
DBNet++^[15]	82.8	87.9	85.3	83.2	88.9	86.0	83.9	90.0	87.3
EFRNet (ImageNet)	85.9	86.8	86.3	84.0	88.9	86.4	85.7	89.5	87.6
EFRNet (SynthText)	85.9	86.8	86.3	84.3	89.2	86.7	86.3	89.6	87.9

下载: 导出CSV

参考文献(45)

[1]	Lyu P Y, Liao M H, Yao C, Wu W H, Bai X. Mask TextSpotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 71−88
[2]	He T, Huang W L, Qiao Y, Yao J. Text-attentional convolutional neural network for scene text detection. IEEE Transactions on Image Processing, 2016, 25(6): 2529-2541 doi: 10.1109/TIP.2016.2547588
[3]	Qin S Y, Manduchi R. Cascaded segmentation-detection networks for word-level text spotting. In: Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Kyoto, Japan: IEEE, 2017. 1275−1282
[4]	Cho H, Sung M, Jun B. Canny text detector: Fast and robust scene text localization algorithm. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 3566−3573
[5]	Tian S X, Pan Y F, Huang C, Lu S J, Yu K, Tan C L. Text flow: A unified text detection system in natural scene images. In: Proceedings of the International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 4651−4659
[6]	王润民, 桑农, 丁丁, 陈杰, 叶齐祥, 高常鑫, 等. 自然场景图像中的文本检测综述. 自动化学报, 2018, 44(12): 2113-2141 Wang Run-Min, Sang Nong, Ding Ding, Chen Jie, Ye Qi-Xiang, Gao Chang-Xin, et al. Text detection in natural scene image: A survey. Acta Automatica Sinica, 2018, 44(12): 2113-2141
[7]	Liu Y L, Jin L W. Deep matching prior network: Toward tighter multi-oriented text detection. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE, 2017. 3454−3461
[8]	Zhang Z, Zhang C Q, Shen W, Yao C, Liu W Y, Bai X. Multi-oriented text detection with fully convolutional networks. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 4159−4167
[9]	Zhong Z Y, Jin L W, Huang S P. DeepText: A new approach for text proposal generation and text detection in natural images. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP). New Orleans, USA: IEEE, 2017. 1208−1212
[10]	Tian Z, Huang W L, He T, He P, Qiao Y. Detecting text in natural image with connectionist text proposal network. In: Proceedings of the 14th European Conference on Computer vision. Amsterdam, The Netherlands: Springer, 2016. 56−72
[11]	He M H, Liao M H, Yang Z B, Zhong H M, Tang J, Cheng W Q, et al. MOST: A multi-oriented scene text detector with localization refinement. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE, 2021. 8809−8818
[12]	Dai P W, Zhang S Y, Zhang H, Cao X C. Progressive contour regression for arbitrary-shape scene text detection. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, TN, USA: IEEE, 2021. 7389−7398
[13]	He P, Huang W L, He T, Zhu Q L, Qiao Y, Li X L. Single shot text detector with regional attention. In: Proceedings of the International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 3066−3074
[14]	Deng D, Liu H F, Li X L, Cai D. PixelLink: Detecting scene text via instance segmentation. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI, 2018. 6773−6780
[15]	Liao M H, Zou Z S, Wan Z Y, Yao C, Bai X. Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 919-931. doi: 10.1109/TPAMI.2022.3155612
[16]	Wang F F, Chen Y F, Wu F, Li X. TextRay: Contour-based geometric modeling for arbitrary-shaped scene text detection. In: Proceedings of the 28th ACM International Conference on Multimedia. Seattle, USA: ACM, 2020. 111−119
[17]	Liu Y L, Chen H, Shen C H, He T, Jin L W, Wang L W. ABCNet: Real-time scene text spotting with adaptive Bezier-curve network. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 9806−9815
[18]	Zhu Y Q, Chen J Y, Liang L Y, Kuang Z H, Jin L W, Zhang W. Fourier contour embedding for arbitrary-shaped text detection. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE, 2021. 3122−3130
[19]	Zhang C Q, Liang B R, Huang Z M, En M Y, Han J Y, Ding E R, et al. Look more than once: An accurate detector for text of arbitrary shapes. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 10544−10553
[20]	Qin X G, Zhou Y, Guo Y H, Wu D Y, Tian Z H, Jiang N, et al. Mask is all you need: Rethinking mask R-CNN for dense and arbitrary-shaped scene text detection. In: Proceedings of the 29th ACM International Conference on Multimedia. Chengdu, China: ACM, 2021. 414−423
[21]	Zhang S X, Zhu X B, Hou J B, Liu C, Yang C, Wang H F, et al. Deep relational reasoning graph network for arbitrary shape text detection. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 9696−9705
[22]	Wang W H, Xie E Z, Li X, Hou W B, Lu, T, Yu G, et al. Shape robust text detection with progressive scale expansion network. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 9328−9337
[23]	Sheng T, Chen J, Lian Z H. CentripetalText: An efficient text instance representation for scene text detection. In: Proceedings of the 34th Advances in Neural Information Processing Systems. Cambridge, MA, USA: NIPS, 2021. 335−346
[24]	Liu Z C, Lin G S, Yang S, Liu F Y, Lin W S, Goh W L. Towards robust curve text detection with conditional spatial expansion. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 7261−7270
[25]	Tian Z T, Shu M, Lyu P, Li R Y, Zhou C, Shen X Y, et al. Learning shape-aware embedding for scene text detection. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 4229−4238
[26]	Li J C, Lin Y, Liu R R, Ho C M, Shi H. RSCA: Real-time segmentation-based context-aware scene text detection. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPRW). Nashville, USA: IEEE, 2021. 2349−2358
[27]	He K M, Zhang X Y, Ren S Q, Sun J. Identity mappings in deep residual networks. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: Springer, 2016. 630−645
[28]	Liu W, Liao S C, Ren W Q, Hu W D, Yu Y N. High-level semantic feature detection: A new perspective for pedestrian detection. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE, 2019. 5182−5191
[29]	Hu J, Shen L, Sun G, Wu E. Squeeze-and-excitation networks. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 7132−7141
[30]	Liao M H, Wan Z Y, Yao C, Chen K, Bai X. Real-time scene text detection with differentiable binarization. In: Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI, 2020. 11474−11481
[31]	Zhen M M, Wang J L, Zhou L, Li S W, Shen T W, Shang J X, et al. Joint semantic segmentation and boundary detection using iterative pyramid contexts. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 13663−13672
[32]	Milletari F, Navab N, Ahmadi S A. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: Proceedings of the 4th International Conference on 3D Vision (3DV). Stanford, USA: IEEE, 2016. 565−571
[33]	Shrivastava A, Gupta A, Girshick R. Training region-based object detectors with online hard example mining. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 761−769
[34]	Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, et al. ICDAR 2015 competition on robust reading. In: Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR). Tunis, Tunisia: IEEE, 2015. 1156−1160
[35]	Ch＇ng C K, Chan C S. Total-Text: A comprehensive dataset for scene text detection and recognition. In: Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Kyoto, Japan: IEEE, 2017. 935−942
[36]	Liu Y L, Jin L W, Zhang S T, Luo C J, Zhang S. Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognition, 2019, 90: 337-345 doi: 10.1016/j.patcog.2019.02.002
[37]	Deng J, Wei D, Socher R, Li J, Kai L, Li F F. ImageNet: A large-scale hierarchical image database. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE, 2009. 248−255
[38]	He K M, Zhang X Y, Ren S Q, Sun J. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: Proceedings of the International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE, 2015. 1026−1034
[39]	Yin X C, Yin X W, Huang K Z, Hao H W. Robust text detection in natural scene images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(5): 970-983 doi: 10.1109/TPAMI.2013.182
[40]	Vatti B R. A generic solution to polygon clipping. Communications of the ACM, 1992, 35(7): 56-63 doi: 10.1145/129902.129906
[41]	Long S B, Ruan J Q, Zhang W J, He X, Wu W H, Yao C. TextSnake: A flexible representation for detecting text of arbitrary shapes. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer, 2018. 19−35
[42]	Wang W H, Xie E Z, Li X, Liu X B, Liang D, Yang Z B, et al. PAN++: Towards efficient and accurate end-to-end spotting of arbitrarily-shaped text. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(9): 5349-5367
[43]	Wang Y X, Xie H T, Zha Z J, Xing M T, Fu Z L, Zhang Y D. ContourNet: Taking a further step toward accurate arbitrary-shaped scene text detection. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2020. 11750−11759
[44]	Gupta A, Vedaldi A, Zisserman A. Synthetic data for text localisation in natural images. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE, 2016. 2315−2324
[45]	Liao M H, Pang G, Huang J, Hassner T, Bai X. Mask TextSpotter v3: Segmentation proposal network for robust scene text spotting. In: Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 706−722