面向精准价格牌识别的多任务循环神经网络

牟永强; 范宝杰; 孙超; 严蕤; 郭怡适

doi:10.16383/j.aas.c190633

面向精准价格牌识别的多任务循环神经网络

doi: 10.16383/j.aas.c190633 cstr: 32138.14.j.aas.c190633

牟永强^1,,
范宝杰^{1, 2,},
孙超^1,,
严蕤^1,,
郭怡适^1,

1.
广州图匠数据科技有限公司人工智能实验室广州 510310
2.
广州工业大学信息工程学院广州 510006

详细信息

作者简介:
牟永强：广州图匠数据科技有限公司首席AI架构师. 2012年获得西安理工大学信号与信息处理专业硕士学位. 主要研究方向为机器视觉，模式识别以及深度学习. 本文通信作者.E-mail: yongqiang.mou@gmail.com

范宝杰：广东工业大学硕士研究生. 主要研究方向为深度学习和计算机视觉.E-mail: 735678367@qq.com

孙超：华南农业大学硕士研究生. 主要研究方向为深度学习和计算机视觉. E-mail: ice_moyan@163.com

严蕤：广州图匠数据科技有限公司高级研究员. 主要研究方向为深度学习和计算机视觉.E-mail: reeyree@163.com

郭怡适：广州图匠数据科技有限公司首席执行官. 主要研究方向为深度学习和计算机视觉.E-mail: yi.shi@imagedt.com

计量
- 文章访问数: 2176
- HTML全文浏览量: 617
- PDF下载量: 221
- 被引次数: 0
出版历程
- 收稿日期: 2019-09-06
- 录用日期: 2020-02-23
- 网络出版日期: 2022-01-19
- 刊出日期: 2022-02-18

Towards Accurate Price Tag Recognition Algorithm With Multi-task RNN

MOU Yong-Qiang^1
,,
FAN Bao-Jie^{1, 2
,},
SUN Chao^1
,,
YAN Rui^1
,,
GUO Yi-Shi^1
,

1.
AI-Labs, Guangzhou Image Data Technology Co., Ltd., Guangzhou 510310
2.
College of Information Engineering, Guangdong University of Technology, Guangzhou 510006

More Information

Author Bio:
MOU Yong-Qiang　 Chief AI architect at Guangzhou Image Data Technology Co., Ltd. He received his master degree in signal and information processing from Xi＇an University of Technology in 2012. His research interest covers computer vision, pattern recognition, and deep learning. Corresponding author of this paper

FAN Bao-Jie　Master student at Guangdong University of Technology. His research interest covers deep learning and computer vision

SUN Chao　Master student at South China Agricultural University. His research interest covers deep learning and computer vision

YAN Rui　 Advanced Researcher at Guangzhou Image Data Technology Co., Ltd. His research interest covers deep learning and computer vision

GUO Yi-Shi　 Chief executive officer at Guangzhou Image Data Technology Co., Ltd. His research interest covers deep learning and computer vision

摘要

摘要: 为了促进智能新零售在线下业务场景的发展, 提高作为销售关键信息价格牌的识别精度. 本文对价格牌识别问题进行研究, 有效地提高了价格牌的识别精度, 并解决小数点定位不准确的难题. 通过深度卷积神经网络提取价格牌的深度语义表达特征, 将提取到的特征图送入多任务循环网络层进行编码, 然后根据解码网络设计的注意力机制解码出价格数字, 最后将多个分支的结果整合并输出完整价格. 本文所提出的方法能够非常有效地提高线下零售场景价格牌的识别精度, 并解决了一些领域难题如小数点的定位问题, 此外, 为了验证本文方法的普适性, 在其他场景数据集上进行了对比实验, 相关结果也验证了本文方法的有效性.
- 卷积神经网络 /
- 循环神经网络 /
- 文本识别 /
- 多任务学习 /
- 价格牌识别
Abstract: In order to promote the development of smart new retail in the offline scenario and improve the recognition accuracy of price tags, which is a key sales information, this paper studies this application scene to improve the recognition accuracy effectively of the price tag and solves the difficulty of locating decimal point. The deep semantic expression features of price tag are extracted by the deep convolutional neural network, sent to the multi-task recurrent network layer for encoding, and then the price number is decoded according to the attention mechanism of the decoding network, and finally the results of multiple branches are integrated to output the complete price. The method we proposed can effectively improve the price recognition accuracy in the smart new retail scenario and solve some challenge problems such as locating the position of decimal point. In addition, in order to verify the universality of the method, comparative experiments are carried out on datasets of other scenarios, and the related results also verify the effectiveness of the method in this paper.
- Convolutional neural network /
- recurrent neural network /
- text recognition /
- multi-task learning /
- price tag recognition

HTML全文

图 1 卷积循环网络结构

Fig. 1 The structure of convolutional recurrent neural network

下载: 全尺寸图片幻灯片

图 2 价格牌图像

Fig. 2 Images of some price tag samples

下载: 全尺寸图片幻灯片

图 4 基础单任务识别网络结构

Fig. 4 The structure of our basic single recognition network

下载: 全尺寸图片幻灯片

图 5 多任务循环卷积网络结构

Fig. 5 The structure of multi-task RNN

下载: 全尺寸图片幻灯片

图 3 基准识别与多分支识别结果的生成方式

Fig. 3 Baseline method compared with multi-branch method

下载: 全尺寸图片幻灯片

图 6 注意力机制网络解码流程图

Fig. 6 Flowchart of decoder network based on attention

下载: 全尺寸图片幻灯片

图 7 与直接识别方法的比较

Fig. 7 Compared with the single-branch method

下载: 全尺寸图片幻灯片

表 1 模块的研究(%)

Table 1 Study of modules (%)

Model General-data Hard-data

VGG-BiLSTM-CTC 50.20 20.20
VGG-BiLSTM-Attn 61.20 38.60
ResNet-BiLSTM-CTC 55.60 28.80
ResNet-BiLSTM-Attn 68.10 41.40

下载: 导出CSV

表 2 多任务模型结果(%)

Table 2 Results of multitask model (%)

Model General-data Hard-data

Baseline^[13] 68.10 41.40
NDPB&IB 90.10 72.90
NDPB&DB 91.70 74.30
IB&DB 92.20 73.20
NDPB&IB&DB 93.20 75.20

下载: 导出CSV

表 3 车牌数据集实验结果(%)

Table 3 Experimental results on license plate dataset (%)

DB FN Rotate Tilt Weather Challenge

TE2E^[17] 96.90 94.30 90.80 92.50 87.90 85.10
CCPD^[16] 96.90 94.30 90.80 92.50 87.90 85.10
Our method 98.24 98.81 98.12 98.79 98.19 91.92

下载: 导出CSV

参考文献(17)

[1]	Shi B, Bai X, Yao C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE transactions on pattern analysis and machine intelligence, 2016, 39(11): 2298-2304
[2]	Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, et al. Attention is all you need. In: Proceedings of the Neural Information Processing Systems. San Diego, USA: MIT, 2017. 5998−6008
[3]	Luong M T, Pham H, Manning C D. Effective approaches to attention-based neural machine translation [Online], available: https://arxiv.org/abs/1508.04025, Sep 20, 2015
[4]	Li H, Wang P, Shen C. Towards end-to-end text spotting with convolutional recurrent neural networks. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 5238−5246
[5]	Yuan X, He P, Li X A. Adaptive adversarial attack on scene text recognition [Online], available: http://export.arxiv.org/abs/1807.03326, Jul 9, 2018
[6]	Graves A, Fernández S, Gomez F, Schmidhuber J. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning. Pittsburgh Pennsylvania, USA: ACM, 2006. 369−376
[7]	Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks. In: Proceedings of the Neural Information Processing Systems. Montréal, Canada: MIT, 2014. 3104−3112
[8]	Lei Z, Zhao S, Song H, Shen J. Scene text recognition using residual convolutional recurrent neural network. Machine Vision and Applications, 29(5), 861−871
[9]	Shi B, Yang M, Wang X, Lyu P, Yao C, Bai X. Aster: An attentional scene text recognizer with flexible rectification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(9), 2035−2048
[10]	Long M, Wang J. Learning multiple tasks with deep relationship networks [Online], available: https://arxiv.org/abs/1506. 02117v1, Jul 6, 2015
[11]	Veit A, Matera T, Neumann L, Matas J, Belongie S. Coco-text: Dataset and benchmark for text detection and recognition in natural images [Online], available: https://arxiv.org/abs/1601.07140v1, Jan 26, 2016
[12]	Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, Shafait F. ICDAR 2015 competition on robust reading. In: Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR). Tunis, Tunisia: IEEE, 2015. 1156−1160
[13]	Baek J, Kim G, Lee J, Park S, Han D. What is wrong with scene text recognition model comparisons? dataset and model analysis [Online], available: https://arxiv.org/abs/1904.01906, Dec 18, 2019
[14]	Bingel J, Søgaard A. Identifying beneficial task relations for multi-task learning in deep neural networks[Online], available: https://arxiv.org/abs/1702.08303, Feb 27, 2017
[15]	Xie Z, Huang Y, Zhu Y, Jin L, Liu Y, Xie L. Aggregation cross-entropy for sequence recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach CA, USA: IEEE, 2019. 6538−6547
[16]	Li H, Wang P, Shen C. Toward end-to-end car license plate detection and recognition with deep neural networks. IEEE Transactions on Intelligent Transportation Systems, 2018, 20(3): 1126-1136
[17]	Xu Z, Yang W, Meng A, Lu N, Huang H, Ying C, Huang L. Towards end-to-end license plate detection and recognition: A large dataset and baseline. In: Proceedings of the European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018. 255−271

施引文献

资源附件(0)

访问统计

图(7) / 表(3)

计量

文章访问数: 2176
HTML全文浏览量: 617
PDF下载量: 221
被引次数: 0

姓名
邮箱
手机号码
标题
留言内容
验证码

留言板

面向精准价格牌识别的多任务循环神经网络

doi: 10.16383/j.aas.c190633 cstr: 32138.14.j.aas.c190633

计量

Towards Accurate Price Tag Recognition Algorithm With Multi-task RNN

计量

目录

Model	General-data	Hard-data
VGG-BiLSTM-CTC	50.20	20.20
VGG-BiLSTM-Attn	61.20	38.60
ResNet-BiLSTM-CTC	55.60	28.80
ResNet-BiLSTM-Attn	68.10	41.40

	DB	FN	Rotate	Tilt	Weather	Challenge
TE2E^[17]	96.90	94.30	90.80	92.50	87.90	85.10
CCPD^[16]	96.90	94.30	90.80	92.50	87.90	85.10
Our method	98.24	98.81	98.12	98.79	98.19	91.92

Model	General-data	Hard-data
Baseline^[13]	68.10	41.40
NDPB&IB	90.10	72.90
NDPB&DB	91.70	74.30
IB&DB	92.20	73.20
NDPB&IB&DB	93.20	75.20

留言板

面向精准价格牌识别的多任务循环神经网络

doi: 10.16383/j.aas.c190633 cstr: 32138.14.j.aas.c190633

计量

出版历程

Towards Accurate Price Tag Recognition Algorithm With Multi-task RNN

计量

出版历程

目录