2022 Impact Factor (CJCR): 2.765

  • Chinese Core Journals (中文核心)
  • EI
  • China Science and Technology Core (中国科技核心)
  • Scopus
  • CSCD
  • INSPEC (英国科学文摘)


Towards Accurate Price Tag Recognition Algorithm With Multi-task RNN

Mou Yong-Qiang, Fan Bao-Jie, Sun Chao, Yan Rui, Guo Yi-Shi

Citation: Mou Yong-Qiang, Fan Bao-Jie, Sun Chao, Yan Rui, Guo Yi-Shi. Towards accurate price tag recognition algorithm with multi-task RNN. Acta Automatica Sinica, 2022, 48(2): 608−614. doi: 10.16383/j.aas.c190633

More Information
    Author Biographies:

    MOU Yong-Qiang  Chief AI architect at Guangzhou Image Data Technology Co., Ltd. He received his master's degree in signal and information processing from Xi'an University of Technology in 2012. His research interest covers machine vision, pattern recognition, and deep learning. Corresponding author of this paper. E-mail: yongqiang.mou@gmail.com

    FAN Bao-Jie  Master student at Guangdong University of Technology. His research interest covers deep learning and computer vision. E-mail: 735678367@qq.com

    SUN Chao  Master student at South China Agricultural University. His research interest covers deep learning and computer vision. E-mail: ice_moyan@163.com

    YAN Rui  Senior researcher at Guangzhou Image Data Technology Co., Ltd. His research interest covers deep learning and computer vision. E-mail: reeyree@163.com

    GUO Yi-Shi  Chief executive officer at Guangzhou Image Data Technology Co., Ltd. His research interest covers deep learning and computer vision. E-mail: yi.shi@imagedt.com


  • Abstract: To advance intelligent new retail in offline business scenarios, the recognition accuracy of price tags, a key piece of sales information, must be improved. This paper studies the price tag recognition problem, effectively improving recognition accuracy and solving the difficult problem of inaccurate decimal point localization. A deep convolutional neural network extracts deep semantic features from the price tag image; the resulting feature maps are fed into a multi-task recurrent network layer for encoding, the attention mechanism of the decoder network then decodes the price digits, and finally the outputs of the multiple branches are merged into the complete price. The proposed method substantially improves price tag recognition accuracy in offline retail scenes and resolves domain-specific difficulties such as decimal point localization. In addition, to verify its generality, comparative experiments were conducted on datasets from other scenarios, and the results likewise confirm the method's effectiveness.
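The decode step sketched in the abstract — attention weights over the encoded feature sequence producing a context vector for each output digit — can be illustrated in plain Python. This is a generic dot-product attention step written for illustration only; the paper's actual scoring function, dimensions, and variable names are not specified here, so every name below is an assumption.

```python
import math

def attention_step(decoder_state, encoder_states):
    """One illustrative attention step: score each encoder time step
    against the current decoder state (dot product, for brevity),
    softmax the scores, and return the attention weights together
    with the weighted context vector."""
    scores = [sum(d * e for d, e in zip(decoder_state, h))
              for h in encoder_states]
    # Numerically stable softmax over the scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Context vector: weighted sum of encoder states
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context
```

At each decoding step the context vector would be combined with the decoder state to predict the next digit; repeating the step per output position yields the digit sequence.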
  • Fig. 1  The structure of the convolutional recurrent neural network

    Fig. 2  Images of some price tag samples

    Fig. 3  Baseline method compared with the multi-branch method

    Fig. 4  The structure of our basic single-task recognition network

    Fig. 5  The structure of the multi-task RNN

    Fig. 6  Flowchart of the attention-based decoder network

    Fig. 7  Comparison with the single-branch method

    Table 1  Study of modules (%)

    Model                 General-data    Hard-data
    VGG-BiLSTM-CTC        50.20           20.20
    VGG-BiLSTM-Attn       61.20           38.60
    ResNet-BiLSTM-CTC     55.60           28.80
    ResNet-BiLSTM-Attn    68.10           41.40
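Table 1 contrasts CTC and attention decoders on the same backbones. The CTC side's greedy decoding rule is simple enough to sketch: take the best label per frame, merge runs of identical labels, then drop blanks. The blank symbol and the label strings below are illustrative choices, not taken from the paper.

```python
def ctc_greedy_decode(frame_labels, blank="-"):
    """Collapse a per-frame best-path labelling the CTC way:
    merge consecutive identical labels, then remove blank symbols."""
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return "".join(decoded)
```

Note that the blank is what allows genuinely repeated characters to survive: a frame sequence "8-8" decodes to "88", whereas "88" alone collapses to "8" — a distinction that matters for multi-digit prices.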

    Table 2  Results of multi-task model (%)

    Model              General-data    Hard-data
    Baseline [13]      68.10           41.40
    NDPB & IB          90.10           72.90
    NDPB & DB          91.70           74.30
    IB & DB            92.20           73.20
    NDPB & IB & DB     93.20           75.20
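The ablation in Table 2 names three branches. A plausible reading — our assumption, not an interface stated in this page's text — is that NDPB classifies whether a decimal point is present, IB predicts the integer-part digits, and DB predicts the decimal-part digits. Merging such branch outputs into the final price string could then look like:

```python
def merge_price_branches(has_decimal_point: bool,
                         integer_digits: str,
                         decimal_digits: str) -> str:
    """Assemble the final price from three branch outputs.
    Branch semantics (NDPB/IB/DB) are our assumed reading
    of the abbreviations in Table 2."""
    if not has_decimal_point or not decimal_digits:
        # No decimal point predicted: the integer part is the whole price.
        return integer_digits
    return f"{integer_digits}.{decimal_digits}"
```

Splitting the decimal point decision into its own branch is what lets the model avoid mislocating the point inside a single undifferentiated digit string.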

    Table 3  Experimental results on license plate dataset (%)

    Method         DB       FN       Rotate    Tilt     Weather    Challenge
    TE2E [17]      96.90    94.30    90.80     92.50    87.90      85.10
    CCPD [16]      96.90    94.30    90.80     92.50    87.90      85.10
    Our method     98.24    98.81    98.12     98.79    98.19      91.92
  • [1] Shi B, Bai X, Yao C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(11): 2298−2304
    [2] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, et al. Attention is all you need. In: Proceedings of the Neural Information Processing Systems. San Diego, USA: MIT, 2017. 5998−6008
    [3] Luong M T, Pham H, Manning C D. Effective approaches to attention-based neural machine translation [Online], available: https://arxiv.org/abs/1508.04025, Sep 20, 2015
    [4] Li H, Wang P, Shen C. Towards end-to-end text spotting with convolutional recurrent neural networks. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 5238−5246
    [5] Yuan X, He P, Li X A. Adaptive adversarial attack on scene text recognition [Online], available: http://export.arxiv.org/abs/1807.03326, Jul 9, 2018
    [6] Graves A, Fernández S, Gomez F, Schmidhuber J. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning. Pittsburgh, Pennsylvania, USA: ACM, 2006. 369−376
    [7] Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks. In: Proceedings of the Neural Information Processing Systems. Montréal, Canada: MIT, 2014. 3104−3112
    [8] Lei Z, Zhao S, Song H, Shen J. Scene text recognition using residual convolutional recurrent neural network. Machine Vision and Applications, 2018, 29(5): 861−871
    [9] Shi B, Yang M, Wang X, Lyu P, Yao C, Bai X. ASTER: An attentional scene text recognizer with flexible rectification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(9): 2035−2048
    [10] Long M, Wang J. Learning multiple tasks with deep relationship networks [Online], available: https://arxiv.org/abs/1506.02117v1, Jul 6, 2015
    [11] Veit A, Matera T, Neumann L, Matas J, Belongie S. COCO-Text: Dataset and benchmark for text detection and recognition in natural images [Online], available: https://arxiv.org/abs/1601.07140v1, Jan 26, 2016
    [12] Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, Shafait F. ICDAR 2015 competition on robust reading. In: Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR). Tunis, Tunisia: IEEE, 2015. 1156−1160
    [13] Baek J, Kim G, Lee J, Park S, Han D. What is wrong with scene text recognition model comparisons? Dataset and model analysis [Online], available: https://arxiv.org/abs/1904.01906, Dec 18, 2019
    [14] Bingel J, Søgaard A. Identifying beneficial task relations for multi-task learning in deep neural networks [Online], available: https://arxiv.org/abs/1702.08303, Feb 27, 2017
    [15] Xie Z, Huang Y, Zhu Y, Jin L, Liu Y, Xie L. Aggregation cross-entropy for sequence recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 6538−6547
    [16] Li H, Wang P, Shen C. Toward end-to-end car license plate detection and recognition with deep neural networks. IEEE Transactions on Intelligent Transportation Systems, 2018, 20(3): 1126−1136
    [17] Xu Z, Yang W, Meng A, Lu N, Huang H, Ying C, Huang L. Towards end-to-end license plate detection and recognition: A large dataset and baseline. In: Proceedings of the European Conference on Computer Vision (ECCV). Munich, Germany: Springer, 2018. 255−271
Publication History
  • Received:  2019-09-06
  • Accepted:  2020-02-23
  • Available online:  2022-01-19
  • Issue published:  2022-02-18
