Towards Accurate Price Tag Recognition Algorithm with Multi-Task RNN

Mou Yong-Qiang, Fan Bao-Jie, Sun Chao, Yan Rui, Guo Yi-Shi

Citation: Mou Yong-Qiang, Fan Bao-Jie, Sun Chao, Yan Rui, Guo Yi-Shi. Towards accurate price tag recognition algorithm with multi-task RNN. Acta Automatica Sinica, 2020, 45(x): 1−7 doi: 10.16383/j.aas.c190633

doi: 10.16383/j.aas.c190633
Article information
    About the authors:

    Mou Yong-Qiang: Chief AI Architect at Guangzhou ImageDT Data Technology Co., Ltd. Previously a senior machine learning researcher at HP Labs. Received the M.S. degree in signal and information processing from Xi'an University of Technology in 2012. Research interests: machine vision, pattern recognition, and deep learning. Corresponding author of this paper. E-mail: yongqiang.mou@gmail.com

    Fan Bao-Jie: Master's student at Guangdong University of Technology. Research interests: deep learning and computer vision. E-mail: 735678367@qq.com

    Sun Chao: Graduate student at South China Agricultural University. Research interests: deep learning and computer vision. E-mail: ice_moyan@163.com

    Yan Rui: Senior researcher at Guangzhou ImageDT Data Technology Co., Ltd. Research interests: deep learning and computer vision. E-mail: reeyree@163.com

    Guo Yi-Shi: Chief Executive Officer of Guangzhou ImageDT Data Technology Co., Ltd. Research interests: deep learning and computer vision. E-mail: yi.shi@imagedt.com

  • Abstract: To promote the development of intelligent new retail in offline business scenarios and to improve the recognition accuracy of price tags, which carry key sales information, this paper studies the price tag recognition problem. The proposed method effectively improves recognition accuracy and solves the difficult problem of locating the decimal point. A deep convolutional neural network first extracts deep semantic features from the price tag image; the resulting feature maps are fed into a multi-task recurrent network layer for encoding; an attention-based decoder network then decodes the price digits; finally, the results of the multiple branches are merged to output the complete price. The proposed method is highly effective at improving price tag recognition accuracy in offline retail scenarios and resolves domain-specific difficulties such as decimal point localization. In addition, to verify the generality of the method, comparative experiments were conducted on datasets from other scenarios, and the results also confirm its effectiveness.
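As a rough, illustrative reading of the pipeline the abstract describes (CNN feature extraction, recurrent encoding, attention-based decoding), the PyTorch sketch below wires a small convolutional backbone into a BiLSTM encoder and a GRU attention decoder for a single recognition branch. This is a minimal sketch, not the authors' implementation: the class name `AttnRecognizer`, all layer sizes, the assumed 13-symbol label set (digits, decimal point, control tokens), and the greedy decoding loop are assumptions made for illustration only.

```python
# Minimal sketch (PyTorch) of a CNN -> BiLSTM -> attention-decoder recognizer.
# All layer sizes and names are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnRecognizer(nn.Module):
    def __init__(self, num_classes=13, hidden=256):  # assumed: digits 0-9, '.', EOS, PAD
        super().__init__()
        # CNN backbone: pool the height to 1 so the feature map becomes a width-wise sequence
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),          # (B, 256, 1, W')
        )
        # BiLSTM encoder over the width dimension
        self.encoder = nn.LSTM(256, hidden, bidirectional=True, batch_first=True)
        # Attention decoder: one step per output character
        self.embed = nn.Embedding(num_classes, hidden)
        self.attn = nn.Linear(2 * hidden + hidden, 1)
        self.rnn = nn.GRUCell(2 * hidden + hidden, hidden)
        self.out = nn.Linear(hidden, num_classes)

    def forward(self, images, max_len=8):
        feat = self.cnn(images).squeeze(2).permute(0, 2, 1)   # (B, W', 256)
        enc, _ = self.encoder(feat)                           # (B, W', 2*hidden)
        B = images.size(0)
        state = enc.new_zeros(B, self.rnn.hidden_size)
        token = torch.zeros(B, dtype=torch.long, device=images.device)  # start token id 0 (assumed)
        logits = []
        for _ in range(max_len):
            emb = self.embed(token)                           # (B, hidden)
            # additive attention over encoder positions
            scores = self.attn(torch.cat(
                [enc, emb.unsqueeze(1).expand(-1, enc.size(1), -1)], dim=-1)).squeeze(-1)
            alpha = F.softmax(scores, dim=1)                  # (B, W')
            context = (alpha.unsqueeze(-1) * enc).sum(dim=1)  # (B, 2*hidden)
            state = self.rnn(torch.cat([context, emb], dim=-1), state)
            step = self.out(state)                            # (B, num_classes)
            logits.append(step)
            token = step.argmax(dim=-1)                       # greedy decoding for the sketch
        return torch.stack(logits, dim=1)                     # (B, max_len, num_classes)

# Usage: decode an (assumed) 32x128 RGB price-tag crop into at most 8 characters.
model = AttnRecognizer()
scores = model(torch.randn(2, 3, 32, 128))
print(scores.shape)   # torch.Size([2, 8, 13])
```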
  • Fig. 1  The structure of the convolutional recurrent neural network

    Fig. 2  Images of some price tag samples

    Fig. 3  Baseline method compared with the multi-branch method

    Fig. 4  The structure of the basic single-task recognition network

    Fig. 5  The structure of the multi-task RNN

    Fig. 6  Flowchart of the attention-based decoder network

    Fig. 7  Comparison with the single-branch (direct recognition) method

    Table 1  Study of modules

    Model                General-data    Hard-data
    VGG-BiLSTM-CTC       50.20%          20.20%
    VGG-BiLSTM-Attn      61.20%          38.60%
    ResNet-BiLSTM-CTC    55.60%          28.80%
    ResNet-BiLSTM-Attn   68.10%          41.40%
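Table 1 contrasts CTC and attention decoding heads on two backbones. As a point of reference for what the "-CTC" rows denote, the snippet below sketches the standard greedy (best-path) CTC decoding rule from [6]: collapse consecutive repeats, then drop the blank symbol. The class ids and the blank index are placeholders, not the label set used in the paper.

```python
# Hypothetical helper illustrating greedy (best-path) CTC decoding:
# collapse consecutive repeated labels, then remove the blank symbol.
def ctc_greedy_decode(frame_labels, blank=0):
    """frame_labels: per-frame argmax class ids, e.g. [0, 1, 1, 0, 2, 12, 12, 9]."""
    out, prev = [], None
    for label in frame_labels:
        if label != blank and label != prev:
            out.append(label)
        prev = label
    return out

# Example: with blank = 0, the frame sequence collapses to four output symbols
# (the class ids here are arbitrary placeholders).
print(ctc_greedy_decode([0, 1, 1, 0, 2, 12, 12, 9]))   # [1, 2, 12, 9]
```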

    Table 2  Results of the multi-task model

    Model           General-data    Hard-data
    Baseline [13]   68.10%          41.40%
    NDPB&IB         90.10%          72.90%
    NDPB&DB         91.70%          74.30%
    IB&DB           92.20%          73.20%
    NDPB&IB&DB      93.20%          75.20%
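The abstract notes that the outputs of the individual branches are finally merged into the complete price. Purely to illustrate what such a merging step can look like, and under the assumption that one branch reads the integer part, another reads the fractional part, and a third flags whether a decimal point is present (the actual definitions of NDPB, IB, and DB are given in the full paper), a hypothetical post-processing helper is sketched below.

```python
# Hypothetical post-processing sketch: merge per-branch predictions into one price string.
# The branch semantics assumed here (integer part, fractional part, decimal-point presence)
# are illustrative assumptions, not the paper's definitions of NDPB/IB/DB.
def merge_price(integer_digits: str, decimal_digits: str, has_decimal_point: bool) -> str:
    """Combine branch outputs such as '12' and '99' into '12.99' (or '12' if no point)."""
    integer_digits = integer_digits or "0"
    if has_decimal_point and decimal_digits:
        return f"{integer_digits}.{decimal_digits}"
    return integer_digits

# Example: integer part '12', fractional part '99', decimal point predicted present.
assert merge_price("12", "99", True) == "12.99"
assert merge_price("128", "", False) == "128"
```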

    Table 3  Experimental results on the license plate dataset

    Method       DB       FN       Rotate   Tilt     Weather  Challenge
    TE2E [16]    96.90%   94.30%   90.80%   92.50%   87.90%   85.10%
    CCPD [17]    96.90%   94.30%   90.80%   92.50%   87.90%   85.10%
    Our method   98.24%   98.81%   98.12%   98.79%   98.19%   91.92%
  • [1] Shi B, Bai X, Yao C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(11): 2298−2304
    [2] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Polosukhin I. Attention is all you need//Advances in Neural Information Processing Systems. 2017: 5998−6008
    [3] Luong M T, Pham H, Manning C D. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025, 2015
    [4] Li H, Wang P, Shen C. Towards end-to-end text spotting with convolutional recurrent neural networks//Proceedings of the IEEE International Conference on Computer Vision. IEEE, 2017: 5238−5246
    [5] Yuan X, He P, Li X A. Adaptive adversarial attack on scene text recognition. arXiv preprint arXiv:1807.03326, 2018
    [6] Graves A, Fernández S, Gomez F, Schmidhuber J. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks//Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006: 369−376
    [7] Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks//Advances in Neural Information Processing Systems. 2014: 3104−3112
    [8] Lei Z, Zhao S, Song H, Shen J. Scene text recognition using residual convolutional recurrent neural network. Machine Vision and Applications, 2018, 29(5): 861−871 doi: 10.1007/s00138-018-0942-y
    [9] Shi B, Yang M, Wang X, Lyu P, Yao C, Bai X. ASTER: An attentional scene text recognizer with flexible rectification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018
    [10] Long M, Wang J. Learning multiple tasks with deep relationship networks. arXiv preprint arXiv:1506.02117, 2015
    [11] Veit A, Matera T, Neumann L, Matas J, Belongie S. COCO-Text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140, 2016
    [12] Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, Shafait F. ICDAR 2015 competition on robust reading//2015 13th International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2015: 1156−1160
    [13] Baek J, Kim G, Lee J, Park S, Han D. What is wrong with scene text recognition model comparisons? Dataset and model analysis. arXiv preprint arXiv:1904.01906, 2019
    [14] Bingel J, Søgaard A. Identifying beneficial task relations for multi-task learning in deep neural networks. arXiv preprint arXiv:1702.08303, 2017
    [15] Xie Z, Huang Y, Zhu Y, Jin L, Liu Y, Xie L. Aggregation cross-entropy for sequence recognition//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2019: 6538−6547
    [16] Li H, Wang P, Shen C. Toward end-to-end car license plate detection and recognition with deep neural networks. IEEE Transactions on Intelligent Transportation Systems, 2018, 20(3): 1126−1136
    [17] Xu Z, Yang W, Meng A, Lu N, Huang H, Ying C, Huang L. Towards end-to-end license plate detection and recognition: A large dataset and baseline//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 255−271
Publication history
  • Received:  2019-09-06
  • Accepted:  2020-02-23
