2022 Impact Factor (CJCR): 2.765

Indexed in:

  • Chinese Core Journals (中文核心)
  • EI
  • China Science and Technology Core (中国科技核心)
  • Scopus
  • CSCD
  • Science Abstracts (UK)


Multi-stage Attention-Based Capsule Networks for Image Classification

Song Yan, Wang Yong

Citation: Song Yan, Wang Yong. Multi-stage attention-based capsule networks for image classification. Acta Automatica Sinica, 2021, 47(x): 1−14. doi: 10.16383/j.aas.c210012


Multi-stage Attention-Based Capsule Networks for Image Classification

Funds: Supported in part by the National Natural Science Foundation of China under Grant 62073223, the Natural Science Foundation of Shanghai under Grant 18ZR1427100, and the Open Project of the National Defense Science and Technology Key Laboratory of Aerospace Flight Dynamics Technology under Grant 6142210200304
More Information
    Author Bio:

    SONG Yan  Professor at the University of Shanghai for Science and Technology. She received her bachelor's degree from Jilin University in 2001, her master's degree from the University of Electronic Science and Technology of China in 2005, and her Ph.D. degree from Shanghai Jiao Tong University in 2013. From 2016 to 2017, she was a visiting scholar at Brunel University, UK. Her research interests include pattern recognition, data analysis, and predictive control. Corresponding author of this paper. E-mail: sonya@usst.edu.cn

    WANG Yong  Master's student at the University of Shanghai for Science and Technology. He received his bachelor's degree from West Anhui University in 2019. His main research interest is image processing. E-mail: 18856496454@163.com

  • Abstract: To address the insufficient feature extraction of capsule networks, this paper proposes a multi-stage attention capsule network model for image classification. First, in the convolutional layers, spatial attention and channel attention are applied to the low-level and high-level features, respectively, to extract effective features. Then, an attention mechanism based on vector direction is proposed and applied to the dynamic routing layer, increasing the focus on important capsules and thereby improving the accuracy with which low-level capsules predict high-level capsules. Finally, comparative experiments on five public datasets show that the proposed model outperforms other capsule network models in classification accuracy and robustness, and also performs well on the reconstruction of affine-transformed images.
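The routing stage summarized in the abstract can be sketched in code. The following is an illustrative NumPy sketch, not the authors' implementation: the squash nonlinearity and agreement-based routing follow the standard CapsNet formulation of Sabour et al. [6], while the exact form of the vector-direction attention (a softmax over mean agreement that re-weights the output capsules) is an assumption made for illustration.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squashing nonlinearity from [6]: preserves vector orientation
    while mapping the length into [0, 1)."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def routing_with_vector_attention(u_hat, iterations=3):
    """Dynamic routing between capsules with an illustrative
    vector-direction attention step.

    u_hat: array of shape (n_low, n_high, dim), the prediction vectors
    from each low-level capsule for each high-level capsule.
    Returns the attention-weighted high-level capsules, shape (n_high, dim).
    """
    n_low, n_high, dim = u_hat.shape
    b = np.zeros((n_low, n_high))  # routing logits
    for _ in range(iterations):
        # coupling coefficients: softmax over high-level capsules
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = (c[..., None] * u_hat).sum(axis=0)   # weighted sum, (n_high, dim)
        v = squash(s)
        # agreement (dot product) between predictions and current outputs
        agree = np.einsum('ijd,jd->ij', u_hat, v)
        b += agree
    # assumed vector attention: capsules whose direction is well supported
    # by the predictions (high mean agreement) receive larger weights
    att = np.exp(agree.mean(axis=0))
    att /= att.sum()
    return v * att[:, None]
```

As a usage sketch, routing 8 low-level capsules into 3 high-level capsules of dimension 4 would call `routing_with_vector_attention` on a `(8, 3, 4)` array; the squash step guarantees every output capsule has length below 1, so that length can be read as an existence probability.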
  • Fig. 1  The structure of CapsNet

    Fig. 2  A capsule network model with multi-stage attention

    Fig. 3  Channel attention (CA) and spatial attention (SA) mechanisms

    Fig. 4  Vector attention mechanism

    Fig. 5  Image reconstruction

    Fig. 6  Iteration curves of different improved modules on five datasets

    Fig. 7  Raw and affine-transformed images (a): MNIST dataset, (b): MNIST dataset after rotation

    Fig. 8  Robustness comparison of different models

    Fig. 9  (a): Real image from MNIST, (b): CapsNet reconstruction, (c): reconstruction by our model

    Fig. 10  (a): Real image from Fashion-MNIST, (b): CapsNet reconstruction, (c): reconstruction by our model

    Fig. 11  (a): Real image from CIFAR-10, (b): CapsNet reconstruction, (c): reconstruction by our model

    Fig. 12  (a): Real image from SVHN, (b): CapsNet reconstruction, (c): reconstruction by our model

    Fig. 13  (a): Real image from smallNORB, (b): CapsNet reconstruction, (c): reconstruction by our model

    Fig. 14  Raw and affine-transformed images from the MNIST dataset (a): Real image, (b): rotated 25 degrees, (c): rotated −25 degrees

    Fig. 15  Reconstruction comparison for Fig. 14(b) (a): Reconstruction by the CapsNet of [10], (b): reconstruction by our model

    Fig. 16  Reconstruction comparison for Fig. 14(c) (a): Reconstruction by the CapsNet of [10], (b): reconstruction by our model

    Fig. 17  Comparison of reconstruction loss curves between our model and the CapsNet of [10]

    Table 1  Classification error rates of different improved modules on five datasets

    | Model                | MNIST | Fashion-MNIST | CIFAR-10 | SVHN  | smallNORB |
    | -------------------- | ----- | ------------- | -------- | ----- | --------- |
    | Baseline             | 0.38% | 7.11%         | 21.21%   | 5.12% | 5.62%     |
    | Baseline+(SA+CA)     | 0.32% | 5.54%         | 11.69%   | 4.61% | 5.07%     |
    | Baseline+VA          | 0.28% | 5.53%         | 14.65%   | 4.99% | 5.21%     |
    | Baseline+(SA+CA+VA)  | 0.22% | 4.63%         | 9.99%    | 4.08% | 4.89%     |

    Table 2  Classification error rates of different models on five datasets (– indicates no reported result)

    | Model                           | MNIST | Fashion-MNIST | CIFAR-10 | SVHN  | smallNORB |
    | ------------------------------- | ----- | ------------- | -------- | ----- | --------- |
    | Prem Nair et al.'s CapsNet [5]  | 0.5%  | 10.2%         | 31.47%   | 8.94% | –         |
    | HitNet [7]                      | 0.32% | 7.7%          | 26.7%    | 5.5%  | –         |
    | Matrix Capsule EM Routing [9]   | 0.7%  | 5.97%         | 16.79%   | 9.64% | 5.2%      |
    | SACN [10]                       | 0.5%  | 5.98%         | 16.65%   | 5.01% | 7.79%     |
    | AR CapsNet [11]                 | 0.54% | –             | 12.71%   | –     | –         |
    | DCNet [30]                      | 0.25% | 5.36%         | 17.37%   | 4.42% | 5.57%     |
    | MS-CapsNet [31]                 | –     | 6.01%         | 18.81%   | –     | –         |
    | VB-Routing [32]                 | –     | 5.2%          | 11.2%    | 4.75% | 1.6%      |
    | Aff-CapsNets [33]               | 0.46% | 7.47%         | 23.72%   | 7.85% | –         |
    | Ours                            | 0.22% | 4.63%         | 9.99%    | 4.08% | 4.89%     |

    Table 3  Robustness comparison of different models

    | Model           | MNIST | MNIST-Rotation |
    | --------------- | ----- | -------------- |
    | CNN             | 0.74% | 5.52%          |
    | CapsNet [6]     | 0.38% | 2.11%          |
    | EM Routing [9]  | 0.43% | 2.65%          |
    | Ours            | 0.22% | 0.63%          |
  • [1] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 2012 Conference on Neural Information Processing Systems. Lake Tahoe, NV, USA: NIPS, 2012. 1097−1105.
    [2] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proceedings of the 2015 International Conference on Learning Representations. San Diego, CA, USA: ICLR, 2015. 1−14.
    [3] Howard A G, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv: 1704.04861, 2017.
    [4] Huang G, Liu Z, Van Der Maaten L, Weinberger K Q. Densely connected convolutional networks. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017. 2261−2269.
    [5] Nair P, Doshi R, Keselj S. Pushing the limits of capsule networks. arXiv preprint arXiv: 2103.08074, 2021.
    [6] Sabour S, Frosst N, Hinton G E. Dynamic routing between capsules. In: Proceedings of the 2017 Neural Information Processing Systems. Long Beach, CA, USA: NIPS, 2017. 3856−3866.
    [7] Deliege A, Cioppa A, Van Droogenbroeck M. Hitnet: a neural network with capsules embedded in a hit-or-miss layer, extended with hybrid data augmentation and ghost capsules. arXiv preprint arXiv: 1806.06519, 2018.
    [8] Xi E, Bing S, Jin Y. Capsule network performance on complex data. arXiv preprint arXiv: 1712.03480, 2017.
    [9] Hinton G E, Sabour S, Frosst N. Matrix capsules with EM routing. In: Proceedings of the 2018 International Conference on Learning Representations. Vancouver, BC, Canada: ICLR, 2018. 1−15.
    [10] Hoogi A, Wilcox B, Gupta Y, Rubin D L. Self-attention capsule networks for object classification. arXiv preprint arXiv: 1904.12483, 2019.
    [11] Choi J, Seo H, Im S, Kang M. Attention routing between capsules. In: Proceedings of the 2019 IEEE International Conference on Computer Vision. Seoul, Korea(South): IEEE, 2019. 1981−1989.
    [12] Wang X, Tu Z, Zhang M. Incorporating statistical machine translation word knowledge into neural machine translation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(12): 2255−2266 doi: 10.1109/TASLP.2018.2860287
    [13] Zhang B, Xiong D, Su J. Neural machine translation with deep attention. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 42(1): 154−163
    [14] Zhang B, Xiong D, Xie J, Su J. Neural machine translation with gru-gated attention model. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(11): 4688−4698 doi: 10.1109/TNNLS.2019.2957276
    [15] Wang Jin-Jia, Ji Shao-Nan, Cui Lin, Xia Jing, Yang Qian. Identification of family activities based on attention capsule network. Acta Automatica Sinica, 2019, 45(11): 2199−2204 (in Chinese)
    [16] Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhutdinov R, et al. Show, attend and tell: Neural image caption generation with visual attention. In: Proceedings of the 2015 International Conference on Machine Learning. Lille, France: ICML, 2015. 2048−2057.
    [17] Gao L, Li X, Song J, Shen H T. Hierarchical LSTMs with adaptive attention for visual captioning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 42(5): 1112−1131
    [18] Lu X, Wang B, Zheng X. Sound active attention framework for remote sensing image captioning. IEEE Transactions on Geoscience and Remote Sensing, 2019, 58(3): 1985−2000
    [19] Wang X, Duan H. Hierarchical visual attention model for saliency detection inspired by avian pathways. IEEE/CAA Journal of Automatica Sinica, 2017, 6(2): 540−552
    [20] Xu H, Saenko K. Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. In: Proceedings of the 2016 European Conference on Computer Vision. Amsterdam, Netherlands: ECCV, 2016. 451−466.
    [21] Liang J, Jiang L, Cao L, Kalantidis Y, Li L J, Hauptmann A G. Focal visual-text attention for memex question answering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1893−1908 doi: 10.1109/TPAMI.2018.2890628
    [22] Xiao Jin-Sheng, Shen Meng-Yao, Jiang Ming-Jun, Lei Jun-Feng, Bao Zhen-Yu. Detection of abnormal behavior in surveillance video with a fused bag attention mechanism. Acta Automatica Sinica, online, DOI: 10.16383/j.aas.c190805 (in Chinese)
    [23] Zhao X, Chen Y, Guo J, Zhao D. A spatial-temporal attention model for human trajectory prediction. IEEE/CAA Journal of Automatica Sinica, 2020, 7(4): 965−974 doi: 10.1109/JAS.2020.1003228
    [24] Wang Ya-Kun, Huang He-Yan, Feng Chong, Zhou Qiang. A study of conceptual sentence embedding based on attentional mechanism. Acta Automatica Sinica, 2020, 46(7): 1390−1400 (in Chinese)
    [25] Feng Jian-Zhou, Ma Xiang-Cong. Research on fine-grained entity classification method based on transfer learning. Acta Automatica Sinica, 2020, 46(8): 1759−1766 (in Chinese)
    [26] Wang Xian-Xian, Yu Long, Tian Sheng-Wei, Wang Rui-Jin. Filling missing elements of Uyghur events with independent RNN and capsule networks. Acta Automatica Sinica, 2021, 47(4): 903−912 (in Chinese)
    [27] Wang X, Girshick R, Gupta A, He K. Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018. 7794−7803.
    [28] Woo S, Park J, Lee J Y, Kweon I S. CBAM: Convolutional block attention module. In: Proceedings of the 2018 European Conference on Computer Vision. Munich, Germany: ECCV, 2018. 3−19.
    [29] Hu J, Shen L, Sun G, Wu E. Squeeze-and-excitation networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011−2023 doi: 10.1109/TPAMI.2019.2913372
    [30] Phaye S S R, Sikka A, Dhall A, Bathula D. Dense and diverse capsule networks: Making the capsules learn better. arXiv preprint arXiv: 1805.04001, 2018.
    [31] Xiang C, Zhang L, Tang Y, Zou W, Xu C. MS-CapsNet: A novel multi-scale capsule network. IEEE Signal Processing Letters, 2018, 25(12): 1850−1854 doi: 10.1109/LSP.2018.2873892
    [32] Ribeiro F D S, Leontidis G, Kollias S. Capsule routing via variational Bayes. In: Proceedings of the AAAI Conference on Artificial Intelligence. New York, NY, USA: AAAI, 2020. 3749−3756.
    [33] Gu J, Tresp V. Improving the robustness of capsule networks to image affine transformation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2020. 7283−7291.
Publication history
  • Received: 2021-01-05
  • Accepted: 2021-05-12
  • Published online: 2021-06-20
