Generative Image Inpainting With Attention Propagation

Cao Cheng-Rui, Liu Wei-Rong, Shi Chang-Hong, Zhang Hao-Chen

Citation: Cao Cheng-Rui, Liu Wei-Rong, Shi Chang-Hong, Zhang Hao-Chen. Generative image inpainting with attention propagation. Acta Automatica Sinica, 2022, 48(5): 1343−1352. doi: 10.16383/j.aas.c200485

doi: 10.16383/j.aas.c200485
Funds: Supported by the National Natural Science Foundation of China (61461028, 61861027)

Author Bios:

    CAO Cheng-Rui  Master student at Lanzhou University of Technology. His research interest covers deep learning and image processing. E-mail: xiaocao1239@outlook.com

    LIU Wei-Rong  Professor at Lanzhou University of Technology. His research interest covers machine vision and artificial intelligence, and advanced control theory and applications for complex systems. Corresponding author of this paper. E-mail: liu_weirong@163.com

    SHI Chang-Hong  Ph.D. candidate at Lanzhou University of Technology. Her research interest covers deep learning and image processing. E-mail: changhong_shi@126.com

    ZHANG Hao-Chen  Lecturer in the Department of Electrical and Information Engineering at Lanzhou University of Technology. His research interest covers robot sensing and control. E-mail: zhanghc@lut.edu.cn

  • Abstract: Existing image inpainting approaches commonly suffer from disordered structures and blurred texture details. This is mainly because, when reconstructing a damaged region, the inpainting network struggles to make full use of information from the undamaged region to accurately infer the missing content. To address this, this paper proposes an image inpainting network driven by multi-level attention propagation. The network compresses high-level features extracted from the full-resolution image into multi-scale compact features, and then propagates these compact features through multi-level attention in order of scale, so that high-level features, covering both structure and detail, are fully propagated through the network. To further achieve fine-grained inpainting, a compound-granularity discriminator is also proposed, which imposes a global semantic constraint together with non-specific local dense constraints on the inpainting process. Extensive experiments show that the proposed method produces higher-quality inpainting results.
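The core idea of attention propagation described in the abstract — filling hole-region features with a similarity-weighted combination of features from the known region — can be illustrated with a toy, single-level NumPy sketch. This is a simplified assumption-laden illustration, not the authors' multi-level network; the function name, the flattened `(N, C)` feature layout, and the softmax temperature are all hypothetical choices for exposition.

```python
import numpy as np

def attention_propagate(features, mask, temperature=10.0):
    """Toy single-level sketch of attention feature propagation.

    Hole-region feature vectors are rebuilt as a softmax-weighted
    combination of feature vectors from the known (non-hole) region.

    features: (N, C) array, one feature vector per spatial position
    mask:     (N,) boolean, True where the position is missing (hole)
    """
    known = features[~mask]                      # (K, C) valid features
    hole = features[mask]                        # (H, C) features to rebuild
    # Cosine similarity between every hole vector and every known vector
    kn = known / (np.linalg.norm(known, axis=1, keepdims=True) + 1e-8)
    hn = hole / (np.linalg.norm(hole, axis=1, keepdims=True) + 1e-8)
    sim = hn @ kn.T                              # (H, K) similarity scores
    # Softmax over known positions -> attention weights for each hole position
    w = np.exp(temperature * sim)
    w /= w.sum(axis=1, keepdims=True)
    out = features.copy()
    out[mask] = w @ known                        # weighted sum of known features
    return out
```

In the paper's method this matching is applied level by level over multi-scale compact features, so that structure recovered at coarse scales guides detail reconstruction at finer scales; the sketch above shows only one such matching step.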
  • Fig. 1  The structure and detail issues encountered in current image inpainting methods

    Fig. 2  Conventional autoencoder

    Fig. 3  Multi-level attention propagation driven autoencoder

    Fig. 4  The framework of the multi-level attention propagation network

    Fig. 5  Flowchart of attention feature matching and propagation

    Fig. 6  Comparisons on test images from the Places2 dataset

    Fig. 7  Comparisons on test images from the Facade dataset

    Fig. 8  Comparisons on test images from the CelebA-HQ dataset

    Fig. 9  Inpainting results with/without attention propagation

    Fig. 10  Inpainting results with/without the compound discriminator

    Fig. 11  Case study on the Facade, CelebA-HQ, and Places2 datasets

    Table 1  Training and test splits on three datasets

    Dataset      Training     Test      Total
    Facade            506      100        606
    CelebA-HQ      28 000    2 000     30 000
    Places2     8 026 628  328 500  8 355 128

    Table 2  Quantitative comparisons on CelebA-HQ, Facade, and Places2

    Dataset     Mask ratio   PSNR (CA/MC/Ours)    SSIM (CA/MC/Ours)    Mean L1 loss (CA/MC/Ours)
    CelebA-HQ   10%~20%      26.16/29.62/31.35    0.901/0.933/0.945    0.038/0.022/0.018
                20%~30%      23.03/26.53/28.38    0.835/0.888/0.908    0.066/0.038/0.031
                30%~40%      21.62/24.94/26.93    0.787/0.855/0.882    0.087/0.051/0.040
                40%~50%      20.18/23.07/25.46    0.727/0.809/0.849    0.115/0.069/0.052
    Facade      10%~20%      25.93/27.05/28.28    0.897/0.912/0.926    0.039/0.032/0.028
                20%~30%      25.30/24.49/25.36    0.870/0.857/0.871    0.064/0.052/0.047
                30%~40%      22.00/23.21/24.53    0.780/0.815/0.841    0.084/0.068/0.059
                40%~50%      20.84/21.92/23.32    0.729/0.770/0.803    0.106/0.086/0.074
    Places2     10%~20%      22.49/27.34/27.68    0.867/0.910/0.912    0.059/0.031/0.029
                20%~30%      19.95/24.58/25.05    0.786/0.854/0.857    0.097/0.051/0.048
                30%~40%      18.49/22.72/23.41    0.714/0.800/0.805    0.131/0.071/0.066
                40%~50%      17.54/21.42/22.29    0.658/0.755/0.765    0.159/0.089/0.081
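For reference, the PSNR and mean L1 loss metrics used in the quantitative comparison can be computed as in the following minimal NumPy sketch. This is an illustrative assumption, not the authors' evaluation code; SSIM requires a windowed implementation and is omitted here, and the function names are hypothetical.

```python
import numpy as np

def psnr(img1, img2, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between two images in [0, max_val]."""
    mse = np.mean((img1.astype(np.float64) - img2.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def mean_l1(img1, img2):
    """Mean absolute pixel error, as in the 'Mean L1 loss' column."""
    return float(np.mean(np.abs(img1.astype(np.float64) - img2.astype(np.float64))))
```

Higher PSNR and SSIM and lower mean L1 loss indicate better reconstruction, which is the direction of improvement the table reports for "Ours" over CA and MC.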

    Table 3  Effectiveness study on each component

    Configuration   Att8    Att4    Att2    Att0    Single-D   Cg-D
    Mean L1 loss    0.091   0.089   0.086   0.081   0.078      0.074
Figures (11) / Tables (3)
Metrics
  • Article views: 1054
  • HTML full-text views: 378
  • PDF downloads: 205
  • Citations: 0
Publication history
  • Received: 2020-07-01
  • Accepted: 2020-12-14
  • Published online: 2021-02-25
  • Issue date: 2022-05-13
