
基于运动引导的高效无监督视频目标分割网络

赵子成 张开华 樊佳庆 刘青山

赵子成, 张开华, 樊佳庆, 刘青山. 基于运动引导的高效无监督视频目标分割网络. 自动化学报, 2023, 49(4): 872−880. doi: 10.16383/j.aas.c210626
Zhao Zi-Cheng, Zhang Kai-Hua, Fan Jia-Qing, Liu Qing-Shan. Learning motion guidance for efficient unsupervised video object segmentation. Acta Automatica Sinica, 2023, 49(4): 872−880. doi: 10.16383/j.aas.c210626


doi: 10.16383/j.aas.c210626
基金项目: 科技创新2030 —— “新一代人工智能”重大项目(2018AAA0100400), 国家自然科学基金(61876088, U20B2065, 61532009), 江苏省333工程人才项目(BRA2020291)资助
    作者简介:

    赵子成:南京信息工程大学自动化学院硕士研究生. 主要研究方向为视频目标分割, 深度学习. E-mail: 20191222013@nuist.edu.cn

    张开华:南京信息工程大学自动化学院教授. 主要研究方向为视频目标分割, 视觉追踪. 本文通信作者. E-mail: zhkhua@gmail.com

    樊佳庆:南京信息工程大学自动化学院硕士研究生. 主要研究方向为视频目标分割. E-mail: jqfan@nuaa.edu.cn

    刘青山:南京信息工程大学自动化学院教授. 主要研究方向为视频内容分析与理解. E-mail: qsliu@nuist.edu.cn

Learning Motion Guidance for Efficient Unsupervised Video Object Segmentation

Funds: Supported by National Key Research and Development Program of China (2018AAA0100400), National Natural Science Foundation of China (61876088, U20B2065, 61532009), and 333 High-level Talents Cultivation of Jiangsu Province (BRA2020291)
    Author Bio:

    ZHAO Zi-Cheng Master student at the School of Automation, Nanjing University of Information Science and Technology. His research interest covers video object segmentation and deep learning

    ZHANG Kai-Hua Professor at the School of Automation, Nanjing University of Information Science and Technology. His research interest covers video object segmentation and visual tracking. Corresponding author of this paper

    FAN Jia-Qing Master student at the School of Automation, Nanjing University of Information Science and Technology. His main research interest is video object segmentation

    LIU Qing-Shan Professor at the School of Automation, Nanjing University of Information Science and Technology. His research interest covers video content analysis and understanding

  • Abstract: Many deep-learning-based unsupervised video object segmentation (UVOS) algorithms suffer from large model sizes and heavy computation, which significantly limits their practical application. This paper proposes a motion-guided video object segmentation network that greatly reduces the number of parameters and the computational cost while improving segmentation performance. The model consists of three parts: a two-stream network, a motion guidance module, and a multi-scale progressive fusion module. Specifically, the RGB image and the estimated optical flow are first fed into the two-stream network to extract the object's appearance features and motion features. The motion guidance module then extracts semantic information from the motion features via local attention and uses it to guide the appearance features toward richer semantics. Finally, the multi-scale progressive fusion module takes the features output at each stage of the two-stream network and progressively merges deep features into shallow ones, improving boundary segmentation. Extensive evaluations on three standard datasets demonstrate the superior performance of the proposed method.
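As a rough illustration of the motion-guidance idea described in the abstract, the sketch below lets each appearance-feature position attend only to a small k × k neighborhood of the motion (optical-flow) features rather than to all positions. This is a minimal NumPy sketch under assumed tensor shapes; the function and variable names are ours and do not reproduce the paper's exact module.

```python
import numpy as np

def local_attention(appearance, motion, k=3):
    """Motion-guided local attention sketch (hypothetical shapes).

    appearance, motion: (C, H, W) feature maps.
    Each appearance position queries only a k x k neighborhood of the
    motion features instead of all H*W positions, which keeps the cost
    linear in H*W rather than quadratic.
    """
    C, H, W = appearance.shape
    pad = k // 2
    m = np.pad(motion, ((0, 0), (pad, pad), (pad, pad)), mode="constant")
    out = np.empty_like(appearance)
    for i in range(H):
        for j in range(W):
            q = appearance[:, i, j]                       # query vector, (C,)
            keys = m[:, i:i + k, j:j + k].reshape(C, -1)  # neighborhood, (C, k*k)
            scores = q @ keys                             # affinities, (k*k,)
            w = np.exp(scores - scores.max())
            w /= w.sum()                                  # softmax over the window
            out[:, i, j] = keys @ w                       # weighted sum of motion features
    return out
```

Because each output position aggregates only k² motion positions, the affinity tensor has H·W·k² entries instead of (H·W)², which is the source of the efficiency gains reported below.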
  • 图  1  网络框架图

    Fig.  1  Network architecture

    图  2  注意力结构

    Fig.  2  Attention structure

    图  3  UNet方式的上采样与多尺度渐进融合模块

    Fig.  3  UNet-style upsampling and the multi-scale progressive fusion module

    图  4  分割结果对比展示

    Fig.  4  Comparative display of segmentation results

    图  5  分割结果展示

    Fig.  5  Display of segmentation results

    表  1  不同模块每秒浮点运算数对比

    Table  1  Comparison of floating-point operations per second of different modules

    输入尺寸 (像素)    互注意模块 (MB)    运动引导模块 (MB)
    64 × 64 × 16      10.0              2.3
    64 × 32 × 32      153.1             9.0
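The gap in Table 1 between the co-attention module and the motion guidance module follows from the size of the attention affinity map: global co-attention relates every spatial position to every other, while local attention with kernel K relates each position to only a K × K window. The back-of-envelope count below uses hypothetical sizes and does not reproduce the exact figures in the table, which also depend on channel counts and implementation details not given in this excerpt.

```python
def affinity_entries(hw, k=None):
    """Entries in the attention affinity map for hw spatial positions:
    global co-attention pairs every position with every other (hw^2);
    local attention pairs each position with a k x k window (hw * k^2)."""
    return hw * hw if k is None else hw * k * k

hw = 64 * 64
global_cost = affinity_entries(hw)       # hw^2 entries: 16,777,216
local_cost = affinity_entries(hw, k=3)   # hw * 9 entries: 36,864
assert global_cost // local_cost == hw // 9
```

The ratio grows with resolution: doubling H and W quadruples the local cost but multiplies the global cost by sixteen.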

    表  2  不同方法在DAVIS-16和FBMS数据集的评估结果 (%)

    Table  2  Evaluation results of different methods on DAVIS-16 and FBMS datasets (%)

    方法           DAVIS-16                    FBMS
                   J&F      J       F         J
    LMP[25]        68.0     70.0    65.9      –
    LVO[16]        74.0     75.9    72.1      –
    PDB[14]        75.9     77.0    74.5      74.0
    MBNM[26]       79.5     80.4    78.5      73.9
    AGS[27]        78.6     79.7    77.4      –
    COSNet[10]     80.0     80.5    79.4      75.6
    AGNN[8]        79.9     80.7    79.1      –
    AnDiff[28]     81.1     81.7    80.5      –
    MATNet[17]     81.6     82.4    80.7      76.1
    本文算法       83.6     83.7    83.4      75.9
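In Table 2, J is region similarity (the intersection-over-union of the predicted and ground-truth masks), F measures contour accuracy, and J&F is their mean. A minimal sketch of J on binary masks (the helper name is ours, not from the paper):

```python
import numpy as np

def region_similarity(pred, gt):
    """J: intersection-over-union of two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

pred = np.array([[1, 1], [0, 0]], bool)
gt = np.array([[1, 0], [0, 0]], bool)
assert region_similarity(pred, gt) == 0.5  # 1 overlapping pixel / 2 in union
```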

    表  3  不同方法在DAVIS-16、FBMS和ViSal数据集的评估结果 (%)

    Table  3  Evaluation results of different methods on DAVIS-16, FBMS and ViSal datasets (%)

    方法           DAVIS-16        FBMS            ViSal
                   MAE    F_β      MAE    F_β      MAE    F_β
    FCNS[29]       5.3    72.9     10.0   73.5     4.1    87.7
    FGRNE[30]      4.3    78.6     8.3    77.9     4.0    85.0
    TENet[31]      1.9    90.4     2.6    89.7     1.4    94.9
    MBNM[26]       3.1    86.2     4.7    81.6     4.7    –
    PDB[14]        3.0    84.9     6.9    81.5     2.2    91.7
    AnDiff[28]     4.4    80.8     6.4    81.2     3.0    90.4
    本文算法       1.4    92.4     5.9    84.2     1.9    92.1
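The saliency metrics in Table 3 are standard: MAE is the mean absolute difference between the predicted saliency map and the ground truth (lower is better), and F_β combines precision and recall on the thresholded map, with β² = 0.3 being the conventional choice in this literature. A minimal sketch:

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a saliency map and the ground truth."""
    return np.abs(pred - gt).mean()

def f_beta(pred, gt, beta2=0.3, thresh=0.5):
    """F-measure on the thresholded prediction (beta^2 = 0.3 by convention)."""
    p = pred >= thresh
    tp = np.logical_and(p, gt).sum()
    precision = tp / max(p.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom else 0.0

gt = np.zeros((4, 4)); gt[1:3, 1:3] = 1.0
assert mae(gt, gt) == 0.0 and f_beta(gt, gt) == 1.0  # perfect prediction
```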

    表  4  不同方法的模型参数量、计算量与推理时延

    Table  4  Model parameters, computation and inference latency of different methods

    算法             COSNet[10]    MATNet[17]    本文算法
    输入尺寸 (像素)  473 × 473     473 × 473     384 × 672
    参数量 (MB)      81.2          142.7         6.4
    计算量 (GB)      585.5         193.7         5.4
    时延 (ms)        65            78            15

    表  5  不同方法在GTX2080 Ti上的性能表现

    Table  5  Performance of different methods on GTX2080 Ti

    方法           并发量    每秒帧数    时延 (ms)
    MATNet[17]     18        16         62.40
    本文算法       130       16         16.21
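The latency and frame-rate columns are tied by fps ≈ 1000 / latency_ms per stream, while the concurrency column reports how many streams the GPU can serve at that rate. A one-line sketch (the exact batching scheme behind the concurrency figures is not specified in this excerpt):

```python
def fps_from_latency(latency_ms):
    """Per-stream frames per second implied by a per-frame latency."""
    return 1000.0 / latency_ms

# e.g. a 62.40 ms per-frame latency supports about 16 frames per second
assert round(fps_from_latency(62.40)) == 16
```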

    表  6  运动引导模块与多尺度渐进融合模块的消融实验 (%)

    Table  6  Ablation experiment on motion guidance module and multi-scale progressive fusion module (%)

    指标    本文算法    无FG    FG
    J       83.7        75.8    76.1
    F       83.4        73.5    75.6

    表  7  不同核K大小与堆叠次数对比

    Table  7  Comparison of different kernel sizes K and numbers of stacked layers

    K    堆叠层数    J (%)    F (%)
    3    1           82.8     82.4
    3    2           83.4     82.7
    3    3           83.7     83.4
    3    4           83.5     83.2
    5    1           83.2     82.6
    7    1           83.4     82.7
    9    1           83.1     82.4
  • [1] Papazoglou A, Ferrari V. Fast object segmentation in unconstrained video. In: Proceedings of the IEEE International Conference on Computer Vision. Sydney, Australia: IEEE, 2013. 1777−1784
    [2] 黄宏图, 毕笃彦, 侯志强, 胡长城, 高山, 查宇飞, 库涛. 基于稀疏表示的视频目标跟踪研究综述[J]. 自动化学报, 2018, 44(10): 1747-1763

    HUANG Hong-Tu, BI Du-Yan, HOU Zhi-Qiang, HU Chang-Cheng, GAO Shan, ZHA Yu-Fei, KU Tao. Research of Sparse Representation-based Visual Object Tracking: A Survey. Acta Automatica Sinica, 2018, 44(10): 1747-1763
    [3] Wang W, Shen J, Porikli F. Saliency-aware geodesic video object segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015. 3395−3402
    [4] 钱生, 陈宗海, 林名强, 张陈斌. 基于条件随机场和图像分割的显著性检测[J]. 自动化学报, 2015, 41(4): 711-724

    QIAN Sheng, CHEN Zong-Hai, LIN Ming-Qiang, ZHANG Chen-Bin. Saliency Detection Based on Conditional Random Field and Image Segmentation. Acta Automatica Sinica, 2015, 41(4): 711-724.
    [5] Ochs P, Brox T. Object segmentation in video: A hierarchical variational approach for turning point trajectories into dense regions. In: Proceedings of the IEEE International Conference on Computer Vision. Barcelona, Spain: IEEE, 2011. 1583−1590
    [6] 苏亮亮, 唐俊, 梁栋, 王年. 基于最大化子模和RRWM的视频协同分割[J]. 自动化学报, 2016, 42(10): 1532-1541

    SU Liang-Liang, TANG Jun, LIANG Dong, WANG Nian. A Video Co-segmentation Algorithm by Means of Maximizing Submodular Function and RRWM. Acta Automatica Sinica, 2016, 42(10): 1532-1541
    [7] Ventura C, Bellver M, Girbau A, Salvador A, Marques F, Giroinieto X. RVOS: End-to-end recurrent network for video object segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 5277−5286
    [8] Wang W, Lu X, Shen J, Crandall D J, Shao L. Zero-shot video object segmentation via attentive graph neural networks. In: Proceedings of the IEEE International Conference on Computer Vision. Seoul, South Korea: IEEE, 2019. 9236−9245
    [9] Chen L C, Papandreou G, Schroff F, Adam H. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv: 1706.05587, 2017.
    [10] Lu X, Wang W, Ma C, Shen J, Shao L, Porikli F. See more, know more: Unsupervised video object segmentation with co-attention siamese networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 3623−3632
    [11] Faktor A, Irani M. Video segmentation by non-local consensus voting. In: Proceedings of the British Machine Vision Conference. Nottingham, UK: 2014.
    [12] Perazzi F, Pont-Tuset J, McWilliams B, Van-Gool L, Gross M, Sorkine-Hornung A. A benchmark dataset and evaluation methodology for video object segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 724−732
    [13] Xu N, Yang L J, Fan Y C, Yang J C, Yue D C, Liang Y C, et al. YouTube-VOS: Sequence-to-sequence video object segmentation. In: Proceedings of the European Conference on Computer Vision. Munich, Germany: 2018. 585−601
    [14] Song H, Wang W, Zhao S, Shen J, Lam K M. Pyramid dilated deeper ConvLSTM for video salient object detection. In: Proceedings of the European Conference on Computer Vision. Munich, Germany: 2018. 715−731
    [15] Jampani V, Gadde R, Gehler P V. Video propagation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017. 451−461
    [16] Tokmakov P, Alahari K, Schmid C. Learning video object segmentation with visual memory. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 4481−4490
    [17] Zhou T, Li J, Wang S, Tao R, Shen J. MATNet: Motion-attentive transition network for zero-shot video object segmentation. IEEE Transactions on Image Processing, 2020, 29: 8326−8338
    [18] Chu X, Yang W, Ouyang W, Ma C, Yuille A L, Wang X. Multi-context attention for human pose estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017. 1831−1840
    [19] Chen L, Zhang H, Xiao J, Nie L, Shao J, Liu W, et al. SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017. 5659−5667
    [20] Lu J, Yang J, Batra D, Parikh D. Hierarchical question-image co-attention for visual question answering. arXiv preprint arXiv: 1606.00061, 2016.
    [21] Wu Q, Wang P, Shen C, Reid I, Van-Den-Hengel A. Are you talking to me? Reasoned visual dialog generation through adversarial learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 6106−6115
    [22] Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L C. MobileNetV2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 4510−4520
    [23] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: Proceedings of the International Conference on Medical Image Computing and Computer-assisted Intervention. Munich, Germany: 2015. 234−241
    [24] Wang W, Shen J, Shao L. Consistent video saliency using local gradient flow optimization and global refinement. IEEE Transactions on Image Processing, 2015, 24(11): 4185−4196
    [25] Tokmakov P, Alahari K, Schmid C. Learning motion patterns in videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017. 3386−3394
    [26] Li S, Seybold B, Vorobyov A, Lei X, Kuo C C J. Unsupervised video object segmentation with motion-based bilateral networks. In: Proceedings of the European Conference on Computer Vision. Munich, Germany: 2018. 207−223
    [27] Wang W, Song H, Zhao S, Shen J, Zhao S, Hoi S C, et al. Learning unsupervised video object segmentation through visual attention. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 3064−3074
    [28] Yang Z, Wang Q, Bertinetto L, Hu W, Bai S, Torr P H. Anchor diffusion for unsupervised video object segmentation. In: Proceedings of the IEEE International Conference on Computer Vision. Seoul, South Korea: IEEE, 2019. 931−940
    [29] Wang W, Shen J, Shao L. Video salient object detection via fully convolutional networks. IEEE Transactions on Image Processing, 2017, 27(1): 38−49
    [30] Li G, Xie Y, Wei T, Wang K, Lin L. Flow guided recurrent neural encoder for video salient object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 3243−3252
    [31] Ren S, Han C, Yang X, Han G, He S. TENet: Triple excitation network for video salient object detection. In: Proceedings of the European Conference on Computer Vision. Edinburgh, Scotland: 2020. 212−228
出版历程 (Publication history)
  • 收稿日期 (Received): 2021-07-06
  • 网络出版日期 (Published online): 2021-11-20
  • 刊出日期 (Issue date): 2023-04-20
