
Occlusion Detection Based on Optical Flow and Multiscale Context Aggregation

Feng Cheng, Zhang Cong-Xuan, Chen Zhen, Li Bing, Li Ming

Citation: Feng Cheng, Zhang Cong-Xuan, Chen Zhen, Li Bing, Li Ming. Occlusion detection based on optical flow and multiscale context aggregation. Acta Automatica Sinica, 2021, x(x): 1001−1012 doi: 10.16383/j.aas.c210324


doi: 10.16383/j.aas.c210324
More Information
    Author Bio:

    FENG Cheng Master student at the School of Measuring and Optical Engineering, Nanchang Hangkong University. His main research interest is computer vision. E-mail: fengcheng00016@163.com

    ZHANG Cong-Xuan Associate Professor at the School of Measuring and Optical Engineering, Nanchang Hangkong University. He received his Ph.D. degree from Nanjing University of Aeronautics and Astronautics in 2014. His research interest covers image processing and computer vision. Corresponding author of this paper. E-mail: zcxdsg@163.com

    CHEN Zhen Professor at the School of Measuring and Optical Engineering, Nanchang Hangkong University. He received his Ph.D. degree from Northwestern Polytechnical University in 2003. His research interest covers image processing and computer vision. E-mail: dr_chenzhen@163.com

    LI Bing Professor at the National Key Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. He received his Ph.D. degree from Beijing Jiaotong University in 2009. His research interest covers video content understanding and multimedia content security. E-mail: bli@nlpr.ia.ac.cn

    LI Ming Professor at the School of Information Engineering, Nanchang Hangkong University. He received his Ph.D. degree from Nanjing University of Aeronautics and Astronautics in 1997. His research interest covers image processing and artificial intelligence. E-mail: liming@nchu.edu.com

Occlusion Detection Based on Optical Flow and Multiscale Context Aggregation

Funds: Supported by the National Key Research and Development Program of China (2020YFC2003800), the National Natural Science Foundation of China (61866026, 61772255 and 61866025), the Outstanding Young Scientist Project of Jiangxi Province (20192BCB23011), the Key Project of the Natural Science Foundation of Jiangxi Province (20202ACB214007) and the Advantage Science and Technology Innovation Team of Jiangxi Province (20165BCB19007)

  • Abstract: To address the accuracy and robustness of motion occlusion detection in non-rigid motion and large-displacement scenes, this paper proposes a motion occlusion detection method for image sequences based on optical flow and multi-scale context. First, a multi-scale context information aggregation network based on dilated convolution is designed to capture image features over a wider range from the multi-scale context of the image sequence. Then, a feature pyramid is employed to build an end-to-end motion occlusion detection network model that combines the multi-scale context with optical flow, where the flow is used to refine the occlusion estimates in non-rigid and large-displacement regions. Finally, a training loss function based on motion edges is constructed to obtain accurate motion occlusion boundaries. The proposed method is compared with representative occlusion detection models on the MPI-Sintel and KITTI test datasets. The experimental results show that the proposed method effectively improves the accuracy of motion occlusion detection, and is notably more robust in difficult scenes such as non-rigid motion and large displacement.
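The multi-scale context aggregation described in the abstract rests on dilated (atrous) convolutions, which enlarge the receptive field without pooling. The following minimal single-channel NumPy sketch (an illustration of the mechanism, not the paper's actual network) shows how stacking 3×3 kernels with dilation rates 1, 2 and 4 grows the receptive field to 1 + 2·(1 + 2 + 4) = 15 pixels:

```python
import numpy as np

def dilated_conv2d(img, kernel, dilation):
    """Single-channel 2D dilated convolution with zero padding (same output size)."""
    kh, kw = kernel.shape
    pad = dilation * (kh // 2)
    padded = np.pad(img, pad)
    out = np.zeros_like(img, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i * dilation:i * dilation + img.shape[0],
                                         j * dilation:j * dilation + img.shape[1]]
    return out

# Impulse response: three stacked 3x3 layers with dilations 1, 2, 4
# cover a 1 + 2*(1 + 2 + 4) = 15-pixel receptive field.
x = np.zeros((31, 31))
x[15, 15] = 1.0
for d in (1, 2, 4):
    x = dilated_conv2d(x, np.ones((3, 3)), d)
rows = np.nonzero(x.sum(axis=1))[0]
print(rows.max() - rows.min() + 1)  # 15
```

In the actual aggregation network each dilated layer would of course carry learned weights and nonlinearities; the impulse test above only demonstrates the receptive-field arithmetic.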
  • Fig. 1 Structure diagram of context network

    Fig. 2 Structure diagram of common receptive-field expansion networks: (a) GoogLeNet; (b) DeepLabv3+

    Fig. 3 Structure diagram of the multi-scale context information aggregation network

    Fig. 4 Structure diagram of the occlusion detection network

    Fig. 5 Structure of the occlusion detection model based on optical flow and multi-scale context information

    Fig. 6 Comparison of occlusion detection results between our method and IRR-PWC

    Fig. 7 Comparison of occlusion detection on non-rigid motion and large-displacement sequences from the MPI-Sintel dataset. From left to right: the alley_2, ambush_2, market_6 and temple_2 sequences.

    Fig. 8 Comparison of occlusion detection results of each method on the KITTI dataset. From left to right: the input image and the motion occlusion maps of UnFlow, Back2Future, MaskFlownet, IRR-PWC and our method.

    Fig. 9 Examples of motion occlusion masks generated from the ground-truth optical flow (N = 3)

    Fig. 10 Comparison of the visualization results of each ablation model
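Figure 9's masks are generated from the ground-truth flow field. The page does not restate the exact rule, but a standard construction in flow-based occlusion work (e.g. UnFlow-style methods) is the forward-backward consistency check: a pixel is occluded when the backward flow, sampled where the forward flow points, fails to cancel the forward flow. A hedged NumPy sketch, with nearest-neighbour sampling and illustrative tolerance parameters alpha and beta:

```python
import numpy as np

def occlusion_from_flow(fw, bw, alpha=0.01, beta=0.5):
    """Occlusion mask from a forward/backward flow pair (H, W, 2 arrays).

    A pixel is flagged occluded when the backward flow, sampled at the
    position the forward flow points to, does not cancel the forward flow
    (forward-backward consistency check). Nearest-neighbour sampling keeps
    the sketch short; bilinear sampling is the usual refinement.
    """
    h, w = fw.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.clip(np.rint(xs + fw[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.rint(ys + fw[..., 1]).astype(int), 0, h - 1)
    bw_warped = bw[yt, xt]                       # backward flow at target pixels
    diff = np.sum((fw + bw_warped) ** 2, axis=-1)
    tol = alpha * (np.sum(fw ** 2, -1) + np.sum(bw_warped ** 2, -1)) + beta
    return diff > tol

# A uniform, perfectly consistent flow pair yields no occlusion.
fw = np.zeros((8, 8, 2)); fw[..., 0] = 2.0
print(occlusion_from_flow(fw, -fw).any())  # False
```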

    Table 1 Comparison of average F1 score on the MPI-Sintel dataset

    Method          | Type                  | Clean | Final
    UnFlow[24]      | Traditional           | 0.28  | 0.27
    Back2Future[25] | Unsupervised learning | 0.49  | 0.44
    MaskFlownet[27] | Unsupervised learning | 0.37  | 0.36
    IRR-PWC[26]     | Supervised learning   | 0.71  | 0.67
    Ours            | Supervised learning   | 0.75  | 0.72

    Table 2 Comparison of average omission rate (OR) and false rate (FR) on the MPI-Sintel dataset

    Method          | Clean OR | Clean FR | Final OR | Final FR
    UnFlow[24]      | 1.96%    | 18.32%   | 1.94%    | 20.51%
    Back2Future[25] | 5.03%    | 2.75%    | 5.08%    | 2.96%
    MaskFlownet[27] | 5.77%    | 1.37%    | 5.76%    | 1.72%
    IRR-PWC[26]     | 1.98%    | 0.96%    | 2.84%    | 1.29%
    Ours            | 1.85%    | 0.83%    | 2.31%    | 1.08%
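Tables 1 and 2 score binary occlusion masks with F1, omission rate (OR) and false rate (FR). The exact denominators are not restated on this page, so the sketch below assumes a common convention: OR counts missed occluded pixels and FR counts falsely flagged pixels, both normalized by the total pixel count:

```python
import numpy as np

def occlusion_scores(pred, gt):
    """F1 plus omission/false rates for binary occlusion masks.

    Assumed conventions (the paper's exact denominators may differ):
    OR = occluded pixels that were missed / total pixels,
    FR = non-occluded pixels flagged as occluded / total pixels.
    """
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = int(np.sum(pred & gt))    # correctly detected occluded pixels
    fp = int(np.sum(pred & ~gt))   # false detections
    fn = int(np.sum(~pred & gt))   # missed occluded pixels
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return f1, fn / gt.size, fp / gt.size

gt = np.array([[1, 1, 0, 0]])
pred = np.array([[1, 0, 1, 0]])
print(occlusion_scores(pred, gt))  # (0.5, 0.25, 0.25)
```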

    Table 3 Comparison of average F1 scores of occlusion detection on non-rigid motion and large-displacement image sequences

    Clean pass:
    Method          | alley_2 | ambush_2 | market_6 | temple_2
    UnFlow[24]      | 0.4149  | 0.4313   | 0.4330   | 0.3243
    Back2Future[25] | 0.6816  | 0.5888   | 0.6290   | 0.2712
    MaskFlownet[27] | 0.5057  | 0.5403   | 0.4660   | 0.3838
    IRR-PWC[26]     | 0.8709  | 0.9172   | 0.8155   | 0.7404
    Ours            | 0.8811  | 0.9216   | 0.8304   | 0.7747

    Final pass:
    Method          | alley_2 | ambush_2 | market_6 | temple_2
    UnFlow[24]      | 0.4057  | 0.3920   | 0.4499   | 0.3120
    Back2Future[25] | 0.6756  | 0.5199   | 0.6239   | 0.2683
    MaskFlownet[27] | 0.5039  | 0.4085   | 0.4735   | 0.3508
    IRR-PWC[26]     | 0.8770  | 0.7809   | 0.8023   | 0.6905
    Ours            | 0.8764  | 0.7959   | 0.8106   | 0.7103

    Table 4 Comparison of the time consumption of different methods (bold is the best value)

    Method          | Type                  | Runtime
    UnFlow[24]      | Traditional           | 0.13 s
    Back2Future[25] | Unsupervised learning | 0.13 s
    MaskFlownet[27] | Unsupervised learning | 0.10 s
    IRR-PWC[26]     | Supervised learning   | 0.18 s
    Ours            | Supervised learning   | 0.19 s

    Table 5 Comparison of average F1 scores over the whole MPI-Sintel training sequences (bold is the best value)

    Model                               | Clean | Final | Runtime | Training time
    Full model                          | 0.75  | 0.72  | 0.19 s  | 13 days
    Without multi-scale context network | 0.72  | 0.68  | 0.18 s  | 12 days
    Without edge loss function          | 0.74  | 0.71  | 0.19 s  | 13 days

    Table 6 Comparison of average F1 scores within different motion-boundary regions over the whole MPI-Sintel training sequences (bold is the best value)

    Clean pass:
    Model                               | N=1  | N=3  | N=5  | N=10
    Full model                          | 0.63 | 0.67 | 0.69 | 0.71
    Without multi-scale context network | 0.59 | 0.62 | 0.65 | 0.67
    Without edge loss function          | 0.60 | 0.64 | 0.67 | 0.69

    Final pass:
    Model                               | N=1  | N=3  | N=5  | N=10
    Full model                          | 0.59 | 0.62 | 0.64 | 0.67
    Without multi-scale context network | 0.55 | 0.59 | 0.61 | 0.63
    Without edge loss function          | 0.56 | 0.60 | 0.62 | 0.64
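Table 6 evaluates F1 only within a band of N pixels around the motion boundaries. A simple way to build such a band from the ground-truth flow is to threshold the flow-gradient magnitude and grow the boundary set N times; the threshold and the 3×3 dilation below are illustrative choices, not necessarily the paper's exact procedure:

```python
import numpy as np

def boundary_region_mask(flow, thresh=1.0, n=3):
    """Pixels within n steps (8-connected) of a motion boundary.

    A boundary pixel is one whose flow-gradient magnitude exceeds `thresh`;
    the boundary set is then grown by n rounds of 3x3 binary dilation.
    """
    gy_u, gx_u = np.gradient(flow[..., 0])
    gy_v, gx_v = np.gradient(flow[..., 1])
    grad = np.sqrt(gx_u ** 2 + gy_u ** 2 + gx_v ** 2 + gy_v ** 2)
    mask = grad > thresh
    for _ in range(n):
        padded = np.pad(mask, 1)
        grown = np.zeros_like(mask)
        for dy in (0, 1, 2):
            for dx in (0, 1, 2):
                grown |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
        mask = grown
    return mask

# Vertical motion discontinuity: columns 0-4 static, columns 5-9 moving.
flow = np.zeros((10, 10, 2))
flow[:, 5:, 0] = 5.0
band = boundary_region_mask(flow, thresh=1.0, n=3)
print(band.any(axis=0).sum())  # 8: the band spans columns 1-8
```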
  • [1] Zhang Shi-Hui, He Qi, Dong Li-Jian, Du Xue-Zhe. Dynamic occlusion avoidance approach by means of occlusion region model and object motion estimation. Acta Automatica Sinica, 2019, 45(4): 771-786.
    [2] Yu C, Bo Y, Bo W, Yan W D, Robby T. Occlusion-aware networks for 3D human pose estimation in video. In: Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea: IEEE, 2019.723−732.
    [3] Zhang Cong-Xuan, Chen Zhen, Xiong Fan, Li Ming, Ge Li-Yue, Chen Hao. Large displacement motion optical flow estimation with non-rigid dense patch matching. Acta Electronica Sinica, 2019, 47(6): 1316-1323. doi: 10.3969/j.issn.0372-2112.2019.06.019
    [4] Yao Nai-Ming, Guo Qing-Pei, Qiao Feng-Chun, Chen Hui, Wang Hong-An. Robust facial expression recognition with generative adversarial networks. Acta Automatica Sinica, 2018, 44(5): 865-877.
    [5] Pan J Y, Bo H. Robust occlusion handling in object tracking. In: Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, USA: IEEE, 2007.1−8.
    [6] Liu Xin, Xu Hua-Rong, Hu Zhan-Yi. GPU based fast 3D-object modeling with Kinect. Acta Automatica Sinica, 2012, 38(8): 1288-1297.
    [7] Zhang Cong-Xuan, Chen Zhen, Li Ming. Review of the 3D reconstruction technology based on optical flow of monocular image sequence. Acta Electronica Sinica, 2016, 44(12): 3044-3052. doi: 10.3969/j.issn.0372-2112.2016.12.033
    [8] Bailer C, Taetz B, Stricker D. Flow Fields: Dense correspondence fields for highly accurate large displacement optical flow estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1879-1892. doi: 10.1109/TPAMI.2018.2859970
    [9] Wolf L, Gadot D. PatchBatch: A batch augmented loss for optical flow. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA: IEEE, 2016.4236−4245.
    [10] Li Y S, Song R, Hu Y L. Efficient coarse-to-fine patch match for large displacement optical flow. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA: IEEE, 2016.5704−5712.
    [11] Menze M, Heipke C, Geiger A. Discrete optimization for optical flow. In: Proceedings of the 2015 German Conference on Pattern Recognition (GCPR), Aachen, Germany: Springer Press, 2015.16−28.
    [12] Chen Q F, Koltun V. Full flow: Optical flow estimation by global optimization over regular grids. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA: IEEE, 2016.4706−4714.
    [13] Guney F, Geiger A. Deep discrete flow. In: Proceedings of the 2016 Asian Conference on Computer Vision (ACCV), Taipei, Taiwan, China: Springer Press, 2016.207−224.
    [14] Hur J, Roth S. Joint optical flow and temporally consistent semantic segmentation. In: Proceedings of the 2016 European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands: Springer, 2016.163−177.
    [15] Ince S, Konrad J. Occlusion-aware optical flow estimation. IEEE Transactions on Image Processing, 2008, 17(8): 1443-1451. doi: 10.1109/TIP.2008.925381
    [16] Sun D Q, Liu C, Pfister H. Local layering for joint motion estimation and occlusion detection. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, Ohio, USA: IEEE, 2014.1098−1105.
    [17] Sun D Q, Sudderth E B, Black M J. Layered image motion with explicit occlusions, temporal consistency, and depth ordering. In: Proceedings of the 24th International Conference on Neural Information Processing Systems (NIPS), Vancouver, Canada: Curran Associates Inc., 2010.2226−2234.
    [18] Vogel C, Roth S, Schindler K. View-consistent 3D scene flow estimation over multiple frames. In: Proceedings of the 2014 European Conference on Computer Vision (ECCV), Zurich, Switzerland: Springer Press, 2014.263−278.
    [19] Zanfir A, Sminchisescu C. Large displacement 3D scene flow with occlusion reasoning. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile: IEEE, 2015.4417−4425.
    [20] Zhang C X, Chen Z, Wang M R, Li M, Jiang S F. Robust non-local TV-L1 optical flow estimation with occlusion detection. IEEE Transactions on Image Processing, 2017, 26(8): 4055-4067. doi: 10.1109/TIP.2017.2712279
    [21] Zhang Cong-Xuan, Chen Zhen, Wang Ming-Run, Li Ming, Jiang Shao-Feng. Motion occlusion detecting from image sequence based on optical flow and Delaunay triangulation. Acta Electronica Sinica, 2018, 46(2): 479-485. doi: 10.3969/j.issn.0372-2112.2018.02.030
    [22] Kennedy R, Taylor C J. Optical flow with geometric occlusion estimation and fusion of multiple frames. In: International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), Hong Kong, China: IEEE, 2015.364−377.
    [23] Yu J J, Harley A W, Derpanis K G. Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness. In: Proceedings of the 2016 European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands: Springer, 2016.3−10.
    [24] Meister S, Hur J, Roth S. UnFlow: Unsupervised learning of optical flow with a bidirectional census loss. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI), San Francisco, California, USA: AAAI, 2017.7251−7259.
    [25] Janai J, Güney F, Ranjan A, Black M, Geiger A. Unsupervised learning of multi-frame optical flow with occlusions. In: Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany: Springer, 2018.713−731.
    [26] Hur J, Roth S. Iterative residual refinement for joint optical flow and occlusion estimation. In: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA: IEEE, 2019.5747−5756.
    [27] Zhao S Y, Sheng Y L, Dong Y, Chang E I C, Xu Y. MaskFlownet: Asymmetric feature matching with learnable occlusion mask. In: Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual: IEEE, 2020.6277−6286.
    [28] Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the 2015 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Boston, Massachusetts, USA: IEEE, 2015.1−9.
    [29] Chen L C, Zhu Y K, Papandreou G, Schroff F, Adam H. Encoder-Decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany: Springer, 2018.833−851.
    [30] Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. [Online], available: https://arxiv.org/abs/1511.07122, Apr 30, 2016.
    [31] Yang M K, Yu K, Zhang C, Li Z W, Yang K Y. DenseASPP for semantic segmentation in street scenes. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, Utah, USA: IEEE, 2018.3684−3692.
    [32] Mehta S, Rastegari M, Caspi A, Shapiro L, Hajishirzi H. ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation. In: Proceedings of the 2018 European Conference on Computer Vision (ECCV), Munich, Germany: Springer, 2018.561−580.
    [33] Butler D J, Wulff J, Stanley G B, Black M J. A naturalistic open source movie for optical flow evaluation. In: Proceedings of the 2012 European Conference on Computer Vision (ECCV), Florence, Italy: Springer, 2012.611−625.
    [34] Menze M, Geiger A. Object scene flow for autonomous vehicles. In: Proceedings of the 2015 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Boston, Massachusetts, USA: IEEE, 2015.2061−3070.
Publication history
  • Received: 2021-04-15
  • Accepted: 2021-07-02
  • Available online: 2021-08-31
