2.624

2020影响因子

(CJCR)

  • 中文核心
  • EI
  • 中国科技核心
  • Scopus
  • CSCD
  • 英国科学文摘

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

基于采样汇集网络的场景深度估计

谢昭 马海龙 吴克伟 高扬 孙永宣

谢昭, 马海龙, 吴克伟, 高扬, 孙永宣. 基于采样汇集网络的场景深度估计. 自动化学报, 2020, 46(3): 600−612 doi: 10.16383/j.aas.c180430
引用本文: 谢昭, 马海龙, 吴克伟, 高扬, 孙永宣. 基于采样汇集网络的场景深度估计. 自动化学报, 2020, 46(3): 600−612 doi: 10.16383/j.aas.c180430
Xie Zhao, Ma Hai-Long, Wu Ke-Wei, Gao Yang, Sun Yong-Xuan. Sampling aggregate network for scene depth estimation. Acta Automatica Sinica, 2020, 46(3): 600−612 doi: 10.16383/j.aas.c180430
Citation: Xie Zhao, Ma Hai-Long, Wu Ke-Wei, Gao Yang, Sun Yong-Xuan. Sampling aggregate network for scene depth estimation. Acta Automatica Sinica, 2020, 46(3): 600−612 doi: 10.16383/j.aas.c180430

基于采样汇集网络的场景深度估计

doi: 10.16383/j.aas.c180430
基金项目: 国家自然科学基金(61503111,61273237)资助
详细信息
    作者简介:

    谢昭:合肥工业大学计算机与信息学院副研究员. 2007年于合肥工业大学获得博士学位. 主要研究方向为计算机视觉, 图像处理, 模式识别. E-mail: xiezhao@hfut.edu.cn

    马海龙:合肥工业大学硕士研究生. 主要研究方向为计算机视觉, 图像处理, 模式识别. E-mail: mhl_hfut@163.com

    吴克伟:合肥工业大学计算机与信息学院副研究员. 2013年于合肥工业大学获得博士学位. 主要研究方向为计算机视觉, 图像处理, 模式识别. 本文通信作者. E-mail: wukewei@hfut.edu.cn

    高扬合肥工业大学硕士研究生. 主要研究方向为计算机视觉, 图像处理, 模式识别. E-mail: alto1996@163.com

    孙永宣:合肥工业大学计算机与信息学院讲师. 2013年于合肥工业大学获得博士学位. 主要研究方向为计算机视觉, 图像处理, 模式识别. E-mail: syx@hfut.edu.cn

Sampling Aggregate Network for Scene Depth Estimation

Funds: Supported by National Natural Science Foundation of China (61503111, 61273237)
  • 摘要: 针对现有场景深度估计方法中, 由于下采样操作引起的复杂物体边界定位不准确, 而造成物体边界处的场景深度估计模糊的问题, 受密集网络中特征汇集过程的启发, 本文提出一种针对上/下采样过程的汇集网络模型. 在下采样过程中, 使用尺度特征汇集策略, 兼顾不同尺寸物体的估计; 在上采样过程中, 使用上采样反卷积恢复图像分辨率; 同时, 引入采样跨层汇集策略, 提供下采样过程中保存的物体边界的有效定位信息. 本文提出的采样汇集网络 (Sampling aggregate network, SAN) 中使用的尺度特征汇集和采样跨层汇集, 都可以有效缩短特征图到输出损失之间的路径, 从而有利于避免模型的参数优化时陷入局部最优解. 在公认场景深度估计NYU-Depth-v2数据集上的实验说明, 本文方法能够有效改善复杂物体边界等干扰情况下的场景深度估计效果, 并在深度估计误差和准确性上, 优于当前场景深度估计的主流方法.
  • 图  1  基于采样汇集网络的场景深度估计

    Fig.  1  Sampling aggregate network for scene depth estimation

    图  2  尺度特征汇集的下采样网络

    Fig.  2  Downsampling network with scale feature aggregation

    图  3  上采样反卷积过程

    Fig.  3  Upsampling network with deconvolution process

    图  4  采样汇集跨层的上采样网络

    Fig.  4  Upsampling network with sample skip aggregation

    图  5  采样汇集网络的消融模型对比实例图. (a)原始图像; (b) 真实场景深度; (c) SFAN-129结果; (d) NSN-129结果; (e) SAN-129结果

    Fig.  5  Contrasting examples of ablation models for sampling aggregate network. (a) RGB image; (b) GT depth; (c) result of SFAN-129; (d) result of NSN-129; (e) result of SAN-129

    图  6  采样汇集网络中下采样次数定性结果,图6的原图和真实场景深度与图5中对应. (a) SAN-31结果; (b) SAN-47结果; (c) SAN-69结果; (d) SAN-95结果; (e) SAN-129结果

    Fig.  6  Qualitative results of downsampling times in sampling aggregate network. Fig. 6 and Fig. 5 have the same RGB images and GT depth images. (a) result of SAN-31; (b) result of SAN-47; (c) result of SAN-69; (d) result of SAN-95; (e) result of SAN-129

    图  7  场景估计中的困难实例,第1行小物体干扰,第2行复杂边界干扰,第3行光照干扰,第4行深度范围大干扰,第5行深度范围小的干扰. (a) 原始图像; (b)真实场景深度; (c) SAN-95结果; (d) SFAN-129结果; (e) SAN-129结果

    Fig.  7  Challenge examples in depth estimation, including small object interference (Line 1), complex boundary interference (line 2), illumination interference (line 3), large depth range interference (line 4), small depth range interference (line 5). (a) RGB image; (b) GT depth; (c) result of SAN-95; (d) result of SFAN-129; (e) result of SAN-129

    表  1  采样汇集网络的消融分析

    Table  1  Ablation analysis of sampling aggregate network

    消融模型 Error Accuracy (%)
    REL ${\rm{log}}_{10}$ RMS $\delta_1$ $\delta_2$ $\delta_3$
    SFAN-129 0.165 0.072 0.586 75.70 93.70 98.10
    NSN-129 0.163 0.070 0.583 76.00 94.10 98.20
    SAN-129 0.158 0.067 0.567 77.60 95.20 98.80
    下载: 导出CSV

    表  2  采样汇集网络中下采样次数定量分析

    Table  2  Quantitative analysis of downsampling times in sampling aggregate network

    采样汇集网络模型 Error Accuracy (%)
    REL ${\rm{log}}_{10}$ RMS $\delta_1$ $\delta_2$ $\delta_3$
    SAN-31 0.311 0.129 1.012 46.20 77.10 91.80
    SAN-47 0.250 0.107 0.830 55.70 84.90 95.50
    SAN-69 0.194 0.083 0.672 68.00 90.70 97.70
    SAN-95 0.169 0.073 0.608 73.60 93.10 98.30
    SAN-129 0.158 0.067 0.567 77.60 95.20 98.80
    下载: 导出CSV

    表  3  采样汇集网络中输入图像分辨率定量分析

    Table  3  Quantitative analysis of image resolution in sampling aggregate network

    采样汇集网络模型 图像分辨率 Error Accuracy (%)
    REL ${\rm{log}}_{10}$ RMS $\delta_1$ $\delta_2$ $\delta_3$
    SAN-129 304 × 228 0.158 0.067 0.567 77.60 95.20 98.80
    SAN-129 152 × 114 0.149 0.064 0.562 79.95 95.23 98.80
    下载: 导出CSV

    表  4  本文采样汇集网络与现有方法定量对比

    Table  4  Quantitative analysis of our sampling aggregate network with state-of-the-art methods

    对比方法 Error Accuracy (%)
    REL ${\rm{log}}_{10}$ RMS $\delta_1$ $\delta_2$ $\delta_3$
    Su等[12] 0.302 0.128 0.937
    Laina等[23] 0.215 0.083 0.790 62.90 88.90 97.10
    Liu等[9] 0.213 0.087 0.759 65.00 90.60 97.60
    Wang等[8] 0.210 0.094 0.745 60.50 89.00 97.00
    Roy等[6] 0.187 0.078 0.744
    Cao等[10] 0.187 0.071 0.681 71.20 92.30 98.00
    Fu等[7] 0.160 0.586 76.50 95.00 99.10
    Sharma-RMSE[26] 0.159 0.064 0.549 79.10 94.60 98.40
    SAN-129 @ 304 × 228 0.158 0.067 0.567 77.60 95.20 98.80
    Sharma-berHu[26] 0.153 0.062 0.549 79.90 95.00 98.50
    SAN-129 @ 152 × 114 0.149 0.064 0.562 79.95 95.23 98.80
    下载: 导出CSV
  • [1] Jégou S, Drozdzal M, Vazquez D, Romero A, Bengio Y. The one hundred layers tiramisu: fully convolutional densenets for semantic segmentation. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA: IEEE, 2017. 1175−1183
    [2] 2 Cheng Y H, Zhao X, Huang K Q, Tan T N. Semi-supervised learning and feature evaluation for rgb-d object recognition. Computer Vision and Image Understanding, 2015, 139: 149−160 doi: 10.1016/j.cviu.2015.05.007
    [3] Borghi G, Venturelli M, Vezzani R, Cucchiara R. POSEidon: face-from-depth for driver pose estimation. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA: IEEE, 2017. 5494−5503
    [4] 4 Saxena A, Sun M, Ng A Y. Make3d: learning 3d scene structure from a single still image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(5): 824−840 doi: 10.1109/TPAMI.2008.132
    [5] Tateno K, Tombari F, Laina I, Navab N. CNN-SLAM: real-time dense monocular SLAM with learned depth prediction. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA: IEEE, 2017. 6565−6574
    [6] Roy A, Todorovic S. Monocular depth estimation using neural regression forest. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA: IEEE, 2016. 5506−5514
    [7] Fu H, Gong M M, Wang C H, Tao D. A compromise principle in deep monocular depth estimation. arXiv preprint arXiv: 1708.08267, 2017. 1−11
    [8] Wang P, Shen X H, Lin Z, Cohen S, Price B, Yuille A. Towards unified depth and semantic prediction from a single image. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA: IEEE, 2015. 2800−2809
    [9] 9 Liu F Y, Shen C H, Lin G S, Reid I. Learning depth from single monocular images using deep convolutional neural fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(10): 2024−2039 doi: 10.1109/TPAMI.2015.2505283
    [10] 10 Cao Y, Wu Z, Shen C H. Estimating depth from monocular images as classification using deep fully convolutional residual networks. IEEE Transactions on Circuits and Systems for Video Technology, 2017, 28(11): 1−11
    [11] Huang G, Liu Z, Maaten L V D. Weinberger K Q. Densely connected convolutional networks. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA: IEEE, 2017. 2261−2269
    [12] 12 Su C C, Cormack L K, Bovik A C. Bayesian depth estimation from monocular natural images. Journal of Vision, 2017, 17(5): 22−22 doi: 10.1167/17.5.22
    [13] Liu B Y, Gould S, Koller D. Single image depth estimation from predicted semantic labels. In: Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA: IEEE, 2010. 1253−1260
    [14] 14 Karsch K, Liu C, Kang S B. Depth transfer: depth extraction from videos using nonparametric sampling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(11): 2144−2158 doi: 10.1109/TPAMI.2014.2316835
    [15] Batra D and Saxena A. Learning the right model: efficient max-margin learning in laplacian CRFS. In: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA: IEEE, 2012. 2136−2143
    [16] Saxena A, Chung S H, Ng A Y. Learning depth from single monocular images. In: Proceedings of the 2005 Advances in Neural Information Processing Systems, Vancouver, British Columbia, Canada: MIT Press, 2005. 1161−1168
    [17] Eigen D, Puhrsch C, Fergus R. Depth map prediction from a single image using a multi-scale deep network. In: Proceedings of the 2014 Advances in Neural Information Processing Systems, Montreal, Quebec, Canada: MIT Press, 2014. 2366−2374
    [18] Liu M M, Salzmann M, He X M. Discrete-continuous depth estimation from a single image. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA: IEEE, 2014. 716−723
    [19] Liu F Y, Shen C H, Lin G S. Deep convolutional neural fields for depth estimation from a single image. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA: IEEE, 2015. 5162−5170
    [20] Xu D, Ricci E, Ouyang W L, Wang X G, Sebe N. Multi-scale continuous crfs as sequential deep networks for monocular depth estimation. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA: IEEE, 2017. 161−169
    [21] Zhuo W, Salzmann M, He X M, Liu M M. Indoor scene structure analysis for single image depth estimation. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA: IEEE, 2015. 614−622
    [22] 22 Yan H, Zhang S L, Zhang Y, Zhang L. Monocular depth estimation with guidance of surface normal map. Neurocomputing, 2017, 280: 86−100
    [23] Laina I, Rupprecht C, Belagiannis V, Tombari F, Navab N. Deeper depth prediction with fully convolutional residual networks. In: Proceedings of the 4th International Conference on 3D Vision, Stanford, CA, USA: IEEE, 2016. 239−248
    [24] Godard C, Aodha O, Brostow G J. Unsupervised monocular depth estimation with left-right consistency. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA: IEEE, 2017. 6602−6611.
    [25] 25 Grigorev A, Jiang F, Rho S, Sori W J, Liu S H, Sai S. Depth estimation from single monocular images using deep hybrid network. Multimedia Tools and Applications, 2017, 76(18): 18585−18604 doi: 10.1007/s11042-016-4200-x
    [26] Sharma S, Padhy R P, Choudhury S K, Goswami N, Sa P K. DenseNet with pre-activated deconvolution for estimating depth map from single image. In: Proceeding of the 2017 British Machine Vision Conference, London, UK: BMVA, 2017. 1−12
    [27] Zhu Y, Newsam S. Densenet for dense flow. In: Proceedings of the 2017 IEEE International Conference on Image Processing, Beijing, China: IEEE, 2017. 790−794
    [28] Collobert R, Kavukcuoglu K, Farabet C. Torch7: a matlab-like environment for machine learning. In: Proceeding of the 2011 Advances in Neural Information Processing Systems, Granada, Spain: Springer, 2011. 1−6
    [29] He K M, Zhang X Y, Ren S Q, Sun J. Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile: IEEE, 2015. 1026−1034
    [30] Silberman N, Hoiem D, Kohli P, Fergus R. Indoor segmentation and support inference from rgbd images. In: Proceedings of the 2012 European Conference on Computer Vision, Florence, Italy: IEEE, 2012. 746−760
  • 加载中
图(7) / 表(4)
计量
  • 文章访问数:  1666
  • HTML全文浏览量:  704
  • PDF下载量:  147
  • 被引次数: 0
出版历程
  • 收稿日期:  2018-06-15
  • 录用日期:  2019-02-13
  • 网络出版日期:  2020-03-30
  • 刊出日期:  2020-03-30

目录

    /

    返回文章
    返回