Abstract: Deep-learning-based analysis of 3D point clouds has attracted increasing attention. However, because point clouds are irregular, efficiently extracting local structure information from them remains a major challenge. In this paper, we propose the Octant Convolutional Neural Network (Octant-CNN), a network that operates on local spatial neighborhoods and consists of an octant convolution module and a sub-sampling module. For an input point cloud, the octant convolution module locates, for each point, the nearest neighbor in each of the eight octants of its neighborhood. It then abstracts the geometric features of the eight octants into semantic features through multi-layer convolution and fuses the low-level geometric features with the high-level semantic features, so that convolution can efficiently extract local structure information in the 3D neighborhood. The sub-sampling module groups the original point set and aggregates features, which enlarges the receptive field of the features and reduces the computational cost of the network. By stacking octant convolution modules and sub-sampling modules hierarchically, Octant-CNN builds a feature representation of the 3D point cloud from low-level to abstract and from local to global. Experiments show that Octant-CNN performs well on four 3D scene understanding tasks: object classification, part segmentation, semantic segmentation, and object detection.
Key words: deep learning, point cloud, Octant-CNN, local geometric feature
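The octant search described in the abstract (for each point, find the nearest neighbor in each of the eight octants around it) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `octant_index` and `octant_neighbors` are hypothetical, a zero offset is assumed to count as positive, and an empty octant is assumed to yield `None`, since the abstract does not say how those cases are handled.

```python
import math

def octant_index(center, point):
    """Map a neighbor to an octant 0..7 from the signs of its offset to the center.

    Assumption: an offset of exactly zero is treated as positive.
    """
    dx, dy, dz = (p - c for p, c in zip(point, center))
    return (dx >= 0) * 4 + (dy >= 0) * 2 + (dz >= 0) * 1

def octant_neighbors(points, i):
    """For point i, return the index of its nearest neighbor in each octant.

    Assumption: an octant that contains no point is reported as None.
    """
    center = points[i]
    best = [None] * 8
    best_dist = [math.inf] * 8
    for j, p in enumerate(points):
        if j == i:
            continue  # a point is not its own neighbor
        o = octant_index(center, p)
        d = math.dist(center, p)
        if d < best_dist[o]:
            best_dist[o] = d
            best[o] = j
    return best

# Toy point cloud: point 0 is the center being queried.
points = [
    (0.0, 0.0, 0.0),
    (1.0, 1.0, 1.0),     # octant 7, nearest in that octant
    (2.0, 2.0, 2.0),     # octant 7, farther away, so ignored
    (-1.0, -1.0, -1.0),  # octant 0
    (1.0, -1.0, 1.0),    # octant 5
    (-0.5, 0.5, 0.5),    # octant 3
]
neighbors = octant_neighbors(points, 0)
print(neighbors)  # [3, None, None, 5, None, 4, None, 1]
```

Unlike a plain nearest-neighbor query, this selection keeps at most one point per direction, so a dense cluster on one side of the center cannot fill all neighbor slots.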
Table 1 Classification results on ModelNet40
Table 2 Part segmentation results on ShapeNet

| Method | mIoU | aero | bag | cap | car | chair | earphone | guitar | knife | lamp | laptop | motor | mug | pistol | rocket | skateboard | table |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PointNet[12] | 83.7 | 83.4 | 78.7 | 82.5 | 74.9 | 89.6 | 73.0 | 91.5 | 85.9 | 80.8 | 95.3 | 65.2 | 93.0 | 81.2 | 57.9 | 72.8 | 80.6 |
| PointNet++[13] | 85.1 | 82.4 | 79.0 | 87.7 | 77.3 | 90.8 | 71.8 | 91.0 | 85.9 | 83.7 | 95.3 | 71.6 | 94.1 | 81.3 | 58.7 | 76.4 | 82.6 |
| PointSIFT[14] | 79.0 | 75.1 | 78.4 | 81.8 | 74.5 | 85.2 | 64.3 | 89.6 | 81.9 | 77.5 | 95.1 | 64.0 | 93.5 | 77.1 | 54.2 | 70.6 | 74.3 |
| RGCNN[19] | 84.3 | 80.2 | 82.8 | 92.6 | 75.3 | 89.2 | 73.7 | 91.3 | 88.4 | 83.3 | 96.0 | 63.9 | 95.7 | 60.9 | 44.6 | 72.9 | 80.4 |
| DGCNN[20] | 85.1 | 84.2 | 83.7 | 84.4 | 77.1 | 90.9 | 78.5 | 91.5 | 87.3 | 82.9 | 96.0 | 67.8 | 93.3 | 82.6 | 59.7 | 75.5 | 82.0 |
| SCN[23] | 84.6 | 83.8 | 80.8 | 83.5 | 79.3 | 90.5 | 69.8 | 91.7 | 86.5 | 82.9 | 96.0 | 69.2 | 93.8 | 82.5 | 62.9 | 74.4 | 80.8 |
| Kd-Net[26] | 82.3 | 80.1 | 74.6 | 74.3 | 70.3 | 88.6 | 73.5 | 90.2 | 87.2 | 81.0 | 94.9 | 57.4 | 86.7 | 78.1 | 51.8 | 69.9 | 80.3 |
| SO-Net[27] | 84.6 | 81.9 | 83.5 | 84.8 | 78.1 | 90.8 | 72.2 | 90.1 | 83.6 | 82.3 | 95.2 | 69.3 | 94.2 | 80.0 | 51.6 | 72.1 | 82.6 |
| RS-Net[29] | 84.9 | 82.7 | 86.4 | 84.1 | 78.2 | 90.4 | 69.3 | 91.4 | 87.0 | 83.5 | 95.4 | 66.0 | 92.6 | 81.8 | 56.1 | 75.8 | 82.2 |
| Octant-CNN | 85.3 | 83.9 | 83.6 | 88.3 | 79.2 | 91.1 | 70.8 | 91.8 | 87.5 | 82.9 | 95.7 | 72.2 | 94.5 | 83.6 | 60.0 | 75.5 | 81.9 |
Table 3 Semantic segmentation results on S3DIS

| Method | mIoU | OA | ceiling | floor | wall | beam | column | windows | door | chair | table | bookcase | sofa | board | clutter |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PointNet[12] | 47.7 | 78.6 | 88.0 | 88.7 | 69.3 | 42.4 | 23.1 | 47.5 | 51.6 | 42.0 | 54.1 | 38.2 | 9.6 | 29.4 | 35.2 |
| PointNet++[13] | 57.3 | 83.8 | 91.5 | 92.8 | 74.6 | 41.3 | 28.1 | 54.5 | 59.6 | 64.6 | 58.9 | 27.1 | 52.0 | 52.3 | 48.0 |
| PointSIFT[14] | 55.5 | 83.5 | 91.1 | 91.3 | 75.5 | 42.0 | 24.0 | 51.4 | 56.6 | 60.2 | 55.8 | 17.0 | 50.2 | 57.1 | 49.9 |
| RS-Net[29] | 56.5 | - | 92.5 | 92.8 | 78.6 | 32.8 | 34.4 | 51.6 | 68.1 | 59.7 | 60.1 | 16.4 | 50.2 | 44.9 | 52.0 |
| Octant-CNN | 58.3 | 84.6 | 92.1 | 94.5 | 76.3 | 48.9 | 30.8 | 56.9 | 62.9 | 65.8 | 55.5 | 28.0 | 48.1 | 50.3 | 48.4 |
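As a quick consistency check on Table 3, the sketch below recomputes the Octant-CNN mIoU from its 13 per-class scores. It assumes mIoU is the unweighted mean of the per-class IoU values, which is the usual definition but is not spelled out in this excerpt; the helper name `mean_iou` is hypothetical.

```python
# Per-class IoU (%) from the Octant-CNN row of Table 3, in column order.
octant_cnn_iou = {
    "ceiling": 92.1, "floor": 94.5, "wall": 76.3, "beam": 48.9,
    "column": 30.8, "windows": 56.9, "door": 62.9, "chair": 65.8,
    "table": 55.5, "bookcase": 28.0, "sofa": 48.1, "board": 50.3,
    "clutter": 48.4,
}

def mean_iou(per_class_iou):
    """Unweighted mean over classes (assumed definition of mIoU)."""
    return sum(per_class_iou.values()) / len(per_class_iou)

miou = mean_iou(octant_cnn_iou)
print(round(miou, 1))  # 58.3, matching the mIoU column for Octant-CNN
```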
Table 4 Performance comparison in 3D object detection

| Method | Cars Easy | Cars Moderate | Cars Hard | Pedestrians Easy | Pedestrians Moderate | Pedestrians Hard | Cyclists Easy | Cyclists Moderate | Cyclists Hard |
|---|---|---|---|---|---|---|---|---|---|
| F-PointNet v1[32] | 83.75 | 69.37 | 62.83 | 65.39 | 55.32 | 48.62 | 70.17 | 52.87 | 48.27 |
| F-PointNet v2[32] | 83.93 | 71.23 | 63.72 | 64.23 | 56.95 | 50.15 | 74.04 | 54.92 | 50.53 |
| Frustum PointSIFT[14] | 71.56 | 66.17 | 58.97 | 63.13 | 55.08 | 49.05 | 70.36 | 52.56 | 48.53 |
| Frustum Geo-CNN[33] | 85.09 | 71.02 | 63.38 | 69.64 | 60.50 | 52.88 | 75.64 | 56.25 | 52.54 |
| Frustum Octant-CNN | 85.10 | 72.31 | 64.46 | 67.90 | 59.73 | 52.44 | 76.56 | 57.50 | 54.26 |
Table 5 Analysis of the structure design

| Model | Multi-layer fusion | Residual | Voting | oAcc (%) |
|---|---|---|---|---|
| A | | | | 90.7 |
| B | $\checkmark$ | | | 91.2 |
| C | $\checkmark$ | $\checkmark$ | | 91.5 |
| D | $\checkmark$ | $\checkmark$ | $\checkmark$ | 91.9 |
Table 6 Comparison of 2D CNN and MLP

| Model | Operation | oAcc (%) |
|---|---|---|
| A | MLP | 90.8 |
| B | 2D CNN | 91.9 |
Table 7 Results for different neighbor points

| Model | Neighbor points | Accuracy (%) |
|---|---|---|
| A | K-nearest neighbors | 90.2 |
| B | Eight-octant search | 91.9 |
Table 8 Comparison of different search radii

| Model | Search radius | oAcc (%) |
|---|---|---|
| A | (0.25, 0.5, 1.0) | 88.0 |
| B | (0.4, 0.8, 1.0) | 89.2 |
| C | (0.5, 1.0, 1.0) | 89.9 |
| D | None | 91.9 |
Table 9 Results for different input channels

| Model | Input channels | oAcc (%) |
|---|---|---|
| A | ($f_{ij}$) | 90.1 |
| B | ($x_i-x_{ij}, f_{ij}$) | 90.3 |
| C | ($x_i, f_{ij}$) | 90.8 |
| D | ($x_i, x_i-x_{ij}, f_{ij}$) | 91.9 |
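To make the input-channel variants in Table 9 concrete, the sketch below assembles the full variant D, ($x_i, x_i-x_{ij}, f_{ij}$), for one center/neighbor pair. The symbols are not defined in this excerpt, so their reading here is an assumption: $x_i$ is taken as the center point's coordinates, $x_{ij}$ as the coordinates of its $j$-th neighbor, and $f_{ij}$ as that neighbor's feature vector. Combining them by plain concatenation is also assumed, and `build_input_channels` is a hypothetical helper name.

```python
def build_input_channels(x_i, x_ij, f_ij):
    """Concatenate (x_i, x_i - x_ij, f_ij) into one channel vector.

    Assumptions: x_i is the center coordinate, x_ij the neighbor coordinate,
    f_ij the neighbor feature; concatenation is the combination rule.
    """
    offset = [a - b for a, b in zip(x_i, x_ij)]  # relative position x_i - x_ij
    return list(x_i) + offset + list(f_ij)

x_i = (1.0, 2.0, 3.0)    # center point
x_ij = (0.5, 2.5, 3.0)   # one neighbor of the center
f_ij = (0.1, 0.2)        # toy 2-dimensional neighbor feature
channels = build_input_channels(x_i, x_ij, f_ij)
print(channels)  # [1.0, 2.0, 3.0, 0.5, -0.5, 0.0, 0.1, 0.2]
```

Dropping the first three entries gives variant B, dropping the middle three gives variant C, and keeping only the last two gives variant A.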
Table 10 Comparison of robustness to point cloud rotation

| Method / rotation angle | $0^\circ$ | $30^\circ$ | $60^\circ$ | $90^\circ$ | $180^\circ$ | Mean | Variance |
|---|---|---|---|---|---|---|---|
| PointSIFT[14] | 88.2 | 89.2 | 88.9 | 88.7 | 88.5 | 88.7 | 0.124 |
| PointSIFT+T-Net | 89.1 | 89.4 | 89.4 | 88.6 | 88.6 | 89.04 | 0.114 |
| Octant-CNN | 91.5 | 91.7 | 91.9 | 91.5 | 91.8 | 91.68 | 0.025 |
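The Mean and Variance columns of Table 10 summarize how stable accuracy is across rotation angles. The sketch below recomputes both for the Octant-CNN row, assuming the variance is the population variance (division by the number of angles); the excerpt does not state which estimator was used.

```python
from statistics import mean, pvariance

# Octant-CNN accuracy (%) at 0, 30, 60, 90 and 180 degrees, from Table 10.
octant_cnn_acc = [91.5, 91.7, 91.9, 91.5, 91.8]

avg = mean(octant_cnn_acc)
var = pvariance(octant_cnn_acc)  # population variance (assumed estimator)
print(round(avg, 2), round(var, 4))
```

Under that assumption the mean comes out at 91.68 and the variance at about 0.0256, in line with the 91.68 and 0.025 reported for Octant-CNN.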
[1] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems. Nevada, USA, 2012. 1097-1105
[2] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. 770-778
[3] Girshick R. Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 1440-1448
[4] Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. 779-788
[5] Zhu Z, Xu M, Bai S, Huang T, Bai X. Asymmetric non-local neural networks for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision. Seoul, South Korea: IEEE, 2019. 593-602
[6] Li Y, Qi H, Dai J, Ji X, Wei Y. Fully convolutional instance-aware semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017. 2359-2367
[7] Peng Xiu-Ping, Tong Qi-Sheng, Lin Hong-Bin, Feng Chao, Zheng Wu. A deep residual-feature pyramid network for scattered point cloud semantic segmentation. Acta Automatica Sinica, 2019, 45(x): 1-10
[8] Maturana D, Scherer S. VoxNet: a 3D convolutional neural network for real-time object recognition. In: Proceedings of 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany: IEEE, 2015. 922-928
[9] Wu Z, Song S, Khosla A, et al. 3D ShapeNets: a deep representation for volumetric shapes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015. 1912-1920
[10] Su H, Maji S, Kalogerakis E, Learned-Miller E. Multi-view convolutional neural networks for 3D shape recognition. In: Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 945-953
[11] Yang Z, Wang L. Learning relationships for multi-view 3D object recognition. In: Proceedings of the IEEE International Conference on Computer Vision. Seoul, South Korea: IEEE, 2019. 7505-7514
[12] Qi C R, Su H, Mo K, Guibas L J. PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017. 652-660
[13] Qi C R, Yi L, Su H, Guibas L J. PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Advances in Neural Information Processing Systems. Long Beach, USA, 2017. 5099-5108
[14] Jiang M, Wu Y, Zhao T, Zhao Z, Lu C. PointSIFT: a SIFT-like network module for 3D point cloud semantic segmentation [Online], available: https://arxiv.org/abs/1807.00652, July 22, 2020
[15] Rao Y, Lu J, Zhou J. Spherical fractal convolutional neural networks for point cloud recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA, 2019. 452-460
[16] Liu Y, Fan B, Xiang S, Pan C. Relation-shape convolutional neural network for point cloud analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA, 2019. 8895-8904
[17] Boulch A. ConvPoint: continuous convolutions for point cloud processing. Computers & Graphics, 2020, 88: 24-34
[18] Simonovsky M, Komodakis N. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017. 3693-3702
[19] Te G, Hu W, Zheng A, Guo Z. RGCNN: regularized graph CNN for point cloud segmentation. In: Proceedings of the 26th ACM International Conference on Multimedia. Seoul, South Korea: ACM, 2018. 746-754
[20] Wang Y, Sun Y, Liu Z, Sarma S E, Bronstein M M, Solomon J M. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics (TOG), 2019, 38(5): 1-12
[21] Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 2014, 15(1): 1929-1958
[22] Yang J, Zhang Q, Ni B, et al. Modeling point clouds with self-attention and Gumbel subset sampling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA, 2019. 3323-3332
[23] Xie S, Liu S, Chen Z, Tu Z. Attentional ShapeContextNet for point cloud recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 4606-4615
[24] Duan Y, Zheng Y, Lu J, Zhou J, Tian Q. Structural relational reasoning of point clouds. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019. 949-958
[25] Lin H, Xiao Z, Tan Y, Chao H, Ding S. Justlookup: one millisecond deep feature extraction for point clouds by lookup tables. In: Proceedings of 2019 IEEE International Conference on Multimedia and Expo. Shanghai, China: IEEE, 2019. 326-331
Wang P, Liu Y, Guo Y, Sun C, Tong X. O-CNN: octree-based convolutional neural networks for 3D shape analysis. ACM Transactions on Graphics (TOG), 2017, 36(4): 1-11
[26] Klokov R, Lempitsky V. Escape from cells: deep Kd-networks for the recognition of 3D point cloud models. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 863-872
[27] Li J, Chen B M, Lee G H. SO-Net: self-organizing network for point cloud analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 9397-9406
[28] Yi L, Kim V G, Ceylan D, et al. A scalable active framework for region annotation in 3D shape collections. ACM Transactions on Graphics (TOG), 2016, 35(6): 1-12
[29] Huang Q, Wang W, Neumann U. Recurrent slice networks for 3D segmentation of point clouds. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 2626-2635
[30] Armeni I, Sener O, Zamir A R, et al. 3D semantic parsing of large-scale indoor spaces. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 1534-1543
[31] Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Rhode Island, USA: IEEE, 2012. 3354-3361
[32] Qi C R, Liu W, Wu C, Su H, Guibas L J. Frustum PointNets for 3D object detection from RGB-D data. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 918-927
[33] Lan S, Yu R, Yu G, Davis L S. Modeling local geometric structure of 3D point clouds using Geo-CNN. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 998-1008
