Octant Convolutional Neural Network for 3D Point Cloud Analysis

XU Xiang; SHUAI Hui; LIU Qing-Shan

doi:10.16383/j.aas.c200080

CSTR: 32138.14.j.aas.c200080

XU Xiang^1, SHUAI Hui^1, LIU Qing-Shan^{1, 2}

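1. Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing 210044
2. School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Information Science and Technology, Nanjing 210044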

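Funds: Supported by National Natural Science Foundation of China (61825601, 61532009) and Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX21_0995)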

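Author Bio:

XU Xiang: Master's student at the School of Automation, Nanjing University of Information Science and Technology. He received his bachelor's degree from the School of Information and Control, Nanjing University of Information Science and Technology in 2018. His main research interest is 3D point cloud scene perception. E-mail: xuxiang0103@gmail.com

SHUAI Hui: Ph.D. candidate at Nanjing University of Information Science and Technology. He received his master's degree from the School of Information and Control, Nanjing University of Information Science and Technology in 2018. His research interests include object detection and 3D point cloud scene perception. E-mail: huishuai13@163.com

LIU Qing-Shan: Professor and dean of the School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Information Science and Technology. He received his Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences in 2003. His research interests include image understanding, pattern recognition, and machine learning. Corresponding author of this paper. E-mail: qsliu@nuist.edu.cn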

Publication history
- Received: 2020-02-25
- Accepted: 2020-07-21
- Published online: 2021-10-15
- Issue date: 2021-12-23

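Abstract

Deep-learning-based analysis of 3D point cloud data has attracted increasing attention. However, the irregularity of point clouds makes efficient extraction of local structure information a major open challenge. This paper proposes an octant convolutional neural network (Octant-CNN) that operates on local spatial neighborhoods and consists of octant convolution modules and sub-sampling modules. For an input point cloud, the octant convolution module locates, within the neighborhood of each point, the nearest neighbor in each of the eight octants. It then abstracts the geometric features of these eight octant neighbors into semantic features through multiple convolution layers and fuses the low-level geometric features with the high-level semantic features, so that convolution can efficiently extract local structure information in 3D neighborhoods. The sub-sampling module groups the original point set and aggregates features within each group, which enlarges the receptive field of the features and reduces the computational complexity of the network. By hierarchically combining octant convolution modules and sub-sampling modules, Octant-CNN builds representations of 3D point clouds that progress from low-level to abstract and from local to global. Experimental results show that Octant-CNN achieves competitive performance on four tasks: object classification, part segmentation, semantic segmentation, and object detection.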
- Deep learning /
- point cloud /
- octant convolutional neural network (Octant-CNN) /
- local geometric feature
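The octant search described in the abstract (one nearest neighbor per octant around each point) can be sketched as follows. This is a minimal brute-force illustration, not the authors' implementation. Several details are assumptions:

- the octant numbering (one bit per axis);
- the rule that a zero offset counts as the non-negative side;
- filling an empty octant with the center point itself.

The hypothetical `radius` argument mirrors the search-radius setting of Table 8. We assume that table's "None" row means no radius limit.

```python
import math


def octant_index(dx, dy, dz):
    """Map an offset to an octant id in 0..7 (assumed numbering: one bit per axis).

    A zero offset is counted on the non-negative side (an assumed tie rule).
    """
    return int(dx >= 0) * 4 + int(dy >= 0) * 2 + int(dz >= 0)


def octant_neighbors(points, i, radius=None):
    """Hypothetical helper: index of the nearest point in each octant around points[i].

    Empty octants fall back to i itself (assumed padding rule).
    radius=None searches all points; otherwise points farther than radius are ignored.
    """
    cx, cy, cz = points[i]
    best = [i] * 8
    best_dist = [math.inf] * 8
    for j, (x, y, z) in enumerate(points):
        if j == i:
            continue
        dx, dy, dz = x - cx, y - cy, z - cz
        d = math.sqrt(dx * dx + dy * dy + dz * dz)
        if radius is not None and d > radius:
            continue
        o = octant_index(dx, dy, dz)
        if d < best_dist[o]:
            best_dist[o] = d
            best[o] = j
    return best


pts = [(0, 0, 0), (1, 1, 1), (2, 2, 2), (-1, -1, -1), (1, -1, 1)]
print(octant_neighbors(pts, 0))  # [3, 0, 0, 0, 0, 4, 0, 1]
```

The search is O(N) per point. A practical version would use a spatial index or a batched GPU kernel, but the per-octant selection logic is the same.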

Fig. 1 Illustration of network architecture
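
Fig. 1 presents Octant-CNN as a stack of octant convolution and sub-sampling modules. According to the abstract, the sub-sampling module groups the point set and aggregates features to enlarge the receptive field and reduce computation. The sampling and grouping rules are not specified in this section. The sketch below assumes a PointNet++-style recipe: farthest point sampling to pick centers, k-nearest-neighbor grouping, and channel-wise max pooling. The function names are hypothetical.

```python
import math


def farthest_point_sampling(points, m):
    """Greedily pick m well-spread indices, starting from index 0 (assumed seed)."""
    m = min(m, len(points))
    if m == 0:
        return []
    chosen = [0]
    # Distance from every point to its closest already-chosen center.
    dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen


def subsample(points, feats, m, k):
    """Assumed sub-sampling step: k-NN groups around sampled centers, max-pooled per channel."""
    centers = farthest_point_sampling(points, m)
    pooled_feats = []
    for c in centers:
        group = sorted(range(len(points)), key=lambda j: math.dist(points[j], points[c]))[:k]
        pooled_feats.append([max(feats[j][ch] for j in group) for ch in range(len(feats[0]))])
    return [points[c] for c in centers], pooled_feats


pts = [(0, 0, 0), (0.1, 0, 0), (5, 0, 0), (5.1, 0, 0)]
feats = [[1, 0], [2, 5], [3, 1], [0, 4]]
print(subsample(pts, feats, m=2, k=2))  # ([(0, 0, 0), (5.1, 0, 0)], [[2, 5], [3, 4]])
```

Each output point summarizes k input points, which is how stacking such modules widens the receptive field while shrinking N for later layers.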

Fig. 2 Comparison of three-stage and one-stage 2D convolution
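
Fig. 2 contrasts a three-stage and a one-stage 2D convolution over the eight octant neighbors. As a linear toy illustration, assume (hypothetically) that one feature channel of the eight neighbors is laid out as a 2×2×2 cube indexed by the sign of the offset along x, y and z. The three-stage variant applies a 2-tap kernel along one axis per stage, presumably in the spirit of the PointSIFT^[14] orientation-encoding unit. The one-stage variant is assumed to apply a single kernel spanning all eight octants. With one channel and no nonlinearity between stages, the three-stage result equals a one-stage kernel that factorizes as wx[x]·wy[y]·wz[z]. The one-stage kernel therefore has 8 free weights instead of 6 and can express every such three-stage combination.

```python
from itertools import product


def three_stage(cube, wx, wy, wz):
    """Reduce a 2x2x2 cube (indexed cube[x][y][z]) with one 2-tap kernel per axis."""
    # Stage 1: merge the two x-slices -> 2x2 grid indexed [y][z]
    s1 = [[wx[0] * cube[0][y][z] + wx[1] * cube[1][y][z] for z in range(2)] for y in range(2)]
    # Stage 2: merge along y -> 2 values indexed [z]
    s2 = [wy[0] * s1[0][z] + wy[1] * s1[1][z] for z in range(2)]
    # Stage 3: merge along z -> one output value
    return wz[0] * s2[0] + wz[1] * s2[1]


def one_stage(cube, kernel):
    """Reduce the same cube with a single 2x2x2 kernel covering all eight octants."""
    return sum(kernel[x][y][z] * cube[x][y][z] for x, y, z in product(range(2), repeat=3))


def separable_kernel(wx, wy, wz):
    """The one-stage kernel that reproduces three_stage(cube, wx, wy, wz) in this linear toy."""
    return [[[wx[x] * wy[y] * wz[z] for z in range(2)] for y in range(2)] for x in range(2)]


cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
wx, wy, wz = (0.5, -1.0), (2.0, 1.0), (1.0, 3.0)
print(three_stage(cube, wx, wy, wz))                  # -62.5
print(one_stage(cube, separable_kernel(wx, wy, wz)))  # -62.5
```

With many channels and activations between stages the two designs are no longer equivalent. The toy only shows why a single kernel over all octants is at least as expressive per layer.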

Fig. 3 Octant convolution module

Fig. 4 Visualization of results on S3DIS

Fig. 5 Visualization of detection results on KITTI

Fig. 6 Comparison of KNN and 8-octant search

Table 1 Classification results on ModelNet40 (%)

Method	oAcc	mAcc
PointNet^[12]	89.2	86.2
PointNet++^[13]	90.7	—
PointSIFT^[14]	90.2	86.9
SFCNN^[15]	91.4	—
ConvPoint^[17]	91.8	88.5
ECC^[18]	87.4	83.2
RGCNN^[19]	90.5	87.3
PAT^[22]	91.7	—
SCN^[23]	90.0	87.6
SRN-PointNet++^[24]	91.5	—
JUSTLOOKUP^[25]	89.5	86.4
Kd-Net^[26]	91.8	88.5
SO-Net^[27]	90.9	87.2
Octant-CNN	91.9	88.7

Table 2 Part segmentation results on ShapeNet (%)

Method	mIoU	aero	bag	cap	car	chair	earphone	guitar	knife	lamp	laptop	motor	mug	pistol	rocket	skateboard	table
PointNet^[12]	83.7	83.4	78.7	82.5	74.9	89.6	73.0	91.5	85.9	80.8	95.3	65.2	93.0	81.2	57.9	72.8	80.6
PointNet++^[13]	85.1	82.4	79.0	87.7	77.3	90.8	71.8	91.0	85.9	83.7	95.3	71.6	94.1	81.3	58.7	76.4	82.6
PointSIFT^[14]	79.0	75.1	78.4	81.8	74.5	85.2	64.3	89.6	81.9	77.5	95.1	64.0	93.5	77.1	54.2	70.6	74.3
RGCNN^[19]	84.3	80.2	82.8	92.6	75.3	89.2	73.7	91.3	88.4	83.3	96.0	63.9	95.7	60.9	44.6	72.9	80.4
DGCNN^[20]	85.1	84.2	83.7	84.4	77.1	90.9	78.5	91.5	87.3	82.9	96.0	67.8	93.3	82.6	59.7	75.5	82.0
SCN^[23]	84.6	83.8	80.8	83.5	79.3	90.5	69.8	91.7	86.5	82.9	96.0	69.2	93.8	82.5	62.9	74.4	80.8
Kd-Net^[26]	82.3	80.1	74.6	74.3	70.3	88.6	73.5	90.2	87.2	81.0	94.9	57.4	86.7	78.1	51.8	69.9	80.3
SO-Net^[27]	84.6	81.9	83.5	84.8	78.1	90.8	72.2	90.1	83.6	82.3	95.2	69.3	94.2	80.0	51.6	72.1	82.6
RS-Net^[29]	84.9	82.7	86.4	84.1	78.2	90.4	69.3	91.4	87.0	83.5	95.4	66.0	92.6	81.8	56.1	75.8	82.2
Octant-CNN	85.3	83.9	83.6	88.3	79.2	91.1	70.8	91.8	87.5	82.9	95.7	72.2	94.5	83.6	60.0	75.5	81.9

Table 3 Semantic segmentation results on S3DIS (%)

Method	mIoU	OA	ceiling	floor	wall	beam	column	window	door	chair	table	bookcase	sofa	board	clutter
PointNet^[12]	47.7	78.6	88.0	88.7	69.3	42.4	23.1	47.5	51.6	42.0	54.1	38.2	9.6	29.4	35.2
PointNet++^[13]	57.3	83.8	91.5	92.8	74.6	41.3	28.1	54.5	59.6	64.6	58.9	27.1	52.0	52.3	48.0
PointSIFT^[14]	55.5	83.5	91.1	91.3	75.5	42.0	24.0	51.4	56.6	60.2	55.8	17.0	50.2	57.1	49.9
RS-Net^[29]	56.5	—	92.5	92.8	78.6	32.8	34.4	51.6	68.1	59.7	60.1	16.4	50.2	44.9	52.0
Octant-CNN	58.3	84.6	92.1	94.5	76.3	48.9	30.8	56.9	62.9	65.8	55.5	28.0	48.1	50.3	48.4

Table 4 Performance comparison in 3D object detection (%)

Method	Cars			Pedestrians			Cyclists
	Easy	Moderate	Hard	Easy	Moderate	Hard	Easy	Moderate	Hard
Frustum PointNet v1^[32]	83.75	69.37	62.83	65.39	55.32	48.62	70.17	52.87	48.27
Frustum PointNet v2^[32]	83.93	71.23	63.72	64.23	56.95	50.15	74.04	54.92	50.53
Frustum PointSIFT^[14]	71.56	66.17	58.97	63.13	55.08	49.05	70.36	52.56	48.53
Frustum Geo-CNN^[33]	85.09	71.02	63.38	69.64	60.50	52.88	75.64	56.25	52.54
Frustum Octant-CNN	85.10	72.31	64.46	67.90	59.73	52.44	76.56	57.50	54.26

Table 5 Analysis of the structure design

Model	Multi-layer fusion	Residual	Voting	oAcc (%)
A				90.7
B	$\checkmark$			91.2
C	$\checkmark$	$\checkmark$		91.5
D	$\checkmark$	$\checkmark$	$\checkmark$	91.9
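
Table 5 attributes the final gain (91.5% to 91.9%) to voting. This section does not say how votes are formed. A common practice in point cloud classification, assumed here, is to run the classifier on several augmented copies of each test shape (here, uniformly rescaled copies) and average the class scores. `vote_predict`, the scale set, and the toy classifier are hypothetical.

```python
def vote_predict(classify, points, scales=(0.8, 0.9, 1.0, 1.1, 1.2)):
    """Assumed test-time voting: average class scores over rescaled copies of one shape.

    `classify` maps a list of (x, y, z) points to a list of class scores.
    Returns (predicted class index, averaged scores).
    """
    total = None
    for s in scales:
        scaled = [(s * x, s * y, s * z) for x, y, z in points]
        scores = classify(scaled)
        total = list(scores) if total is None else [t + v for t, v in zip(total, scores)]
    avg = [t / len(scales) for t in total]
    return max(range(len(avg)), key=avg.__getitem__), avg


def toy_classifier(points):
    """Hypothetical two-class model: class 0 if the shape fits inside the unit cube, else class 1."""
    extent = max(abs(c) for p in points for c in p)
    return [1.0, 0.0] if extent < 1.0 else [0.0, 1.0]


shape = [(1.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, -0.5)]
print(vote_predict(toy_classifier, shape))  # (1, [0.4, 0.6])
```

Averaging scores rather than counting hard votes keeps the classifier's confidence on each copy.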

Table 6 Comparison of 2D CNN and MLP

Model	Operation	oAcc (%)
A	MLP	90.8
B	2D CNN	91.9

Table 7 Comparison of neighbor-point selection methods

Model	Neighbor selection	oAcc (%)
A	KNN	90.2
B	8-octant search	91.9
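
Table 7 and Fig. 6 compare plain K-nearest neighbors with the 8-octant search. The toy sketch below suggests one plausible reason for the gap. It uses hypothetical helper names, brute-force search, and K = 8, chosen here to match the eight octant neighbors. When points are denser on one side of a center, all K nearest neighbors can land in a single octant, whereas the octant search takes one neighbor per direction.

```python
import math


def octant_id(center, p):
    """Octant (0..7) of p relative to center, one bit per axis (assumed numbering)."""
    return int(p[0] >= center[0]) * 4 + int(p[1] >= center[1]) * 2 + int(p[2] >= center[2])


def knn(points, i, k):
    """Indices of the k nearest points to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def octant_search(points, i):
    """Nearest point per octant around points[i]; empty octants fall back to i (assumption)."""
    best = {}
    for j, p in enumerate(points):
        if j == i:
            continue
        o = octant_id(points[i], p)
        if o not in best or math.dist(points[i], p) < math.dist(points[i], points[best[o]]):
            best[o] = j
    return [best.get(o, i) for o in range(8)]


def octants_covered(points, i, idx):
    """Set of octants actually reached by the neighbor indices idx (padding ignored)."""
    return {octant_id(points[i], points[j]) for j in idx if j != i}


# A dense cluster on the +x side of the origin plus one point in each other octant.
cluster = [(0.1 * t, 0.1, 0.1) for t in range(1, 9)]
corners = [(sx, sy, sz) for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1) if (sx, sy, sz) != (1, 1, 1)]
pts = [(0.0, 0.0, 0.0)] + cluster + corners

print(sorted(octants_covered(pts, 0, knn(pts, 0, 8))))        # [7]
print(sorted(octants_covered(pts, 0, octant_search(pts, 0))))  # [0, 1, 2, 3, 4, 5, 6, 7]
```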

Table 8 Comparison of different search radii

Model	Search radius	oAcc (%)
A	(0.25, 0.5, 1.0)	88.0
B	(0.4, 0.8, 1.0)	89.2
C	(0.5, 1.0, 1.0)	89.9
D	None	91.9

Table 9 Results with different input channels

Model	Input channels	oAcc (%)
A	($f_{ij}$)	90.1
B	($x_i-x_{ij}, f_{ij}$)	90.3
C	($x_i, f_{ij}$)	90.8
D	($x_i, x_i-x_{ij}, f_{ij}$)	91.9
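
Table 9 indicates that feeding each octant neighbor's center coordinates, relative offset and feature, ($x_i, x_i-x_{ij}, f_{ij}$), works best. We read $x_i$ as the coordinates of center point i, $x_{ij}$ as the coordinates of its neighbor in octant j, and $f_{ij}$ as that neighbor's feature vector. This reading of the notation is an assumption, since the definitions are in the full text. Under it, the per-point input to the convolution can be assembled as below; `build_octant_input` is a hypothetical helper.

```python
def build_octant_input(points, feats, i, nbr_idx):
    """Hypothetical helper: one row (x_i, x_i - x_ij, f_ij) per octant neighbor of point i.

    points: list of (x, y, z); feats: per-point feature lists;
    nbr_idx: 8 neighbor indices, one per octant (e.g. from an octant search).
    Returns an 8 x (3 + 3 + C) nested list.
    """
    xi = points[i]
    rows = []
    for j in nbr_idx:
        rel = [a - b for a, b in zip(xi, points[j])]  # x_i - x_ij
        rows.append(list(xi) + rel + list(feats[j]))
    return rows


points = [(1.0, 2.0, 3.0), (0.0, 2.0, 5.0)]
feats = [[9.0], [7.0]]
rows = build_octant_input(points, feats, 0, [1] * 8)
print(len(rows), rows[0])  # 8 [1.0, 2.0, 3.0, 1.0, 0.0, -2.0, 7.0]
```

Dropping slices of each row gives the reduced inputs of models A to C. For example, keeping only `feats[j]` corresponds to model A.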

Table 10 Comparison of robustness to point cloud rotation

Method	0° (%)	30° (%)	60° (%)	90° (%)	180° (%)	Mean (%)	Variance
PointSIFT^[14]	88.2	89.2	88.9	88.7	88.5	88.7	0.124
PointSIFT+T-Net	89.1	89.4	89.4	88.6	88.6	89.04	0.114
Octant-CNN	91.5	91.7	91.9	91.5	91.8	91.68	0.025

Table 11 Complexity in point cloud semantic segmentation

Method	Params (MB)	FLOPs (B)
PointNet^[12]	1.17	7.22
PointNet++^[13]	0.97	1.96
PointSIFT^[14]	13.53	24.32
Octant-CNN	4.31	2.44

References

[1]	Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 2012 Advances in Neural Information Processing Systems. Nevada, USA, 2012. 1097−1105
[2]	He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. 770−778
[3]	Girshick R. Fast R-CNN. In: Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 1440−1448
[4]	Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: Unified, real-time object detection. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016. 779−788
[5]	Zhu Z, Xu M D, Bai S, Huang T T, Bai X. Asymmetric non-local neural networks for semantic segmentation. In: Proceedings of the 2019 IEEE International Conference on Computer Vision. Seoul, South Korea: IEEE, 2019. 593−602
[6]	Li Y, Qi H Z, Dai J F, Ji X Y, Wei Y C. Fully convolutional instance-aware semantic segmentation. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017. 2359−2367
[7]	Peng Xiu-Ping, Tong Qi-Sheng, Lin Hong-Bin, Feng Chao, Zheng Wu. A deep residual-feature pyramid network for scattered point cloud semantic segmentation. Acta Automatica Sinica, 2019. DOI: 10.16383/j.aas.c190063
[8]	Maturana D, Scherer S. VoxNet: A 3D convolutional neural network for real-time object recognition. In: Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany: IEEE, 2015. 922−928
[9]	Wu Z R, Song S R, Khosla A, Yu F, Zhang L G, Tang X O, Xiao J X. 3D ShapeNets: A deep representation for volumetric shapes. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015. 1912−1920
[10]	Su H, Maji S, Kalogerakis E, Learned-Miller E. Multi-view convolutional neural networks for 3D shape recognition. In: Proceedings of the 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015. 945−953
[11]	Yang Z, Wang L W. Learning relationships for multi-view 3D object recognition. In: Proceedings of the 2019 IEEE International Conference on Computer Vision. Seoul, South Korea: IEEE, 2019. 7505−7514
[12]	Qi C R, Su H, Mo K, Guibas L J. PointNet: Deep learning on point sets for 3D classification and segmentation. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017. 652−660
[13]	Qi C R, Yi L, Su H, Guibas L J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In: Proceedings of the 2017 Advances in Neural Information Processing Systems. Long Beach, USA, 2017. 5099−5108
[14]	Jiang M Y, Wu Y R, Zhao T Q, Zhao Z L, Lu C W. PointSIFT: A SIFT-like network module for 3D point cloud semantic segmentation [Online], available: https://arxiv.org/abs/1807.00652, July 22, 2020
[15]	Rao Y M, Lu J W, Zhou J. Spherical fractal convolutional neural networks for point cloud recognition. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019. 452−460
[16]	Liu Y C, Fan B, Xiang S M, Pan C H. Relation-shape convolutional neural network for point cloud analysis. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019. 8895−8904
[17]	Boulch A. ConvPoint: Continuous convolutions for point cloud processing. Computers & Graphics, 2020, 88: 24−34
[18]	Simonovsky M, Komodakis N. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Hawaii, USA: IEEE, 2017. 3693−3702
[19]	Te G, Hu W, Zheng A, Guo Z M. RGCNN: Regularized graph CNN for point cloud segmentation. In: Proceedings of the 26th ACM International Conference on Multimedia. Seoul, South Korea: ACM, 2018. 746−754
[20]	Wang Y, Sun Y, Liu Z, Sarma S E, Bronstein M M, Solomon J M. Dynamic graph CNN for learning on point clouds. ACM Transactions on Graphics (TOG), 2019, 38(5): 1−12
[21]	Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 2014, 15(1): 1929−1958
[22]	Yang J C, Zhang Q, Ni B B, Li L G, Liu J X, Zhou M D, Tian Q. Modeling point clouds with self-attention and Gumbel subset sampling. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019. 3323−3332
[23]	Xie S N, Liu S N, Chen Z Y, Tu Z W. Attentional ShapeContextNet for point cloud recognition. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 4606−4615
[24]	Duan Y Q, Zheng Y, Lu J W, Zhou J, Tian Q. Structural relational reasoning of point clouds. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019. 949−958
[25]	Lin H X, Xiao Z L, Tan Y, Chao H Y, Ding S Y. JustLookUp: One millisecond deep feature extraction for point clouds by lookup tables. In: Proceedings of the 2019 IEEE International Conference on Multimedia and Expo. Shanghai, China: IEEE, 2019. 326−331
[26]	Klokov R, Lempitsky V. Escape from cells: Deep Kd-networks for the recognition of 3D point cloud models. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 863−872
[27]	Li J X, Chen B M, Lee G H. SO-Net: Self-organizing network for point cloud analysis. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 9397−9406
[28]	Yi L, Kim V G, Ceylan D, et al. A scalable active framework for region annotation in 3D shape collections. ACM Transactions on Graphics (TOG), 2016, 35(6): 1−12
[29]	Huang Q G, Wang W Y, Neumann U. Recurrent slice networks for 3D segmentation of point clouds. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 2626−2635
[30]	Armeni I, Sener O, Zamir A R, Jiang H, Brilakis I, Fischer M, Savarese S. 3D semantic parsing of large-scale indoor spaces. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 1534−1543
[31]	Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. Rhode Island, USA: IEEE, 2012. 3354−3361
[32]	Qi C R, Liu W, Wu C X, Su H, Guibas L J. Frustum PointNets for 3D object detection from RGB-D data. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018. 918−927
[33]	Lan S Y, Yu R C, Yu G, Davis L S. Modeling local geometric structure of 3D point clouds using Geo-CNN. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE, 2019. 998−1008