Research on Detection Method of Refined Rotated Boxes in Remote Sensing
-
Abstract: Objects in remote sensing images often appear in arbitrary orientations, whereas common object detection algorithms use horizontal bounding boxes and therefore cannot meet the requirements of such scenes. This paper proposes a rotated-box detection network named R2-FRCNN. The network realizes rotated-box detection in two stages, rough adjustment and refined adjustment: the rough adjustment stage converts horizontal boxes into rotated boxes, and the refined adjustment stage further optimizes the localization of the rotated boxes. Since remote sensing images contain many small objects, this paper proposes a pixel-recombination pyramid structure that fuses deep and shallow features to improve the detection accuracy of small objects against complex backgrounds. In addition, to extract more effective feature information from each pyramid level, a region-of-interest feature extraction method combining integration and area interpolation is designed for the rough adjustment stage, and a rotated-box region feature extraction method is designed for the refined adjustment stage. Finally, both stages adopt a prediction branch that combines fully connected and convolutional layers, and SmoothLn is used as the regression loss function to further improve performance. The proposed network is evaluated on the large-scale remote sensing dataset DOTA, reaching an mAP of 0.7602. Comparative experiments demonstrate the effectiveness of the R2-FRCNN modules.
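The SmoothLn regression loss mentioned in the abstract is not defined in this excerpt. The following is a minimal sketch, assuming the smooth Ln formulation L(d) = (|d| + 1) ln(|d| + 1) − |d| used for multi-oriented text detection in [33], where d is the normalized residual between a predicted and a target box parameter; the function and variable names are illustrative, not from the paper.

```python
import math

def smooth_ln(d):
    """Smooth Ln loss sketch: L(d) = (|d| + 1) * ln(|d| + 1) - |d|.

    L(0) = 0, the gradient ln(|d| + 1) * sign(d) is continuous everywhere,
    and the loss grows sub-quadratically, so outlier residuals dominate
    the gradient less than they would under an L2 loss.
    """
    a = abs(d)
    return (a + 1.0) * math.log(a + 1.0) - a

# The total regression loss would sum smooth_ln over the offsets of each
# positive proposal, e.g. (x, y, w, h, theta) for a rotated box.
```

Compared with smooth L1, this loss stays differentiable at every point (smooth L1 has a piecewise switch at |d| = 1), which is one common motivation for choosing it in box regression.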
-
Table 1 Comparison of detection accuracy (%) of different methods on the DOTA dataset

| Category | R2CNN[10] | RT[12] | CADNet[13] | SCRDet[15] | R3Det[16] | GV[17] | Ours |
|----------|-----------|--------|------------|------------|-----------|--------|------|
| PL       | 80.94 | 88.64 | 87.80 | 89.98 | 89.24 | 89.64 | 89.10 |
| BD       | 65.67 | 78.52 | 82.40 | 80.65 | 80.81 | 85.00 | 81.22 |
| BR       | 35.34 | 43.44 | 49.40 | 52.09 | 51.11 | 52.26 | 54.47 |
| GTF      | 67.44 | 75.92 | 73.50 | 68.36 | 65.62 | 77.34 | 72.97 |
| SV       | 59.92 | 68.81 | 71.10 | 68.36 | 70.67 | 73.01 | 79.99 |
| LV       | 50.91 | 73.68 | 64.50 | 60.32 | 76.03 | 73.14 | 82.28 |
| SH       | 55.81 | 83.59 | 76.60 | 72.41 | 78.32 | 86.82 | 87.64 |
| TC       | 90.67 | 90.74 | 90.90 | 90.85 | 90.83 | 90.74 | 90.54 |
| BC       | 66.92 | 77.27 | 79.20 | 87.94 | 84.89 | 79.02 | 87.31 |
| ST       | 72.39 | 81.46 | 73.30 | 86.86 | 84.42 | 86.81 | 86.33 |
| SBF      | 55.06 | 58.39 | 48.40 | 65.02 | 65.10 | 59.55 | 54.20 |
| RA       | 52.23 | 53.54 | 60.90 | 66.68 | 57.18 | 70.91 | 68.18 |
| HA       | 55.14 | 62.83 | 62.00 | 66.25 | 68.10 | 72.94 | 76.12 |
| SP       | 53.35 | 58.93 | 67.00 | 68.24 | 68.98 | 70.86 | 70.83 |
| HC       | 48.22 | 47.67 | 62.20 | 65.21 | 60.88 | 57.32 | 59.19 |
| mAP(%)   | 60.67 | 69.56 | 69.90 | 72.61 | 72.81 | 75.02 | 76.02 |
Table 2 Ablation results of the R2-FRCNN modules (each configuration cumulatively adds one module)

| Module             | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
|--------------------|-----|-----|-----|-----|-----|-----|-----|
| Baseline           | √ | √ | √ | √ | √ | √ | √ |
| Refined adjustment |   | √ | √ | √ | √ | √ | √ |
| IRoIPool           |   |   | √ | √ | √ | √ | √ |
| RRoIPool           |   |   |   | √ | √ | √ | √ |
| PFPN               |   |   |   |   | √ | √ | √ |
| SmoothLn           |   |   |   |   |   | √ | √ |
| ConvFc             |   |   |   |   |   |   | √ |
| mAP(%)             | 69.52 | 73.62 | 73.99 | 74.31 | 74.97 | 75.13 | 75.96 |
Table 3 Experimental results of different horizontal-box feature extraction methods (configuration: Baseline + refined adjustment)

| Method | RoI Pooling | RoI Align | IRoIPool |
|--------|-------------|-----------|----------|
| mAP(%) | 71.21 | 73.62 | 73.99 |
Table 4 Experimental results of different rotated-box feature extraction methods (configuration: Baseline + refined adjustment + IRoIPool)

| Method | RRoI A-Pooling | RRoI Align | RRoIPool |
|--------|----------------|------------|----------|
| mAP(%) | 73.38 | 73.99 | 74.31 |

-
[1] Ya Y, et al. Fusion object detection of satellite imagery with arbitrary-oriented region convolutional neural network. Aerospace Systems, 2019, 2(2): 163−174. doi: 10.1007/s42401-019-00033-x
[2] Wang Yan-Qing, Ma Lei, Tian Yuan. State-of-the-art of ship detection and recognition in optical remotely sensed imagery. Acta Automatica Sinica, 2011, 37(9): 1029−1039 (in Chinese)
[3] Zhang Hui, Wang Kun-Feng, Wang Fei-Yue. Advances and perspectives on applications of deep learning in visual object detection. Acta Automatica Sinica, 2017, 43(8): 1289−1305 (in Chinese)
[4] Ren S Q, He K M, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137−1149. doi: 10.1109/TPAMI.2016.2577031
[5] Dai J F, Li Y, He K M, Sun J. R-FCN: object detection via region-based fully convolutional networks. In: Proceedings of the 2016 Advances in Neural Information Processing Systems (NIPS). Barcelona, Spain: MIT Press, 2016. 379−387
[6] Cai Z, Vasconcelos N. Cascade R-CNN: delving into high quality object detection. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, UT, USA: IEEE, 2018. 6154−6162
[7] Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: unified, real-time object detection. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE, 2016. 779−788
[8] Liu W, Anguelov D, Erhan D, Szegedy C, Reed S E, Fu C Y, Berg A C. SSD: single shot multibox detector. In: Proceedings of the 14th European Conference on Computer Vision (ECCV). Amsterdam, Netherlands: Springer, 2016. 21−37
[9] Lin T Y, et al. Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(2): 318−327
[10] Jiang Y, Zhu X, Wang X, et al. R2CNN: rotational region CNN for orientation robust scene text detection [Online], available: https://arxiv.org/abs/1706.09579, June 29, 2017
[11] Ma J, Shao W, Ye H, et al. Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia, 2018, 20(11): 3111−3122. doi: 10.1109/TMM.2018.2818020
[12] Ding J, et al. Learning RoI Transformer for detecting oriented objects in aerial images. In: Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, CA, USA: IEEE, 2019. 2844−2853
[13] Zhang G, Lu S, Zhang W. CAD-Net: a context-aware detection network for objects in remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(12): 10015−10024. doi: 10.1109/TGRS.2019.2930982
[14] Azimi S M, Vig E, Bahmanyar R, et al. Towards multi-class object detection in unconstrained remote sensing imagery. Cham: Springer International Publishing, 2019. 150−165
[15] Yang X, et al. SCRDet: towards more robust detection for small, cluttered and rotated objects. In: Proceedings of the 2019 IEEE International Conference on Computer Vision (ICCV). Seoul, Korea (South): IEEE, 2019. 8231−8240
[16] Yang X, et al. R3Det: refined single-stage detector with feature refinement for rotating object [Online], available: https://arxiv.org/abs/1908.05612, August 15, 2019
[17] Xu Y, et al. Gliding vertex on the horizontal bounding box for multi-oriented object detection [Online], available: https://arxiv.org/abs/1911.09358, November 21, 2019
[18] Wei H, et al. Oriented objects as pairs of middle lines [Online], available: https://arxiv.org/abs/1912.10694, December 23, 2019
[19] Li Y, et al. RADet: refine feature pyramid network and multi-layer attention network for arbitrary-oriented object detection of remote sensing images. Remote Sensing, 2020, 12(3): 389−409. doi: 10.3390/rs12030389
[20] Wang J, et al. Mask OBB: a semantic attention-based mask oriented bounding box representation for multi-category object detection in aerial images. Remote Sensing, 2019, 11(24): 2930−2951. doi: 10.3390/rs11242930
[21] Xia G S, et al. DOTA: a large-scale dataset for object detection in aerial images. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, UT, USA: IEEE, 2018. 3974−3983
[22] He K, et al. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE, 2016. 770−778
[23] Ma J, et al. Arbitrary-oriented scene text detection via rotation proposals. IEEE Transactions on Multimedia, 2018, 20(11): 3111−3122. doi: 10.1109/TMM.2018.2818020
[24] Lin T Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid networks for object detection. In: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA: IEEE, 2017. 936−944
[25] Yi J, Wu P, Metaxas D N. ASSD: attentive single shot multibox detector. Computer Vision and Image Understanding, 2019, 189: 102827−102836
[26] Zeiler M D, Krishnan D, Taylor G W, et al. Deconvolutional networks. In: Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, CA, USA: IEEE, 2010. 2528−2535
[27] Wang J, Chen K, Xu R, et al. CARAFE: content-aware reassembly of features [Online], available: https://arxiv.org/abs/1905.02188, May 6, 2019
[28] Zhou P, et al. Scale-transferrable object detection. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, UT, USA: IEEE, 2018. 528−537
[29] Bridle J S. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. Neurocomputing. Berlin, Heidelberg: Springer, 1990, 68: 227−236
[30] He K, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 2980−2988
[31] Jiang B, et al. Acquisition of localization confidence for accurate object detection [Online], available: https://arxiv.org/abs/1807.11590, July 30, 2018
[32] Wu Y, Chen Y, Yuan L, et al. Rethinking classification and localization for object detection [Online], available: https://arxiv.org/abs/1904.06493, April 13, 2019
[33] Liu Y, Jin L. Deep matching prior network: toward tighter multi-oriented text detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, UT, USA: IEEE, 2018. 8759−8768
[34] Dai J, Qi H, Xiong Y, et al. Deformable convolutional networks. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE, 2017. 3454−3461
