• 中文核心
  • EI
  • 中国科技核心
  • Scopus
  • CSCD
  • 英国科学文摘

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

面向复杂环境的频域导向稀疏融合多光谱目标检测算法

肖海东 任志刚 蔡声泽 许超 吴宗泽

肖海东, 任志刚, 蔡声泽, 许超, 吴宗泽. 面向复杂环境的频域导向稀疏融合多光谱目标检测算法. 自动化学报, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c250634
引用本文: 肖海东, 任志刚, 蔡声泽, 许超, 吴宗泽. 面向复杂环境的频域导向稀疏融合多光谱目标检测算法. 自动化学报, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c250634
Xiao Hai-Dong, Ren Zhi-Gang, Cai Sheng-Ze, Xu Chao, Wu Zong-Ze. A frequency-guided sparse fusion algorithm for multispectral object detection in complex environments. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c250634
Citation: Xiao Hai-Dong, Ren Zhi-Gang, Cai Sheng-Ze, Xu Chao, Wu Zong-Ze. A frequency-guided sparse fusion algorithm for multispectral object detection in complex environments. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c250634

面向复杂环境的频域导向稀疏融合多光谱目标检测算法

doi: 10.16383/j.aas.c250634 cstr: 32138.14.j.aas.c250634
基金项目: 广东省基础与应用基础研究基金(2024A1515011768), 广东省基础与应用基础研究重大专项项目(2023B0303000009), 中国烟草总公司重点研发项目(110202402018)资助
详细信息
    作者简介:

    肖海东:广东工业大学自动化学院硕士研究生. 主要研究方向为目标检测与计算机视觉. E-mail: 2112304427@mail2.gdut.edu.cn

    任志刚:广东工业大学自动化学院教授. 主要研究方向为控制科学与工程,模式识别. 本文通讯作者 E-mail: renzhigang@gdut.edu.cn

    蔡声泽:浙江大学控制科学与工程学院教授. 主要研究方向为智能控制与机器学习. E-mail: shengze_cai@zju.edu.cn

    许超:浙江大学控制科学与工程学院教授. 主要研究方向为智能自主系统,机器视觉与工业人工智能. E-mail: cxu@zju.edu.cn

    吴宗泽:深圳大学机电与控制工程学院教授. 主要研究方向为智能制造,工业互联网,大数据分析与深度学习. E-mail: zzwu@gdut.edu.cn

A Frequency-guided Sparse Fusion Algorithm for Multispectral Object Detection in Complex Environments

Funds: Supported by Guangdong Basic and Applied Basic Research Foundation (2024A1515011768), Guangdong Major Project of Basic and Applied Basic Research (2023B0303000009), and Key R&D Project of China National Tobacco Corporation (110202402018)
More Information
    Author Bio:

    XIAO Hai-Dong Master student at the School of Automation, Guangdong University of Technology. His research interests include object detection and computer vision

    REN Zhi-Gang Professor at the School of Automation, Guangdong University of Technology. His research interests include control science and engineering and pattern recognition. Corresponding author of this paper

    CAI Sheng-Ze Professor at the College of Control Science and Engineering, Zhejiang University. His research interests include intelligent control and machine learning

    XU Chao Professor at the College of Control Science and Engineering, Zhejiang University. His research interests include intelligent autonomous systems, machine vision and industrial artificial intelligence

    WU Zong-Ze Professor at the School of Mechatronics and Control Engineering, Shenzhen University. His research interests include intelligent manufacturing, industrial Internet, big data analysis and deep learning

  • 摘要: 随着多光谱感知与智能视觉技术的发展, 如何在复杂环境中实现稳定而精确的目标检测已成为自动化视觉检测领域的重要研究方向. 针对传统单模态可见光目标检测在夜间、大雾及低照度等复杂环境中性能下降的问题, 提出一种基于频域特征细化导向的多光谱稀疏融合目标检测方法. 该方法利用共享权重的双分支编码器分别提取可见光与红外光特征, 并通过组稀疏自注意力模块实现跨模态长距离特征筛选, 以抑制冗余信息、增强显著特征表达. 同时, 设计频域自适应加权模块, 在频域空间中进行多光谱特征解耦与自适应融合, 实现不同光谱模态间的高效语义交互与动态权重分配. 该方法可在端到端框架下实现跨模态特征的高精度对齐与融合, 有效提升模型的检测精度与鲁棒性. 在M3FD、KAIST和FLIR三个公开数据集上分别取得83.5%, 76.2%和81.6%的mAP50结果, 显著优于现有多光谱目标检测算法, 验证了所提方法在复杂场景下的优越性能和泛化能力.
  • 图  1  大雾场景下的可见光(左上)与红外光(右上)及黑夜场景下的可见光(左下)与红外光(右下)示例图

    Fig.  1  Example images of visible light (top-left) and infrared light (top-right) under dense fog scenes, and visible light (bottom-left) and infrared light (bottom-right) under night-time scenes

    图  2  网络结构图

    Fig.  2  Network structure diagram

    图  3  组稀疏自注意力模块结构图

    Fig.  3  Group sparse self-attention module structure diagram

    图  4  频域自适应加权模块结构图

    Fig.  4  Frequency domain adaptive weighting module block diagram

    图  5  不同算法在$ {\rm{M}}^3 $FD数据集上的检测结果图

    Fig.  5  Detection results diagram for different algorithms on the $ {\rm{M}}^3 $FD dataset

    图  6  特征融合阶段示意图

    Fig.  6  schematic diagram of the feature \fusion stage

    图  7  不同阶段注意力热力图

    Fig.  7  attention heatmaps across different stages

    表  1  不同方法在$ {\rm{M}}^3 $FD数据集上的实验评价结果比较

    Table  1  Comparative experimental evaluation results of different methods on the $ {\rm{M}}^3 $FD dataset

    $ {\rm{AP}}_{50} $ (%)
    架构 方法 巴士 轿车 摩托车 行人 卡车 精度(%) 召回率(%) $ {\rm{mAP}}_{50} $ (%) $ {\rm{mAP}}_{50:95} $ (%) param (M) FPS
    单光谱模态 Visible 90.9 88.5 79.6 75.2 67.7 85.3 89.4 73.4 81.2 50.5 1.77 175.4
    Infrared 90.7 86.5 57.8 69.5 79.0 83.9 85.1 71.5 77.9 48.2 1.77 175.4
    融合检测架构 U2fusion[16] 80.8 86.3 73.8 56.4 73.2 78.2 84.1 70.1 74.8 45.4
    RFN[27] 79.2 85.7 69.9 53.1 71.4 76.4 88.5 65.1 72.6 44.9
    Tardal[17] 77.7 85.0 67.1 56.7 73.1 74.4 84.4 65.7 72.3 43.6
    MFEIF[18] 79.7 84.9 68.2 56.9 72.9 73.5 85.2 65.9 72.7 44.6
    HALDeR[28] 79.8 86.0 70.8 52.7 71.2 75.4 86.7 66.8 72.7 44.4
    端到端架构 Twin-Yolov5-n 89.1 87.8 77.1 66.2 77.0 83.8 85.0 75.1 80.2 47.8 2.84 192.3
    Twin-Yolov8-n 89.8 88.6 78.2 67.5 78.3 85.1 85.2 73.0 81.2 49.6 4.28 128.2
    DEYOLO[29] 89.3 87.5 77.7 67.8 77.4 85.2 84.0 75.9 80.8 49.3 4.57 188.5
    ICAFusion[20] 90.4 87.6 78.5 65.2 77.2 86.0 83.6 76.1 80.8 48.5 5.94 144.9
    CFT [19] 90.4 87.9 77.0 70.2 76.6 85.1 87.7 74.3 81.2 48.1 11.2 154
    DAMSDet[21] 83.1 92.7 71.1 73.5 73.6 78.6 78.7 83.1 78.8 51.0 78.9 88.5
    SuperYOLO[30] 89.1 87.6 74.7 68.9 76.6 83.2 88.2 70.1 80.0 48.0 1.8 212.7
    FGSFNet 92.1 89.0 80.5 74.3 79.0 86.1 86.7 77.3 83.5 51.2 7.44 208.3
    下载: 导出CSV

    表  2  不同方法在KAIST和FLIR数据集上的实验评价结果比较

    Table  2  Comparative of experimental evaluation results for different methods on the KAIST and FLIR datasets

    KAIST (单类别:行人) FLIR (多类别)
    $ {\rm{AP}}_{50} $ (%)
    架构 方法 精度(%) 召回率(%) $ {\rm{AP}}_{50} $ (%) $ {\rm{mAP}}_{50:95} $ (%) 自行车 汽车 行人 精度(%) 召回率(%) $ {\rm{mAP}}_{50} $ (%) $ {\rm{mAP}}_{50:95} $ (%)
    单光谱模态Visible71.654.462.024.668.983.570.080.267.674.134.8
    Infrared75.865.473.532.375.887.581.284.572.081.540.6
    融合检测架构U2fusion[16]65.445.952.221.163.183.071.679.465.972.633.8
    RFN[27]59.643.748.918.963.182.071.478.765.272.133.4
    Tardal[17]62.445.050.620.660.181.568.577.163.370.032.3
    MFEIF[18]62.143.850.020.662.980.969.776.066.271.232.8
    HALDeR[28]63.345.751.621.162.982.170.179.664.971.732.9
    端到端架构Twin-Yolov5-n76.768.774.832.874.987.282.084.074.681.439.4
    Twin-Yolov8-n78.268.074.232.973.887.581.283.072.680.839.7
    DEYOLO[29]74.968.274.431.974.586.982.084.772.781.140.0
    ICAFusion[20]75.569.175.733.274.887.481.682.275.981.339.6
    CFT[19]77.667.275.732.972.987.482.182.973.480.839.2
    SuperYOLO[30]74.365.571.831.570.686.378.382.169.178.437.7
    FGSFNet77.671.176.233.875.087.782.183.475.081.640.2
    下载: 导出CSV

    表  3  稀疏率组合选取研究实验

    Table  3  Study on sparsity rate combination selection

    稀疏率参数组合 实验结果(%)
    实验 50% 67% 75% 80% $ {\rm{mAP}}_{50} $ $ {\rm{mAP}}_{50:95} $
    I 0 0 0 1 78.4 46.2
    II 0 1 0 1 82.3 49.7
    III 1 1 0 1 83.5 51.2
    IV 1 1 1 0 83.5 50.2
    V 1 1 1 1 81.4 49.1
    下载: 导出CSV

    表  4  不同融合阶段性能研究

    Table  4  performance study at different integration stages

    融合策略 精度(%) 召回率(%) $ {\rm{mAP}}_{50} $ (%) $ {\rm{mAP}}_{50:95} $ (%) params (M)
    基线 85.0 75.1 81.2 47.7 7.12
    早期融合 86.3 71.8 80.1 48.1 7.12
    中期融合 86.7 77.3 83.5 51.2 7.44
    晚期融合 86.8 75.2 82.5 49.1 7.39
    下载: 导出CSV

    表  5  消融实验

    Table  5  Ablation experiments

    实验 组稀疏
    自注意力模块
    频域自适应
    加权模块
    $ {\rm{mAP}}_{50} $ (%) $ {\rm{mAP}}_{50:95} $ (%) params (M)
    I $ \times $ $ \times $ 80.2 47.8 2.84
    II $ \times $ 81.2 47.7 7.12
    III $ \times $ 82.8 49.5 3.20
    IV 83.5 51.2 7.44
    下载: 导出CSV
  • [1] Mao J G, Shi S S, Wang X, and Li H S. 3d object detection for autonomous driving: A comprehensive survey. International Journal of Computer Vision, 2023, 131(8): 1909−1963 doi: 10.1007/s11263-023-01790-1
    [2] Shi X L and Song A J. Defog yolo for road object detection in foggy weather. The Computer Journal, 2024, 67(11): 3115−3127
    [3] Wang J, Yang P, Liu Y S, Shang D, Hui X, Song J H, and Chen X H. Research on improved yolov5 for low-light environment object detection. Electronics, 2023, 12(14): 3089
    [4] 闫梦凯, 钱建军, 杨健. 弱对齐的跨光谱人脸检测. 自动化学报, 2023, 49(1): 135−147 doi: 10.16383/j.aas.c210058

    Meng-Kai Yan, Jian-Jun Qian, and Jian Yang. Weakly aligned cross-spectral face detection. Acta Automatica Sinica, 2023, 49(1): 135−147 doi: 10.16383/j.aas.c210058
    [5] 赵兴科, 李明磊, 张弓, 黎宁, 李家松. 基于显著图融合的无人机载热红外图像目标检测方法. 自动化学报, 2021, 47(9): 2120−2131

    Xin-Ke Zhao, Ming-Lei Li, Gong Zhang, Ning Li, and Jia-Song Li. Object detection method for uav thermal infrared images based on saliency map fusion. Acta Automatica Sinica, 2021, 47(9): 2120−2131
    [6] Hu S M, Zhao F, Lu H Z, Deng Y J, Du J M, and Shen X L. Improving yolov7-tiny for infrared and visible light image object detection on drones. Remote Sensing, 2023, 15(13): 3214
    [7] Wang P, Wu J S, Fang A Q, Zhu Z X, and Wang C W. Multi-spectral image fusion for moving object detection. Infrared Physics & Technology, 2024, 141: 105489
    [8] He X, Tang C, Zou X, and Zhang W. Multispectral object detection via cross-modal conflict-aware learning. In Proceedings of the 31st ACM International Conference on Multimedia, pages 1465–1474, 2023.
    [9] Girshick R, Donahue J, Darrell T, and Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 580–587, 2014.
    [10] 张鹏, 雷为民, 赵新蕾, 董力嘉, 林兆楠, 景庆阳. 跨摄像头多目标跟踪方法综述. 计算机学报, 2024, 47(2): 287−309

    Peng Zhang, Wei-Min Lei, Xin-Lei Zhao, Li-Jia Dong, Zhao-Nan Lin, and Qing-Yang Jing. A survey on multi-target multi-camera tracking methods. Chinese Journal of Computers, 2024, 47(2): 287−309
    [11] Redmon J, Divvala S, Girshick R, and Farhadi A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.
    [12] Redmon J and Farhadi A. Yolov3: An incremental improvement. arXiv preprint arXiv: 1804.02767, 2018.
    [13] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser L, and Polosukhin I. Attention is all you need. Advances in neural information processing systems, 30, 2017.
    [14] Dosovitskiy A. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv: 2010.11929, 2020.
    [15] Chen Y T, Shi J H, Ye Z L, Mertz C, Ramanan D, and Kong S. Multimodal object detection via probabilistic ensembling. In European Conference on Computer Vision, pages 139–158. Springer, 2022.
    [16] Xu H, Ma J, Jiang J J, Guo X J, and Ling H B. U2fusion: A unified unsupervised image fusion network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
    [17] Liu J Y, Fan X, Huang Z B, Wu G Y, Liu R S, Zhong W, and Luo Z X. Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5802–5811, 2022.
    [18] Liu J Y, Fan X, Jiang J, Liu R S, and Luo Z X. Learning a deep multi-scale feature ensemble and an edge-attention guidance for image fusion. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 32(1): 105−119
    [19] Fang Q Y, Han D P, and Wang Z K. Cross-modality fusion transformer for multispectral object detection. arXiv preprint arXiv: 2111.00273, 2021.
    [20] Shen J F, Chen Y F, Liu Y, Zuo X, Fan H, and Yang W K. Icafusion: Iterative cross-attention guided feature fusion for multispectral object detection. Pattern Recognition, 2024, 145: 109913
    [21] Guo J J, Gao C Q, Liu F C, Meng D Y, and Gao X B. Damsdet: Dynamic adaptive multispectral detection transformer with competitive query selection and adaptive feature fusion. In European Conference on Computer Vision, pages 464–481. Springer, 2024.
    [22] Zhu X, Su W, Lu L, Li B, Wang X, and Dai J. Deformable detr: Deformable transformers for end-to-end object detection. arXiv preprint arXiv: 2010.04159, 2020.
    [23] Guobao Xiao, Zhimin Tang, Hanlin Guo, Jun Yu, and Heng Tao Shen. Fafusion: Learning for infrared and visible image fusion via frequency awareness. IEEE Transactions on Instrumentation and Measurement, 2024, 73: 1−11
    [24] Jiayi Ma, Yong Ma, and Chang Li. Infrared and visible image fusion methods and applications: A survey. Information fusion, 2019, 45: 153−178
    [25] Hwang S, Park J, Kim N, Choi Y, and Kweon I S. Multispectral pedestrian detection: Benchmark dataset and baseline. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1037–1045, 2015.
    [26] Zhang H, Fromont E, Lefevre S, and Avignon B. Multispectral fusion for object detection with cyclic fuse-and-refine blocks. In 2020 IEEE International conference on image processing (ICIP), pages 276–280. IEEE, 2020.
    [27] Li H, Wu X J, and Kittler J. Rfn-nest: An end-to-end residual fusion network for infrared and visible images. Information Fusion, 2021, 73: 72−86
    [28] Liu J Y, Shang J J, Liu R S, and Fan X. Halder: Hierarchical attention-guided learning with detail-refinement for multi-exposure image fusion. In 2021 IEEE International Conference on Multimedia and Expo (ICME), 2021.
    [29] Chen Y S, Wang B R, Guo X Y, Zhu W B, He J S, Liu X B, and Yuan J. Deyolo: Dual-feature-enhancement yolo for cross-modality object detection. In International Conference on Pattern Recognition, pages 236–252. Springer, 2025.
    [30] Zhang J Q, Lei J, Xie W Y, Fang Z M, Li Y S, and Du Q. Superyolo: Super resolution assisted object detection in multimodal remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 1−15
  • 加载中
计量
  • 文章访问数:  15
  • HTML全文浏览量:  7
  • 被引次数: 0
出版历程
  • 收稿日期:  2025-11-14
  • 录用日期:  2026-03-19
  • 网络出版日期:  2026-04-13

目录

    /

    返回文章
    返回