
面向可信自动驾驶策略优化: 一种对抗鲁棒强化学习方法

何祥坤 赵洋 房建武 程洪 吕辰

引用本文: 何祥坤, 赵洋, 房建武, 程洪, 吕辰. 面向可信自动驾驶策略优化: 一种对抗鲁棒强化学习方法. 自动化学报, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c250193
Citation: He Xiang-Kun, Zhao Yang, Fang Jian-Wu, Cheng Hong, Lv Chen. Toward trustworthy policy optimization for autonomous driving: an adversarial robust reinforcement learning approach. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c250193


doi: 10.16383/j.aas.c250193 cstr: 32138.14.j.aas.c250193
基金项目: 国家自然科学基金(W2411052, 62273057), 电子科技大学高层次人才配套项目(A25007)资助
详细信息
    作者简介:

    何祥坤:电子科技大学研究员. 主要研究方向为自动驾驶, 强化学习, 可信AI, 决策与控制. E-mail: xiangkun.he@uestc.edu.cn

    赵洋:电子科技大学副研究员. 主要研究方向为模式识别与机器学习, 计算机视觉, 智能汽车. E-mail: yzhao@uestc.edu.cn

    房建武:西安交通大学研究员. 主要研究方向为交通场景理解与智能驾驶. E-mail: fangjianwu@mail.xjtu.edu.cn

    程洪:电子科技大学教授. 主要研究方向为模式识别与机器学习, 计算机视觉, 机器人. E-mail: hcheng@uestc.edu.cn

    吕辰:南洋理工大学副教授. 主要研究方向为自动驾驶, 机器人, 人机系统. 本文通信作者. E-mail: lyuchen@ntu.edu.sg

Toward Trustworthy Policy Optimization for Autonomous Driving: An Adversarial Robust Reinforcement Learning Approach

Funds: Supported by National Natural Science Foundation of China (W2411052, 62273057) and Talent Support Program of University of Electronic Science and Technology of China (A25007)
More Information
    Author Bio:

    HE Xiang-Kun Professor at University of Electronic Science and Technology of China. His research interest covers autonomous driving, reinforcement learning, trustworthy AI, and decision and control.

    ZHAO Yang Associate Professor at University of Electronic Science and Technology of China. His research interest covers pattern recognition and machine learning, computer vision, and intelligent vehicles.

    FANG Jian-Wu Professor at Xi'an Jiaotong University. His research interest covers traffic scene understanding and intelligent driving.

    CHENG Hong Professor at University of Electronic Science and Technology of China. His research interest covers pattern recognition and machine learning, computer vision, and robotics.

    LV Chen Associate Professor at Nanyang Technological University. His research interest covers autonomous driving, robotics, and human-machine systems. Corresponding author of this paper.

  • 摘要: 虽然强化学习近年来取得显著成功, 但策略鲁棒性仍然是其在安全攸关的自动驾驶领域部署的关键瓶颈之一. 一个根本性挑战在于, 许多现实世界中的自动驾驶任务面临难以预测的环境变化和不可避免的感知噪声, 这些不确定性因素可能导致系统执行次优的决策与控制, 甚至引发灾难性后果. 针对上述多源不确定性问题, 提出一种对抗鲁棒强化学习方法, 实现可信端到端控制策略优化. 首先, 构建一个可在线学习的对手模型, 用于同时逼近最坏情况下环境动态扰动与状态观测扰动. 其次, 基于零和博弈建模自动驾驶智能体与环境动态扰动之间的对抗性. 再次, 针对所模拟的多源不确定性, 提出鲁棒约束演员-评论家算法, 在连续动作空间下实现策略累计奖赏最大化的同时, 有效约束环境动态扰动与状态观测扰动对所学端到端控制策略的影响. 最后, 所提出的方案在不同的场景、交通流及扰动条件下进行评估, 并与三种代表性的方法进行对比分析, 验证了该方法在复杂工况和对抗环境中的有效性与鲁棒性.

    Abstract: Although reinforcement learning has achieved remarkable success in recent years, policy robustness remains a key bottleneck for its deployment in safety-critical autonomous driving. A fundamental challenge is that many real-world driving tasks face unpredictable environmental changes and unavoidable perception noise; these uncertainties can drive the system to suboptimal decisions and control actions and may even cause catastrophic consequences. To address such multi-source uncertainty, an adversarial robust reinforcement learning method is proposed for trustworthy end-to-end control policy optimization. First, an online-learnable adversary model is constructed to simultaneously approximate worst-case perturbations of the environment dynamics and the state observations. Second, the antagonism between the autonomous driving agent and the dynamics perturbation is modeled as a zero-sum game. Third, for the simulated multi-source uncertainty, a robust constrained actor-critic algorithm is proposed that maximizes the policy's cumulative reward in a continuous action space while effectively constraining the influence of the dynamics and observation perturbations on the learned end-to-end control policy. Finally, the proposed scheme is evaluated under different scenarios, traffic flows, and perturbation conditions and compared with three representative methods, verifying its effectiveness and robustness in complex operating conditions and adversarial environments.
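The training procedure summarized in the abstract can be read as a constrained zero-sum game between the driving policy and an adversary. In illustrative notation (the symbols below are chosen here and are not necessarily the paper's), the protagonist policy $\pi$ and the adversary $\mu$ solve

$$ \max_{\pi} \min_{\mu} \; \mathbb{E}\Big[\sum_{t}\gamma^{t} r(s_t, a_t)\Big] \quad \text{s.t.} \quad \mathbb{E}\big[c\big(\pi(s_t), \pi(\tilde{s}_t)\big)\big] \le d, $$

where $\tilde{s}_t$ is the observation perturbed by $\mu$, $c(\cdot,\cdot)$ measures how far the perturbation shifts the policy's action, and $d$ is the constraint upper bound. The following PyTorch fragment is a minimal sketch of one protagonist-adversary-dual update under this formulation; it covers only the state-observation perturbation branch, omits the critic's temporal-difference training, and all module names, the perturbation bound `EPS`, and the constraint threshold `D_MAX` are assumptions for illustration rather than the authors' implementation.

```python
# Illustrative sketch only: one adversarial robust constrained actor-critic style
# update in PyTorch. Network sizes, names, EPS, and D_MAX are assumed for
# demonstration; TD training of the critic and the dynamics-perturbation branch
# of the adversary are omitted.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 16, 2      # assumed observation/action dimensions
EPS = 0.05                    # assumed bound on the observation perturbation
D_MAX = 0.1                   # assumed upper bound of the robustness constraint

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh(), nn.Linear(64, out_dim))

policy = mlp(OBS_DIM, ACT_DIM)            # protagonist: end-to-end control policy
adversary = mlp(OBS_DIM, OBS_DIM)         # adversary: generates worst-case observation noise
critic = mlp(OBS_DIM + ACT_DIM, 1)        # action-value critic (Bellman updates not shown)
lam = torch.zeros(1, requires_grad=True)  # Lagrange multiplier of the constraint

opt_pi = torch.optim.Adam(policy.parameters(), lr=3e-4)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=3e-4)
opt_lam = torch.optim.Adam([lam], lr=1e-3)

def q_value(obs, act):
    return critic(torch.cat([obs, act], dim=-1))

for _ in range(3):                        # toy loop over random mini-batches
    obs = torch.randn(32, OBS_DIM)

    # 1) Adversary step (zero-sum game): choose a bounded perturbation that
    #    minimizes the protagonist's value, approximating the worst case.
    delta = EPS * torch.tanh(adversary(obs))
    adv_loss = q_value(obs + delta, policy(obs + delta)).mean()
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # 2) Robustness constraint: how much the perturbation shifts the policy's action.
    delta = EPS * torch.tanh(adversary(obs)).detach()
    shift = (policy(obs + delta) - policy(obs)).pow(2).mean()

    # 3) Protagonist step: maximize the value on clean observations while the
    #    Lagrangian term penalizes constraint violations.
    pi_loss = -q_value(obs, policy(obs)).mean() + lam.detach() * (shift - D_MAX)
    opt_pi.zero_grad()
    pi_loss.backward()
    opt_pi.step()

    # 4) Dual step: raise lam when the constraint is violated, lower it otherwise,
    #    then project back to lam >= 0.
    lam_loss = -lam * (shift.detach() - D_MAX)
    opt_lam.zero_grad()
    lam_loss.backward()
    opt_lam.step()
    with torch.no_grad():
        lam.clamp_(min=0.0)
```

In the full method described in the abstract, the adversary also perturbs the environment dynamics and the critic is trained with the usual Bellman targets; the sketch only illustrates how the zero-sum adversary step and the Lagrangian robustness constraint interact within one iteration.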
  • 图  1  面向可信自动驾驶策略优化的对抗鲁棒强化学习框架

    Fig.  1  Adversarial robust reinforcement learning framework for trustworthy autonomous driving policy optimization

    图  2  自动驾驶策略模型的测试评估方案

    Fig.  2  Evaluation scheme for autonomous driving policy models

    图  3  不同强化学习方法训练的策略模型性能对比((a)-(c)分别展示各模型在平均回报、成功率及行驶效率方面的性能)

    Fig.  3  Performance comparison of policy models trained with different reinforcement learning methods ((a)-(c) show the performance of each model in terms of average return, success rate, and travel efficiency, respectively)

    图  4  自动驾驶策略模型在不同交通流下的性能对比((a)-(c)分别展示各模型在成功率、碰撞率及行驶效率方面的性能)

    Fig.  4  Performance comparison of autonomous driving policy models under different traffic flow settings ((a)-(c) show the performance of each model in terms of success rate, collision rate, and travel efficiency, respectively)

    图  5  自动驾驶策略模型在不同交通环境和对抗条件下的性能对比((a)-(c)分别展示各模型在交叉口场景下的成功率、碰撞率及行驶效率方面的表现;(d)-(f)分别展示各模型在匝道汇入场景下的性能表现)

    Fig.  5  Performance comparison of autonomous driving policy models under different traffic environments and adversarial conditions ((a)-(c) respectively illustrate the success rate, collision rate, and driving efficiency of each model in the intersection scenario; (d)-(f) present the performance of each model in the ramp merging scenario)

    图  6  自动驾驶智能体在不同对抗攻击条件下的行驶轨迹

    Fig.  6  Driving trajectories of the autonomous driving agent under different adversarial attack conditions

    图  7  约束函数上界设定对模型性能的影响分析与对比

    Fig.  7  Analysis and comparison of the impact of constraint function upper bound settings on model performance

    表  1  自动驾驶智能体在不同交通流条件下模型测试统计结果

    Table  1  Statistical results of model testing of autonomous driving agents under different traffic flow conditions

    指标 (Metric) | PPO 交通流-1 | PPO 交通流-2 | SAC 交通流-1 | SAC 交通流-2 | NARRL 交通流-1 | NARRL 交通流-2 | ARRL 交通流-1 | ARRL 交通流-2
    平均回报 (Average return) | $ 36.41 \pm 2.69 $ | $ 37.58 \pm 2.34 $ | $ 35.93 \pm 1.37 $ | $ 35.33 \pm 1.54 $ | $ 33.27 \pm 6.11 $ | $ 34.40 \pm 4.74 $ | $ {\bf{38.90}} \pm {\bf{1.42}} $ | $ {\bf{39.09}} \pm {\bf{1.33}} $
    成功率 (Success rate) | $ 0.78 \pm 0.12 $ | $ 0.84 \pm 0.11 $ | $ 0.95 \pm 0.07 $ | $ 0.93 \pm 0.08 $ | $ 0.64 \pm 0.26 $ | $ 0.70 \pm 0.21 $ | $ {\bf{0.97}} \pm {\bf{0.05}} $ | $ {\bf{0.98}} \pm {\bf{0.04}} $
    碰撞率 (Collision rate) | $ 0.22 \pm 0.12 $ | $ 0.16 \pm 0.11 $ | $ 0.04 \pm 0.06 $ | $ 0.07 \pm 0.08 $ | $ 0.36 \pm 0.26 $ | $ 0.30 \pm 0.21 $ | $ {\bf{0.03}} \pm {\bf{0.05}} $ | $ {\bf{0.02}} \pm {\bf{0.04}} $
    行驶效率 (Driving efficiency) | $ 13.62 \pm 0.13 $ | $ 13.62 \pm 0.08 $ | $ 12.22 \pm 0.19 $ | $ 12.21 \pm 0.21 $ | $ {\bf{13.77}} \pm {\bf{0.17}} $ | $ {\bf{13.71}} \pm {\bf{0.10}} $ | $ 13.20 \pm 0.17 $ | $ 13.18 \pm 0.11 $

    表  2  不同攻击条件下自动驾驶智能体在交叉口场景中的测试统计结果

    Table  2  Statistical results of model testing of autonomous driving agents in the intersection scenario under different attack conditions

    指标 (Metric) | PPO 无攻击 | PPO 有攻击 | SAC 无攻击 | SAC 有攻击 | NARRL 无攻击 | NARRL 有攻击 | ARRL 无攻击 | ARRL 有攻击
    平均回报 (Average return) | $ 36.71 \pm 2.56 $ | $ 31.61 \pm 2.96 $ | $ 35.83 \pm 1.34 $ | $ 20.94 \pm 1.13 $ | $ 38.92 \pm 3.71 $ | $ 33.45 \pm 4.93 $ | $ {\bf{39.00}} \pm {\bf{1.45}} $ | $ {\bf{37.01}} \pm {\bf{1.92}} $
    成功率 (Success rate) | $ 0.80 \pm 0.12 $ | $ 0.35 \pm 0.13 $ | $ 0.95 \pm 0.07 $ | $ 0.14 \pm 0.12 $ | $ 0.83 \pm 0.15 $ | $ 0.65 \pm 0.21 $ | $ {\bf{0.98}} \pm {\bf{0.05}} $ | $ {\bf{0.88}} \pm {\bf{0.09}} $
    碰撞率 (Collision rate) | $ 0.20 \pm 0.12 $ | $ 0.65 \pm 0.13 $ | $ 0.05 \pm 0.07 $ | $ 0.55 \pm 0.17 $ | $ 0.17 \pm 0.15 $ | $ 0.35 \pm 0.21 $ | $ {\bf{0.02}} \pm {\bf{0.05}} $ | $ {\bf{0.12}} \pm {\bf{0.09}} $
    行驶效率 (Driving efficiency) | $ 13.61 \pm 0.09 $ | $ 13.61 \pm 0.15 $ | $ 12.23 \pm 0.19 $ | $ 7.98 \pm 0.41 $ | $ {\bf{14.30}} \pm {\bf{0.17}} $ | $ {\bf{13.71}} \pm {\bf{0.15}} $ | $ 13.21 \pm 0.12 $ | $ 13.21 \pm 0.11 $
    鲁棒性 (Robustness, 每种方法一个值/one value per method): PPO $ 0.17 \pm 0.00 $, SAC $ 0.36 \pm 0.04 $, NARRL $ 0.15 \pm 0.04 $, ARRL $ {\bf{0.04}} \pm {\bf{0.01}} $

    表  3  不同攻击条件下自动驾驶智能体在匝道汇入场景中的测试统计结果

    Table  3  Statistical results of model testing of autonomous driving agents in the ramp merging scenario under different attack conditions

    指标 (Metric) | PPO 无攻击 | PPO 有攻击 | SAC 无攻击 | SAC 有攻击 | NARRL 无攻击 | NARRL 有攻击 | ARRL 无攻击 | ARRL 有攻击
    平均回报 (Average return) | $ 14.34 \pm 1.32 $ | $ 16.42 \pm 1.56 $ | $ 31.80 \pm 4.38 $ | $ 21.25 \pm 3.13 $ | $ {\bf{38.48}} \pm {\bf{5.31}} $ | $ {\bf{38.39}} \pm {\bf{5.03}} $ | $ 37.30 \pm 3.55 $ | $ 37.87 \pm 3.72 $
    成功率 (Success rate) | $ 0.77 \pm 0.14 $ | $ 0.77 \pm 0.12 $ | $ 0.72 \pm 0.15 $ | $ 0.76 \pm 0.15 $ | $ 0.84 \pm 0.16 $ | $ 0.86 \pm 0.15 $ | $ {\bf{0.88}} \pm {\bf{0.11}} $ | $ {\bf{0.87}} \pm {\bf{0.11}} $
    碰撞率 (Collision rate) | $ 0.23 \pm 0.14 $ | $ 0.23 \pm 0.12 $ | $ 0.28 \pm 0.15 $ | $ 0.24 \pm 0.15 $ | $ 0.16 \pm 0.16 $ | $ 0.14 \pm 0.15 $ | $ {\bf{0.12}} \pm {\bf{0.11}} $ | $ {\bf{0.13}} \pm {\bf{0.11}} $
    行驶效率 (Driving efficiency) | $ 5.77 \pm 0.47 $ | $ 6.66 \pm 0.36 $ | $ 12.97 \pm 0.26 $ | $ 8.50 \pm 0.52 $ | $ {\bf{14.31}} \pm {\bf{0.26}} $ | $ {\bf{14.19}} \pm {\bf{0.26}} $ | $ 13.65 \pm 0.22 $ | $ 13.90 \pm 0.19 $
    鲁棒性 (Robustness, 每种方法一个值/one value per method): PPO $ 0.17 \pm 0.00 $, SAC $ 0.26 \pm 0.01 $, NARRL $ 0.13 \pm 0.06 $, ARRL $ {\bf{0.10}} \pm {\bf{0.02}} $
出版历程 (Publication history)
  • 网络出版日期 (Published online):  2025-09-30
