Receding Horizon Reinforcement Learning Algorithm for Lateral Control of Intelligent Vehicles
摘要: 本文针对智能车辆的高精度侧向控制问题, 提出了一种基于滚动时域强化学习(Receding horizon reinforcement learning, RHRL)的侧向控制方法. 车辆的侧向控制量由前馈和反馈两部分构成, 前馈控制量由参考路径的曲率以及动力学模型直接计算得出; 而反馈控制量通过采用滚动时域强化学习算法求解最优跟踪控制问题得到. 本文提出的方法结合滚动时域优化机制, 将无限时域最优控制问题转化为若干有限时域控制问题进行求解. 与已有的有限时域执行器-评价器学习不同, 在每个预测时域采用时间独立型执行器-评价器网络结构学习最优值函数和控制策略. 与模型预测控制(Model predictive control, MPC)方法求解开环控制序列不同, RHRL控制器的输出是一个显式状态反馈控制律, 兼具直接离线部署和在线学习部署的能力. 此外, 本文从理论上证明了RHRL算法在每个预测时域的收敛性, 并分析了闭环系统的稳定性. 在仿真环境中完成了结构化道路下的车辆侧向控制测试, 仿真结果表明提出的RHRL方法在控制性能方面优于预瞄控制器和启发式动态规划算法, 在计算效率方面优于MPC; 与最近流行的软执行器-评价器(Soft actor-critic, SAC)算法和深度确定性策略梯度(Deep deterministic policy gradient, DDPG)算法相比控制性能更好, 且具有更低的样本复杂度和更高的学习效率. 最后, 以红旗E-HS3电动汽车作为实车平台, 在封闭结构化城市测试道路和乡村起伏砂石道路下进行了侧向控制实验. 实验结果显示, RHRL在结构化城市道路中的侧向控制性能优于预瞄控制, 在乡村道路中具有较强的路面适应能力和较好的控制性能.

Abstract: This paper presents a receding horizon reinforcement learning (RHRL) algorithm for realizing high-accuracy lateral control of intelligent vehicles. The overall lateral control is composed of a feedforward control term that is directly computed using the curvature of the reference path and the dynamic model, and a feedback control term that is generated by solving an optimal control problem using the proposed RHRL algorithm. The proposed RHRL adopts a receding horizon optimization mechanism, and decomposes the infinite-horizon optimal control problem into several finite-horizon ones to be solved. Different from existing finite-horizon actor-critic learning algorithms, in each prediction horizon of RHRL, a time-independent actor-critic structure is utilized to learn the optimal value function and control policy. Also, compared with model predictive control (MPC), the control learned by RHRL is an explicit state-feedback control policy, which can be deployed directly offline or learned and deployed synchronously online. Moreover, the convergence of the proposed RHRL algorithm in each prediction horizon is proven and the stability analysis of the closed-loop system is performed.
Simulation studies on a structured road show that the proposed RHRL algorithm performs better than the pure pursuit method and heuristic dynamic programming in terms of control performance, and better than MPC in terms of computational efficiency. Moreover, compared with recently developed deep reinforcement learning algorithms such as soft actor-critic (SAC) and deep deterministic policy gradient (DDPG), our approach exhibits better control performance, lower sample complexity, and higher learning efficiency. Experimental studies on an intelligent driving platform built with a Hongqi E-HS3 electric car show that RHRL performs better than the pure pursuit method in the adopted structured city-road scenario, and exhibits strong adaptability to road conditions and satisfactory control performance in the country-road scenario.
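The receding-horizon mechanism described in the abstract — learning a time-independent critic and actor over each finite prediction horizon, then applying the resulting explicit state-feedback law in receding-horizon fashion — can be sketched as follows. The error model, cost weights, horizon length, and the policy-iteration-style update below are illustrative assumptions for a linear-quadratic setting, not the paper's exact formulation.

```python
import numpy as np

# Hypothetical discrete-time lateral-error model, x = [e_y, e_phi]
# (lateral offset and heading error); matrices are assumptions.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
Q = np.diag([1.0, 0.5])     # state penalty
R = np.array([[0.1]])       # steering penalty
N = 20                      # prediction horizon length

def learn_policy_one_horizon(iters=30):
    """Actor-critic style policy iteration over one prediction horizon:
    critic V(x) = x' P x evaluates the finite-horizon cost of the current
    time-independent feedback gain K (the actor); the actor is then
    improved greedily with respect to the learned value function."""
    K = np.zeros((1, 2))                       # initial actor
    for _ in range(iters):
        # Critic: P = sum_{t=0}^{N-1} (A-BK)'^t (Q + K'RK) (A-BK)^t
        Ac = A - B @ K
        P = np.zeros((2, 2))
        M = np.eye(2)
        for _ in range(N):
            P += M.T @ (Q + K.T @ R @ K) @ M
            M = Ac @ M
        # Actor: greedy improvement w.r.t. the critic.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K

# Receding-horizon deployment: in each cycle, (re)learn the explicit
# state-feedback law and apply only its first action, as in MPC.
x = np.array([[0.5], [0.1]])                   # initial tracking errors
for _ in range(50):
    K = learn_policy_one_horizon()
    u = -K @ x                                 # explicit state feedback
    x = A @ x + B @ u

print(np.abs(x).max())                         # tracking errors shrink toward zero
```

Unlike MPC, which would re-solve for an open-loop control sequence at each step, the learned gain `K` here is an explicit feedback law, so it could also be cached and deployed offline — the property the abstract highlights.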
Key words:
- Receding horizon
- reinforcement learning
- intelligent vehicles
- lateral control
收稿日期 2021-06-20; 录用日期 2021-11-02
Manuscript received June 20, 2021; accepted November 2, 2021
国家重点研究发展计划 (2018YFB1305105), 国家自然科学基金 (62003361, 61825305) 资助
Supported by National Key R&D Program of China (2018YFB1305105) and National Natural Science Foundation of China (62003361, 61825305)
本文责任编委 Recommended by Associate Editor
1. 国防科技大学智能科学学院 长沙 410073
1. College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073
表 1 车辆动力学参数表
Table 1 The parameters of the vehicle dynamics
| 符号 | 物理意义 | 取值 | 单位 |
|------|----------|------|------|
| $m$ | 车身质量 | 1723 | kg |
| $I_z$ | 转动惯量 | 4175 | ${\rm{kg}}\cdot {\rm{m}}^2$ |
| $l_f$ | 质心到前轴距离 | 1.232 | m |
| $l_r$ | 质心到后轴距离 | 1.468 | m |
| $C_f$ | 前轮侧偏刚度 | 66900 | N/rad |
| $C_r$ | 后轮侧偏刚度 | 62700 | N/rad |

表 2 各控制器的均方根误差对比
Table 2 The root mean square error (RMSE) comparison among all the controllers.
| RMSE | 30 km/h: $e_y$ (m) | 30 km/h: $e_{\varphi}$ (rad) | 50 km/h: $e_y$ (m) | 50 km/h: $e_{\varphi}$ (rad) |
|------|------|------|------|------|
| RHRL | **0.156** | 0.03 | 0.246 | 0.02 |
| HDP | 0.165 | 0.03 | 0.315 | 0.019 |
| SAC | 0.189 | 0.029 | 0.283 | 0.017 |
| DDPG | 0.172 | 0.037 | 0.319 | 0.017 |
| MPC | 0.212 | 0.025 | 0.278 | 0.015 |
| 纯点预瞄 | 0.159 | 0.036 | 0.286 | 0.03 |
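The abstract states that the feedforward steering term is computed directly from the reference-path curvature and the dynamic model. One common such term — the steady-state bicycle-model feedforward with an understeer correction — can be sketched using the parameters from Table 1. The formula is a standard textbook choice, not necessarily the paper's exact expression.

```python
# Curvature-based feedforward steering sketch using Table 1 parameters.
# The steady-state bicycle-model formula below is an assumption for
# illustration; only the numeric parameters come from the paper.
m   = 1723.0       # vehicle mass [kg]
l_f = 1.232        # CoG-to-front-axle distance [m]
l_r = 1.468        # CoG-to-rear-axle distance [m]
C_f = 66900.0      # front-axle cornering stiffness [N/rad]
C_r = 62700.0      # rear-axle cornering stiffness [N/rad]
L   = l_f + l_r    # wheelbase [m]

def feedforward_steer(kappa, v):
    """Steady-state steering angle [rad] for path curvature kappa [1/m]
    at speed v [m/s]: kinematic term L*kappa plus an understeer
    correction K_v * a_y, with lateral acceleration a_y = v**2 * kappa."""
    K_v = (m / L) * (l_r / C_f - l_f / C_r)   # understeer gradient [s^2/m]
    return L * kappa + K_v * v**2 * kappa

# e.g. a 100 m radius curve taken at 50 km/h
delta = feedforward_steer(kappa=0.01, v=50 / 3.6)
print(round(delta, 4))   # ≈ 0.03 rad, dominated by the kinematic term L*kappa
```

Because the feedforward term cancels the steering demanded by the path geometry itself, the RHRL feedback term only has to regulate the residual tracking errors $e_y$ and $e_{\varphi}$ reported in Table 2.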
