Optimal Output Regulation for Nonlinear Systems With Unknown Control Direction Based on Reinforcement Learning

Qi Jia-Xin, Meng Gui-Zhi

Citation: Qi Jia-Xin, Meng Gui-Zhi. Optimal output regulation for nonlinear systems with unknown control direction based on reinforcement learning. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c240323

doi: 10.16383/j.aas.c240323 cstr: 32138.14.j.aas.c240323
Funds: Supported by the Heilongjiang Province "Hundred, Thousand, Ten Thousand" Engineering Science and Technology Major Special Project (2020ZX10A03), the Heilongjiang Province Key Research and Development Program (open-competition scheme) (2022ZXJ01A02), and the Heilongjiang Province Foreign Expert Project (G2024034)
    Author Bio:

    QI Jia-Xin  Master student at the Faculty of Science, Harbin University of Science and Technology. His main research interest is nonlinear control. E-mail: qjxin999@163.com

    MENG Gui-Zhi  Professor at the Faculty of Science, Harbin University of Science and Technology. She received her Ph.D. degree in control science and engineering from Harbin Institute of Technology in 2013. Her research interests cover nonlinear control and optimization, and reinforcement learning. Corresponding author of this paper. E-mail: menggz13@163.com

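  • Abstract: This paper studies the reinforcement-learning-based finite-time optimal output regulation problem for a class of nonlinear systems with unknown control direction, unknown nonlinear functions, and external disturbances, driven by a linear neutrally stable exosystem. First, using the solvability conditions of the regulator equations and a coordinate transformation, the output regulation problem of the nonlinear system with unknown control direction is converted into a stabilization problem for an augmented system with known control gain. Next, radial basis function neural networks are used to approximate the unknown nonlinear functions, a high-gain neural-network adaptive observer with an internal model is designed to estimate the unmeasurable states, and a Nussbaum function is introduced to handle the unknown control direction. Then, a new adaptive internal model based on the neural-network observer and the Nussbaum function is designed, a cost function related to the internal model is proposed, and an approximately optimal algorithm based on the actor-critic networks of reinforcement learning is applied within backstepping so that the virtual controllers are optimal; dynamic surface control is combined with it to avoid the "explosion of complexity" problem in backstepping. Finally, the designed optimal adaptive finite-time output-feedback controller not only optimizes the proposed value function but also ensures that the closed-loop signals are semi-globally practically finite-time stable and that the tracking error stays within any desired accuracy. Numerical simulations verify the effectiveness of the proposed method.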
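Two of the building blocks named in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' design: the abstract does not say which Nussbaum function or which basis functions the paper uses, so the choice N(ζ) = ζ² cos(ζ) (a commonly used Nussbaum-type function), the Gaussian basis, and the names `nussbaum`, `rbf_features`, and `rbf_approx` are all assumptions made here.

```python
import math


def nussbaum(zeta: float) -> float:
    """Assumed Nussbaum-type function N(zeta) = zeta^2 * cos(zeta).

    Its sign keeps alternating with growing magnitude as zeta increases,
    which is the property used to cope with an unknown control direction.
    """
    return zeta * zeta * math.cos(zeta)


def rbf_features(x, centers, width):
    """Gaussian radial basis features exp(-||x - c||^2 / width^2), one per center c."""
    feats = []
    for c in centers:
        dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        feats.append(math.exp(-dist_sq / width ** 2))
    return feats


def rbf_approx(x, weights, centers, width):
    """Weighted sum W^T S(x), the usual RBF network form for approximating
    an unknown nonlinear function (weights are hypothetical here)."""
    return sum(w * s for w, s in zip(weights, rbf_features(x, centers, width)))
```

In an adaptive scheme the weights would be updated online by an adaptation law; here they are plain inputs, since the abstract gives no update rule.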
  • Fig. 1  Structure diagram of the optimal output regulation system based on reinforcement learning

    Fig. 2  Comparison of tracking error 1

    Fig. 3  Comparison of tracking error 2

    Fig. 4  States $\xi_1$ and $w_1$

    Fig. 5  $x_1$ and $\hat x_1$

    Fig. 6  Control law $u$

    Fig. 7  Tracking trajectory of relative angle $\delta (t)$

    Fig. 8  Comparison of tracking errors of relative angle $\delta (t)$

    Fig. 9  $x_1$ and $\hat x_1$

    Fig. 10  Control law $u$

Publication history
  • Received:  2024-06-04
  • Accepted:  2025-02-14
  • Published online:  2025-05-06
