Scalable Dynamic Event-triggered Optimal Control for Nonlinear Systems via Integral Reinforcement Learning

Wang Ke, Xu Zhen-Yu, Zhang Jun-Nan, Mu Chao-Xu

Citation: Wang Ke, Xu Zhen-Yu, Zhang Jun-Nan, Mu Chao-Xu. Scalable dynamic event-triggered optimal control for nonlinear systems via integral reinforcement learning. Acta Automatica Sinica, xxxx, xx(x): x−xx. doi: 10.16383/j.aas.c250620

doi: 10.16383/j.aas.c250620 cstr: 32138.14.j.aas.c250620
Funds: Supported by the National Natural Science Foundation of China (62503356, 62333016) and the China Higher Education Institution Industry-University-Research Innovation Fund (2024ZY009)
More Information
    Author Bio:

    WANG Ke  Research Associate at the School of Electrical and Information Engineering, Tianjin University. His research interests include reinforcement learning and adaptive dynamic programming, and game intelligence and its applications. E-mail: walker_wang@tju.edu.cn

    XU Zhen-Yu  Ph.D. candidate at the School of Electrical and Information Engineering, Tianjin University. His research interests include reinforcement learning and optimal decision-making of unmanned systems. E-mail: zhenyuxu@tju.edu.cn

    ZHANG Jun-Nan  Ph.D. candidate at the School of Electrical and Information Engineering, Tianjin University. His research interests include multi-agent systems and reinforcement learning, and robot motion planning. E-mail: zjn8018@tju.edu.cn

    MU Chao-Xu  Professor at the School of Electrical and Information Engineering, Tianjin University. Her research interests include adaptive learning systems, intelligent unmanned system optimization and control, smart grids, and multi-robot collaborative manufacturing. Corresponding author of this paper. E-mail: cxmu@tju.edu.cn

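  • Abstract: Event-triggered mechanisms, and dynamic event-triggered mechanisms in particular, have attracted wide attention in the control community in recent years. Their central challenge is to balance control performance against the use of communication resources. When such a mechanism is combined with a learning system, this balance becomes especially critical, because learning efficiency must be taken into account as well. For nonlinear continuous-time systems with unknown dynamics, this paper proposes a new dynamic event-triggered optimal control method that integrates integral reinforcement learning, optimal control, and a learning-aware design. The method uses a critic-only adaptive structure to learn the optimal control policy online, and regulates data transmission through a flexibly configurable dynamic triggering rule. Its core innovation is a learning-aware dynamic event-triggered mechanism, which analyzes the historical trend of the critic network weights to construct a learning-aware parameter and then adaptively adjusts the dynamic threshold parameter in the triggering rule. As a result, the system adopts "busy sampling" during critical learning phases to preserve control and learning accuracy, and switches to "idle sampling" during steady learning phases to save communication and computation resources, achieving an effective balance among control performance, learning efficiency, and resource consumption. Theoretical analysis rigorously proves the asymptotic stability of the closed-loop system and the uniform ultimate boundedness of the weight errors. Finally, simulations and comparative experiments are carried out on a benchmark nonlinear system and a one-link manipulator system. The results show that, compared with conventional static and dynamic event-triggered methods, the proposed method achieves comparable or even better learning and control performance at a lower communication cost.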
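    1)  Note that, where no confusion arises, the time variable $ t $ is omitted in subsequent expressions.
    2)  Where no confusion arises, the terms "event-based", "event-driven", and "event-triggered" are used interchangeably in this paper.
    3)  For compactness, the shorthand $ \nabla\phi\triangleq\partial\phi(x)/\partial x $ and $ \nabla\varepsilon\triangleq\partial\varepsilon(x)/\partial x $ is used hereafter.
    4)  In this paper, an event mechanism is called static if its triggering threshold function depends only on time or system state information. In a dynamic triggering mechanism, the threshold function involves not only state information but also auxiliary variables that evolve dynamically.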
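    The learning-aware idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's mechanism: the drift measure (mean norm of consecutive critic-weight changes over a sliding window), the tolerance `tol`, the window length, and the relative trigger rule are all assumptions made for illustration. Only the two threshold levels echo the bounds $ \underline{\sigma} $ and $ \bar{\sigma} $ listed in Table 1, and the direction of the switch follows Table 2, where a smaller $ \sigma $ produces more events.

```python
import math
from collections import deque


def l2(v):
    """Euclidean norm of a sequence of floats."""
    return math.sqrt(sum(c * c for c in v))


class LearningAwareThreshold:
    """Hypothetical switch between a small ('busy') and a large ('idle') threshold."""

    def __init__(self, sigma_low=0.5, sigma_high=0.9, tol=0.01, window=5):
        self.sigma_low = sigma_low      # busy sampling: events fire more often
        self.sigma_high = sigma_high    # idle sampling: events fire less often
        self.tol = tol                  # assumed tolerance on the weight drift
        self.history = deque(maxlen=window)

    def update(self, weights):
        """Record the latest critic weights and return the threshold to use."""
        self.history.append(list(weights))
        if len(self.history) < 2:
            return self.sigma_low       # no trend yet, so sample conservatively
        snapshots = list(self.history)
        steps = [l2([a - b for a, b in zip(new, old)])
                 for old, new in zip(snapshots, snapshots[1:])]
        drift = sum(steps) / len(steps)  # assumed learning-aware parameter
        return self.sigma_low if drift > self.tol else self.sigma_high


def should_trigger(x, x_last, sigma):
    """Generic relative rule (an assumption): fire when ||x_last - x||^2 > sigma * ||x||^2."""
    gap = [a - b for a, b in zip(x_last, x)]
    return l2(gap) ** 2 > sigma * l2(x) ** 2
```

    With `x = [1.0, 0.0]` and `x_last = [1.8, 0.0]`, the squared gap is about 0.64, so the rule fires under the busy threshold 0.5 but not under the idle threshold 0.9. In this sketch the selector returns to the idle threshold only once every weight snapshot in the window has stopped changing by more than `tol` on average.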
  • Fig. 1  A brief schematic of the proposed LDETM, including the learning component, the event-based control policy, the learning-aware module, and the dynamic threshold parameters

    Fig. 2  Triggering rate versus weight approximation error in SETM

    Fig. 3  Learning and triggering results of LDETM$_S$: (a) Convergence of critic weights, (b) Dynamic threshold parameter $\sigma(t)$, (c) Switching parameter $\alpha$, (d) Triggering process

    Fig. 4  Learning and triggering results of LDETM$_T$: (a) Evolution of system states, (b) Dynamic threshold parameter $\sigma(t)$, (c) Switching parameter $\alpha$, (d) Recorded events

    Fig. 5  Value function approximation errors under different ETMs, plotted over $x_1,\;x_2\in[-2,\;2]$

    Fig. 6  Triggering intervals of different ETMs: (a) SETM$_{\sigma=0.1}$, (b) LDETM$_S$, (c) LDETM$_T$, (d) IDETM

    Fig. 7  Learning and triggering results of LDETM$_T$ on the one-link manipulator: (a) Evolution of system states, (b) Convergence of critic weights, (c) Three threshold parameters

    Fig. 8  Comparison of approximate value functions for $ x_1,\;x_2,\;x_3\in[-2,\;2] $: (a) $ \hat{V}^*(x_1,\;x_2,\;0) $, (b) $ \hat{V}^*(x_1,\;0,\;x_3) $, (c) $ \hat{V}^*(0,\;x_2,\;x_3) $

    Table 1  Main simulation parameters of the two examples

    Parameter                  Example 1    Example 2
    $ \alpha_c $               0.15         10
    $ {\cal{L}}_{u} $          4.5          4.5
    $ \sigma_1(0) $            0.45         0.45
    $ \sigma_2(0) $            0.5          0.5
    $ \underline{\sigma} $     0.5          0.5
    $ \bar{\sigma} $           0.9          0.9
    $ \xi $                    25           25
    $ \vartheta $              0.01         0.01

    Table 2  Comparative results of different ETMs in Example 1

    Method                    NoRE ($ \downarrow $)    TR ($ \downarrow $)      EoCW ($ \downarrow $)    EoVF ($ \times 10^{-2} $, $ \downarrow $)
    SETM$ _{\sigma=0.1} $     2,104                    42.08%                   $ {\bf{0.0011}} $        $ {\bf{[-0.27,\;0.51]}} $
    SETM$ _{\sigma=0.5} $     1,587                    31.56%                   0.0055                   [-1.99, 1.96]
    SETM$ _{\sigma=0.9} $     $ {\bf{1{,}332}} $       $ {\bf{26.64\%}} $       0.0119                   [-2.76, 5.57]
    LDETM$ _S $               1,564                    31.28%                   0.0033                   [-0.78, 1.26]
    LDETM$ _T $               1,479                    29.58%                   0.0043                   [-0.98, 1.71]
    IDETM                     1,497                    29.94%                   0.0089                   [-4.19, 2.80]

    Table 3  Comparative results of different ETMs in Example 2

    Method                    NoRE ($ \downarrow $)    TR ($ \downarrow $)      EoCW ($ \downarrow $)
    SETM$ _{\sigma=0.5} $     1,412                    23.53%                   0.0296
    LDETM$ _S $               1,446                    24.10%                   $ {\bf{0.0207}} $
    LDETM$ _T $               $ {\bf{1{,}298}} $       21.63%                   0.0213
    IDETM                     1,335                    22.25%                   0.0301
Publication history
  • Received: 2025-11-11
  • Accepted: 2026-01-30
  • Published online: 2026-05-22