2022 Impact Factor (CJCR): 2.765

Indexed in:
  • Chinese Core Journals
  • EI
  • China Science and Technology Core Journals
  • Scopus
  • CSCD
  • INSPEC (Science Abstracts)


Intelligent Optimal Tracking With Application Verifications via Discounted Generalized Value Iteration

Wang Ding, Zhao Ming-Ming, Ha Ming-Ming, Qiao Jun-Fei

Citation: Wang Ding, Zhao Ming-Ming, Ha Ming-Ming, Qiao Jun-Fei. Intelligent optimal tracking with application verifications via discounted generalized value iteration. Acta Automatica Sinica, 2022, 48(1): 182−193. doi: 10.16383/j.aas.c210658

doi: 10.16383/j.aas.c210658


Funds: Supported by Beijing Natural Science Foundation (JQ19013), National Natural Science Foundation of China (61773373, 61890930-5, 62021003), Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0112302, 2021ZD0112301), and National Key Research and Development Program of China (2018YFC1900800-5)
More Information
    Author Bio:

    WANG Ding Professor at the Faculty of Information Technology, Beijing University of Technology. He received his master's degree in operations research and cybernetics from Northeastern University in 2009 and his Ph.D. degree in control theory and control engineering from the Institute of Automation, Chinese Academy of Sciences, in 2012. His research interest covers reinforcement learning and intelligent control. Corresponding author of this paper

    ZHAO Ming-Ming Master student at the Faculty of Information Technology, Beijing University of Technology. His research interest covers reinforcement learning and intelligent control

    HA Ming-Ming Ph.D. candidate at the School of Automation and Electrical Engineering, University of Science and Technology Beijing. He received his bachelor and master degrees from the School of Automation and Electrical Engineering, University of Science and Technology Beijing, in 2016 and 2019, respectively. His research interest covers optimal control, adaptive dynamic programming, and reinforcement learning

    QIAO Jun-Fei Professor at the Faculty of Information Technology, Beijing University of Technology. His research interest covers intelligent control of wastewater treatment processes, and structure design and optimization of neural networks

  • Abstract: An intelligent algorithm based on discounted generalized value iteration is designed to solve the optimal tracking control problem for a class of complex nonlinear systems. By selecting a suitable initial value, the cost function in the value iteration process converges to the optimal cost function in a monotonically decreasing manner. Based on the monotonically decreasing value iteration algorithm, the admissibility of the iterative tracking control law and the asymptotic stability of the error system are discussed under different discount factors. To facilitate implementation, a data-driven model network is established to learn the system dynamics, while a critic network and an action network are constructed to approximate the iterative cost function and to compute the iterative tracking control law, respectively. Notably, a novel stopping criterion is proposed to guarantee the effectiveness of the iterative tracking control law. The criterion consists of two conditions: one ensures the admissibility of the iterative tracking control law, which facilitates assessing the asymptotic stability of the error system; the other ensures the near-optimality of the tracking control law. Finally, the feasibility and effectiveness of the proposed near-optimal tracking control method are verified through two application examples, including wastewater treatment.
    1)  Manuscript received July 15, 2021; accepted November 2, 2021. Recommended by Associate Editor LIU Yan-Jun.
    2)  Affiliations: 1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124; 2. Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124; 3. Beijing Institute of Artificial Intelligence, Beijing 100124; 4. Beijing Laboratory of Smart Environmental Protection, Beijing 100124; 5. School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083.
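The discounted generalized value iteration described in the abstract can be illustrated numerically. Below is a minimal sketch on a hypothetical one-dimensional error system with a quadratic utility; the dynamics, grids, weights, and stopping tolerance are all illustrative assumptions, not the paper's benchmark examples (which include a wastewater treatment process):

```python
import numpy as np

# Minimal sketch of discounted generalized value iteration on a
# discretized, hypothetical one-dimensional error system.
gamma = 0.95                             # discount factor
errors = np.linspace(-1.0, 1.0, 41)      # tracking-error grid
controls = np.linspace(-1.0, 1.0, 41)    # control grid

def f(e, u):
    """Assumed error dynamics e_{k+1} = f(e_k, u_k)."""
    return 0.8 * np.sin(e) + 0.5 * u

def U(e, u):
    """Quadratic utility with Q = R = 1."""
    return e ** 2 + u ** 2

def nearest(e):
    """Index of the grid point closest to a successor error."""
    return int(np.abs(errors - np.clip(e, -1.0, 1.0)).argmin())

# Generalized value iteration admits a nonzero initial cost function; a
# large quadratic initialization V_0(e) = 40 e^2 makes the iteration
# monotonically nonincreasing, in the spirit of the paper's setting.
V = 40.0 * errors ** 2

for _ in range(500):
    # Bellman backup: V_{i+1}(e) = min_u { U(e, u) + gamma * V_i(f(e, u)) }
    V_new = np.array([min(U(e, u) + gamma * V[nearest(f(e, u))]
                          for u in controls) for e in errors])
    if np.max(np.abs(V_new - V)) < 1e-8:   # simple convergence test
        V = V_new
        break
    V = V_new

print(round(float(V[nearest(0.5)]), 4))    # converged cost at e = 0.5
```

In the paper this backup is carried out by neural networks (a model network for the dynamics, plus critic and action networks) rather than on a grid, and the stopping criterion additionally checks admissibility of the iterative control law.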
  • Fig.  1  The training errors of the model network

    Fig.  2  The convergence process of the cost function

    Fig.  3  The curves of the discount factor and $ \Psi_i $

    Fig.  4  The convergence process of the norm of weight matrices

    Fig.  5  Trajectories of the state and the control law

    Fig.  6  Trajectories of the error and the tracking control law

    Fig.  7  The simple structure of the wastewater treatment process

    Fig.  8  The training errors of the model network

    Fig.  9  The convergence process of the cost function

    Fig.  10  The curves of the discount factor and $ \Psi_i $

    Fig.  11  The convergence process of the norm of weight matrices

    Fig.  12  Trajectories of the state and the control law

    Fig.  13  Trajectories of the error and the tracking control law

    Fig.  14  Trajectories of the state and the control law with the disturbance input

    Table  1  Parameter values of tracking control based on the generalized value iteration algorithm

    Symbol      $Q$          $R$          $\Lambda$    $\gamma$
    Example 1   $I_2$        $0.5I_2$     $40I_2$      0.97
    Example 2   $0.01I_2$    $0.01I_2$    $I_2$        0.98
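The weights in Table 1 enter the scheme through quadratic forms. Below is a minimal sketch evaluating them with the Example 1 values; treating $ \Lambda $ as the weight of the initial cost function (as is common in generalized value iteration) is an assumption made here for illustration:

```python
import numpy as np

# Quadratic forms built from the Table 1 (Example 1) weights. The roles
# assumed here -- Q and R in the utility, Lambda in the initial cost --
# are illustrative assumptions.
Q = np.eye(2)             # tracking-error weight
R = 0.5 * np.eye(2)       # control weight
Lam = 40.0 * np.eye(2)    # assumed initial cost-function weight

e = np.array([0.2, -0.1])   # sample tracking error
u = np.array([0.1, 0.3])    # sample tracking control

utility = e @ Q @ e + u @ R @ u   # U(e, u) = e'Qe + u'Ru
initial_cost = e @ Lam @ e        # Psi_0(e) = e'(Lambda)e

print(round(float(utility), 4), round(float(initial_cost), 4))
```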
Publication History
  • Received: July 15, 2021
  • Accepted: November 2, 2021
  • Available online: November 10, 2021
  • Published in issue: January 25, 2022
