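Online Reinforcement Learning Control Algorithm for Concentration of Thickener Underflow

Yuan Zhao-Lin, He Run-Zi, Yao Chao, Li Jia, Ban Xiao-Juan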

Citation: Yuan Zhao-Lin,  He Run-Zi,  Yao Chao,  Li Jia,  Ban Xiao-Juan.  Online reinforcement learning control algorithm for concentration of thickener underflow.  Acta Automatica Sinica,  2021,  47(7): 1558−1571 doi: 10.16383/j.aas.c190348

doi: 10.16383/j.aas.c190348
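Funds: Supported by the Finance Science and Technology Project of Hainan Province (ZDYF2019009), the National Key Research and Development Program of China (2019YFC0605300, 2016YFB0700500), and the National Natural Science Foundation of China (61572075, 61702036, 61873299)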
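    Author Bio:

    YUAN Zhao-Lin  Ph.D. candidate at the School of Computer and Communication Engineering, University of Science and Technology Beijing. He received his bachelor degree in computer science and technology from University of Science and Technology Beijing in 2017. His research interests cover adaptive dynamic programming and reinforcement learning. E-mail: b20170324@xs.ustb.edu.cn

    HE Run-Zi  Master student at the School of Computer and Communication Engineering, University of Science and Technology Beijing. She received her bachelor degree in computer science and technology from Beijing Information Science and Technology University in 2017. Her research interests cover fluid simulation and reinforcement learning. E-mail: hrz.claire@gmail.com

    YAO Chao  Assistant professor at University of Science and Technology Beijing. He received his bachelor degree in computer science from Beijing Jiaotong University in 2009 and his Ph.D. degree from the Institute of Information Science, Beijing Jiaotong University in 2016. From 2014 to 2015, he was a visiting Ph.D. student at the Ecole Polytechnique Federale de Lausanne, Switzerland. From 2016 to 2018, he was a postdoctoral researcher at the Institute of Sensing Technology and Business, Beijing University of Posts and Telecommunications. His research interests cover image and video processing and computer vision. E-mail: yaochao@ustb.edu.cn

    LI Jia  Master student at the School of Computer and Communication Engineering, University of Science and Technology Beijing. His research interests cover adaptive dynamic programming, adaptive control, and reinforcement learning. E-mail: lijia1117@foxmail.com

    BAN Xiao-Juan  Professor at University of Science and Technology Beijing and executive council member of the Chinese Association for Artificial Intelligence (CAAI). Her research interests cover artificial intelligence, natural human-computer interaction, and 3D visualization. Corresponding author of this paper. E-mail: banxj@ustb.edu.cn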

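  • Abstract:

    Control of complex industrial processes has long been a frontier problem in applied control research. The thickener, a large and complex piece of industrial equipment, is widely used in metallurgy, mining, and related fields. Because its operation is multivariable, nonlinear, and subject to long time delays, controlling the underflow concentration of a thickener has long been both a difficult and an active research topic in academia and industry. This paper proposes an online control algorithm for thickeners based on reinforcement learning. Building on the traditional heuristic dynamic programming (HDP) algorithm, it adopts a dual-network structure that combines a critic network with a model network, and introduces a short-term experience replay method to improve the training accuracy of the critic network. The algorithm achieves stable control of the underflow concentration while keeping the control inputs within the specified range. Finally, simulation experiments on a thickener verify the effectiveness of the algorithm; the results show that the proposed method outperforms the other algorithms in time consumption and control accuracy.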
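The abstract does not spell out how short-term experience replay is implemented. As a rough illustration only, the sketch below assumes it means keeping a small window of the most recent transitions and reusing all of them in each critic update. The linear critic, the names `ShortTermReplay` and `critic_update`, and the values of `gamma` and `lr` are hypothetical; the paper itself uses neural networks.

```python
from collections import deque


class ShortTermReplay:
    """Hypothetical buffer that keeps only the most recent transitions."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest item when full
        self.buffer = deque(maxlen=capacity)

    def add(self, state, cost, next_state):
        self.buffer.append((list(state), float(cost), list(next_state)))


def dot(w, s):
    """Inner product of two equal-length sequences."""
    return sum(wi * si for wi, si in zip(w, s))


def critic_update(w, replay, gamma=0.9, lr=0.01):
    """One semi-gradient TD step for a linear critic V(s) = w . s.

    The bootstrapped target is treated as a constant, and the gradient
    is averaged over every transition currently in the buffer.
    """
    grad = [0.0] * len(w)
    n = len(replay.buffer)
    for s, c, s_next in replay.buffer:
        target = c + gamma * dot(w, s_next)
        td_error = dot(w, s) - target
        for j, sj in enumerate(s):
            grad[j] += td_error * sj / n
    return [wj - lr * gj for wj, gj in zip(w, grad)]
```

Bounding the buffer keeps each update cheap and biased toward recent operating conditions, which is one plausible reason to prefer a short window in an online setting.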

    1)   $^1$ Mean Square Error (MSE) $= \frac{1}{T} \sum_{k=1}^{T} \left| y(k)-y^*(k) \right|^{2}$; $^2$ Max Absolute Error (MAE) $= \max_{1 \leq k \leq T} \left| y(k)-y^*(k) \right|$; $^3$ Integral Absolute Error (IAE) $= \frac{1}{T} \sum_{k=1}^{T} \left| y(k)-y^*(k) \right|$
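The three indices above can be computed directly from a logged output trajectory. The following is a minimal sketch; the function name `tracking_metrics` and the argument names are illustrative, with `y` standing for the measured sequence $y(k)$ and `y_ref` for the set point sequence $y^*(k)$.

```python
def tracking_metrics(y, y_ref):
    """Return MSE, MAE (max absolute error) and IAE as defined in the footnote."""
    if len(y) != len(y_ref) or len(y) == 0:
        raise ValueError("y and y_ref must be non-empty and of equal length")
    abs_err = [abs(a - b) for a, b in zip(y, y_ref)]
    T = len(abs_err)
    return {
        "MSE": sum(e * e for e in abs_err) / T,  # mean of squared errors
        "MAE": max(abs_err),                     # largest absolute error
        "IAE": sum(abs_err) / T,                 # mean of absolute errors
    }
```

Note that MAE here denotes the maximum absolute error, not the mean absolute error, and that IAE as defined carries a $1/T$ factor.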
  • Fig.  1  Illustration of the thickening process

    Fig.  2  Structure diagram of the HCNVI algorithm

    Fig.  3  Structure diagram of the artificial neural network

    Fig.  4  Visualization of the iterative gradient descent process

    Fig.  5  Effect of short-term experience replay on the output of the critic network

    Fig.  6  Noise input in the simulation experiment

    Fig.  7  HCNVI versus other ADP algorithms under constant noisy input

    Fig.  8  Influence of short-term experience replay on HDP and HCNVI

    Fig.  9  Comparison of time consumption between HDP and HCNVI in Experiment 1

    Fig.  10  Fluctuation of the noisy input

    Fig.  11  HCNVI versus other ADP algorithms under fluctuating noisy input

    Fig.  12  Influence of short-term experience replay on HCNVI under continuously varying noise

    Fig.  13  Comparison of time consumption between HDP and HCNVI in Experiment 2

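    Table  1  Variable definitions

    | Variable | Meaning | Unit | Initial value | Role |
    |---|---|---|---|---|
    | $f_{i}(t)$ | Feed pump frequency | ${\rm Hz}$ | 40 | Disturbance |
    | $f_{u}(t)$ | Underflow pump frequency | ${\rm Hz}$ | 85 | Control input |
    | $f_{f}(t)$ | Flocculant pump frequency | ${\rm Hz}$ | 40 | Control input |
    | $c_{i}(t)$ | Feed concentration | ${\rm kg/m^3}$ | 73 | Disturbance |
    | $h(t)$ | Mud layer height | ${\rm m}$ | 1.48 | State |
    | $c_u(t)$ | Underflow concentration | ${\rm kg/m^3}$ | 680 | Target |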

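    Table  2  Constants of the simulation model

    | Variable | Meaning | Unit | Reference value |
    |---|---|---|---|
    | $\rho_s$ | Dry sand density | ${\rm kg/m^3}$ | 4150 |
    | $\rho_e$ | Apparent density of the medium | ${\rm kg/m^3}$ | 1803 |
    | $\mu_e$ | Apparent viscosity of the suspension | ${\rm Pa \cdot s}$ | 1 |
    | $d_0$ | Feed particle diameter | ${\rm m}$ | 0.00008 |
    | $p$ | Average concentration coefficient | - | 0.5 |
    | $A$ | Cross-sectional area of the thickener | ${\rm m^2}$ | 300.5 |
    | $k_s$ | Flocculant action coefficient | ${\rm s/m^2}$ | 0.157 |
    | $k_i$ | Compression layer concentration coefficient | ${\rm m^3/s}$ | 0.0005×3600 |
    | $K_i$ | Coefficient relating feed flow to feed pump frequency | ${\rm m^3/r}$ | 50/3600 |
    | $K_u$ | Coefficient relating underflow flow to underflow pump frequency | ${\rm m^3/r}$ | 2/3600 |
    | $K_f$ | Coefficient relating flocculant flow to flocculant pump frequency | ${\rm m^3/r}$ | 0.75/3600 |
    | $\theta$ | Compression time | ${\rm s}$ | 2300 |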

    Table  3  Definitions of selected intermediate variables

    | Variable | Meaning | Formula |
    |---|---|---|
    | $q_i(t)$ | Feed flow rate | $q_i(t) = K_i f_i(t)$ |
    | $q_u(t)$ | Underflow flow rate | $q_u(t) = K_u f_u(t)$ |
    | $q_f(t)$ | Flocculant addition rate | $q_f(t) = K_f f_f(t)$ |
    | $d(t)$ | Particle diameter after flocculation | $d(t) = k_s q_f(t) + d_0$ |
    | $u_t(t)$ | Hindered settling velocity of the particles | $u_t(t) = \dfrac{d^2(t)\left(\rho_s - \rho_e\right) g}{18 \mu_e}$ |
    | $u_r(t)$ | Particle sinking velocity caused by the underflow | $u_r(t) = \dfrac{q_u(t)}{A}$ |
    | $c_l(t)$ | Solids content per unit volume at the mud layer height | $c_l(t) = k_i q_i(t) c_i(t)$ |
    | $c_a(t)$ | Solids content per unit volume within the mud layer interface | $c_a(t) = p \left[ c_l(t) + c_u(t) \right]$ |
    | $r(t)$ | Liquid-to-solid mass ratio in the mud layer | $r(t)=\rho_l\left(\dfrac{1}{c_a(t)}-\dfrac{1}{\rho_s}\right)$ |
    | $W(t)$ | Solid mass entering the thickener per unit time | $W(t) = c_i(t) q_i(t)$ |
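The formulas in Table 3 can be evaluated directly from the constants in Table 2. The sketch below does this for one operating point; it is an illustration under stated assumptions, not the authors' simulator. The function name `intermediate_variables` is hypothetical, the gravitational acceleration `G = 9.8` is assumed because it is not listed in Table 2, and $r(t)$ is left out because $\rho_l$ is not listed there either.

```python
G = 9.8  # ASSUMED gravitational acceleration (m/s^2); not given in Table 2

# Constants copied from Table 2
RHO_S = 4150.0             # dry sand density
RHO_E = 1803.0             # apparent density of the medium
MU_E = 1.0                 # apparent viscosity of the suspension
D_0 = 0.00008              # feed particle diameter
P = 0.5                    # average concentration coefficient
A = 300.5                  # cross-sectional area of the thickener
K_S = 0.157                # flocculant action coefficient
K_I_SMALL = 0.0005 * 3600  # k_i, compression layer concentration coefficient
K_I = 50 / 3600            # feed flow per unit feed pump frequency
K_U = 2 / 3600             # underflow flow per unit underflow pump frequency
K_F = 0.75 / 3600          # flocculant flow per unit flocculant pump frequency


def intermediate_variables(f_i, f_u, f_f, c_i, c_u):
    """Evaluate the Table 3 formulas (except r(t)) at one time instant."""
    q_i = K_I * f_i
    q_u = K_U * f_u
    q_f = K_F * f_f
    d = K_S * q_f + D_0
    return {
        "q_i": q_i,
        "q_u": q_u,
        "q_f": q_f,
        "d": d,
        "u_t": d ** 2 * (RHO_S - RHO_E) * G / (18 * MU_E),
        "u_r": q_u / A,
        "c_l": K_I_SMALL * q_i * c_i,
        "c_a": P * (K_I_SMALL * q_i * c_i + c_u),
        "W": c_i * q_i,
    }


# Evaluate at the initial values listed in Table 1
initial = intermediate_variables(f_i=40, f_u=85, f_f=40, c_i=73, c_u=680)
```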

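    Table  4  Performance analysis of different control algorithms

    | Algorithm | Exp. 1 MSE$^1$ | Exp. 1 MAE$^2$ | Exp. 1 IAE$^3$ | Exp. 2 MSE | Exp. 2 MAE | Exp. 2 IAE |
    |---|---|---|---|---|---|---|
    | HDP | 414.182 | 141.854 | 7.246 | 6105.619 | 275.075 | 54.952 |
    | DHP | 290.886 | 109.312 | 5.392 | 732.814 | 96.145 | 16.560 |
    | ILPL | 364.397 | 135.474 | 8.289 | 2473.661 | 211.615 | 35.222 |
    | HCNVI | 44.445 | 66.604 | 3.867 | 307.618 | 76.176 | 12.998 |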
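To read Table 4 as relative improvements rather than raw values, the error of HCNVI can be compared against each baseline. The sketch below is a small helper for that arithmetic; the names `TABLE_4` and `relative_reduction` are illustrative, and the numbers are transcribed from the table.

```python
METRICS = ("MSE", "MAE", "IAE")

# Values transcribed from Table 4: (MSE, MAE, IAE) per experiment
TABLE_4 = {
    "HDP":   {"exp1": (414.182, 141.854, 7.246), "exp2": (6105.619, 275.075, 54.952)},
    "DHP":   {"exp1": (290.886, 109.312, 5.392), "exp2": (732.814, 96.145, 16.560)},
    "ILPL":  {"exp1": (364.397, 135.474, 8.289), "exp2": (2473.661, 211.615, 35.222)},
    "HCNVI": {"exp1": (44.445, 66.604, 3.867),   "exp2": (307.618, 76.176, 12.998)},
}


def relative_reduction(baseline, experiment, method="HCNVI"):
    """Fraction by which `method` lowers each metric relative to `baseline`."""
    base = TABLE_4[baseline][experiment]
    ours = TABLE_4[method][experiment]
    return {m: 1.0 - o / b for m, o, b in zip(METRICS, ours, base)}


if __name__ == "__main__":
    for name in ("HDP", "DHP", "ILPL"):
        for exp in ("exp1", "exp2"):
            print(name, exp, relative_reduction(name, exp))
```

A positive value means HCNVI has the lower error for that metric; with the values in Table 4 this holds for every baseline in both experiments.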
Publication history
  • Received:  2019-05-10
  • Accepted:  2019-08-15
  • Revised:  2019-07-02
  • Published online:  2019-12-25
  • Issue date:  2021-07-27
