Collaborative Optimization of Multiple Operating Parameters for Industrial Processes Based on Multi-Agent Reinforcement Learning
Abstract: Process industries commonly exhibit strong coupling among multiple operating parameters, complex process topologies, and difficult multi-stage coordination, so conventional local optimization methods struggle to achieve globally optimal operation. To address these challenges, this paper proposes a process-topology-aware multi-agent reinforcement learning method, grounded in spectral graph theory, for the collaborative optimization of multiple operating parameters in process industries with complex topologies. First, a topology analysis framework based on Laplacian spectral analysis is built to characterize the coupling structure among operating parameters, which supports agent task allocation and coordinated decision-making. Second, a temporal perception module that combines long short-term memory (LSTM) networks with a multi-head attention mechanism extracts key temporal dependencies from historical state trajectories. Third, a multi-level spatial attention mechanism dynamically adapts the optimization focus across the organizational level, the variable level, and the continuous control domain. On this basis, a hierarchical reinforcement learning decision architecture with local-global coordination is constructed to achieve coordinated control and policy optimization among agents. Simulation experiments built on industrial data from a continuous stirred tank reactor (CSTR) system and a representative salt-lake chemical process validate the method. The results show that the proposed method improves performance by up to 41.2% over conventional methods, with better convergence and policy stability, offering a new approach and a viable technical pathway for the collaborative optimization of multiple operating parameters in process industries.

Keywords:
- Multiple operating parameter collaborative optimization
- Spectral graph theory-based multi-agent reinforcement learning
- Laplacian spectral decomposition
- Hierarchical attention mechanism
- Intelligent optimization for process industries
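The topology-analysis step named in the abstract can be illustrated with a minimal sketch. It is not the authors' framework: it assumes a small hypothetical coupling graph of six operating parameters, builds the unnormalized Laplacian $L = D - A$, and splits the parameters into two candidate agent groups by the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue). The adjacency weights and the function name `spectral_bipartition` are illustrative assumptions.

```python
import numpy as np


def spectral_bipartition(adjacency):
    """Split graph nodes into two groups using the sign of the Fiedler vector."""
    A = np.asarray(adjacency, dtype=float)
    degree = np.diag(A.sum(axis=1))      # D: diagonal matrix of weighted degrees
    laplacian = degree - A               # unnormalized Laplacian L = D - A
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]         # eigenvector of the second-smallest eigenvalue
    labels = (fiedler > 0).astype(int)
    if labels[0] == 1:                   # eigenvector sign is arbitrary; pin node 0 to group 0
        labels = 1 - labels
    return labels, eigenvalues


# Hypothetical coupling graph: two tightly coupled triples of parameters,
# joined by one weak coupling (weight 0.1) between parameters 2 and 3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
A[2, 3] = A[3, 2] = 0.1

labels, eigenvalues = spectral_bipartition(A)
print(labels.tolist())
print(np.round(eigenvalues[:2], 4))
```

In this toy graph the weak link is what the spectrum picks up: the second-smallest eigenvalue is close to zero, and the two triples fall into separate groups, which is the kind of structure an agent-allocation step could use.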
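Table 1 Hyperparameter configuration

| Component | Configuration |
|---|---|
| Actor network | Two hidden layers (400 and 300 units), ReLU activation |
| Critic network | Two hidden layers (400 and 300 units), ReLU activation |
| Graph network | Hidden dimension: 64; GCN layers: 2; recurrent dimension: 64 |
| Optimizer | Adam, learning rate: $3 \times 10^{-4}$ |
| Training configuration | 1,000,000 steps; replay buffer: 1,000,000 transitions |
| Discount factor $\gamma$ | 0.99 |
| Soft update coefficient $\tau$ | 0.005 |
| Batch size | 256 |
| Number of attention heads $M$ | 8 |
| History window length $H$ | CSTR: 24; salt lake: 16 |
| Evaluation setup | 10 random seeds; mean and standard deviation reported |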
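As a rough illustration of how the Table 1 settings might be carried in code, the sketch below collects them in a configuration object and shows one soft target update. It assumes the soft update coefficient $\tau$ is used in the usual DDPG-style Polyak form $\theta' \leftarrow \tau\theta + (1-\tau)\theta'$; the source does not spell this out, and the names `Hyperparameters`, `HISTORY_WINDOW`, and `soft_update` are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Hyperparameters:
    """Values copied from Table 1; the field names are illustrative."""
    actor_hidden: tuple = (400, 300)
    critic_hidden: tuple = (400, 300)
    gcn_hidden_dim: int = 64
    gcn_layers: int = 2
    recurrent_dim: int = 64
    learning_rate: float = 3e-4
    total_steps: int = 1_000_000
    replay_capacity: int = 1_000_000
    gamma: float = 0.99          # discount factor
    tau: float = 0.005           # soft update coefficient
    batch_size: int = 256
    attention_heads: int = 8
    num_seeds: int = 10


# History window length H depends on the process (Table 1).
HISTORY_WINDOW = {"CSTR": 24, "salt_lake": 16}


def soft_update(target_params, online_params, tau):
    """Polyak averaging (assumed form): target <- tau * online + (1 - tau) * target."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]


cfg = Hyperparameters()
target = [np.zeros(3)]           # stand-in for target-network weights
online = [np.ones(3)]            # stand-in for online-network weights
target = soft_update(target, online, cfg.tau)
print(target[0])
```

With $\tau = 0.005$ the target weights move only 0.5% of the way toward the online weights per update, which is the usual reason for choosing a small value: it keeps the bootstrapped targets slowly varying.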
Table 2 Algorithm performance comparison results

| Method | CSTR process | Salt-lake chemical process |
|---|---|---|
| DDPG | 358.8030 ± 30.2031 | 18.5837 ± 1.3508 |
| IDDPG | 233.8783 ± 29.5558 | 17.1829 ± 2.4319 |
| MADDPG | 206.1738 ± 69.6413 | 18.8161 ± 1.1507 |
| Proposed method | 506.5871 ± 25.8564 | 19.2423 ± 1.0250 |
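The relative gains implied by Table 2 can be checked with a short calculation. The sketch below assumes that higher values are better and compares the proposed method with the strongest baseline on each process; the helper name `gain_over_best_baseline` is hypothetical. On the CSTR process the strongest baseline is DDPG, and the gain works out to about 41.2%, which matches the figure quoted in the abstract.

```python
# Mean scores copied from Table 2 (standard deviations omitted).
results = {
    "CSTR": {"DDPG": 358.8030, "IDDPG": 233.8783, "MADDPG": 206.1738, "Proposed": 506.5871},
    "Salt lake": {"DDPG": 18.5837, "IDDPG": 17.1829, "MADDPG": 18.8161, "Proposed": 19.2423},
}


def gain_over_best_baseline(scores, proposed="Proposed"):
    """Return the best baseline and the proposed method's relative gain over it, in percent.

    Assumes a higher score is better.
    """
    baselines = {name: value for name, value in scores.items() if name != proposed}
    best_name = max(baselines, key=baselines.get)
    best_value = baselines[best_name]
    gain_percent = (scores[proposed] - best_value) / best_value * 100.0
    return best_name, gain_percent


cstr_best, cstr_gain = gain_over_best_baseline(results["CSTR"])
salt_best, salt_gain = gain_over_best_baseline(results["Salt lake"])
print(f"CSTR: {cstr_gain:.1f}% over {cstr_best}")
print(f"Salt lake: {salt_gain:.1f}% over {salt_best}")
```

The gap on the salt-lake process is much smaller (roughly 2% over MADDPG) and sits within one standard deviation of the baselines, so the headline figure is best read as the CSTR result.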