Study on Hydraulic Support Alignment Strategy Based on Multi-agent Cooperative Game

Yang Yi, Wang Jing-Yi, Qian Wei, Wang Tian

Citation: Yang Yi, Wang Jing-Yi, Qian Wei, Wang Tian. Study on hydraulic support alignment strategy based on multi-agent cooperative game. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c250624

doi: 10.16383/j.aas.c250624 cstr: 32138.14.j.aas.c250624
Funds: Supported by the National Natural Science Foundation of China (92467108)

    Author Bio:

    YANG Yi  Associate professor at the School of Electrical Engineering and Automation, Henan Polytechnic University. He received his master's degree from Henan Polytechnic University in 2007 and his Ph.D. degree from Beihang University in 2017. His research interests include artificial intelligence, reinforcement learning, adaptive dynamic programming, and their applications in coal mines. Corresponding author of this paper. E-mail: yangyi@hpu.edu.cn

    WANG Jing-Yi  Master's student at the School of Electrical Engineering and Automation, Henan Polytechnic University. Her main research interest is multi-agent reinforcement learning. E-mail: Anna_wangjy@outlook.com

    QIAN Wei  Professor at the School of Electrical Engineering and Automation, Henan Polytechnic University. He received his master's degree from Southeast University in 2005 and his Ph.D. degree from the State Key Laboratory of Industrial Control Technology, Zhejiang University, in 2009. His research interests include time-delay systems, stochastic systems, networked control systems, and multi-agent systems. E-mail: qwei@hpu.edu.cn

    WANG Tian  Professor at the School of Artificial Intelligence, Beihang University. He received his master's degree from Xi'an Jiaotong University in 2010 and his Ph.D. degree from Université de Technologie de Troyes, France, in 2014. His research interests include artificial intelligence, computer vision, and pattern recognition. E-mail: wangtian@buaa.edu.cn

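  • Abstract: During the alignment of hydraulic support groups in a fully mechanized mining face, the strong coupling among supports and the friction-slip nonlinearity of the hydraulic cylinders form a typical "coupling-nonlinearity" dual complexity, which makes effective modeling difficult for traditional control methods. Existing multi-agent reinforcement learning methods enable parallel decision-making, but the global reward cannot be accurately attributed to the actions of individual supports, and observed states alone are insufficient to capture the temporal evolution of the support group's posture; both problems hinder effective policy convergence. To address this, a multi-agent reinforcement learning algorithm, CG-LSTM-MATD3, is proposed that integrates cooperative game theory with long short-term memory (LSTM) networks. The algorithm models the alignment process as a decentralized partially observable Markov decision process, introduces the Shapley value to attribute each hydraulic support's marginal contribution fairly, and designs a Coalition network that reduces computational complexity through coalition generation. In addition, LSTM modules are embedded in the Actor, Critic, and Coalition networks; by memorizing state sequences and related information, the model captures temporal dependencies among states and better perceives the environment's dynamics and true state. Experimental results show that, in the seven-support task, the algorithm improves linearity by 80.59% relative to the baseline algorithm, and ablation experiments further confirm the effectiveness of the cooperative game mechanism and the LSTM module.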
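    To make the credit-assignment idea in the abstract concrete, the following minimal sketch computes exact Shapley values for a small cooperative game. The characteristic function `toy_value` and the three support indices are hypothetical illustrations, not the paper's reward or its Coalition network. Exact enumeration visits every subset of the other agents, so its cost grows exponentially with the number of agents, which is the cost the abstract says coalition generation is meant to reduce.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a characteristic function `value`.

    `value` maps a frozenset of players to a real number and is
    expected to return 0 for the empty coalition.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # k = size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # marginal contribution of p to coalition s
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical game: each support adds 1, and every pair of
    adjacent supports in the coalition earns a bonus of 1."""
    members = sorted(coalition)
    bonus = sum(1 for a, b in zip(members, members[1:]) if b - a == 1)
    return float(len(members) + bonus)


supports = [0, 1, 2]
phi = shapley_values(supports, toy_value)
```

    In this toy game the middle support receives the largest share because it takes part in both adjacent pairs, and the shares sum to the value of the full coalition, which is the efficiency property that makes the Shapley value a natural way to split a global reward.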
  • Fig. 1  Cooperative operation of hydraulic supports in a fully mechanized coal mining face

    Fig. 2  Framework of the CG-LSTM-MATD3 algorithm

    Fig. 3  Shapley value credit assignment

    Fig. 4  Schematic diagram of LSTM integration into each network

    Fig. 5  Performance curves with $ 5\sim10 $ agents

    Fig. 6  Curves of reward and linearity for 5 agents

    Fig. 7  Curves of reward and linearity for 6 agents

    Fig. 8  Curves of reward and linearity for 7 agents

    Fig. 9  Performance curves of 7 agents under different noise scenes

    Table 1  Algorithm hyperparameter settings

    Parameter                                 MATD3                CG-MATD3             CG-LSTM-MATD3        DIPO
    Actor learning rate                       $ 3\times10^{-4} $   $ 3\times10^{-4} $   $ 3\times10^{-4} $   $ 3\times10^{-4} $
    Critic learning rate                      $ 3\times10^{-4} $   $ 3\times10^{-4} $   $ 3\times10^{-4} $   $ 3\times10^{-4} $
    Coalition learning rate                   -                    $ 3\times10^{-4} $   $ 3\times10^{-4} $   -
    Discount factor                           0.99                 0.99                 0.99                 0.99
    Soft update parameter                     $ 5\times10^{-3} $   $ 5\times10^{-3} $   $ 5\times10^{-3} $   $ 5\times10^{-3} $
    Number of steps                           10                   10                   10                   10
    Total training episodes                   2000                 2000                 2000                 2000
    Replay buffer capacity                    10000                10000                10000                10000
    Batch size                                256                  256                  256                  256
    Action gradient steps                     -                    -                    -                    4
    Diffusion inference timesteps             -                    -                    -                    4
    Action learning rate                      -                    -                    -                    0.04
    Actor and Critic gradient clipping norm   -                    -                    -                    1.0
    Action gradient norm ratio                -                    -                    -                    0.1
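    The shared settings in Table 1 can be gathered into a small configuration object, and the soft update parameter can be read as the coefficient of the Polyak averaging commonly used for target networks in TD3-style methods. This is a sketch under assumptions: the class name `SharedHyperparams`, its field names, and the `soft_update` helper are hypothetical, and plain Python lists stand in for network weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedHyperparams:
    """Values common to all four algorithms in Table 1 (names are illustrative)."""
    actor_lr: float = 3e-4
    critic_lr: float = 3e-4
    gamma: float = 0.99           # discount factor
    tau: float = 5e-3             # soft update parameter
    steps: int = 10               # "number of steps" row of Table 1
    total_episodes: int = 2000
    buffer_capacity: int = 10000
    batch_size: int = 256


def soft_update(target, online, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


cfg = SharedHyperparams()
target_weights = [0.0, 1.0]
online_weights = [1.0, 1.0]
new_target = soft_update(target_weights, online_weights, cfg.tau)
```

    With a soft update parameter of $ 5\times10^{-3} $, each update moves a target weight only 0.5% of the way toward its online counterpart, so the target networks change slowly and the bootstrapped value targets stay comparatively stable.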

    Table 2  Reward and linearity values under different scenes

    Scene      Description               Reward   Linearity
    Original   No disturbance            6.476    $ -1.028 $
    Scene 1    Hydraulic support skew    6.180    $ -1.188 $
    Scene 2    Sensor noise              5.930    $ -1.534 $
    Scene 3    Coal seam undulation      6.416    $ -0.985 $
    Scene 4    Mixed noise disturbance   5.895    $ -1.683 $

    Table 3  Complementarity analysis of the LSTM and experience replay modules

    MATD3 configuration          Episodes to convergence           Mean linearity   Linearity $ \uparrow $   Mean reward   Reward $ \uparrow $
    Experience replay            Did not converge to the optimum   $ -5.301 $       -                        5.001         -
    LSTM + sequential sampling   530                               $ -1.795 $       66.14%                   5.548         10.94%
    LSTM + experience replay     350                               $ -1.497 $       71.76%                   6.320         26.37%

    Table 4  Ablation experiment

    CG               LSTM             Linearity avg   Linearity std   Linearity improvement   Reward avg   Reward std   Reward improvement   Runtime (ms)   GPU usage (MB)   Model parameters (KB)
    $ \times $       $ \times $       $ -5.301 $      1.71            -                       5.001        0.89         -                    6.88           7.71             344.9
    $ \times $       $ \checkmark $   $ -1.497 $      1.61            71.76%                  6.320        0.47         26.37%               9.34           17.42            363.7
    $ \checkmark $   $ \times $       $ -1.675 $      0.54            68.40%                  5.428        0.53         8.54%                9.34           8.37             357.1
    $ \checkmark $   $ \checkmark $   $ -1.029 $      0.43            80.59%                  6.477        0.44         29.51%               9.34           19.73            378.9
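    The improvement columns in Table 4 appear to be relative changes with respect to the first row, where neither CG nor LSTM is enabled. The short check below assumes that the improvement is the change divided by the magnitude of the baseline value; the function name `relative_improvement` is hypothetical, and the numbers are copied from the table.

```python
def relative_improvement(value, baseline):
    """Relative change with respect to the baseline magnitude, in percent."""
    return (value - baseline) / abs(baseline) * 100.0


# Baseline row of Table 4: neither CG nor LSTM
baseline_linearity, baseline_reward = -5.301, 5.001

# (CG enabled, LSTM enabled) -> (mean linearity, mean reward)
rows = {
    (False, True): (-1.497, 6.320),
    (True, False): (-1.675, 5.428),
    (True, True): (-1.029, 6.477),
}

improvements = {
    key: (
        round(relative_improvement(lin, baseline_linearity), 2),
        round(relative_improvement(rew, baseline_reward), 2),
    )
    for key, (lin, rew) in rows.items()
}
```

    Under this assumption the full model evaluates to 80.59% for linearity and 29.51% for reward, and the other two rows likewise reproduce the tabulated percentages, which supports reading the 80.59% figure in the abstract as a gain over plain MATD3.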
Publication History
  • Received: 2025-11-11
  • Accepted: 2026-05-13
  • Published online: 2026-05-25
