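Study on Hydraulic Support Alignment Strategy Based on Multi-agent Cooperative Game

YANG Yi^{1, 2, 3}, WANG Jing-Yi^{1, 2}, QIAN Wei^{1, 2}, WANG Tian^{4}

doi: 10.16383/j.aas.c250624 cstr: 32138.14.j.aas.c250624

1. School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo 454003
2. Henan Key Laboratory of Intelligent Inspection and Control of Coal Mine Equipment, Jiaozuo 454003
3. Zhengzhou Hengda Intelligent Control Technology Co., Ltd., Zhengzhou 450000
4. School of Artificial Intelligence, Beihang University, Beijing 100191

Funds: Supported by National Natural Science Foundation of China (92467108)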

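Author Bio:
YANG Yi  Associate professor at the School of Electrical Engineering and Automation, Henan Polytechnic University. He received his master's degree from Henan Polytechnic University in 2007 and his Ph.D. degree from Beihang University in 2017. His research interests include artificial intelligence, reinforcement learning, adaptive dynamic programming, and their applications in coal mines. Corresponding author of this paper. E-mail: yangyi@hpu.edu.cn

WANG Jing-Yi  Master's student at the School of Electrical Engineering and Automation, Henan Polytechnic University. Her main research interest is multi-agent reinforcement learning. E-mail: Anna_wangjy@outlook.com

QIAN Wei  Professor at the School of Electrical Engineering and Automation, Henan Polytechnic University. He received his master's degree from Southeast University in 2005 and his Ph.D. degree from the State Key Laboratory of Industrial Control Technology, Zhejiang University, in 2009. His research interests include time-delay systems, stochastic systems, networked control systems, and multi-agent systems. E-mail: qwei@hpu.edu.cn

WANG Tian  Professor at the School of Artificial Intelligence, Beihang University. He received his master's degree from Xi'an Jiaotong University in 2010 and his Ph.D. degree from Université de Technologie de Troyes, France, in 2014. His research interests include artificial intelligence, computer vision, and pattern recognition. E-mail: wangtian@buaa.edu.cn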

Publication history
- Received: 2025-11-11
- Accepted: 2026-05-13
- Published online: 2026-05-25

Abstract: During the alignment of hydraulic support groups in fully mechanized mining faces, the strong coupling among supports and the friction-slip nonlinearity of the hydraulic cylinders together form a typical "coupling-nonlinearity" dual complexity that makes the system difficult to model with traditional control methods. Existing multi-agent reinforcement learning methods support parallel decision-making, but they cannot precisely attribute the global reward to the actions of individual supports, and observed states alone are insufficient to capture how the posture of the support group evolves over time; both problems hinder policy convergence. To address them, this paper proposes CG-LSTM-MATD3, a multi-agent reinforcement learning algorithm that integrates cooperative game theory with long short-term memory (LSTM) networks. The algorithm models the alignment process as a decentralized partially observable Markov decision process, uses Shapley values to attribute the marginal contribution of each hydraulic support, and introduces a Coalition network that reduces computational complexity through coalition formation. In addition, LSTM modules are embedded in the Actor, Critic, and Coalition networks; by memorizing state sequences and related information, the model captures temporal dependencies among states and improves its perception of the environment dynamics and the true state. Experimental results show that, in the seven-support task, the algorithm improves linearity by 80.59% relative to the baseline algorithm, and ablation experiments further confirm the effectiveness of the cooperative game mechanism and the LSTM modules.

Key words: hydraulic support, multi-agent reinforcement learning, cooperative game, MATD3 algorithm, LSTM, Shapley value
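To make the credit-assignment idea in the abstract concrete, the following is a minimal sketch of an exact Shapley value computation. It is not the paper's implementation: the characteristic function `toy_value`, the three-support setup, and all function names are hypothetical and chosen only for illustration. The sketch averages each support's marginal contribution over every join order, which is the textbook definition of the Shapley value.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical characteristic function (an assumption, not from the paper).

    Each support contributes 1.0, and every adjacent pair (i, i + 1) that is
    fully inside the coalition earns an extra 0.5, mimicking neighbor coupling.
    """
    bonus = sum(0.5 for i in coalition if i + 1 in coalition)
    return len(coalition) + bonus


supports = [0, 1, 2]
phi = shapley_values(supports, toy_value)
for support, credit in phi.items():
    print(f"support {support}: {credit:.2f}")
```

In this toy game the middle support receives the largest credit because it takes part in both adjacent pairs, and the credits sum to the value of the full group. The loop visits n! orderings, so exact evaluation quickly becomes expensive as the number of supports grows; this cost is the kind of complexity that grouping agents into coalitions is meant to reduce.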

Fig. 1 Cooperative operation of hydraulic supports in fully mechanized coal mining face

Fig. 2 The framework of CG-LSTM-MATD3 algorithm

Fig. 3 Shapley value credit assignment

Fig. 4 Schematic diagram of LSTM integration into each network

Fig. 5 Performance curves with $5 \sim 10$ agents

Fig. 6 Curves of reward and linearity for 5 agents

Fig. 7 Curves of reward and linearity for 6 agents

Fig. 8 Curves of reward and linearity for 7 agents

Fig. 9 Performance curves of 7 agents under different noise scenes

Table 1 Algorithm hyperparameter settings

Parameter	MATD3	CG-MATD3	CG-LSTM-MATD3	DIPO
Actor learning rate	$3\times10^{-4}$	$3\times10^{-4}$	$3\times10^{-4}$	$3\times10^{-4}$
Critic learning rate	$3\times10^{-4}$	$3\times10^{-4}$	$3\times10^{-4}$	$3\times10^{-4}$
Coalition learning rate	–	$3\times10^{-4}$	$3\times10^{-4}$	–
Discount factor	0.99	0.99	0.99	0.99
Soft update parameter	$5\times10^{-3}$	$5\times10^{-3}$	$5\times10^{-3}$	$5\times10^{-3}$
Steps	10	10	10	10
Total training episodes	2000	2000	2000	2000
Replay buffer capacity	10000	10000	10000	10000
Batch size	256	256	256	256
Action gradient steps	–	–	–	4
Diffusion inference time steps	–	–	–	4
Action learning rate	–	–	–	0.04
Actor and Critic gradient clipping norm	–	–	–	1.0
Action gradient norm ratio	–	–	–	0.1
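As an illustration of how the shared settings in Table 1 might be carried in code, the sketch below collects the CG-LSTM-MATD3 column in a dataclass and shows the standard soft (Polyak) target update used by TD3-style algorithms with the listed soft update parameter. The class name, field names, and the flat list of weights are assumptions made for this example; the paper's actual code structure is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparams:
    """CG-LSTM-MATD3 column of Table 1 (field names are hypothetical)."""
    actor_lr: float = 3e-4
    critic_lr: float = 3e-4
    coalition_lr: float = 3e-4   # only the CG variants have a Coalition network
    gamma: float = 0.99          # discount factor
    tau: float = 5e-3            # soft update parameter
    steps: int = 10              # "Steps" in Table 1; exact meaning not stated here
    episodes: int = 2000
    buffer_capacity: int = 10000
    batch_size: int = 256


def soft_update(target, online, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


hp = Hyperparams()
target_weights = [0.0, 0.0]   # toy target-network weights (assumed)
online_weights = [1.0, -2.0]  # toy online-network weights (assumed)
for _ in range(100):
    target_weights = soft_update(target_weights, online_weights, hp.tau)
print(target_weights)
```

With a soft update parameter of $5\times10^{-3}$, the target weights move only a small fraction of the way toward the online weights at each update, which is the usual reason for choosing a small value: the target networks change slowly and the bootstrapped value targets stay stable.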


Table 2 Reward and linearity values under different scenes

Scene	Description	Reward	Linearity
Original	No disturbance	6.476	$-1.028$
Scene 1	Hydraulic support skew	6.180	$-1.188$
Scene 2	Sensor noise	5.930	$-1.534$
Scene 3	Coal seam undulation	6.416	$-0.985$
Scene 4	Mixed noise disturbance	5.895	$-1.683$


Table 3 Complementarity analysis of LSTM and experience replay modules

MATD3 configuration	Episodes to convergence	Average linearity	Linearity improvement	Average reward	Reward improvement
Experience replay	Did not converge to the optimum	$-5.301$	–	5.001	–
LSTM + sequential sampling	530	$-1.795$	66.14%	5.548	10.94%
LSTM + experience replay	350	$-1.497$	71.76%	6.320	26.37%


Table 4 Ablation experiment

CG	LSTM	Linearity Avg	Linearity Std	Linearity improvement	Reward Avg	Reward Std	Reward improvement	Runtime (ms)	GPU memory (MB)	Model parameters (KB)
$\times$	$\times$	$-5.301$	1.71	–	5.001	0.89	–	6.88	7.71	344.9
$\times$	$\checkmark$	$-1.497$	1.61	71.76%	6.320	0.47	26.37%	9.34	17.42	363.7
$\checkmark$	$\times$	$-1.675$	0.54	68.40%	5.428	0.53	8.54%	9.34	8.37	357.1
$\checkmark$	$\checkmark$	$-1.029$	0.43	80.59%	6.477	0.44	29.51%	9.34	19.73	378.9
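The improvement columns in Table 4 are consistent with a relative change measured against the plain MATD3 row, divided by the absolute value of that baseline. The short check below recomputes them under that assumption; the formula and the helper name `relative_improvement` are inferred from the reported numbers rather than stated in this section.

```python
def relative_improvement(baseline: float, value: float) -> float:
    """Percentage change of `value` over `baseline`, relative to |baseline|."""
    return (value - baseline) / abs(baseline) * 100.0


# Baseline row of Table 4 (no CG, no LSTM).
BASE_LINEARITY, BASE_REWARD = -5.301, 5.001

# (average linearity, average reward) for the remaining rows of Table 4.
configs = {
    "LSTM only": (-1.497, 6.320),
    "CG only": (-1.675, 5.428),
    "CG + LSTM": (-1.029, 6.477),
}

improvements = {}
for name, (linearity, reward) in configs.items():
    improvements[name] = (
        round(relative_improvement(BASE_LINEARITY, linearity), 2),
        round(relative_improvement(BASE_REWARD, reward), 2),
    )
    print(f"{name}: linearity +{improvements[name][0]:.2f}%, "
          f"reward +{improvements[name][1]:.2f}%")
```

For the full CG + LSTM configuration this gives (5.301 - 1.029) / 5.301, about 80.59%, for linearity and (6.477 - 5.001) / 5.001, about 29.51%, for reward, matching the table.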

