Reinforcement Learning Control for Bioinspired Robots Driven by Redundant Artificial Muscles

NIU Peng-Jun; CHENG Yi-Tao; ZHU Yan-Chen; LI Kan; LIU Ke

doi:10.16383/j.aas.c250508

cstr: 32138.14.j.aas.c250508

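NIU Peng-Jun^1, CHENG Yi-Tao^1, ZHU Yan-Chen^2, LI Kan^2, LIU Ke^1

1. School of Advanced Manufacturing and Robotics, Peking University, Beijing 100871
2. State Key Laboratory of Intelligent Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074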

Funds: Supported by the National Key Research and Development Program of China (2022YFB4701900)

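Author Bio:
NIU Peng-Jun: Ph.D. candidate at the School of Advanced Manufacturing and Robotics, Peking University. He received his bachelor's degree from the School of Mechanical Engineering and Automation, Beihang University in 2025. His research interests include robot simulation and control. E-mail: pjniu25@stu.pku.edu.cn

CHENG Yi-Tao: Ph.D. candidate at the School of Advanced Manufacturing and Robotics, Peking University. He received his bachelor's degree from the College of Engineering, Peking University in 2023. His research interests include soft robots, robot perception and control, and human-machine interaction. E-mail: chengyitao@pku.edu.cn

ZHU Yan-Chen: Ph.D. candidate at the State Key Laboratory of Intelligent Manufacturing Equipment and Technology, Huazhong University of Science and Technology. He received his bachelor's degree from the School of Mechanical Engineering, Sichuan University in 2023. His research interests include flexible electronics and soft robotics. E-mail: yanchenshizhu@hust.edu.cn

LI Kan: Research fellow at the State Key Laboratory of Intelligent Manufacturing Equipment and Technology, Huazhong University of Science and Technology. He received his Ph.D. degree in theoretical and applied mechanics from Northwestern University, USA, in 2019. His research interests include three-dimensional flexible micro aerial vehicles and three-dimensional flexible and stretchable electronic devices. E-mail: kanli@hust.edu.cn

LIU Ke: Research fellow at the School of Advanced Manufacturing and Robotics, Peking University. He received his Ph.D. degree from Georgia Institute of Technology, USA, in 2019. His research interests include the design, analysis, and application of flexible structures and soft machines. Corresponding author of this paper. E-mail: liuke@pku.edu.cn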

Publication history
- Received: 2025-09-29
- Published online: 2026-04-24
- Issue date: 2026-05-20

Abstract: Artificial muscles are key actuation components of bioinspired robots. However, their current applications remain far from the capabilities of biological muscle systems, in particular lacking the redundant, coordinated multi-muscle actuation found in living organisms. To address the challenge of complex artificial-muscle actuation and coordination in bioinspired robots, this study proposes a soft robot design driven by multiple artificial muscles arranged in parallel and develops a reinforcement learning-based locomotion control strategy for this design. A prototype was built around a flexible cross-shaped printed circuit board that integrates six liquid crystal elastomer artificial muscles and their driving circuits, and its strain characteristics and response performance were characterized experimentally. Based on the deformation and locomotion characteristics of the prototype, a simplified tendon-driven model was constructed in a simulation environment. With suitably designed state space, action space, and reward functions, parallel reinforcement learning training was conducted using the soft actor-critic (SAC) algorithm to obtain coordinated muscle activation strategies for translational and rotational locomotion. The stable periodic segments of the learned locomotion policies were then extracted and used to drive the physical prototype offline. The robot achieved effective multidirectional translation and rotation, demonstrating the feasibility of using reinforcement learning to control complex artificial-muscle-driven systems.
- bionic robot
- artificial muscles
- reinforcement learning
- soft robot
- locomotion control
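
The abstract states that stable periodic segments of the learned policies are extracted and replayed offline on the prototype, but it does not say how the segments are identified. The following is a minimal Python sketch of one possible heuristic, not the authors' procedure: it compares the last two windows of a recorded action sequence and accepts the shortest window length at which they nearly coincide. The function names `mean_abs_diff`, `extract_stable_cycle`, and `replay`, the tolerance `tol`, and the period bounds are all hypothetical.

```python
from typing import List, Sequence

# One action = one activation value per artificial muscle (six in the prototype).
Action = Sequence[float]


def mean_abs_diff(a: Sequence[Action], b: Sequence[Action]) -> float:
    """Average absolute difference between two equally long action windows."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += abs(x - y)
            count += 1
    return total / count


def extract_stable_cycle(actions: Sequence[Action], min_period: int = 2,
                         max_period: int = 50, tol: float = 0.05) -> List[Action]:
    """Return one cycle taken from the tail of a rollout.

    Assumed heuristic: the smallest period p for which the last p steps and
    the p steps before them differ by less than `tol` on average is stable.
    """
    n = len(actions)
    for p in range(min_period, min(max_period, n // 2) + 1):
        last = actions[n - p:]
        prev = actions[n - 2 * p:n - p]
        if mean_abs_diff(last, prev) < tol:
            return [list(row) for row in last]
    raise ValueError("no stable periodic segment found")


def replay(cycle: Sequence[Action], repeats: int) -> List[Action]:
    """Unroll a cycle into an open-loop command sequence for offline driving."""
    return [list(row) for _ in range(repeats) for row in cycle]


# Synthetic rollout: a short transient followed by a repeating 4-step pattern.
pattern = [[1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0],
           [0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0]]
rollout = [[0.5] * 6 for _ in range(5)] + pattern * 3
cycle = extract_stable_cycle(rollout)
commands = replay(cycle, repeats=2)
```

In this synthetic case the recovered cycle is the 4-step pattern and `commands` holds two repetitions of it. A real rollout would likely need a tolerance tuned to the policy's action noise, and the replayed commands would still have to be mapped to the heating signals of the actual muscles.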

Fig. 1 The prototype

Fig. 2 Fabrication and encapsulation of artificial muscles

Fig. 3 Deformation characteristics

Fig. 4 Tension-compression mechanical performance testing

Fig. 5 Artificial muscle actuation response

Fig. 6 Component selection and circuit design

Fig. 7 Artificial muscle actuation scheme

Fig. 8 Simplified model structure

Fig. 9 Physical-simulation deformation correspondence

Fig. 10 SAC network structure

Fig. 11 Parallel training

Fig. 12 Reinforcement learning training results

Fig. 13 Offline translation along the x-axis

Fig. 14 Offline translation along the y-axis

Fig. 15 Offline rotation about the yaw axis
