Bionic Intelligent Grasping Method of Space Manipulators Based on Progressive Imitation-Reinforcement Learning (PIRL)

Li Lian-Peng, Guo Hang, Li Ming-Yang, Zhang Hai-Bo, Xu Shuan-Feng, Zhang Dong-Hao

Li Lian-Peng, Guo Hang, Li Ming-Yang, Zhang Hai-Bo, Xu Shuan-Feng, Zhang Dong-Hao. Bionic intelligent grasping method of space manipulators based on progressive imitation-reinforcement learning. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c250686

doi: 10.16383/j.aas.c250686 cstr: 32138.14.j.aas.c250686
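Funds: Supported by the National Natural Science Foundation of China (62406032), the Beijing Natural Science Foundation (4242036), and the Fund of the National Key Laboratory of Space Intelligent Control (HTKJ2025KL502016, 2025-JCJQ-LB-065)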
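More Information
    Author Bio:

    LI Lian-Peng: Associate professor at the School of Automation, Beijing Information Science and Technology University. His research interests include multi-agent cooperative control and robot intelligent control. E-mail: llp@bistu.edu.cn

    GUO Hang: Master student at the School of Automation, Beijing Information Science and Technology University. His main research interest is robot safety control. E-mail: 2024020508@bistu.edu.cn

    LI Ming-Yang: Senior engineer at Beijing Institute of Control Engineering. His main research interest is intelligent manipulation control of robots. Corresponding author of this paper. E-mail: lmy_hit@163.com

    ZHANG Hai-Bo: Researcher at Beijing Institute of Control Engineering. His main research interest is space intelligent manipulation control. E-mail: zhanghb502@163.com

    XU Shuan-Feng: Senior engineer at Beijing Institute of Control Engineering. His main research interest is intelligent manipulation of space robots. E-mail: xushuanfeng2003@163.com

    ZHANG Dong-Hao: Associate professor at the School of Automation, Beijing Information Science and Technology University. His research interests include robot intelligent control and embodied intelligence. E-mail: Donghaozhang@bistu.edu.cn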

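  • Abstract: Space manipulators that autonomously grasp floating targets in microgravity face three difficulties: training samples are hard to obtain, learned policies generalize poorly, and they adapt poorly to dynamic disturbances. To address these problems, this paper proposes a progressive imitation-reinforcement learning (PIRL) method that incorporates bionic intelligence. First, expert demonstrations of coordinated human arm-hand manipulation are collected through teleoperation, and a multilayer perceptron (MLP) initial grasping policy is trained on them by behavior cloning, which completes the bionic grasping training stage. Then, the initial model is embedded in a high-fidelity space manipulation simulation environment built on Genesis, and the grasping policy is fine-tuned online with a proximal policy optimization algorithm for space grasping (PPO-GS). A superimposed action space and a staged reward mechanism let expert prior knowledge and autonomous environment exploration be optimized jointly, which mitigates both the distribution shift of imitation learning and the sample-efficiency bottleneck of reinforcement learning. Experimental results show that under random perturbations of the target pose, the proposed method reaches a grasping success rate of 89.5%, 14.5% higher than MLP imitation learning, and markedly improves the policy's robustness and adaptability in complex space scenarios with target pose deviations. The method offers a new technical solution for autonomous grasping of floating targets by space manipulators in microgravity.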
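The abstract names a staged reward mechanism, but this section does not define its stages, thresholds, or weights. As a minimal sketch only, assuming three hypothetical stages (approach, grasp, hold), a distance-shaped approach term, and made-up bonus values, a stage-switched reward could be written as follows; `staged_reward`, `StepInfo`, and every constant are illustrative names, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import Tuple

# Hypothetical stage indices; the paper's actual stages are not given here.
APPROACH, GRASP, HOLD = 0, 1, 2


@dataclass
class StepInfo:
    ee_pos: tuple        # end-effector position (x, y, z) in metres
    target_pos: tuple    # grasp target position (x, y, z) in metres
    gripper_closed: bool
    in_contact: bool     # whether the hand currently touches the target


def staged_reward(stage: int, info: StepInfo,
                  grasp_radius: float = 0.03) -> Tuple[float, int]:
    """Return (reward, next_stage) for one step; all weights are illustrative."""
    dist = math.dist(info.ee_pos, info.target_pos)
    if stage == APPROACH:
        reward = -dist                      # dense shaping: move closer to the target
        if dist < grasp_radius:
            return reward + 1.0, GRASP      # bonus for entering the grasp zone
        return reward, APPROACH
    if stage == GRASP:
        if info.gripper_closed and info.in_contact:
            return 5.0, HOLD                # bonus for a closed, contacting grasp
        return -0.1 - dist, GRASP           # small time penalty while grasping
    # HOLD stage: reward keeping contact; losing it sends the agent back to APPROACH
    if info.in_contact:
        return 1.0, HOLD
    return -5.0, APPROACH
```

Keeping the stage index inside the environment state is one way it could also be exposed to the policy as the 1-D stage encoding listed among the PPO observations in Table 2.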
  • Fig. 1  Framework of the PIRL method

    Fig. 2  Robot arm mapper

    Fig. 3  Hand keypoint definition

    Fig. 4  Joint mapping chain

    Fig. 5  Execution and training flowchart of the PPO-GS algorithm

    Fig. 6  Simulation system setup

    Fig. 7  Dexterous hand mapping

    Fig. 8  Teleoperation system platform (a)

    Fig. 9  Teleoperation system platform (b)

    Fig. 10  Typical manipulation samples generated by the simulation system

    Fig. 11  Training process of the imitation learning model

    Fig. 12  Tool grasping inference test results

    Fig. 13  Failure cases in tool grasping tests

    Fig. 14  Average training reward versus iteration

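    Table 1  MLP network structure

    Layer | Dimension/count | Description
    Input layer | 15 | Current end-effector pose of the manipulator (7-D) + current tool pose (7-D) + dexterous hand state (1-D)
    Hidden layer 1 | 128 | Feature extraction layer; processes the 15-D input and compresses redundant information
    Hidden layer 2 | 64 | Feature refinement layer; further extracts key manipulation features
    Output layer | 8 | Desired end-effector pose of the manipulator (7-D) + desired dexterous hand motion state (1-D)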
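Table 1 fixes only the layer widths. The pure-Python sketch below instantiates a 15-128-64-8 MLP and the mean-squared-error objective commonly used for behavior cloning; the ReLU activation, the uniform initialization, and the names `BCPolicy` and `bc_loss` are assumptions, and a practical version would train the weights with an autodiff framework such as the PyTorch listed in Table 3 rather than evaluate them in plain Python.

```python
import random


def init_layer(n_in: int, n_out: int, rng: random.Random):
    """Return (weights, biases) with small uniform random weights and zero biases."""
    bound = 1.0 / n_in ** 0.5
    w = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


def linear(x, layer):
    """Fully connected layer: y = W x + b."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x):
    return [max(0.0, v) for v in x]


class BCPolicy:
    """15 -> 128 -> 64 -> 8 MLP following Table 1 (ReLU is an assumed activation)."""

    def __init__(self, seed: int = 0):
        rng = random.Random(seed)
        self.layers = [init_layer(15, 128, rng),
                       init_layer(128, 64, rng),
                       init_layer(64, 8, rng)]

    def forward(self, obs):
        assert len(obs) == 15, "obs = ee pose (7) + tool pose (7) + hand state (1)"
        h = relu(linear(obs, self.layers[0]))
        h = relu(linear(h, self.layers[1]))
        return linear(h, self.layers[2])   # desired ee pose (7) + hand command (1)


def bc_loss(policy, batch):
    """Behavior-cloning objective: mean squared error to the expert actions."""
    total = 0.0
    for obs, expert_action in batch:
        pred = policy.forward(obs)
        total += sum((p - a) ** 2 for p, a in zip(pred, expert_action)) / len(pred)
    return total / len(batch)
```

Each demonstration pair couples a 15-D observation recorded during teleoperation with the 8-D action the operator commanded; minimizing `bc_loss` over such pairs is what behavior cloning amounts to in this sketch.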

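    Table 2  PPO network structure

    Layer | Dimension/count | Description
    Input layer | 15 | Observation vector: grasp target pose (3+4), end-effector pose (3+4), stage encoding (1)
    Actor hidden layer 1 | 128 | Fully connected layer, ELU activation
    Actor hidden layer 2 | 64 | Fully connected layer, ELU activation
    Actor output layer | 7 | Relative incremental action: $\Delta$position (3), $\Delta$Euler angles (3), grasp command (1)
    Critic hidden layer 1 | 128 | Fully connected layer, ELU activation
    Critic hidden layer 2 | 64 | Fully connected layer, ELU activation
    Critic output layer | 1 | Scalar state-value estimate $V(s)$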
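Table 2 describes the actor output as a relative increment, and the abstract calls the action space superimposed. One reading, sketched here under explicit assumptions, is that the PPO increment is added on top of the pose proposed by the imitation policy: positions add directly, the Euler-angle increment is converted to a quaternion and pre-multiplied (a world-frame rotation), and the grasp command is thresholded at zero. The (w, x, y, z) quaternion order, the ZYX Euler convention, the per-step bounds `max_dpos`/`max_drot`, and the function `superimpose` are all hypothetical choices, not taken from the paper.

```python
import math


def euler_to_quat(roll, pitch, yaw):
    """ZYX (yaw-pitch-roll) Euler angles to a unit quaternion (w, x, y, z)."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (cr * cp * cy + sr * sp * sy,
            sr * cp * cy - cr * sp * sy,
            cr * sp * cy + sr * cp * sy,
            cr * cp * sy - sr * sp * cy)


def quat_mul(a, b):
    """Hamilton product a * b, both quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2)


def clip(v, lo, hi):
    return max(lo, min(hi, v))


def superimpose(prior_pose, delta, max_dpos=0.02, max_drot=0.1):
    """Add a PPO increment to the imitation prior (hypothetical scheme).

    prior_pose: (x, y, z, qw, qx, qy, qz) proposed by the BC policy
    delta:      (dx, dy, dz, droll, dpitch, dyaw, grasp) from the PPO actor
    max_dpos / max_drot: assumed per-step bounds in metres / radians
    """
    pos = [p + clip(d, -max_dpos, max_dpos) for p, d in zip(prior_pose[:3], delta[:3])]
    drot = [clip(d, -max_drot, max_drot) for d in delta[3:6]]
    q = quat_mul(euler_to_quat(*drot), prior_pose[3:7])   # world-frame increment
    norm = math.sqrt(sum(c * c for c in q))
    q = tuple(c / norm for c in q)                         # re-normalize
    close_hand = delta[6] > 0.0        # grasp command thresholded at zero (assumption)
    return (*pos, *q), close_hand
```

Bounding the increment keeps the reinforcement-learning correction small relative to the expert prior, which is one plausible way such a superimposed action space limits early exploration away from demonstrated behavior.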

    Table 3  Experimental platform software and hardware configuration

    Item | Specification
    CPU | Core i5-10200H
    GPU | NVIDIA GTX 1650 Ti
    GPU memory | 4 GB
    System memory | 16 GB
    Operating system | Ubuntu 22.04
    Python | 3.10.12
    PyTorch | 2.4.0
    CUDA | 12.4
    Simulation engine | Genesis 0.3.4

    Table 4  Simulated physical parameters

    Parameter | Value
    Simulation time step $dt$ | $50\,\mathrm{ms}$
    Gravitational acceleration $g$ | $(0,\;0,\;0)\,\mathrm{m/s^2}$
    Simulation substeps $n$ | $2$
    Tool mass $m$ | $0.8\,\mathrm{kg}$
    Constraint solver $S$ | Newton method
    Collision and joint limits $F$ | Enabled
    Workspace radius $R$ | $0.5\,\mathrm{m}$
    Z-axis height limit $Z$ | $[0.1,\;0.6]\,\mathrm{m}$
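Table 4 lists a workspace radius and a Z-axis height window but not how they are enforced. A minimal sketch, assuming that $R$ bounds the horizontal distance from a manipulator base at the origin (a cylindrical workspace) and that out-of-range commands are projected back rather than rejected, is shown below; it also assumes the two substeps split each 50 ms step evenly. The constant and function names are hypothetical.

```python
import math

DT = 0.05                  # simulation time step dt (s), Table 4
SUBSTEPS = 2               # simulation substeps n, Table 4
R_MAX = 0.5                # workspace radius R (m), Table 4
Z_MIN, Z_MAX = 0.1, 0.6    # Z-axis height limits (m), Table 4


def clamp_to_workspace(x, y, z):
    """Project a commanded position into the workspace.

    Assumes (hypothetically) that R bounds the horizontal distance from the
    manipulator base at the origin, i.e. the workspace is a vertical cylinder.
    """
    r = math.hypot(x, y)
    if r > R_MAX:
        scale = R_MAX / r              # shrink radially onto the boundary
        x, y = x * scale, y * scale
    return x, y, min(max(z, Z_MIN), Z_MAX)


# Physics substep if the 50 ms step is split evenly into n substeps
substep_dt = DT / SUBSTEPS             # 0.025 s
```

For example, a command at (0.6, 0.8, 1.0) m lies 1.0 m from the base and above the height window, so it is pulled back to (0.3, 0.4, 0.6) m.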

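    Table 5  Dexterous hand mapping error parameters

    Error item | Value
    Effective camera field of view $L$ | $0.3\,\mathrm{m}$
    Horizontal image resolution $W$ | $640$ pixels
    Keypoint detection error $p$ | $5$ pixels
    Inverse kinematics joint-angle residual $\theta$ | $0.05\,\mathrm{rad}$
    Image acquisition latency $t_1$ | about $33\,\mathrm{ms}$
    Model inference latency $t_2$ | about $15\,\mathrm{ms}$
    Communication latency $t_3$ | about $5\,\mathrm{ms}$
    Total system latency $\Delta t$ | about $50\,\mathrm{ms}$
    Average hand motion speed $v_{hand}$ | $0.1\,\mathrm{m/s}$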
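Table 5 gives the ingredients of the hand-mapping error but not the formula that combines them. One plausible back-of-the-envelope budget, assuming the 0.3 m field of view maps linearly onto the 640 horizontal pixels and that the latency error equals the distance the hand travels during $\Delta t$, is sketched below. Both ways of combining the two terms (linear worst case and root-sum-square) are illustrative; the paper's own combination is not stated in this section.

```python
import math

L_FOV = 0.3                        # effective camera field of view L (m)
W_PX = 640                         # horizontal image resolution W (pixels)
P_ERR = 5                          # keypoint detection error p (pixels)
T1, T2, T3 = 0.033, 0.015, 0.005   # acquisition, inference, communication latency (s)
V_HAND = 0.1                       # average hand speed v_hand (m/s)
DELTA_T = 0.05                     # total system latency as rounded in Table 5 (s)

metres_per_px = L_FOV / W_PX       # assumed linear pixel-to-metre scale
e_detect = P_ERR * metres_per_px   # position error caused by keypoint noise
e_lag = V_HAND * DELTA_T           # hand displacement during the system latency
latency_sum = T1 + T2 + T3         # sum of the listed delays before rounding

# Two common ways to combine independent error terms
e_worst = e_detect + e_lag         # worst case: errors add linearly
e_rss = math.hypot(e_detect, e_lag)  # root-sum-square for independent errors

print(f"detection {e_detect * 1e3:.2f} mm, lag {e_lag * 1e3:.2f} mm, "
      f"worst {e_worst * 1e3:.2f} mm, rss {e_rss * 1e3:.2f} mm, "
      f"latency {latency_sum * 1e3:.0f} ms")
```

Under these assumptions keypoint noise contributes about 2.3 mm and latency about 5 mm of position error, and the three listed delays sum to 53 ms, which the table rounds to about 50 ms.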

    Table 6  Time performance indicators

    Metric | Value | Calculation
    Single simulation step time $dt$ | $0.05\,\mathrm{s}$ | $dt=0.05\,\mathrm{s}$, substeps $=2$
    Single action execution time $T_{act}$ | $1.25\,\mathrm{s}$ | $25$ physics steps $\times\,0.05\,\mathrm{s}$/step
    Average episode duration $T_{epi}$ | $8.75\sim22.5\,\mathrm{s}$ | $7\sim18$ steps $\times\,1.25\,\mathrm{s}$/step
    Training time per iteration $T_{train}$ | about $37.5\,\mathrm{s}$ | $30$ steps $\times\,1.25\,\mathrm{s}$
    Total training time $T$ | about $10\,\mathrm{h}$ | $1\,000\times 37.5\,\mathrm{s}$
    Online inference latency $\Delta t$ | $<5\,\mathrm{ms}$ | MLP forward pass $+$ PPO forward pass
    Control frequency $f$ | $20\,\mathrm{Hz}$ | $1/0.05\,\mathrm{s}$; meets the real-time requirement
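The derived rows of Table 6 are simple products of the listed quantities. The sketch below recomputes them from the table's own inputs (25 physics steps per action, 7 to 18 actions per episode, 30 actions per training iteration, 1 000 iterations); the variable names are illustrative. It makes visible that the exact total training time is about 10.4 h, which the table reports as about 10 h.

```python
DT = 0.05                    # single simulation step dt (s)
PHYS_STEPS_PER_ACTION = 25   # physics steps executed per policy action
EPISODE_STEPS = (7, 18)      # range of policy steps per episode
STEPS_PER_ITER = 30          # policy steps per training iteration
ITERATIONS = 1000            # number of training iterations

t_act = PHYS_STEPS_PER_ACTION * DT                      # 1.25 s per action
t_epi = tuple(n * t_act for n in EPISODE_STEPS)         # 8.75 s to 22.5 s per episode
t_train = STEPS_PER_ITER * t_act                        # 37.5 s per iteration
t_total_h = ITERATIONS * t_train / 3600                 # about 10.4 h in total
control_hz = 1 / DT                                     # 20 Hz control loop

print(f"T_act={t_act:.2f} s, T_epi={t_epi[0]:.2f}-{t_epi[1]:.2f} s, "
      f"T_train={t_train:.1f} s, T={t_total_h:.1f} h, f={control_hz:.0f} Hz")
```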

Publication History
  • Received: 2025-11-30
  • Accepted: 2026-03-31
  • Published online: 2026-04-28
