Abstract: To achieve high-speed, stable locomotion of quadruped robots in complex environments, this paper proposes a hierarchical motion control framework that integrates model-based and learning-based methods. First, a penalty on the foot-placement deviation at each single touchdown is introduced, enabling effective evaluation of continuous sliding states. Second, a continuous contact-state description based on the hyperbolic tangent function is constructed. It substantially reduces the phase-switching impacts of conventional discrete (binary) contact models. Third, an LSTM-based network is designed to estimate ground characteristics in real time, enabling adaptive adjustment of the center-of-mass (CoM) position. Finally, a hierarchical control framework with an execution layer and a decision layer is proposed to improve the system's adaptability to the environment. Experiments in the Isaac Gym simulation environment show that the controller adapts to different friction coefficients and locomotion speeds. In particular, in an extremely low-friction environment ($\mu = 0.05$), the adaptive control strategy lowers the CoM height by $0.0610\ \text{m}$. It does so while maintaining a locomotion speed of $1.4284\ \text{m/s}$ and limiting the foot sliding distance to $0.308 \pm 0.005\ \text{cm}$, which demonstrates the effectiveness and practical value of the proposed method.
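The abstract contrasts a continuous, tanh-based contact state with the conventional binary one, and Table 5 compares the two. The exact formula is not given in this excerpt, so the code below shows only one way to build such a signal. It assumes three things:

- The gait phase is normalized to [0, 1).
- Stance occupies [0, `duty`) of each cycle.
- The edge width is set by a hypothetical parameter `kappa`.

Each stance interval [a, b) is smoothed as 0.5[tanh((φ - a)/κ) - tanh((φ - b)/κ)], and neighbouring cycles are summed so the signal stays continuous where the phase wraps from 1 back to 0.

```python
import math


def binary_contact(phase: float, duty: float) -> float:
    """Binary stance indicator: 1 during stance, 0 during swing."""
    return 1.0 if (phase % 1.0) < duty else 0.0


def tanh_contact(phase: float, duty: float, kappa: float = 0.05) -> float:
    """Smooth stance indicator with tanh edges (illustrative sketch).

    Assumes stance occupies [0, duty) of each normalized gait cycle.
    Summing the smoothed intervals of the previous, current and next
    cycle (k = -1, 0, 1) keeps the signal continuous at the phase wrap.
    """
    p = phase % 1.0
    c = 0.0
    for k in (-1, 0, 1):
        rise = math.tanh((p - k) / kappa)          # touchdown edge at phase k
        fall = math.tanh((p - k - duty) / kappa)   # lift-off edge at phase k + duty
        c += 0.5 * (rise - fall)
    return c


# Compare both indicators around the touchdown at the phase wrap.
for phase in (0.98, 0.99, 0.0, 0.01, 0.02):
    print(phase, binary_contact(phase, 0.5), round(tanh_contact(phase, 0.5), 3))
```

Near the wrap, the binary indicator jumps from 0 to 1. The tanh version instead passes smoothly through 0.5 at touchdown. A smaller `kappa` sharpens the edges toward the binary case.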
Fig. 6 Comparison of the motion characteristics of different gaits in a low-friction environment: (a) ~ (d) CoM height; (e) ~ (h) locomotion speed; (i) ~ (l) foot sliding. Columns from left to right correspond to the four gaits; target speed 1.5 m/s, friction coefficient $\mu = 0.05$.
Table 1 Configuration of gait parameters

| Gait | $\theta_1$ | $\theta_2$ | $\theta_3$ |
|---|---|---|---|
| Pronking | $0$ | $0$ | $0$ |
| Trotting | $\pi$ | $\pi$ | $0$ |
| Pacing | $\pi$ | $0$ | $\pi$ |
| Bounding | $0$ | $\pi$ | $\pi$ |
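Table 1 lists three phase offsets per gait, but this excerpt does not say which foot each offset belongs to. This sketch assumes the following reading, which reproduces all four rows: θ1, θ2 and θ3 are the offsets of the front-right (FR), rear-left (RL) and rear-right (RR) feet relative to the front-left (FL) foot. Under that reading:

- trotting pairs diagonal feet;
- pacing pairs same-side feet;
- bounding pairs front and rear feet;
- pronking moves all four feet together.

The foot names and helper functions are hypothetical.

```python
import math

# Offsets from Table 1 (radians). ASSUMED meaning: offsets of FR, RL, RR
# relative to FL; the excerpt does not state the foot assignment.
GAIT_OFFSETS = {
    "pronking": (0.0, 0.0, 0.0),
    "trotting": (math.pi, math.pi, 0.0),
    "pacing": (math.pi, 0.0, math.pi),
    "bounding": (0.0, math.pi, math.pi),
}
FEET = ("FL", "FR", "RL", "RR")


def foot_phases(gait: str, global_phase: float) -> dict:
    """Per-foot phase in [0, 2*pi) for a given global gait phase (radians)."""
    offsets = (0.0,) + GAIT_OFFSETS[gait]  # FL is the reference foot
    return {foot: (global_phase + off) % (2 * math.pi)
            for foot, off in zip(FEET, offsets)}


def in_phase_groups(gait: str) -> list:
    """Group feet that share a phase, i.e. touch down together."""
    groups = {}
    for foot, ph in foot_phases(gait, 0.0).items():
        groups.setdefault(round(ph, 6), []).append(foot)
    return sorted(groups.values())


for gait in GAIT_OFFSETS:
    print(gait, in_phase_groups(gait))
```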
Table 2 Range of control command parameters

| Parameter | Minimum | Maximum | Unit |
|---|---|---|---|
| $v_{x}^\mathrm{cmd}$ | $-3.00$ | $3.00$ | $\mathrm{m/s}$ |
| $v_{y}^\mathrm{cmd}$ | $-1.00$ | $1.00$ | $\mathrm{m/s}$ |
| $\omega_{z}^\mathrm{cmd}$ | $-1.00$ | $1.00$ | $\mathrm{rad/s}$ |
| $f^\mathrm{cmd}$ | $1.50$ | $4.00$ | $\mathrm{Hz}$ |
| $h_\mathrm{com}$ | $-0.45$ | $0.10$ | $\mathrm{m}$ |
| $h_\mathrm{foot}^\mathrm{cmd}$ | $0.03$ | $0.30$ | $\mathrm{m}$ |
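Table 2 bounds the command space. The sketch below draws training commands from it and clips a user request back into that space, under two assumptions:

- Each command is sampled independently and uniformly. The excerpt states neither the sampling scheme nor any curriculum.
- `h_com` is read as an offset from the nominal CoM height. Its negative range and the "CoM height change" columns of Tables 6 and 7 suggest this reading.

The dictionary keys are hypothetical names.

```python
import random

# Ranges from Table 2 as (min, max); key names are illustrative.
COMMAND_RANGES = {
    "v_x_cmd": (-3.00, 3.00),      # m/s, forward velocity
    "v_y_cmd": (-1.00, 1.00),      # m/s, lateral velocity
    "omega_z_cmd": (-1.00, 1.00),  # rad/s, yaw rate
    "f_cmd": (1.50, 4.00),         # Hz, gait frequency
    "h_com": (-0.45, 0.10),        # m, ASSUMED offset from nominal CoM height
    "h_foot_cmd": (0.03, 0.30),    # m, swing foot height
}


def sample_command(rng: random.Random) -> dict:
    """Draw each command uniformly from its range (assumed scheme)."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in COMMAND_RANGES.items()}


def clip_command(cmd: dict) -> dict:
    """Clip a requested command into the trained ranges."""
    return {name: min(max(cmd[name], lo), hi)
            for name, (lo, hi) in COMMAND_RANGES.items()}


rng = random.Random(0)
cmd = sample_command(rng)
safe = clip_command({**cmd, "v_x_cmd": 5.0})  # request beyond the trained range
print(safe["v_x_cmd"])  # 3.0
```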
Table 3 Reward function

| Category | Term | Formula | Weight |
|---|---|---|---|
| Motion tracking | Horizontal velocity tracking $r_{v_{x,\;y}^{cmd}}$ | $\exp\left(-\dfrac{\lVert v_{x,\;y} - v_{x,\;y}^{cmd} \rVert^2}{\sigma_{v_{x,\;y}}}\right)$ | $0.02$ |
| | Yaw angular velocity tracking $r_{\omega_{z}^{cmd}}$ | $\exp\left(-\dfrac{\lVert \omega_{z} - \omega_{z}^{cmd} \rVert^2}{\sigma_{\omega_{z}}}\right)$ | $0.01$ |
| | CoM height tracking $r_{h_{z}^{cmd}}$ | $\left(h_{z} - h_{z}^{cmd}\right)^2$ | $-0.08$ |
| | Body pitch tracking $r_{\phi^{cmd}}$ | $\left(\phi - \phi^{cmd}\right)^2$ | $-0.10$ |
| | Swing-phase force tracking $r_{f_{c}^{cmd}}$ | $\displaystyle \sum\limits_{\text{foot}} \left(1 - C_{\text{foot}}^{\text{cmd}}(t)\right) \times \exp\left(-\dfrac{\lVert f_{\text{foot}}^{\text{cmd}} \rVert^2}{\sigma_{cf}}\right)$ | $-0.08$ |
| | Stance-phase velocity tracking $r_{v_{f}^{cmd}}$ | $\displaystyle \sum\limits_{\text{foot}} C_{\text{foot}}^{\text{cmd}}(t) \times \exp\left(-\dfrac{\lVert v_{\text{foot}}^{\text{cmd}} \rVert^2}{\sigma_{cv}}\right)$ | $-0.08$ |
| | Single-touchdown foot-placement deviation $r_{\text{contact}}$ | $\lVert p_\text{foot}^c - p_\text{foot}^{c,\;\text{cmd}} \rVert \times \mathbb{I}(t = t_\text{contact})$ | $-0.10$ |
| Attitude stability | Vertical linear velocity | $v_{z}^2$ | $-4 \times 10^{-4}$ |
| | Roll and pitch angular velocity | $\lVert\omega_{x,\;y}\rVert^2$ | $-2 \times 10^{-5}$ |
| | Foot slip | $\lVert v_{\text{foot},\;x,\;y}\rVert^2$ | $-8 \times 10^{-4}$ |
| Motion constraints | Thigh/calf collision | $1_{\text{collision}}$ | $-0.02$ |
| | Joint limit violation | $1_{q_{i} > q_{\text{max}} \lor q_{i} < q_{\text{min}}}$ | $-0.20$ |
| | Joint torque | $\lVert\tau\rVert^2$ | $-9 \times 10^{-3}$ |
| | Joint angular velocity | $\lVert\dot{q}\rVert^2$ | $-9 \times 10^{-3}$ |
| | Joint angular acceleration | $\lVert\ddot{q}\rVert^2$ | $-5 \times 10^{-9}$ |
| | Action smoothness (first order) | $\lVert a_{t-1} - a_{t}\rVert^2$ | $-2 \times 10^{-3}$ |
| | Action smoothness (second order) | $\lVert a_{t-2} - 2a_{t-1} + a_{t}\rVert^2$ | $-2 \times 10^{-3}$ |
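Two Table 3 terms are gated by the commanded contact state $C_{\text{foot}}^{\text{cmd}}(t)$, and the single-touchdown term fires only at $t = t_\text{contact}$. The sketch below implements these three terms in plain Python. Several details are assumptions:

- The touchdown instant is detected as the rising edge of a per-foot contact flag.
- The argument names are hypothetical.
- The commanded touchdown position `foot_pos_cmd` is taken as an input, since this excerpt does not say where it comes from.

The force and velocity arguments stand for the per-foot quantities inside the norms of Table 3.

```python
import math

# Table 3 weights for the terms sketched here.
WEIGHTS = {"swing_force": -0.08, "stance_velocity": -0.08, "touchdown": -0.10}


def sq_norm(v) -> float:
    """Squared Euclidean norm of a sequence of floats."""
    return sum(x * x for x in v)


def swing_force_term(c_cmd, foot_force, sigma_cf: float) -> float:
    """sum_foot (1 - C_cmd(t)) * exp(-||f||^2 / sigma_cf)."""
    return sum((1.0 - c) * math.exp(-sq_norm(f) / sigma_cf)
               for c, f in zip(c_cmd, foot_force))


def stance_velocity_term(c_cmd, foot_vel, sigma_cv: float) -> float:
    """sum_foot C_cmd(t) * exp(-||v||^2 / sigma_cv)."""
    return sum(c * math.exp(-sq_norm(v) / sigma_cv)
               for c, v in zip(c_cmd, foot_vel))


class TouchdownDeviation:
    """||p_c - p_c_cmd|| * I(t == t_contact), summed over feet.

    A foot contributes only on the step where its contact flag switches
    from False to True, so each touchdown is penalised exactly once.
    """

    def __init__(self, n_feet: int = 4) -> None:
        self.prev_contact = [False] * n_feet

    def __call__(self, contact, foot_pos, foot_pos_cmd) -> float:
        total = 0.0
        for i, (c, p, p_cmd) in enumerate(zip(contact, foot_pos, foot_pos_cmd)):
            if c and not self.prev_contact[i]:  # rising edge: t = t_contact
                total += math.sqrt(sq_norm([a - b for a, b in zip(p, p_cmd)]))
            self.prev_contact[i] = c
        return total


def weighted_sum(terms: dict) -> float:
    """Combine raw term values with their Table 3 weights."""
    return sum(WEIGHTS[name] * value for name, value in terms.items())


td = TouchdownDeviation(n_feet=1)
target = [(0.0, 0.0, 0.0)]
for contact, pos in [([False], [(0.00, 0.00, 0.0)]),   # swing: no penalty
                     ([True], [(0.03, 0.04, 0.0)]),    # touchdown 5 cm off target
                     ([True], [(0.06, 0.08, 0.0)])]:   # still in stance: not counted again
    print(td(contact, pos, target))
```

Sliding that continues after touchdown is not accumulated by this term. In Table 3 it is handled by the separate per-step foot-slip term.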
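Table 4 PPO hyperparameters

| Parameter | Value |
|---|---|
| Batch size | $4096 \times 24$ |
| Mini-batch size | $4096 \times 6$ |
| Number of iterations | $5$ |
| Clip range | $0.20$ |
| Entropy coefficient | $0.01$ |
| Discount factor | $0.99$ |
| GAE factor $\lambda$ | $0.95$ |
| Target KL divergence | $0.01$ |
| Learning rate | Adaptive$^*$ |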
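Table 4 lists the PPO settings. The sketch below shows where the discount factor, GAE factor, clip range and target KL enter. Two readings are assumptions:

- $4096 \times 24$ is read as 4096 parallel environments times 24 rollout steps, and "iterations" as passes over the batch per update.
- The "adaptive" learning rate is taken to follow the KL-based rule of the open-source rsl_rl PPO implementation. That rule divides the rate by 1.5 when the KL exceeds twice the target and multiplies it by 1.5 when the KL falls below half of it.

```python
# Values from Table 4.
GAMMA, LAM, CLIP_EPS, DESIRED_KL = 0.99, 0.95, 0.20, 0.01
BATCH, MINIBATCH, EPOCHS = 4096 * 24, 4096 * 6, 5

# ASSUMES "iterations" = passes over the whole batch per update.
NUM_MINIBATCHES = BATCH // MINIBATCH               # 4
GRAD_STEPS_PER_UPDATE = EPOCHS * NUM_MINIBATCHES   # 20


def gae(rewards, values, dones, last_value, gamma=GAMMA, lam=LAM):
    """Generalized advantage estimation for one environment's rollout.

    dones[t] is 1.0 if the episode ended after step t, else 0.0.
    Returns (advantages, returns) with returns = advantages + values.
    """
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def clipped_surrogate(ratio: float, advantage: float, eps: float = CLIP_EPS) -> float:
    """Per-sample PPO objective min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


def adaptive_lr(lr: float, kl: float, desired_kl: float = DESIRED_KL,
                lr_min: float = 1e-5, lr_max: float = 1e-2) -> float:
    """KL-adaptive learning rate (rsl_rl-style rule; assumed, not confirmed)."""
    if kl > 2.0 * desired_kl:
        return max(lr_min, lr / 1.5)
    if 0.0 < kl < desired_kl / 2.0:
        return min(lr_max, lr * 1.5)
    return lr
```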
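Table 5 Comparison of contact-state modeling methods for different gaits under a baseline controller

| Gait | 1.5 m/s, binary discrete | 1.5 m/s, hyperbolic tangent | 2.0 m/s, binary discrete | 2.0 m/s, hyperbolic tangent |
|---|---|---|---|---|
| Trotting | 0.9217 | 0.9583 | 0.8915 | 0.9247 |
| Pronking | 0.8697 | 0.9474 | 0.8642 | 0.9362 |
| Bounding | 0.9371 | 0.9457 | 0.9230 | 0.9436 |
| Pacing | 0.9335 | 0.9482 | 0.9257 | 0.9376 |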
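Table 5 does not name its metric in this excerpt. The relative change of the hyperbolic-tangent model over the binary model can still be computed directly from the reported values. It is positive in all eight gait/speed cases and largest for pronking: about 8.9% at 1.5 m/s and 8.3% at 2.0 m/s.

```python
# Table 5: gait -> (binary @1.5, tanh @1.5, binary @2.0, tanh @2.0)
TABLE5 = {
    "trotting": (0.9217, 0.9583, 0.8915, 0.9247),
    "pronking": (0.8697, 0.9474, 0.8642, 0.9362),
    "bounding": (0.9371, 0.9457, 0.9230, 0.9436),
    "pacing": (0.9335, 0.9482, 0.9257, 0.9376),
}


def relative_gain(binary: float, smooth: float) -> float:
    """Relative change of the tanh model over the binary model."""
    return (smooth - binary) / binary


gains = {
    gait: (relative_gain(b15, t15), relative_gain(b20, t20))
    for gait, (b15, t15, b20, t20) in TABLE5.items()
}

for gait, (g15, g20) in gains.items():
    print(f"{gait:9s} 1.5 m/s: {g15:+.2%}   2.0 m/s: {g20:+.2%}")
```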
Table 6 Performance comparison of four gaits in a low-friction environment

| Adaptive control | Gait | CoM height change (m) | Foot sliding distance (cm) | Speed (m/s) |
|---|---|---|---|---|
| Yes | Trotting | $-0.061$ | $0.308 \pm 0.015$ | $1.428$ |
| No | Trotting | $0$ | $0.343 \pm 0.081$ | $1.455$ |
| Yes | Bounding | $-0.119$ | $0.342 \pm 0.032$ | $1.372$ |
| No | Bounding | $0$ | $0.354 \pm 0.040$ | $1.219$ |
| Yes | Pronking | $-0.271$ | $0.423 \pm 0.051$ | $1.116$ |
| No | Pronking | $0$ | $0.627 \pm 0.046$ | $0.898$ |
| Yes | Pacing | $-0.301$ | $0.450 \pm 0.004$ | $0.683$ |
| No | Pacing | $0$ | $0.545 \pm 0.062$ | $0.467$ |
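The effect of adaptive control in Table 6 can be summarized as relative changes against the non-adaptive row of the same gait. The sketch uses the mean values and ignores the reported spread.

```python
# Table 6: gait -> (slip adaptive, slip fixed, speed adaptive, speed fixed)
# slip in cm, speed in m/s; "fixed" = no adaptive control (CoM change 0).
TABLE6 = {
    "trotting": (0.308, 0.343, 1.428, 1.455),
    "bounding": (0.342, 0.354, 1.372, 1.219),
    "pronking": (0.423, 0.627, 1.116, 0.898),
    "pacing": (0.450, 0.545, 0.683, 0.467),
}


def compare(slip_a: float, slip_f: float, v_a: float, v_f: float):
    """Relative slip reduction and relative speed change (adaptive vs fixed)."""
    return (slip_f - slip_a) / slip_f, (v_a - v_f) / v_f


for gait, row in TABLE6.items():
    slip_red, dv = compare(*row)
    print(f"{gait:9s} slip -{slip_red:.1%}  speed {dv:+.1%}")
```

By these means, adaptive control lowers foot sliding for all four gaits. The reduction ranges from about 10% (trotting) to about 33% (pronking). Speed rises for bounding, pronking and pacing, while trotting gives up about 2%.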
Table 7 Comparison of control performance for the trotting gait under different desired speeds and friction coefficients

| Adaptive control | Desired speed (m/s) | Friction coefficient | CoM height change (m) | Foot sliding distance (cm) | Speed (m/s) |
|---|---|---|---|---|---|
| Yes | $1.5$ | $1.0$ | $-0.0311$ | $0.290 \pm 0.0004$ | $1.4900$ |
| No | $1.5$ | $1.0$ | $0$ | $0.316 \pm 0.0500$ | $1.5039$ |
| No | $1.5$ | $1.0$ | $-0.2000$ | $0.295 \pm 0.0600$ | $1.3973$ |
| No | $1.5$ | $1.0$ | $-0.4000$ | $0.286 \pm 0.0400$ | $1.3539$ |
| Yes | $1.5$ | $0.2$ | $-0.0380$ | $0.312 \pm 0.0040$ | $1.5020$ |
| No | $1.5$ | $0.2$ | $0$ | $0.327 \pm 0.0600$ | $1.5481$ |
| No | $1.5$ | $0.2$ | $-0.2000$ | $0.304 \pm 0.0500$ | $1.4367$ |
| No | $1.5$ | $0.2$ | $-0.4000$ | $0.291 \pm 0.0300$ | $1.3727$ |
| Yes | $1.5$ | $0.05$ | $-0.0610$ | $0.308 \pm 0.0050$ | $1.4284$ |
| No | $1.5$ | $0.05$ | $0$ | $0.343 \pm 0.0800$ | $1.4550$ |
| No | $1.5$ | $0.05$ | $-0.2000$ | $0.313 \pm 0.0500$ | $1.4108$ |
| No | $1.5$ | $0.05$ | $-0.4000$ | $0.296 \pm 0.0400$ | $1.3340$ |
| Yes | $2.0$ | $1.0$ | $-0.0836$ | $0.396 \pm 0.0001$ | $1.9040$ |
| No | $2.0$ | $1.0$ | $0$ | $0.410 \pm 0.0510$ | $1.9639$ |
| No | $2.0$ | $1.0$ | $-0.2000$ | $0.388 \pm 0.0370$ | $1.8358$ |
| No | $2.0$ | $1.0$ | $-0.4000$ | $0.378 \pm 0.0550$ | $1.7922$ |
| Yes | $2.0$ | $0.2$ | $-0.1616$ | $0.430 \pm 0.0060$ | $2.0600$ |
| No | $2.0$ | $0.2$ | $0$ | $0.424 \pm 0.0510$ | $1.9966$ |
| No | $2.0$ | $0.2$ | $-0.2000$ | $0.401 \pm 0.0450$ | $1.8815$ |
| No | $2.0$ | $0.2$ | $-0.4000$ | $0.384 \pm 0.0400$ | $1.8137$ |
| Yes | $2.0$ | $0.05$ | $-0.2750$ | $0.239 \pm 0.0020$ | $1.7300$ |
| No | $2.0$ | $0.05$ | $0$ | $0.441 \pm 0.0890$ | $1.8718$ |
| No | $2.0$ | $0.05$ | $-0.2000$ | $0.411 \pm 0.0490$ | $1.7193$ |
| No | $2.0$ | $0.05$ | $-0.4000$ | $0.389 \pm 0.0116$ | $1.4957$ |
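The fixed-offset rows of Table 7 show a consistent trade-off. In every speed/friction condition, moving the CoM offset from 0 to -0.2 m and then to -0.4 m lowers the mean foot sliding but also lowers the achieved speed. The check below verifies this monotonic pattern from the tabulated means, ignoring the ± spreads. This trade-off is what an adaptive CoM adjustment has to balance.

```python
# Table 7, non-adaptive rows only:
# (desired speed m/s, friction) -> [(CoM offset m, slip cm, speed m/s), ...]
# ordered by offset 0, -0.2, -0.4.
FIXED_ROWS = {
    (1.5, 1.0): [(0.0, 0.316, 1.5039), (-0.2, 0.295, 1.3973), (-0.4, 0.286, 1.3539)],
    (1.5, 0.2): [(0.0, 0.327, 1.5481), (-0.2, 0.304, 1.4367), (-0.4, 0.291, 1.3727)],
    (1.5, 0.05): [(0.0, 0.343, 1.4550), (-0.2, 0.313, 1.4108), (-0.4, 0.296, 1.3340)],
    (2.0, 1.0): [(0.0, 0.410, 1.9639), (-0.2, 0.388, 1.8358), (-0.4, 0.378, 1.7922)],
    (2.0, 0.2): [(0.0, 0.424, 1.9966), (-0.2, 0.401, 1.8815), (-0.4, 0.384, 1.8137)],
    (2.0, 0.05): [(0.0, 0.441, 1.8718), (-0.2, 0.411, 1.7193), (-0.4, 0.389, 1.4957)],
}


def strictly_decreasing(xs) -> bool:
    """True if every element is smaller than the one before it."""
    return all(a > b for a, b in zip(xs, xs[1:]))


def lowering_trades_speed_for_slip(rows) -> bool:
    """True if each deeper CoM offset lowers both mean slip and achieved speed."""
    slips = [slip for _, slip, _ in rows]
    speeds = [speed for _, _, speed in rows]
    return strictly_decreasing(slips) and strictly_decreasing(speeds)


for condition, rows in FIXED_ROWS.items():
    print(condition, lowering_trades_speed_for_slip(rows))
```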