Adaptive Control Method for Quadruped Robot Facing Floors of Different Roughness

ZHANG Nan-Jie; CHEN Yu-Quan; JI Mao-Qin; SUN Yun-Kang; WANG Bing

doi: 10.16383/j.aas.c240738 cstr: 32138.14.j.aas.c240738

1. College of Artificial Intelligence and Automation, Hohai University, Changzhou 213000

Funds: Supported by National Natural Science Foundation of China (51777058)

Author Bio:
ZHANG Nan-Jie: Master student at the College of Artificial Intelligence and Automation, Hohai University. Research interests include unmanned system control and embodied artificial intelligence. E-mail: 231622010045@hhu.edu.cn

CHEN Yu-Quan: Associate professor at the College of Artificial Intelligence and Automation, Hohai University. Research interests include embodied artificial intelligence, intelligent optimization, and fractional-order systems and control. Corresponding author of this paper. E-mail: cyq@mail.ustc.edu.cn

JI Mao-Qin: Master student at the College of Artificial Intelligence and Automation, Hohai University. Research interests include non-asymptotic convergence theory and unmanned system control. E-mail: 231322010005@hhu.edu.cn

SUN Yun-Kang: Master student at the College of Artificial Intelligence and Automation, Hohai University. Research interests include multi-agent control, unmanned system control, and adaptive control. E-mail: sunyunkanghhu@163.com

WANG Bing: Professor at the College of Artificial Intelligence and Automation, Hohai University. Research interests include unmanned system control, new energy generation control, and power system scheduling. E-mail: icekingking@hhu.edu.cn

Publication history
- Received: 2024-11-14
- Accepted: 2025-03-05
- Published online: 2025-04-30
- Issue date: 2025-07-29

Abstract: To address high-speed, stable locomotion of quadruped robots in complex environments, a hierarchical motion control framework that integrates model-based and learning-based methods is proposed. First, a penalty mechanism based on single-step foothold deviation is introduced to effectively evaluate continuous sliding states. Second, a continuous contact-state description based on the hyperbolic tangent function is constructed, which significantly reduces the phase-switching impact found in traditional discrete methods. Then, an LSTM-based network for real-time estimation of ground characteristics is designed to adaptively adjust the center-of-mass position. Finally, a hierarchical control framework consisting of an execution layer and a decision layer is proposed to improve the system's environmental adaptability. Experiments in the Isaac Gym simulation environment show that the control method adapts to different friction coefficients and motion speeds. In particular, in an extremely low-friction environment ($\mu = 0.05$), the adaptive control strategy adjusts the center-of-mass height by 0.0610 m, maintains a motion speed of 1.4284 m/s, and keeps the foot-end sliding distance at $0.308 \pm 0.0050$ cm, which demonstrates the effectiveness and practical value of the proposed control method.

Key words:
- quadruped robot
- reinforcement learning
- adaptive control strategy
- reward function optimization
- hierarchical control framework
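
The contrast between a binary contact state and the continuous, hyperbolic-tangent-based description named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formula, so the form used here (a hyperbolic tangent applied to a sinusoid of the leg phase), the function names, and the sharpness parameter `k` are assumptions for illustration only, not the authors' definition.

```python
import math


def binary_contact(phase: float) -> float:
    """Binary discrete contact state: 1 in the first half-cycle (stance), 0 otherwise (swing)."""
    return 1.0 if (phase % (2 * math.pi)) < math.pi else 0.0


def tanh_contact(phase: float, k: float = 5.0) -> float:
    """Smooth contact state in (0, 1); k (hypothetical) sets how sharp the transition is."""
    return 0.5 * (1.0 + math.tanh(k * math.sin(phase)))


if __name__ == "__main__":
    # Print both descriptions over one gait cycle.
    for i in range(9):
        phase = i * math.pi / 4
        print(f"{phase:5.2f}  {binary_contact(phase):.0f}  {tanh_contact(phase):.3f}")
```

Around the stance-to-swing switch at phase $\pi$, the binary description jumps from 1 to 0, whereas the smooth one passes through 0.5 without a discontinuity, which is the property the abstract attributes to the continuous description.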


Fig. 1 The four gaits and the desired contact state for each gait

Fig. 2 Mapping between phase variables and leg motion

Fig. 3 System control framework

Fig. 4 Trends of selected rewards during PPO training ((a) Total reward; (b) Kinematic tracking reward; (c) Gait execution reward; (d) Motion stability reward; (e) Motion constraint reward)

Fig. 5 Comparison of foot contact sequences for different modeling methods under the baseline controller

Fig. 6 Comparison of motion characteristics of different gaits in a low-friction environment ((a) ~ (d): center-of-mass height; (e) ~ (h): motion speed; (i) ~ (l): foot-end sliding distance. Columns from left to right correspond to the four gaits; the target speed is 1.5 m/s and the friction coefficient is $\mu = 0.05$)

Fig. 7 Distribution of center-of-mass height and foot-end sliding under different friction coefficients and desired speeds with the baseline controller

Fig. 8 Comparison of robot motion characteristics under different friction coefficients ((a) ~ (c): center-of-mass height; (d) ~ (f): motion speed; (g) ~ (i): foot-end sliding. Shaded areas indicate the standard deviation)

Table 1 Configuration of gait parameters

Gait type	$ \theta_1 $	$ \theta_2 $	$ \theta_3 $
Pronk	$ 0 $	$ 0 $	$ 0 $
Trot	$ \pi $	$ \pi $	$ 0 $
Pace	$ \pi $	$ 0 $	$ \pi $
Bound	$ 0 $	$ \pi $	$ \pi $
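
As a rough illustration of how the three parameters in Table 1 can encode a gait, the sketch below treats $\theta_1$, $\theta_2$, $\theta_3$ as phase offsets of the front-right, rear-left and rear-right legs relative to the front-left leg. This convention, the leg order, and the English gait names are assumptions; the table itself does not state them. The reading is chosen only because it reproduces the usual synchronized leg pairs of the four gaits.

```python
import math

# Phase offsets (theta_1, theta_2, theta_3) copied from Table 1.
GAITS = {
    "pronk": (0.0, 0.0, 0.0),
    "trot": (math.pi, math.pi, 0.0),
    "pace": (math.pi, 0.0, math.pi),
    "bound": (0.0, math.pi, math.pi),
}

# Assumed leg order; FL (front-left) is taken as the reference leg.
LEGS = ("FL", "FR", "RL", "RR")


def leg_phases(gait: str, phase: float) -> dict:
    """Phase of each leg in [0, 2*pi) under the assumed offset convention."""
    offsets = (0.0,) + GAITS[gait]
    return {leg: (phase + off) % (2 * math.pi) for leg, off in zip(LEGS, offsets)}


def synchronized_groups(gait: str) -> list:
    """Group the legs that share the same phase, i.e. that move together."""
    groups = {}
    for leg, ph in leg_phases(gait, 0.0).items():
        groups.setdefault(round(ph, 9), []).append(leg)
    return sorted(groups.values())


if __name__ == "__main__":
    for name in GAITS:
        print(name, synchronized_groups(name))
```

Under this reading, the trot pairs the diagonal legs, the pace pairs the legs on the same side, the bound pairs the front legs and the rear legs, and the pronk moves all four legs together.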

Table 2 Range of control command parameters

Parameter	Minimum	Maximum	Unit
$ v_{x}^\mathrm{cmd} $	$ -3.00 $	$ 3.00 $	$ \mathrm{m/s} $
$ v_{y}^\mathrm{cmd} $	$ -1.00 $	$ 1.00 $	$ \mathrm{m/s} $
$ \omega_{z}^\mathrm{cmd} $	$ -1.00 $	$ 1.00 $	$ \mathrm{rad/s} $
$ f^\mathrm{cmd} $	$ 1.50 $	$ 4.00 $	$ \mathrm{Hz} $
$ h_\mathrm{com}^\mathrm{cmd} $	$ -0.45 $	$ 0.10 $	$ \mathrm{m} $
$ h_\mathrm{foot}^\mathrm{cmd} $	$ 0.03 $	$ 0.30 $	$ \mathrm{m} $
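
The ranges in Table 2 can be used to draw or clamp commands. The sketch below is one possible way to do this; the dictionary keys and the choice of uniform sampling are assumptions, since the table only lists the bounds and units.

```python
import random

# (minimum, maximum) copied from Table 2; the key names are hypothetical.
COMMAND_RANGES = {
    "v_x": (-3.00, 3.00),      # m/s
    "v_y": (-1.00, 1.00),      # m/s
    "omega_z": (-1.00, 1.00),  # rad/s
    "f": (1.50, 4.00),         # Hz
    "h_com": (-0.45, 0.10),    # m
    "h_foot": (0.03, 0.30),    # m
}


def sample_command(rng: random.Random) -> dict:
    """Draw one command uniformly inside the ranges (uniform sampling is an assumption)."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in COMMAND_RANGES.items()}


def clip_command(cmd: dict) -> dict:
    """Clamp every entry of a full command dictionary to its range in Table 2."""
    return {name: min(max(cmd[name], lo), hi) for name, (lo, hi) in COMMAND_RANGES.items()}


if __name__ == "__main__":
    print(sample_command(random.Random(0)))
```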

Table 3 Reward functions

Category	Item	Formula	Weight
Motion tracking	Horizontal-plane velocity tracking $ r_{v_{x,\;y}^{cmd}} $	$ \exp{\left(-\dfrac{\| v_{x,\;y} - v_{x,\;y}^{cmd} \|^2}{\sigma_{v_{x,\;y}}}\right)} $	$ 0.02 $
	Vertical-axis angular velocity tracking $ r_{\omega_{z}^{cmd}} $	$ \exp{\left(-\dfrac{\| \omega_{z} - \omega_{z}^{cmd} \|^2}{\sigma_{\omega_{z}}}\right)} $	$ 0.01 $
	Center-of-mass height tracking $ r_{h_{z}^{cmd}} $	$ \left(h_{z} - h_{z}^{cmd}\right)^2 $	$ -0.08 $
	Trunk pitch angle tracking $ r_{\rho^{cmd}} $	$ \left(\rho - \rho^{cmd}\right)^2 $	$ -0.10 $
	Swing-phase force tracking $ r_{f_{c}^{cmd}} $	$ \displaystyle \sum\limits_{\text{foot}} \left(1 - C_{\text{foot}}^{\text{cmd}}(t)\right) \times \exp{\left(-\dfrac{\| f_{\text{foot}}^{\text{cmd}} \|^2}{\sigma_{cf}}\right)} $	$ -0.08 $
	Stance-phase velocity tracking $ r_{v_{f}^{cmd}} $	$ \displaystyle \sum\limits_{\text{foot}} C_{\text{foot}}^{\text{cmd}}(t) \times \exp{\left(-\dfrac{\| v_{\text{foot}}^{\text{cmd}} \|^2}{\sigma_{cv}}\right)} $	$ -0.08 $
	Single-step foothold deviation $ r_{\text{contact}} $	$ \|p_\text{foot}^c - p_\text{foot}^{c,\;\text{cmd}} \| \times \mathbb{I}(t = t_\text{contact}) $	$ -0.10 $
Posture stability	Vertical velocity	$ v_{z}^2 $	$ -4 \times 10^{-4} $
	Roll and pitch angular velocity	$ \|\omega_{x,\;y}\|^2 $	$ -2 \times 10^{-5} $
	Foot slip	$ \|v_{\text{foot},\;x,\;y}\|^2 $	$ -8 \times 10^{-4} $
Motion constraints	Thigh/calf collision	$ 1_{\text{collision}} $	$ -0.02 $
	Joint limit violation	$ 1_{q_{i}> q_{\text{max}} \,\lor\, q_{i}< q_{\text{min}}} $	$ -0.20 $
	Joint torque	$ \|\tau\|^2 $	$ -9 \times 10^{-3} $
	Joint angular velocity	$ \|\dot{q}\|^2 $	$ -9 \times 10^{-3} $
	Joint angular acceleration	$ \|\ddot{q}\|^2 $	$ -5 \times 10^{-9} $
	Action smoothness (first order)	$ \|a_{t-1} - a_{t}\|^2 $	$ -2 \times 10^{-3} $
	Action smoothness (second order)	$ \|a_{t-2} - 2a_{t-1} + a_{t}\|^2 $	$ -2 \times 10^{-3} $
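
Two of the terms in Table 3 can be written out as a short Python sketch: the exponential velocity-tracking term and the single-step foothold deviation, which is only active at the touchdown instant. The weights are taken from the table; the value of the scaling factor `SIGMA_V` and all function names are assumptions, because the table does not list the $\sigma$ values.

```python
import math

W_VEL = 0.02       # weight of horizontal-plane velocity tracking (Table 3)
W_CONTACT = -0.10  # weight of single-step foothold deviation (Table 3)
SIGMA_V = 0.25     # assumed scaling factor; not given in the table


def tracking_reward(actual, command, sigma):
    """exp(-||actual - command||^2 / sigma), the form of the velocity tracking terms."""
    sq_err = sum((a - c) ** 2 for a, c in zip(actual, command))
    return math.exp(-sq_err / sigma)


def foothold_deviation(p_contact, p_command, just_touched_down):
    """||p - p_cmd|| multiplied by the indicator I(t = t_contact)."""
    if not just_touched_down:
        return 0.0
    return math.dist(p_contact, p_command)


def partial_reward(v, v_cmd, p_contact, p_cmd, just_touched_down):
    """Weighted sum of the two terms above (only a subset of the full reward)."""
    return (W_VEL * tracking_reward(v, v_cmd, SIGMA_V)
            + W_CONTACT * foothold_deviation(p_contact, p_cmd, just_touched_down))
```

Because the indicator is non-zero only at touchdown, the foothold term charges each step once for how far the foot landed from its target, rather than accumulating a penalty at every control step.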

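Table 4 PPO hyperparameters

Parameter	Value
Batch size	$4\;096 \times 24$
Mini-batch size	$4\;096 \times 6$
Number of epochs	$ 5 $
Clip range	$ 0.20 $
Entropy coefficient	$ 0.01 $
Discount factor	$ 0.99 $
Generalized advantage estimation (GAE) discount factor	$ 0.95 $
Target KL divergence	$ 0.01 $
Learning rate	Adaptive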
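
The batch arithmetic implied by Table 4 can be checked with a few lines of Python. The key names are hypothetical, and treating the iteration count of 5 as the number of passes over each batch is an assumption about how the table is meant to be read.

```python
# Values copied from Table 4; the key names are hypothetical.
PPO_CONFIG = {
    "batch_size": 4096 * 24,
    "mini_batch_size": 4096 * 6,
    "epochs": 5,
    "clip_range": 0.20,
    "entropy_coef": 0.01,
    "gamma": 0.99,
    "gae_lambda": 0.95,
    "target_kl": 0.01,
}


def gradient_steps_per_update(cfg: dict) -> tuple:
    """Return (mini-batches per epoch, gradient steps per policy update)."""
    mini_batches = cfg["batch_size"] // cfg["mini_batch_size"]
    return mini_batches, mini_batches * cfg["epochs"]


if __name__ == "__main__":
    print(gradient_steps_per_update(PPO_CONFIG))
```

With these values each batch of 98 304 samples is split into 4 mini-batches, so one policy update performs 20 gradient steps under the stated assumption.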

Table 5 Comparison of contact state modeling methods for different gaits under the baseline controller

Gait	1.5 m/s		2.0 m/s	
	Binary discrete	Hyperbolic tangent	Binary discrete	Hyperbolic tangent
Trot	0.9217	0.9583	0.8915	0.9247
Pronk	0.8697	0.9474	0.8642	0.9362
Bound	0.9371	0.9457	0.9230	0.9436
Pace	0.9335	0.9482	0.9257	0.9376

Table 6 Performance comparison of four gaits in a low-friction environment

Adaptive control	Gait	Center-of-mass height change $ (\text{m}) $	Foot-end sliding distance $ (\text{cm}) $	Motion speed $ (\text{m/s}) $
Yes	Trot	$ -0.061 $	$ 0.308 \pm 0.015 $	$ 1.428 $
No	Trot	$ 0 $	$ 0.343 \pm 0.081 $	$ 1.455 $
Yes	Bound	$ -0.119 $	$ 0.342 \pm 0.032 $	$ 1.372 $
No	Bound	$ 0 $	$ 0.354 \pm 0.040 $	$ 1.219 $
Yes	Pronk	$ -0.271 $	$ 0.423 \pm 0.051 $	$ 1.116 $
No	Pronk	$ 0 $	$ 0.627 \pm 0.046 $	$ 0.898 $
Yes	Pace	$ -0.301 $	$ 0.450 \pm 0.004 $	$ 0.683 $
No	Pace	$ 0 $	$ 0.545 \pm 0.062 $	$ 0.467 $
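
To make the comparison in Table 6 easier to read, the relative reduction in mean foot-end sliding distance with adaptive control can be computed per gait. The sketch uses only the mean values from the table and ignores the reported spreads; the English gait names are translations and the helper name is hypothetical.

```python
# gait: (mean slip with adaptive control, mean slip without), in cm, from Table 6.
SLIP_CM = {
    "trot": (0.308, 0.343),
    "bound": (0.342, 0.354),
    "pronk": (0.423, 0.627),
    "pace": (0.450, 0.545),
}


def relative_reduction(adaptive: float, baseline: float) -> float:
    """Fraction by which the adaptive controller reduces the mean sliding distance."""
    return (baseline - adaptive) / baseline


if __name__ == "__main__":
    for gait, (adaptive, baseline) in SLIP_CM.items():
        print(f"{gait}: {100 * relative_reduction(adaptive, baseline):.1f}% less sliding")
```

On these mean values the reduction is smallest for the bound and largest for the pronk.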

Table 7 Comparison of control performance of the trotting gait under different desired speeds and friction conditions

Adaptive control	Desired speed $ (\text{m/s}) $	Friction coefficient	Center-of-mass height change $ (\text{m}) $	Foot-end sliding distance $ (\text{cm}) $	Motion speed $ (\text{m/s}) $
Yes	$ 1.5 $	$ 1.0 $	$ -0.031\;1 $	$ 0.290 \pm 0.000\;4 $	$ 1.490\;0 $
No	$ 1.5 $	$ 1.0 $	$ 0 $	$ 0.316 \pm 0.050\;0 $	$ 1.503\;9 $
No	$ 1.5 $	$ 1.0 $	$ -0.200\;0 $	$ 0.295 \pm 0.060\;0 $	$ 1.397\;3 $
No	$ 1.5 $	$ 1.0 $	$ -0.400\;0 $	$ 0.286 \pm 0.040\;0 $	$ 1.353\;9 $
Yes	$ 1.5 $	$ 0.2 $	$ -0.038\;0 $	$ 0.312 \pm 0.004\;0 $	$ 1.502\;0 $
No	$ 1.5 $	$ 0.2 $	$ 0 $	$ 0.327 \pm 0.060\;0 $	$ 1.548\;1 $
No	$ 1.5 $	$ 0.2 $	$ -0.200\;0 $	$ 0.304 \pm 0.050\;0 $	$ 1.436\;7 $
No	$ 1.5 $	$ 0.2 $	$ -0.400\;0 $	$ 0.291 \pm 0.030\;0 $	$ 1.372\;7 $
Yes	$ 1.5 $	$ 0.05 $	$ -0.061\;0 $	$ 0.308 \pm 0.005\;0 $	$ 1.428\;4 $
No	$ 1.5 $	$ 0.05 $	$ 0 $	$ 0.343 \pm 0.080\;0 $	$ 1.455\;0 $
No	$ 1.5 $	$ 0.05 $	$ -0.200\;0 $	$ 0.313 \pm 0.050\;0 $	$ 1.410\;8 $
No	$ 1.5 $	$ 0.05 $	$ -0.400\;0 $	$ 0.296 \pm 0.040\;0 $	$ 1.334\;0 $
Yes	$ 2.0 $	$ 1.0 $	$ -0.083\;6 $	$ 0.396 \pm 0.000\;1 $	$ 1.904\;0 $
No	$ 2.0 $	$ 1.0 $	$ 0 $	$ 0.410 \pm 0.051\;0 $	$ 1.963\;9 $
No	$ 2.0 $	$ 1.0 $	$ -0.200\;0 $	$ 0.388 \pm 0.037\;0 $	$ 1.835\;8 $
No	$ 2.0 $	$ 1.0 $	$ -0.400\;0 $	$ 0.378 \pm 0.055\;0 $	$ 1.792\;2 $
Yes	$ 2.0 $	$ 0.2 $	$ -0.161\;6 $	$ 0.430 \pm 0.006\;0 $	$ 2.060\;0 $
No	$ 2.0 $	$ 0.2 $	$ 0 $	$ 0.424 \pm 0.051\;0 $	$ 1.996\;6 $
No	$ 2.0 $	$ 0.2 $	$ -0.200\;0 $	$ 0.401 \pm 0.045\;0 $	$ 1.881\;5 $
No	$ 2.0 $	$ 0.2 $	$ -0.400\;0 $	$ 0.384 \pm 0.040\;0 $	$ 1.813\;7 $
Yes	$ 2.0 $	$ 0.05 $	$ -0.275\;0 $	$ 0.239 \pm 0.002\;0 $	$ 1.730\;0 $
No	$ 2.0 $	$ 0.05 $	$ 0 $	$ 0.441 \pm 0.089\;0 $	$ 1.871\;8 $
No	$ 2.0 $	$ 0.05 $	$ -0.200\;0 $	$ 0.411 \pm 0.049\;0 $	$ 1.719\;3 $
No	$ 2.0 $	$ 0.05 $	$ -0.400\;0 $	$ 0.389 \pm 0.011\;6 $	$ 1.495\;7 $

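Table A1 Description of variables

Category	Variable	Meaning
Motion tracking	$ v_{x,\ y} $	Actual velocity of the center of mass in the $ x \text{-} y $ plane
	$ v_{x,\ y}^{cmd} $	Desired horizontal-plane velocity of the center of mass
	$ \omega_z $	Actual angular velocity of the center of mass about the $ z $ axis
	$ \omega_z^{cmd} $	Desired yaw angular velocity of the center of mass
	$ h_z $	Actual vertical height of the center of mass
	$ h_z^{cmd} $	Desired vertical height of the center of mass
	$ \rho $	Actual pitch angle of the trunk
	$ \rho^{cmd} $	Desired pitch angle
	$ C_{\text{foot}}^{\text{cmd}}(t) $	Desired foot contact state at time $ t $
	$ f_\text{foot}^\text{cmd} $	Desired foot contact force
	$ v_{\text{foot}}^{\text{cmd}} $	Desired foot velocity
	$ p_\text{foot}^c $	Foot position at the instant of actual contact
	$ p_\text{foot}^{c,\ \text{cmd}} $	Desired foothold position
	$ \mathbb{I}(t = t_\text{contact}) $	Indicator function
	$ \sigma_{*} $	Scaling factor
Posture stability	$ v_z $	Actual velocity of the center of mass along the $ z $ axis
	$ \omega_{x,\ y} $	Actual angular velocity of the center of mass about the $ x $ and $ y $ axes
	$ v_{\text{foot},\ x,\ y} $	Actual velocity of the foot in the $ x \text{-} y $ plane
Motion constraints	$ 1_{\text{collision}} $	Whether a collision occurs
	$ q_i $	Actual angle of the $ i $-th joint
	$ q_\text{max},\ q_\text{min} $	Upper and lower limits of the joint angle
	$ \tau $	Torque applied to the joints
	$ \dot{q} $	Joint angular velocity
	$ \ddot{q} $	Joint angular acceleration
	$ a_t,\ a_{t-1} $	Actions at times $ t $ and $ t-1 $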
