
A Gait Control Method for Biped Robot on Slope Based on Deep Reinforcement Learning

Wu Xiao-Guang, Liu Shao-Wei, Yang Lei, Deng Wen-Qiang, Jia Zhe-Heng

Citation: Wu Xiao-Guang, Liu Shao-Wei, Yang Lei, Deng Wen-Qiang, Jia Zhe-Heng. A Gait Control Method for Biped Robot on Slope Based on Deep Reinforcement Learning. Acta Automatica Sinica, 2020, 46(x): 1−12 doi: 10.16383/j.aas.c190547


doi: 10.16383/j.aas.c190547
Funds: Supported by the National Natural Science Foundation of China (61503325) and the China Postdoctoral Science Foundation (2015M581316)
About the authors:

    Wu Xiao-Guang: Associate professor at Yanshan University. Received the Ph.D. degree from Harbin Institute of Technology in 2012. Research interests include biped robots and 3D virtual vision reconstruction. E-mail: wuxiaoguang@ysu.edu.cn

    Liu Shao-Wei: Master's student at the School of Electrical Engineering, Yanshan University. Research interests include deep reinforcement learning and biped robots. Corresponding author of this paper. E-mail: lwsalpha@outlook.com

    Yang Lei: Master's student at the School of Electrical Engineering, Yanshan University. Research interest is stability analysis of biped robots. E-mail: 15733513567@163.com

    Deng Wen-Qiang: Master's student at the School of Electrical Engineering, Yanshan University. Research interests include generative adversarial networks and human motion coordination analysis. E-mail: dengwq24@163.com

    Jia Zhe-Heng: Master's student at the School of Electrical Engineering, Yanshan University. Research interests include human pose estimation, object recognition, and deep learning. E-mail: jiazheheng@163.com

  • Abstract: To improve the walking stability of the quasi-passive biped robot on slopes, this paper proposes a gait control method for quasi-passive biped robots based on deep reinforcement learning. By analyzing the hybrid dynamics model and the stable walking process of the quasi-passive biped robot, the state space, action space, episode process, and reward function are constructed. After continued learning with Ape-X DPG, an algorithm improved from DDPG, the quasi-passive biped robot achieves stable walking over a wide range of slope angles. Simulation experiments show that Ape-X DPG outperforms PER-based DDPG in both learning capability and convergence speed. Moreover, compared with energy shaping control, the quasi-passive biped robot controlled by Ape-X DPG converges to a stable gait more quickly and has a larger gait basin of attraction, demonstrating that Ape-X DPG can effectively improve the walking stability of quasi-passive biped robots.
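The abstract names the main ingredients of the method: a DDPG actor-critic learner extended with the Ape-X pattern, in which several parallel interaction units explore with different noise processes (compare Table 2 below) and feed one shared prioritized replay buffer. The following is a minimal, single-process sketch of that training pattern only; it is not the paper's implementation. The ToyEnv stand-in, network sizes, number of units, and all hyperparameters are placeholder assumptions, and the paper's biped dynamics, episode handling, and reward function are not reproduced. At scale each interaction unit would run in its own process; they are interleaved in one loop here to keep the sketch short.

import copy
import numpy as np
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 4, 1  # illustrative sizes, not the paper's state/action spaces


class ToyEnv:
    """Stand-in for the quasi-passive biped simulation, used only to make the sketch run."""
    def reset(self):
        self.s = np.random.uniform(-1, 1, STATE_DIM).astype(np.float32)
        return self.s

    def step(self, a):
        self.s = np.clip(self.s + 0.1 * np.random.randn(STATE_DIM) + 0.1 * a[0],
                         -1.0, 1.0).astype(np.float32)
        return self.s, -float((self.s ** 2).sum()), False  # placeholder reward, no fall check


class OUNoise:
    """Ornstein-Uhlenbeck noise; Gaussian or parameter-space noise would slot in the same way."""
    def __init__(self, theta=0.15, sigma=0.2):
        self.theta, self.sigma, self.x = theta, sigma, np.zeros(ACTION_DIM)

    def __call__(self):
        self.x += -self.theta * self.x + self.sigma * np.random.randn(ACTION_DIM)
        return self.x


class PrioritizedReplay:
    """Proportional prioritized replay; a sum-tree would replace the flat lists at scale."""
    def __init__(self, capacity=50000, alpha=0.6):
        self.data, self.prio, self.capacity, self.alpha = [], [], capacity, alpha

    def add(self, transition):
        self.data.append(transition)
        self.prio.append(max(self.prio, default=1.0))  # new samples get maximal priority
        if len(self.data) > self.capacity:
            self.data.pop(0), self.prio.pop(0)

    def sample(self, n):
        p = np.asarray(self.prio) ** self.alpha
        idx = np.random.choice(len(self.data), n, p=p / p.sum())
        s, a, r, s2 = (torch.as_tensor(np.array(col))
                       for col in zip(*[self.data[i] for i in idx]))
        return idx, s, a, r, s2

    def update(self, idx, td):
        for i, e in zip(idx, td):
            self.prio[i] = float(abs(e)) + 1e-3


def mlp(sizes, out_act=nn.Identity()):
    layers = []
    for i in range(len(sizes) - 1):
        layers += [nn.Linear(sizes[i], sizes[i + 1]),
                   nn.ReLU() if i < len(sizes) - 2 else out_act]
    return nn.Sequential(*layers)


actor = mlp([STATE_DIM, 64, ACTION_DIM], nn.Tanh())
critic = mlp([STATE_DIM + ACTION_DIM, 64, 1])
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)   # target networks
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
buffer, gamma, tau = PrioritizedReplay(), 0.99, 0.005

# "Interaction units": each owns an environment copy and its own exploration noise,
# all feeding the single shared prioritized replay buffer.
units = [(ToyEnv(), OUNoise(sigma=0.2)), (ToyEnv(), OUNoise(sigma=0.4))]
states = [env.reset() for env, _ in units]

for step in range(2000):
    for i, (env, noise) in enumerate(units):                       # actors: collect experience
        with torch.no_grad():
            a = actor(torch.as_tensor(states[i])).numpy() + noise()
        s2, r, _ = env.step(a)
        buffer.add((states[i], a.astype(np.float32), np.float32(r), s2))
        states[i] = s2
    if len(buffer.data) < 256:
        continue
    idx, s, a, r, s2 = buffer.sample(64)                           # learner: one DDPG update
    with torch.no_grad():
        y = r.unsqueeze(1) + gamma * critic_t(torch.cat([s2, actor_t(s2)], 1))
    td = y - critic(torch.cat([s, a], 1))
    opt_c.zero_grad(); td.pow(2).mean().backward(); opt_c.step()   # critic: minimize TD error
    opt_a.zero_grad(); (-critic(torch.cat([s, actor(s)], 1)).mean()).backward(); opt_a.step()
    buffer.update(idx, td.detach().squeeze(1).numpy())             # refresh priorities
    for p, pt in zip([*actor.parameters(), *critic.parameters()],
                     [*actor_t.parameters(), *critic_t.parameters()]):
        pt.data.mul_(1 - tau).add_(tau * p.data)                   # soft target update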
  • Fig. 1 Sketch of the biped model

    Fig. 2 Passive dynamic walking process

    Fig. 3 The neural network training process in DDPG

    Fig. 4 The structure of Ape-X DPG

    Fig. 5 Episode process in interaction unit n

    Fig. 6 Landscape of the reward function when falls = 0

    Fig. 7 The average reward curve

    Fig. 8 Stable walking times on the test set

    Fig. 9 The phase plane of the robot's left leg

    Fig. 10 Biped walking state from initial state b

    Fig. 11 Stick diagram of the biped walking process

    Fig. 12 Sketch of the biped physical model

    Fig. 13 Robot physics simulation

    Fig. 14 The number of stable walking cells

    Fig. 15 The biped basin of attraction (BOA) when $ \phi = 0.1 $

    Table 1 Symbols and dimensionless default values of the biped parameters

    Parameter  Symbol  Value
    Leg length  l  1
    Leg mass  m1  1
    Hip mass  m2  2
    Foot radius  r  0.3
    Distance from leg center of mass to arc-foot center  l1  0.55
    Distance from hip joint to arc-foot center  l2  0.7
    Distance from hip joint to leg center of mass  c  0.15
    Leg moment of inertia  J1  0.01
    Gravitational acceleration  g  9.8

    Table 2 Allocation of the noise functions N and learning time

    Algorithm  Gaussian noise  O-U noise  Parameter-space noise [39]  Time
    DDPG  0  1  0  6.4 h
    2 interaction units  1  1  0  4.2 h
    4 interaction units  2  1  1  4.2 h
    6 interaction units  2  2  2  4.3 h

    Table 3 Initial states of the biped

    State  $\theta_1$  $\dot\theta_1$  $\dot\theta_2$  $\phi$
    a  0.37149  −1.24226  2.97253  0.078
    b  0.24678  −1.20521  0.15476  0.121
  • [1] Tian Yan-Tao, Sun Zhong-Bo, Li Hong-Yang, Wang Jing. A review of optimal and control strategies for dynamic walking bipedal robots. Acta Automatica Sinica, 2016, 42(8): 1142−1157 (in Chinese)
    [2] Chin C S, Lin W P. Robust genetic algorithm and fuzzy inference mechanism embedded in a sliding-mode controller for an uncertain underwater robot. IEEE/ASME Transactions on Mechatronics, 2018, 23(2): 655−666 doi: 10.1109/TMECH.2018.2806389
    [3] Wang Y, Wang S, Wei Q, et al. Development of an underwater manipulator and its free-floating autonomous operation. IEEE/ASME Transactions on Mechatronics, 2016, 21(2): 815−824 doi: 10.1109/TMECH.2015.2494068
    [4] Wang Y, Wang S, Tan M, et al. Real-time dynamic Dubins-helix method for 3-D trajectory smoothing. IEEE Transactions on Control Systems Technology, 2015, 23(2): 730−736 doi: 10.1109/TCST.2014.2325904
    [5] Wang Y, Wang S, Tan M. Path generation of autonomous approach to a moving ship for unmanned vehicles. IEEE Transactions on Industrial Electronics, 2015, 62(9): 5619−5629 doi: 10.1109/TIE.2015.2405904
    [6] Ma K Y, Chirarattananon P, Wood R J. Design and fabrication of an insect-scale flying robot for control autonomy. In: Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany: IEEE, 2015. 1558−1564
    [7] McGeer T. Passive dynamic walking. The International Journal of Robotics Research, 1990, 9(2): 62−82 doi: 10.1177/027836499000900206
    [8] Bhounsule P A, Cortell J, Ruina A. Design and control of Ranger: an energy-efficient, dynamic walking robot. In: Proceedings of the 15th International Conference on Climbing and Walking Robots and the Support Technologies for Mobile Machines. Baltimore, USA, 2012. 441−448
    [9] Kurz M J, Stergiou N. An artificial neural network that utilizes hip joint actuations to control bifurcations and chaos in a passive dynamic bipedal walking model. Biological Cybernetics, 2005, 93(3): 213−221 doi: 10.1007/s00422-005-0579-6
    [10] Sun Chang-Yin, He Wei, Ge Wei-Liang, Chang Cheng. Adaptive neural network control of biped robots. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2016, 47(2): 315−326
    [11] Sugimoto Y, Osuka K. Walking control of quasi passive dynamic walking robot "Quartet III" based on continuous delayed feedback control. In: Proceedings of the 2004 IEEE International Conference on Robotics and Biomimetics. Shenyang, China: IEEE, 2004. 606−611
    [12] Liu De-Jun, Tian Yan-Tao, Zhang Lei. Energy shaping control of under-actuated biped robot. Chinese Journal of Mechanical Engineering, 2012, 48(23): 16−22 doi: 10.3901/JME.2012.23.016 (in Chinese)
    [13] Spong M W, Holm J K, Lee D. Passivity-based control of bipedal locomotion. IEEE Robotics & Automation Magazine, 2007, 14(2): 30−40
    [14] Liu Nai-Jun, Lu Tao, Cai Ying-Hao, Wang Shuo. A review of robot manipulation skills learning methods. Acta Automatica Sinica, 2019, 45(3): 458−470 (in Chinese)
    [15] Tedrake R, Zhang T W, Seung H S. Stochastic policy gradient reinforcement learning on a simple 3D biped. In: Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems. Sendai, Japan: IEEE, 2004. 2849−2854
    [16] Hitomi K, Shibata T, Nakamura Y, Ishii S. Reinforcement learning for quasi-passive dynamic walking of an unstable biped robot. Robotics and Autonomous Systems, 2006, 54(12): 982−988 doi: 10.1016/j.robot.2006.05.014
    [17] Ueno T, Nakamura Y, Takuma T, Shibata T, Hosoda K, Ishii S. Fast and stable learning of quasi-passive dynamic walking by an unstable biped robot based on off-policy natural actor-critic. In: Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. Beijing, China: IEEE, 2006. 5226−5231
    [18] Liu Quan, Zhai Jian-Wei, Zhang Zong-Zhang, Zhong Shan, Zhou Qian, Zhang Peng, et al. A survey on deep reinforcement learning. Chinese Journal of Computers, 2018, 41(1): 1−27 (in Chinese)
    [19] Kendall A, Hawke J, Janz D, Mazur P, Reda D, Allen J K, et al. Learning to drive in a day [Online], available: https://arxiv.org/abs/1807.00412, Jul 1, 2018
    [20] Wang Yun-Peng, Guo Ge. Signal priority control for trams using deep reinforcement learning. Acta Automatica Sinica, 2019, 45(12): 2366−2377 (in Chinese)
    [21] Zhang Yi-Ke, Zhang Peng-Yuan, Yan Yong-Hong. Data augmentation for language models via adversarial training. Acta Automatica Sinica, 2018, 44(5): 891−900 (in Chinese)
    [22] Andreas J, Rohrbach M, Darrell T, Klein D. Learning to compose neural networks for question answering. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego, USA: Association for Computational Linguistics, 2016. 1545−1554
    [23] Zhang X, Lapata M. Sentence simplification with deep reinforcement learning. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark: Association for Computational Linguistics, 2017. 584−594
    [24] Zhao Yu-Ting, Han Bao-Ling, Luo Qing-Sheng. Walking stability control method for biped robot on uneven ground based on deep Q-network. Journal of Computer Applications, 2018, 38(9): 2459−2463 (in Chinese)
    [25] Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529−533 doi: 10.1038/nature14236
    [26] Kumar A, Paul N, Omkar S N. Bipedal walking robot using deep deterministic policy gradient. In: Proceedings of the 2018 IEEE Symposium Series on Computational Intelligence. Bengaluru, India: IEEE, 2018
    [27] Lillicrap T P, Hunt J J, Pritzel A, Heess N, Erez T, Tassa Y, et al. Continuous control with deep reinforcement learning [Online], available: https://arxiv.org/abs/1509.02971, Sep 9, 2015
    [28] Song D R, Yang Chuan-Yu, McGreavy C, Li Zhi-Bin. Recurrent deterministic policy gradient method for bipedal locomotion on rough terrain challenge. In: Proceedings of the 2018 15th International Conference on Control, Automation, Robotics and Vision. Singapore: IEEE, 2018. 311−318
    [29] Todorov E, Erez T, Tassa Y. MuJoCo: A physics engine for model-based control. In: Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. Algarve, Portugal: IEEE, 2012. 5026−5033
    [30] Palanisamy P. Hands-On Intelligent Agents with OpenAI Gym: Your Guide to Developing AI Agents Using Deep Reinforcement Learning. Packt Publishing Ltd, 2018
    [31] Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. In: Proceedings of the International Conference on Learning Representations 2016. San Juan, Puerto Rico, 2016. 322−355
    [32] Horgan D, Quan J, Budden D, Maron G B, Hessel M, Hasselt H, et al. Distributed prioritized experience replay. In: Proceedings of the International Conference on Learning Representations 2018. Vancouver, Canada, 2018
    [33] Zhao Jie, Wu Xiao-Guang, Zang X Z, Yang Ji-Hong. Analysis of period doubling bifurcation and chaos mirror of biped passive dynamic robot gait. Chinese Science Bulletin, 2012, 57(14): 1743−1750 doi: 10.1007/s11434-012-5113-3
    [34] Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M, et al. Deterministic policy gradient algorithms. In: Proceedings of the International Conference on Machine Learning. Beijing, China, 2014
    [35] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998
    [36] Zhao Jie, Wu Xiao-Guang, Zhu Yan-He, Li Ge. The improved passive dynamic model with high stability. In: Proceedings of the 2009 International Conference on Mechatronics and Automation. Changchun, China: IEEE, 2009. 4687−4692
    [37] Abadi M, Barham P, Chen Jian-Min, Chen Zhi-Feng, Davis A, Dean J, et al. TensorFlow: A system for large-scale machine learning. In: Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation. Savannah, USA, 2016. 265−283
    [38] Kingma D P, Ba J. Adam: A method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA, 2015
    [39] Plappert M, Houthooft R, Dhariwal P, Sidor S, Chen R Y, Chen Xi, et al. Parameter space noise for exploration [Online], available: https://arxiv.org/abs/1706.01905, Jun 6, 2017
    [40] Schwab A L, Wisse M. Basin of attraction of the simplest walking model. In: Proceedings of the ASME Design Engineering Technical Conference. Pittsburgh, USA: ASME, 2001. 531−539
Publication history
  • Received: 2019-07-23
  • Accepted: 2020-01-09
