
Tracking Learning Based on Gaussian Regression for Multi-agent Systems in Continuous Space

CHEN Xin, WEI Hai-Jun, WU Min, CAO Wei-Hua

Citation: CHEN Xin, WEI Hai-Jun, WU Min, CAO Wei-Hua. Tracking Learning Based on Gaussian Regression for Multi-agent Systems in Continuous Space. ACTA AUTOMATICA SINICA, 2013, 39(12): 2021-2031. doi: 10.3724/SP.J.1004.2013.02021

Funds: Supported by National Natural Science Foundation of China (61074058)

Author biography:

CHEN Xin Associate professor at Central South University. His research interests cover multi-agent systems, intelligent control, and process control. E-mail: chenxin@csu.edu.cn

  • Abstract: Improving adaptability, achieving generalization over continuous spaces, and reducing dimensionality are key requirements for applying multi-agent reinforcement learning (MARL) to continuous systems. To meet these requirements, this paper proposes a model-based tracking learning mechanism and algorithm for agents in continuous multi-agent system (MAS) environments (MAS MBRL-CPT). Taking the learning agent's adaptation to its teammates' policies as the starting point, an individual expected immediate reward is defined that folds the agent's observations of teammate policies into the effect of interacting with the environment, and stochastic approximation is used to learn this expected immediate reward online. A dimension-reduced Q function is defined, which lowers the dimensionality of the learning space while establishing the Markov decision process (MDP) for the agent's tracking learning in the MAS environment. With a state-transition probability model built by Gaussian regression, the Q values of a generalization sample set are solved by online dynamic programming. Based on the Q function over the discrete sample set, Gaussian regression is then used to build generalized models of the value function and policy. Simulation experiments on a continuous-space multi-cart-pole control system show that MAS MBRL-CPT enables the learning agent to learn adaptive cooperative policies when the system dynamics and teammates' policies are unknown, with high learning efficiency and strong generalization ability.
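The abstract's final step, using Gaussian regression to generalize a value function from a discrete sample set to arbitrary continuous states, can be sketched as follows. This is a minimal illustration under assumed choices, not the paper's implementation: the squared-exponential kernel, its hyperparameters, and the toy one-dimensional "state" data are all assumptions made for the example.

```python
# Minimal sketch of Gaussian process (Gaussian regression) prediction:
# given Q-values y at a discrete set of sample states X, estimate the
# value at an off-sample query state via the GP posterior mean.
import numpy as np

def sq_exp_kernel(A, B, length_scale=0.5, signal_var=1.0):
    """Squared-exponential covariance k(a,b) = s^2 exp(-|a-b|^2 / (2 l^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale ** 2)

def gp_predict(X, y, X_star, noise_var=1e-4):
    """Posterior mean and variance of a zero-mean GP at query points X_star."""
    K = sq_exp_kernel(X, X) + noise_var * np.eye(len(X))   # train covariance
    K_s = sq_exp_kernel(X_star, X)                          # query/train cov
    alpha = np.linalg.solve(K, y)
    mean = K_s @ alpha
    v = np.linalg.solve(K, K_s.T)
    var = sq_exp_kernel(X_star, X_star).diagonal() - np.sum(K_s * v.T, axis=1)
    return mean, var

# Toy 1-D "state" space: Q-values sampled from a known smooth function,
# then queried at a state not in the sample set.
X = np.linspace(-1.0, 1.0, 9).reshape(-1, 1)
y = np.sin(3.0 * X).ravel()
mean, var = gp_predict(X, y, np.array([[0.1]]))
```

Because the GP posterior mean interpolates the samples (up to the small noise term), `mean` approximates the underlying function between sample states, which is the sense in which the discrete Q-value table is generalized over the continuous state space.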
Publication history
  • Received: 2012-04-17
  • Revised: 2013-05-13
  • Published: 2013-12-20
