A Hybrid Transfer Algorithm for Reinforcement Learning Based on Spectral Method

ZHU Mei-Qiang, CHENG Yu-Hu, LI Ming, WANG Xue-Song, FENG Huan-Ting

Citation: ZHU Mei-Qiang, CHENG Yu-Hu, LI Ming, WANG Xue-Song, FENG Huan-Ting. A Hybrid Transfer Algorithm for Reinforcement Learning Based on Spectral Method. ACTA AUTOMATICA SINICA, 2012, 38(11): 1765-1776. doi: 10.3724/SP.J.1004.2012.01765


doi: 10.3724/SP.J.1004.2012.01765
Details
    Corresponding author: ZHU Mei-Qiang


  • Abstract: In transfer tasks where the state space is proportionally scaled up, the proto-value function method can only effectively transfer the basis functions associated with the smaller eigenvalues, so using them for value-function approximation in the target task produces erroneous value functions for some states. To address this problem, this paper exploits the property that Laplacian eigenmaps preserve the local topological structure of the state space, improves the hierarchical decomposition technique based on spectral graph theory, and proposes a hybrid transfer method that combines basis functions with optimal sub-task policies. First, basis functions are computed in the source task by the spectral method and extended to basis functions for the target task by linear interpolation. Next, the interpolated second basis function (the approximate Fiedler eigenvector of the target task) is used to decompose the task, and the improved hierarchical decomposition technique is applied to obtain the optimal policies of the relevant sub-tasks. Finally, the extended basis functions and the obtained sub-task policies are used together in learning the target task. The proposed hybrid transfer method can directly determine the optimal policy over part of the target task's state space, reduces the minimum number of basis functions required for value-function approximation, and lowers the number of policy iterations; it is suitable for transfer tasks whose state spaces are proportionally scaled up and have a hierarchical structure. Simulation results on grid worlds verify the effectiveness of the new method.
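The pipeline described in the abstract (spectral basis functions learned in the source task, linear interpolation onto the scaled-up target task, and task decomposition by the approximate Fiedler vector) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 4-connected grid world, the 10 x 10 source and 20 x 20 target sizes, the number of basis functions k, and the use of scipy.ndimage.zoom for the linear interpolation are choices made for the example.

```python
# Minimal sketch of the spectral transfer pipeline (illustrative assumptions only).
import numpy as np
from scipy.ndimage import zoom  # linear interpolation of eigenvector "images"

def grid_laplacian(rows, cols):
    """Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2} of a 4-connected grid."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((1, 0), (0, 1)):          # link to right / lower neighbour
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    j = rr * cols + cc
                    W[i, j] = W[j, i] = 1.0
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt

# 1) Source task: proto-value basis functions = eigenvectors of the graph
#    Laplacian with the smallest eigenvalues, computed once on the source grid.
src_rows, src_cols, k = 10, 10, 8
evals, evecs = np.linalg.eigh(grid_laplacian(src_rows, src_cols))
source_basis = evecs[:, :k]                          # columns sorted by eigenvalue

# 2) Transfer: extend each basis function to the proportionally scaled-up
#    target grid by linear interpolation over the state-space layout.
scale = 2                                            # target grid is 20 x 20
target_basis = np.stack([
    zoom(phi.reshape(src_rows, src_cols), scale, order=1).ravel()
    for phi in source_basis.T], axis=1)

# 3) Decomposition: the sign of the interpolated second basis function
#    (the approximate Fiedler eigenvector of the target task) splits the
#    target state space into two sub-tasks.
fiedler = target_basis[:, 1]
subtask_of_state = (fiedler > 0).astype(int)         # 0 / 1 sub-task label per state
print("states per sub-task:", np.bincount(subtask_of_state))
```

In the method summarized above, the extended basis functions (target_basis here) would then serve as features for approximating the target task's value function during policy iteration, while the sub-task partition feeds the improved hierarchical decomposition that supplies the sub-task policies reused in the target task.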
Publication history
  • Received: 2011-12-02
  • Revised: 2012-05-22
  • Published: 2012-11-20
