[1] Zhao Dong-Bin, Liu De-Rong, Yi Jian-Qiang. An overview on the adaptive dynamic programming based urban city traffic signal optimal control. Acta Automatica Sinica, 2009, 35(6): 676-681 (赵冬斌, 刘德荣, 易建强. 基于自适应动态规划的城市交通信号优化控制方法综述. 自动化学报, 2009, 35(6): 676-681)
[2] Zhang W, Dietterich T G. Value function approximation and job-shop scheduling. In: Proceedings of the Workshop on Value Function Approximation, Report Number CMU-CS-95-206, School of Computer Science, Carnegie Mellon University, USA, 1995
[3] Sugiyama M, Hachiya H, Towell C, Vijayakumar S. Value function approximation on non-linear manifolds for robot motor control. In: Proceedings of the IEEE International Conference on Robotics and Automation. Rome, Italy: IEEE, 2007. 1733-1740
[4] Barto A G, Sutton R S, Anderson C W. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 1983, 13(5): 834-846
[5] Peters J, Schaal S. Policy gradient methods for robotics. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. Beijing, China: IEEE, 2006. 2219-2225
[6] Cheng Yu-Hu, Feng Huan-Ting, Wang Xue-Song. Policy iteration reinforcement learning based on geodesic Gaussian basis defined on state-action graph. Acta Automatica Sinica, 2011, 37(1): 44-51 (程玉虎, 冯涣婷, 王雪松. 基于状态-动作图测地高斯基的策略迭代强化学习. 自动化学报, 2011, 37(1): 44-51)
[7] Wang Xue-Ning, Chen Wei, Zhang Meng, Xu Xin, He Han-Gen. A survey of direct policy search methods in reinforcement learning. CAAI Transactions on Intelligent Systems, 2007, 2(1): 16-24 (王学宁, 陈伟, 张锰, 徐昕, 贺汉根. 增强学习中的直接策略搜索方法综述. 智能系统学报, 2007, 2(1): 16-24)
[8] Dayan P, Hinton G E. Using expectation-maximization for reinforcement learning. Neural Computation, 1997, 9(2): 271-278
[9] Peters J, Schaal S. Reinforcement learning by reward-weighted regression for operational space control. In: Proceedings of the 24th International Conference on Machine Learning. Corvallis, USA: ACM, 2007. 745-750
[10] Wang Xue-Song, Tian Xi-Lan, Cheng Yu-Hu, Yi Jian-Qiang. Q-learning system based on cooperative least squares support vector machine. Acta Automatica Sinica, 2009, 35(2): 214-219 (王雪松, 田西兰, 程玉虎, 易建强. 基于协同最小二乘支持向量机的Q学习. 自动化学报, 2009, 35(2): 214-219)
[11] Williams R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 1992, 8(3-4): 229-256
[12] Rückstieß T, Felder M, Schmidhuber J. State-dependent exploration for policy gradient methods. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases. Antwerp, Belgium: Springer, 2008. 234-249
[13] Peters J, Kober J. Using reward-weighted imitation for robot reinforcement learning. In: Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning. Nashville, USA: IEEE, 2009. 226-232
[14] Sehnke F, Osendorfer C, Rückstieß T, Graves A, Peters J, Schmidhuber J. Parameter-exploring policy gradients. Neural Networks, 2010, 23(4): 551-559
[15] Tang Hao, Wan Hai-Feng, Han Jiang-Hong, Zhou Lei. Coordinated look-ahead control of multiple CSPS system by multi-agent reinforcement learning. Acta Automatica Sinica, 2010, 36(2): 289-296 (唐昊, 万海峰, 韩江洪, 周雷. 基于多Agent强化学习的多站点CSPS系统的协作Look-ahead控制. 自动化学报, 2010, 36(2): 289-296)
[16] Hachiya H, Peters J, Sugiyama M. Efficient sample reuse in EM-based policy search. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases. Bled, Slovenia: Springer, 2009. 469-484
[17] Riedmiller M, Peters J, Schaal S. Evaluation of policy gradient methods and variants on the cart-pole benchmark. In: Proceedings of the IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning. Honolulu, USA: IEEE, 2007. 254-261
[18] Peters J, Vijayakumar S, Schaal S. Natural actor-critic. In: Proceedings of the 16th European Conference on Machine Learning. Porto, Portugal: Springer, 2005. 280-291