Coordinate Control of Multiple CSPS System Based on State Aggregation Method
Abstract: In a single conveyor-serviced production station (CSPS) system, reinforcement learning (RL) can effectively explore the state-action space to learn an approximately optimal look-ahead policy. However, in the coordinated control problem of a multiple CSPS system, the state space grows exponentially (geometrically) as the number of stations and the buffer capacity increase. As a result, the learning process suffers from the curse of dimensionality, which degrades both convergence speed and optimization quality. To address this, we build on a local information interaction mechanism among stations and introduce a state aggregation method to reduce the size and complexity of each station's learning space. First, each station is regarded as an independent learning agent that incorporates only the buffer state of its nearest downstream station into its own performance-value learning process. Second, the original state space is partitioned into several disjoint subsets, each represented by an abstract state, and a multi-station state aggregation feedback Q-learning (SAFQL) algorithm is then established. Through this approach, each station's look-ahead policy is optimized over the abstract state space so as to maximize the processing rate of the entire system. Simulation results show that, compared with the general multi-station feedback Q-learning method, the SAFQL method not only converges faster but also improves the system's processing rate to some degree.
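To make the aggregation idea concrete, the sketch below shows tabular Q-learning over an abstract state space, with each station acting as an independent learner that observes only its own buffer and the buffer of its nearest downstream station. This is a minimal illustrative sketch, not the authors' implementation: the aggregation map `phi`, the action set of look-ahead distances, the class name `StationAgent`, and all parameter values are assumptions.

```python
import random
from collections import defaultdict

# Hypothetical look-ahead distances and learning parameters (illustrative only).
ACTIONS = range(1, 6)
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def phi(own_buffer, downstream_buffer, bucket=5):
    """Map a concrete state (own buffer level, nearest downstream buffer level)
    to an abstract state by coarse bucketing, so that many concrete states
    share a single Q-table entry."""
    return (own_buffer // bucket, downstream_buffer // bucket)

class StationAgent:
    """One CSPS station treated as an independent learning agent."""
    def __init__(self):
        # Q-values indexed by (abstract state, action); missing entries default to 0.
        self.q = defaultdict(float)

    def act(self, abstract_state):
        # Epsilon-greedy choice of a look-ahead distance.
        if random.random() < EPSILON:
            return random.choice(list(ACTIONS))
        return max(ACTIONS, key=lambda a: self.q[(abstract_state, a)])

    def update(self, s_abs, action, reward, next_s_abs):
        # Standard one-step Q-learning update, applied on abstract states.
        best_next = max(self.q[(next_s_abs, a)] for a in ACTIONS)
        target = reward + GAMMA * best_next
        self.q[(s_abs, action)] += ALPHA * (target - self.q[(s_abs, action)])

if __name__ == "__main__":
    agent = StationAgent()
    s = phi(own_buffer=7, downstream_buffer=2)
    a = agent.act(s)
    # In the real system the reward and next state would come from the CSPS simulator.
    agent.update(s, a, reward=1.0, next_s_abs=phi(6, 3))
```

The design point illustrated here is that aggregation shrinks each station's Q-table from the full joint state space to a small set of abstract states, which is what allows the per-station learners to converge faster as the number of stations and buffer capacities grow.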