Deep Learning for Control: The State of the Art and Prospects

DUAN Yan-Jie; LV Yi-Sheng; ZHANG Jie; ZHAO Xue-Liang; WANG Fei-Yue

doi:10.16383/j.aas.2016.c160019

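More Information

Author Bio:
DUAN Yan-Jie  Ph.D. candidate at the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. Her research interests cover intelligent transportation systems, and machine learning and its applications. E-mail: duanyanjie2012@ia.ac.cn

LV Yi-Sheng  Assistant professor at the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. His research interests cover traffic data analysis, dynamic traffic modeling, and parallel traffic management and control systems. E-mail: yisheng.lv@ia.ac.cn

ZHANG Jie  Assistant professor at the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. His research interests cover auction mechanisms, optimal control, and game theory. E-mail: jie.zhang@ia.ac.cn

ZHAO Xue-Liang  Ph.D. candidate at the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, and engineer at the Chinese Association of Automation. His research interests cover social computing and intelligent information processing. E-mail: xueliang.zhao@ia.ac.cn

Corresponding author:
WANG Fei-Yue  Professor at the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. His research interests cover the modeling, analysis, and control of intelligent systems and complex systems. Corresponding author of this paper. E-mail: feiyue.wang@ia.ac.cn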

Metrics
- Article views: 8717
- HTML full-text views: 5922
- PDF downloads: 9262
- Citations: 0
Publication history
- Received: 2015-12-26
- Accepted: 2016-03-26
- Published: 2016-05-01

Deep Learning for Control: The State of the Art and Prospects

DUAN Yan-Jie^1, LV Yi-Sheng^1, ZHANG Jie^{1,2}, ZHAO Xue-Liang^1, WANG Fei-Yue^1

1. The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
2. Qingdao Academy of Intelligent Industries, Shandong 266000

Funds: Supported by National Natural Science Foundation of China (71402178, 71232006, 61233001)

Abstract: Deep learning has shown great potential and clear advantages in feature extraction and model fitting, which makes it worth introducing into control systems that involve high-dimensional data. In recent years, a number of studies have examined deep learning in the control field. This paper reviews the research directions and current state of that work, covering control object recognition, state feature extraction, system parameter identification, and control strategy calculation. It also describes the methods and ideas of deep control, adaptive dynamic programming, and parallel control as they relate to deep learning in control. Finally, it summarizes the main roles and open problems of deep learning in control research and outlines directions worth future study.

Key words: deep learning, control, feature, adaptive dynamic programming (ADP)

Fig. 1 The structure of DBN

Fig. 2 The structure of SAE

Fig. 3 The structure of CNN^[6]

Fig. 4 The structure of RNN

Fig. 5 Applications of deep learning in each part of a control system

Fig. 6 Robotic grasping system^[14]

Fig. 7 Playing Atari with deep learning

Fig. 8 Neural network for learning state prediction and $Q$ function^[20]

Fig. 9 Deep neural network for motor control function

Fig. 10 Neuro-fuzzy network

Fig. 11 The network structure of adaptive dynamic programming

Fig. 12 Parallel control systems^[50]

References (63)

[1]	LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436-444
[2]	Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 2012 Advances in Neural Information Processing Systems. Lake Tahoe, Nevada, USA: Curran Associates, Inc., 2012. 1097-1105
[3]	Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786): 504-507
[4]	Hinton G E, Osindero S, Teh Y W. A fast learning algorithm for deep belief nets. Neural Computation, 2006, 18(7): 1527-1554
[5]	Bengio Y, Lamblin P, Popovici D, Larochelle H. Greedy layer-wise training of deep networks. In: Proceedings of the 2007 Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2007. 153-160
[6]	LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11): 2278-2324
[7]	Sutskever I. Training recurrent neural networks [Ph.D. dissertation], University of Toronto, Canada, 2013
[8]	Bengio Y. Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2009, 2(1): 1-127
[9]	Arel I, Rose D C, Karnowski T P. Deep machine learning - a new frontier in artificial intelligence research. IEEE Computational Intelligence Magazine, 2010, 5(4): 13-18
[10]	Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8): 1798-1828
[11]	Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks, 2015, 61: 85-117
[12]	Boulanger-Lewandowski N, Bengio Y, Vincent P. Modeling temporal dependencies in high-dimensional sequences: application to polyphonic music generation and transcription. arXiv: 1206.6392, 2012
[13]	Schuster M, Paliwal K K. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 1997, 45(11): 2673-2681
[14]	Yu J, Weng K, Liang G, Xie G. A vision-based robotic grasping system using deep learning for 3D object recognition and pose estimation. In: Proceedings of the 2013 IEEE International Conference on Robotics and Biomimetics (ROBIO). Shenzhen, China: IEEE, 2013. 1175-1180
[15]	Lange S, Riedmiller M. Deep auto-encoder neural networks in reinforcement learning. In: Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN). Barcelona: IEEE, 2010. 1-8
[16]	Mattner J, Lange S, Riedmiller M. Learn to swing up and balance a real pole based on raw visual input data. In: Proceedings of the 19th International Conference on Neural Information Processing. Doha, Qatar: Springer, 2012. 126-133
[17]	Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M. Playing Atari with deep reinforcement learning. arXiv: 1312.5602, 2013.
[18]	Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, Graves A, Riedmiller M, Fidjeland A K, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529-533
[19]	Punjani A, Abbeel P. Deep learning helicopter dynamics models. In: Proceedings of the 2015 IEEE International Conference on Robotics and Automation. Seattle, WA: IEEE, 2015. 3223-3230
[20]	Lenz I, Knepper R, Saxena A. DeepMPC: learning deep latent features for model predictive control. In: Proceedings of Robotics: Science and Systems (RSS). Rome, Italy, 2015.
[21]	Anderson C W, Lee M, Elliott D L. Faster reinforcement learning after pretraining deep networks to predict state dynamics. In: Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN). Killarney: IEEE, 2015. 1-7
[22]	Cheon K, Kim J, Hamadache M, Lee D. On replacing PID controller with deep learning controller for DC motor system. Journal of Automation and Control Engineering, 2015, 3(6): 452-456
[23]	Levine S. Exploring deep and recurrent architectures for optimal control. In: NIPS (Neural Information Processing Systems) 2013 Workshop on Deep Learning. arXiv: 1311.1761, 2013
[24]	Berniker M, Kording K P. Deep networks for motor control functions. Frontiers in Computational Neuroscience, 2015, 9: 32
[25]	Wang F-Y, Kim H-M. Implementing adaptive fuzzy logic controllers with neural networks: a design paradigm. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 1995, 3(2): 165-180
[26]	Saridis G N, Moed M C. Analytic formulation of intelligent machines as neural nets. In: Proceedings of the 1988 IEEE International Symposium on Intelligent Control. Arlington, VA: IEEE, 1988. 22-27
[27]	Moed M C, Saridis G N. A Boltzmann machine for the organization of intelligent machines. IEEE Transactions on Systems, Man, and Cybernetics, 1990, 20(5): 1094-1102
[28]	Wang F-Y. Evolutionary Neuro-fuzzy Networks for Analysis of Complex Systems: a Memetic Approach, Technical Report #03-09-99, Program for Advanced Research of Complex Systems, the University of Arizona, 1999
[29]	Wang F-Y. Modeling, analysis and synthesis of linguistic dynamic systems: a computational theory. In: Proceedings of the 1995 IEEE International Workshop on Architecture for Semiotic Modeling and Situation Control in Large Complex Systems. Monterey, CA: IEEE Press, 1995. 173-178
[30]	Wang Fei-Yue. Fundamental issues in research of computing with words and linguistic dynamic systems. Acta Automatica Sinica, 2005, 31(6): 844-852 (in Chinese)
[31]	Saridis G N, Stephanou H E. A hierarchical approach to the control of a prosthetic arm. IEEE Transactions on Systems, Man, and Cybernetics, 1977, 7(6): 407-420
[32]	Bellman R. On the theory of dynamic programming. Proceedings of the National Academy of Sciences of the United States of America, 1952, 38(8): 716-719
[33]	Bellman R. Dynamic Programming. Princeton: Princeton University Press, 1957.
[34]	Dreyfus S E, Law A M. The Art and Theory of Dynamic Programming. New York: Academic Press, 1977.
[35]	Werbos P J. Advanced forecasting methods for global crisis warning and models of intelligence. General Systems Yearbook, 1977, 22(12): 25-38
[36]	Wang F-Y, Zhang H G, Liu D R. Adaptive dynamic programming: an introduction. IEEE Computational Intelligence Magazine, 2009, 4(2): 39-47
[37]	Liu D R, Wang D, Wang F-Y, Li H L, Yang X. Neural-network-based online HJB solution for optimal robust guaranteed cost control of continuous-time uncertain nonlinear systems. IEEE Transactions on Cybernetics, 2014, 44(12): 2834-2847
[38]	Liu D, Wang D, Li H. Decentralized stabilization for a class of continuous-time nonlinear interconnected systems using online learning optimal control approach. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(2): 418-428
[39]	Xu B, Yang C, Shi Z. Reinforcement learning output feedback NN control using deterministic learning technique. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(3): 635-641
[40]	Werbos P J. Approximate dynamic programming for real-time control and neural modeling. Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. New York: Van Nostrand Reinhold, 1992. 493-525
[41]	Murray J J, Cox C J, Lendaris G G, Saeks R. Adaptive dynamic programming. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2002, 32(2): 140-153
[42]	Bertsekas D P. Dynamic Programming and Optimal Control. Belmont, Massachusetts: Athena Scientific, 1996
[43]	Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998.
[44]	Al-Tamimi A, Lewis F L, Abu-Khalaf M. Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2008, 38(4): 943-949
[45]	Wei Q, Liu D, Lin H. Value iteration adaptive dynamic programming for optimal control of discrete-time nonlinear systems. IEEE Transactions on Cybernetics, 2015, 46(3): 840-853
[46]	Abu-Khalaf M, Lewis F L. Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica, 2005, 41(5): 779-791
[47]	Zhang H G, Wei Q L, Liu D R. An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games. Automatica, 2011, 47(1): 207-214
[48]	Bhasin S, Kamalapurkar R, Johnson M, Vamvoudakis K G, Lewis F L, Dixon W E. A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica, 2013, 49(1): 82-92
[49]	Liu D, Wei Q. Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(3): 621-634
[50]	Wang Fei-Yue. Parallel control: a method for data-driven and computational control. Acta Automatica Sinica, 2013, 39(4): 293-302 (in Chinese)
[51]	Wang Fei-Yue. Parallel system methods for management and control of complex systems. Control and Decision, 2004, 19(5): 485-489 (in Chinese)
[52]	Wang Fei-Yue. Computational theory and method on complex system. China Basic Science, 2004, 6(5): 3-10 (in Chinese)
[53]	Wang Fei-Yue, Lansing J S. From artificial life to artificial societies - new methods for studies of complex social systems. Complex Systems and Complexity Science, 2004, 1(1): 33-41 (in Chinese)
[54]	Wang Fei-Yue. On the modeling, analysis, control and management of complex systems. Complex Systems and Complexity Science, 2006, 3(2): 26-34 (in Chinese)
[55]	Wang Fei-Yue. Artificial societies, computational experiments, and parallel systems: a discussion on computational theory of complex social-economic systems. Complex Systems and Complexity Science, 2004, 1(4): 25-35 (in Chinese)
[56]	Wang F-Y. Toward a paradigm shift in social computing: the ACP approach. IEEE Intelligent Systems, 2007, 22(5): 65-67
[57]	Wang F-Y, Carley K M, Zeng D, Mao W. Social computing: from social informatics to social intelligence. IEEE Intelligent Systems, 2007, 22(2): 79-83
[58]	Wang Fei-Yue. Study on cyber-enabled social movement organizations based on social computing and parallel systems. Journal of University of Shanghai for Science and Technology, 2011, 33(1): 8-17 (in Chinese)
[59]	Wang F-Y. Parallel control and management for intelligent transportation systems: Concepts, architectures, and applications. IEEE Transactions on Intelligent Transportation Systems, 2010, 11(3): 630-638
[60]	Zhu F, Wen D, Chen S. Computational traffic experiments based on artificial transportation systems: an application of ACP approach. IEEE Transactions on Intelligent Transportation Systems, 2013, 14(1): 189-198
[61]	Wang F-Y, Wong P K. Intelligent systems and technology for integrative and predictive medicine: an ACP approach. ACM Transactions on Intelligent Systems and Technology (TIST), 2013, 4(2): 32
[62]	Duan W, Cao Z D, Wang Y Z, Zhu B, Zeng D, Wang F-Y, Qiu X G, Song H B, Wang Y. An ACP approach to public health emergency management: using a campus outbreak of H1N1 influenza as a case study. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2013, 43(5): 1028-1041
[63]	Silver D, Huang A, Maddison C J, Guez A, Sifre L, Van den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap T, Leach M, Kavukcuoglu K, Graepel T, Hassabis D. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529(7587): 484-489