| [1] | Bellman R E. Dynamic Programming. Princeton, NJ: Princeton University Press, 1957. | 
| [2] | Werbos P J. Approximate dynamic programming for real-time control and neural modeling. Handbook of Intelligent Control. New York: Van Nostrand Reinhold, 1992. |
| [3] | Lewis F L, Vrabie D, Vamvoudakis K G. Reinforcement learning and feedback control: using natural decision methods to design optimal adaptive controllers. IEEE Control Systems, 2012, 32(6): 76-105 doi: 10.1109/MCS.2012.2214134 |
| [4] | Zhang H G, Zhang X, Luo Y H, Yang J. An overview of research on adaptive dynamic programming. Acta Automatica Sinica, 2013, 39(4): 303-311 doi: 10.1016/S1874-1029(13)60031-2 |
| [5] | Liu D R, Li H L, Wang D. Data-based self-learning optimal control: research progress and prospects. Acta Automatica Sinica, 2013, 39(11): 1858-1870 doi: 10.3724/SP.J.1004.2013.01858 |
| [6] | Hou Z S, Wang Z. From model-based control to data-driven control: survey, classification and perspective. Information Sciences, 2013, 235: 3-35 doi: 10.1016/j.ins.2012.07.014 |
| [7] | Prokhorov D V, Wunsch D C. Adaptive critic designs. IEEE Transactions on Neural Networks, 1997, 8(5): 997-1007 doi: 10.1109/72.623201 |
| [8] | Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998. |
| [9] | Si J, Wang Y T. Online learning control by association and reinforcement. IEEE Transactions on Neural Networks, 2001, 12(2): 264-276 doi: 10.1109/72.914523 |
| [10] | Wang F Y. Parallel control: a method for data-driven and computational control. Acta Automatica Sinica, 2013, 39(4): 293-302 http://www.aas.net.cn/CN/abstract/abstract17915.shtml |
| [11] | Al-Tamimi A, Lewis F L, Abu-Khalaf M. Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2008, 38(4): 943-949 doi: 10.1109/TSMCB.2008.926614 |
| [12] | Zhang H G, Luo Y H, Liu D R. Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints. IEEE Transactions on Neural Networks, 2009, 20(9): 1490-1503 doi: 10.1109/TNN.2009.2027233 |
| [13] | Dierks T, Thumati B T, Jagannathan S. Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence. Neural Networks, 2009, 22(5-6): 851-860 doi: 10.1016/j.neunet.2009.06.014 |
| [14] | Wang F Y, Jin N, Liu D R, Wei Q L. Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound. IEEE Transactions on Neural Networks, 2011, 22(1): 24-36 doi: 10.1109/TNN.2010.2076370 |
| [15] | Liu D R, Wang D, Zhao D B, Wei Q L, Jin N. Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming. IEEE Transactions on Automation Science and Engineering, 2012, 9(3): 628-634 doi: 10.1109/TASE.2012.2198057 |
| [16] | Wang D, Liu D R, Wei Q L, Zhao D B, Jin N. Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming. Automatica, 2012, 48(8): 1825-1832 doi: 10.1016/j.automatica.2012.05.049 |
| [17] | Zhang H G, Qin C B, Luo Y H. Neural-network-based constrained optimal control scheme for discrete-time switched nonlinear system using dual heuristic programming. IEEE Transactions on Automation Science and Engineering, 2014, 11(3): 839-849 doi: 10.1109/TASE.2014.2303139 |
| [18] | Liu D R, Li H L, Wang D. Error bounds of adaptive dynamic programming algorithms for solving undiscounted optimal control problems. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(6): 1323-1334 doi: 10.1109/TNNLS.2015.2402203 |
| [19] | Zhong X N, Ni Z, He H B. A theoretical foundation of goal representation heuristic dynamic programming. IEEE Transactions on Neural Networks and Learning Systems, 2016, 27(12): 2513-2525 doi: 10.1109/TNNLS.2015.2490698 |
| [20] | Heydari A, Balakrishnan S N. Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics. IEEE Transactions on Neural Networks and Learning Systems, 2013, 24(1): 145-157 doi: 10.1109/TNNLS.2012.2227339 |
| [21] | Jiang Y, Jiang Z P. Robust adaptive dynamic programming and feedback stabilization of nonlinear systems. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(5): 882-893 doi: 10.1109/TNNLS.2013.2294968 |
| [22] | Na J, Herrmann G. Online adaptive approximate optimal tracking control with simplified dual approximation structure for continuous-time unknown nonlinear systems. IEEE/CAA Journal of Automatica Sinica, 2014, 1(4): 412-422 doi: 10.1109/JAS.2014.7004668 |
| [23] | Liu D R, Yang X, Wang D, Wei Q L. Reinforcement-learning-based robust controller design for continuous-time uncertain nonlinear systems subject to input constraints. IEEE Transactions on Cybernetics, 2015, 45(7): 1372-1385 doi: 10.1109/TCYB.2015.2417170 |
| [24] | Luo B, Wu H N, Huang T W. Off-policy reinforcement learning for H∞ control design. IEEE Transactions on Cybernetics, 2015, 45(1): 65-76 doi: 10.1109/TCYB.2014.2319577 |
| [25] | Mu C X, Ni Z, Sun C Y, He H B. Air-breathing hypersonic vehicle tracking control based on adaptive dynamic programming. IEEE Transactions on Neural Networks and Learning Systems, 2017, 28(3): 584-598 doi: 10.1109/TNNLS.2016.2516948 |
| [26] | Wang D, Liu D R, Zhang Q C, Zhao D B. Data-based adaptive critic designs for nonlinear robust optimal control with uncertain dynamics. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2016, 46(11): 1544-1555 doi: 10.1109/TSMC.2015.2492941 |