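Review and Perspective of Nonlinear Systems Control Based on Differential Games

TAN Fu-Xiao¹,², LIU De-Rong³, GUAN Xin-Ping⁴, LUO Bin¹

doi: 10.3724/SP.J.1004.2014.00001

1. School of Computer Science and Technology, Anhui University, Hefei 230601
2. School of Computer and Information, Fuyang Teachers College, Fuyang 236037
3. State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
4. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240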

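Funds:

Supported by the National Natural Science Foundation of China (61073116), the Anhui Provincial Natural Science Foundation of China (1208085MF111), the Open Research Project of the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences (20120102), the Natural Science Research Project of the Education Department of Anhui Province (KJ2011B123), the Anhui Postdoctoral Foundation, and the Open Fund of the Key Laboratory of Anhui Industrial Image Processing and Analysis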

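Author biography:
TAN Fu-Xiao is a postdoctoral researcher at the School of Computer Science and Technology, Anhui University, and an associate professor at the School of Computer and Information, Fuyang Teachers College. Main research interests: coordinated control of multi-agent network systems, robust control of nonlinear systems, and reinforcement-learning-based dynamic optimization of nonlinear systems. Corresponding author of this paper. E-mail: fuxiaotan@gmail.com

Corresponding author:
TAN Fu-Xiao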

Publication history

Received: 2013-06-14

Revised: 2013-09-18

Published: 2014-01-20


Abstract: A differential game is a mathematical tool that uses differential equations to handle continuous dynamic conflict, competition, or cooperation between two or more parties. It has been widely applied in biology, economics, international relations, computer science, military strategy, and many other fields. A differential game is essentially an optimal control problem with two or more parties. By combining modern control theory with game theory, it captures competition and confrontation more strongly, and applies more broadly, than control theory alone. Organized around the control, equilibrium, and algorithmic aspects of nonlinear differential game theory, this paper traces the historical development of the theory, surveys the essence of existing results and algorithms, and summarizes current research achievements. Finally, it offers an outlook on the robustness and optimality of nonlinear systems based on differential game theory.

Key words: differential games / nonlinear systems / equilibrium / HJI equation / cost function

