[1] Rubenstein M, Cornejo A, Nagpal R. Programmable self-assembly in a thousand-robot swarm. Science, 2014, 345(6198): 795−799. doi: 10.1126/science.1254295
[2] Wang Y D, He H B, Sun C Y. Learning to navigate through complex dynamic environment with modular deep reinforcement learning. IEEE Transactions on Games, 2018, 10(4): 400−412. doi: 10.1109/TG.2018.2849942
[3] Zheng Nan-Ning. On challenges in artificial intelligence. Acta Automatica Sinica, 2016, 42(5): 641−642 (in Chinese)
[4] Nguyen T T, Nguyen N D, Nahavandi S. Deep reinforcement learning for multiagent systems: a review of challenges, solutions, and applications. IEEE Transactions on Cybernetics, 2020. doi: 10.1109/TCYB.2020.2977374
[5] Zhao Dong-Bin, Shao Kun, Zhu Yuan-Heng, Li Dong, Chen Ya-Ran, Wang Hai-Tao, et al. Review of deep reinforcement learning and discussions on the development of computer Go. Control Theory & Applications, 2016, 33(6): 701−717 (in Chinese). doi: 10.7641/CTA.2016.60173
[6] Zhou Zhi-Hua. AlphaGo special session: an introduction. Acta Automatica Sinica, 2016, 42(5): 670 (in Chinese)
[7] Silver D, Huang A, Maddison C J, Guez A, Sifre L, van den Driessche G, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529(7587): 484−489. doi: 10.1038/nature16961
[8] Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, et al. Mastering the game of Go without human knowledge. Nature, 2017, 550(7676): 354−359. doi: 10.1038/nature24270
[9] Berner C, Brockman G, Chan B, Cheung V, Dębiak P, Dennison C, et al. Dota 2 with large scale deep reinforcement learning. arXiv: 1912.06680, 2019.
[10] Hung S M, Givigi S N. A Q-learning approach to flocking with UAVs in a stochastic environment. IEEE Transactions on Cybernetics, 2017, 47(1): 186−197. doi: 10.1109/TCYB.2015.2509646
[11] Schwab D, Zhu Y F, Veloso M. Zero shot transfer learning for robot soccer. In: Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018). Stockholm, Sweden: ACM, 2018. 2070−2072
[12] Wang Yun-Peng, Guo Ge. Signal priority control for trams using deep reinforcement learning. Acta Automatica Sinica, 2019, 45(12): 2366−2377 (in Chinese)
[13] Rahman M S, Mahmud M A, Pota H R, Hossain M J, Orchi T F. Distributed multi-agent-based protection scheme for transient stability enhancement in power systems. International Journal of Emerging Electric Power Systems, 2015, 16(2): 117−129. doi: 10.1515/ijeeps-2014-0143
[14] He J, Peng J, Jiang F, Qin G R, Liu W R. A distributed Q-learning spectrum decision scheme for cognitive radio sensor network. International Journal of Distributed Sensor Networks, 2015, 2015: 7
[15] Leibo J Z, Zambaldi V, Lanctot M, Marecki J, Graepel T. Multi-agent reinforcement learning in sequential social dilemmas. In: Proceedings of the 16th Conference on Autonomous Agents and Multiagent Systems. São Paulo, Brazil: ACM, 2017. 464−473
[16] Wu Guo-Zheng. Analysis of the status and trend of the development of China's automation discipline from F03 funding of NSFC. Acta Automatica Sinica, 2019, 45(9): 1611−1619 (in Chinese)
[17] Hernandez-Leal P, Kartal B, Taylor M E. A survey and critique of multiagent deep reinforcement learning. Autonomous Agents and Multi-Agent Systems, 2019, 33(6): 750−797. doi: 10.1007/s10458-019-09421-1
[18] Mu C X, Ni Z, Sun C Y, He H B. Air-breathing hypersonic vehicle tracking control based on adaptive dynamic programming. IEEE Transactions on Neural Networks and Learning Systems, 2017, 28(3): 584−598. doi: 10.1109/TNNLS.2016.2516948
[19] Mu C X, Zhao Q, Sun C Y, Gao Z. A novel Q-learning algorithm for optimal tracking control of linear discrete-time systems with unknown dynamics. Applied Soft Computing, 2019, 82: 1−13
[20] Wang Y D, Sun J, He H B, Sun C Y. Deterministic policy gradient with integral compensator for robust quadrotor control. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019. doi: 10.1109/TSMC.2018.2884725
[21] Sutton R S, McAllester D, Singh S, Mansour Y. Policy gradient methods for reinforcement learning with function approximation. In: Proceedings of the 12th International Conference on Neural Information Processing Systems. Denver, USA: MIT Press, 1999. 1057−1063
[22] Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M. Deterministic policy gradient algorithms. In: Proceedings of the 31st International Conference on Machine Learning. Beijing, China: ACM, 2014. 387−395
[23] Wei Q L, Wang L X, Liu Y, Polycarpou M M. Optimal elevator group control via deep asynchronous actor-critic learning. IEEE Transactions on Neural Networks and Learning Systems, 2020. doi: 10.1109/TNNLS.2020.2965208
[24] Dong L, Zhong X N, Sun C Y, He H B. Adaptive event-triggered control based on heuristic dynamic programming for nonlinear discrete-time systems. IEEE Transactions on Neural Networks and Learning Systems, 2017, 28(7): 1594−1605. doi: 10.1109/TNNLS.2016.2541020
[25] Arulkumaran K, Deisenroth M P, Brundage M, Bharath A A. Deep reinforcement learning: a brief survey. IEEE Signal Processing Magazine, 2017, 34(6): 26−38. doi: 10.1109/MSP.2017.2743240
[26] Li Y X. Deep reinforcement learning: an overview. arXiv: 1701.07274, 2017.
[27] Nguyen N D, Nguyen T, Nahavandi S. System design perspective for human-level agents using deep reinforcement learning: a survey. IEEE Access, 2017, 5: 27091−27102. doi: 10.1109/ACCESS.2017.2777827
[28] Nguyen T T. A multi-objective deep reinforcement learning framework. arXiv: 1803.02965, 2018.
[29] Tsitsiklis J N, van Roy B. Analysis of temporal-difference learning with function approximation. In: Proceedings of the 9th International Conference on Neural Information Processing Systems. Denver, USA: MIT Press, 1996. 1075−1081
[30] Van Hasselt H. Double Q-learning. In: Proceedings of the 23rd International Conference on Neural Information Processing Systems. Vancouver, Canada: MIT Press, 2010. 2613−2621
[31] Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. arXiv: 1509.06461, 2015.
[32] Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. arXiv: 1511.05952, 2015.
[33] Wang Z Y, Schaul T, Hessel M, van Hasselt H, Lanctot M, de Freitas N. Dueling network architectures for deep reinforcement learning. In: Proceedings of the 33rd International Conference on Machine Learning. New York, USA: ACM, 2016. 1995−2003
[34] Hausknecht M, Stone P. Deep recurrent Q-learning for partially observable MDPs. arXiv: 1507.06527, 2017.
[35] Lample G, Chaplot D S. Playing FPS games with deep reinforcement learning. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI Press, 2017.
[36] Sorokin I, Seleznev A, Pavlov M, Fedorov A, Ignateva A. Deep attention recurrent Q-network. arXiv: 1512.01693, 2015.
[37] Lillicrap T P, Hunt J J, Pritzel A, Heess N, Erez T, Tassa Y, et al. Continuous control with deep reinforcement learning. arXiv: 1509.02971, 2015.
[38] Mnih V, Badia A P, Mirza M, Graves A, Harley T, Lillicrap T P, et al. Asynchronous methods for deep reinforcement learning. In: Proceedings of the 33rd International Conference on Machine Learning. New York, USA: ACM, 2016. 1928−1937
[39] Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv: 1801.01290, 2018.
[40] Schulman J, Levine S, Abbeel P, Jordan M I, Moritz P. Trust region policy optimization. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ACM, 2015. 1889−1897
[41] Jadid O A, Hajinezhad D. A review of cooperative multi-agent deep reinforcement learning. arXiv: 1908.03963, 2019.
[42] Tan M. Multi-agent reinforcement learning: independent vs. cooperative agents. In: Proceedings of the 10th International Conference on Machine Learning. Amherst, USA: ACM, 1993. 330−337
[43] Matignon L, Laurent G J, Le Fort-Piat N. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. The Knowledge Engineering Review, 2012, 27(1): 1−31. doi: 10.1017/S0269888912000057
[44] Tampuu A, Matiisen T, Kodelja D, Kuzovkin I, Korjus K, Aru J, et al. Multiagent cooperation and competition with deep reinforcement learning. arXiv: 1511.08779, 2015.
[45] Usunier N, Synnaeve G, Lin Z M, Chintala S. Episodic exploration for deep deterministic policies: an application to StarCraft micromanagement tasks. arXiv: 1609.02993, 2016.
[46] Cui L L, Wang X W, Zhang Y. Reinforcement learning-based asymptotic cooperative tracking of a class multi-agent dynamic systems using neural networks. Neurocomputing, 2016, 171: 220−229. doi: 10.1016/j.neucom.2015.06.066
[47] Kraemer L, Banerjee B. Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing, 2016, 190: 82−94. doi: 10.1016/j.neucom.2016.01.031
[48] Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I. Multi-agent actor-critic for mixed cooperative-competitive environments. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: MIT Press, 2017. 6379−6390
[49] Ryu H, Shin H, Park J. Multi-agent actor-critic with generative cooperative policy network. arXiv: 1810.09206, 2018.
[50] Chu X X, Ye H J. Parameter sharing deep deterministic policy gradient for cooperative multi-agent reinforcement learning. arXiv: 1710.00336, 2017.
[51] Foerster J N, Farquhar G, Afouras T, Nardelli N, Whiteson S. Counterfactual multi-agent policy gradients. arXiv: 1705.08926, 2017.
[52] Zhang K Q, Yang Z R, Liu H, Zhang T, Basar T. Fully decentralized multi-agent reinforcement learning with networked agents. In: Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: ACM, 2018. 5872−5881
[53] Jiang J C, Dun C, Huang T J, Lu Z Q. Graph convolutional reinforcement learning. arXiv: 1810.09202, 2018.
[54] Wang Q L, Psillakis H E, Sun C Y. Cooperative control of multiple agents with unknown high-frequency gain signs under unbalanced and switching topologies. IEEE Transactions on Automatic Control, 2019, 64(6): 2495−2501. doi: 10.1109/TAC.2018.2867161
[55] Hernandez-Leal P, Kaisers M, Baarslag T, de Cote E M. A survey of learning in multiagent environments: dealing with non-stationarity. arXiv: 1707.09183, 2017.
[56] Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529−533. doi: 10.1038/nature14236
[57] Abdallah S, Kaisers M. Addressing the policy-bias of Q-learning by repeating updates. In: Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems. Saint Paul, USA: ACM, 2013. 1045−1052
[58] Abdallah S, Kaisers M. Addressing environment non-stationarity by repeating Q-learning updates. The Journal of Machine Learning Research, 2016, 17(1): 1582−1612
[59] Yu C, Zhang M J, Ren F H, Tan G Z. Emotional multiagent reinforcement learning in spatial social dilemmas. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(12): 3083−3096. doi: 10.1109/TNNLS.2015.2403394
[60] Diallo E A O, Sugiyama A, Sugawara T. Learning to coordinate with deep reinforcement learning in doubles pong game. In: Proceedings of the 16th IEEE International Conference on Machine Learning and Applications. Cancun, Mexico: IEEE, 2017. 14−19
[61] Foerster J N, Nardelli N, Farquhar G, Afouras T, Torr P H S, Kohli P. Stabilising experience replay for deep multi-agent reinforcement learning. In: Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: ACM, 2017. 1146−1155
[62] Palmer G, Tuyls K, Bloembergen D, Savani R. Lenient multi-agent deep reinforcement learning. In: Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems. Stockholm, Sweden: ACM, 2018. 443−451
[63] Omidshafiei S, Pazis J, Amato C, How J P, Vian J. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: ACM, 2017. 2681−2690
[64] Zheng Y, Meng Z P, Hao J Y, Zhang Z Z. Weighted double deep multiagent reinforcement learning in stochastic cooperative environments. In: Proceedings of the 15th Pacific Rim International Conference on Artificial Intelligence. Nanjing, China: Springer, 2018. 421−429
[65] Mu C X, Zhao Q, Sun C Y. Optimal model-free output synchronization of heterogeneous multi-agent systems under switching topologies. IEEE Transactions on Industrial Electronics, 2019. doi: 10.1109/TIE.2019.2958277
[66] Foerster J N, Assael Y M, de Freitas N, Whiteson S. Learning to communicate to solve riddles with deep distributed recurrent Q-networks. arXiv: 1602.02672, 2016.
[67] Hong Z W, Su S Y, Shann T Y, Chang Y H, Lee C Y. A deep policy inference Q-network for multi-agent systems. In: Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems. Stockholm, Sweden: ACM, 2018. 1388−1396
[68] Kasai T, Tenmoto H, Kamiya A. Learning of communication codes in multi-agent reinforcement learning problem. In: Proceedings of the 2008 IEEE Conference on Soft Computing in Industrial Applications. Muroran, Japan: IEEE, 2008. 1−6
[69] Foerster J N, Assael Y M, de Freitas N, Whiteson S. Learning to communicate with deep multi-agent reinforcement learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: ACM, 2016. 2137−2145
[70] Sukhbaatar S, Szlam A, Fergus R. Learning multiagent communication with backpropagation. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: ACM, 2016. 2252−2260
[71] Zhang H G, Jiang H, Luo Y H, Xiao G Y. Data-driven optimal consensus control for discrete-time multi-agent systems with unknown dynamics using reinforcement learning method. IEEE Transactions on Industrial Electronics, 2017, 64(5): 4091−4100. doi: 10.1109/TIE.2016.2542134
[72] Zhang Y, Zavlanos M M. Distributed off-policy actor-critic reinforcement learning with policy consensus. arXiv: 1903.09255, 2019.
[73] Wei Q L, Liu D R, Lewis F L, Liu Y, Zhang J. Mixed iterative adaptive dynamic programming for optimal battery energy control in smart residential microgrids. IEEE Transactions on Industrial Electronics, 2017, 64(5): 4110−4120. doi: 10.1109/TIE.2017.2650872
[74] Yang X D, Wang Y D, He H B, Sun C Y, Zhang Y B. Deep reinforcement learning for economic energy scheduling in data center microgrids. In: Proceedings of the 2019 IEEE Power & Energy Society General Meeting. Atlanta, USA: IEEE, 2019. 1−5
[75] Prasad A, Dusparic I. Multi-agent deep reinforcement learning for zero energy communities. arXiv: 1810.03679, 2018.
[76] Xu Xin. Reinforcement Learning and Approximate Dynamic Programming. Beijing: Science Press, 2010 (in Chinese)
[77] Wan Z Q, Jiang C, Fahad M, Ni Z, Guo Y, He H B. Robot-assisted pedestrian regulation based on deep reinforcement learning. IEEE Transactions on Cybernetics, 2020, 50(4): 1669−1682. doi: 10.1109/TCYB.2018.2878977
[78] Lin K X, Zhao R Y, Xu Z, Zhou J Y. Efficient large-scale fleet management via multi-agent deep reinforcement learning. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. London, UK: ACM, 2018. 1774−1783
[79] Ben Noureddine D, Gharbi A, Ben Ahmed S. Multi-agent deep reinforcement learning for task allocation in dynamic environment. In: Proceedings of the 12th International Conference on Software Technologies. Madrid, Spain: SciTePress, 2017. 17−26
[80] Hüttenrauch M, Šošić A, Neumann G. Guided deep reinforcement learning for swarm systems. arXiv: 1709.06011, 2017.
[81] Kurek M, Jaśkowski W. Heterogeneous team deep Q-learning in low-dimensional multi-agent environments. In: Proceedings of the 2016 IEEE Conference on Computational Intelligence and Games (CIG). Santorini, Greece: IEEE, 2016. 1−8
[82] Perolat J, Leibo J Z, Zambaldi V, Beattie C, Tuyls K, Graepel T. A multi-agent reinforcement learning model of common-pool resource appropriation. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: ACM, 2017. 3643−3652
[83] Piot B, Geist M, Pietquin O. Bridging the gap between imitation learning and inverse reinforcement learning. IEEE Transactions on Neural Networks and Learning Systems, 2017, 28(8): 1814−1826. doi: 10.1109/TNNLS.2016.2543000
[84] Hadfield-Menell D, Russell S J, Abbeel P, Dragan A. Cooperative inverse reinforcement learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: ACM, 2016. 3909−3917
[85] Hadfield-Menell D, Milli S, Abbeel P, Russell S, Dragan A D. Inverse reward design. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: ACM, 2017. 6765−6774
[86] Levine S, Finn C, Darrell T, Abbeel P. End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research, 2016, 17(1): 1334−1373
[87] Nagabandi A, Kahn G, Fearing R S, Levine S. Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. In: Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA). Brisbane, Australia: IEEE, 2018. 7559−7566
[88] Gu S X, Lillicrap T P, Sutskever I, Levine S. Continuous deep Q-learning with model-based acceleration. In: Proceedings of the 33rd International Conference on Machine Learning. New York, USA: ACM, 2016. 2829−2838
[89] Finn C, Levine S. Deep visual foresight for planning robot motion. In: Proceedings of the 2017 IEEE International Conference on Robotics and Automation. Singapore: IEEE, 2017. 2786−2793
[90] Serban I V, Sankar C, Pieper M, Pineau J, Bengio Y. The bottleneck simulator: a model-based deep reinforcement learning approach. arXiv: 1807.04723, 2018.
[91] Rashid T, Samvelyan M, de Witt C S, Farquhar G, Foerster J, Whiteson S. QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. arXiv: 1803.11485, 2018.
[92] Foerster J N, Chen R Y, Al-Shedivat M, Whiteson S, Abbeel P, Mordatch I. Learning with opponent-learning awareness. In: Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems. Stockholm, Sweden: ACM, 2018. 122−130
[93] Yuan X, Dong L, Sun C Y. Solver-critic: a reinforcement learning method for discrete-time constrained-input systems. IEEE Transactions on Cybernetics, 2020. doi: 10.1109/TCYB.2020.2978088
[94] He W, Li Z J, Chen C L P. A survey of human-centered intelligent robots: issues and challenges. IEEE/CAA Journal of Automatica Sinica, 2017, 4(4): 602−609. doi: 10.1109/JAS.2017.7510604
[95] Nahavandi S. Trusted autonomy between humans and robots: toward human-on-the-loop in robotics and autonomous systems. IEEE Systems, Man, and Cybernetics Magazine, 2017, 3(1): 10−17. doi: 10.1109/MSMC.2016.2623867