
Intelligent Decision Making Technology and Challenge of Wargame

Yin Qi-Yue, Zhao Mei-Jing, Ni Wan-Cheng, Zhang Jun-Ge, Huang Kai-Qi

Citation: Yin Qi-Yue, Zhao Mei-Jing, Ni Wan-Cheng, Zhang Jun-Ge, Huang Kai-Qi. Intelligent decision making technology and challenge of wargame. Acta Automatica Sinica, 2021, 47(x): 1−15. doi: 10.16383/j.aas.c210547

doi: 10.16383/j.aas.c210547

Funds: Supported by National Natural Science Foundation of China (61906197)
More Information
    Author Bio:

    YIN Qi-Yue  Associate professor and master's supervisor at the Institute of Automation, Chinese Academy of Sciences. His research interest covers reinforcement learning, data mining, and game AI. E-mail: qyyin@nlpr.ia.ac.cn

    ZHAO Mei-Jing  Associate professor at the Institute of Automation, Chinese Academy of Sciences. Her research interest covers knowledge representation and modeling, and complex system decision making. E-mail: meijing.zhao@ia.ac.cn

    NI Wan-Cheng  Professor and master's supervisor at the Institute of Automation, Chinese Academy of Sciences. Her research interest covers data mining and knowledge discovery, complex system modeling, and swarm intelligence gaming platforms and evaluation. E-mail: wancheng.ni@ia.ac.cn

    ZHANG Jun-Ge  Associate professor and master's supervisor at the Institute of Automation, Chinese Academy of Sciences. His research interest covers game AI and decision making, reinforcement learning, multi-agent learning, pattern recognition, and computer vision. E-mail: jgzhang@nlpr.ia.ac.cn

    HUANG Kai-Qi  Professor and doctoral supervisor at the Institute of Automation, Chinese Academy of Sciences. His research interest covers computer vision, pattern recognition, and cognitive decision making. Corresponding author of this paper. E-mail: kqhuang@nlpr.ia.ac.cn

  • Abstract: In recent years, intelligent decision-making technology developed through human-machine confrontation has advanced rapidly: AI programs such as AlphaGo and AlphaStar have defeated top human players in Go and StarCraft, respectively. Wargaming, as an environment for validating strategies through human-machine confrontation, has drawn wide attention from researchers of intelligent decision making because of its asymmetric-environment decision making and its stochasticity and high-risk decisions, which are closer to real-world conditions. This paper distinguishes wargaming from today's mainstream human-machine confrontation environments such as Go, Texas Hold'em, and StarCraft, reviews the state of the art of intelligent decision-making technology for wargaming, analyzes the limitations and bottlenecks of current mainstream techniques, and offers reflections on research into intelligent decision making for wargaming, in the hope of inspiring related researchers.
    1)  http://turingai.ia.ac.cn/
    2)  http://turingai.ia.ac.cn/ranks/wargame_list
    3)  https://www.tensorflow.org/
    4)  https://pytorch.org/
    5)  http://turingai.ia.ac.cn/notices/detail/116
    6)  http://turingai.ia.ac.cn/bbs/detail/14/1/29
    7)  http://www.cas.cn/syky/202107/t20210712_4798152.shtml
    8)  http://gym.openai.com/
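
Footnote 8 refers to OpenAI Gym, whose reset/step interface has become the de facto way to expose decision-making environments, wargames included, to reinforcement-learning code. A minimal Gym-style wrapper might look like the sketch below; `WargameEnv`, its toy map, and its four-move action set are hypothetical illustrations, not an interface released with the paper.

```python
import random

class WargameEnv:
    """A minimal Gym-style environment wrapper (hypothetical sketch).

    Follows the classic interface popularized by OpenAI Gym (footnote 8):
    reset() -> observation, step(action) -> (observation, reward, done, info).
    """

    MAP_SIZE = 10   # toy map width/height
    MAX_STEPS = 50  # toy episode length

    def reset(self):
        self.t = 0
        self.pos = [0, 0]  # stand-in for the full unit/terrain state
        return tuple(self.pos)

    def step(self, action):
        # Actions 0-3 move one cell; a real wrapper would expose the
        # engine's move/attack/observe commands instead.
        dx, dy = [(0, 1), (0, -1), (1, 0), (-1, 0)][action]
        self.pos[0] = min(max(self.pos[0] + dx, 0), self.MAP_SIZE - 1)
        self.pos[1] = min(max(self.pos[1] + dy, 0), self.MAP_SIZE - 1)
        self.t += 1
        reached_goal = self.pos == [self.MAP_SIZE - 1, self.MAP_SIZE - 1]
        done = reached_goal or self.t >= self.MAX_STEPS
        reward = 1.0 if reached_goal else 0.0
        return tuple(self.pos), reward, done, {}

# Random-policy rollout, as a smoke test of the interface.
env = WargameEnv()
obs, done = env.reset(), False
while not done:
    obs, reward, done, _ = env.step(random.randrange(4))
```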
  • Fig. 1  OODA loop
    Fig. 2  Self-play + reinforcement learning training
    Fig. 3  IMPALA for training wargame AI
    Fig. 4  Additive fusion between knowledge-based and data-based AI
    Fig. 5  Human-machine confrontation framework [45]
    Fig. 6  Hypotactic fusion between knowledge-based and data-based AI
    Fig. 7  Evaluation of specific capabilities of agents
    Fig. 8  Turing AI platform
    Fig. 9  Example of knowledge base construction for wargame
    Fig. 10  Asynchronous multi-agent cooperation in wargame
    Fig. 11  Challenges of training a big model for wargame
    Fig. 12  Environment of arranging arms
    Fig. 13  Environment of asynchronous multi-agent cooperation
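
The training pipeline named in Fig. 2 (self-play + reinforcement learning) and scaled up in Fig. 3 (IMPALA-style distributed actors) can be summarized in a short sketch. The toy code below is illustrative only: `Policy`, `play_episode`, and the scalar "skill" parameter are hypothetical stand-ins for a neural-network policy and the wargame engine, not the authors' implementation.

```python
import random
from collections import deque

class Policy:
    """Hypothetical stand-in for a neural-network policy."""
    def __init__(self, skill=0.0):
        self.skill = skill                    # proxy for learned parameters

    def act(self, obs):
        return random.choice([0, 1, 2])       # stand-in for a forward pass

    def update(self, batch):
        self.skill += 0.01 * len(batch)       # stand-in for a gradient step

def play_episode(learner, opponent, steps=10):
    """One self-play rollout against a frozen past copy of the learner."""
    trajectory = []
    for _ in range(steps):
        obs = random.random()                 # stand-in observation
        a_l, a_o = learner.act(obs), opponent.act(obs)
        reward = 1.0 if a_l != a_o else 0.0   # toy zero-sum payoff
        trajectory.append((obs, a_l, reward))
    return trajectory

def self_play_training(iterations=100, snapshot_every=10):
    learner = Policy()
    pool = deque([Policy()], maxlen=20)   # pool of frozen opponent snapshots
    for it in range(iterations):
        opponent = random.choice(pool)    # sample a past self to play against
        batch = play_episode(learner, opponent)
        learner.update(batch)             # in IMPALA, many actors generate
                                          # batches in parallel for one learner
        if it % snapshot_every == 0:
            pool.append(Policy(skill=learner.skill))
    return learner

if __name__ == "__main__":
    print(f"toy skill after training: {self_play_training().skill:.2f}")
```

The two ingredients that matter are the frozen opponent pool, which keeps the learner training against a distribution of earlier strategies rather than chasing its latest self, and the separation of experience generation from learning, which IMPALA [37] distributes across many parallel actors feeding a central learner.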

    Table 1  Representative factors that challenge decision-making (√ = factor poses the challenge, × = it does not)

    Factor                      Atari   Go   Texas Hold'em   StarCraft   Wargame
    Imperfect information         √     ×         √              √          √
    Long-horizon decisions        √     √         ×              √          √
    Non-transitive strategies     ×     √         √              √          √
    Multi-agent cooperation       ×     ×         ×              √          √
    Asymmetric environment        ×     ×         ×              ×          √
    High randomness               ×     ×         ×              ×          √
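
Agent evaluation of the kind sketched in Fig. 7 is typically bootstrapped from pairwise match outcomes with a rating system such as Elo [46] or TrueSkill [47]. The snippet below shows the standard Elo update as a worked example; the K-factor of 32 and the initial ratings are conventional illustrative values, not parameters taken from the paper.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Update both ratings after one match; score_a is 1 (win), 0.5 (draw) or 0 (loss)."""
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return r_a_new, r_b_new

# Toy usage: a 1200-rated agent upsets a 1400-rated agent.
ra, rb = elo_update(1200.0, 1400.0, 1.0)
print(round(ra), round(rb))  # the winner gains exactly what the loser gives up
```

Elo assumes strengths are transitive, the very assumption that the non-transitive strategies row of Table 1 calls into question; this motivates population-level evaluators such as α-Rank [49].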
  • [1] Campbell M, Hoane A J Jr, Hsu F H. Deep Blue. Artificial Intelligence, 2002, 134(1-2): 57-83. doi: 10.1016/S0004-3702(01)00129-1
    [2] Silver D, Huang A, Maddison C J, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529: 484-489. doi: 10.1038/nature16961
    [3] Brown N, Sandholm T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, 2018, 359(6374): 418-424. doi: 10.1126/science.aao1733
    [4] Vinyals O, Babuschkin I, Czarnecki W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 2019, 575: 350-354. doi: 10.1038/s41586-019-1724-z
    [5] Ye D, Chen G, Zhang W, et al. Towards Playing Full MOBA Games with Deep Reinforcement Learning. In: Advances in Neural Information Processing Systems 33, virtual: MIT Press, 2020.
    [6] Hu Xiao-Feng, He Xiao-Yuan, Tao Jiu-Yang. AlphaGo's breakthrough and challenges of wargaming. Science & Technology Review, 2017, 35(21): 49-60 (in Chinese)
    [7] Hu Xiao-Feng, Qi Da-Wei. Intelligent wargaming system: what will the next generation need to be changed. Journal of System Simulation, 2021 (in Chinese). https://kns.cnki.net/kcms/detail/11.3092.v.20210812.1004.004.html
    [8] Wu Lin, Hu Xiao-Feng, Tao Jiu-Yang, He Xiao-Yuan. Wargaming eco-system for intelligence growing. Journal of System Simulation, 2021, 33(9): 2048-2058 (in Chinese)
    [9] Dai Yong, Huang Xing-Hua. Study on application of artificial intelligence in computer wargame. Application of IC, 2020, 37(5): 67-69 (in Chinese)
    [10] Moy G, Shekh S. The Application of AlphaZero to Wargaming. In: AI 2019: Advances in Artificial Intelligence - 32nd Australasian Joint Conference, Adelaide, SA, Australia: Springer, 2019. 3−14.
    [11] Evensen P I, Martinussen S E, Halsor M, Bentsen D H. Wargaming evolved: Methodology and best practices for simulation-supported wargaming. In: Interservice/Industry Training, Simulation, and Education Conference, Orlando, FL, USA: I/ITSEC, 2019. 19182.
    [12] Hu Gen-Sheng, Zhang Qian-Qian, Ma Chao-Zhong. Outlier data mining of the war game system. Journal of Information Engineering University, 2020, 21(3): 373-377 (in Chinese)
    [13] Ji Meng-Qi, Dong Qian, Qin Mao-Sen, Yang Feng. Research on modeling method of COA for wargaming based on object-process methodology. Command Control & Simulation, 2018, 40(5): 79-85 (in Chinese). doi: 10.3969/j.issn.1673-3819.2018.05.016
    [14] Bedi P, Taneja S B, Satija P, Jain G, Pandey A, Aggarwal A. Bot development for military wargaming simulation. In: International Conference on Application of Computing and Communication Technologies, Delhi, India: Springer, 2018. 347−360.
    [15] Wang Gui-Qi, Liu Hui, Zhu Ning. A survey of war games technology. Ordnance Industry Automation, 2012, 31(8): 38-41, 45 (in Chinese). doi: 10.3969/j.issn.1006-1576.2012.08.012
    [16] Peng Chun-Guang, Zhao Xin-Ye, Liu Bao-Hong, Huang Ke-Di. The technology of wargaming: an overview. In: Proceedings of the 14th Chinese Conference on System Simulation Technology & Application, 2009. 366-370 (in Chinese)
    [17] Cao Zhan-Guang, Tao Shuai, Hu Xiao-Feng, He Lü-Long. Abroad wargaming deduction and system research. Journal of System Simulation, 2021, 33(9): 2059-2065 (in Chinese)
    [18] Si Guang-Ya, Wang Yan-Zheng. Challenges and reflection on next-generation large-scale computer wargame system. Journal of System Simulation, 2021, 33(9): 2010-2016 (in Chinese)
    [19] Ganzfried S, Sandholm T. Game theory-based opponent modeling in large imperfect-information games. In: 10th International Conference on Autonomous Agents and Multiagent Systems, Taipei, Taiwan: Springer, 2011. 533−540.
    [20] Littman M L. Algorithms for sequential decision-making [Ph.D. dissertation], Brown University, 1996.
    [21] Perez-Nieves N, Yang Y, Slumbers O, Mguni D H, Wen Y, Wang J. Modelling behavioural diversity for learning in open-ended games. In: Proceedings of the 38th International Conference on Machine Learning, Virtual Event: ACM, 2021. 8514−8524.
    [22] Jaderberg M, Czarnecki W M, Dunning I, Marris L, Lever G, Castaneda A G, Beattie C, Rabinowitz N C, Morcos A S, Ruderman A, Sonnerat N, Green T, Deason L, Leibo J Z, Silver D, Hassabis D, Kavukcuoglu K, Graepel T. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science, 2019, 364: 859-865. doi: 10.1126/science.aau6249
    [23] Baker B, Kanitscheider I, Markov T, Wu Y, Powell G, McGrew B, Mordatch I. Emergent tool use from multi-agent autocurricula. In: 8th International Conference on Learning Representations, Addis Ababa, Ethiopia: OpenReview.net, 2020.
    [24] Liu I J, Jain U, Yeh R A, Schwing A G. Cooperative exploration for multi-agent deep reinforcement learning. In: Proceedings of the 38th International Conference on Machine Learning, Virtual Event: ACM, 2021. 6826−6836.
    [25] Zhou Zhi-Jie, Cao You, Hu Chang-Hua, Tang Shuai-Wen, Zhang Chun-Chao, Wang Jie. The interpretability of rule-based modeling approach and its development. Acta Automatica Sinica, 2021, 47(6): 1201-1216 (in Chinese)
    [26] Grant T, Kooter B. Comparing OODA & other models as operational view C2 architecture. In: 10th International Command and Control Research and Technology Symposium, McLean, Virginia, USA, 2005.
    [27] Nicolau M, Perez-Liebana D, O'Neill M, Brabazon A. Evolutionary behavior tree approaches for navigating platform games. IEEE Transactions on Computational Intelligence and AI in Games, 2017, 9(3): 227-238.
    [28] Henzinger T A. The theory of hybrid automata. In: Verification of Digital and Hybrid Systems, Springer, 2000. 265−292.
    [29] Cui Wen-Hua, Li Dong, Tang Yu-Bo, Liu Shao-Jun. Framework of wargaming decision-making methods based on deep reinforcement learning. National Defense Technology, 2020, 41(2): 113-121 (in Chinese)
    [30] Li Chen, Huang Yan-Yan, Zhang Yong-Liang, Chen Tian-De. Multi-agent decision-making method based on actor-critic framework and its application in wargame. Systems Engineering and Electronics, 2021, 43(3): 755-762 (in Chinese). doi: 10.12305/j.issn.1001-506X.2021.03.20
    [31] Zhang Zhen, Huang Yan-Yan, Zhang Yong-Liang, Chen Tian-De. Battle entity confrontation algorithm based on proximal policy optimization. Journal of Nanjing University of Science and Technology, 2020, 45(1): 77-83 (in Chinese)
    [32] Qin Chao, Gao Xiao-Guang, Wan Kai-Fang. Deep spatiotemporal convolutional long-short memory network. Acta Automatica Sinica, 2020, 46(3): 451-462 (in Chinese)
    [33] Chen Wei-Hong, An Ji-Yao, Li Ren-Fa, Li Wan-Li. Review on deep-learning-based cognitive computing. Acta Automatica Sinica, 2017, 43(11): 1886-1897 (in Chinese)
    [34] Burda Y, Edwards H, Storkey A, Klimov O. Exploration by random network distillation. In: 7th International Conference on Learning Representations, New Orleans, LA, USA: OpenReview.net, 2019.
    [35] Mnih V, Badia A P, Mirza M, Graves A, Harley T, Lillicrap T P, Silver D, Kavukcuoglu K. Asynchronous methods for deep reinforcement learning. In: Proceedings of the 33rd International Conference on Machine Learning, New York City, NY, USA: ACM, 2016. 1928−1937.
    [36] Horgan D, Quan J, Budden D, Barth-Maron G, Hessel M, Hasselt H V, Silver D. Distributed prioritized experience replay. In: 6th International Conference on Learning Representations, Vancouver, BC, Canada: OpenReview.net, 2018.
    [37] Espeholt L, Soyer H, Munos R, Simonyan K, Mnih V, Ward T, Doron Y, Firoiu V, Harley T, Dunning I, Legg S, Kavukcuoglu K. IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. In: Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden: ACM, 2018. 1407−1416.
    [38] Jaderberg M, Czarnecki W M, Dunning I, et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science, 2019, 364(6443): 859-865. doi: 10.1126/science.aau6249
    [39] Espeholt L, Marinier R, Stanczyk P, Wang K, Michalski M. SEED RL: scalable and efficient deep-RL with accelerated central inference. In: Proceedings of the 8th International Conference on Learning Representations, Addis Ababa, Ethiopia: OpenReview.net, 2020.
    [40] Moritz P, Nishihara R, Wang S, Tumanov A, Liaw R, Liang E, Elibol M, Yang Z, Paul W, Jordan M I, Stoica I. Ray: A distributed framework for emerging AI applications. In: Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation, Carlsbad, CA, USA: ACM, 2018. 561−577.
    [41] Pu Zhi-Qiang, Yi Jian-Qiang, Liu Zhen, Qiu Teng-Hai, Sun Jin-Lin, Li Fei-Mo. Knowledge-based and data-driven integrating methodologies for collective intelligence decision making: A survey. Acta Automatica Sinica, 2021 (in Chinese)
    [42] Rueden L v, Mayer S, Beckh K, Georgiev B, Giesselbach S, Heese R, Kirsch B, Pfrommer J, Pick A, Ramamurthy R, Walczak M, Garcke J, Bauckhage C, Schuecker J. Informed machine learning – A taxonomy and survey of integrating prior knowledge into learning systems. IEEE Transactions on Knowledge and Data Engineering, 2021.
    [43] Hartmann G, Shiller Z, Azaria A. Deep reinforcement learning for time optimal velocity control using prior knowledge. In: 31st IEEE International Conference on Tools with Artificial Intelligence, Portland, OR, USA: IEEE, 2019. 186−193.
    [44] Zhang P, Hao J, Wang W, Tang H, Ma Y, Duan Y, Zheng Y. KoGuN: Accelerating deep reinforcement learning via integrating human suboptimal knowledge. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Virtual Event: Morgan Kaufmann, 2020. 2291−2297.
    [45] Huang Kai-Qi, Xing Jun-Liang, Zhang Jun-Ge, Ni Wan-Cheng, Xu Bo. Intelligent technologies of human-computer gaming. Scientia Sinica Informationis, 2020, 50(4): 540-550 (in Chinese). doi: 10.1360/N112019-00048
    [46] Elo A E. The Rating of Chessplayers, Past and Present. Arco Publishing, 1978.
    [47] Herbrich R, Minka T, Graepel T. TrueSkill™: A Bayesian skill rating system. In: Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada: MIT Press, 2007. 569−576.
    [48] Balduzzi D, Tuyls K, Perolat J, Graepel T. Re-evaluating evaluation. In: Advances in Neural Information Processing Systems 31, Montréal, Canada: MIT Press, 2018. 3268−3279.
    [49] Omidshafiei S, Papadimitriou C, Piliouras G, Tuyls K, Rowland M, Lespiau J B, Czarnecki W M, Lanctot M, Perolat J, Munos R. α-Rank: Multi-agent evaluation by evolution. Scientific Reports, 2019, 9(1): 1-29.
    [50] Tang Yu-Bo, Shen Bi-Long, Shi Lei, Yi Xing. Research on the issues of next generation wargame system model engine. Journal of System Simulation, 2021, 33(9): 2025-2036 (in Chinese)
    [51] Tyran C K, George J F. The implementation of expert systems: a survey of successful implementations. ACM SIGMIS Database: the DATABASE for Advances in Information Systems, 1993, 24(1): 5-15. doi: 10.1145/154421.154422
    [52] Wang Z, Zhang J, Feng J, Chen Z. Knowledge graph embedding by translating on hyperplanes. In: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Québec City, Québec, Canada: AAAI, 2014. 1112−1119.
    [53] Wang Bao-Kui, Wu Lin, Hu Xiao-Feng, He Xiao-Yuan, Guo Sheng-Ming. Operations command behavior knowledge representation learning method based on sequential graph. Systems Engineering and Electronics, 2020, 42(11): 2520-2528 (in Chinese). doi: 10.3969/j.issn.1001-506X.2020.11.14
    [54] Liu Song, Wu Zhi-Qiang, You Xiong, Zhang Xin, Wang Xue-Feng. Multi-scale expression of integrated battlefield situation based on wargaming. Journal of Geomatics Science and Technology, 2012, 29(5) (in Chinese)
    [55] He Xiao-Yuan, Guo Sheng-Ming, Wu Lin, Li Dong, Xu Xiao, Li Li. Modeling research of cognition behavior for intelligent wargaming. Journal of System Simulation, 2021, 33(9): 2037-2047 (in Chinese)
    [56] Zhu Feng, Hu Xiao-Feng, Wu Lin, He Xiao-Yuan, Lü Xue-Zhi, Liao Ying. From situation cognition stepped into situation intelligent cognition. Journal of System Simulation, 2018, 30(3): 761-771 (in Chinese)
    [57] Heinrich J, Lanctot M, Silver D. Fictitious self-play in extensive-form games. In: Proceedings of the 32nd International Conference on Machine Learning, Lille, France: ACM, 2015. 805−813.
    [58] Adam L, Horcik R, Kasl T, Kroupa T. Double oracle algorithm for computing equilibria in continuous games. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, Virtual Event: AAAI, 2021. 5070−5077.
    [59] Nguyen T T, Nguyen N D, Nahavandi S. Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications. IEEE Transactions on Cybernetics, 2020, 50(9): 3826-3839. doi: 10.1109/TCYB.2020.2977374
    [60] Zhang K, Yang Z, Basar T. Multi-agent reinforcement learning: A selective overview of theories and algorithms. In: Handbook of Reinforcement Learning and Control, Springer, 2021, 321−384.
    [61] Shi Wei, Feng Yang-He, Cheng Guang-Quan, Huang Hong-Lan, Huang Jin-Cai, Liu Zhong, He Wei. Research on multi-aircraft cooperative air combat method based on deep reinforcement learning. Acta Automatica Sinica, 2021, 47(7): 1610-1623 (in Chinese)
    [62] Liang Xing-Xing, Feng Yang-He, Ma Yang, Cheng Guang-Quan, Huang Jin-Cai, Wang Qi, Zhou Yu-Zhen, Liu Zhong. Deep multi-agent reinforcement learning: a survey. Acta Automatica Sinica, 2020, 46(12): 2537-2557 (in Chinese)
    [63] Agogino A K, Tumer K. Unifying temporal and structural credit assignment problems. In: 3rd International Joint Conference on Autonomous Agents and Multiagent Systems, New York, NY, USA: Springer, 2004. 980−987.
    [64] Lansdell B J, Prakash P R, Kording K P. Learning to solve the credit assignment problem. In: 8th International Conference on Learning Representations, Addis Ababa, Ethiopia: OpenReview.net, 2020.
    [65] Sun Chang-Yin, Mu Chao-Xu. Important scientific problems of multi-agent deep reinforcement learning. Acta Automatica Sinica, 2020, 46(7): 1301-1312 (in Chinese)
    [66] Sunehag P, Lever G, Gruslys A, Czarnecki W M, Zambaldi V, Jaderberg M, Lanctot M, Sonnerat N, Leibo J Z, Tuyls K, Graepel T. Value-decomposition networks for cooperative multi-agent learning based on team reward. In: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Stockholm, Sweden: Springer, 2018. 2085−2087.
    [67] Rashid T, Samvelyan M, Witt C S d, Farquhar G, Foerster J, Whiteson S. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In: Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden: ACM, 2018. 4292–4301.
    [68] Son K, Kim D, Kang W J, Hostallero D E, Yi Y. QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In: Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, USA: ACM, 2019. 5887–5896.
    [69] Foerster J, Farquhar G, Afouras T, Nardelli N, Whiteson S. Counterfactual multi-agent policy gradients. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA: AAAI, 2018. 2974−2982.
    [70] Nguyen D T, Kumar A, Lau H C. Credit assignment for collective multiagent RL with global rewards. In: Advances in Neural Information Processing Systems 31, Montréal, Canada: MIT Press, 2018. 8102–8113.
    [71] Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M, Sifre L, Kumaran D, Graepel T, Lillicrap T, Simonyan K, Hassabis D. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 2018, 362(6419): 1140-1144. doi: 10.1126/science.aar6404
    [72] Yu Y. Towards sample efficient reinforcement learning. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden: Morgan Kaufmann, 2018. 5739−5743.
    [73] Ecoffet A, Huizinga J, Lehman J, Stanley K O, Clune J. First return, then explore. Nature, 2021, 590: 580-586. doi: 10.1038/s41586-020-03157-9
    [74] Jin C, Krishnamurthy A, Simchowitz M, Yu T. Reward-free exploration for reinforcement learning. In: Proceedings of the 37th International Conference on Machine Learning, Virtual Event: ACM, 2020. 4870−4879.
    [75] Mahajan A, Rashid T, Samvelyan M, Whiteson S. MAVEN: Multi-agent variational exploration. In: Advances in Neural Information Processing Systems 32, Vancouver, BC, Canada: MIT Press, 2019. 7611–7622.
    [76] Yang Y, Wen Y, Chen L, Wang J, Shao K, Mguni D, Zhang W. Multi-Agent Determinantal Q-Learning. In: Proceedings of the 37th International Conference on Machine Learning, Virtual Event: ACM, 2020. 10757−10766.
    [77] Wang T, Dong H, Lesser V, Zhang C. ROMA: Role-Oriented Multi-Agent Reinforcement Learning. In: Proceedings of the 37th International Conference on Machine Learning, Virtual Event: ACM, 2020. 9876−9886.
    [78] Zhang Bo, Zhu Jun, Su Hang. Toward the third generation of artificial intelligence. Scientia Sinica Informationis, 2020, 50(9): 1281-1302 (in Chinese). doi: 10.1360/SSI-2020-0204
    [79] Wang Bao-Jian, Hu Da-Sha, Jiang Yu-Ming. Application of improved A* algorithm in path planning. Computer Engineering and Applications, 2021, 57(12): 243-247 (in Chinese)
    [80] Zhang Ke, Hao Wen-Ning, Shi Lu-Rong, Yu Xiao-Han, Shao Tian-Hao. Inference of key points of attack in wargame based on cascaded fuzzy system. Control Engineering of China, 2021, 28(7): 1366-1374 (in Chinese)
    [81] Xing Si-Yuan, Ni Wan-Cheng, Zhang Hai-Dong, Yan Ke. Mining of weapon utility based on the replay data of wargame. Journal of Command and Control, 2020, 6(2): 132-140 (in Chinese). doi: 10.3969/j.issn.2096-0204.2020.02.0132
    [82] Jin Zhe-Hao, Liu An-Dong, Yu Li. Hierarchical human-robot cooperative control based on GPR and DRL. Acta Automatica Sinica, 2020, 46(x): 1-11 (in Chinese)
    [83] Xu Lei, Yang Yong. Research on evaluation of unit combat action plan based on wargaming. Fire Control & Command Control, 2021, 46(4): 88-98 (in Chinese)
    [84] Li Yun-Long, Zhang Yan-Wei, Wang Zeng-Chen. Technical framework of joint operation scheme deduction and evaluation. Command Information System and Technology, 2020, 11(4): 78-83 (in Chinese)
    [85] Myerson R B. Game Theory. Harvard University Press, 2013.
    [86] Weibull J W. Evolutionary Game Theory. MIT Press, 1997.
    [87] Roughgarden T. Algorithmic game theory. Communications of the ACM, 2010, 53(7):78-86. doi: 10.1145/1785414.1785439
    [88] Chalkiadakis G, Elkind E, Wooldridge M. Cooperative game theory: Basic concepts and computational challenges. IEEE Intelligent Systems, 2012, 27(3):86-90. doi: 10.1109/MIS.2012.47
    [89] Lanctot M, Zambaldi V, Gruslys A, Lazaridou A, Tuyls K, Pérolat J, Silver D, Graepel T. A unified game-theoretic approach to multiagent reinforcement learning. In: Advances in Neural Information Processing Systems 30, Long Beach, CA, USA: MIT Press, 2017. 4190−4203.
    [90] Brown N, Lerer A, Gross S, Sandholm T. Deep counterfactual regret minimization. In: Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, USA: ACM, 2019. 793−802.
    [91] Qiu X P, Sun T X, Xu Y G, Shao Y F, Dai N, Huang X J. Pre-trained models for natural language processing: A survey. Science China Technological Sciences, 2020, 63: 1872-1897. doi: 10.1007/s11431-020-1647-3
    [92] Zhang Z Y, Han X, Zhou H, et al. CPM: A large-scale generative Chinese Pre-trained language model. AI Open, 2021, 2: 93-99. doi: 10.1016/j.aiopen.2021.07.001
    [93] Brown T, Mann B, Ryder N, et al. Language models are few-shot learners. In: Proceedings of 34th Conference on Neural Information Processing Systems, Vancouver, Canada: MIT Press, 2020.
    [94] Meng D, Zhao Q, Jiang L. A theoretical understanding of self-paced learning. Information Sciences, 2017, 414: 319-328. doi: 10.1016/j.ins.2017.05.043
    [95] Singh P, Verma V K, Mazumder P, Carin L, Rai P. Calibrating CNNs for lifelong learning. In: Proceedings of 34th Conference on Neural Information Processing Systems, Vancouver, Canada: MIT Press, 2020.
    [96] Cheng W, Yin Q, Zhang J. Opponent strategy recognition in real time strategy game using deep feature fusion neural network. In: 2020 5th International Conference on Computer and Communication Systems, Wuhan, China: IEEE, 2020. 134−137.
    [97] Samvelyan M, Rashid T, Witt C S d, Farquhar G, Nardelli N, Rudner T G J, Hung C M, Torr P H S, Foerster J, Whiteson S. The StarCraft multi-agent challenge. In: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, Montreal, QC, Canada: Springer, 2019. 2186−2188.
    [98] Tang Z, Shao K, Zhu Y, Li D, Zhao D, Huang T. A review of computational intelligence for StarCraft AI. In: IEEE Symposium Series on Computational Intelligence, Bangalore, India: IEEE, 2018. 1167−1173.
    [99] Rashid T, Samvelyan M, Schroeder C, Farquhar G, Foerster J, Whiteson S. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In: Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden: ACM, 2018. 4295−4304.
    [100] Christianos F, Schafer L, Albrecht S V. Shared experience actor-critic for multi-agent reinforcement learning. In: Advances in Neural Information Processing Systems 33, virtual: MIT Press, 2020.
    [101] Jaques N, Lazaridou A, Hughes E, Gulcehre C, Ortega P A, Strouse D J, Leibo J Z, Freitas N d. Social influence as intrinsic motivation for multi-agent deep reinforcement learning. In: Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, USA: ACM, 2019. 3040−3049.
Publication History
  • Received: 2021-06-17
  • Accepted: 2021-09-17
  • Available online: 2021-10-24
