Citation: Zheng Yi-Ning, Yu Zhen, Yang Jie, Li Bu-Fan, Yin Lin-Qi, Yin Zhang-Yue, Yuan Feng-Ye, Wei Hai-Yang, Lu Jia-Hao, Fang Shi-Cheng, Li Ni-Jun, Gui Tao, Li Yun, Chen Shuang, Qiu Xi-Peng. Survey of tool use in large language models. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c240793

doi: 10.16383/j.aas.c240793 cstr: 32138.14.j.aas.c240793

Survey of Tool Use in Large Language Models

More Information
    Author Bio:

    ZHENG Yi-Ning Ph.D. candidate at the School of Computer Science, Fudan University. He received his bachelor's degree from Fudan University in 2019. His research interest covers code generation and tool use in large language models. E-mail: ynzheng19@fudan.edu.cn

    YU Zhen Master student at the School of Computer Science, Fudan University. He received his bachelor's degree from Jiangxi Normal University in 2022. His research interest covers tool learning and personalized agents. E-mail: yuz24@m.fudan.edu.cn

    YANG Jie Master student at the School of Computer Science, Fudan University. He received his bachelor's degree from Wuhan University in 2021. His research interest covers tool use and agents. E-mail: yangj24@m.fudan.edu.cn

    LI Bu-Fan Master student at the School of Computer Science, Fudan University. He received his bachelor's degree from Northeastern University in 2024. His main research interest is post-training of large language models. E-mail: 24210240197@m.fudan.edu.cn

    YIN Lin-Qi Undergraduate student at the School of Computer Science, Fudan University. Her main research interest is tool learning. E-mail: 21307140112@m.fudan.edu.cn

    YIN Zhang-Yue Ph.D. candidate at the School of Computer Science, Fudan University. He received his bachelor's degree from East China Normal University in 2021. His research interest covers large language models and machine reasoning. E-mail: yinzy21@m.fudan.edu.cn

    YUAN Feng-Ye Master student at the School of Computer Science, Fudan University. He received his bachelor's degree from Tongji University in 2024. His research interest covers tool use in large language models and agents. E-mail: fyyuan24@m.fudan.edu.cn

    WEI Hai-Yang Master student at the School of Computer Science, Fudan University. He received his bachelor's degree from Anhui University in 2022. His research interest covers tool learning and complex intent detection. E-mail: 23210240325@m.fudan.edu.cn

    LU Jia-Hao Undergraduate student at the School of Computer Science, Fudan University. His main research interest is tool learning. E-mail: 21307130022@m.fudan.edu.cn

    FANG Shi-Cheng Undergraduate student at the School of Computer Science, Fudan University. His main research interest is tool learning. E-mail: 21307140067@m.fudan.edu.cn

    LI Ni-Jun Senior engineer at the Advanced Cognitive AI Laboratory, Shanghai Huawei Technologies Co., Ltd. His research interest covers professional knowledge engineering and AI-enabled information experience. E-mail: linijun@huawei.com

    GUI Tao Associate professor at the Institute of Modern Languages and Linguistics, Fudan University. He received his Ph.D. degree from Fudan University in 2021. His research interest covers large language models and embodied intelligence. E-mail: tgui@fudan.edu.cn

    LI Yun Chief technical expert at the Advanced Cognitive AI Laboratory, Shanghai Huawei Technologies Co., Ltd. His research interest covers artificial intelligence, cognitive intelligence, knowledge engineering, cyber security, AI security, and big data security. E-mail: lychina@139.com

    CHEN Shuang Postdoctoral researcher at the School of Computer Science, Fudan University. She received her Ph.D. degree from Northeastern University in 2023. Her research interest covers natural language processing and affective computing in large language models. E-mail: chenshuang_fd@fudan.edu.cn

    QIU Xi-Peng Professor at the School of Computer Science, Fudan University. He received his Ph.D. degree from Fudan University in 2006. His research interest covers natural language processing and large language models. Corresponding author of this paper. E-mail: xpqiu@fudan.edu.cn

  • Abstract: Large language models have attracted wide attention for their powerful generation and comprehension abilities, yet they remain limited in acquiring real-time information and performing complex computation. To let them respond to user needs more effectively, endowing large language models with the ability to use tools has become a current research hotspot. This survey first clarifies the basic concepts of tool use in large language models and traces the development of the field in chronological order. It then summarizes the datasets and technical methods related to tool use, and analyzes their applications in areas such as agents and embodied intelligence. Finally, it outlines the key research problems and future directions of tool use in large language models.
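
    To make the typical procedure surveyed here (cf. Fig. 1 below) concrete, the following is a minimal sketch of an LLM tool-use loop in Python: the model either emits a JSON tool call or a plain-text final answer, each tool's output is appended to the context as an observation, and the loop repeats. The `call_llm` stub, the tool names, and the JSON call format are illustrative assumptions, not an interface defined in this survey.

    ```python
    import json

    # Hypothetical toolset; the names and implementations are illustrative.
    TOOLS = {
        "search": lambda query: f"(top search result for {query!r})",
        "calculator": lambda expression: str(eval(expression)),  # demo only
    }

    def call_llm(prompt: str) -> str:
        """Stand-in for a real LLM API. Assumed to return either a JSON
        tool call such as {"tool": "search", "args": {"query": "..."}}
        or a plain-text final answer."""
        raise NotImplementedError

    def tool_use_loop(question: str, max_steps: int = 5) -> str:
        """Ask the model, execute any tool it requests, feed the
        observation back, and stop once the model answers directly."""
        context = question
        for _ in range(max_steps):
            reply = call_llm(context)
            try:
                call = json.loads(reply)
            except json.JSONDecodeError:
                return reply                  # plain text: final answer
            if not isinstance(call, dict) or "tool" not in call:
                return reply                  # JSON but not a tool call
            observation = TOOLS[call["tool"]](**call["args"])
            context += f"\nObservation from {call['tool']}: {observation}"
        return call_llm(context + "\nPlease give the final answer now.")
    ```

    Real systems built on this pattern add tool retrieval, argument validation, and error feedback, which Table 1 catalogs as separate acquisition and optimization techniques.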
  • Fig. 1  A typical procedure for LLM-based tool use

    Fig. 2  An example of multiple tool invocations

    Table 1  The development of tool use in large language models

    Release date | Name | Tool type | Interaction | Number and relations of tools | How tool-use ability is acquired
    2022-05 | TALM[5] | API-based tools | Single query | Multiple tools (with complex relations) | Supervised fine-tuning
    2022-11 | PAL[4] | Python interpreter | Single query | Multiple tools | In-context learning
    2023-02 | Toolformer[6] | API-based tools | Single query | Single tool | Supervised fine-tuning
    2023-03 | GPT4-Plugin[1] | API-based tools | Multi-turn dialogue | Multiple tools | Supervised fine-tuning + reinforcement learning
    2023-03 | HuggingGPT[11] | Neural network modules | Single query | Multiple tools (with complex relations) | In-context learning
    2023-03 | ViperGPT[23] | Python functions | Single query | Multiple tools (with complex relations) | In-context learning
    2023-04 | MOSS[7] | API-based tools | Multi-turn dialogue | Multiple tools | Supervised fine-tuning
    2023-04 | API-Bank[19] | API-based tools | Multi-turn dialogue | Multiple tools | Supervised fine-tuning
    2023-05 | APIBench[31] | Python functions | Single query | Single tool | Supervised fine-tuning
    2023-05 | GPT4Tools[15] | Neural network modules | Multi-turn dialogue | Multiple tools | In-context learning
    2023-05 | ToolkenGPT[32] | API-based tools | Single query | Multiple tools (with complex relations) | Supervised fine-tuning
    2023-05 | TRICE[18] | API-based tools | Single query | Multiple tools (with complex relations) | Supervised fine-tuning + reinforcement learning
    2023-05 | CRITIC[12] | Python functions | Single query | Multiple tools | In-context learning
    2023-05 | LATM[24] | Python functions | Single query | Single tool | In-context learning + tool creation
    2023-05 | CREATOR[25] | Python functions | Single query | Multiple tools | In-context learning + tool creation
    2023-05 | ToolBench[17] | API-based tools | Single query | Single tool | In-context learning
    2023-06 | ToolAlpaca[20] | API-based tools | Multi-turn dialogue | Multiple tools | Supervised fine-tuning
    2023-07 | ToolLLM[14] | API-based tools | Single query | Multiple tools | Supervised fine-tuning
    2023-08 | Confucius[33] | API-based tools | Single query | Multiple tools | Multi-stage supervised fine-tuning
    2023-09 | ToRA[26] | Python interpreter | Single query | Multiple tools (with complex relations) | Supervised fine-tuning
    2023-09 | CRAFT[34] | Python functions | Single query | Multiple tools (with complex relations) | In-context learning
    2023-10 | MetaTool[10] | API-based tools | Single query | Multiple tools | In-context learning
    2023-10 | ToolChain[35] | API-based tools | Single query | Multiple tools | In-context learning + decision-process optimization
    2023-11 | ToolTalk[36] | Python functions | Multi-turn dialogue | Multiple tools (with complex relations) | In-context learning
    2023-12 | CLOVA[37] | Python functions | Single query | Multiple tools (with complex relations) | In-context learning
    2023-12 | T-Eval[13] | API-based tools | Multi-turn dialogue | Multiple tools (with complex relations) | In-context learning
    2024-01 | ToolEyes[38] | API-based tools | Single query | Multiple tools | Supervised fine-tuning
    2024-01 | MLLM-Tool[39] | Neural network modules | Single query | Multiple tools (with complex relations) | Supervised fine-tuning
    2024-01 | TroVE[40] | Python functions | Single query | Multiple tools | In-context learning + tool creation
    2024-01 | EasyTools[41] | API-based tools | Single query | Multiple tools | In-context learning + tool-documentation compression
    2024-02 | AnyTool[42] | API-based tools | Single query | Multiple tools | In-context learning + retrieval-process optimization
    2024-02 | SciToolBench[43] | Python functions | Single query | Multiple tools | Supervised fine-tuning
    2024-03 | ToolRerank[44] | API-based tools | Single query | Multiple tools | In-context learning + retrieval-process optimization
    2024-03 | STE[16] | API-based tools | Single query | Single tool | Supervised fine-tuning + error-feedback handling
    2024-05 | Seal-Tools[45] | API-based tools | Single query | Multiple tools (with complex relations) | Supervised fine-tuning
    2024-06 | ToolPreference[46] | API-based tools | Single query | Multiple tools | Supervised fine-tuning + preference optimization
    2024-06 | UltraTool[47] | API-based tools | Multi-turn dialogue | Multiple tools (with complex relations) | In-context learning
    2024-07 | GTA[48] | API-based tools | Single query | Multiple tools (with complex relations) | In-context learning
    2024-07 | Llama-3.1[8] | API-based tools | Multi-turn dialogue | Multiple tools | Supervised fine-tuning + reinforcement learning
    2024-07 | AppWorld[27] | Mobile apps | Single query | Multiple tools (with complex relations) | In-context learning
    2024-07 | ShortcutsBench[28] | Mobile apps | Single query | Multiple tools | In-context learning
    2024-08 | ToolSandbox[29] | Mobile apps | Multi-turn dialogue | Multiple tools (with complex relations) | Supervised fine-tuning
    2024-09 | ToolACE[2] | API-based tools | Multi-turn dialogue | Multiple tools (with complex relations) | Supervised fine-tuning
    2024-10 | StepTool[49] | API-based tools | Single query | Multiple tools | Reinforcement learning
    2024-10 | MTU-Bench[50] | API-based tools | Multi-turn dialogue | Multiple tools (with complex relations) | Supervised fine-tuning
    2024-10 | ToolGen[51] | API-based tools | Single query | Multiple tools | Supervised fine-tuning + tool-documentation compression
    2024-10 | AndroidWorld[30] | Mobile apps | Single query | Multiple tools (with complex relations) | In-context learning
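
    Among the acquisition methods in Table 1, in-context learning is the lightest-weight: tool documentation and a worked demonstration are placed directly in the prompt at inference time, with no parameter updates. The snippet below sketches such a prompt; the `weather` tool, its signature, and the demonstration text are invented for illustration and are not from the paper.

    ```python
    # A minimal in-context-learning prompt for tool use: the tool's
    # documentation and one worked demonstration are supplied at
    # inference time, with no fine-tuning. All content is illustrative.
    TOOL_DOC = (
        "Tool: weather(city: str) -> str\n"
        "Returns the current weather for the given city."
    )

    DEMONSTRATION = (
        'Question: Do I need an umbrella in Paris?\n'
        'Call: weather(city="Paris")\n'
        'Observation: light rain\n'
        'Answer: Yes, it is currently raining in Paris.'
    )

    def build_prompt(question: str) -> str:
        """Assemble tool documentation + demonstration + the new question."""
        return f"{TOOL_DOC}\n\n{DEMONSTRATION}\n\nQuestion: {question}\nCall:"

    print(build_prompt("How warm is it in Shanghai right now?"))
    ```

    Supervised fine-tuning, by contrast, bakes the same call format into the model weights by training on question and tool-call pairs, which is why Table 1 lists the two routes separately.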

    Table 2  An overview of tool use datasets

    Dataset | Number of tools | Number of instances
    Toolformer[6] | 5 | 12,500
    API-Bank[19] | 2,211 | 2,202
    APIBench[31] | 1,645 | 16,450
    ToolBench[17] | 232 | 2,764
    ToolAlpaca[20] | 426 | 3,938
    RestBench[56] | 94 | 157
    ToolQA[57] | 13 | 1,530
    ToolLLM[14] | 16,464 | 126,486
    MetaTool[10] | 199 | 21,127
    TaskBench[58] | 103 | 28,127
    ToolTalk[36] | 28 | 78
    T-Eval[13] | 15 | 533
    ToolEyes[38] | 568 | 382
    UltraTool[47] | 2,032 | 5,824
    MLLM-Tool[39] | 932 | 11,642
    SciToolBench[43] | 2,446 | 856
    Seal-Tools[45] | 4,076 | 14,076
    ShortcutsBench[28] | 1,414 | 7,627
    GTA[48] | 14 | 229
    AppWorld[27] | 457 | 750
    ToolSandbox[29] | 34 | 1,032
    CToolEval[59] | 398 | 6,816
    ToolACE[2] | 26,507 | 11,300
    MTU-Bench[50] | 136 | 159,061
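
    The datasets in Table 2 differ not only in scale but also in whether a query involves single- or multi-turn dialogue, one tool or several, and how multiple calls relate to one another: independently (order-free), by dependency (one call consumes another's output), or by nesting (one call embedded as an argument of another); Fig. 2 illustrates the multi-invocation case. A minimal sketch of the three relation types, with invented tool names and arguments:

    ```python
    # Illustrative structures for the three inter-tool relations in the
    # survey's dataset taxonomy. Tool names and arguments are invented.

    # Independent: calls answer separate sub-requests; order is irrelevant.
    independent = [
        {"tool": "weather", "args": {"city": "Beijing"}},
        {"tool": "news", "args": {"topic": "sports"}},
    ]

    # Dependent: the second call consumes the first call's output.
    dependent = [
        {"tool": "geocode", "args": {"address": "Fudan University"}},
        {"tool": "route", "args": {"destination": "<output of geocode>"}},
    ]

    # Nested: one call is embedded as an argument of another.
    nested = {
        "tool": "translate",
        "args": {"text": {"tool": "summarize",
                          "args": {"url": "https://example.com"}}},
    }
    ```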
  • [1] Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Aleman F L, et al. GPT-4 technical report. arXiv preprint arXiv: 2303.08774, 2024.
    [2] Liu W W, Huang X, Zeng X S, Hao X L, Yu S, Li D X, et al. ToolACE: Winning the points of LLM function calling. arXiv preprint arXiv: 2409.00920, 2024.
    [3] Abdelaziz I, Basu K, Agarwal M, Kumaravel S, Stallone M, Panda R, et al. Granite-function calling model: Introducing function calling abilities via multi-task learning of granular tasks. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing: Industry Track. Miami, USA: ACL, 2024. 1131−1139
    [4] Gao L Y, Madaan A, Zhou S Y, Alon U, Liu P F, Yang Y M, et al. PAL: Program-aided language models. In: Proceedings of the 40th International Conference on Machine Learning. Honolulu, USA: PMLR, 2023. 10764−10799
    [5] Parisi A, Zhao Y, Fiedel N. TALM: Tool augmented language models. arXiv preprint arXiv: 2205.12255, 2022.
    [6] Schick T, Dwivedi-Yu J, Dessi R, Raileanu R, Lomeli M, Hambro E, et al. Toolformer: Language models can teach themselves to use tools. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. New Orleans, USA: Curran Associates Inc., 2023. 68539−68551
    [7] Sun T X, Zhang X T, He Z F, Li P, Cheng Q Y, Liu X Y, et al. MOSS: An open conversational large language model. Machine Intelligence Research, 2024, 21(5): 888−905 doi: 10.1007/s11633-024-1502-8
    [8] Grattafiori A, Dubey A, Jauhri A, Pandey A, Kadian A, Al-Dahle A, et al. The Llama 3 herd of models. arXiv preprint arXiv: 2407.21783, 2024.
    [9] Qwen Team. QwQ-32B: Embracing the power of reinforcement learning [Online], available: https://qwenlm.github.io/blog/qwq-32b/, May 14, 2025.
    [10] Huang Y, Shi J W, Li Y, Fan C R, Wu S Y, Zhang Q H, et al. MetaTool benchmark for large language models: Deciding whether to use tools and which to use. In: Proceedings of the 12th International Conference on Learning Representations. Vienna, Austria: OpenReview.net, 2024.
    [11] Shen Y L, Song K T, Tan X, Li D S, Lu W M, Zhuang Y T. HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. New Orleans, USA: Curran Associates Inc., 2023. Article No. 1657
    [12] Gou Z B, Shao Z H, Gong Y Y, Shen Y L, Yang Y J, Duan N, et al. CRITIC: Large language models can self-correct with tool-interactive critiquing. In: Proceedings of the 12th International Conference on Learning Representations. Vienna, Austria: OpenReview.net, 2024.
    [13] Chen Z H, Du W H, Zhang W W, Liu K K, Liu J N, Zheng M, et al. T-Eval: Evaluating the tool utilization capability of large language models step by step. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. Bangkok, Thailand: ACL, 2024. 9510−9529
    [14] Qin Y J, Liang S H, Ye Y N, Zhu K L, Yan L, Lu Y X, et al. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. In: Proceedings of the 12th International Conference on Learning Representations. Vienna, Austria: OpenReview.net, 2024.
    [15] Yang R, Song L, Li Y W, Zhao S J, Ge Y X, Li X, et al. GPT4Tools: Teaching large language model to use tools via self-instruction. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. New Orleans, USA: Curran Associates Inc., 2023. Article No. 3149
    [16] Wang B S, Fang H, Eisner J, Van Durme B, Su Y. LLMs in the imaginarium: Tool learning through simulated trial and error. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. Bangkok, Thailand: ACL, 2024. 10583−10604
    [17] Xu Q T, Hong F L, Li B, Hu C R, Chen Z Y, Zhang J. On the tool manipulation capability of open-source large language models. arXiv preprint arXiv: 2305.16504, 2023.
    [18] Qiao S F, Gui H H, Lv C F, Jia Q H, Chen H J, Zhang N Y. Making language models better tool learners with execution feedback. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). Mexico City, Mexico: ACL, 2024. 3550−3568
    [19] Li M H, Zhao Y X, Yu B W, Song F F, Li H Y, Yu H Y, et al. API-bank: A comprehensive benchmark for tool-augmented LLMs. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Singapore, Singapore: ACL, 2023. 3102−3116
    [20] Tang Q Y, Deng Z L, Lin H Y, Han X P, Liang Q, Cao B X, et al. ToolAlpaca: Generalized tool learning for language models with 3000 simulated cases. arXiv preprint arXiv: 2306.05301, 2023.
    [21] Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 2020, 21(1): Article No. 140
    [22] Chen M, Tworek J, Jun H W, Yuan Q M, de Oliveira Pinto H P, Kaplan J, et al. Evaluating large language models trained on code. arXiv preprint arXiv: 2107.03374, 2021.
    [23] Surís D, Menon S, Vondrick C. ViperGPT: Visual inference via Python execution for reasoning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Paris, France: IEEE, 2023. 11854−11864
    [24] Cai T L, Wang X Z, Ma T Y, Chen X Y, Zhou D. Large language models as tool makers. In: Proceedings of the 12th International Conference on Learning Representations. Vienna, Austria: OpenReview.net, 2024.
    [25] Qian C, Han C, Fung Y, Qin Y J, Liu Z Y, Ji H. CREATOR: Tool creation for disentangling abstract and concrete reasoning of large language models. In: Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. Singapore, Singapore: ACL, 2023. 6922−6939
    [26] Gou Z B, Shao Z H, Gong Y Y, Shen Y L, Yang Y J, Huang M L, et al. ToRA: A tool-integrated reasoning agent for mathematical problem solving. In: Proceedings of the 12th International Conference on Learning Representations. Vienna, Austria: OpenReview.net, 2024.
    [27] Trivedi H, Khot T, Hartmann M, Manku R, Dong V, Li E, et al. AppWorld: A controllable world of apps and people for benchmarking interactive coding agents. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Bangkok, Thailand: ACL, 2024. 16022−16076
    [28] Shen H Y, Li Y, Meng D S, Cai D Q, Qi S, Zhang L, et al. ShortcutsBench: A large-scale real-world benchmark for API-based agents. In: Proceedings of the 13th International Conference on Learning Representations. Singapore, Singapore: OpenReview.net, 2025.
    [29] Lu J R, Holleis T, Zhang Y Z, Aumayer B, Nan F, Bai H P, et al. ToolSandbox: A stateful, conversational, interactive evaluation benchmark for LLM tool use capabilities. In: Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025. Albuquerque, USA: ACL, 2025. 1160−1183
    [30] Rawles C, Clinckemaillie S, Chang Y F, Waltz J, Lau G, Fair M, et al. AndroidWorld: A dynamic benchmarking environment for autonomous agents. In: Proceedings of the 13th International Conference on Learning Representations. Singapore, Singapore: OpenReview.net, 2025.
    [31] Patil S G, Zhang T J, Wang X, Gonzalez J E. Gorilla: Large language model connected with massive APIs. In: Proceedings of the 38th International Conference on Neural Information Processing Systems. Vancouver, Canada: NeurIPS, 2024.
    [32] Hao S B, Liu T Y, Wang Z, Hu Z T. ToolkenGPT: Augmenting frozen language models with massive tools via tool embeddings. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. New Orleans, USA: Curran Associates Inc., 2023. Article No. 1988
    [33] Gao S, Shi Z L, Zhu M H, Fang B W, Xin X, Ren P J, et al. Confucius: Iterative tool learning from introspection feedback by easy-to-difficult curriculum. In: Proceedings of the 38th AAAI Conference on Artificial Intelligence. Vancouver, Canada: AAAI Press, 2024. 18030−18038
    [34] Yuan L F, Chen Y Y, Wang X Y, Fung Y, Peng H, Ji H. CRAFT: Customizing LLMs by creating and retrieving from specialized toolsets. In: Proceedings of the 12th International Conference on Learning Representations. Vienna, Austria: OpenReview.net, 2024.
    [35] Zhuang Y C, Chen X, Yu T, Mitra S, Bursztyn V, Rossi R A, et al. ToolChain*: Efficient action space navigation in large language models with a* search. In: Proceedings of the 12th International Conference on Learning Representations. Vienna, Austria: OpenReview.net, 2024.
    [36] Farn N, Shin R. ToolTalk: Evaluating tool-usage in a conversational setting. arXiv preprint arXiv: 2311.10775, 2023.
    [37] Gao Z, Du Y T, Zhang X T, Ma X J, Han W, Zhu S C, et al. CLOVA: A closed-loop visual assistant with tool usage and update. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, USA: IEEE, 2024. 13258−13268
    [38] Ye J J, Li G Y, Gao S Y, Huang C S, Wu Y L, Li S X, et al. ToolEyes: Fine-grained evaluation for tool learning capabilities of large language models in real-world scenarios. In: Proceedings of the 31st International Conference on Computational Linguistics. Abu Dhabi, UAE: ACL, 2025. 156−187
    [39] Wang C Y, Luo W X, Dong S X, Xuan X H, Li Z X, Ma L, et al. MLLM-Tool: A multimodal large language model for tool agent learning. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Tucson, USA: IEEE, 2025. 6678−6687
    [40] Wang Z R, Neubig G, Fried D. TroVE: Inducing verifiable and efficient toolboxes for solving programmatic tasks. In: Proceedings of the 41st International Conference on Machine Learning. Vienna, Austria: OpenReview.net, 2024.
    [41] Yuan S Y, Song K T, Chen J J, Tan X, Shen Y L, Ren K, et al. EASYTOOL: Enhancing LLM-based agents with concise tool instruction. In: Proceedings of the ICLR Workshop on Large Language Model (LLM) Agents. Vienna, Austria: OpenReview.net, 2024.
    [42] Du Y, Wei F Y, Zhang H Y. AnyTool: Self-reflective, hierarchical agents for large-scale API calls. In: Proceedings of the 41st International Conference on Machine Learning. Vienna, Austria: OpenReview.net, 2024.
    [43] Ma Y B, Gou Z B, Hao J H, Xu R C, Wang S H, Pan L M, et al. SciAgent: Tool-augmented language models for scientific reasoning. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Miami, USA: ACL, 2024. 15701−15736
    [44] Zheng Y H, Li P, Liu W, Liu Y, Luan J, Wang B. ToolRerank: Adaptive and hierarchy-aware reranking for tool retrieval. In: Proceedings of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italia: ACL, 2024. 16263−16273
    [45] Wu M S, Zhu T, Han H, Tan C Y, Zhang X, Chen W L. Seal-Tools: Self-instruct tool learning dataset for agent tuning and detailed benchmark. In: Proceedings of the 13th National CCF Conference on Natural Language Processing and Chinese Computing. Hangzhou, China: Springer, 2024. 372−384
    [46] Chen S J, Wang Y B, Wu Y F, Chen Q G, Xu Z, Luo W H, et al. Advancing tool-augmented large language models: Integrating insights from errors in inference trees. In: Proceedings of the 38th International Conference on Neural Information Processing Systems. Vancouver, Canada: NeurIPS, 2024.
    [47] Huang S J, Zhong W J, Lu J Q, Zhu Q, Gao J H, Liu W W, et al. Planning, creation, usage: Benchmarking LLMs for comprehensive tool utilization in real-world complex scenarios. In: Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024. Bangkok, Thailand: ACL, 2024. 4363−4400
    [48] Wang J Z, Ma Z R, Li Y N, Zhang S Y, Chen C L, Chen K, et al. GTA: A benchmark for general tool agents. In: Proceedings of the 38th International Conference on Neural Information Processing Systems. Vancouver, Canada: NeurIPS, 2024.
    [49] Yu Y Q, Wang Z F, Ma W Z, Guo Z C, Zhan J T, Wang S, et al. StepTool: A step-grained reinforcement learning framework for tool learning in LLMs. arXiv preprint arXiv: 2410.07745, 2024.
    [50] Wang P, Wu Y N, Wang N, Liu J H, Song X S, Peng Z Y, et al. MTU-Bench: A multi-granularity tool-use benchmark for large language models. In: Proceedings of the 13th International Conference on Learning Representations. Singapore, Singapore: OpenReview.net, 2025.
    [51] Wang R X, Han X D, Ji L, Wang S, Baldwin T, Li H N. ToolGen: Unified tool retrieval and calling via generation. In: Proceedings of the 13th International Conference on Learning Representations. Singapore, Singapore: OpenReview.net, 2025.
    [52] RapidAPI. RapidAPI: A platform for discovering and connecting to APIs [Online], available: https://rapidapi.com/, May 15, 2024.
    [53] Georgiev P, Lei V I, Burnell R, Bai L B, Gulati A, Tanzer G, et al. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv preprint arXiv: 2403.05530, 2024.
    [54] Yang A, Yang B S, Hui B Y, Zheng B, Yu B W, Zhou C, et al. Qwen2 technical report. arXiv preprint arXiv: 2407.10671, 2024.
    [55] Zhu Q H, Guo D Y, Shao Z H, Yang D J, Wang P Y, Xu R X, et al. DeepSeek-coder-V2: Breaking the barrier of closed-source models in code intelligence. arXiv preprint arXiv: 2406.11931, 2024.
    [56] Song Y F, Xiong W M, Zhu D W, Wu W H, Qian H, Song M B, et al. RestGPT: Connecting large language models with real-world RESTful APIs. arXiv preprint arXiv: 2306.06624, 2023.
    [57] Zhuang Y C, Yu Y, Wang K, Sun H T, Zhang C. ToolQA: A dataset for LLM question answering with external tools. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. New Orleans, USA: Curran Associates Inc., 2023. Article No. 2180
    [58] Shen Y L, Song K T, Tan X, Zhang W Q, Ren K, Yuan S Y, et al. TaskBench: Benchmarking large language models for task automation. In: Proceedings of the 38th International Conference on Neural Information Processing Systems. Vancouver, Canada: NeurIPS, 2024.
    [59] Guo Z S, Huang Y F, Xiong D Y. CToolEval: A Chinese benchmark for LLM-powered agent evaluation in real-world API interactions. In: Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024. Bangkok, Thailand: ACL, 2024. 15711−15724
    [60] Basu K, Abdelaziz I, Chaudhury S, Dan S, Crouse M, Munawar A, et al. API-BLEND: A comprehensive corpora for training and benchmarking API LLMs. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Bangkok, Thailand: ACL, 2024. 12859−12870
    [61] Wang H R, Wang R, Xue B Y, Xia H M, Cao J T, Liu Z M, et al. AppBench: Planning of multiple APIs from various APPs for complex user instruction. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Miami, USA: ACL, 2024. 15322−15336
    [62] Wang W X, Shi J L, Wang C Z, Lee C, Yuan Y L, Huang J T, et al. Learning to ask: When LLMs meet unclear instruction. arXiv preprint arXiv: 2409.00557, 2024.
    [63] Ye J J, Li S X, Li G Y, Huang C S, Gao S Y, Wu Y L, et al. ToolSword: Unveiling safety issues of large language models in tool learning across three stages. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Bangkok, Thailand: ACL, 2024. 2181−2211
    [64] Ye J J, Wu Y L, Gao S Y, Huang C S, Li S X, Li G Y, et al. RoTBench: A multi-level benchmark for evaluating the robustness of large language models in tool learning. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Miami, USA: ACL, 2024. 313−333
    [65] Guo Z C, Cheng S J, Wang H, Liang S H, Qin Y J, Li P, et al. StableToolBench: Towards stable large-scale benchmarking on tool learning of large language models. In: Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024. Bangkok, Thailand: ACL, 2024. 11143−11156
    [66] Papineni K, Roukos S, Ward T, Zhu W J. BLEU: A method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, USA: ACL, 2002. 311−318
    [67] Lin C Y. ROUGE: A package for automatic evaluation of summaries. In: Proceedings of the Text Summarization Branches Out. Barcelona, Spain: ACL, 2004. 74−81
    [68] Bergroth L, Hakonen H, Raita T. A survey of longest common subsequence algorithms. In: Proceedings of the 7th International Symposium on String Processing and Information Retrieval (SPIRE 2000). A Coruña, Spain: IEEE, 2000. 39−48
    [69] Liu Y M, Peng X Y, Zhang Y W, Cao J N, Zhang X H, Cheng S, et al. Tool-planner: Dynamic solution tree planning for large language model with tool clustering. arXiv preprint arXiv: 2406.03807, 2024.
    [70] Qiao S F, Fang R N, Qiu Z S, Wang X B, Zhang N Y, Jiang Y, et al. Benchmarking agentic workflow generation. arXiv preprint arXiv: 2410.07869, 2024.
    [71] OpenMOSS. UnifiedToolHub [Online], GitHub Repository, available: https://github.com/OpenMOSS/UnifiedToolHub, May 14, 2025.
    [72] Zhou S Y, Xu F F, Zhu H, Zhou X H, Lo R, Sridhar A, et al. WebArena: A realistic web environment for building autonomous agents. In: Proceedings of the 12th International Conference on Learning Representations. Vienna, Austria: OpenReview.net, 2024.
    [73] Kim G, Baldi P, McAleer S. Language models can solve computer tasks. In: Proceedings of the 37th International Conference on Neural Information Processing Systems. New Orleans, USA: Curran Associates Inc., 2023. Article No. 1723
    [74] Liu Y L, Yuan Y L, Wang C W, Han J H, Ma Y Q, Zhang L, et al. From summary to action: Enhancing large language models for complex tasks with open world APIs. arXiv preprint arXiv: 2402.18157, 2024.
    [75] Liu X, Qin B, Liang D Z, Dong G, Lai H Y, Zhang H C, et al. AutoGLM: Autonomous foundation agents for GUIs. arXiv preprint arXiv: 2411.00820, 2024.
    [76] Qi Z H, Liu X, Iong I L, Lai H Y, Sun X Q, Sun J D, et al. WebRL: Training LLM web agents via self-evolving online curriculum reinforcement learning. In: Proceedings of the 13th International Conference on Learning Representations. Singapore, Singapore: OpenReview.net, 2025.
    [77] Wu Q Z, Liu W, Luan J, Wang B. ToolPlanner: A tool augmented LLM for multi-granularity instructions with path planning and feedback. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Miami, USA: ACL, 2024. 18315−18339
    [78] Chen K, Cusumano-Towner M, Huval B, Petrenko A, Hamburger J, Koltun V, et al. Reinforcement learning for long-horizon interactive LLM agents. arXiv preprint arXiv: 2502.01600, 2025.
    [79] Kong Y L, Ruan J Q, Chen Y H, Zhang B, Bao T P, Shiwei S, et al. TPTU-v2: Boosting task planning and tool usage of large language model-based agents in real-world industry systems. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing: Industry Track. Miami, USA: ACL, 2024. 371−385
    [80] Liu X K, Peng Z Y, Yi X Y, Xie X, Xiang L R, Liu Y C, et al. ToolNet: Connecting large language models with massive tools via tool graph. arXiv preprint arXiv: 2403.00839, 2024.
    [81] Huang W L, Abbeel P, Pathak D, Mordatch I. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In: Proceedings of the 39th International Conference on Machine Learning. Baltimore, USA: PMLR, 2022. 9118−9147
    [82] Xu H S, Zhu S, Wang Z H, Zheng H, Ma D, Cao R S, et al. Reducing tool hallucination via reliability alignment. arXiv preprint arXiv: 2412.04141, 2024.
    [83] Xu G W, Jin P, Li H, Song Y B, Sun L C, Yuan L. LLaVA-CoT: Let vision language models reason step-by-step. arXiv preprint arXiv: 2411.10440, 2024.
    [84] Koh J Y, McAleer S, Fried D, Salakhutdinov R. Tree search for language model agents. arXiv preprint arXiv: 2407.01476, 2024.
    [85] Chen P, Bu P, Song J, Gao Y, Zheng B. Can VLMs play action role-playing games? Take black myth Wukong as a study case. In: Proceedings of the NeurIPS Workshop on Open-World Agents. Vancouver, Canada: NeurIPS, 2024.
    [86] Nakano R, Hilton J, Balaji S, Wu J, Ouyang L, Kim C, et al. WebGPT: Browser-assisted question-answering with human feedback. arXiv preprint arXiv: 2112.09332, 2022.
    [87] Yao S Y, Chen H, Yang J, Narasimhan K. WebShop: Towards scalable real-world web interaction with grounded language agents. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. New Orleans, USA: Curran Associates Inc., 2022. Article No. 1508
    [88] Qiao S F, Fang R N, Zhang N Y, Zhu Y Q, Chen X, Deng S M, et al. Agent planning with world knowledge model. In: Proceedings of the 38th International Conference on Neural Information Processing Systems. Vancouver, Canada: NeurIPS, 2024.
    [89] Cao H Y, Zhang Y Q, Feng S, Yang X C, Wang D L, Zhang Y F. TOOL-ED: Enhancing empathetic response generation with the tool calling capability of LLM. In: Proceedings of the 31st International Conference on Computational Linguistics. Abu Dhabi, UAE: ACL, 2025. 5305−5320
    [90] Liao Z Y, Mo L B, Xu C J, Kang M T, Zhang J W, Xiao C W, et al. EIA: Environmental injection attack on generalist web agents for privacy leakage. arXiv preprint arXiv: 2409.11295, 2025.
    [91] Chen Z R, Xiang Z, Xiao C W, Song D, Li B. AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases. In: Proceedings of the 38th International Conference on Neural Information Processing Systems. Vancouver, Canada: NeurIPS, 2024.
    [92] Xiang Z, Zheng L Z, Li Y J, Hong J Y, Li Q B, Xie H, et al. GuardAgent: Safeguard LLM agents by a guard agent via knowledge-enabled reasoning. arXiv preprint arXiv: 2406.09187, 2024.
    [93] Andrychowicz M, Baker B, Chociej M, Józefowicz R, McGrew B, Pachocki J, et al. Learning dexterous in-hand manipulation. The International Journal of Robotics Research, 2020, 39(1): 3−20 doi: 10.1177/0278364919887447
    [94] Kavraki L E, Svestka P, Latombe J C, Overmars M H. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 1996, 12(4): 566−580 doi: 10.1109/70.508439
    [95] Shen Z Y, Wilson J P, Harvey R, Gupta S. MRRT: Multiple rapidly-exploring random trees for fast online replanning in dynamic environments. arXiv preprint arXiv: 2104.11059, 2021.
    [96] Liang J, Huang W L, Xia F, Xu P, Hausman K, Ichter B, et al. Code as policies: Language model programs for embodied control. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). London, UK: IEEE, 2023. 9493−9500
    [97] Ichter B, Brohan A, Chebotar Y, Finn C, Hausman K, Herzog A, et al. Do as I can, not as I say: Grounding language in robotic affordances. In: Proceedings of the 6th Conference on Robot Learning. Auckland, New Zealand: PMLR, 2022. 287−318
    [98] Yu Q J, Huang S Y, Yuan X B, Jiang Z K, Hao C, Li X, et al. UniAff: A unified representation of affordances for tool usage and articulation with vision-language models. arXiv preprint arXiv: 2409.20551, 2024.
    [99] Huang W L, Wang C, Zhang R H, Li Y Z, Wu J J, Fei-Fei L. VoxPoser: Composable 3D value maps for robotic manipulation with language models. In: Proceedings of the 7th Conference on Robot Learning. Atlanta, USA: PMLR, 2023. 540−562
    [100] Huang W L, Wang C, Li Y Z, Zhang R H, Fei-Fei L. ReKep: Spatio-temporal reasoning of relational keypoint constraints for robotic manipulation. In: Proceedings of the 8th Conference on Robot Learning. Munich, Germany: PMLR, 2025. 4573−4602
    [101] Cai M X, Wang D L, Feng S, Zhang Y F. PECER: Empathetic response generation via dynamic personality extraction and contextual emotional reasoning. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul, Korea: IEEE, 2024. 10631−10635
    [102] Jin Q, Yang Y F, Chen Q Y, Lu Z Y. GeneGPT: Augmenting large language models with domain tools for improved access to biomedical information. Bioinformatics, 2024, 40(2): btae075 doi: 10.1093/bioinformatics/btae075
    [103] Xiao S T, Liu Z, Zhang P T, Muennighoff N, Lian D, Nie J Y. C-pack: Packed resources for general Chinese embeddings. In: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. Washington, USA: Association for Computing Machinery, 2024. 641−649
    [104] Li Z C, Wang J H, Jiang Z S, Mao H Y, Chen Z X, Du J Z, et al. DMQR-RAG: Diverse multi-query rewriting for RAG. arXiv preprint arXiv: 2411.13154, 2024.
    [105] Guo D Y, Yang D J, Zhang H W, Song J X, Zhang R Y, Xu R X, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv: 2501.12948, 2025.
    [106] Zeng Z Y, Cheng Q Y, Yin Z Y, Wang B, Li S M, Zhou Y H, et al. Scaling of search and learning: A roadmap to reproduce o1 from reinforcement learning perspective. arXiv preprint arXiv: 2412.14135, 2024.
Publication history
  • Received: 2024-12-13
  • Accepted: 2025-04-22
  • Published online: 2025-05-14
