Citation: Zheng Yi-Ning, Yu Zhen, Yang Jie, Li Bu-Fan, Yin Lin-Qi, Yin Zhang-Yue, Yuan Feng-Ye, Wei Hai-Yang, Lu Jia-Hao, Fang Shi-Cheng, Li Ni-Jun, Gui Tao, Li Yun, Chen Shuang, Qiu Xi-Peng. Survey of tool use in large language models. Acta Automatica Sinica, xxxx, xx(x): x−xx doi: 10.16383/j.aas.c240793

Survey of Tool Use in Large Language Models

doi: 10.16383/j.aas.c240793 cstr: 32138.14.j.aas.c240793
More Information
    Author Bio:

    ZHENG Yi-Ning Ph.D. candidate at the School of Computer Science, Fudan University. He received his bachelor's degree from Fudan University in 2019. His research interest covers code generation and tool use in large language models. E-mail: ynzheng19@fudan.edu.cn

    YU Zhen Master student at the School of Computer Science, Fudan University. He received his bachelor's degree from Jiangxi Normal University in 2022. His research interest covers tool learning and personalized agents. E-mail: yuz24@m.fudan.edu.cn

    YANG Jie Master student at the School of Computer Science, Fudan University. He received his bachelor's degree from Wuhan University in 2021. His research interest covers tool use and agents. E-mail: yangj24@m.fudan.edu.cn

    LI Bu-Fan Master student at the School of Computer Science, Fudan University. He received his bachelor's degree from Northeastern University in 2024. His main research interest is post-training of large language models. E-mail: 24210240197@m.fudan.edu.cn

    YIN Lin-Qi Undergraduate student at the School of Computer Science, Fudan University. Her main research interest is tool learning. E-mail: 21307140112@m.fudan.edu.cn

    YIN Zhang-Yue Ph.D. candidate at the School of Computer Science, Fudan University. He received his bachelor's degree from East China Normal University in 2021. His research interest covers large language models and machine reasoning. E-mail: ynzheng19@fudan.edu.cn

    YUAN Feng-Ye Master student at the School of Computer Science, Fudan University. He received his bachelor's degree from Tongji University in 2024. His research interest covers tool use in LLMs and agents. E-mail: fyyuan24@m.fudan.edu.cn

    WEI Hai-Yang Master student at the School of Computer Science, Fudan University. He received his bachelor's degree from Anhui University in 2022. His research interest covers tool learning and complex intent detection. E-mail: 23210240325@m.fudan.edu.cn

    LU Jia-Hao Undergraduate student at the School of Computer Science, Fudan University. His main research interest is tool learning. E-mail: 21307130022@m.fudan.edu.cn

    FANG Shi-Cheng Undergraduate student at the School of Computer Science, Fudan University. His main research interest is tool learning. E-mail: 21307140067@m.fudan.edu.cn

    LI Ni-Jun Senior engineer at the Advanced Cognitive AI Laboratory, Shanghai Huawei Technologies. His research interest covers professional knowledge engineering and AI-enabled information experience. E-mail: linijun@huawei.com

    GUI Tao Associate professor at the Institute of Modern Languages and Linguistics, Fudan University. He received his Ph.D. degree from Fudan University in 2021. His research interest covers large language models and embodied intelligence. E-mail: tgui@fudan.edu.cn

    LI Yun Chief technical expert at the Advanced Cognitive AI Laboratory, Shanghai Huawei Technologies. His research interest covers artificial intelligence, cognitive intelligence, knowledge engineering, cyber security, AI security, and big data security. E-mail: lychina@139.com

    CHEN Shuang Postdoctoral researcher at the School of Computer Science, Fudan University. She received her Ph.D. degree from Northeastern University in 2023. Her research interest covers natural language processing and sentiment computing in large language models. E-mail: chenshuang_fd@fudan.edu.cn

    QIU Xi-Peng Professor at the School of Computer Science, Fudan University. He received his Ph.D. degree from Fudan University in 2006. His research interest covers natural language processing and large language models. Corresponding author of this paper. E-mail: xpqiu@fudan.edu.cn


  • Abstract: Large language models (LLMs) have attracted wide attention for their powerful generation and understanding abilities, but they remain limited in acquiring real-time information and performing complex computation. Equipping LLMs with the ability to use tools, so that they can better respond to user needs, has therefore become a current research hotspot. This survey first clarifies the basic concepts of tool use in LLMs and traces its development in chronological order. It then summarizes the datasets and technical methods related to tool use, and analyzes their applications in areas such as agents and embodied intelligence. Finally, it outlines the key research problems and future directions in this field.
  • Fig. 1 A typical procedure for LLM-based tool use (an illustrative code sketch of this loop follows Table 2)

    Fig. 2 An example of multiple tool invocations

    Table 1 The development of tool use in large language models

    | Release | Name | Tool type | Dialogue turns | Number of tools and relations | Acquisition of tool-use ability |
    | --- | --- | --- | --- | --- | --- |
    | 2022-05 | TALM[5] | API-based tools | single query | multiple tools (complex relations) | supervised fine-tuning |
    | 2022-11 | PAL[4] | Python interpreter | single query | multiple tools | in-context learning |
    | 2023-02 | Toolformer[6] | API-based tools | single query | single tool | supervised fine-tuning |
    | 2023-03 | GPT4-Plugin[1] | API-based tools | multi-turn dialogue | multiple tools | supervised fine-tuning + reinforcement learning |
    | 2023-03 | HuggingGPT[11] | neural network modules | single query | multiple tools (complex relations) | in-context learning |
    | 2023-03 | ViperGPT[23] | Python functions | single query | multiple tools (complex relations) | in-context learning |
    | 2023-04 | MOSS[7] | API-based tools | multi-turn dialogue | multiple tools | supervised fine-tuning |
    | 2023-04 | API-Bank[19] | API-based tools | multi-turn dialogue | multiple tools | supervised fine-tuning |
    | 2023-05 | APIBench[31] | Python functions | single query | single tool | supervised fine-tuning |
    | 2023-05 | GPT4Tools[15] | neural network modules | multi-turn dialogue | multiple tools | in-context learning |
    | 2023-05 | ToolkenGPT[41] | API-based tools | single query | multiple tools (complex relations) | supervised fine-tuning |
    | 2023-05 | TRICE[18] | API-based tools | single query | multiple tools (complex relations) | supervised fine-tuning + reinforcement learning |
    | 2023-05 | CRITIC[12] | Python functions | single query | multiple tools | in-context learning |
    | 2023-05 | LATM[24] | Python functions | single query | single tool | in-context learning + tool creation |
    | 2023-05 | CREATOR[25] | Python functions | single query | multiple tools | in-context learning + tool creation |
    | 2023-05 | ToolBench[17] | API-based tools | single query | single tool | in-context learning |
    | 2023-06 | ToolAlpaca[20] | API-based tools | multi-turn dialogue | multiple tools | supervised fine-tuning |
    | 2023-07 | ToolLLM[14] | API-based tools | single query | multiple tools | supervised fine-tuning |
    | 2023-08 | Confucius[35] | API-based tools | single query | multiple tools | multi-stage supervised fine-tuning |
    | 2023-09 | ToRA[26] | Python interpreter | single query | multiple tools (complex relations) | supervised fine-tuning |
    | 2023-09 | CRAFT[32] | Python functions | single query | multiple tools (complex relations) | in-context learning |
    | 2023-10 | MetaTool[10] | API-based tools | single query | multiple tools | in-context learning |
    | 2023-10 | ToolChain[38] | API-based tools | single query | multiple tools | in-context learning + decision-process optimization |
    | 2023-11 | ToolTalk[48] | Python functions | multi-turn dialogue | multiple tools (complex relations) | in-context learning |
    | 2023-12 | CLOVA[33] | Python functions | single query | multiple tools (complex relations) | in-context learning |
    | 2023-12 | T-Eval[13] | API-based tools | multi-turn dialogue | multiple tools (complex relations) | in-context learning |
    | 2024-01 | ToolEyes[49] | API-based tools | single query | multiple tools | supervised fine-tuning |
    | 2024-01 | MLLM-Tool[50] | neural network modules | single query | multiple tools (complex relations) | supervised fine-tuning |
    | 2024-01 | TroVE[34] | Python functions | single query | multiple tools | in-context learning + tool creation |
    | 2024-01 | EasyTools[43] | API-based tools | single query | multiple tools | in-context learning + tool-documentation compression |
    | 2024-02 | AnyTool[39] | API-based tools | single query | multiple tools | in-context learning + retrieval-process optimization |
    | 2024-02 | SciToolBench[51] | Python functions | single query | multiple tools | supervised fine-tuning |
    | 2024-03 | ToolRerank[40] | API-based tools | single query | multiple tools | in-context learning + retrieval-process optimization |
    | 2024-03 | STE[16] | API-based tools | single query | single tool | supervised fine-tuning + error-feedback handling |
    | 2024-05 | Seal-Tools[52] | API-based tools | single query | multiple tools (complex relations) | supervised fine-tuning |
    | 2024-06 | ToolPreference[53] | API-based tools | single query | multiple tools | supervised fine-tuning + preference optimization |
    | 2024-06 | UltraTool[54] | API-based tools | multi-turn dialogue | multiple tools (complex relations) | in-context learning |
    | 2024-07 | GTA[55] | API-based tools | single query | multiple tools (complex relations) | in-context learning |
    | 2024-07 | Llama-3.1[8] | API-based tools | multi-turn dialogue | multiple tools | supervised fine-tuning + reinforcement learning |
    | 2024-07 | AppWorld[27] | mobile apps | single query | multiple tools (complex relations) | in-context learning |
    | 2024-07 | ShortcutsBench[28] | mobile apps | single query | multiple tools | in-context learning |
    | 2024-08 | ToolSandbox[29] | mobile apps | multi-turn dialogue | multiple tools (complex relations) | supervised fine-tuning |
    | 2024-09 | ToolACE[2] | API-based tools | multi-turn dialogue | multiple tools (complex relations) | supervised fine-tuning |
    | 2024-10 | StepTool[44] | API-based tools | single query | multiple tools | reinforcement learning |
    | 2024-10 | MTU-Bench[37] | API-based tools | multi-turn dialogue | multiple tools (complex relations) | supervised fine-tuning |
    | 2024-10 | ToolGen[42] | API-based tools | single query | multiple tools | supervised fine-tuning + tool-documentation compression |
    | 2024-10 | AndroidWorld[30] | mobile apps | single query | multiple tools (complex relations) | in-context learning |

    Table 2 The overview of tool use datasets

    | Dataset | # Tools | # Instances |
    | --- | --- | --- |
    | Toolformer[6] | 5 | 12,500 |
    | API-Bank[19] | 2,211 | 2,202 |
    | APIBench[31] | 11,645 | 16,450 |
    | ToolBench[17] | 232 | 2,764 |
    | ToolAlpaca[20] | 426 | 3,938 |
    | RestBench[56] | 94 | 157 |
    | ToolQA[64] | 13 | 530 |
    | ToolLLM[14] | 16,464 | 126,486 |
    | MetaTool[10] | 199 | 21,127 |
    | TaskBench[57] | 103 | 28,127 |
    | ToolTalk[48] | 28 | 78 |
    | T-Eval[13] | 15 | 533 |
    | ToolEyes[49] | 568 | 382 |
    | UltraTool[54] | 2,032 | 5,824 |
    | MLLM-Tool[50] | 932 | 11,642 |
    | SciToolBench[51] | 2,446 | 856 |
    | Seal-Tools[52] | 4,076 | 14,076 |
    | ShortcutsBench[28] | 1,414 | 7,627 |
    | GTA[55] | 14 | 229 |
    | AppWorld[27] | 457 | 750 |
    | ToolSandbox[29] | 34 | 1,032 |
    | CToolEval[65] | 398 | 6,816 |
    | ToolACE[2] | 26,507 | 11,300 |
    | MTU-Bench[37] | 136 | 159,061 |
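
The typical procedure in Fig. 1, and the multiple tool invocations in Fig. 2, can be summarized as a generate-invoke-observe loop: at each step the model either emits a tool call or a final answer, and each tool result is appended to the context before the next generation step. The following minimal Python sketch illustrates this loop; the tool names (`search_web`, `calculator`) and the `llm.chat` interface are hypothetical placeholders assumed for illustration only, not an implementation from any surveyed system.

```python
import json

# Hypothetical toy tools; names and behaviors are illustrative only.
def search_web(query: str) -> str:
    return f"(stub) top results for {query!r}"

def calculator(expression: str) -> str:
    # Evaluate simple arithmetic with builtins disabled.
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"search_web": search_web, "calculator": calculator}

def run_tool_loop(llm, question: str, max_steps: int = 5) -> str:
    """Generate -> invoke -> observe loop, as in Fig. 1.

    `llm.chat(messages)` is an assumed interface returning either
    {"tool": name, "arguments": {...}} or {"answer": text}.
    """
    messages = [
        {"role": "system",
         "content": "You may call these tools: " + ", ".join(TOOLS)},
        {"role": "user", "content": question},
    ]
    for _ in range(max_steps):
        reply = llm.chat(messages)      # model decides: tool call or final answer
        if "answer" in reply:
            return reply["answer"]      # no (further) tool use needed
        result = TOOLS[reply["tool"]](**reply["arguments"])  # invoke the tool
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        messages.append({"role": "tool", "content": result})  # feed observation back
    return "(step limit reached without a final answer)"
```

Multiple tool invocations, as in Fig. 2, arise naturally when the model keeps returning tool calls across iterations, each conditioned on the observations accumulated so far; the supervised fine-tuning, in-context learning, and reinforcement learning entries in Table 1 differ mainly in how the model learns to produce the tool-call step of this loop.
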
  • [1] OpenAI, Achiam J, Adler S, Agarwal S, Ahmad L and Akkaya I et al. GPT-4 Technical Report. arXiv preprint arXiv: 2303.08774, 2024.
    [2] Liu W W, Huang X, Zeng X S, Hao X L, Yu S and Li D X et al. ToolACE: Winning the Points of LLM Function Calling. arXiv preprint arXiv: 2409.00920, 2024.
    [3] Abdelaziz I, Basu K, Agarwal M, Kumaravel S, Stallone M and Panda R et al. Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks. arXiv preprint arXiv: 2407.00121, 2024.
    [4] Gao L Y, Madaan A, Zhou S Y, Alon U, Liu P F and Yang Y M et al. PAL: Program-aided language models. In: Proceedings of the 40th International Conference on Machine Learning. Honolulu, Hawaii, USA: PMLR, 2023. 10764–10799
    [5] Parisi A, Zhao Y and Fiedel N. TALM: Tool Augmented Language Models. arXiv preprint arXiv: 2205.12255, 2022.
    [6] Schick T, Yu J D, Dessi R, Raileanu R, Lomeli M and Hambro E et al. Toolformer: Language models can teach themselves to use tools. In: Proceedings of the Thirty-seventh Conference on Neural Information Processing Systems. New Orleans, LA, USA: NeurIPS Foundation, 2023.
    [7] Sun T X, Zhang X T, He Z F, Li P, Cheng Q Y and Liu X Y, et al. MOSS: An Open Conversational Large Language Model. Machine Intelligence Research, 2024, 21(5): 888−905 doi: 10.1007/s11633-024-1502-8
    [8] Dubey A M, Jauhri A, Pandey A, Kadian A, AlDahle A and Letman A et al. The Llama 3 Herd of Models. arXiv preprint arXiv: 2407.21783, 2024.
    [9] Qwen. QwQ-32B: Embracing the Power of Reinforcement Learning. GitHub Homepage, 2025. https://qwenlm.github.io/blog/qwq-32b/.
    [10] Huang Y, Shi J W, Li Y, Fan C R, Wu S Y and Zhang Q H et al. MetaTool benchmark for large language models: Deciding whether to use tools and which to use. In: Proceedings of The Twelfth International Conference on Learning Representations. Vienna, Austria: OpenReview.net, 2024.
    [11] Shen Y L, Song K T, Tan X, Li D S, Lu W M and Zhuang Y T. HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face. In: Proceedings of Advances in Neural Information Processing Systems. New Orleans, LA, USA: NeurIPS Foundation, 2023.
    [12] Gou Z B, Shao Z H, Gong Y Y, Shen Y L, Yang Y J and Duan N et al. CRITIC: Large language models can self-correct with tool-interactive critiquing. In: The Twelfth International Conference on Learning Representations. Vienna, Austria: OpenReview.net, 2024.
    [13] Chen Z H, Du W H, Zhang W W, Liu K K, Liu J N and Zheng M et al. T-eval: Evaluating the tool utilization capability of large language models step by step. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. Bangkok, Thailand: Association for Computational Linguistics, 2024. 9510–9529
    [14] Qin Y J, Liang S H, Ye Y N, Zhu K L, Yan L and Lu Y X et al. ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs. arXiv preprint arXiv: 2307.16789, 2023.
    [15] Yang R, Song L, Li Y W, Zhao S J, Ge Y X and Li X et al. GPT4Tools: Teaching large language model to use tools via self-instruction. In: Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). New Orleans, LA, USA: NeurIPS Foundation, 2023.
    [16] Wang B S, Fang H, Eisner J, Durme B V and Su Y. LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error. arXiv preprint arXiv: 2403.04746, 2024.
    [17] Xu Q T, Hong F L, Li B, Hu C R, Chen Z Y and Zhang J. On the Tool Manipulation Capability of Open-source Large Language Models. arXiv preprint arXiv: 2305.16504, 2023.
    [18] Qiao S F, Gui H H, Lv C F, Jia Q H, Chen H J and Zhang N Y. Making Language Models Better Tool Learners with Execution Feedback. arXiv preprint arXiv: 2305.13068, 2024.
    [19] Li M H, Zhao Y X, Yu B W, Song F F, Li H Y and Yu H Y et al. API-bank: A comprehensive benchmark for tool-augmented LLMs. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Singapore: Association for Computational Linguistics, 2023. 3102–3116
    [20] Tang Q Y, Deng Z L, Lin H Y, Han X P, Liang Q and Cao B X et al. ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases. arXiv preprint arXiv: 2306.05301, 2023.
    [21] Raffel C, Shazeer N, Roberts A, Lee K, Narang S and Matena M, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 2020, 21(140): 1−67
    [22] Chen M, Tworek J, Jun H W, Yuan Q M, Pinto H P d O and Kaplan J et al. Evaluating Large Language Models Trained on Code. arXiv preprint arXiv: 2107.03374, 2021.
    [23] Surís D, Menon S and Vondrick C. ViperGPT: Visual inference via Python execution for reasoning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Paris, France: IEEE, 2023. 11888–11898
    [24] Cai T L, Wang X Z, Ma T Y, Chen X Y and Zhou D. Large language models as tool makers. In: Proceedings of the International Conference on Learning Representations. Vienna, Austria: OpenReview.net, 2024.
    [25] Qian C, Han C, Fung Y, Qin Y, Liu Z and Ji H. CREATOR: Tool creation for disentangling abstract and concrete reasoning of large language models. In: Findings of the Association for Computational Linguistics: EMNLP 2023. Singapore: Association for Computational Linguistics, 2023. 6922–6939
    [26] Gou Z B, Shao Z H, Gong Y Y, Shen Y L, Yang Y J and Huang M L et al. ToRA: A tool-integrated reasoning agent for mathematical problem solving. In: Proceedings of the Twelfth International Conference on Learning Representations. Vienna, Austria: OpenReview.net, 2024.
    [27] Trivedi H, Khot T, Hartmann M, Manku R, Dong V and Li E et al. AppWorld: A controllable world of apps and people for benchmarking interactive coding agents. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Bangkok, Thailand: Association for Computational Linguistics, 2024. 16022–16076
    [28] Shen H Y, Li Y, Meng D S, Cai D Q, Qi S and Zhang L et al. Shortcutsbench: A large-scale real-world benchmark for API-based agents. In: Proceedings of the Thirteenth International Conference on Learning Representations. Singapore: OpenReview.net, 2025.
    [29] Lu J R, Holleis T, Zhang Y Z, Aumayer B, Nan F and Bai F et al. ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities. arXiv preprint arXiv: 2408.04682, 2024.
    [30] Rawles C, Clinckemaillie S, Chang Y F, Waltz J, Lau G and Fair M et al. AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents. arXiv preprint arXiv: 2405.14573, 2024.
    [31] Patil S G, Zhang T J, Wang X and Gonzalez J E. Gorilla: Large Language Model Connected with Massive APIs. arXiv preprint arXiv: 2305.15334, 2023.
    [32] Yuan L F, Chen Y Y, Wang X Y, Fung Y, Peng H and Ji H. CRAFT: Customizing LLMs by creating and retrieving from specialized toolsets. In: The Twelfth International Conference on Learning Representations. Vienna, Austria: OpenReview.net, 2024.
    [33] Gao Z, Du Y, Zhang X, Ma X, Han W and Zhu S C et al. CLOVA: A closed-loop visual assistant with tool usage and update. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, WA, USA: IEEE, 2024. 13258–13268
    [34] Wang Z R, Neubig G and Fried D. TroVE: Inducing verifiable and efficient toolboxes for solving programmatic tasks. In: Forty-first International Conference on Machine Learning. Vienna, Austria: PMLR, 2024. 51177–51191
    [35] Gao S, Shi Z L, Zhu M H, Fang B W, Xin X and Ren P J et al. Confucius: Iterative tool learning from introspection feedback by easy-to-difficult curriculum. In: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24). Vancouver, Canada: AAAI Press, 2024. 18030–18038
    [36] RapidAPI. RapidAPI: A Platform for Discovering and Connecting to APIs, 2024.
    [37] Wang P, Wu Y N, Wang Z K, Liu J H, Song X S and Peng Z Y et al. MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models. arXiv preprint arXiv: 2410.11710, 2024.
    [38] Zhuang Y C, Chen X, Yu T, Mitra S, Bursztyn V and Rossi R A et al. ToolChain*: Efficient action space navigation in large language models with A* search. In: The Twelfth International Conference on Learning Representations. Vienna, Austria: OpenReview.net, 2024.
    [39] Du Y, Wei F Y and Zhang H Y. AnyTool: Self-reflective, hierarchical agents for large-scale API calls. In: Proceedings of Forty-first International Conference on Machine Learning. Vienna, Austria: PMLR, 2024. 33001–33015
    [40] Zheng Y, Li P, Liu W, Liu Y, Luan J and Wang B. ToolRerank: Adaptive and hierarchy-aware reranking for tool retrieval. In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino, Italia: ELRA and ICCL, 2024. 16263–16273
    [41] Hao S B, Liu T Y, Wang Z and Hu Z T. ToolkenGPT: Augmenting frozen language models with massive tools via tool embeddings. In: Proceedings of Advances in Neural Information Processing Systems. New Orleans, LA, USA: NeurIPS Foundation, 2023.
    [42] Wang R X, Han X D, Ji L, Wang S, Baldwin T and Li H N. ToolGen: Unified Tool Retrieval and Calling via Generation. arXiv preprint arXiv: 2410.03439, 2024.
    [43] Yuan S, Song K, Chen J, Tan X, Shen Y and Kan R et al. EasyTool: Enhancing LLM-based agents with concise tool instruction. In: Proceedings of the LLM Agents Workshop at the International Conference on Learning Representations. Vienna, Austria: OpenReview.net, 2024.
    [44] Yu Y Q, Wang Z F, Ma W Z, Guo Z C, Zhan J T and Wang S et al. StepTool: A Step-grained Reinforcement Learning Framework for Tool Learning in LLMs. arXiv preprint arXiv: 2410.07745, 2024.
    [45] Team G, Georgiev P, Lei V I, Burnell R, Bai L B and Gulati A et al. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv preprint arXiv: 2403.05530, 2024.
    [46] Yang A, Yang B S, Hui B Y, Zheng B, Yu B W and Zhou C et al. Qwen2 Technical Report. arXiv preprint arXiv: 2407.10671, 2024.
    [47] DeepSeek-AI, Zhu Q H, Guo D Y, Shao Z H, Yang D J and Wang P Y et al. DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence. arXiv preprint arXiv: 2406.11931, 2024.
    [48] Farn N and Shin R. ToolTalk: Evaluating Tool-Usage in a Conversational Setting. arXiv preprint arXiv: 2311.10775, 2023.
    [49] Ye J J, Li G Y, Gao S Y, Huang C S, Wu Y L and Li S X et al. ToolEyes: Fine-grained evaluation for tool learning capabilities of large language models in real-world scenarios. In: Proceedings of the 31st International Conference on Computational Linguistics. Abu Dhabi, UAE: Association for Computational Linguistics, 2025. 156–187
    [50] Wang C Y, Luo W X, Chen Q Y, Mai H N, Guo J D and Dong S X et al. MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning. arXiv preprint arXiv: 2401.10727, 2024.
    [51] Ma Y, Gou Z, Hao J, Xu R, Wang S and Pan L et al. SciAgent: Tool-augmented Language Models for Scientific Reasoning. arXiv preprint arXiv: 2402.11451, 2024.
    [52] Wu M S, Zhu T, Han H, Tan C Y, Zhang X and Chen W L. Seal-Tools: Self-instruct tool learning dataset for agent tuning and detailed benchmark. In: Natural Language Processing and Chinese Computing: NLPCC 2024. Springer, 2024. 372–384
    [53] Chen S J, Wang Y B, Wu Y F, Chen Q G, Xu Z and Luo W H et al. Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees. In: Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Vancouver, Canada: NeurIPS Foundation, 2024.
    [54] Huang S, Zhong W, Lu J, Zhu Q, Gao J and Liu W et al. Planning, creation, usage: Benchmarking LLMs for comprehensive tool utilization in real-world complex scenarios. In: Findings of the Association for Computational Linguistics: ACL 2024. Bangkok, Thailand: Association for Computational Linguistics, 2024. 4363–4400
    [55] Wang J Z, Ma Z R, Li Y N, Zhang S Y, Chen C L and Chen K et al. GTA: A benchmark for general tool agents. In: Proceedings of Advances in Neural Information Processing Systems. Vancouver, Canada: NeurIPS Foundation, 2024.
    [56] Song Y F, Xiong W M, Zhu D W, Wu W H, Qian H and Song M B et al. RestGPT: Connecting Large Language Models with Real-World RESTful APIs. arXiv preprint arXiv: 2306.06624, 2023.
    [57] Shen Y L, Song K T, Tan X, Zhang W Q, Ren K and Yuan S Y et al. TaskBench: Benchmarking large language models for task automation. In: Proceedings of the 38th Conference on Neural Information Processing Systems. Vancouver, Canada: NeurIPS Foundation, 2024.
    [58] Basu K, Abdelaziz I, Chaudhury S, Dan S, Crouse M and Munawar A et al. API-BLEND: A comprehensive corpora for training and benchmarking API LLMs. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Bangkok, Thailand: Association for Computational Linguistics, 2024. 12859–12870
    [59] Wang H, Wang R, Xue B, Xia H, Cao J and Liu Z et al. AppBench: Planning of multiple APIs from various APPs for complex user instruction. In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Miami, Florida, USA: Association for Computational Linguistics, 2024. 15322–15336
    [60] Wang W X, Shi J L, Wang C Z, Lee C, Yuan Y L and Huang J T et al. Learning to Ask: When LLMs Meet Unclear Instruction. arXiv preprint arXiv: 2409.00557, 2024.
    [61] Ye J, Li S, Li G, Huang C, Gao S and Wu Y et al. ToolSword: Unveiling safety issues of large language models in tool learning across three stages. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Bangkok, Thailand: Association for Computational Linguistics, 2024. 2181–2211
    [62] Ye J, Wu Y, Gao S, Huang C, Li S and Li G et al. RoTBench: A multi-level benchmark for evaluating the robustness of large language models in tool learning. In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Miami, Florida, USA: Association for Computational Linguistics, 2024. 313–333
    [63] Guo Z C, Cheng S J, Wang H, Liang S H, Qin Y J and Li P et al. StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models. arXiv preprint arXiv: 2403.07714, 2024.
    [64] Zhuang Y C, Yu Y, Wang K, Sun H T and Zhang C. ToolQA: A dataset for LLM question answering with external tools. In: Advances in Neural Information Processing Systems. New Orleans, LA, USA: Curran Associates, Inc., 2023.
    [65] Guo Z, Huang Y and Xiong D. CToolEval: A Chinese benchmark for LLM-powered agent evaluation in real-world API interactions. In: Findings of the Association for Computational Linguistics: ACL 2024. Bangkok, Thailand: Association for Computational Linguistics, 2024. 15711–15724
    [66] Papineni K, Roukos S, Ward T and Zhu W J. Bleu: a Method for Automatic Evaluation of Machine Translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, Pennsylvania, USA: Association for Computational Linguistics, 2002. 311–318
    [67] Lin C Y. ROUGE: A Package for Automatic Evaluation of Summaries. In: Text Summarization Branches Out. Barcelona, Spain: Association for Computational Linguistics, 2004. 74–81
    [68] Bergroth L, Hakonen H and Raita T. A survey of longest common subsequence algorithms. In: Proceedings Seventh International Symposium on String Processing and Information Retrieval. 2000. 39–48
    [69] Liu Y M, Peng X Y, Zhang Y W, Cao J N, Zhang X H and Cheng S et al. Tool-Planner: Dynamic Solution Tree Planning for Large Language Model with Tool Clustering. arXiv preprint arXiv: 2406.03807, 2024.
    [70] Qiao S F, Fang R N, Qiu Z S, Wang X B, Zhang N Y and Jiang Y et al. Benchmarking Agentic Workflow Generation. arXiv preprint arXiv: 2410.07869, 2024.
    [71] OpenMOSS. UnifiedToolHub. GitHub repository, 2025. https://github.com/OpenMOSS/UnifiedToolHub.
    [72] Zhou S Y, Xu F F, Zhu H, Zhou X H, Lo R and Sridhar A et al. WebArena: A realistic web environment for building autonomous agents. In: Proceedings of the 12th International Conference on Learning Representations. Vienna, Austria: OpenReview.net, 2024.
    [73] Kim G W, Baldi P and McAleer S. Language models can solve computer tasks. In: Proceedings of the 37th Conference on Neural Information Processing Systems. New Orleans, LA, USA: Curran Associates, Inc., 2023.
    [74] Liu Y L, Yuan Y L, Wang C W, Han J H, Ma Y Q and Zhang L et al. From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs. arXiv preprint arXiv: 2402.18157, 2024.
    [75] Liu X, Qin B, Liang D Z, Dong G, Lai H Y and Zhang H C et al. AutoGLM: Autonomous Foundation Agents for GUIs. arXiv preprint arXiv: 2411.00820, 2024.
    [76] Qi Z H, Liu X, Iong I L, Lai H Y, Sun X Q and Zhao W Y et al. WebRL: Training LLM web agents via self-evolving online curriculum reinforcement learning. In: Proceedings of the 13th International Conference on Learning Representations (ICLR 2025). Singapore: OpenReview.net, 2025.
    [77] Wu Q, Liu W, Luan J and Wang B. ToolPlanner: A tool augmented LLM for multi granularity instructions with path planning and feedback. In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Miami, Florida, USA: Association for Computational Linguistics, 2024. 18315–18339
    [78] Chen K, Cusumano-Towner M, Huval B, Petrenko A, Hamburger J and Koltun V et al. Reinforcement learning for long-horizon interactive LLM agents. arXiv preprint arXiv: 2502.01600, 2025.
    [79] Kong Y, Ruan J, Chen Y, Zhang B, Bao T and Shiwei S et al. TPTU-v2: Boosting task planning and tool usage of large language model-based agents in real-world industry systems. In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track. Miami, Florida, USA: Association for Computational Linguistics, 2024. 371–385
    [80] Liu X K, Peng Z Y, Yi X Y, Xie X, Xiang L R and Liu Y C et al. ToolNet: Connecting Large Language Models with Massive Tools via Tool Graph. arXiv preprint arXiv: 2403.00839, 2024.
    [81] Huang W, Abbeel P, Pathak D and Mordatch I. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In: Proceedings of the 39th International Conference on Machine Learning. PMLR, 2022. 9118–9147
    [82] Xu H S, Zhu S, Wang Z H, Zheng H, Ma D and Cao R S et al. Reducing Tool Hallucination via Reliability Alignment. arXiv preprint arXiv: 2412.04141, 2024.
    [83] Xu G W, Jin P, Li H, Song Y B, Sun L C and Yuan L. LLaVA-CoT: Let vision language models reason step-by-step. arXiv preprint arXiv: 2411.10440, 2024.
    [84] Koh J Y, McAleer S, Fried D and Salakhutdinov R. Tree search for language model agents. arXiv preprint arXiv: 2407.01476, 2024.
    [85] Chen P, Bu P, Song J, Gao Y and Zheng B. Can VLMs play action role-playing games? Take Black Myth Wukong as a study case. In: Proceedings of NeurIPS 2024 Workshop on Open-World Agents. Vancouver, Canada: Curran Associates, Inc., 2024.
    [86] Nakano R I, Hilton J, Balaji S, Wu J, Ouyang L and Kim C et al. WebGPT: Browser-assisted question-answering with human feedback. arXiv preprint arXiv: 2112.09332, 2022.
    [87] Yao S, Chen H, Yang J and Narasimhan K. WebShop: Towards scalable real-world web interaction with grounded language agents. In: Advances in Neural Information Processing Systems. 2022. 28744–28757
    [88] Qiao S F, Fang R N, Zhang N Y, Zhu Y Q, Chen X and Deng S M et al. Agent planning with world knowledge model. In: Proceedings of The Thirty-eighth Annual Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates, Inc., 2024.
    [89] Cao H, Zhang Y, Feng S, Yang X, Wang D and Zhang Y. TOOL-ED: Enhancing empathetic response generation with the tool calling capability of LLM. In: Proceedings of the 31st International Conference on Computational Linguistics. Abu Dhabi, UAE: Association for Computational Linguistics, 2025. 5305–5320
    [90] Liao Z Y, Mo L B, Xu C J, Kang M T, Zhang J W and Xiao C W et al. EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage. arXiv preprint arXiv: 2409.11295, 2025.
    [91] Chen Z R, Xiang Z, Xiao C W, Song D and Li B. AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases. arXiv preprint arXiv: 2407.12784, 2024.
    [92] Xiang Z, Zheng L Z, Li Y J, Hong J Y, Li Q B and Xie H et al. GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning. arXiv preprint arXiv: 2406.09187, 2024.
    [93] OpenAI, Andrychowicz M, Baker B, Chociej M, Józefowicz R and McGrew B, et al. Learning dexterous in-hand manipulation. The International Journal of Robotics Research, 2020, 39(1): 3−20 doi: 10.1177/0278364919887447
    [94] Kavraki L, Svestka P, Latombe J C and Overmars M. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation, 1996, 12(4): 566−580 doi: 10.1109/70.508439
    [95] Shen Z Y, Wilson J P, Harvey R and Gupta S. MRRT: Multiple Rapidly-Exploring Random Trees for Fast Online Replanning in Dynamic Environments. arXiv preprint arXiv: 2104.11059, 2021.
    [96] Liang J, Huang W, Xia F, Xu P, Hausman K and Ichter B et al. Code as policies: Language model programs for embodied control. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA). 2023. 9493–9500
    [97] Ahn M, Brohan A, Brown N, Chebotar Y, Cortes O and David B et al. Do as i can, not as i say: Grounding language in robotic affordances. In: Proceedings of the 6th Conference on Robot Learning (CoRL). 2022. 150–161
    [98] Yu Q J, Huang S Y, Yuan X B, Jiang Z K, Hao C and Li X et al. UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models. arXiv preprint arXiv: 2409.20551, 2024.
    [99] Huang W, Wang C, Zhang R, Li Y, Wu J and Fei-Fei L. VoxPoser: Composable 3D value maps for robotic manipulation with language models. In: Proceedings of The 7th Conference on Robot Learning. PMLR, 2023. 540–562
    [100] Huang W L, Wang C, Li Y Z, Zhang R H and Fei-Fei L. ReKep: Spatio-temporal reasoning of relational keypoint constraints for robotic manipulation. In: Proceedings of 2nd CoRL Workshop on Learning Effective Abstractions for Planning. 2024.
    [101] Cai M X, Wang D L, Feng S and Zhang Y F. PECER: Empathetic response generation via dynamic personality extraction and contextual emotional reasoning. In: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2024. 10631–10635
    [102] Jin Q, Yang Y, Chen Q and Lu Z. GeneGPT: Augmenting large language models with domain tools for improved access to biomedical information. Bioinformatics, 2024, 40(2): ii125−ii134
    [103] Xiao S, Liu Z, Zhang P, Muennighoff N, Lian D and Nie J Y. C-Pack: Packed resources for general chinese embeddings. In: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). New York, NY, USA: Association for Computing Machinery, 2024. 641–649
    [104] Li Z C, Wang J H, Jiang Z S, Mao H Y, Chen Z X and Du J Z et al. DMQR-RAG: Diverse multi-query rewriting for RAG. arXiv preprint arXiv: 2411.13154, 2024.
    [105] Xu H S, Zhu S, Wang Z H, Zheng H, Ma D and Cao R S et al. Reducing tool hallucination via reliability alignment. arXiv preprint arXiv: 2412.04141, 2024.
    [106] Mialon G, Dessì R, Lomeli M, Nalmpantis C, Pasunuru R and Raileanu R et al. Augmented Language Models: a Survey. arXiv preprint arXiv: 2302.07842, 2023.
    [107] DeepSeek-AI. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv preprint arXiv: 2501.12948, 2025.
    [108] Zeng Z Y, Cheng Q Y, Yin Z Y, Wang B, Li S M and Zhou Y H et al. Scaling of search and learning: A roadmap to reproduce o1 from reinforcement learning perspective. arXiv preprint arXiv: 2412.14135, 2024.
Publication History
  • Received:  2024-12-13
  • Accepted:  2025-04-22
  • Published online:  2025-05-14
