Multi-agent Interactive Modeling and Authenticity Evaluation for Classroom Teaching

XU Ru-Yi; SHI Jin-Xin; CHEN Jing-Ying; YANG Zong-Kai

doi: 10.16383/j.aas.c260112 cstr: 32138.14.j.aas.c260112

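1. School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070
2. School of Computer Science and Technology, East China Normal University, Shanghai 200062
3. Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430079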

Funds: Supported by National Natural Science Foundation of China (62377018) and China Postdoctoral Science Foundation (2024M762508)

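Author Bio:
XU Ru-Yi: Assistant Researcher at the School of Computer Science and Artificial Intelligence, Wuhan University of Technology. He received his Ph.D. degree from Central China Normal University in 2024. His research interests include multi-agent systems, intelligent education, and human-computer interaction. E-mail: ruyi.xu@whut.edu.cn

SHI Jin-Xin: Ph.D. candidate at the School of Computer Science and Technology, East China Normal University. He received his master's degree from Central China Normal University in 2023. His research interests include computer vision and multi-agent interactive modeling. E-mail: jinxinshi@stu.ecnu.edu.cn

CHEN Jing-Ying: Professor at the Faculty of Artificial Intelligence in Education, Central China Normal University. She received her Ph.D. degree from Nanyang Technological University, Singapore in 2001. Her research interests include computer vision, pattern recognition, and human-computer interaction. Corresponding author of this paper. E-mail: chenjy@mail.ccnu.edu.cn

YANG Zong-Kai: Professor at the Faculty of Artificial Intelligence in Education, Central China Normal University. He received his Ph.D. degree from Xi'an Jiaotong University in 1991. His research interests include the intersection of artificial intelligence and education, and educational digitalization. E-mail: zkyang@mail.ccnu.edu.cn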

Publication history
- Received: 2026-02-11
- Accepted: 2026-03-26
- Published online: 2026-04-26

Abstract: With the development of large language models, multi-agent virtual classrooms are becoming an important tool for low-risk teaching experiments and strategy validation. However, existing methods often neglect the discourse structure, latent student states, and peer interaction mechanisms of real classrooms, and lack systematic modeling and evaluation of the authenticity of teaching interactions and of intervention effects. To this end, the IRF-Smi framework is proposed: it constrains teaching dialogues with the initiation-response-feedback (IRF) discourse chain and combines first-person latent state modeling with a small-world social network to capture the dynamic evolution of teacher and student behaviors and peer influence. A benchmark for teaching interaction authenticity is also constructed, and simulation results are quantitatively evaluated using the Pearson correlation coefficient, the intraclass correlation coefficient, and the mean absolute error. Experiments on 50 K-12 classroom sessions show that IRF-Smi achieves better consistency with real teacher and student behavior distributions than AutoGen and MetaGPT. Moreover, gamified teaching strategies yield significant gains, demonstrating the framework's potential for research on teaching mechanisms and for validating agent behavior.

Keywords:
- multi-agent systems
- classroom teaching simulation
- IRF discourse structure
- latent state modeling
- behavioral authenticity evaluation
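
The initiation-response-feedback constraint described in the abstract can be pictured as a small state machine over dialogue turns. The sketch below is illustrative only: the move codes, the `Turn` type and the `follows_irf` helper are hypothetical names, not part of IRF-Smi, and it assumes a strict I -> R -> F cycle, whereas the framework may well allow richer patterns (for example several student responses before feedback).

```python
from dataclasses import dataclass

# Hypothetical move codes for one IRF cycle: teacher Initiation (I),
# student Response (R), teacher Feedback (F).
NEXT_MOVE = {"I": "R", "R": "F", "F": "I"}
EXPECTED_ROLE = {"I": "teacher", "R": "student", "F": "teacher"}


@dataclass(frozen=True)
class Turn:
    role: str  # "teacher" or "student"
    move: str  # "I", "R" or "F"


def follows_irf(turns):
    """Return True if the turns form zero or more complete I -> R -> F cycles."""
    expected = "I"
    for turn in turns:
        # The move must be the next one in the cycle, spoken by the right role.
        if turn.move != expected or turn.role != EXPECTED_ROLE[turn.move]:
            return False
        expected = NEXT_MOVE[turn.move]
    # A dialogue that stops mid-cycle (e.g. after R) counts as incomplete.
    return expected == "I"


valid = [Turn("teacher", "I"), Turn("student", "R"), Turn("teacher", "F")]
broken = [Turn("teacher", "I"), Turn("teacher", "F")]
print(follows_irf(valid), follows_irf(broken))
```

A generator could use such a check to reject or regenerate a turn that breaks the chain; whether IRF-Smi enforces the constraint by filtering, prompting, or scheduling is not stated in this excerpt.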
Notes:

1) https://github.com/SumnerLab/TalkMoves

2) https://github.com/huggingface/peft

Fig. 1 IRF discourse structure and typical multi-agent dialogue patterns

Fig. 2 Virtual classroom interaction process generated by IRF-Smi

Fig. 3 Schematic diagram of the role agent architecture in IRF-Smi

Fig. 4 Categories and definitions of teaching dialogue behaviors in the TalkMoves dataset

Fig. 5 Comparison of discourse behavior distributions between IRF-Smi and real classrooms in the Grade 6 lesson "Understanding trigonometric functions and the unit circle"

Fig. 6 PD values of five Grade 6 students across 10 lessons

Fig. 7 Changes in Leo's attention and emotional states during the simulated Grade 6 lesson "Understanding trigonometric functions and the unit circle" (the teacher agent introduces a gamified teaching strategy at $t=6$)

Fig. 8 Attention distributions of four students under two adjacency structures in the Grade 6 lesson "Understanding trigonometric functions and the unit circle"

B1 Prompt for Sarah Lin's Role Configuration

B2 Prompt for Alex Wang's Role Configuration

B3 Prompt for Leo's Role Configuration

B4 Prompt for Emily's Role Configuration

B5 Prompt for Jason's Role Configuration

B6 Prompt for Sophia Liu's Role Configuration

Table 1 Comparison of IRF-Smi with other methods on the teaching behavior authenticity evaluation benchmark

Metric	Role	Method	Grade 4	Grade 5	Grade 6	MS	HS
PCC	Teacher	AutoGen	0.5778	0.6060	0.6029	0.5897	0.5981
PCC	Teacher	MetaGPT	0.6050	0.6217	0.6269	0.6263	0.6291
PCC	Teacher	IRF-Smi	0.6431	0.6786	0.6805	0.6607	0.6938
PCC	Student	AutoGen	0.5701	0.5917	0.5676	0.5696	0.5725
PCC	Student	MetaGPT	0.5886	0.5814	0.5976	0.5958	0.5870
PCC	Student	IRF-Smi	0.6363	0.6765	0.6792	0.6375	0.6815
ICC	Teacher	AutoGen	0.5644	0.5627	0.5614	0.5545	0.5464
ICC	Teacher	MetaGPT	0.5840	0.5676	0.5761	0.5745	0.5816
ICC	Teacher	IRF-Smi	0.6457	0.6621	0.6715	0.6437	0.6699
ICC	Student	AutoGen	0.5795	0.5896	0.5841	0.5865	0.5867
ICC	Student	MetaGPT	0.5817	0.6039	0.5974	0.6056	0.5786
ICC	Student	IRF-Smi	0.6309	0.6753	0.6761	0.6473	0.6663
MAE	Teacher	AutoGen	0.1071	0.1330	0.1286	0.1384	0.1374
MAE	Teacher	MetaGPT	0.1091	0.1284	0.1328	0.1420	0.1350
MAE	Teacher	IRF-Smi	0.1001	0.1190	0.1245	0.1248	0.1273
MAE	Student	AutoGen	0.1324	0.1249	0.1186	0.1216	0.1489
MAE	Student	MetaGPT	0.1263	0.1223	0.1238	0.1210	0.1443
MAE	Student	IRF-Smi	0.1074	0.1098	0.1063	0.1139	0.1325
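
The three metrics reported in Table 1 compare a simulated behavior distribution with a real one. The following is a minimal sketch of how such scores could be computed for one lesson, under stated assumptions: the two seven-category distributions are invented for illustration (they are not data from the paper), and the ICC form used is ICC(A,1) (two-way, absolute agreement, single measurement, in the McGraw and Wong notation), since this excerpt does not say which ICC variant the authors adopt.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_absolute_error(x, y):
    """Mean absolute difference between paired values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def icc_a1(x, y):
    """ICC(A,1) for n targets (behavior categories) and k = 2 raters."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(x, y)]
    col_means = [sum(x) / n, sum(y) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + (k / n) * (ms_cols - ms_err)
    )


# Invented relative frequencies over seven teacher talk-move categories.
real = [0.30, 0.10, 0.05, 0.15, 0.10, 0.10, 0.20]
simulated = [0.25, 0.12, 0.08, 0.15, 0.12, 0.08, 0.20]

print("PCC:", round(pearson(real, simulated), 4))
print("ICC:", round(icc_a1(real, simulated), 4))
print("MAE:", round(mean_absolute_error(real, simulated), 4))
```

Higher PCC and ICC and lower MAE indicate closer agreement, which is the direction in which IRF-Smi differs from AutoGen and MetaGPT in every column of Table 1.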

Table 2 Ablation results for the core components of IRF-Smi

Variant	Teacher PCC	Teacher ICC	Teacher MAE	Student PCC	Student ICC	Student MAE
IRF-Smi	0.6431	0.6457	0.1001	0.6363	0.6309	0.1074
w/o IRF	0.6218	0.6027	0.1316	0.6104	0.6079	0.1268
w/o First-Person	0.6489	0.6315	0.1238	0.6287	0.6216	0.1217
w/o Small-World	0.6532	0.6384	0.1226	0.6411	0.6368	0.1189
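
To read the ablation more easily, the differences between each variant and the full model can be tabulated. The sketch below only does arithmetic on the Table 2 values; the dictionary layout and the `deltas` helper are illustrative names introduced here.

```python
# Values copied from Table 2, in the column order of that table.
METRICS = (
    "teacher_pcc", "teacher_icc", "teacher_mae",
    "student_pcc", "student_icc", "student_mae",
)
TABLE2 = {
    "IRF-Smi": (0.6431, 0.6457, 0.1001, 0.6363, 0.6309, 0.1074),
    "w/o IRF": (0.6218, 0.6027, 0.1316, 0.6104, 0.6079, 0.1268),
    "w/o First-Person": (0.6489, 0.6315, 0.1238, 0.6287, 0.6216, 0.1217),
    "w/o Small-World": (0.6532, 0.6384, 0.1226, 0.6411, 0.6368, 0.1189),
}


def deltas(variant):
    """Metric-wise difference (variant minus full IRF-Smi), rounded to 4 places."""
    full = TABLE2["IRF-Smi"]
    return {
        name: round(value - base, 4)
        for name, value, base in zip(METRICS, TABLE2[variant], full)
    }


for variant in ("w/o IRF", "w/o First-Person", "w/o Small-World"):
    print(variant, deltas(variant))
```

Read this way, the table shows that MAE rises for both roles in every ablation and that removing the IRF constraint lowers all four correlation scores, while the correlation scores for the other two variants are mixed (for instance, teacher PCC is slightly higher without the first-person or small-world component).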

Table 3 Changes in knowledge acquisition before and after class (number of correct answers out of 10)

Model	Phase	Sophia Liu	Alex Wang	Jason	Emily	Leo
GPT-4o	Pre-class	7	7	8	7	7
GPT-4o	Post-class	9	10	10	7	7
LLaMA3-7B	Pre-class	6	5	6	5	6
LLaMA3-7B	Post-class	10	9	9	6	6
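
The per-student gains implied by Table 3 follow from simple subtraction. The sketch below computes them from the table values; the data structure and the `gains` helper are illustrative, and the model names are kept exactly as they appear in the table.

```python
STUDENTS = ("Sophia Liu", "Alex Wang", "Jason", "Emily", "Leo")

# Correct answers out of 10, copied from Table 3: (pre-class, post-class).
SCORES = {
    "GPT-4o": ((7, 7, 8, 7, 7), (9, 10, 10, 7, 7)),
    "LLaMA3-7B": ((6, 5, 6, 5, 6), (10, 9, 9, 6, 6)),
}


def gains(model):
    """Post-class minus pre-class score for each student agent."""
    pre, post = SCORES[model]
    return {name: after - before for name, before, after in zip(STUDENTS, pre, post)}


for model in SCORES:
    per_student = gains(model)
    mean_gain = sum(per_student.values()) / len(per_student)
    print(model, per_student, "mean gain:", mean_gain)
```

With either backbone, Sophia Liu, Alex Wang and Jason improve by 2 to 4 questions, while Emily and Leo improve by at most 1.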

Table 4 Computational cost comparison under different scales and connection densities

Setting	Time/IRF (s)	Tokens/IRF	Tokens/student/IRF
5 participants, $k=1$	224	620	34.7
5 participants, $k=2$	257	935	119.2
100 participants, $k=1$	249	1461	37.5
100 participants, $k=2$	261	2192	126.6
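
Table 4 varies a connection density $k$ in the peer network. As a rough illustration of what a small-world peer graph looks like, the sketch below builds a Watts-Strogatz graph in plain Python. It rests on assumptions that this excerpt does not confirm: that $k$ counts the lattice neighbours on each side of a student, that edges are undirected, and that a rewiring probability `p` is used. The function name and parameters are hypothetical, not IRF-Smi's API.

```python
import random


def watts_strogatz(n, k, p, seed=0):
    """Undirected Watts-Strogatz graph as an adjacency dict.

    Starts from a ring lattice in which each node links to its k nearest
    neighbours on each side, then rewires each lattice edge with probability p.
    """
    rng = random.Random(seed)
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            neighbours[i].add(j)
            neighbours[j].add(i)
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            if j not in neighbours[i] or rng.random() >= p:
                continue  # edge already gone, or not selected for rewiring
            candidates = [c for c in range(n) if c != i and c not in neighbours[i]]
            if not candidates:
                continue  # node i is already linked to everyone
            new = rng.choice(candidates)
            neighbours[i].discard(j)
            neighbours[j].discard(i)
            neighbours[i].add(new)
            neighbours[new].add(i)
    return neighbours


graph = watts_strogatz(n=100, k=2, p=0.1)
edge_count = sum(len(peers) for peers in graph.values()) // 2
print("edges:", edge_count)
```

Because rewiring replaces an edge rather than adding one, the number of edges stays at n times k for any `p`, so the amount of peer context each student sees would depend mainly on $k$ rather than on class size under this reading.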

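A1 Prompt for TalkMoves automatic annotation (LLaMA3-8B with LoRA fine-tuning)

# Role
You are a classroom language analysis system. Your task is to assign each classroom utterance to the most appropriate TalkMoves category. Use the speaker role (teacher/student) and the definitions below to judge the communicative intent, and output the correct label.
# Input format
Utterance: < an utterance spoken by the teacher or a student in class>
Speaker role: < teacher or student>
# Available labels
## Teacher talk moves
1. None: a general statement or off-topic remark that fits none of the categories below.
2. Keeping everyone together: prompting students to listen actively and directing their attention to a peer's idea.
3. Getting students to relate to another's ideas: prompting students to respond to or evaluate a classmate's contribution.
4. Restating: repeating a student's utterance verbatim or nearly verbatim.
5. Pressing for accuracy: asking students to use more precise mathematical expressions or standard language.
6. Revoicing: rephrasing or slightly extending a student's idea before expressing it again.
7. Pressing for reasoning: encouraging students to explain their reasons, provide evidence, or connect concepts.
## Student talk moves
1. None: a general statement or off-topic remark.
2. Relating to another student: mentioning, commenting on, or questioning a classmate's idea.
3. Asking for more information: expressing confusion, requesting clarification, or seeking help.
4. Making a claim: giving a factual mathematical statement or a solution step.
5. Providing evidence/reasoning: explaining one's thinking, or giving an argument or derivation.
# Examples
[Input] Utterance: Okay someone to tell me how do we write five tenths, Regina. Speaker role: < teacher>
[Output] Label: Pressing for accuracy
[Input] Utterance: Wait hang on, I meant Conrad was right. Speaker role: < student>
[Output] Label: Relating to another student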
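
When a prompt like A1 is used for automatic annotation, the model's output usually has to be checked against the allowed label set for the speaker's role. The sketch below shows one way to format the input and validate the returned label. The label strings are English renderings of the categories in A1, and `format_input` and `parse_label` are hypothetical helpers, not the authors' annotation code; no model call is made.

```python
# English renderings of the A1 label set, keyed by speaker role.
LABELS = {
    "teacher": [
        "None",
        "Keeping everyone together",
        "Getting students to relate to another's ideas",
        "Restating",
        "Pressing for accuracy",
        "Revoicing",
        "Pressing for reasoning",
    ],
    "student": [
        "None",
        "Relating to another student",
        "Asking for more information",
        "Making a claim",
        "Providing evidence/reasoning",
    ],
}


def format_input(utterance, role):
    """Build the per-utterance input block expected by the prompt."""
    if role not in LABELS:
        raise ValueError(f"unknown speaker role: {role!r}")
    return f"Utterance: {utterance}\nSpeaker role: {role}"


def parse_label(model_output, role):
    """Return the canonical label, or raise ValueError if it is not allowed for this role."""
    text = model_output.strip()
    if text.lower().startswith("label:"):
        text = text[len("label:"):].strip()
    for label in LABELS[role]:
        if text.lower() == label.lower():
            return label
    raise ValueError(f"{text!r} is not a valid {role} label")


print(format_input("Wait hang on, I meant Conrad was right.", "student"))
print(parse_label("Label: relating to another student", "student"))
```

Rejecting out-of-set labels matters here because the teacher and student label sets differ: a teacher-only label such as "Revoicing" returned for a student utterance would otherwise distort the behavior distributions that the benchmark compares.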
