Abstract: As one of the core tasks in natural language processing, sentiment analysis faces dual challenges: accurately capturing fine-grained emotional features and enhancing model interpretability. To address these issues, we propose a scalable sentiment analysis framework based on a Mixture-of-Experts (MoE) architecture. By integrating a gating mechanism into the expert modules, we design a hybrid expert component that can be seamlessly incorporated into any pretrained language model. The framework expands model capacity with controllable computational overhead, thereby enabling fine-grained conditional computation and expert specialization. Comprehensive experiments on three representative sentiment analysis benchmarks demonstrate that our approach achieves significant improvements over baseline models across key metrics. Notably, on complex multi-class classification tasks, its performance rivals or even surpasses mainstream large language models fine-tuned with parameter-efficient methods. More importantly, benefiting from the sparse activation mechanism, the model maintains high performance while exhibiting exceptional inference efficiency. An in-depth analysis of expert activation patterns and output representations shows that different experts develop functional specialization toward specific semantic patterns, providing intuitive and strong interpretability evidence for model decision-making. These findings validate the potential of the proposed framework for building efficient, high-performance, and trustworthy sentiment analysis systems.
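The sparse routing the abstract describes — a learned gate scores every expert, only the top-k experts run for a given input, and their outputs are combined by the renormalized gate weights — can be sketched in plain Python. This is a minimal illustration under assumed details (dot-product gating scores, softmax normalization, top-k selection); the paper's exact formulation is not reproduced here, and all names are illustrative.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, top_k=2):
    """Route input vector x to the top_k experts with the highest gate
    scores and return their gate-weighted combination plus the chosen
    expert indices.

    x            : input vector (list of floats)
    experts      : list of callables, each mapping a vector to a vector
    gate_weights : one gating weight vector per expert
    """
    # Gate scores: dot product of x with each expert's gating vector.
    scores = [sum(xi * wi for xi, wi in zip(x, w)) for w in gate_weights]
    probs = softmax(scores)
    # Sparse activation: keep only the top_k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalize the selected gate probabilities so they sum to 1.
    norm = sum(probs[i] for i in top)
    outputs = [experts[i](x) for i in top]
    out = [0.0] * len(outputs[0])
    for i, y in zip(top, outputs):
        out = [o + (probs[i] / norm) * yi for o, yi in zip(out, y)]
    return out, top
```

With `top_k=1` only a single expert executes per input regardless of how many experts exist, which is the source of the inference-efficiency claim: total capacity grows with the expert count while per-input compute stays roughly constant.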
Key words:
- sentiment analysis
- mixture of experts
- interpretability
- fine-grained feature capture
- scalability
Table 1 Statistics of sentiment analysis datasets
| Statistic | IMDb | TweetEval Emotion | SST-5 |
|---|---|---|---|
| Training samples | 25000 | 3260 | 8540 |
| Validation samples | – | 374 | 1100 |
| Test samples | 25000 | 1420 | 2210 |
| Number of classes | 2 | 4 | 5 |
Table 2 Performance comparison between baseline and MoE-enhanced models on different datasets
| Dataset | Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| IMDb (binary) | Baseline | 0.9543 | 0.9543 | 0.9543 | 0.9543 |
| | MoE | 0.9565 | 0.9567 | 0.9565 | 0.9565 |
| | Llama-3-8B-Instruct + Zero-shot | 0.9333 | 0.9164 | 0.9534 | 0.9346 |
| | Mistral-7B-Instruct + Zero-shot | 0.9048 | 0.9751 | 0.8303 | 0.8969 |
| | Qwen2.5-7B-Instruct + Zero-shot | 0.9424 | 0.9589 | 0.9240 | 0.9411 |
| | Llama-3-8B-Instruct + LoRA | 0.9692 | 0.9660 | 0.9726 | 0.9693 |
| | Mistral-7B-Instruct + LoRA | 0.9743 | 0.9724 | 0.9763 | 0.9743 |
| | Qwen2.5-7B-Instruct + LoRA | 0.9659 | 0.9641 | 0.9678 | 0.9660 |
| TweetEval (4-class) | Baseline | 0.8191 | 0.8187 | 0.8191 | 0.8189 |
| | MoE | 0.8325 | 0.8338 | 0.8325 | 0.8327 |
| | Llama-3-8B-Instruct + Zero-shot | 0.7570 | 0.7846 | 0.6611 | 0.6910 |
| | Mistral-7B-Instruct + Zero-shot | 0.7720 | 0.7265 | 0.7433 | 0.7296 |
| | Qwen2.5-7B-Instruct + Zero-shot | 0.7735 | 0.7416 | 0.7226 | 0.7265 |
| | Llama-3-8B-Instruct + LoRA | 0.8248 | 0.8018 | 0.7978 | 0.7986 |
| | Mistral-7B-Instruct + LoRA | 0.8381 | 0.8177 | 0.7902 | 0.8019 |
| | Qwen2.5-7B-Instruct + LoRA | 0.7994 | 0.7974 | 0.7878 | 0.7928 |
| SST-5 (5-class) | Baseline | 0.5452 | 0.5621 | 0.5452 | 0.5464 |
| | MoE | 0.5805 | 0.5788 | 0.5805 | 0.5785 |
| | Llama-3-8B-Instruct + Zero-shot | 0.3898 | 0.2697 | 0.3821 | 0.2495 |
| | Mistral-7B-Instruct + Zero-shot | 0.4760 | 0.3922 | 0.4033 | 0.3531 |
| | Qwen2.5-7B-Instruct + Zero-shot | 0.4841 | 0.4095 | 0.4322 | 0.3952 |
| | Llama-3-8B-Instruct + LoRA | 0.5498 | 0.5568 | 0.5515 | 0.5541 |
| | Mistral-7B-Instruct + LoRA | 0.6158 | 0.6148 | 0.5864 | 0.5888 |
| | Qwen2.5-7B-Instruct + LoRA | 0.5398 | 0.5520 | 0.5429 | 0.5474 |
Table 3 Performance and resource comparison across datasets and models
| Dataset | Base model | Method | F1 | Train (M) | Total (M) | Throughput | GFLOPs | Peak Mem (MB) |
|---|---|---|---|---|---|---|---|---|
| IMDb | Llama-3-8B-Instruct | LoRA | 0.9693 | 41.94 | 4582.54 | 2.997 | 4346.11 | 14955.8 |
| | | Zero-shot | 0.9346 | – | 4582.54 | 9.926 | 4037.17 | – |
| | Mistral-7B-Instruct | LoRA | 0.9743 | 41.94 | 3800.31 | 2.239 | 4746.59 | 11248.3 |
| | | Zero-shot | 0.8969 | – | 3800.31 | 10.077 | 4327.97 | – |
| | Qwen2.5-7B-Instruct | LoRA | 0.9660 | 40.37 | 4393.34 | 2.997 | 3634.03 | 19273.1 |
| | | Zero-shot | 0.9411 | – | 4393.34 | 8.187 | 3724.80 | – |
| | RoBERTa | BitFit | 0.9036 | 0.10 | 124.65 | 873.470 | 45.90 | 1326.3 |
| | | Full FT | 0.9378 | 124.65 | 124.65 | 796.689 | 45.90 | 2702.8 |
| | | LoRA | 0.9310 | 0.89 | 125.53 | 659.302 | 46.05 | 1512.8 |
| | | P-Tuning | 0.7966 | 0.61 | 125.25 | 857.970 | 49.69 | 1439.2 |
| | | MoE | 0.9426 | 172.42 | 172.42 | 900.251 | 41.56 | 7429.5 |
| TweetEval | Llama-3-8B-Instruct | LoRA | 0.8083 | 41.94 | 4582.54 | 2.353 | 1171.26 | 11418.3 |
| | | Zero-shot | 0.6910 | – | 4582.54 | 50.868 | 1159.01 | – |
| | Mistral-7B-Instruct | LoRA | 0.8019 | 41.94 | 3800.31 | 2.214 | 1280.33 | 11241.9 |
| | | Zero-shot | 0.7296 | – | 3800.31 | 48.024 | 1266.92 | – |
| | Qwen2.5-7B-Instruct | LoRA | 0.8013 | 40.37 | 4393.34 | 3.124 | 1016.73 | 14166.8 |
| | | Zero-shot | 0.7265 | – | 4393.34 | 42.518 | 1010.61 | – |
| | RoBERTa | BitFit | 0.1410 | 0.10 | 124.65 | 882.097 | 45.90 | 1326.3 |
| | | Full FT | 0.7962 | 124.65 | 124.65 | 835.678 | 45.90 | 2701.5 |
| | | LoRA | 0.5932 | 0.89 | 125.54 | 628.113 | 46.05 | 1512.8 |
| | | P-Tuning | 0.1410 | 0.61 | 125.26 | 889.742 | 49.69 | 1439.3 |
| | | MoE | 0.8039 | 200.74 | 200.74 | 890.958 | 42.65 | 8152.4 |
| SST-5 | Llama-3-8B-Instruct | LoRA | 0.5692 | 41.94 | 4582.54 | 2.385 | 1218.74 | 12301.7 |
| | | Zero-shot | 0.2495 | – | 4582.54 | 48.552 | 1225.63 | – |
| | Mistral-7B-Instruct | LoRA | 0.5888 | 41.94 | 3800.31 | 2.204 | 1381.64 | 11578.2 |
| | | Zero-shot | 0.3531 | – | 3800.31 | 44.868 | 1378.78 | – |
| | Qwen2.5-7B-Instruct | LoRA | 0.5649 | 40.37 | 4393.34 | 3.214 | 1093.97 | 15825.3 |
| | | Zero-shot | 0.3952 | – | 4393.34 | 40.558 | 1061.08 | – |
| | RoBERTa | BitFit | 0.0870 | 0.10 | 124.65 | 888.373 | 45.90 | 1326.3 |
| | | Full FT | 0.5432 | 124.65 | 124.65 | 853.694 | 45.90 | 2699.3 |
| | | LoRA | 0.4805 | 0.89 | 125.54 | 651.461 | 46.05 | 1512.8 |
| | | P-Tuning | 0.1385 | 0.61 | 125.26 | 860.703 | 49.69 | 1439.3 |
| | | MoE | 0.5532 | 172.42 | 172.42 | 910.685 | 41.93 | 8404.0 |
Table 4 Performance comparison between vanilla FFN experts and gated experts on different datasets
| Dataset | Expert design | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| IMDb | Vanilla FFN | 0.9551 | 0.9552 | 0.9551 | 0.9551 |
| | Gated expert | 0.9565 | 0.9567 | 0.9565 | 0.9565 |
| TweetEval | Vanilla FFN | 0.8220 | 0.8222 | 0.8220 | 0.8215 |
| | Gated expert | 0.8325 | 0.8338 | 0.8325 | 0.8327 |
| SST-5 | Vanilla FFN | 0.5683 | 0.5677 | 0.5683 | 0.5666 |
| | Gated expert | 0.5805 | 0.5788 | 0.5805 | 0.5785 |
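The ablation above contrasts a plain FFN expert with the proposed gated expert. A minimal sketch of the two designs, assuming the internal gate is a GLU-style elementwise modulation of the hidden activation (the paper's exact formulation is not given in this excerpt; all names and shapes are illustrative):

```python
import math

def relu(v):
    return [max(0.0, t) for t in v]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in W]

def ffn_expert(x, W1, W2):
    """Vanilla two-layer FFN expert: W2 · ReLU(W1 · x)."""
    return matvec(W2, relu(matvec(W1, x)))

def gated_expert(x, W1, Wg, W2):
    """Expert with an internal gate: the hidden activation is scaled
    elementwise by sigmoid(Wg · x) before the output projection, so the
    expert can suppress or pass through individual hidden features
    conditioned on the input."""
    h = relu(matvec(W1, x))
    g = [sigmoid(t) for t in matvec(Wg, x)]
    return matvec(W2, [hi * gi for hi, gi in zip(h, g)])
```

The extra gate path `Wg` adds parameters relative to the vanilla FFN expert; Table 4 suggests the input-conditioned modulation is what accounts for the accuracy gap, most visibly on the harder 5-class SST-5 task.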