Word Sense Disambiguation with Graph Model Based on Domain Knowledge
Abstract: How fully domain knowledge is mined and exploited directly affects the performance of word sense disambiguation (WSD) in a specific domain. This paper proposes a graph-model WSD method based on domain knowledge. The method draws on two kinds of domain knowledge: keywords associated with the target domain are collected as text domain knowledge, and domain annotations for each sense of the target ambiguous word are obtained as sense domain knowledge. First, a disambiguation graph is constructed from the text domain keywords and the context words of the sentence. Second, the graph is adjusted according to the sense domain knowledge. Finally, an improved graph scoring method rates the importance of each sense node in the graph, and the correct sense is selected from these scores. The method effectively integrates domain knowledge into the graph model and, on the Koeling dataset, achieves the best disambiguation performance among comparable studies. The paper also improves several graph scoring methods and compares them in detailed experiments.
Key words:
- Word sense disambiguation (WSD)
- domain information
- graph model
- sense domain
- text domain
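The pipeline described in the abstract (build a disambiguation graph from context words and domain keywords, adjust it with sense domain annotations, then score the sense nodes) can be illustrated with a minimal Python sketch. The abstract does not specify the scoring formula or the adjustment rule, so everything concrete here is an assumption: personalized PageRank stands in for the paper's improved scoring method, the `boost` multiplier stands in for the graph adjustment, and the names `build_graph`, `personalized_pagerank`, `disambiguate`, the relatedness scores, and the toy sense labels are all hypothetical.

```python
from collections import defaultdict


def build_graph(senses, context_words, domain_words, relatedness,
                sense_domains, target_domain, boost=2.0):
    """Build a weighted undirected graph linking sense nodes to word nodes.

    relatedness maps (sense, word) -> score; these scores are hypothetical
    and would come from a lexical resource in a real system.
    """
    graph = defaultdict(dict)
    words = list(context_words) + list(domain_words)
    for sense in senses:
        in_domain = target_domain in sense_domains.get(sense, ())
        for word in words:
            weight = relatedness.get((sense, word), 0.0)
            if weight <= 0.0:
                continue
            # Assumed adjustment step: strengthen edges of senses whose
            # domain annotation matches the target text domain.
            if in_domain:
                weight *= boost
            graph[sense][word] = weight
            graph[word][sense] = weight
    return dict(graph)


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Score nodes by power iteration, teleporting only to seed nodes."""
    nodes = list(graph)
    if not nodes:
        return {}
    # Keep only seeds that occur in the graph; fall back to all nodes.
    seeds = [n for n in nodes if n in seeds] or nodes
    teleport = {n: 0.0 for n in nodes}
    for n in seeds:
        teleport[n] = 1.0 / len(seeds)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            total = sum(graph[node].values())
            for neighbor, weight in graph[node].items():
                new_rank[neighbor] += damping * rank[node] * weight / total
        rank = new_rank
    return rank


def disambiguate(senses, context_words, domain_words, relatedness,
                 sense_domains, target_domain):
    """Return the highest-scoring sense together with all node scores."""
    graph = build_graph(senses, context_words, domain_words, relatedness,
                        sense_domains, target_domain)
    seeds = set(context_words) | set(domain_words)
    rank = personalized_pagerank(graph, seeds)
    best = max(senses, key=lambda s: rank.get(s, 0.0))
    return best, rank


# Toy data (hypothetical): the word "bank" in a finance-domain text.
senses = ["bank#institution", "bank#river"]
relatedness = {
    ("bank#institution", "rate"): 0.6,
    ("bank#institution", "loan"): 0.8,
    ("bank#river", "rate"): 0.1,
    ("bank#river", "water"): 0.7,
}
sense_domains = {"bank#institution": {"finance"}, "bank#river": {"geography"}}
best, scores = disambiguate(senses, ["rate"], ["loan", "market"],
                            relatedness, sense_domains, "finance")
print(best)
```

In this sketch the domain keywords act both as extra graph nodes and as teleport seeds, which is one plausible way to let text domain knowledge influence the ranking; the paper's actual construction may differ.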