A Topic Link Detection Method Based on Improved Information Bottleneck Theory
-
Abstract: Topic link detection decides whether two given stories discuss the same topic; its key task is to represent each story with a suitable model. Previous methods tend to ignore the hierarchical relationship between a seed event and its directly related events. This paper therefore analyzes the regular pattern of semantic distribution within a story, together with the story's discourse structure, and uses that pattern to improve the information bottleneck (IB) algorithm so that it divides a story into logical semantic units, one per sub-topic. Stories represented by these units are then compared for topic link detection. Experiments show that the method converges quickly and improves system performance to some extent.
-
Key words:
- link detection
- logical semantic unit
- information bottleneck (IB)
- unit features
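The abstract describes dividing a story into sub-topic units with an IB-based procedure. As a rough illustration only, the sketch below applies the standard agglomerative IB merge cost (the prior-weighted Jensen-Shannon divergence between two clusters' word distributions) to a story's sentences. It is not the paper's improved algorithm: restricting merges to adjacent sentences, whitespace tokenization, the assumption that every sentence has at least one token, and the names `merge_cost` and `segment` are all assumptions made for this example.

```python
import math
from collections import Counter


def _kl(p, q):
    """KL divergence D(p || q) in bits; q must be positive wherever p is."""
    return sum(pv * math.log2(pv / q[w]) for w, pv in p.items() if pv > 0)


def merge_cost(prior_a, dist_a, prior_b, dist_b):
    """Information lost by merging two clusters (agglomerative IB).

    prior_*: cluster priors p(c); dist_*: word distributions p(w | c).
    Returns (cost, merged word distribution).
    """
    total = prior_a + prior_b
    pi_a, pi_b = prior_a / total, prior_b / total
    words = set(dist_a) | set(dist_b)
    mix = {w: pi_a * dist_a.get(w, 0.0) + pi_b * dist_b.get(w, 0.0)
           for w in words}
    js = pi_a * _kl(dist_a, mix) + pi_b * _kl(dist_b, mix)
    return total * js, mix


def segment(sentences, n_units):
    """Greedily merge ADJACENT sentences until n_units units remain.

    Assumption: adjacency keeps each unit contiguous in the story, and
    every sentence contains at least one token.
    """
    counts = [Counter(s.lower().split()) for s in sentences]
    n_tokens = sum(sum(c.values()) for c in counts)
    units = []
    for i, c in enumerate(counts):
        size = sum(c.values())
        units.append({
            "span": [i],
            "prior": size / n_tokens,
            "dist": {w: k / size for w, k in c.items()},
        })
    while len(units) > max(n_units, 1):
        best = None
        for i in range(len(units) - 1):
            a, b = units[i], units[i + 1]
            cost, mix = merge_cost(a["prior"], a["dist"],
                                   b["prior"], b["dist"])
            if best is None or cost < best[0]:
                best = (cost, i, mix)
        _, i, mix = best
        a, b = units[i], units[i + 1]
        units[i:i + 2] = [{
            "span": a["span"] + b["span"],
            "prior": a["prior"] + b["prior"],
            "dist": mix,
        }]
    return [u["span"] for u in units]


story = [
    "earthquake hits city",
    "earthquake damages city buildings",
    "government sends rescue teams",
    "rescue teams reach government shelters",
]
print(segment(story, 2))
```

Each returned list holds the sentence indices of one unit. A link detector could then compare two stories unit by unit rather than as single bags of words, although how the paper scores that comparison is not stated in the abstract.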
-