Temporal Summarization Based on Biterm Dirichlet Process
Abstract: Temporal summarization extracts sentences in chronological order to give an overview of how a topic evolves. Existing research either neglects the subtopic information latent in sentences or fails to discover it accurately. To address this problem, we develop a novel topic model, the biterm Dirichlet process, and propose a temporal summarization method based on it. First, we obtain the subtopic distribution of each sentence through posterior inference. Second, we use this distribution to calculate each sentence's relevance and novelty. Finally, we extract, in chronological order, the sentences that are relevant to the topic and highly novel to form the temporal summary. Experimental results show that our approach generates higher-quality temporal summaries than current representative methods.
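The three steps in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that posterior inference has already produced subtopic weights `theta` and per-subtopic word distributions `phi`, and it borrows the biterm topic model's convention that a short text's topic distribution is the average of P(z | b) over its unordered word pairs (biterms). Two further assumptions are hypothetical: relevance is taken as cosine similarity to the overall subtopic profile, and novelty as one minus the maximum similarity to sentences already selected, in the style of MMR. The thresholds `rel_min` and `nov_min` and the toy data are also made up for illustration.

```python
from itertools import combinations
import math

def extract_biterms(words):
    """Unordered word pairs co-occurring in one short text (BTM-style biterms)."""
    return [tuple(sorted(pair)) for pair in combinations(words, 2)]

def sentence_subtopics(words, theta, phi, floor=1e-12):
    """P(z | s) as the average of P(z | b) over the sentence's biterms (assumption)."""
    k_count = len(theta)
    biterms = extract_biterms(words)
    if not biterms:  # fewer than two words: fall back to a uniform distribution
        return [1.0 / k_count] * k_count
    dist = [0.0] * k_count
    for w1, w2 in biterms:
        # P(z | b) is proportional to theta_z * phi_z(w1) * phi_z(w2)
        scores = [theta[k] * phi[k].get(w1, floor) * phi[k].get(w2, floor)
                  for k in range(k_count)]
        total = sum(scores)
        for k in range(k_count):
            dist[k] += scores[k] / total / len(biterms)
    return dist

def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def temporal_summary(sentences, theta, phi, rel_min=0.5, nov_min=0.3):
    """Scan (timestamp, tokens) pairs chronologically and keep relevant, novel ones."""
    topic = [t / sum(theta) for t in theta]  # hypothetical overall topic profile
    summary, chosen = [], []
    for ts, words in sorted(sentences, key=lambda s: s[0]):
        d = sentence_subtopics(words, theta, phi)
        relevance = cosine(d, topic)
        novelty = 1.0 - max((cosine(d, prev) for prev in chosen), default=0.0)
        if relevance >= rel_min and novelty >= nov_min:
            summary.append((ts, words))
            chosen.append(d)
    return summary

# Toy values, made up for illustration only (two subtopics).
THETA = [0.5, 0.5]
PHI = [
    {"quake": 0.4, "damage": 0.4, "rescue": 0.1, "aid": 0.1},
    {"quake": 0.1, "damage": 0.1, "rescue": 0.4, "aid": 0.4},
]
SENTENCES = [
    (3, ["rescue", "aid"]),
    (1, ["quake", "damage"]),
    (2, ["damage", "quake"]),  # same content as t=1, so it is not novel
    (4, ["weather"]),          # no biterms, uniform distribution
]

if __name__ == "__main__":
    print([ts for ts, _ in temporal_summary(SENTENCES, THETA, PHI)])  # → [1, 3]
```

The greedy chronological scan follows the abstract's "extract in time order" step. It compares each candidate only with sentences already chosen, so redundancy is judged against the summary built so far rather than the whole corpus.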
Key words:
- Temporal summarization
- Dirichlet process
- Biterm
- Topic model