Temporal Summarization Based on Biterm Dirichlet Process

Xi Yao-Yi (席耀一); Li Bi-Cheng (李弼程); Li Tian-Cai (李天彩); Huang Shan-Qi (黄山奇)

doi:10.16383/j.aas.2015.c150001

cstr: 32138.14.j.aas.2015.c150001

1. Institute of Information System Engineering, PLA Information Engineering University, Zhengzhou 450001;
2. Unit 65022, Shenyang 110162

Funds:

Supported by National Social Science Foundation of China (14BXW028)

Author biography:
Li Bi-Cheng  Professor at the Institute of Information System Engineering, PLA Information Engineering University. Main research interests include text analysis and understanding, speech processing and recognition, image/video processing and recognition, and information fusion. E-mail: lbclm@gmail.com

Publication history
- Received: 2015-01-04
- Revised: 2015-04-08
- Published: 2015-08-20
Abstract

Abstract: Temporal summarization extracts sentences in chronological order to give an overview of how a topic evolves. Existing studies either neglect the latent subtopics implied in sentences or fail to discover them accurately. To address this problem, we develop a new topic model, the biterm Dirichlet process, and propose a temporal summarization method based on it. First, we obtain the subtopic distribution of each sentence through posterior inference. Second, we use this distribution to compute each sentence's relevance and novelty. Finally, we extract, in chronological order, the sentences that are relevant to the topic and highly novel to form the temporal summary. Experimental results show that our approach generates higher-quality temporal summaries than current representative methods.

Keywords:
- Temporal summarization
- Dirichlet process
- biterm
- topic model
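
The three steps in the abstract (per-sentence subtopic distributions, relevance and novelty scoring, chronological extraction) can be illustrated with a minimal sketch. The abstract does not give the scoring formulas, so everything concrete here is an assumption made for illustration: cosine similarity as the relevance measure, one minus the maximum similarity to already selected sentences as the novelty measure, the two thresholds, and the names `Sentence`, `topic_profile`, and `temporal_summary`. The subtopic distributions are taken as given and stand in for the output of posterior inference; the topic model itself is not implemented.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class Sentence:
    timestamp: str              # ISO date, so string order is chronological order
    text: str
    subtopics: Sequence[float]  # assumed to come from posterior inference


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two subtopic distributions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def temporal_summary(
    sentences: List[Sentence],
    topic_profile: Sequence[float],
    min_relevance: float = 0.5,  # hypothetical threshold
    min_novelty: float = 0.3,    # hypothetical threshold
) -> List[Sentence]:
    """Greedy chronological selection of relevant and novel sentences."""
    selected: List[Sentence] = []
    for s in sorted(sentences, key=lambda x: x.timestamp):
        # Relevance (assumed): similarity to the topic-level subtopic profile.
        relevance = cosine(s.subtopics, topic_profile)
        # Novelty (assumed): dissimilarity to the closest sentence already chosen.
        closest = max((cosine(s.subtopics, t.subtopics) for t in selected), default=0.0)
        novelty = 1.0 - closest
        if relevance >= min_relevance and novelty >= min_novelty:
            selected.append(s)
    return selected
```

With these assumed measures, a sentence whose subtopic distribution nearly matches an earlier selected sentence is skipped as redundant, and a sentence whose mass lies on a subtopic absent from the topic profile is skipped as irrelevant.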
