Automatic Chinese Abstractive Summarization With Topical Keywords Fusion

HOU Li-Wei, HU Po, CAO Wen-Lin

Citation: HOU Li-Wei, HU Po, CAO Wen-Lin. Automatic Chinese Abstractive Summarization With Topical Keywords Fusion. ACTA AUTOMATICA SINICA, 2019, 45(3): 530-539. doi: 10.16383/j.aas.c170617

doi: 10.16383/j.aas.c170617
Funds:

Fundamental Research Funds for the Central Universities CCNU18TS044

National Natural Science Foundation of China 61402191

Fundamental Research Funds for the Central Universities CCNU16JYKX15

Thirteenth Five-Year Research Planning Project of the National Language Committee WT135-11

    Author Bio:

    HOU Li-Wei  Master student at the School of Computer Science, Central China Normal University. Her main research interest is natural language processing. E-mail: houliwei@mails.ccnu.edu.cn

    CAO Wen-Lin  Master student at the School of Computer Science, Central China Normal University. Her main research interest is natural language processing. E-mail: caowenlin@mails.ccnu.edu.cn

    Corresponding author:

    HU Po  Associate professor at the School of Computer Science, Central China Normal University. His research interests cover natural language processing and machine learning. Corresponding author of this paper. E-mail: phu@mail.ccnu.edu.cn
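  • Abstract: With the rapid development of big data and artificial intelligence, automatic summarization research is shifting from extractive to abstractive summarization, with the aim of generating higher-quality summaries that read naturally and fluently. In recent years, deep learning has been increasingly applied to abstractive summarization, and the attention-based sequence-to-sequence model has become one of the most widely used models, achieving notable results especially on sentence-level summary generation tasks such as news headline generation and sentence compression. However, the vast majority of existing neural abstractive summarization models distribute attention evenly over all of the text and do not carefully distinguish the important topical information it contains. In view of this, this paper proposes a new multi-attention sequence-to-sequence model that incorporates topical keyword information: a joint attention mechanism combines information from the important keywords under the text's topics with the semantic information of the text to guide summary generation. Experimental results on the NLPCC 2017 Chinese single-document summarization evaluation dataset verify the effectiveness and superiority of the proposed method.
    1)  Handling associate editor for this paper: ZHAO Tie-Jun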
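    The joint attention idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring function or the way the two attention results are fused, so the dot-product scoring, the concatenation step, and all names below (`attend`, `joint_context`) are assumptions for illustration only, not the authors' model.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, memory):
    """Dot-product attention (assumed scoring): returns (weights, context)."""
    scores = [sum(q * v for q, v in zip(query, row)) for row in memory]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * row[i] for w, row in zip(weights, memory))
               for i in range(dim)]
    return weights, context

def joint_context(decoder_state, encoder_states, keyword_vectors):
    """Hypothetical fusion: attend separately over the source text and over
    the topical keywords, then concatenate the two context vectors."""
    text_w, text_ctx = attend(decoder_state, encoder_states)
    kw_w, kw_ctx = attend(decoder_state, keyword_vectors)
    return text_ctx + kw_ctx, text_w, kw_w  # list concatenation

# Toy inputs: two source-token states and one keyword vector, dimension 2.
ctx, text_w, kw_w = joint_context(
    [1.0, 0.0],
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0]],
)
print(len(ctx), text_w, kw_w)
```

    In a full decoder the fused context would feed the next-word prediction at every step; concatenation is only one plausible choice, and a weighted sum or a learned gate would fit the same description.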
  • Fig.  1  The sequence-to-sequence model

    Fig.  2  The attention mechanism

    Fig.  3  The multi-attention sequence-to-sequence model based on keyword information

    Table  1  Summary evaluation results

    Method             ROUGE-1  ROUGE-2  ROUGE-3  ROUGE-4  ROUGE-L
    LexPageRank        0.23634  0.10884  0.05892  0.03880  0.17578
    MEAD               0.28674  0.14872  0.08761  0.06124  0.22365
    Submodular         0.29704  0.15283  0.08917  0.06254  0.21668
    UniAttention       0.33752  0.20067  0.13215  0.10178  0.29462
    NLP_ONE            0.34983  0.21181  0.14490  0.11266  0.30686
    pointer-generator  0.36022  0.21978  0.14783  0.11458  0.29888
    Our model          0.37667  0.24077  0.16665  0.12914  0.32886
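
    For readers unfamiliar with the ROUGE-N columns in Table 1, the following is a minimal sketch of ROUGE-N recall: the fraction of reference n-grams that also appear in the candidate, with counts clipped. Treating each character as a token is an assumption that is common for Chinese text; the page does not state the tokenization or whether Table 1 reports recall or F-measure, and the function names are illustrative.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n):
    """ROUGE-N recall with clipped counts, using characters as tokens (assumed)."""
    cand = ngrams(list(candidate), n)
    ref = ngrams(list(reference), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0  # reference shorter than n: no n-grams to match
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total

# Toy example: three of the four reference unigrams appear in the candidate.
print(rouge_n_recall("abcd", "abce", 1))  # 0.75
```

    Clipping keeps degenerate repetition in a candidate from inflating the score, since a repeated n-gram is credited only as many times as it occurs in the reference.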

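    Table  2  Examples of generated summaries

    Model: Generated output (original Chinese, followed by an English gloss in brackets)

    Reference summary: 昨天下午, 山西平遥县6名儿童结伴滑冰玩耍时, 不慎溺水身亡, 其中年龄最大的11岁, 最小的为5岁. [Yesterday afternoon, six children in Pingyao County, Shanxi, accidentally drowned while skating together; the oldest was 11 and the youngest was 5.]
    UniAttention: 今日下午, 山西平遥县发生一起溺水事件, 6名儿童玩耍不慎溺水身亡. [This afternoon, a drowning accident occurred in Pingyao County, Shanxi; six children accidentally drowned while playing.]
    NLP_ONE: 今晨, 山西平遥县发生意外溺水身亡, 最小为5岁, 最小为5岁, 最小为5岁. [This morning, an accidental drowning death occurred in Pingyao County, Shanxi; the youngest was 5, the youngest was 5, the youngest was 5.]
    pointer-generator: 快讯:平遥县发生一起意外溺水事件, 已致1死1伤, [UNK]最小的岁, 最小为5岁(图) [News flash: an accidental drowning occurred in Pingyao County, leaving 1 dead and 1 injured, [UNK] the youngest years old, the youngest was 5 (photo)]
    Our model: 组图: 平遥县6名儿童结伴滑冰玩耍不慎落水, 其中年龄最大的11岁, 最小的为5岁, 最小的为5岁. [Photo gallery: six children in Pingyao County accidentally fell into the water while skating together; the oldest was 11, the youngest was 5, the youngest was 5.]

    Reference summary: 石嘴山市发布雷电黄色预警:预计未来6小时, 石嘴山市有雷阵雨, 并伴有雷电和短时强降水.提请相关部门和人员做好防范工作$\cdots$ [Shizuishan City issues a yellow lightning warning: thunderstorms are expected in Shizuishan over the next 6 hours, accompanied by lightning and short-duration heavy rainfall. Relevant departments and personnel are asked to take precautions $\cdots$]
    UniAttention: 石嘴山市发布雷电黄色预警:预计未来6小时, 石嘴山市有雷阵雨, 并伴有雷电和短时强降水, $\cdots$ [Shizuishan City issues a yellow lightning warning: thunderstorms are expected in Shizuishan over the next 6 hours, accompanied by lightning and short-duration heavy rainfall, $\cdots$]
    NLP_ONE: 石嘴山市发布雷电黄色预警:预计未来6小时, 石嘴山市有雷阵雨, 并伴有雷电和短时强降水.提请相关部门和人员做好防范工作 [Shizuishan City issues a yellow lightning warning: thunderstorms are expected in Shizuishan over the next 6 hours, accompanied by lightning and short-duration heavy rainfall. Relevant departments and personnel are asked to take precautions]
    pointer-generator: 石嘴山市发布雷电黄色预警:预计未来6小时石嘴山市有雷阵雨, 并伴有雷电和短时强降水.提请相关部门和人员做好防范工作. $\cdots$ ($\cdots$) [Shizuishan City issues a yellow lightning warning: thunderstorms are expected in Shizuishan over the next 6 hours, accompanied by lightning and short-duration heavy rainfall. Relevant departments and personnel are asked to take precautions. $\cdots$ ($\cdots$)]
    Our model: 石嘴山市发布雷电黄色预警:预计未来6小时, 石嘴山市有雷电和短时强降水.提请相关部门和人员做好防范工作$\cdots$ [Shizuishan City issues a yellow lightning warning: lightning and short-duration heavy rainfall are expected in Shizuishan over the next 6 hours. Relevant departments and personnel are asked to take precautions $\cdots$]

    Reference summary: 今日上午10时, 郑州一个汽车销售点停车场发生火灾, 70余辆新车被烧毁. (图) [At 10 a.m. today, a fire broke out in the parking lot of a car dealership in Zhengzhou, destroying more than 70 new cars. (photo)]
    UniAttention: 郑州: 70余辆汽车结算着火, 70余辆汽车被烧毁, 70余辆汽车被烧毁. [Zhengzhou: more than 70 cars settlement caught fire, more than 70 cars were destroyed, more than 70 cars were destroyed.]
    NLP_ONE: 组图:郑州一汽车购点停车场起火, 包括名媛43辆, 包括名媛43辆, 或因电线破损打火, 火星沉地面干草中所致. [Photo gallery: fire at the parking lot of a Zhengzhou car purchase point, including 43 Mingyuan, including 43 Mingyuan, possibly caused by sparks from a damaged wire falling into dry grass on the ground.]
    pointer-generator: 组图:郑州一汽车销售点停车场失火, 70余辆汽车被烧毁, 70余辆汽车被烧毁, 郑州一汽车销售点停车场失火(图) [Photo gallery: fire at the parking lot of a Zhengzhou car dealership, more than 70 cars destroyed, more than 70 cars destroyed, fire at the parking lot of a Zhengzhou car dealership (photo)]
    Our model: 组图:今日10时许, 郑州一汽车销售时停车场失火, 70余辆汽车被烧毁. [Photo gallery: at about 10 o'clock today, a parking lot caught fire during car sales in Zhengzhou; more than 70 cars were destroyed.]

    Reference summary: 呼和浩特原市委副书记, 市长汤爱军涉嫌受贿犯罪被立案侦查, 案件侦查工作正在进行中. [Tang Aijun, former deputy secretary of the Hohhot Municipal Party Committee and mayor, has been placed under investigation on suspicion of bribe-taking; the investigation is under way.]
    UniAttention: 滁州市原市委书记汤爱军涉嫌受贿, 受贿犯罪被立案侦查, 并采取强制措施. [Tang Aijun, former Party secretary of Chuzhou City, is suspected of taking bribes, placed under investigation for bribe-taking, and subjected to coercive measures.]
    NLP_ONE: 原海东工业园区原市委书记江山涉嫌受贿犯罪被立案侦查, 并采取强制措施. [Jiang Shan, former former municipal Party secretary of Haidong Industrial Park, has been placed under investigation on suspicion of bribe-taking and subjected to coercive measures.]
    pointer-generator: 中纪委网站消息, 呼和浩特原市委副书记, 管委会常务副主任, 管委会市长张丽菊等四名厅级干部受贿犯罪立案侦查, 被证监会涉嫌滥用职权, 受贿犯罪被逮捕. [According to the Central Commission for Discipline Inspection website, four department-level officials including Zhang Liju, former deputy secretary of the Hohhot Municipal Party Committee, executive deputy director of the management committee, management committee mayor, placed under investigation for bribe-taking, by the securities regulator on suspicion of abuse of power, arrested for bribe-taking.]
    Our model: 呼和浩特原市委书记江山涉嫌受贿犯罪, 已被立案侦查 案件侦查正在进行中. [Jiang Shan, former Party secretary of Hohhot, is suspected of bribe-taking and has been placed under investigation; the investigation is under way.]

    Reference summary: 今日6时50分许, 西宁城北区一12岁男孩坠楼身亡; 3小时后, 城中区一46岁男子坠楼身亡. [At about 6:50 today, a 12-year-old boy in Chengbei District, Xining, fell from a building and died; 3 hours later, a 46-year-old man in Chengzhong District fell from a building and died.]
    UniAttention: 西宁: 12岁男孩从21层楼坠下身亡, 警方正在对两人死因展开调查. [Xining: a 12-year-old boy fell to his death from the 21st floor; police are investigating the cause of death of the two.]
    NLP_ONE: 今晨6时50分, 城北区一12岁男孩坠楼身亡(图). [At 6:50 this morning, a 12-year-old boy in Chengbei District fell from a building and died (photo).]
    pointer-generator: 西宁一12岁男孩坠楼身亡, 一名12岁男孩城中区小区14号楼坠楼者死因展开调查; 此前12岁男孩20岁男生是从20层的家中坠落. [A 12-year-old boy in Xining fell from a building and died, a 12-year-old boy Chengzhong District residential compound Building 14 investigation into the faller's cause of death; earlier the 12-year-old boy 20-year-old male student fell from his home on the 20th floor.]
    Our model: 组图: 今晨6时50分许, 城北区民惠城内12岁男孩坠楼身亡, 仅3小时后, 其车速3小时后坠楼身亡. [Photo gallery: at about 6:50 this morning, a 12-year-old boy in Minhuicheng, Chengbei District, fell from a building and died; only 3 hours later, its vehicle speed fell from a building and died 3 hours later.]

    Reference summary: 达州一煤矿发生瓦斯爆炸事故4人被困井下, 1人受伤, 相关部门正在全力救援被困人员. [A gas explosion at a coal mine in Dazhou trapped 4 people underground and injured 1; the relevant departments are making every effort to rescue those trapped.]
    UniAttention: 组图:达州茶园煤矿发生爆炸事故, 造成4人被困井下, 伤者已送救援人员. [Photo gallery: an explosion at Chayuan Coal Mine in Dazhou trapped 4 people underground; the injured have been sent to rescue personnel.]
    NLP_ONE: 今日下午发生瓦斯爆炸事故, 致4人被困井下, 1人被困井下, 无生命危险. [A gas explosion occurred this afternoon, trapping 4 people underground, 1 person trapped underground, not in life-threatening condition.]
    pointer-generator: 成都:境内境内境内茶园煤矿生产系统工程瓦斯爆炸事故, 造成4人被困井下, 1人被困井下, 1人受伤, 1人受伤(图) [Chengdu: within within within the territory Chayuan Coal Mine production system project gas explosion accident, trapping 4 people underground, 1 person trapped underground, 1 injured, 1 injured (photo)]
    Our model: 组图: 达川发生瓦斯爆炸事故, 4人被困井下, 1人受伤, 伤者已送达州医院救治. [Photo gallery: a gas explosion occurred in Dachuan; 4 people trapped underground, 1 injured; the injured person has been sent to a Dazhou hospital for treatment.]

    Note: Bold marks the words in our model's output that exactly match the reference summary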
Publication history
  • Received:  2017-11-07
  • Accepted:  2018-01-08
  • Published:  2019-03-20
