基于注意力机制的概念化句嵌入研究

王亚珅; 黄河燕; 冯冲; 周强

doi:10.16383/j.aas.2018.c170295

基于注意力机制的概念化句嵌入研究

doi: 10.16383/j.aas.2018.c170295 cstr: 32138.14.j.aas.2018.c170295

王亚珅^1,,
黄河燕^1, ,,
冯冲^1,,
周强^2,

1.
北京理工大学计算机学院北京市海量语言信息处理与云计算应用工程技术研究中心北京 100081
2.
百度在线网络技术 (北京) 有限公司北京 100085

基金项目:

国家自然科学基金重点项目 61751201

详细信息

作者简介:
王亚珅  北京理工大学计算机学院博士研究生. 2012年获得北京理工大学计算机学院学士学位.主要研究方向为自然语言处理与社交网络分析. E-mail: yswang@bit.edu.cn

冯冲  北京理工大学计算机学院副研究员. 2005年获得中国科学技术大学博士学位.主要研究方向为信息抽取和情感分析. E-mail: fengchong@bit.edu.cn

周强  百度公司研发工程师. 2016年获得北京理工大学计算机学院硕士学位.主要研究方向为自然语言处理与社交网络分析. E-mail: qzhou@bit.edu.cn

通讯作者:
黄河燕北京理工大学计算机学院教授. 1989年获得中国科学院计算技术研究所博士学位.主要研究方向为自然语言处理和机器翻译.本文通信作者. E-mail: hhy63@bit.edu.cn

计量
- 文章访问数: 1322
- HTML全文浏览量: 199
- PDF下载量: 189
- 被引次数: 0
出版历程
- 收稿日期: 2017-06-02
- 录用日期: 2018-03-24
- 刊出日期: 2020-07-24

Conceptual Sentence Embeddings Based on Attention Mechanism

1.
Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, School of Computer, Beijing Institute of Technology, Beijing 100081
2.
Baidu Inc., Beijing 100085

Funds:

National Natural Science Foundation of China 61751201

More Information

Author Bio:
WANG Ya-Shen   Ph. D. candidate at the School of Computer, Beijing Institute of Technology. He received his bachelor degree from the School of Computer, Beijing Institute of Technology in 2012. His research interest covers natural language processing and social media analysis

FENG Chong   Associate professor at the School of Computer, Beijing Institute of Technology. He received his Ph. D. degree from University of Science and Technology of China in 2005. His research interest covers information extraction and sentiment analysis

ZHOU Qiang   Research engineer at Baidu Inc. He received his master degree from Beijing Institute of Technology in 2016. His research interest covers natural language processing and deep learning

Corresponding author: HUANG He-Yan Professor at the School of Computer, Beijing Institute of Technology. She received her Ph. D. degree from Institute of Computer Technology, China Academy of Sciences in 1989. Her research interest covers natural language processing and machine translation. Corresponding author of this paper

摘要

摘要: 大多数句嵌模型仅利用文本字面信息来完成句子向量化表示, 导致这些模型对普遍存在的一词多义现象缺乏甄别能力.为了增强句子的语义表达能力, 本文使用短文本概念化算法为语料库中的每个句子赋予相关概念, 然后学习概念化句嵌入(Conceptual sentence embedding, CSE).因此, 由于引入了概念信息, 这种语义表示比目前广泛使用的句嵌入模型更具表达能力.此外, 我们通过引入注意力机制进一步扩展概念化句嵌入模型, 使模型能够有区别地选择上下文语境中的相关词语以实现更高效的预测.本文通过文本分类和信息检索等语言理解任务来验证所提出的概念化句嵌入模型的性能, 实验结果证明本文所提出的模型性能优于其他句嵌入模型.
- 句嵌入 /
- 短文本概念化 /
- 注意力机制 /
- 词嵌入 /
- 语义表达
Abstract: Most sentence embedding models typically represent each sentence only using word surface, which makes these models indiscriminative for ubiquitous homonymy and polysemy. In order to enhance representation capability of sentence, we employ short-text conceptualization algorithm to assign associated concepts for each sentence in the text corpus, and then learn conceptual sentence embedding (CSE). Hence, this semantic representation is more expressive than some widely-used text representation models such as latent topic model, especially for short-text. Moreover, we further extend CSE models by utilizing an attention mechanism that select relevant words within the context to make more efficient prediction. In the experiments, we evaluate the CSE models on three tasks, text classification and information retrieval. The experimental results show that the proposed models outperform typical sentence embed-ding models.
- Sentence embedding /
- short-text conceptualization /
- attention mechanism /
- word embedding /
- semantic representation
Recommended by Associate Editor ZHAO Tie-Jun
注释:

1) 本文责任编委赵铁军

HTML全文

图 1 CBOW模型和Skip-Gram模型

Fig. 1 CBOW model and Skip-Gram model

下载: 全尺寸图片幻灯片

图 2 CSE-CBOW模型和CSE-SkipGram模型

Fig. 2 CSE-CBOW model and CSE-SkipGram model

下载: 全尺寸图片幻灯片

图 3 aCSE-TYPE模型

Fig. 3 aCSE-TYPE model

下载: 全尺寸图片幻灯片

表 1 文本分类任务实验结果

Table 1 Evaluation results of text classification task

数据集	NewsTitle			Twitter			TREC
模型	P	R	F	P	R	F	P	R	F
BOW	0.731	0.719	0.725	0.397	0.415	0.406	0.822	0.820	0.821
LDA	0.720	0.706	0.713	0.340	0.312	0.325	0.815	0.811	0.813
PV-DBOW	0.726	0.721	0.723	0.409	0.410	0.409	0.825	0.817	0.821
PV-DM	0.745	0.738	0.741	0.424	0.423	0.423	0.837	0.824	0.830
TWE	0.810	0.805	0.807	0.454	0.438	0.446	0.894	0.885	0.885
SCBOW	0.812$^\beta$	0.805$^\beta$	0.809$^\beta$	0.455$^\beta$	0.439	0.449$^\beta$	0.897$^\beta$	0.887$^\beta$	0.892$^\beta$
CSE-CBOW	0.814	0.811	0.812	0.458	0.450	0.454	0.895	0.891	0.893
CSE-SkipGram	0.827	0.819	0.823	0.477	0.447	0.462	0.899	0.894	0.896
aCSE-SUR	0.828	0.822	0.825	0.469	0.453	0.462	0.906	0.897	0.901
aCSE-TYPE	0.838	0.830	0.834	0.483	0.455	0.468	0.911	0.903	0.907
aCSE-ALL	0.845$^{\alpha\beta}$	0.832$^{\alpha\beta}$	0.838$^{\alpha\beta}$	0.485$^{\alpha\beta}$	0.462$^{\alpha\beta}$	0.473$^{\alpha\beta}$	0.917$^{\alpha\beta}$	0.914$^{\alpha\beta}$	0.915$^{\alpha\beta}$

下载: 导出CSV

表 2 信息检索任务实验结果

Table 2 Evaluation results of information retrieval

查询项集合	TMB2011		TMB2012
模型	MAP	P@30	MAP	P@30
BOW	0.304	0.412	0.321	0.494
LDA	0.281	0.409	0.311	0.486
PV-DBOW	0.285	0.412	0.324	0.491
PV-DM	0.327	0.431	0.340	0.524
TWE	0.331	0.446	0.347	0.509
SCBOW	0.333	0.448$^{\beta}$	0.349$^{\beta}$	0.511
CSE-CBOW	0.337	0.451	0.344	0.512
CSE-SkipGram	0.367	0.461	0.360	0.517
aCSE-SUR	0.342	0.458	0.349	0.520
aCSE-TYPE	0.373	0.466	0.365	0.525
aCSE-ALL	0.376$^{\alpha\beta}$	0.471$^{\alpha\beta}$	0.369$^{\alpha\beta}$	0.530$^{\alpha\beta}$

下载: 导出CSV

参考文献(39)

[1]	Harris Z S. Distributional Structure. Word, 1954, 10: 146-162 doi: 10.1080/00437956.1954.11659520
[2]	Palangi H, Deng L, Shen Y, Gao J, He X, Chen J, Song X, Ward R. Deep Sentence Embedding Using Long Short-Term Memory Networks: Analysis and Application to Information Retrieval. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, 24: 694-707 doi: 10.1109/TASLP.2016.2520371
[3]	Le Q, Mikolov T. Distributed Representations of Sentences and Documents. In: Proceedings of the 31st International Conference on Machine Learning. New York, NY, USA: ACM, 2014. 1188-1196
[4]	Liu Y, Liu Z, Chua T S, Sun M. Topical Word Embeddings. In: Proceedings of the 29th AAAI Conference on Artificial Intelligence. Palo Alto, CA, USA: AAAI, 2015. 2418-2424
[5]	Mikolov T, Corrado G, Chen K, Dean J. Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv: 1301.3781, 2015. http://arxiv.org/abs/1301.3781
[6]	Wang Z, Zhao K, Wang H, Dean J. Query Understanding through Knowledge-Based Conceptualization. In: Proceedings of the 24th International Joint Conference on Artificial Intelligence. San Francisco, CA, USA: Morgan Kaufmann, 2015. 3264-3270
[7]	Bahdanau D, Cho K, Bengio Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv preprint arXiv: 1409.0473, 2015. http://arxiv.org/abs/1409.0473
[8]	Lin Y, Shen S, Liu Z, Luan H, Sun M. Neural Relation Extraction with Selective Attention over Instances. In: Proceedings of the 54th Meeting of the Association for Computational Linguistics. Stroudsburg, PA, USA: Association for Computational Linguistics, 2016. 2124-2133
[9]	Wang Y, Huang H, Feng C, Zhou Q, Gu J, Gao X. CSE: Conceptual Sentence Embeddings based on Attention Model. In: Proceedings of the 54th Meeting of the Association for Computational Linguistics. Stroudsburg, PA, USA: Association for Computational Linguistics, 2016. 505-515
[10]	Rayner K. Eye movements in Reading and Information Processing: 20 Years of Research. Psychological Bulletin, 1998, 124: 372-422 doi: 10.1037/0033-2909.124.3.372
[11]	Nilsson M, Nivre J. Learning where to look: Modeling eye movements in reading. In: Proceedings of the 13th Conference on Computational Natural Language Learning. Stroudsburg, PA, USA: Association for Computational Linguistics, 2009. 93-101
[12]	Lai S, Xu L, Liu K, Zhao J. Recurrent Convolutional Neural Networks for Text Classification. In: Proceedings of the 29th AAAI Conference on Artificial Intelligence. Palo Alto, CA, USA: AAAI, 2015. 2267-2273
[13]	Kenter T, Borisov A, De Rijke M. Siamese CBOW: Optimizing Word Embeddings for Sentence Representations. In: Proceedings of the 54th Meeting of the Association for Computational Linguistics. Stroudsburg, PA, USA: Association for Computational Linguistics, 2016. 941-951
[14]	Xiong C, Merity S, Socher R. Dynamic Memory Networks for Visual and Textual Question Answering. In: Proceedings of the 33rd International Conference on Machine Learning. New York, NY. USA: ACM, 2016. 1230-1239
[15]	Yu J. Learning Sentence Embeddings with Auxiliary Tasks for Cross-Domain Sentiment Classification. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, USA: Association for Computational Linguistics, 2016. 236-246
[16]	Yong Z, Meng J E, Ning W, Pratama M. Attention Pooling-based Convolutional Neural Network for Sentence Modelling. Information Sciences, 2016, 373: 388-403 doi: 10.1016/j.ins.2016.08.084
[17]	Wang S, Zhang J, Zong C. Learning sentence representation with guidance of human attention. In: Proceedings of IJCAI International Joint Conference on Artificial Intelligence. San Francisco, CA, USA: Morgan Kaufmann, 2017. 4137-4143
[18]	Lang Z, Gu X, Zhou Q, Xu T. Combining statistics-based and CNN-based information for sentence classification. In: Proceedings of the 28th IEEE International Conference on Tools with Artificial Intelligence. New York, NY. USA: IEEE, 2017. 1012-1018
[19]	Wieting J, Gimpel K. Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA, USA: Association for Computational Linguistics, 2017. 2078-2088
[20]	Nicosia M, Moschitti A. Learning Contextual Embeddings for Structural Semantic Similarity using Categorical Information. In: Proceedings of the 21st Conference on Computational Natural Language Learning. Stroudsburg, PA, USA: Association for Computational Linguistics, 2017. 260-270
[21]	Peng W, Wang J, Zhao B, Wang L. Identification of Protein Complexes Using Weighted PageRank-Nibble Algorithm and Core-Attachment Structure. IEEE/ACM Transactions on Computational Biology & Bioinformatics, 2015, 12: 179-192 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=abeae9028ba365cfcf0b6aadc978a058
[22]	Hua W, Wang Z, Wang H, Zheng K, Zhou X. Short text understanding through lexical-semantic analysis. In: Proceedings of the 31st IEEE International Conference on Data Engineering. New York, NY. USA: IEEE, 2015. 495-506
[23]	Song Y, Wang H. Open Domain Short Text Conceptualization: A Generative + Descriptive Modeling Approach. In: Proceedings of the 24th International Joint Conference on Artificial Intelligence. San Francisco, CA, USA: Morgan Kaufmann, 2015. 3820-3826
[24]	Wang F, Wang Z, Li Z, Wen J R. Concept-based Short Text Classification and Ranking. In: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. New York, NY. USA: ACM, 2014. 1069-1078
[25]	Wang Y, Huang H, Feng C. Query Expansion Based on a Feedback Concept Model for Microblog Retrieval. In: Proceedings of the 26th International Conference on World Wide Web. New York, NY. USA: ACM, 2017. 559-568
[26]	Wu W, Li H, Wang H, Zhu K Q. Probase: A probabilistic taxonomy for text understanding. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. New York, NY. USA: ACM, 2012. 481-492
[27]	Yu Z, Wang H, Lin X, Wang M. Understanding Short Texts through Semantic Enrichment and Hashing. IEEE Transactions on Knowledge and Data Engineering, 2016, 28: 566-579 doi: 10.1109/TKDE.2015.2485224
[28]	Itti L, Koch C, Niebur E. A Model of Saliency Based Visual Attention for Rapid Scene Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20: 1254-1259 doi: 10.1109/34.730558
[29]	Hahn M, Keller F. Modeling Human Reading with Neural Attention. Psychological Bulletin, 2016, 85: 618-627 http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=Arxiv000001371070
[30]	Narayanan S, Jurafsky D. A Bayesian Model Predicts Human Parse Preference and Reading Times in Sentence Processing. Advances in Neural Information Processing Systems, 2001, 14: 59-65 http://cn.bing.com/academic/profile?id=295791b42c4db2c7f0cea418567b841e&encoded=0&v=paper_preview&mkt=zh-cn
[31]	Demberg V, Keller F. Data from eye-tracking corpora as evidence for theories of syntactic processing complexity. Cognition, 2016, 109: 193-210 http://cn.bing.com/academic/profile?id=8f52c0a66bab4fbea159c968896ec065&encoded=0&v=paper_preview&mkt=zh-cn
[32]	Barrett M, Sogaard A. Reading behavior predicts syntactic categories. In: Proceedings of the 2015 SIGNLL Conference on Computational Natural Language Learning. Stroudsburg, PA, USA: Association for Computational Linguistics, 2015. 345-349
[33]	Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed Representations of Words and Phrases and their Compositionality. In: Proceedings of the 27th Annual Conference on Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2013. 1-9
[34]	Fernandez-Carbajales V, García M A, MartU.S.A.nez J M. Visual attention based on a joint perceptual space of color and brightness for improved video tracking. Pattern Recognit, 2016, 60: 571-584
[35]	Attneave F. Applications of information theory to psychology: a summary of basic concepts, methods, and results. American Journal of Psychology, 1961, 74(2): 319-324 doi: 10.2307/1419430
[36]	Hale J. A probabilistic earley parser as a psycholinguistic model. In: Proceedings of the 2nd meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies. Stroudsburg, PA, USA: Association for Computational Linguistics, 2001. 1-8
[37]	Ounis I, Macdonald C, Lin J. Overview of the trec-2011 microblog track. In: Proceedings of the 2011 Text REtrieval Conference. Gaithersburg, MD, USA: NIST, 2001. 1-9
[38]	Blei D M, Ng A Y, Jordan M I. Latent Dirichlet Allocation. Journal of Machine Learning Research, 2003, 3: 993-1022 http://d.old.wanfangdata.com.cn/Periodical/jsjyy201306024
[39]	Fan R E, Chang K W, Hsieh C J, Wang X R, Lin C J. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research, 2008, 9: 1871-1874 http://d.old.wanfangdata.com.cn/Periodical/dzjs201506001