Query-doc Relation Mining Based on User Search Behavior

ZHU Liang; LU Jing-Ya; ZUO Wan-Li

doi: 10.3724/SP.J.1004.2014.01654
cstr: 32138.14.SP.J.1004.2014.01654

ZHU Liang^1,2, LU Jing-Ya^1,2, ZUO Wan-Li^1,2

1. College of Computer Science and Technology, Jilin University, Changchun 130012;
2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012

Funds:

Supported by National Natural Science Foundation of China (60973040, 61300148), China Postdoctoral Science Foundation (2012M510879), and Key Scientific and Technological Break-through Program of Jilin Province (20130206051GX)

Author biography:
ZHU Liang Master student at the College of Computer Science and Technology, Jilin University. Received a bachelor of science degree from the College of Computer Science and Technology, Jilin University in 2011. Main research interests include web search engines, information retrieval, and learning-to-rank theory. E-mail: zhuliang11@mails.jlu.edu.cn

Corresponding author:
ZUO Wan-Li Professor at the College of Computer Science and Technology, Jilin University. Received a Ph.D. degree in engineering, in computer software and theory, from Jilin University in 2005. Main research interests include databases, data mining, machine learning, information retrieval, and search engines. E-mail: wanli@jlu.edu.cn

Metrics
- Article views: 2135
- HTML full-text views: 69
- PDF downloads: 1977
- Citations: 0
Publication history
- Received: 2013-06-26
- Revised: 2014-02-12
- Published: 2014-08-20
Abstract

Abstract: The relationship between queries and docs is a valuable type of information that search engines seek to obtain. Accurate query-doc correlation analysis not only helps rank search results but also serves as a bridge between queries and docs, allowing information to pass between related queries and docs. This supports a deeper understanding of both queries and docs, and the applications built on that understanding. This paper presents a query-doc relation mining algorithm based on user search behavior. We first organize and analyze the data in users' search click logs to build a bipartite graph between queries and docs. We then model the bipartite graph with a Markov random walk model and mine the click-through data and session data it contains. Finally, we recover docs that users did not click in the click logs, and thereby predict the implicit relationships between queries and docs. The same algorithm can also yield the potential relationships between queries. On this theoretical basis, we implemented a complete log mining system. In a large number of comparative experiments, the system performed very well in many respects, improving the relevance of retrieval results by up to 71.23%. These results indicate that the proposed theory and algorithm can effectively solve the problem of mining implicit relationships between queries and docs, and that they provide a good basis for improving the recall of search results and for implementing query recommendation and clustering of retrieved results.

Keywords: association relation; search behavior; Markov random walk model; query recommendation; clustering of retrieved results
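The pipeline outlined in the abstract (click log, query-doc bipartite graph, Markov random walk) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the transition probabilities or say how session data is used, so the sketch assumes click-count-proportional transitions, ignores session data, and uses a made-up toy log. The function names `build_bipartite`, `step`, and `walk_from_query` are hypothetical.

```python
from collections import defaultdict


def build_bipartite(click_log):
    """Build query->doc and doc->query click-count maps from (query, doc, clicks) rows."""
    q2d = defaultdict(dict)
    d2q = defaultdict(dict)
    for query, doc, clicks in click_log:
        q2d[query][doc] = q2d[query].get(doc, 0) + clicks
        d2q[doc][query] = d2q[doc].get(query, 0) + clicks
    return q2d, d2q


def normalize(weights):
    """Turn raw click counts into transition probabilities (assumed: proportional to clicks)."""
    total = sum(weights.values())
    return {node: count / total for node, count in weights.items()}


def step(dist, edges):
    """Move a probability distribution one hop across the bipartite graph."""
    out = defaultdict(float)
    for node, prob in dist.items():
        for nxt, trans in normalize(edges[node]).items():
            out[nxt] += prob * trans
    return dict(out)


def walk_from_query(query, q2d, d2q, rounds=2):
    """Walk query -> docs, then (rounds - 1) times docs -> queries -> docs."""
    query_dist = {query: 1.0}
    doc_dist = step(query_dist, q2d)
    for _ in range(rounds - 1):
        query_dist = step(doc_dist, d2q)
        doc_dist = step(query_dist, q2d)
    return doc_dist, query_dist


# Toy click log (invented for illustration): (query, doc, click count).
log = [
    ("jaguar car", "d1", 3),
    ("jaguar car", "d2", 1),
    ("jaguar price", "d2", 2),
    ("jaguar price", "d3", 2),
]
q2d, d2q = build_bipartite(log)
doc_dist, query_dist = walk_from_query("jaguar car", q2d, d2q, rounds=2)

# "d3" was never clicked for "jaguar car", yet the walk reaches it through
# the shared doc "d2" and the related query "jaguar price".
for doc, prob in sorted(doc_dist.items(), key=lambda item: -item[1]):
    print(doc, round(prob, 4))
```

In this toy setting, the doc distribution plays the role of the predicted query-doc relation, and the intermediate query distribution plays the role of the query-query relation. A real system would need thresholds, a choice of walk length, and some way of folding in session data, none of which are specified in the abstract.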

