Query-doc Relation Mining Based on User Search Behavior
-
Abstract: The relationship between queries and docs is a valuable kind of information for search engines. Accurate analysis of query-doc relations not only helps rank search results but also bridges queries and docs, allowing information to pass between related queries and docs. This supports deeper query understanding and doc understanding, and the applications built on them. This paper presents a query-doc relation mining algorithm based on user search behavior. We first organize and analyze the data in users' search click logs to build a bipartite graph between queries and docs. We then model the bipartite graph with a Markov random walk model and mine its click-through data and session data. This lets us identify docs that users did not click in the click logs and thereby predict implicit relationships between queries and docs. The same algorithm can also be used to obtain potential relationships between queries. On this theoretical basis, we implement a complete log mining system. In extensive comparative experiments, the system performs well in many respects, with an improvement in search result relevance of up to 71.23%, which indicates that the proposed theory and algorithms can effectively solve the problem of mining implicit relationships between queries and docs. Our approach provides a good basis for improving the recall of search results and for implementing query recommendation and search result clustering.
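The abstract outlines the pipeline (click log, query-doc bipartite graph, Markov random walk) without giving the transition model, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes that transition probabilities are proportional to click counts and it ignores session data. The toy log, the function names, and the `extra_rounds` parameter are all hypothetical.

```python
from collections import defaultdict


def build_bipartite(click_log):
    """Aggregate (query, doc, clicks) records into weighted edges in both directions."""
    q2d = defaultdict(lambda: defaultdict(float))
    d2q = defaultdict(lambda: defaultdict(float))
    for query, doc, clicks in click_log:
        q2d[query][doc] += clicks
        d2q[doc][query] += clicks
    return q2d, d2q


def normalize(weights):
    """Turn raw edge weights into transition probabilities (assumed click-proportional)."""
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


def step(dist, edges):
    """One random-walk step: move probability mass along the normalized edges."""
    out = defaultdict(float)
    for node, mass in dist.items():
        for neighbor, prob in normalize(edges[node]).items():
            out[neighbor] += mass * prob
    return dict(out)


def doc_scores(query, q2d, d2q, extra_rounds=1):
    """Doc distribution after one query->doc hop plus `extra_rounds` doc->query->doc hops."""
    dist = step({query: 1.0}, q2d)
    for _ in range(extra_rounds):
        dist = step(step(dist, d2q), q2d)
    return dist


# Hypothetical toy click log: (query, doc, click count).
click_log = [
    ("cheap flights", "a.com", 3),
    ("cheap flights", "b.com", 1),
    ("low cost airfare", "b.com", 2),
    ("low cost airfare", "c.com", 2),
]

q2d, d2q = build_bipartite(click_log)
scores = doc_scores("cheap flights", q2d, d2q)

# Docs that receive probability mass but were never clicked for this query
# are candidate implicit query-doc relations.
implicit = {doc: p for doc, p in scores.items() if doc not in q2d["cheap flights"]}
print(implicit)
```

In this toy log, "c.com" was never clicked for "cheap flights", yet the walk reaches it through the shared doc "b.com" and the related query "low cost airfare". Stopping the walk on the query side instead of the doc side would give query-query scores in the same way.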