A Model for Calculating Semantic Relatedness of Words Considering Semantic Relationship Graph

ZHANG Yang-Sen, ZHENG Jia, LI Jia-Yuan

Citation: ZHANG Yang-Sen, ZHENG Jia, LI Jia-Yuan. A Model for Calculating Semantic Relatedness of Words Considering Semantic Relationship Graph. ACTA AUTOMATICA SINICA, 2018, 44(1): 87-98. doi: 10.16383/j.aas.2018.c170002


doi: 10.16383/j.aas.2018.c170002
Funds: 

National Natural Science Foundation of China 61370139

National Natural Science Foundation of China 61602044

More Information
    Author Bio:

    ZHENG Jia  Master student at Beijing Information Science and Technology University. His main research interest is natural language processing. E-mail: zhengjia0826@163.com

    LI Jia-Yuan  Master student at Beijing Information Science and Technology University. Her main research interest is natural language processing. E-mail: ljyuan0616@126.com

    Corresponding author: ZHANG Yang-Sen  Professor at Beijing Information Science and Technology University. His research interests cover natural language processing and artificial intelligence. Corresponding author of this paper. E-mail: zhangyangsen@163.com
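  • Abstract: Semantic computation over words is one of the important problems in natural language processing. Current research concentrates on word semantic similarity, while methods for computing word semantic relatedness have received too little attention. To address this, this paper proposes a word semantic relatedness calculation model that combines a semantic dictionary with a corpus. First, based on HowNet and a large-scale corpus, rules for extracting semantic relations are formulated and a large number of semantic dependency relations are extracted. Then, a semantic relationship graph is constructed, with semantic relation triples as the storage format. Finally, the semantic relations in the graph are processed using graph theory, and a word semantic relatedness calculation model based on the semantic relationship graph is designed. Experimental results show that the proposed model performs well at computing word semantic relatedness: it reaches a Spearman rank correlation coefficient of 0.5358 on the WordSimilarity-353 dataset, a significant improvement in the computation of semantic relatedness for Chinese words.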
    1)  Responsible editorial board member for this paper: ZHANG Min
  • Fig.  1  The concept tree representation for "ring" (拳台)

    Fig.  2  Semantic relatedness in the semantic relationship graph

    Fig.  3  The relationship between the number of semantic connected paths and semantic relatedness

    Fig.  4  The relationship between the length of semantic connected paths and semantic relatedness

    Fig.  5  The density matrix distribution of mutual information and co-occurrence frequency

    Fig.  6  The coverage trend of mutual information and co-occurrence frequency over semantic collocation pairs

    Fig.  7  The relationship between the Spearman coefficient and the semantic connected path length $\alpha$

    Fig.  8  The relationship between the Spearman coefficient and the similarity threshold $\lambda$
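Figures 5 and 6 relate mutual information to co-occurrence frequency for collocation pairs. The exact statistic and thresholds the authors used are not given on this page, so the following is only a minimal sketch that assumes the standard pointwise mutual information, $\log_2 \frac{p(x, y)}{p(x)\,p(y)}$, estimated from raw pair counts. The function name `pmi_table` and the toy pairs are hypothetical and are not taken from the paper's corpus.

```python
import math
from collections import Counter


def pmi_table(pairs):
    """Map each observed (x, y) pair to (co-occurrence count, PMI in bits)."""
    pair_counts = Counter(pairs)
    left_counts = Counter(x for x, _ in pairs)
    right_counts = Counter(y for _, y in pairs)
    total = len(pairs)
    table = {}
    for (x, y), n_xy in pair_counts.items():
        p_xy = n_xy / total
        p_x = left_counts[x] / total
        p_y = right_counts[y] / total
        table[(x, y)] = (n_xy, math.log2(p_xy / (p_x * p_y)))
    return table


# Hypothetical toy collocation pairs, for illustration only.
toy_pairs = [
    ("skate", "speed"),
    ("skate", "speed"),
    ("skate", "fall"),
    ("match", "score"),
]
scores = pmi_table(toy_pairs)
for pair, (count, pmi) in sorted(scores.items()):
    print(pair, count, round(pmi, 3))
```

Note that a pair seen only once can still receive a high PMI, which is one reason a filter might look at both statistics rather than either one alone.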

    Table  1  The storage format of semantic relations

    | Relation start term | Relation end term | Semantic relation word |
    | --- | --- | --- |
    | ring (拳台) | facility (设施) | DEF |
    | $\cdots$ | $\cdots$ | $\cdots$ |
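Table 1 stores each relation as a (start term, end term, relation word) triple, and Figures 3 and 4 indicate that relatedness depends on how many semantic connected paths link two words and how long those paths are. The paper's actual scoring formula is not reproduced on this page, so the sketch below is an assumption throughout: it treats the graph as undirected, enumerates simple paths up to a length bound (`max_len`, playing the role of a path length limit), and combines them with a hypothetical decay score $1 - \prod_p (1 - \beta^{|p|})$. Only the first triple comes from Table 1 (with English glosses); the others are invented for illustration.

```python
from collections import defaultdict


def build_graph(triples):
    """Build an undirected adjacency map from (start, end, relation) triples."""
    graph = defaultdict(set)
    for start, end, _relation in triples:
        graph[start].add(end)
        graph[end].add(start)
    return graph


def path_lengths(graph, source, target, max_len):
    """Lengths (in edges) of all simple paths source -> target, up to max_len."""
    lengths = []
    stack = [(source, {source}, 0)]
    while stack:
        node, visited, depth = stack.pop()
        if node == target and depth > 0:
            lengths.append(depth)
            continue
        if depth == max_len:
            continue
        for neighbour in graph.get(node, ()):
            if neighbour not in visited:
                stack.append((neighbour, visited | {neighbour}, depth + 1))
    return lengths


def relatedness(graph, w1, w2, max_len=3, decay=0.5):
    """Hypothetical score: rises with more paths, falls with longer paths."""
    if w1 == w2:
        return 1.0
    miss = 1.0
    for length in path_lengths(graph, w1, w2, max_len):
        miss *= 1.0 - decay ** length
    return 1.0 - miss


triples = [
    ("ring", "facility", "DEF"),   # from Table 1 (拳台, 设施, DEF)
    ("ring", "boxing", "REL"),     # hypothetical
    ("boxing", "sport", "REL"),    # hypothetical
    ("facility", "sport", "REL"),  # hypothetical
]
graph = build_graph(triples)
print(relatedness(graph, "ring", "facility"))
print(relatedness(graph, "ring", "sport"))
print(relatedness(graph, "ring", "court"))
```

With these toy triples, "ring" and "facility" are joined by one direct edge and one three-edge path, while "ring" and "sport" are joined only by two two-edge paths, so the first pair scores higher. An unconnected word such as "court" scores zero.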

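    Table  2  Comparison of Spearman coefficients for different methods

    | Category | Model | Spearman coefficient |
    | --- | --- | --- |
    | Knowledge-based | LIU [23] | 0.4202 |
    | Knowledge-based | WU [23] | 0.3205 |
    | Corpus-based | TFIDF [17] | 0.4030 |
    | Corpus-based | COMB [17] | 0.5150 |
    | Corpus-based | ICLinkBased [23] | 0.2786 |
    | Corpus-based | ICSubCategoryNodes [23] | 0.2803 |
    | Corpus-based | WLM [23] | 0.4984 |
    | Corpus-based | WLT [23] | 0.5126 |
    | Our methods | HN | 0.4389 |
    | Our methods | DSR | 0.5012 |
    | Our methods | HN+DSR | 0.5358 |
    | Knowledge-based | WUP [24] | 0.3390 |
    | Knowledge-based | J & C [24] | 0.3180 |
    | Knowledge-based | Lin [24] | 0.3480 |
    | Knowledge-based | Resnik [24] | 0.3530 |
    | Corpus-based | LSA [24] | 0.5810 |
    | Corpus-based | ESA [24] | 0.6290 |
    | Corpus-based | SSA [24] | 0.5370 |
    | Knowledge + Corpus-based | WTMGW [24] | 0.7500 |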
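The coefficients in Table 2 measure how well each model's scores preserve the ordering of human ratings. As a minimal sketch of that evaluation, assuming the usual definition of the Spearman coefficient as the Pearson correlation of rank vectors (with tied values given their average rank), the code below uses only the standard library. The function names and the five ratings are hypothetical and are not drawn from WordSimilarity-353.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical human ratings and model scores for five word pairs.
human = [9.0, 7.5, 6.0, 2.0, 1.0]
model = [0.90, 0.60, 0.79, 0.20, 0.24]
print(round(spearman(human, model), 4))
```

Because only ranks matter, the human ratings and the model scores can be on different scales without any rescaling.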

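    Table  3  Experimental results of semantic relatedness computation

    | Word 1 | Word 2 | Relatedness |
    | --- | --- | --- |
    | football match (足球比赛) | score (比分) | 0.9004 |
    | football match (足球比赛) | live broadcast (直播) | 0.8438 |
    | football match (足球比赛) | venue (场地) | 0.6034 |
    | football match (足球比赛) | rules (规则) | 0.7925 |
    | football match (足球比赛) | court of law (法庭) | 0.2016 |
    | ice skating (滑冰) | football match (足球比赛) | 0.2415 |
    | ice skating (滑冰) | smooth (流畅) | 0.7924 |
    | ice skating (滑冰) | speed (速度) | 0.8415 |
    | ice skating (滑冰) | fall down (摔倒) | 0.7524 |
    | ice skating (滑冰) | court of law (法庭) | 0.2965 |
    | football match (足球比赛) | smooth (流畅) | 0.0251 |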
  • [1] Gracia J, Mena E. Web-based measure of semantic relatedness. In:Proceedings of the 9th International Conference on Web Information Systems Engineering. Auckland, New Zealand:Springer, 2008. 136-150
    [2] Resnik P. Using information content to evaluate semantic similarity in a taxonomy. In:Proceedings of the 14th International Joint Conference on Artificial Intelligence. Montreal, Quebec, Canada:Morgan Kaufmann Publishers Inc., 1995. 448-453
    [3] Liu H W, Xu J J, Zheng K, Liu C F, Du L, Wu X. Semantic-aware query processing for activity trajectories. In:Proceedings of the 10th ACM International Conference on Web Search and Data Mining. Cambridge, UK:ACM, 2017. 283-292
    [4] Ensan F, Bagheri E. Document retrieval model through semantic linking. In:Proceedings of the 10th ACM International Conference on Web Search and Data Mining. Cambridge, UK:ACM, 2017. 181-190
    [5] Liu Kang, Zhang Yuan-Zhe, Ji Guo-Liang, Lai Si-Wei, Zhao Jun. Representation learning for question answering over knowledge base:an overview. Acta Automatica Sinica, 2016, 42(6):807-818 (in Chinese) http://www.aas.net.cn/CN/Y2016/V42/I6/807
    [6] Zhang Y M, Iwaihara M. Evaluating semantic relatedness through categorical and contextual information for entity disambiguation. In:Proceedings of the IEEE/ACIS 15th International Conference on Computer and Information Science. Okayama, Japan:IEEE, 2016. 1-6
    [7] Li C, Bendersky M, Garg V, Ravi S. Related event discovery. In:Proceedings of the 10th ACM International Conference on Web Search and Data Mining. Cambridge, UK:ACM, 2017. 355-364
    [8] Arab M, Jahromi M Z, Fakhrahmad S M. A graph-based approach to word sense disambiguation. An unsupervised method based on semantic relatedness. In:Proceedings of the 24th Iranian Conference on Electrical Engineering. Shiraz, Iran:IEEE, 2016. 250-255
    [9] Xin Yu, Xie Zhi-Qiang, Yang Jing. Semantic community detection research based on topic probability models. Acta Automatica Sinica, 2015, 41(10):1693-1710 (in Chinese) http://www.aas.net.cn/CN/Y2015/V41/I10/1693
    [10] Budanitsky A, Hirst G. Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 2006, 32(1):13-47 doi: 10.1162/coli.2006.32.1.13
    [11] Taieb M A, Aouicha M B, Hamadou A B. A new semantic relatedness measurement using WordNet features. Knowledge and Information Systems, 2014, 41(2):467-497 doi: 10.1007/s10115-013-0672-4
    [12] Liu Qun, Li Su-Jian. Word similarity computing based on HowNet. Computational Linguistics and Chinese Language Processing, 2002, 7(2):59-76 (in Chinese) http://mall.cnki.net/magazine/Article/JSJY201308048.htm
    [13] Zhang P Y. A HowNet-based semantic relatedness kernel for text classification. TELKOMNIKA, 2013, 11(4):1909-1915 http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.301.3337
    [14] Zhang G P, Yu C, Cai D F, Song Y, Sun J G. Research on concept-sememe tree and semantic relevance computation. In:Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation. Wuhan, China:Tsinghua University Press, 2006. 398-402
    [15] Tian Xuan, Du Xiao-Yong, Li Hai-Hua. Computing term-concept association in semantic-based query expansion. Journal of Software, 2008, 19(8):2043-2053 (in Chinese) https://www.wenkuxiazai.com/doc/3ed9fe8ecc22bcd126ff0c31-2.html
    [16] Ye F Y, Zhang F, Luo X F, Xu L Y. Research on measuring semantic correlation based on the Wikipedia hyperlink network. In:Proceedings of the IEEE/ACIS 12th International Conference on Computer and Information Science. Niigata, Japan:IEEE, 2013. 309-314
    [17] Wan Fu-Qiang, Wu Yun-Fang. Computing lexical semantic relatedness with Chinese Wikipedia. Journal of Chinese Information Processing, 2013, 27(6):31-38 (in Chinese) http://www.docin.com/p-1630396880.html
    [18] Wang Hong-Xian, Zhou Qiang, Wu Xiao-Jun. The automatic construction of lexical semantic relationship graph based on HowNet. Journal of Chinese Information Processing, 2008, 22(5):90-96 (in Chinese) http://d.old.wanfangdata.com.cn/Periodical/zwxxxb200805014
    [19] Zheng Li-Juan, Shao Yan-Qiu, Yang Er-Hong. Analysis of the non-projective phenomenon in Chinese semantic dependency graph. Journal of Chinese Information Processing, 2014, 28(6):41-47 (in Chinese) http://d.old.wanfangdata.com.cn/Periodical/zwxxxb201406006
    [20] Zhang Yang-Sen, Zheng Jia. Study of semantic error detecting method for Chinese text. Chinese Journal of Computers, 2016, 39, Online Publishing No.122 (in Chinese)
    [21] Zhang Hu-Yin, Liu Dao-Bo, Wen Chun-Yan. Research on improved algorithm of word semantic similarity based on HowNet. Computer Engineering, 2015, 41(2):151-156 (in Chinese) doi: 10.3969/j.issn.1000-3428.2015.02.029
    [22] Finkelstein L, Gabrilovich E, Matias Y, Rivlin E, Solan Z, Wolfman G, Ruppin E. Placing search in context:the concept revisited. ACM Transactions on Information Systems, 2002, 20(1):116-131 doi: 10.1145/503104.503110
    [23] Wang Xiang, Jia Yan, Zhou Bin, Ding Zhao-Yun, Liang Zheng. Computing semantic relatedness using Chinese Wikipedia links and taxonomy. Journal of Chinese Computer Systems, 2011, 32(11):2237-2242 (in Chinese) http://www.doc88.com/p-9965404579619.html
    [24] Liu B Q, Feng J, Liu M, Liu F, Wang X L, Li P. Computing semantic relatedness using a word-text mutual guidance model. In:Proceedings of the 3rd CCF Conference on Natural Language Processing and Chinese Computing. Shenzhen, China:Springer, 2014. 67-78
Publication history
  • Received:  2017-01-03
  • Accepted:  2017-02-15
  • Published:  2018-01-20