A Combination Algorithm of Bi-directional Labeling in Identifying of Maximal-length Noun Phrases with Hybrid Feature
Abstract: Maximal-length noun phrase identification is important to machine translation and many other natural language processing tasks. This paper studies the identification of Chinese maximal-length noun phrases. Building on an analysis of existing methods, and starting from the linguistic particularities of Chinese and the characteristics of sequence labeling algorithms based on support vector machines (SVMs), we examine the adaptability of a combination algorithm based on hybrid features. Theoretical analysis and experiments show that labeling with hybrid units of words and base chunks is effective for identifying Chinese maximal-length noun phrases, and that the forward and backward labeling results are complementary to some degree. On this basis, we propose a combination algorithm for bi-directional sequence labeling based on the "boundary fork", which exploits the complementarity of the two labeling directions and achieves high combination accuracy.
Key words:
- Maximal-length noun phrase
- bi-directional labeling
- base chunk
- hybrid feature