Automatic Text Corpus Generation Algorithm towards Oral Statistical Language Modeling

SI Yu-Jing, XIAO Ye-Ming, XU Ji, PAN Jie-Lin, YAN Yong-Hong

Citation: SI Yu-Jing, XIAO Ye-Ming, XU Ji, PAN Jie-Lin, YAN Yong-Hong. Automatic Text Corpus Generation Algorithm towards Oral Statistical Language Modeling. ACTA AUTOMATICA SINICA, 2014, 40(12): 2808-2814. doi: 10.3724/SP.J.1004.2014.02808

doi: 10.3724/SP.J.1004.2014.02808
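Funds:

Supported by the National High Technology Research and Development Program of China (863 Program) (2012AA012503), the National Natural Science Foundation of China (10925419, 90920302, 61072124, 11074275, 11161140319, 91120001, 61271426), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100, XDA06030500), and the Chinese Academy of Sciences Priority Deployment Project (KGZD-EW-103-2)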

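Details
    Author biography:

    XIAO Ye-Ming  Ph.D. candidate at the Institute of Acoustics, Chinese Academy of Sciences. Received a bachelor's degree from Beihang University in 2008. Research interests include large-vocabulary continuous speech recognition, deep learning, and neural network techniques. E-mail: xiaoyeming@hccl.ioa.ac.cn

    Corresponding author:

    SI Yu-Jing  Ph.D. candidate at the Institute of Acoustics, Chinese Academy of Sciences. Received a bachelor's degree from the Department of Information Engineering, College of Communication Engineering, Jilin University in 2009. Research interests include statistical language modeling, speech recognition decoding, machine learning, deep neural networks, and automatic speech-text synchronization. Corresponding author of this paper. E-mail: siyujinglj@126.com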


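  • Abstract: In automatic speech recognition (ASR) domains where resources are relatively scarce, such as recognition systems for telephone conversations, the statistical language model (LM) suffers from severe data sparsity. This paper proposes a sampling-based corpus generation algorithm built on equiprobable events, which automatically generates domain-relevant text to strengthen statistical language model training. Experimental results show that adding the sampled corpus generated by this algorithm alleviates language model sparsity and thereby improves the performance of the whole speech recognition system. On the development set, language model perplexity falls by 7.5% relative and the character error rate (CER) falls by 0.2 points absolute; on the test set, perplexity falls by 6% relative and CER falls by 0.4 points absolute.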
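The abstract reports perplexity as a relative reduction and CER as an absolute reduction, and the two are computed differently. The sketch below shows one way to compute both. It is only an illustration: the function names and the baseline/improved values in the usage lines are hypothetical, chosen to reproduce the reported percentages, and are not taken from the paper.

```python
import math


def perplexity(log_probs):
    """Perplexity from a list of per-token natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))


def relative_reduction(baseline, improved):
    """Fractional drop with respect to the baseline (0.075 means 7.5%)."""
    return (baseline - improved) / baseline


def absolute_reduction(baseline, improved):
    """Plain difference, in the same unit as the inputs (e.g. CER points)."""
    return baseline - improved


# Hypothetical numbers, for illustration only (not results from the paper).
ppl_drop = relative_reduction(200.0, 185.0)   # perplexity 200 -> 185
cer_drop = absolute_reduction(40.0, 39.8)     # CER 40.0% -> 39.8%
uniform_ppl = perplexity([math.log(0.25)] * 4)
```

A model that assigns probability 0.25 to every token has perplexity 4, which is why perplexity is often read as an effective branching factor; a lower value means the language model is less surprised by the evaluation text.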
Publication history
  • Received:  2013-12-18
  • Revised:  2014-06-03
  • Published:  2014-12-20