Fast Language Model Look-ahead Algorithm Using Extended N-gram Model
Abstract: For a dynamic-network-based large-vocabulary continuous speech recognizer, this paper proposes a fast language model (LM) look-ahead method using an extended N-gram model. The extended N-gram model unifies the representation and score computation of the LM and the LM look-ahead tree, which greatly simplifies the decoder implementation, significantly speeds up LM look-ahead, and makes higher-order LM look-ahead practical. The extended N-gram model is generated off-line before decoding starts. The generation procedure exploits the sparseness of backing-off N-gram models to accelerate look-ahead score computation, and uses word-end node pushing and score quantization to compress the model's storage. Experiments show that, at the same character error rate, the proposed method speeds up overall recognition by a factor of 5 to 9 over the traditional dynamic-programming approach, which computes LM look-ahead scores on-line during decoding, and that higher-order LM look-ahead achieves both faster decoding and better accuracy than lower-order look-ahead.
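To make the baseline concrete, the following is a minimal sketch (not the paper's implementation) of the traditional dynamic-programming LM look-ahead that the proposed method replaces: each node of a pronunciation prefix tree receives, as its look-ahead score, the maximum LM log-probability over all words reachable in its subtree. The `TrieNode` structure, the toy lexicon, and the unigram probabilities below are all hypothetical illustrations; a real system would evaluate history-dependent N-gram probabilities, which is exactly the per-history cost the extended N-gram model precomputes off-line.

```python
import math

class TrieNode:
    """Node of a pronunciation prefix tree (hypothetical minimal structure)."""
    def __init__(self):
        self.children = {}    # phone -> TrieNode
        self.word = None      # word label if a pronunciation ends here
        self.lookahead = None # filled in by compute_lookahead

def build_tree(lexicon):
    """Build a prefix tree from {word: phone sequence}."""
    root = TrieNode()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, TrieNode())
        node.word = word
    return root

def compute_lookahead(node, logprob):
    # Bottom-up dynamic programming: a node's look-ahead score is the
    # maximum LM log-probability over every word in its subtree.
    best = logprob(node.word) if node.word is not None else -math.inf
    for child in node.children.values():
        best = max(best, compute_lookahead(child, logprob))
    node.lookahead = best
    return best

# Toy lexicon and unigram LM (made-up numbers, for illustration only).
lexicon = {"cat": ["k", "ae", "t"], "cad": ["k", "ae", "d"], "dog": ["d", "ao", "g"]}
unigram = {"cat": math.log(0.5), "cad": math.log(0.2), "dog": math.log(0.3)}

root = build_tree(lexicon)
compute_lookahead(root, lambda w: unigram[w])
print(root.children["k"].lookahead)  # max over {cat, cad} sharing prefix "k"
print(root.children["d"].lookahead)  # only "dog" lies below this node
```

In the on-line method this traversal must be repeated for every active LM history during decoding; the paper's off-line generation instead stores the resulting node scores in the same backoff representation as the N-gram model itself, so a look-ahead score is fetched like an ordinary N-gram probability.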
Key words: speech recognition / language model look-ahead / N-gram / decoding
-