Improved Mispronunciation Detection Based on JSM and MLP
Abstract: To generate the pronunciation dictionary (lexicon) used for mispronunciation detection, we propose a method based on the joint-sequence multi-gram model (JSM) and the multi-layer perceptron (MLP). First, the JSM is used to model mispronunciations: each canonical pronunciation and its mispronunciation are combined into a pronunciation pair that represents their correspondence, and an N-gram model over these pairs describes how mispronunciations depend on their context. Finally, an MLP re-models the relationships between pronunciation pairs in order to capture similar mispronunciations that occur in similar contexts. Experiments show that rescoring the probabilities of the high-order model with the MLP effectively smooths the probability space and improves mispronunciation detection performance.
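The abstract compresses three steps: forming canonical/realized pronunciation pairs, fitting an N-gram over pair sequences, and smoothing those probabilities with an MLP. The Python sketch here illustrates that pipeline on toy data under several assumptions that are not taken from the paper: the phone sequences are already aligned one-to-one (a full JSM instead learns variable-length joint units and their alignment, typically with EM), a bigram stands in for the high-order model, the MLP has a single tanh hidden layer trained with plain SGD, and "rescoring" is approximated by a fixed linear interpolation with weight `lam`. The names `to_pairs`, `bigram_probs`, `PairMLP` and `rescore`, and the toy phone data, are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def to_pairs(canonical, realized):
    """Join one-to-one aligned canonical/realized phones into 'c:r' units."""
    # Assumption: alignment is given; a real JSM learns it jointly.
    assert len(canonical) == len(realized), "sketch assumes a 1:1 alignment"
    return [f"{c}:{r}" for c, r in zip(canonical, realized)]


def bigram_probs(pair_seqs):
    """Maximum-likelihood P(pair | previous pair) with <s> and </s> markers."""
    counts = defaultdict(Counter)
    for seq in pair_seqs:
        prev = "<s>"
        for p in seq + ["</s>"]:
            counts[prev][p] += 1
            prev = p
    return {h: {p: n / sum(c.values()) for p, n in c.items()}
            for h, c in counts.items()}


class PairMLP:
    """One-hidden-layer MLP: one-hot history -> tanh hidden -> softmax over pairs."""

    def __init__(self, histories, outputs, hidden=8, seed=0):
        rng = random.Random(seed)
        self.hidx = {h: i for i, h in enumerate(histories)}
        self.outputs = list(outputs)
        self.oidx = {p: i for i, p in enumerate(self.outputs)}
        # Row i of E holds the input-to-hidden weights for one-hot history i.
        self.E = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in histories]
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in self.outputs]
        self.b = [0.0] * len(self.outputs)

    def _forward(self, h):
        z = [math.tanh(v) for v in self.E[self.hidx[h]]]
        logits = [sum(w * zj for w, zj in zip(row, z)) + bk
                  for row, bk in zip(self.W, self.b)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return z, [e / total for e in exps]

    def prob(self, h):
        return dict(zip(self.outputs, self._forward(h)[1]))

    def train(self, examples, epochs=200, lr=0.1):
        """Plain SGD on cross-entropy over (history, next pair) examples."""
        for _ in range(epochs):
            for h, target in examples:
                x = self.E[self.hidx[h]]
                z, probs = self._forward(h)
                g = list(probs)  # dLoss/dlogits = softmax - one_hot(target)
                g[self.oidx[target]] -= 1.0
                # Gradient w.r.t. hidden units, computed before W is updated.
                gz = [sum(gk * self.W[k][j] for k, gk in enumerate(g))
                      for j in range(len(z))]
                for k, gk in enumerate(g):
                    self.b[k] -= lr * gk
                    for j, zj in enumerate(z):
                        self.W[k][j] -= lr * gk * zj
                for j, zj in enumerate(z):
                    x[j] -= lr * gz[j] * (1.0 - zj * zj)  # tanh' = 1 - tanh^2


def rescore(ngram, mlp, h, lam=0.5):
    """Linearly interpolate MLP and N-gram estimates (lam is an assumption)."""
    p_ng = ngram.get(h, {})
    return {p: lam * q + (1 - lam) * p_ng.get(p, 0.0)
            for p, q in mlp.prob(h).items()}


# Toy aligned data (hypothetical): canonical "th" sometimes realized as "s".
data = [
    (["th", "ih", "ng", "k"], ["s", "ih", "ng", "k"]),
    (["th", "ih", "n"], ["s", "ih", "n"]),
    (["th", "ih", "n"], ["th", "ih", "n"]),
]
pair_seqs = [to_pairs(c, r) for c, r in data]
ngram = bigram_probs(pair_seqs)
outputs = sorted({p for dist in ngram.values() for p in dist})
mlp = PairMLP(sorted(ngram), outputs)
examples = [(h, p) for seq in pair_seqs
            for h, p in zip(["<s>"] + seq, seq + ["</s>"])]
mlp.train(examples)
print(sorted(rescore(ngram, mlp, "th:s").items(), key=lambda kv: -kv[1]))
```

Because the softmax gives every pair a nonzero probability, the interpolated distribution for the history `th:s` assigns some mass to continuations never observed after it in the toy data. This is a small-scale analogue of the smoothing the abstract attributes to MLP rescoring; the paper's actual network, context order and combination scheme may differ.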