Research on Joint Adaptation for Phonotactic Language Recognition
Abstract: In real-world language recognition, non-language factors such as channel type and speech content cause a mismatch between training and test conditions, which degrades recognition performance. This paper takes the phone recognizer followed by vector space model (PRVSM) as the language recognition system and introduces joint adaptation to address this mismatch. We investigate three adaptation methods, each applied at a different stage of the system: 1) acoustic model adaptation based on constrained maximum likelihood linear regression (CMLLR); 2) phonotactic feature vector adaptation based on universal N-grams; 3) support vector machine (SVM) adaptation in the vector space model (VSM). Combining these methods yields a substantial improvement in PRVSM performance. On the NIST LRE 2009 evaluation corpus, the equal error rate (EER) of PRVSM systems based on different phone recognizers decreases by a relative 18% to 23%, 12% to 20%, and 5% to 9% for the 30 s, 10 s, and 3 s test conditions, respectively.
Keywords:
- Language recognition
- Phone recognizer followed by vector space model (PRVSM)
- Joint adaptation
- Constrained maximum likelihood linear regression (CMLLR)
- Support vector machine (SVM) adaptation
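To make the notion of a phonotactic feature vector concrete, the following is a minimal sketch of how a decoded phone sequence could be mapped to a vector of relative N-gram frequencies, the kind of input a VSM back end would classify. It is an illustration only, not the system described in the paper: the function name `ngram_feature_vector`, the toy phone inventory, and the use of 1-best phone strings with plain relative frequencies are all assumptions, and none of the three adaptation methods is implemented here.

```python
from collections import Counter
from itertools import product


def ngram_feature_vector(phones, inventory, n=2):
    """Map a phone sequence to relative N-gram frequencies (hypothetical helper).

    Each dimension corresponds to one possible N-gram over the phone
    inventory, enumerated in a fixed order so vectors are comparable.
    """
    # Slide a window of length n over the decoded phone sequence.
    grams = [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())

    # Fixed dimension order: every N-gram over the inventory.
    dims = list(product(inventory, repeat=n))
    if total == 0:
        # Sequence shorter than n: return an all-zero vector.
        return [0.0] * len(dims)
    return [counts[g] / total for g in dims]


# Toy example with an assumed three-phone inventory.
inventory = ["a", "b", "k"]
phones = ["a", "b", "a", "k", "a", "b"]
vec = ngram_feature_vector(phones, inventory, n=2)
```

The vector has one entry per possible bigram (nine here), so its dimensionality grows as the inventory size raised to the power N; this is one reason short test segments give sparse, noisy vectors.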