Regularization Based Eigenvoice Speaker Adaptation Method

Zhang Wenlin; Zhang Lianhai; Niu Tong; Qu Dan; Li Bicheng

doi:10.3724/SP.J.1004.2012.01950

1. Institute of Information Engineering, PLA Information Engineering University, Zhengzhou 450002

Corresponding author: Zhang Wenlin

Publication history
- Received: 2011-12-27
- Revised: 2012-04-28
- Published: 2012-12-20

Abstract

Regularization is applied to eigenvoice speaker adaptation to address the prior selection of the speaker-subspace basis. By adding an appropriate regularization term to the likelihood function, the eigenvoices that best represent the unknown speaker are selected automatically from a set of candidate eigenvoice basis vectors during optimization and combined linearly. Three regularization terms, namely l1, l2, and elastic net regularization, are discussed, and the corresponding optimization algorithms are presented. With l1 regularization, a sparse solution for the speaker factor is obtained, and its non-zero entries correspond to the selected eigenvoices. With l2 regularization, the speaker factor estimate becomes more robust when adaptation data are limited, which reduces, to some extent, the sensitivity of the recognition rate to the prior choice of subspace dimension. Elastic net regularization is a linear combination of the l1 and l2 terms and strikes a compromise between the two. Supervised speaker adaptation experiments show that, with limited adaptation data (less than 10 s), the regularized methods improve the recognition rate by a relative 1% to 2% over the best result of the eigenvoice method. Among the three regularization terms, l1 regularization performs slightly better than l2 regularization, and elastic net regularization improves performance further.

Keywords:
- speech recognition
- speaker adaptation
- eigenvoice
- regularization
- elastic net
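
To make the role of the three penalties in the abstract concrete, here is a minimal sketch, not the authors' algorithm. It assumes that the likelihood term for the speaker factor x can be written as a quadratic, 0.5 * x'Ax - b'x, with A symmetric positive definite; this quadratic form, the names `A`, `b`, `lam1`, `lam2`, and both function names are illustrative assumptions. The solver is plain coordinate descent, chosen for brevity, and may differ from the optimization algorithms given in the paper. Setting `lam2 = 0` gives the l1 case, `lam1 = 0` the l2 case, and both non-zero the elastic net.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values within the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def estimate_speaker_factor(A, b, lam1=0.0, lam2=0.0, n_sweeps=200):
    """Minimize 0.5*x'Ax - b'x + lam1*||x||_1 + 0.5*lam2*||x||_2^2.

    A (list of lists) and b (list) are hypothetical sufficient statistics
    assumed to summarize the adaptation data; x is the speaker factor,
    i.e. the weights of the candidate eigenvoices.
    """
    dim = len(b)
    x = [0.0] * dim
    for _ in range(n_sweeps):
        for j in range(dim):
            # Residual for coordinate j with all other coordinates held fixed.
            r = b[j] - sum(A[j][k] * x[k] for k in range(dim) if k != j)
            # Closed-form one-dimensional minimizer of the penalized objective.
            x[j] = soft_threshold(r, lam1) / (A[j][j] + lam2)
    return x
```

In this sketch the l1 term sets small weights exactly to zero, which is the sense in which the non-zero entries pick out eigenvoices, while the l2 term only shrinks every weight by enlarging the denominator.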