Regularization Based Eigenvoice Speaker Adaptation Method
Abstract: Regularization is applied to eigenvoice speaker adaptation to address the problem of choosing the speaker subspace basis a priori. With an appropriate regularization term added to the likelihood function, the eigenvoices that best represent the unknown speaker are selected automatically from a set of candidate eigenvoice basis vectors during optimization and combined linearly. Three regularization terms, namely l1, l2 and elastic net regularization, are discussed and the corresponding optimization algorithms are presented. With l1 regularization, a sparse solution for the speaker factor is obtained, and its non-zero entries correspond to the selected eigenvoices. With l2 regularization, the speaker factor estimate is more robust when adaptation data are limited, which to some extent reduces the influence of the a priori choice of subspace dimension on the recognition rate. Elastic net regularization is a linear combination of the l1 and l2 terms and trades off between the two. Supervised speaker adaptation experiments show that, when adaptation data are limited (less than 10 s), the regularization method improves the recognition rate by a relative 1% to 2% compared with the best results of the eigenvoice method. Among the three, l1 regularization is slightly better than l2 regularization, and elastic net regularization improves performance further.
Key words: speech recognition / speaker adaptation / eigenvoice / regularization / elastic net
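To make the role of the three regularization terms concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes the likelihood is replaced by a least-squares surrogate, 0.5*||d - V x||^2, where V (candidate eigenvoice basis), d (offset of the speaker supervector from the mean) and x (speaker factor) are hypothetical stand-ins. It also assumes a generic proximal gradient (ISTA) solver, whereas the abstract does not specify which optimization algorithms the paper uses. The function name and the parameters lam1 and lam2 are illustrative only.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||x||_1 (element-wise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def regularized_speaker_factor(V, d, lam1=0.0, lam2=0.0, n_iter=500):
    """Minimise 0.5*||d - V x||^2 + lam1*||x||_1 + 0.5*lam2*||x||^2 by ISTA.

    V    : (D, K) array, candidate eigenvoices as columns (assumed surrogate)
    d    : (D,) array, speaker offset from the mean supervector (assumed)
    lam1 : weight of the l1 term (sparsity, i.e. eigenvoice selection)
    lam2 : weight of the l2 term (shrinkage, i.e. robustness)
    Setting lam2=0 gives pure l1, lam1=0 gives pure l2, both > 0 elastic net.
    """
    x = np.zeros(V.shape[1])
    # Step size from the Lipschitz constant of the smooth part's gradient.
    step = 1.0 / (np.linalg.norm(V, 2) ** 2 + lam2)
    for _ in range(n_iter):
        grad = V.T @ (V @ x - d) + lam2 * x          # gradient of smooth part
        x = soft_threshold(x - step * grad, step * lam1)  # l1 proximal step
    return x


# Toy usage with an orthonormal basis (illustrative values only).
V_demo = np.eye(4)
d_demo = np.array([3.0, -0.5, 0.2, -2.0])
x_l1 = regularized_speaker_factor(V_demo, d_demo, lam1=1.0)
x_l2 = regularized_speaker_factor(V_demo, d_demo, lam2=1.0)
x_en = regularized_speaker_factor(V_demo, d_demo, lam1=1.0, lam2=1.0)
```

In this toy case the l1 and elastic net solutions set the two small coordinates to zero, which mirrors the selection of a subset of eigenvoices, while the l2 solution keeps all coordinates but shrinks them.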