Gaussian PLDA for Speaker Verification and Joint Estimation

Xu Yun-Fei; Yang Hai; Zhou Ruo-Hua; Yan Yong-Hong

doi:10.3724/SP.J.1004.2014.01068

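1.
Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190

Funding:

Supported by the National High Technology Research and Development Program of China (863 Program) (2012AA012503), the National Natural Science Foundation of China (10925419, 90920302, 61072124, 11074275, 11161140319, 91120001, 61271426), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100, XDA06030500), and the Key Deployment Project of the Chinese Academy of Sciences (KGZD-EW-103-2)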

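Details

Author biography:
Xu Yun-Fei is a Ph.D. candidate at the Institute of Acoustics, Chinese Academy of Sciences. Xu received a bachelor's degree in electronic science and technology from Nankai University in 2010. Research interests include speech signal processing, speaker recognition, and machine learning. E-mail: xuyunfei@hccl.ioa.ac.cn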

Publication history
- Received: 2013-01-06
- Revised: 2013-08-12
- Published: 2014-06-20


Abstract

Abstract: In recent years, approaches based on the i-vector have become the mainstream in speaker recognition. Among them, probabilistic linear discriminant analysis (PLDA) has attracted wide attention for its strong performance. However, when the PLDA model is estimated, the traditional factor analysis method updates only the model space, so the model mean is not well coupled with the updated model space. This paper proposes a joint estimation approach that estimates the model mean and the model space simultaneously, yielding more rigorous expectation-maximization (EM) update formulas. On the NIST SRE 2010 extended test corpus and the NIST SRE 2012 core test corpus, the equal error rate is improved to some extent.
- Factor analysis /
- i-vector /
- probabilistic linear discriminant analysis (PLDA) /
- joint estimation /
- expectation-maximization (EM)
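The idea of estimating the model mean and the model space together can be illustrated with a minimal sketch. It is not the paper's derivation: it assumes a simplified single-level factor analysis model x = m + V z + e with diagonal noise, rather than a full PLDA model with speaker-tied latent variables, and it uses the standard augmented-latent EM update in which the column [V, m] is solved for in one step. The names `em_joint` and `log_likelihood` are hypothetical.

```python
import numpy as np

def log_likelihood(X, m, V, psi):
    """Average log-likelihood under x ~ N(m, V V^T + diag(psi))."""
    d = X.shape[1]
    C = V @ V.T + np.diag(psi)
    diff = X - m
    _, logdet = np.linalg.slogdet(C)
    maha = np.einsum("ni,ni->n", diff @ np.linalg.inv(C), diff)
    return float(np.mean(-0.5 * (d * np.log(2 * np.pi) + logdet + maha)))

def em_joint(X, q, n_iter=30, seed=0):
    """EM for factor analysis that updates mean m and loading matrix V jointly.

    Assumed model (illustrative only): x = m + V z + e,
    z ~ N(0, I_q), e ~ N(0, diag(psi)).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = np.zeros(d)                          # deliberately uninformed start
    V = 0.1 * rng.standard_normal((d, q))
    psi = np.var(X, axis=0) + 1e-3
    history = []
    for _ in range(n_iter):
        history.append(log_likelihood(X, m, V, psi))
        # E-step: posterior covariance G and posterior means Ez of z
        Vt_pinv = V.T / psi                  # V^T Psi^{-1}, shape (q, d)
        G = np.linalg.inv(np.eye(q) + Vt_pinv @ V)
        Ez = (X - m) @ Vt_pinv.T @ G         # shape (n, q); G is symmetric
        # Augment the latent vector with a constant 1 so that W = [V, m]
        Ez_aug = np.hstack([Ez, np.ones((n, 1))])
        S_zz = Ez_aug.T @ Ez_aug             # sum of E[z~] E[z~]^T
        S_zz[:q, :q] += n * G                # add posterior covariance term
        S_xz = X.T @ Ez_aug                  # sum of x E[z~]^T, shape (d, q+1)
        # M-step: W = S_xz S_zz^{-1}, giving V and m in a single solve
        W = np.linalg.solve(S_zz, S_xz.T).T
        V, m = W[:, :q], W[:, q]
        psi = (np.sum(X * X, axis=0) - np.sum(W * S_xz, axis=1)) / n
    history.append(log_likelihood(X, m, V, psi))
    return m, V, psi, history
```

Because the mean is part of the same M-step as the loading matrix, each iteration is an exact maximization over both, so the values in `history` should not decrease from one iteration to the next.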
