Gaussian PLDA for Speaker Verification and Joint Estimation
-
Abstract: In recent years, approaches based on the i-vector (total variability factor) have become the mainstream in speaker recognition. Among them, probabilistic linear discriminant analysis (PLDA) has attracted wide attention for its strong performance. However, when the PLDA model is estimated, the traditional factor analysis method updates only the model space, so the model mean is not well coupled with the updated model space. This paper proposes a joint estimation method that estimates the model mean and the model space simultaneously, yielding more rigorous expectation-maximization (EM) update formulas. The equal error rate improves to some extent on the NIST SRE 2010 extended test corpus and the NIST SRE 2012 core test corpus.
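The coupling issue described in the abstract can be illustrated with the simplified Gaussian PLDA model commonly applied to i-vectors. The block below is a sketch under stated assumptions, not the paper's own derivation: the symbols $w_{ij}$ (the $j$-th i-vector of speaker $i$), $m$ (model mean), $V$ (model space), $y_i$ (speaker factor), $\Sigma$ (residual covariance), $n_i$ (number of i-vectors for speaker $i$) and the augmented factor $z_i$ are notation introduced here for illustration. Absorbing the mean into an augmented loading matrix is one standard way to obtain an M-step that updates the mean and the space together.

```latex
% Generative model (assumed notation)
\begin{align}
  w_{ij} &= m + V y_i + \epsilon_{ij}, \qquad
  y_i \sim \mathcal{N}(0, I), \qquad
  \epsilon_{ij} \sim \mathcal{N}(0, \Sigma)
\end{align}

% E-step: posterior statistics of the speaker factor y_i
\begin{align}
  L_i &= I + n_i V^{\top} \Sigma^{-1} V \\
  \mathbb{E}[y_i] &= L_i^{-1} V^{\top} \Sigma^{-1} \sum_{j=1}^{n_i} (w_{ij} - m) \\
  \mathbb{E}[y_i y_i^{\top}] &= L_i^{-1} + \mathbb{E}[y_i]\,\mathbb{E}[y_i]^{\top}
\end{align}

% Joint M-step: augment the factor with a constant 1 so that
% the mean m becomes the last column of the loading matrix
\begin{align}
  z_i &= \begin{bmatrix} y_i \\ 1 \end{bmatrix}, \qquad
  \mathbb{E}[z_i z_i^{\top}] =
  \begin{bmatrix}
    \mathbb{E}[y_i y_i^{\top}] & \mathbb{E}[y_i] \\
    \mathbb{E}[y_i]^{\top} & 1
  \end{bmatrix} \\
  \begin{bmatrix} V & m \end{bmatrix}
  &= \Bigl( \sum_i \sum_{j=1}^{n_i} w_{ij}\, \mathbb{E}[z_i]^{\top} \Bigr)
     \Bigl( \sum_i n_i\, \mathbb{E}[z_i z_i^{\top}] \Bigr)^{-1}
\end{align}
```

In this sketch, $V$ and $m$ come out of the same linear solve, so each is optimal given the other under the current posterior statistics. An update that holds $m$ fixed and re-estimates only $V$ lacks that guarantee, which is the kind of mismatch the abstract attributes to the traditional method.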