Gaussian PLDA for Speaker Verification and Joint Estimation

XU Yun-Fei, YANG Hai, ZHOU Ruo-Hua, YAN Yong-Hong

Citation: XU Yun-Fei, YANG Hai, ZHOU Ruo-Hua, YAN Yong-Hong. Gaussian PLDA for Speaker Verification and Joint Estimation. Acta Automatica Sinica, 2014, 40(6): 1068-1074. doi: 10.3724/SP.J.1004.2014.01068

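Funds:

Supported by the National High Technology Research and Development Program of China (863 Program) (2012AA012503), the National Natural Science Foundation of China (10925419, 90920302, 61072124, 11074275, 11161140319, 91120001, 61271426), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100, XDA06030500), and the Priority Deployment Project of the Chinese Academy of Sciences (KGZD-EW-103-2)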

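Details
    About the author:

    XU Yun-Fei: Ph.D. candidate at the Institute of Acoustics, Chinese Academy of Sciences. Received a bachelor's degree in electronic science and technology from Nankai University in 2010. Research interests include speech signal processing, speaker recognition, and machine learning. E-mail: xuyunfei@hccl.ioa.ac.cn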


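  • Abstract: In recent years, speaker recognition methods based on total variability factors have become the mainstream approach in the field. Among them, probabilistic linear discriminant analysis (PLDA) has attracted wide attention for its excellent performance. However, when a PLDA model is estimated, the conventional factor analysis procedure updates only the model subspace, so the model mean is not well coupled with the updated subspace. This paper proposes a joint estimation method that estimates the model mean and the model subspace simultaneously, which yields more rigorous expectation-maximization (EM) update formulas. On the NIST Speaker Recognition Evaluation (SRE) 2010 extended test set and the SRE 2012 core test set, the equal error rate (EER) improves to some extent.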
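The idea of updating the mean together with the subspace can be illustrated on a toy problem. The sketch below is not the authors' algorithm, whose update formulas the abstract does not give. It assumes a scalar simplification of Gaussian PLDA, x_ij = m + v * y_i + e_ij with y_i ~ N(0, 1) and e_ij ~ N(0, s2), and performs the M-step on the augmented latent vector [y_i, 1] so that v and m are solved for in one step. The function names `make_toy_data` and `em_scalar_plda`, the toy parameter values, and the initial values are all hypothetical choices made for illustration.

```python
import math
import random


def make_toy_data(n_speakers=200, n_sessions=5, seed=0):
    """Draw scalar observations grouped by speaker (toy values: m=5, v=2, sd=0.5)."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_speakers):
        y = rng.gauss(0.0, 1.0)  # speaker factor shared by all sessions
        groups.append([5.0 + 2.0 * y + rng.gauss(0.0, 0.5)
                       for _ in range(n_sessions)])
    return groups


def em_scalar_plda(groups, n_iter=20):
    """EM for x_ij = m + v * y_i + e_ij with a joint update of v and m.

    Returns (m, v, s2, history), where history[k] is the marginal
    log-likelihood under the parameters held at the start of iteration k.
    """
    n_total = sum(len(g) for g in groups)
    sum_sq = sum(x * x for g in groups for x in g)
    m = sum(x for g in groups for x in g) / n_total  # start at the global mean
    v, s2 = 1.0, 1.0
    history = []
    for _ in range(n_iter):
        a11 = a12 = a22 = 0.0  # sum_i n_i * E[[y, 1]^T [y, 1]]
        c1 = c2 = 0.0          # sum_i f_i * E[[y, 1]], with f_i = sum_j x_ij
        loglik = 0.0
        for g in groups:
            n, f = len(g), sum(g)
            # E-step: the posterior of y_i is Gaussian with precision `prec`.
            prec = 1.0 + n * v * v / s2
            b = v * (f - n * m) / s2
            mean_y = b / prec
            second_y = 1.0 / prec + mean_y * mean_y
            a11 += n * second_y
            a12 += n * mean_y
            a22 += n
            c1 += f * mean_y
            c2 += f
            # Marginal log-likelihood of this speaker (y_i integrated out).
            resid_sq = sum((x - m) ** 2 for x in g)
            loglik += (-0.5 * n * math.log(2.0 * math.pi * s2)
                       - 0.5 * math.log(prec)
                       - 0.5 * resid_sq / s2
                       + 0.5 * b * b / prec)
        history.append(loglik)
        # M-step: solve [v, m] * A = [c1, c2] for the symmetric 2x2 matrix A,
        # so the subspace v and the mean m are re-estimated together.
        det = a11 * a22 - a12 * a12
        v, m = (c1 * a22 - c2 * a12) / det, (c2 * a11 - c1 * a12) / det
        s2 = (sum_sq - v * c1 - m * c2) / n_total
    return m, v, s2, history
```

Calling `em_scalar_plda(make_toy_data())` returns the estimated mean, subspace coefficient, and noise variance along with the log-likelihood trace. A fixed-mean variant would keep `m` at its initial value and update only `v` and `s2`, which is the decoupling the abstract describes. Note that the sign of `v` is not identifiable in this model.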
Publication history
  • Received: 2013-01-06
  • Revised: 2013-08-12
  • Published: 2014-06-20
