Vocal Tract Spectrum Conversion Using a Two-factor Gaussian Process Dynamic Model

SUN Xin-Jian, ZHANG Xiong-Wei, YANG Ji-Bin, CAO Tie-Yong, ZHONG Xin-Yi

Citation: SUN Xin-Jian, ZHANG Xiong-Wei, YANG Ji-Bin, CAO Tie-Yong, ZHONG Xin-Yi. Vocal Tract Spectrum Conversion Using a Two-factor Gaussian Process Dynamic Model. ACTA AUTOMATICA SINICA, 2014, 40(6): 1198-1207. doi: 10.3724/SP.J.1004.2014.01198

doi: 10.3724/SP.J.1004.2014.01198
Funds:

Supported by National Natural Science Foundation of China (61072042), Natural Science Foundation of Jiangsu Province (BK2012510), and Pre-research Foundation of PLA University of Science and Technology (20110205, 20110211)

Details
    Author biography:

    ZHANG Xiong-Wei Professor at the College of Command Information Systems, PLA University of Science and Technology. His main research interests include multimedia information processing, intelligent computing, and compressed sensing. E-mail: xwzhang@public1.ptt.js.cn

  • Abstract: The two-factor Gaussian process latent variable model (TF-GPLVM) previously proposed by the authors does not account for the dynamic characteristics of speech when used for voice conversion, and its training requires estimating a large number of parameters. To address these problems, a hidden Markov model (HMM) is introduced to model the dynamics of speech, and the HMM hidden states are used to perform a probabilistic soft classification of each speech frame with respect to its semantic content, yielding a two-factor Gaussian process dynamic model (TF-GPDM) with higher separation accuracy and a lower computational load. Based on this model, a new vocal tract spectrum conversion scheme built on speaker-factor replacement is designed. Subjective and objective experiments show that, compared both with the conventional statistical mapping and frequency warping methods and with the TF-GPLVM approach, the proposed method improves speech quality and conversion similarity and achieves a better balance between the two.
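
The abstract outlines a pipeline in which an HMM soft-classifies each spectral frame over hidden states tied to semantic content, and conversion then operates on the speaker factor of the TF-GPDM. The sketch below is only an illustrative approximation of that soft-classification step under assumed tools: hmmlearn's GaussianHMM stands in for the paper's HMM, the feature dimension and state count are arbitrary, and the per-state linear maps are hypothetical placeholders, not the authors' GP formulation.

# Minimal sketch (not the authors' implementation) of HMM-based soft
# classification of spectral frames and posterior-weighted conversion.
# Assumptions: hmmlearn is available, frames are MCEP-like feature vectors,
# and per-state linear maps stand in for the GP-based mapping of the TF-GPDM.
import numpy as np
from hmmlearn.hmm import GaussianHMM


def train_frame_hmm(frames, n_states=8):
    """Fit an HMM on source-speaker spectral frames (T x D array)."""
    hmm = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    hmm.fit(frames)
    return hmm


def soft_classify(hmm, frames):
    """Posterior probability of each hidden state for every frame (T x K)."""
    return hmm.predict_proba(frames)


def convert_frames(frames, posteriors, state_maps):
    """Posterior-weighted conversion: state k contributes its own mapping
    (a hypothetical linear map A_k x + b_k here), weighted by the frame's
    probability of belonging to state k."""
    converted = np.zeros_like(frames)
    for k, (A, b) in enumerate(state_maps):
        converted += posteriors[:, k:k + 1] * (frames @ A.T + b)
    return converted


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.standard_normal((200, 24))            # dummy source frames
    hmm = train_frame_hmm(src, n_states=4)
    post = soft_classify(hmm, src)                  # (200, 4) soft labels
    maps = [(np.eye(24), rng.standard_normal(24)) for _ in range(4)]
    print(convert_frames(src, post, maps).shape)    # -> (200, 24)

In the paper's scheme the per-frame posteriors and the speaker factor enter the TF-GPDM itself, and conversion is performed by substituting the target speaker's factor for the source speaker's; the linear maps above merely mark where that model-based mapping would plug in.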
Publication history
  • Received: 2012-12-12
  • Revised: 2013-05-21
  • Published: 2014-06-20
