Feature Space Nonlinear Manifold Based Acoustic Model for Speech Recognition

Zhang Wenlin (张文林); Niu Tong (牛铜); Qu Dan (屈丹); Li Bicheng (李弼程); Pei Xilong (裴喜龙)

doi: 10.16383/j.aas.2015.c140399 cstr: 32138.14.j.aas.2015.c140399

1.
Institute of Information Systems Engineering, PLA Information Engineering University, Zhengzhou 450002

Funds:

Supported by National Natural Science Foundation of China (61403415, 61175017)

Author biography:
Niu Tong: Ph.D. candidate at the Institute of Information Systems Engineering, PLA Information Engineering University. Research interests: speech enhancement and speech recognition. E-mail: niutong0072@gmail.com

Corresponding author:
Zhang Wenlin: Lecturer at the Institute of Information Systems Engineering, PLA Information Engineering University. Received a Ph.D. degree from PLA Information Engineering University in 2013. Research interests: speech signal processing, speech recognition, and machine learning. E-mail: zwlin 2004@163.com

Publication history
- Received: 2014-06-03
- Revised: 2015-01-09
- Published: 2015-05-20

Abstract

Abstract: Starting from the nonlinear manifold structure of the acoustic feature space of speech signals, a new acoustic model for speech recognition is built using the principle of compressive sensing on manifolds. The feature space is divided into multiple local regions, and each region is approximated by a low-dimensional factor analysis model, which yields a mixture of factor analyzers. By restricting the observation vectors of each context-dependent state to lie on this nonlinear low-dimensional manifold, the observation probability model of the state is derived. Each state is ultimately determined by a weight vector subject to a sparsity constraint and several low-dimensional local factor vectors that follow standard normal distributions. A criterion for selecting the latent dimension of each local region is given, together with iterative estimation algorithms for the model parameters. Continuous speech recognition experiments on the RM corpus show that, compared with the conventional Gaussian mixture model (GMM) and the subspace Gaussian mixture model (SGMM), the new acoustic model achieves relative reductions of 33.1% and 9.2%, respectively, in the average word error rate (WER) on the test set.
- Speech recognition /
- acoustic model /
- nonlinear manifold /
- mixture of factor analyzers
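
To make the "mixture of factor analyzers with a sparse weight vector" idea in the abstract more concrete, the following is a minimal sketch of how such a density could be evaluated. It is not the paper's observation model or its estimation algorithm: it assumes a generic mixture of factor analyzers, one latent factor per local region (so each loading matrix is a single column), and diagonal noise variances. The function names `fa_log_density` and `mfa_log_likelihood` and all numeric values are hypothetical.

```python
import math


def fa_log_density(x, mean, loading, noise_var):
    """Log density of x under one single-factor analyzer.

    The marginal covariance is loading * loading^T + diag(noise_var).
    Because the low-rank part has rank one (an assumption of this sketch),
    the inverse and determinant reduce to scalar expressions.
    """
    d = len(x)
    r = [xi - mi for xi, mi in zip(x, mean)]
    # Inner products weighted by the inverse diagonal noise.
    r_pinv_r = sum(ri * ri / v for ri, v in zip(r, noise_var))
    l_pinv_r = sum(li * ri / v for li, ri, v in zip(loading, r, noise_var))
    l_pinv_l = sum(li * li / v for li, v in zip(loading, noise_var))
    s = 1.0 + l_pinv_l
    # Woodbury identity gives the Mahalanobis distance.
    maha = r_pinv_r - l_pinv_r ** 2 / s
    # Matrix determinant lemma gives the log-determinant.
    log_det = sum(math.log(v) for v in noise_var) + math.log(s)
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + maha)


def mfa_log_likelihood(x, weights, regions):
    """Log likelihood of x under a weighted mixture of local factor analyzers.

    weights: per-state mixture weights; zero entries (sparsity) are skipped.
    regions: list of (mean, loading, noise_var) tuples, one per local region.
    """
    terms = [
        math.log(w) + fa_log_density(x, *region)
        for w, region in zip(weights, regions)
        if w > 0.0
    ]
    # Log-sum-exp for numerical stability.
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


if __name__ == "__main__":
    # Two hypothetical local regions in a 2-D feature space.
    regions = [
        ([0.0, 0.0], [1.0, 1.0], [1.0, 1.0]),
        ([5.0, 5.0], [0.5, -0.5], [1.0, 2.0]),
    ]
    # A sparse weight vector that activates only the first region.
    print(mfa_log_likelihood([1.0, 0.0], [1.0, 0.0], regions))
```

In this sketch a state is described only by its weight vector over shared local regions, so a sparse weight vector means that most regions contribute nothing to that state's likelihood and can be skipped during evaluation.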
