A Minimum Classification Error Criterion Based Linear Discriminant Analysis Method for Speech Recognition Feature

Chen Bin (陈斌); Zhang Lian-Hai (张连海); Niu Tong (牛铜); Qu Dan (屈丹); Li Bi-Cheng (李弼程)

doi:10.3724/SP.J.1004.2014.01208

cstr: 32138.14.SP.J.1004.2014.01208

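1. Institute of Information System Engineering, PLA Information Engineering University, Zhengzhou 450002

Funds:

Supported by National Natural Science Foundation of China (61175017)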

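Author biography:
Niu Tong (牛铜): Ph.D. candidate at the Institute of Information System Engineering, PLA Information Engineering University. Main research interests: speech enhancement and speech recognition. E-mail: niutong0072@gmail.com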

Publication history
- Received: 2013-07-15
- Revised: 2013-10-01
- Published: 2014-06-20


Abstract: A linear discriminant analysis (LDA) method based on the minimum classification error (MCE) criterion is proposed and applied to feature transformation for continuous speech recognition. The data probability distribution is estimated with non-parametric kernel density estimation. Given the estimated distribution, the discriminant analysis transformation matrix is obtained under the MCE criterion with a gradient-descent-based linear search algorithm. The transformation matrix is then used to reduce the dimensionality of super-vectors formed by splicing the Mel filter bank outputs of adjacent frames, which yields a time-frequency feature. Experimental results show that, compared with the traditional MFCC feature, the time-frequency feature extracted with the proposed method improves recognition accuracy by 1.41%. Compared with the heteroscedastic LDA (HLDA) and approximate pairwise empirical accuracy criterion (aPEAC) discriminant analysis methods, recognition accuracy improves by 1.14% and 0.83%, respectively.

Keywords:
- Linear discriminant analysis (LDA)
- speech recognition
- kernel density estimation
- feature transformation
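
The pipeline summarized in the abstract can be illustrated with a small, standard-library-only Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`splice_frames`, `project`, `gaussian_kde`, `empirical_error`), the isotropic Gaussian kernel, the fixed bandwidth, the edge padding and the toy data are all hypothetical. The sketch only evaluates the leave-one-out classification error of a given transformation matrix; the gradient-descent search that the paper uses to optimize the matrix is not reproduced here.

```python
import math


def splice_frames(frames, context):
    """Concatenate each frame with `context` neighbours on each side.

    Edge frames are padded by repeating the first/last frame (an assumption).
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        vec = []
        for k in range(t - context, t + context + 1):
            idx = min(max(k, 0), n - 1)  # clamp index at the utterance edges
            vec.extend(frames[idx])
        spliced.append(vec)
    return spliced


def project(matrix, vec):
    """Apply a (p x d) linear transform to a d-dimensional vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def gaussian_kde(point, samples, bandwidth):
    """Isotropic Gaussian kernel density estimate evaluated at `point`."""
    d = len(point)
    norm = (2.0 * math.pi * bandwidth ** 2) ** (d / 2.0)
    total = 0.0
    for s in samples:
        sq_dist = sum((a - b) ** 2 for a, b in zip(point, s))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / (len(samples) * norm)


def empirical_error(matrix, data, labels, bandwidth):
    """Leave-one-out error of a KDE-based Bayes classifier after projection."""
    projected = [project(matrix, x) for x in data]
    classes = sorted(set(labels))
    errors = 0
    for i, z in enumerate(projected):
        best_cls, best_score = None, -1.0
        for c in classes:
            members = [projected[j] for j in range(len(data))
                       if labels[j] == c and j != i]
            if not members:
                continue
            prior = len(members) / (len(data) - 1)
            score = prior * gaussian_kde(z, members, bandwidth)
            if score > best_score:
                best_cls, best_score = c, score
        if best_cls != labels[i]:
            errors += 1
    return errors / len(data)


# Toy data (hypothetical): the classes differ along the first axis only.
data = [[0.0, 0.0], [0.2, 3.0], [0.1, -3.0], [-0.1, 1.0],
        [5.0, 0.1], [5.2, 2.9], [4.9, -3.1], [5.1, 1.1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

err_good = empirical_error([[1.0, 0.0]], data, labels, bandwidth=0.5)
err_bad = empirical_error([[0.0, 1.0]], data, labels, bandwidth=0.5)
print(err_good, err_bad)
```

On this toy set, projecting onto the first axis keeps the classes separable, while projecting onto the second axis mixes them. That difference in classification error is what an MCE-style criterion is designed to penalize when selecting the transformation matrix.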
