Manifold Regularized Extreme Learning Machine for Language Recognition
Abstract: Support vector machines (SVMs) have played an important role in state-of-the-art language recognition systems. In recent years, the extreme learning machine (ELM) has been applied successfully in many areas. Compared with SVM, the main advantages of ELM are that it is very easy to implement and fast to train, and it usually achieves recognition performance close to, or even better than, that of SVM. Motivated by these properties, we introduce ELM into language recognition and propose a manifold regularized extreme learning machine (MRELM) to address the potential problems caused by ELM's random initialization of model parameters. Experimental results show that, in the Gaussian supervector (GSV) feature space, the proposed algorithm clearly improves recognition performance on 30-second utterances relative to the SVM baseline system. MRELM can also be applied in the i-vector feature space, where it achieves performance comparable to that of current mainstream scoring methods.
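To make the idea in the abstract concrete, the following is a minimal sketch of an ELM whose output weights are regularized with a graph Laplacian. It is an illustration under stated assumptions, not the paper's exact MRELM formulation: it uses a generic unsupervised k-nearest-neighbor graph and solves min ||beta||^2 + C ||H beta - T||^2 + lam tr(beta' H' L H beta) in closed form. The function names (knn_laplacian, fit_mr_elm, predict, make_toy_data), the parameters C, lam and k, and the two-dimensional toy data standing in for GSV or i-vector features are all hypothetical.

```python
import numpy as np


def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian L = D - W of a symmetrized k-NN graph."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared distances
    np.fill_diagonal(d2, np.inf)                     # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]              # k nearest neighbors per row
    W = np.zeros_like(d2)
    rows = np.repeat(np.arange(X.shape[0]), k)
    W[rows, idx.ravel()] = 1.0
    W = np.maximum(W, W.T)                           # make the graph undirected
    return np.diag(W.sum(axis=1)) - W


def fit_mr_elm(X, y, n_hidden=20, C=10.0, lam=0.1, k=5, seed=0):
    """Fit an ELM with a Laplacian penalty on its outputs (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    n_classes = int(y.max()) + 1
    # ELM step: input weights and biases are drawn at random and never trained.
    W_in = rng.standard_normal((d, n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W_in + b)                        # hidden-layer outputs
    # Targets coded as +1 for the true class and -1 otherwise.
    T = -np.ones((n, n_classes))
    T[np.arange(n), y] = 1.0
    L = knn_laplacian(X, k)
    # Closed-form output weights: (I + C H'H + lam H'LH) beta = C H'T.
    A = np.eye(n_hidden) + C * H.T @ H + lam * H.T @ L @ H
    beta = np.linalg.solve(A, C * H.T @ T)
    return W_in, b, beta


def predict(model, X):
    """Return the index of the highest-scoring class for each row of X."""
    W_in, b, beta = model
    return np.argmax(np.tanh(X @ W_in + b) @ beta, axis=1)


def make_toy_data(seed=0):
    """Two well-separated Gaussian blobs; a stand-in for real utterance features."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)),
                   rng.normal(2.0, 0.5, (20, 2))])
    y = np.repeat([0, 1], 20)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    model = fit_mr_elm(X, y)
    print("training accuracy:", np.mean(predict(model, X) == y))
```

Only the output weights beta are learned, which is why training reduces to one linear solve. The Laplacian term penalizes outputs that differ between neighboring samples, which is one common way to constrain a model whose hidden layer is random; setting lam to 0 recovers a plain ridge-regularized ELM.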