Classification Algorithm of Support Vector Machine via p-norm Regularization
Abstract: The L2-norm penalized support vector machine (SVM) is one of the most widely used classification algorithms, and L1-norm and L0-norm penalized SVM algorithms, which perform feature selection and classifier construction simultaneously, have also been proposed. In all of these methods, however, the order of the regularization norm is fixed in advance, typically at p = 2 or p = 1. Our experimental study shows that, for different data sets, using different regularization orders can improve the prediction accuracy of the classification algorithm. This paper proposes a new design pattern for p-norm regularized SVM classifiers in which the order p of the regularization norm may take any value in the range 0 < p ≤ 2, unifying the classical L2-norm, L1-norm, and L0-norm penalized SVMs as special cases.
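The p-norm penalized SVM described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a plain subgradient-descent solver for the hinge loss plus a smoothed p-norm penalty lam * ||w||_p^p, with hypothetical function names and hyperparameters chosen only for illustration. Setting p = 2 or p = 1 recovers the familiar ridge- and lasso-style penalties the abstract refers to; values p < 1 approximate the L0 penalty.

```python
def pnorm_svm(X, y, p=2.0, lam=0.01, lr=0.05, epochs=200, eps=1e-3):
    """Hypothetical sketch: subgradient descent on hinge loss + lam * ||w||_p^p.

    eps smooths the penalty gradient near w_j = 0, which otherwise blows up
    when p < 1 (since |w_j|^(p-1) -> infinity as w_j -> 0).
    """
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            gw = [0.0] * d
            gb = 0.0
            if margin < 1:                      # hinge loss is active: subgradient of max(0, 1 - margin)
                gw = [-yi * xj for xj in xi]
                gb = -yi
            for j in range(d):
                # smoothed gradient of lam * |w_j|^p:  lam * p * (|w_j| + eps)^(p-1) * sign(w_j)
                sgn = 1.0 if w[j] >= 0 else -1.0
                gw[j] += lam * p * (abs(w[j]) + eps) ** (p - 1) * sgn
                w[j] -= lr * gw[j]
            b -= lr * gb
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# toy data: the label is determined by the first feature; the second is noise
X = [(1.0, 0.1), (2.0, -0.2), (1.5, 0.0), (-1.0, 0.2), (-2.0, -0.1), (-1.5, 0.1)]
y = [1, 1, 1, -1, -1, -1]
w, b = pnorm_svm(X, y, p=1.0)   # p = 1 corresponds to the L1-norm penalized SVM
```

Varying `p` in the call is the whole point of the paper's design pattern: the same training loop covers the entire family 0 < p ≤ 2, and the best order can be selected per data set (e.g. by cross-validation).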