Joint Feature Selection and Classification Design Based on Variational Relevance Vector Machine
Abstract: The relevance vector machine (RVM) is a fully probabilistic model with the same functional form as the support vector machine (SVM). Solved with the variational Bayesian (VB) method, the RVM yields posterior distributions over all of its parameters. Moreover, by imposing sparsity on the original feature space in which the samples are represented, an RVM with a linear kernel can perform linear feature selection at the same time as classification. Building on the traditional VB-RVM, this paper proposes a joint feature selection and classification method. The method uses the Probit model to connect the classification problem with the regression problem. In addition, extending the feature dimensions by power transformation increases the information available from the samples during classification, allows a nonlinear classification boundary to be constructed, and realizes nonlinear feature selection. Experiments on synthetic data and measured data demonstrate the practicability and effectiveness of the proposed method.
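The two ingredients named in the abstract, power-transformation feature extension and the Probit link, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact form of the power transformation, so the sketch assumes each feature is simply raised to the powers 1 through `max_power`. The function names, the `max_power` parameter, and the weight vector in the usage note are hypothetical, and the sparse prior and the VB inference that would actually learn the weights are not shown.

```python
import math


def power_expand(x, max_power=3):
    """Extend one sample to [x_1..x_d, x_1^2..x_d^2, ..., x_1^P..x_d^P].

    Assumed form of the power transformation (hypothetical).
    """
    return [v ** p for p in range(1, max_power + 1) for v in x]


def probit_probability(z, w):
    """Probit link: P(y = 1 | z) = Phi(w . z), Phi = standard normal CDF."""
    score = sum(wi * zi for wi, zi in zip(w, z))
    return 0.5 * (1.0 + math.erf(score / math.sqrt(2.0)))


def selected_features(w, n_features, tol=1e-6):
    """Original feature indices that keep at least one nonzero weight.

    Column j of the extended sample comes from original feature j % n_features.
    """
    return sorted({j % n_features for j, wj in enumerate(w) if abs(wj) > tol})
```

For example, with two original features and `max_power=2`, a hypothetical sparse weight vector `[0.0, 0.0, 0.7, 0.0]` keeps only the squared term of the first feature, so `selected_features` reports index 0. A classifier that is linear in the extended features is therefore nonlinear in the original ones, which is the sense in which sparsity over the extended dimensions amounts to nonlinear feature selection.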
Key words:
- Feature selection
- Sparsity
- Relevance vector machine (RVM)
- Probit model
- Variational Bayesian (VB)