Abstract: Unsupervised feature selection is a fundamental problem in statistical pattern recognition and has received sustained attention over the past several decades. Recently, much work has cast feature selection as a nonlinear dimensionality reduction problem with discrete constraints. This line of research adopts the assumption that data lie on a manifold and emphasizes manifold learning techniques. Many existing feature selection methods exploit basic properties of the graph Laplacian to select the features that best preserve the data manifold, for example SPEC (spectrum decomposition of graph Laplacian), the TR (trace ratio) criterion, MSFS (multi-cluster feature selection), and EVSC (eigenvalue sensitive criterion). This paper starts instead from another family of manifold learning algorithms and proposes a new method based on locally linear embedding (LLE). The main difficulty of LLE-based feature selection is that the optimization involves quadratic programming and eigenvalue decomposition. We prove that, in the feature selection problem, the LLE objective can be decomposed by dimension, which also makes it possible to construct better features using principal component analysis (PCA). Based on these results, we propose a new unsupervised feature selection algorithm, LLS, which first computes the local relationships among samples from LLE and then uses these relationships to estimate the contribution of each feature to the intrinsic manifold structure. These contributions are expressed as LLS scores, ranked, and used as the basis for feature selection. We also propose a locally linear rotation-selection algorithm that generalizes LLS. Experimental results on several datasets show that the proposed algorithms are more effective than methods based on Laplacian eigenmaps.
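The dimension-wise decomposition of the LLE objective mentioned above can be sketched as follows, using the standard LLE reconstruction objective with weight matrix $W$ (the notation here is illustrative, not taken from the paper):

$$
\Phi(X) \;=\; \sum_{i=1}^{n}\Big\| x_i - \sum_{j} W_{ij}\, x_j \Big\|^2
\;=\; \sum_{d=1}^{D} \sum_{i=1}^{n} \Big( x_{id} - \sum_{j} W_{ij}\, x_{jd} \Big)^2 .
$$

Because the squared norm splits additively over coordinates, the objective is a sum of $D$ per-dimension terms, and each feature's term can be read off as its contribution to preserving the local linear structure.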
Keywords:
- Manifold learning
- Laplacian eigenmaps
- Locally linear embedding (LLE)
- Feature selection
Abstract: Unsupervised feature selection is fundamental in statistical pattern recognition, and has drawn persistent attention over the past several decades. Recently, much work has shown that feature selection can be formulated as nonlinear dimensionality reduction with discrete constraints. This line of research emphasizes manifold learning techniques, where feature selection and learning can be studied under the manifold assumption on the data distribution. Many existing feature selection methods, such as the Laplacian score, SPEC (spectrum decomposition of graph Laplacian), the TR (trace ratio) criterion, MSFS (multi-cluster feature selection), and EVSC (eigenvalue sensitive criterion), apply basic properties of the graph Laplacian and select the feature subsets that best preserve the manifold structure defined on it. In this paper, we propose a new feature selection perspective based on locally linear embedding (LLE), another popular manifold learning method. The main difficulty of using LLE for feature selection is that its optimization involves quadratic programming and eigenvalue decomposition, both of which are continuous procedures and thus different from discrete feature selection. We prove that the LLE objective can be decomposed with respect to data dimensions in the subset selection problem, which also facilitates constructing better coordinates from the data via principal component analysis (PCA). Based on these results, we propose a novel unsupervised feature selection algorithm, called locally linear selection (LLS), to select a feature subset that represents the underlying data manifold. The local relationships among samples are computed from the LLE formulation and then used to estimate the contribution of each individual feature to the underlying manifold structure. These contributions, represented as LLS scores, are ranked and selected as the candidate solution to feature selection.
We further develop a locally linear rotation-selection (LLRS) algorithm, which extends LLS to identify the optimal coordinate subset in a new space. Experimental results on real-world datasets show that our method can be more effective than feature selection methods based on Laplacian eigenmaps.
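As a rough illustration of the LLS pipeline described above (compute LLE reconstruction weights, then score each feature by its per-dimension reconstruction residual), here is a minimal NumPy sketch. The function name `lls_scores` and the parameters `k` and `reg` are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def lls_scores(X, k=5, reg=1e-3):
    """Hypothetical sketch of locally linear selection (LLS) scoring.

    X: array of shape (n_samples, n_features). Returns one score per
    feature: the reconstruction error of that feature's coordinate under
    the LLE weights. Smaller scores indicate features that better
    preserve the local manifold structure.
    """
    n, d = X.shape
    # Pairwise Euclidean distances, used to find k nearest neighbors.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)  # exclude each point from its own neighbors

    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[:k]
        Z = X[nbrs] - X[i]                   # neighbors centered on x_i
        G = Z @ Z.T                          # local Gram matrix (k x k)
        G += reg * np.trace(G) * np.eye(k)   # regularize for stability
        w = np.linalg.solve(G, np.ones(k))   # standard LLE local weights
        W[i, nbrs] = w / w.sum()             # weights constrained to sum to 1

    # The LLE objective decomposes over features: score each dimension
    # by its total squared reconstruction residual.
    R = X - W @ X                            # (n, d) residuals
    return (R ** 2).sum(axis=0)              # one score per feature
```

In use, one would keep the `m` features with the smallest scores, e.g. `selected = np.argsort(lls_scores(X))[:m]`. Note this sketch recomputes neighbors from all features; the paper's full method, and its LLRS extension with rotation, is more involved.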