A Fast Neighbor Prototype Selection Algorithm Based on Local Mean and Class Global Information
Abstract: The condensed nearest neighbor (CNN) algorithm is a simple non-parametric prototype selection method, but the prototypes it selects are sensitive to the order in which patterns are read and to abnormal patterns. To overcome these problems, a new nearest neighbor prototype selection method based on local means and class global information is proposed. During prototype selection, the method makes full use of the local means of the k homogeneous (same-class) and heterogeneous (different-class) nearest neighbors, within the prototype set, of each pattern to be learned, together with class global information. It also adopts an updating strategy so that the prototype set is updated dynamically. The proposed method lessens the influence of read order and abnormal patterns on prototype selection and reduces the size of the prototype set, achieving high compression of the original data set while maintaining high classification accuracy. Experiments on two image recognition data sets and on University of California Irvine (UCI) benchmark data sets show that the proposed method achieves more effective classification performance than the compared algorithms.
Key words:
- data classification
- prototype selection
- local mean
- global class information
- adaptive learning
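
The abstract notes that prototypes chosen by the classical CNN algorithm depend on the order in which patterns are read. The following is a minimal sketch of that baseline only (a Hart-style condensing loop), not of the proposed local-mean method, whose details the abstract does not give. The function name `condensed_nearest_neighbor`, the `order` parameter, and the one-dimensional toy data are assumptions made for illustration.

```python
import math


def condensed_nearest_neighbor(samples, labels, order=None):
    """Baseline CNN sketch: keep a sample only if the current
    prototype set misclassifies it under the 1-NN rule.

    Returns the indices of the selected prototypes, in selection order.
    """
    if order is None:
        order = list(range(len(samples)))
    prototypes = [order[0]]  # seed with the first sample that is read
    changed = True
    while changed:  # repeat passes until no sample is added
        changed = False
        for i in order:
            if i in prototypes:
                continue
            # nearest prototype under Euclidean distance
            nearest = min(
                prototypes,
                key=lambda p: math.dist(samples[i], samples[p]),
            )
            if labels[nearest] != labels[i]:
                prototypes.append(i)  # misclassified, so absorb it
                changed = True
    return prototypes


# Hypothetical toy data: two well-separated classes on a line.
samples = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)]
labels = [0, 0, 0, 0, 1, 1, 1]

forward = condensed_nearest_neighbor(samples, labels)
backward = condensed_nearest_neighbor(
    samples, labels, order=list(range(len(samples) - 1, -1, -1))
)
print("forward read order :", forward)
print("reversed read order:", backward)
```

On this toy data the two read orders yield different prototype sets of the same size, which illustrates the order sensitivity that the proposed method is meant to reduce.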