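Abstract (Chinese version):
A new manifold distance is adopted as the similarity measure, and an artificial immune system method for unsupervised classification and recognition is proposed. Through the manifold-distance-based similarity measure, the method makes effective use of the inherent global consistency of the sample set and fully exploits the spatial distribution of unlabeled samples to partition them into classes. The new method models the immune response as a quadruple AIR=(G,I,R,A), where G is the external stimulus that triggers the immune response, i.e., the antigen; I is the set of all possible antibodies; R is the set of rules governing the interactions between antibodies; and A is the dynamic algorithm that governs antibody reactions and guides antibody evolution. For unsupervised classification, each antibody is encoded as a sequence of the indices of the typical samples that represent the classes, and the dynamic algorithm A searches for the best combination of typical samples to represent the classes. The new method is compared with the standard K-means algorithm, a manifold-distance-based evolutionary clustering algorithm, and the genetic-algorithm-based clustering algorithm proposed by Maulik et al. Simulation results on six artificial datasets and a handwritten digit recognition problem show that the new method achieves high accuracy and good robustness on unsupervised classification problems with complex sample distributions and on practical pattern recognition problems.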
Abstract:
In this study, an artificial immune system algorithm for unsupervised classification and recognition is proposed. It uses a novel manifold-distance-based dissimilarity measure, which measures the geodesic distance along the manifold. The new method formulates the immune response as a quadruple AIR=(G,I,R,A), where G denotes the exterior stimulus, or antigen; I denotes the set of valid antibodies; R denotes the set of reaction rules describing the interactions between antibodies; and A denotes the dynamical algorithm describing how the reaction rules are applied to the antibody population. To solve unsupervised classification problems, the new method encodes each antibody as a sequence of integers, each the index of a sample that serves as a cluster representative, and uses the dynamical algorithm A to search for the optimal set of cluster representatives as a combinatorial optimization problem. The algorithm is compared with the K-means algorithm, a genetic-algorithm-based clustering method proposed by Maulik et al., and an evolutionary clustering algorithm that uses the manifold distance. Experimental results on six artificial datasets with different manifold structures and on the USPS handwritten digit dataset show that the new algorithm is able to identify complex non-convex clusters.
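The antibody encoding described above can be illustrated with a minimal sketch. It assumes a precomputed dissimilarity matrix and assigns each sample to the representative it is least dissimilar to. The function names (`pairwise_dissimilarity`, `decode_antibody`, `total_deviation`) are hypothetical, the Euclidean distance is only a stand-in for the manifold distance (which the abstract does not define), and the sum-of-dissimilarities objective is an assumed way to compare antibodies, not necessarily the one used by the dynamical algorithm A.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def pairwise_dissimilarity(points: Sequence[Point]) -> List[List[float]]:
    # Stand-in only: Euclidean distance replaces the manifold distance,
    # whose definition is not given in the abstract.
    return [[math.dist(p, q) for q in points] for p in points]


def decode_antibody(antibody: Sequence[int],
                    dissim: Sequence[Sequence[float]]) -> List[int]:
    # An antibody is a sequence of sample indices, one per cluster.
    # Each sample receives the label of its least dissimilar representative.
    labels = []
    for row in dissim:
        best = min(range(len(antibody)), key=lambda k: row[antibody[k]])
        labels.append(best)
    return labels


def total_deviation(antibody: Sequence[int],
                    dissim: Sequence[Sequence[float]]) -> float:
    # Assumed objective: sum over samples of the dissimilarity to the
    # closest representative (lower is better).
    return sum(min(row[r] for r in antibody) for row in dissim)


# Two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.2)]
D = pairwise_dissimilarity(points)

antibody = [0, 3]  # sample 0 represents cluster 0, sample 3 represents cluster 1
labels = decode_antibody(antibody, D)
print(labels)
print(total_deviation(antibody, D), total_deviation([0, 1], D))
```

In this sketch the antibody `[0, 3]` places one representative in each group and therefore scores a lower total deviation than `[0, 1]`, which places both representatives in the same group. A search procedure over such index sequences would prefer the former.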