An Improved Adaptive Cluster Ensemble Selection Approach

XU Sen, GAO Jun, HUA Xiao-Peng, LI Xian-Feng, XU Jing

Citation: XU Sen, GAO Jun, HUA Xiao-Peng, LI Xian-Feng, XU Jing. An Improved Adaptive Cluster Ensemble Selection Approach. ACTA AUTOMATICA SINICA, 2018, 44(11): 2103-2112. doi: 10.16383/j.aas.2018.c170376

doi: 10.16383/j.aas.2018.c170376
Funds: 

Natural Science Foundation of the Jiangsu Higher Education Institutions of China 18KJB520050

Open Project of Jiangsu Key Laboratory of Media Design and Software Technology (Jiangnan University) 18ST0201

National Natural Science Foundation of China 61105057

Industry-Education-Research Prospective Project of Jiangsu Province BY2016065-01

Natural Science Foundation of Jiangsu Province BK20151299

National Natural Science Foundation of China 61375001

More Information
    Author Bio:

    GAO Jun  Professor at the School of Information Engineering, Yancheng Institute of Technology. His research interests cover machine learning and artificial intelligence. E-mail: gaoj@ycit.cn

    HUA Xiao-Peng  Associate professor at the School of Information Engineering, Yancheng Institute of Technology. His research interests cover machine learning and artificial intelligence. E-mail: huaxp@ycit.cn

    LI Xian-Feng  Associate professor at the School of Information Engineering, Yancheng Institute of Technology. His research interests cover machine learning and artificial intelligence. E-mail: lxf@ycit.cn

    XU Jing  Associate professor at the School of Information Engineering, Yancheng Institute of Technology. Her research interests cover machine learning and artificial intelligence. E-mail: xujingycit@163.com

    Corresponding author:

    XU Sen  Associate professor at the School of Information Engineering, Yancheng Institute of Technology. His research interests cover machine learning, artificial intelligence and document mining. Corresponding author of this paper. E-mail: xusen@ycit.cn
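  • Abstract: The adaptive cluster ensemble selection (ACES) method judges the stability of a cluster ensemble in a way that is not objective and selects cluster members in a way that is not sufficiently reasonable. To address both problems, an improved adaptive cluster ensemble selection method (Improved ACES, IACES) is proposed. IACES judges the stability of a cluster ensemble by its overall average normalized mutual information (NMI) value. If the ensemble is stable, IACES selects cluster members with high quality and moderate diversity; otherwise it selects cluster members with high quality. Experimental results on multiple benchmark datasets confirm the effectiveness of IACES: 1) IACES judges the stability of a cluster ensemble accurately, whereas ACES misjudges some unstable cluster ensembles as stable; 2) compared with other cluster member selection methods, combining the cluster members selected by IACES yields better clustering results in the vast majority of cases and better average clustering results on all datasets.
    1)  Responsible editorial board member for this paper: ZHAO Tie-Jun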
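The decision rule summarized in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the 0.5 stability threshold, the geometric-mean NMI normalization, the use of mean NMI with the other members as a quality proxy, and the "closest to the median diversity" rule standing in for "moderate diversity".

```python
from collections import Counter
from itertools import combinations
from math import log, sqrt


def nmi(a, b):
    """NMI of two labelings, normalized by sqrt(H(a) * H(b)) (assumed form)."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    ha = -sum(c / n * log(c / n) for c in ca.values())
    hb = -sum(c / n * log(c / n) for c in cb.values())
    if ha == 0.0 or hb == 0.0:
        # Degenerate single-cluster labeling: identical only if both are trivial.
        return 1.0 if ha == hb else 0.0
    mi = sum(c / n * log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())
    return mi / sqrt(ha * hb)


def average_pairwise_nmi(ensemble):
    """Mean NMI over all unordered pairs of cluster members."""
    pairs = list(combinations(ensemble, 2))
    return sum(nmi(p, q) for p, q in pairs) / len(pairs)


def select_members(ensemble, n_select, threshold=0.5):
    """Hypothetical selection rule; returns (stable, chosen member indices)."""
    m = len(ensemble)
    # Quality proxy (assumption): mean NMI of a member with all other members.
    quality = [
        sum(nmi(ensemble[i], ensemble[j]) for j in range(m) if j != i) / (m - 1)
        for i in range(m)
    ]
    stable = average_pairwise_nmi(ensemble) >= threshold
    order = sorted(range(m), key=lambda i: quality[i], reverse=True)
    if not stable:
        # Unstable ensemble: keep the highest-quality members only.
        return stable, order[:n_select]
    # Stable ensemble: restrict to the higher-quality half, then prefer members
    # whose diversity (1 - quality) is closest to the median of that pool.
    pool = order[: max(n_select, (m + 1) // 2)]
    diversity = {i: 1.0 - quality[i] for i in pool}
    median = sorted(diversity.values())[len(pool) // 2]
    pool.sort(key=lambda i: abs(diversity[i] - median))
    return stable, pool[:n_select]
```

Calling `select_members(ensemble, n_select)` on a list of label vectors returns the stability verdict together with the indices of the chosen members, which could then be passed to any consensus function.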
  • Fig. 1  Framework of selective cluster ensemble system

    Fig. 2  Clustering results obtained when using cluster ensemble P1 (NMI scores and F measures)

    Fig. 3  Clustering results obtained when using cluster ensemble P2 (NMI scores and F measures)

    Fig. 4  Clustering results obtained by combining cluster members selected by ACES and IACES via CSPA, AL, SC and KM++ when using cluster ensemble P1 (Total average NMI scores and total average F measures)

    Fig. 5  Clustering results obtained by combining cluster members selected by ACES and IACES via CSPA, AL, SC and KM++ when using cluster ensemble P1 (Total average NMI scores and total average F measures)

    Table 1  Description of datasets

    Dataset  $n_{d}$  $n_{w}$  $k^{\ast}$  $n_{c}$  Balance
    tr11     414      6429     9           46       0.046
    tr23     204      5832     6           34       0.066
    tr41     878      7454     10          88       0.037
    tr45     690      8261     10          69       0.088
    la1      3204     31472    6           534      0.290
    la2      3075     31472    6           543      0.274
    la12     6279     31472    6           1047     0.282
    hitech   2301     10080    6           384      0.192
    reviews  4069     18483    5           914      0.098
    sports   8580     14870    7           1226     0.036
    classic  7094     41681    4           1774     0.323
    k1b      2340     21839    6           390      0.043
    ng3      2998     15810    3           999      0.998

    Table 2  Stability results of cluster ensemble according to ACES and IACES

             Cluster ensemble $P_{1}$                                Cluster ensemble $P_{2}$
             ACES                      IACES                         ACES                      IACES
    Dataset  MNMI   Number  Stability  TANMI  Proportion  Stability  MNMI   Number  Stability  TANMI  Proportion  Stability
    tr11     0.655  989     S          0.539  0.7498      S          0.682  940     S          0.574  0.8384      S
    tr23     0.663  991     S          0.607  0.9361      S          0.712  904     S          0.649  0.8736      S
    tr41     0.731  999     S          0.642  0.9939      S          0.732  959     S          0.649  0.8922      S
    tr45     0.718  1000    S          0.640  0.9917      S          0.705  922     S          0.616  0.8121      S
    la1      0.597  863     S          0.514  0.5553      S          0.592  894     S          0.541  0.6879      S
    la2      0.593  934     S          0.524  0.6296      S          0.539  735     S          0.489  0.4374      NS
    la12     0.634  973     S          0.558  0.7586      S          0.570  838     S          0.493  0.4938      NS
    hitech   0.551  727     S          0.475  0.3251      NS         0.537  654     S          0.458  0.2602      NS
    reviews  0.683  940     S          0.610  0.8480      S          0.672  958     S          0.608  0.7622      S
    sports   0.736  998     S          0.652  0.9637      S          0.651  958     S          0.585  0.7443      S
    classic  0.801  966     S          0.692  0.8375      S          0.709  945     S          0.594  0.7500      S
    k1b      0.673  994     S          0.585  0.8992      S          0.654  969     S          0.555  0.7811      S
    ng3      0.541  664     S          0.451  0.3791      NS         0.525  648     S          0.467  0.4441      NS
Publication history
  • Received:  2017-03-17
  • Accepted:  2017-11-06
  • Published:  2018-11-20
