A Hierarchical Multi-relational Clustering Algorithm Based on IDEF1x
-
Abstract: Multi-relational clustering still faces several problems for which no effective solution has yet been reported. When the information carried by one-to-many relationships is extracted with statistical methods, the original characteristics of the data may be ignored, and loops formed by the relationships between tables may cause the same information to be used more than once. We argue that these problems can be addressed by rebuilding the model around the characteristics of the different relationship types in the IDEF1x model. We therefore build, on the basis of the IDEF1x model, a framework for a hierarchical model of the relationships between tables in a multi-relational dataset, define how each kind of relationship in the framework affects the transfer of clustering results, and define a method for integrating the clustering results of multiple child nodes. On this basis, a new multi-relational clustering algorithm is proposed. Experiments on real-world and synthetic datasets show that, compared with single-relational clustering algorithms and the multi-relational clustering algorithms used for comparison, the proposed algorithm obtains more accurate clustering results.
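The abstract does not spell out how the hierarchical model is constructed. The keyword "shortest path", together with the stated goal of preventing loops between tables from reusing information, suggests one plausible construction, sketched below in Python under explicit assumptions: treat tables as nodes and relationships as undirected edges, run a breadth-first search from the target table, and keep only the first (fewest-hops) path to each table, so the resulting hierarchy contains no loops. The function name `build_hierarchy`, the relationship labels, and the example schema are hypothetical; this is not the authors' published procedure.

```python
from collections import deque

# Hypothetical sketch: arrange the tables of a multi-relational dataset into a
# tree rooted at the target table. Relationship labels reuse IDEF1x terms
# ("identifying", "non-identifying"), but how the paper uses them is assumed.


def build_hierarchy(tables, relationships, target):
    """Return (tree, order) for the tables reachable from `target`.

    tables:        iterable of table names
    relationships: iterable of (table_a, table_b, kind) tuples, one per link
    target:        name of the table whose tuples are to be clustered

    tree maps each non-root table to (parent_table, kind); order lists the
    non-root tables in breadth-first visit order (parents before children).
    """
    adjacency = {t: [] for t in tables}
    for a, b, kind in relationships:
        # Treat each link as undirected so it can be followed either way.
        adjacency[a].append((b, kind))
        adjacency[b].append((a, kind))

    visited = {target}
    tree, order = {}, []
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for neighbor, kind in adjacency[node]:
            # BFS reaches each table first along a fewest-hops path. An edge to
            # an already visited table is either the tree edge back to the
            # parent or a link that would close a loop; both are skipped.
            if neighbor not in visited:
                visited.add(neighbor)
                tree[neighbor] = (node, kind)
                order.append(neighbor)
                queue.append(neighbor)
    return tree, order


# Hypothetical example schema with one loop:
# Student - Registration - Course - Professor - Student.
tables = ["Student", "Registration", "Course", "Professor"]
relationships = [
    ("Student", "Registration", "identifying"),
    ("Course", "Registration", "identifying"),
    ("Professor", "Course", "non-identifying"),
    ("Professor", "Student", "non-identifying"),
]
tree, order = build_hierarchy(tables, relationships, "Student")
# The Course-Professor link is dropped, so each table has exactly one parent.
bottom_up = list(reversed(order))  # children are processed before parents
print(tree)
print(bottom_up)
```

Processing tables in reverse breadth-first order visits every child before its parent, which is the order in which child clustering results would be passed up the hierarchy. How each relationship kind affects that transfer, and how sibling results are integrated, is defined in the paper and is not reproduced in this sketch.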
-
Keywords:
- Multi-relational clustering
- IDEF1x model
- shortest path
- transfer results