An Algorithm for Mining High Utility Patterns Based on Pattern-growth
-
摘要: 高效用模式挖掘是数据挖掘领域的一个重要研究内容; 由于其计算过程包含对模式的内、外效用值的处理, 计算复杂度较大, 因此挖掘算法的主要研究热点问题就是提高算法的时间效率.针对此问题, 本文给出一个基于模式增长方式的高效用模式挖掘算法HUPM-FP, 该算法可以从全局树上挖掘高效用模式, 避免产生候选项集.实验中, 采用6个典型数据集进行实验, 并和目前效率较好的算法FHM (Faster high-utility itemset mining)做了对比, 实验结果表明本文给出的算法时空效率都有较大的提高, 特别是时间效率提高较大, 可以达到1个数量级以上.Abstract: High utility pattern mining is an important research topic in data mining. Because of the additional inner/outer utility processing workload, its computational complexity increases, and the improvement of its temporal efficiency is vital. To address this issue, a new pattern-growth mining algorithm is proposed for high utility pattern mining named HUPM-FP. This algorithm can mine high utility patterns from a global tree without generating candidate itemsets. Six classical datasets were used in our experiments for comparing with the state-of-art algorithm faster high-utility itemset mining (FHM). The proposed HUPM-FP out-performed its counterpart significantly, especially for time efficiency, which was up to 1 order of magnitude faster.
-
Key words:
- High utility pattern /
- frequent pattern /
- frequent itemset /
- data mining
-
[1] Zhang Xiao-Jian, Wang Miao, Meng Xiao-Feng. An accurate method for mining top-k frequent pattern under differential privacy. Journal of Computer Research and Development, 2014, 51(1): 104-114(张啸剑, 王淼, 孟小峰. 差分隐私保护下一种精确挖掘top-k频繁模式方法. 计算机研究与发展, 2014, 51(1): 104-114) [2] Mao Yu-Xing, Shi Bai-Le. An incremental method for mining generalized association rules based on extended canonical-order tree. Journal of Computer Research and Development, 2012, 49(3): 598-606(毛宇星, 施伯乐. 基于扩展自然序树的概化关联规则增量挖掘方法. 计算机研究与发展, 2012, 49(3): 598-606) [3] Wu Feng, Zhong Yan, Wu Quan-Yuan. Mining frequent patterns over data stream under the time decaying model. Acta Automatica Sinica, 2010, 36(5): 674-684(吴枫, 仲妍, 吴泉源. 基于时间衰减模型的数据流频繁模式挖掘. 自动化学报, 2010, 36(5): 674-684) [4] Li Hai-Feng, Zhang Ning, Zhu Jian-Ming, Cao Huai-Hu. Frequent itemset mining over time-sensitive streams. Chinese Journal of Computers, 2012, 35(11): 2283-2293(李海峰, 章宁, 朱建明, 曹怀虎. 时间敏感数据流上的频繁项集挖掘算法. 计算机学报, 2012, 351333(11): 2283-2293) [5] Pan Yun-He, Wang Jin-Long, Xu Cong-Fu. State-of-the-art on frequent pattern mining in data streams. Acta Automatica Sinica, 2006, 32(4): 594-602(潘云鹤, 王金龙, 徐从富. 数据流频繁模式挖掘研究进展. 自动化学报, 2006, 32(4): 594-602) [6] Chen Yin, Shan Si-Qing, Liu Lu, Li Yan. Minimum-redundant and lossless association rule-set representation. Acta Automatica Sinica, 2008, 34(12): 1490-1496(陈茵, 闪四清, 刘鲁, 李岩. 最小冗余的无损关联规则集表述. 自动化学报, 2008, 34(12): 1490-1496) [7] Krishnamoorthy S. Pruning strategies for mining high utility itemsets. Expert Systems with Applications, 2015, 42(5): 2371-2381 [8] Lan G C, Hong T P, Tseng V S, Wang S L. Applying the maximum utility measure in high utility sequential pattern mining. Expert Systems with Applications, 2014, 41(11): 5071-5081 [9] Lin C W, Hong T P, Lan G C, Wong J W, Lin W Y. Efficient updating of discovered high-utility itemsets for transaction deletion in dynamic databases. Advanced Engineering Informatics, 2015, 29(1): 16-27 [10] Lin C W, Lan G C, Hong T P. Mining high utility itemsets for transaction deletion in a dynamic database. Intelligent Data Analysis , 2015, 19(1): 43-255 [11] Manike C, Om H. Sliding-window based method to discover high utility patterns from data streams. Computational Intelligence in Data Mining. India: Springer, 2015. 173-184 [12] Yun U, Ryang H, Ryu K H. High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates. Expert Systems with Applications, 2014, 41(8): 3861-3878 [13] Zihayat M, An A J. Mining top-k high utility patterns over data streams. Information Sciences, 2014, 285: 138-161 [14] Fournier-Viger P, Wu C W, Zida S, Tseng V S. FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. Foundations of Intelligent Systems. Switzerland: Springer, 2014. 83-92 [15] Yao H, Hamilton H J, Butz G J. A foundational approach to mining itemset utilities from databases. In: Proceedings of the 4th SIAM International Conference on Data Mining (ICDM 2004). Lake Buena Vista, FL, United States: Springer, 2004. 482-486 [16] Liu Y, Liao W K, Choudhary A. A two-phase algorithm for fast discovery of high utility itemsets. Advances in Knowledge Discovery and Data Mining: Proceedings of the 9th Pacific-Asia Conference, PAKDD 2005, Hanoi, Vietnam. Berlin Heidelberg: Springer-Verlag, 2005. 689-695 [17] Erwin A, Gopalan R P, Achuthan N R. CTU-mine: an efficient high utility itemset mining algorithm using the pattern growth approach. In: Proceedings of the 7th IEEE International Conference on Computer and Information Technology. Aizu-Wakamatsu, Fukushima, Japan: IEEE, 2007. 71-76 [18] Ahmed C F, Tanbeer S K, Jeong B S, Lee Y K. Efficient tree structures for high utility pattern mining in incremental databases. IEEE Transactions on Knowledge and Data Engineering , 2009, 21(12): 1708-1721 [19] Tseng V S, Shie B E, Wu C W, Yu P S. Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Transactions on Knowledge and Data Engineering , 2013, 25(8): 1772-1786 [20] Tseng V S, Wu C W, Shie B E, Yu P S. UP-growth: an efficient algorithm for high utility itemset mining. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington D.C., United States: ACM, 2010. 253-262 [21] Liu M C, Qu J F. Mining high utility itemsets without candidate generation. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012). Maui, HI, United States: Association for Computing Machinery, 2012. 55-64
点击查看大图
计量
- 文章访问数: 1787
- HTML全文浏览量: 64
- PDF下载量: 1167
- 被引次数: 0