基于模式增长方式的高效用模式挖掘算法

王乐; 熊松泉; 常艳芬; 王水

doi:10.16383/j.aas.2015.c150056

基于模式增长方式的高效用模式挖掘算法

doi: 10.16383/j.aas.2015.c150056 cstr: 32138.14.j.aas.2015.c150056

1.
宁波大红鹰学院信息工程学院宁波 315175

基金项目:

宁波市自然科学基金和攻关项目(2013A610115,2014A610073,2013C10010),浙江省教育厅一般科研项目(Y201432717),宁波大红鹰学院大宗商品专项课题(1320133004)资助

详细信息

作者简介:
王乐宁波大红鹰学院信息工程学院讲师.2013年获得大连理工大学博士学位.主要研究方向为数据挖掘.E-mail:wangleboro@163.com

熊松泉宁波大红鹰学院信息工程学院副教授.2012年获得同济大学硕士学位.主要研究方向为数据挖掘和嵌入式系统.E-mail:yisan-sky@qq.com

常艳芬宁波大红鹰学院信息工程学院讲师.2009年获得同济大学硕士学位.主要研究方向为数据挖掘和软件工程.E-mail:cyf511@163.com

通讯作者:
王水宁波大红鹰学院信息工程学院教授.1989年获得兰州大学学士学位.主要研究方向为数据挖掘和软件工程.本文通信作者.E-mail:seawan@163.com

计量
- 文章访问数: 1986
- HTML全文浏览量: 65
- PDF下载量: 1191
- 被引次数: 0
出版历程
- 收稿日期: 2015-01-30
- 修回日期: 2015-06-07
- 刊出日期: 2015-09-20

An Algorithm for Mining High Utility Patterns Based on Pattern-growth

1.
School of Information Engineering, Ningbo Dahongying University, Ningbo 315175

Funds:

Supported by Ningbo Natural Science Foundation & Research Program (2013A610115, 2014A610073, 2013C10010), General Scientific Research Fund of Zhejiang Provincial Education Department (Y201432717),and Bulk Commodity Special Fund of Ningbo Dahongying University (1320133004)

摘要

摘要: 高效用模式挖掘是数据挖掘领域的一个重要研究内容; 由于其计算过程包含对模式的内、外效用值的处理, 计算复杂度较大, 因此挖掘算法的主要研究热点问题就是提高算法的时间效率.针对此问题, 本文给出一个基于模式增长方式的高效用模式挖掘算法HUPM-FP, 该算法可以从全局树上挖掘高效用模式, 避免产生候选项集.实验中, 采用6个典型数据集进行实验, 并和目前效率较好的算法FHM (Faster high-utility itemset mining)做了对比, 实验结果表明本文给出的算法时空效率都有较大的提高, 特别是时间效率提高较大, 可以达到1个数量级以上.
- 高效用模式 /
- 频繁模式 /
- 频繁项集 /
- 数据挖掘
Abstract: High utility pattern mining is an important research topic in data mining. Because of the additional inner/outer utility processing workload, its computational complexity increases, and the improvement of its temporal efficiency is vital. To address this issue, a new pattern-growth mining algorithm is proposed for high utility pattern mining named HUPM-FP. This algorithm can mine high utility patterns from a global tree without generating candidate itemsets. Six classical datasets were used in our experiments for comparing with the state-of-art algorithm faster high-utility itemset mining (FHM). The proposed HUPM-FP out-performed its counterpart significantly, especially for time efficiency, which was up to 1 order of magnitude faster.
- High utility pattern /
- frequent pattern /
- frequent itemset /
- data mining

HTML全文

参考文献(21)

[1]	Zhang Xiao-Jian, Wang Miao, Meng Xiao-Feng. An accurate method for mining top-k frequent pattern under differential privacy. Journal of Computer Research and Development, 2014, 51(1): 104-114(张啸剑, 王淼, 孟小峰. 差分隐私保护下一种精确挖掘top-k频繁模式方法. 计算机研究与发展, 2014, 51(1): 104-114)
[2]	Mao Yu-Xing, Shi Bai-Le. An incremental method for mining generalized association rules based on extended canonical-order tree. Journal of Computer Research and Development, 2012, 49(3): 598-606(毛宇星, 施伯乐. 基于扩展自然序树的概化关联规则增量挖掘方法. 计算机研究与发展, 2012, 49(3): 598-606)
[3]	Wu Feng, Zhong Yan, Wu Quan-Yuan. Mining frequent patterns over data stream under the time decaying model. Acta Automatica Sinica, 2010, 36(5): 674-684(吴枫, 仲妍, 吴泉源. 基于时间衰减模型的数据流频繁模式挖掘. 自动化学报, 2010, 36(5): 674-684)
[4]	Li Hai-Feng, Zhang Ning, Zhu Jian-Ming, Cao Huai-Hu. Frequent itemset mining over time-sensitive streams. Chinese Journal of Computers, 2012, 35(11): 2283-2293(李海峰, 章宁, 朱建明, 曹怀虎. 时间敏感数据流上的频繁项集挖掘算法. 计算机学报, 2012, 351333(11): 2283-2293)
[5]	Pan Yun-He, Wang Jin-Long, Xu Cong-Fu. State-of-the-art on frequent pattern mining in data streams. Acta Automatica Sinica, 2006, 32(4): 594-602(潘云鹤, 王金龙, 徐从富. 数据流频繁模式挖掘研究进展. 自动化学报, 2006, 32(4): 594-602)
[6]	Chen Yin, Shan Si-Qing, Liu Lu, Li Yan. Minimum-redundant and lossless association rule-set representation. Acta Automatica Sinica, 2008, 34(12): 1490-1496(陈茵, 闪四清, 刘鲁, 李岩. 最小冗余的无损关联规则集表述. 自动化学报, 2008, 34(12): 1490-1496)
[7]	Krishnamoorthy S. Pruning strategies for mining high utility itemsets. Expert Systems with Applications, 2015, 42(5): 2371-2381
[8]	Lan G C, Hong T P, Tseng V S, Wang S L. Applying the maximum utility measure in high utility sequential pattern mining. Expert Systems with Applications, 2014, 41(11): 5071-5081
[9]	Lin C W, Hong T P, Lan G C, Wong J W, Lin W Y. Efficient updating of discovered high-utility itemsets for transaction deletion in dynamic databases. Advanced Engineering Informatics, 2015, 29(1): 16-27
[10]	Lin C W, Lan G C, Hong T P. Mining high utility itemsets for transaction deletion in a dynamic database. Intelligent Data Analysis , 2015, 19(1): 43-255
[11]	Manike C, Om H. Sliding-window based method to discover high utility patterns from data streams. Computational Intelligence in Data Mining. India: Springer, 2015. 173-184
[12]	Yun U, Ryang H, Ryu K H. High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates. Expert Systems with Applications, 2014, 41(8): 3861-3878
[13]	Zihayat M, An A J. Mining top-k high utility patterns over data streams. Information Sciences, 2014, 285: 138-161
[14]	Fournier-Viger P, Wu C W, Zida S, Tseng V S. FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. Foundations of Intelligent Systems. Switzerland: Springer, 2014. 83-92
[15]	Yao H, Hamilton H J, Butz G J. A foundational approach to mining itemset utilities from databases. In: Proceedings of the 4th SIAM International Conference on Data Mining (ICDM 2004). Lake Buena Vista, FL, United States: Springer, 2004. 482-486
[16]	Liu Y, Liao W K, Choudhary A. A two-phase algorithm for fast discovery of high utility itemsets. Advances in Knowledge Discovery and Data Mining: Proceedings of the 9th Pacific-Asia Conference, PAKDD 2005, Hanoi, Vietnam. Berlin Heidelberg: Springer-Verlag, 2005. 689-695
[17]	Erwin A, Gopalan R P, Achuthan N R. CTU-mine: an efficient high utility itemset mining algorithm using the pattern growth approach. In: Proceedings of the 7th IEEE International Conference on Computer and Information Technology. Aizu-Wakamatsu, Fukushima, Japan: IEEE, 2007. 71-76
[18]	Ahmed C F, Tanbeer S K, Jeong B S, Lee Y K. Efficient tree structures for high utility pattern mining in incremental databases. IEEE Transactions on Knowledge and Data Engineering , 2009, 21(12): 1708-1721
[19]	Tseng V S, Shie B E, Wu C W, Yu P S. Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Transactions on Knowledge and Data Engineering , 2013, 25(8): 1772-1786
[20]	Tseng V S, Wu C W, Shie B E, Yu P S. UP-growth: an efficient algorithm for high utility itemset mining. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington D.C., United States: ACM, 2010. 253-262
[21]	Liu M C, Qu J F. Mining high utility itemsets without candidate generation. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012). Maui, HI, United States: Association for Computing Machinery, 2012. 55-64