Cost-sensitive Ensemble Learning Algorithm for Multi-label Classification Problems
-
Abstract: Although a multi-label classification problem can be converted into a multi-class classification problem, it is difficult to convert a multi-label cost-sensitive classification problem into a multi-class cost-sensitive one. Based on an analysis of the problems encountered when extending multi-class cost-sensitive learning algorithms to multi-label cost-sensitive learning, a cost-sensitive ensemble learning algorithm for multi-label classification problems is proposed. The average misclassification cost of the algorithm is the sum of the fall-out cost (for falsely detected labels) and the omission cost (for missed labels). The procedure of the new algorithm is similar to that of adaptive boosting (AdaBoost): it automatically learns a number of weak classifiers and combines them into a strong classifier, whose average misclassification cost decreases gradually as weak classifiers are added. The differences between the proposed algorithm and the cost-sensitive AdaBoost algorithm for multi-class classification problems are analyzed in detail, including the basis for the output labels and the meaning of the misclassification cost. Unlike in general multi-class cost-sensitive classification problems, the misclassification costs of a multi-label cost-sensitive classification problem are subject to certain restrictions, and these restrictions are analyzed and given explicitly. By simplifying the proposed algorithm, a multi-label AdaBoost algorithm and a multi-class cost-sensitive AdaBoost algorithm are obtained. Theoretical analysis and experimental results show that the proposed multi-label cost-sensitive ensemble learning algorithm is effective and can minimize the average misclassification cost. In particular, for multi-class problems in which the misclassification costs of different classes differ greatly, the proposed algorithm clearly outperforms existing multi-class cost-sensitive AdaBoost algorithms.
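The AdaBoost-style boosting scheme described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact algorithm: it assumes a binary-relevance simplification (one boosted ensemble per label, each over labels coded as +1/-1) and folds the omission cost and fall-out cost into the initial example weights, so that missing a relevant label and wrongly assigning an irrelevant label are penalized asymmetrically. The function names (`boost_label`, `fit_stump`) and the decision-stump weak learner are illustrative choices, not from the source.

```python
import numpy as np

def stump_predict(X, feat, thresh, polarity):
    """Decision stump: predict +1 if X[:, feat] > thresh (times polarity), else -1."""
    return polarity * np.where(X[:, feat] > thresh, 1, -1)

def fit_stump(X, y, w):
    """Pick the (feature, threshold, polarity) stump minimizing weighted error."""
    n, d = X.shape
    best, best_err = (0, 0.0, 1), np.inf
    for feat in range(d):
        for thresh in np.unique(X[:, feat]):
            for polarity in (1, -1):
                pred = stump_predict(X, feat, thresh, polarity)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (feat, thresh, polarity)
    return best, best_err

def boost_label(X, y, omission_cost, fallout_cost, n_rounds=10):
    """Cost-weighted AdaBoost for one label column y in {-1, +1}.

    Relevant examples (y = +1) start with weight proportional to the
    omission cost; irrelevant ones (y = -1) to the fall-out cost.
    """
    w = np.where(y == 1, omission_cost, fallout_cost).astype(float)
    w /= w.sum()
    ensemble = []
    for _ in range(n_rounds):
        (feat, thresh, pol), err = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # standard AdaBoost vote weight
        pred = stump_predict(X, feat, thresh, pol)
        w *= np.exp(-alpha * y * pred)          # up-weight misclassified examples
        w /= w.sum()
        ensemble.append((alpha, feat, thresh, pol))
    return ensemble

def predict_label(ensemble, X):
    """Strong classifier: sign of the weighted vote of the weak stumps."""
    score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.where(score > 0, 1, -1)
```

For a multi-label problem with label matrix Y of shape (n, L), one would call `boost_label` once per column and stack the per-label predictions; the paper's actual algorithm instead handles all labels jointly so that the combined fall-out-plus-omission cost is minimized directly.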