AUC Optimization Boosting Based on Data Rebalance
-
Abstract: The area under the receiver operating characteristic (ROC) curve (AUC) is commonly used to measure a classifier's overall performance across the whole range of class prior distributions. The original boosting algorithm optimizes classification accuracy and is therefore not optimal under the AUC measure. This paper proposes an improved boosting algorithm that optimizes the AUC: by introducing a data rebalance operation into the original boosting iterations, the optimization objective of the weak learning algorithm is shifted from accuracy to AUC. Experimental results show that the new algorithm achieves better performance under the AUC measure than the original boosting algorithm.
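The abstract does not spell out the rebalance operation, so the following Python sketch is only one plausible reading, not the authors' algorithm. It assumes that "data rebalance" means rescaling the boosting weights before each weak-learner call so that the positive and negative classes each carry half of the total weight. The names `auc`, `rebalance`, `best_stump`, `boost`, and `score` are hypothetical, and the weak learner is a one-dimensional decision stump chosen purely for illustration. AUC is computed as the Wilcoxon-Mann-Whitney statistic.

```python
import math

def auc(scores, labels):
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    (positive, negative) pairs ranked correctly, ties counting 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rebalance(weights, labels):
    """ASSUMED form of the rebalance step: rescale the weights so that
    each class sums to 0.5. Requires both classes to be present."""
    pos_mass = sum(w for w, y in zip(weights, labels) if y == 1)
    neg_mass = sum(w for w, y in zip(weights, labels) if y == -1)
    return [0.5 * w / (pos_mass if y == 1 else neg_mass)
            for w, y in zip(weights, labels)]

def best_stump(xs, labels, weights):
    """Illustrative weak learner: the (threshold, sign) pair with the
    lowest weighted classification error on 1-D inputs."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, labels, weights)
                      if (sign if x >= t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def boost(xs, labels, rounds=5, balanced=True):
    """AdaBoost-style loop; with balanced=True the weak learner is
    trained on class-rebalanced weights in every round."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        train_w = rebalance(weights, labels) if balanced else weights
        err, t, sign = best_stump(xs, labels, train_w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, sign))
        # Exponential reweighting: misclassified samples gain weight.
        weights = [w * math.exp(-alpha * y * (sign if x >= t else -sign))
                   for x, y, w in zip(xs, labels, train_w)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def score(ensemble, x):
    """Real-valued ensemble output used to rank samples for AUC."""
    return sum(a * (s if x >= t else -s) for a, t, s in ensemble)
```

For example, calling `boost(xs, labels, balanced=False)` gives the plain accuracy-driven loop, and comparing `auc([score(m, x) for x in xs], labels)` for the two models is one way to see the effect of rebalancing on an imbalanced data set.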