Semi-supervised Traffic Identification Based on Affinity Propagation
Abstract: Accurate traffic classification is the foundation of network management, security monitoring, and application trend analysis. To address the shortcomings of fully supervised and unsupervised classification, we propose a semi-supervised Internet traffic classification method based on affinity propagation (AP). The method builds its classification model with affinity propagation clustering, which avoids the problem of choosing initial points and keeps the classifier simple to implement and efficient to run. Following the idea of semi-supervised learning, it abstracts constraints from a small number of labelled flows together with prior information about the manifold structure of the sample space, and defines a manifold-similarity distance measure. This reduces the effort of labelling traffic samples while improving classifier performance. Theoretical analysis and experimental results show that the algorithm achieves high classification accuracy and good cluster cohesion.
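The clustering step named in the abstract can be illustrated with a minimal sketch of standard affinity propagation message passing (responsibilities and availabilities, following Frey and Dueck). The abstract does not give the paper's manifold similarity or its exact handling of labelled flows, so everything specific here is an assumption made for illustration: the toy one-dimensional flow feature, the negative squared distance used as similarity, the helper `apply_constraints` (which raises the similarity of same-class labelled pairs to 0 and lowers different-class pairs to a large negative `penalty`), and the median preference.

```python
from statistics import median


def apply_constraints(S, must_link=(), cannot_link=(), penalty=-1e3):
    """Return a copy of S with pairwise constraints folded in.

    Hypothetical scheme, not the paper's: same-class pairs get the maximum
    similarity 0, different-class pairs get a large negative similarity.
    """
    S = [row[:] for row in S]
    for i, j in must_link:
        S[i][j] = S[j][i] = 0.0
    for i, j in cannot_link:
        S[i][j] = S[j][i] = penalty
    return S


def affinity_propagation(S, damping=0.7, max_iter=300):
    """Plain affinity propagation; returns the exemplar index chosen by each point."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            runner_up = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                target = S[i][k] - (runner_up if k == best else scores[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * target
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) for i' not in {i,k})
        # a(k,k) = sum of positive r(i',k) for i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    target = total
                else:
                    target = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * target
    # Each point picks the candidate exemplar maximising a(i,k) + r(i,k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy data: one made-up feature per flow, forming two well-separated groups.
flows = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
n = len(flows)
S = [[-(flows[i] - flows[j]) ** 2 for j in range(n)] for i in range(n)]

# Two labelled pairs (assumed): flows 0 and 1 share a class, flows 2 and 3 do not.
S = apply_constraints(S, must_link=[(0, 1)], cannot_link=[(2, 3)])

# Preference (self-similarity) set to the median off-diagonal similarity.
pref = median(S[i][j] for i in range(n) for j in range(n) if i != j)
for i in range(n):
    S[i][i] = pref

labels = affinity_propagation(S)
print(labels)
```

No initial cluster centres are supplied anywhere in the sketch; exemplars emerge from the message passing, which is the property the abstract relies on. A manifold-aware similarity would replace the negative squared distance used to fill `S`.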