Tracking concept drift in data streams has recently become a hot research topic in stream data mining. Most existing work detects concept drift from changes in the error rate within a single window. Because of this inherent limitation, the single-window mechanism has difficulty adapting to different types of drift. Motivated by this, this paper proposes a new classification algorithm based on a double-window mechanism (Double-windows-based classification algorithm for concept drifting data streams, DWCDS). DWCDS builds an ensemble classifier from random decision trees, uses the double-window mechanism to periodically detect changes in the distribution of the streaming data within a sliding window, and updates the model dynamically to adapt to concept drift. Analysis and extensive experiments on both synthetic and real-world data show that DWCDS can quickly and effectively track concept drift in noisy data streams, and that both its robustness to noise and its classification accuracy improve significantly.
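The abstract does not say how the two windows are defined or what statistic compares them, so the following is only a generic illustration of window-based distribution-change detection under stated assumptions, not the DWCDS algorithm itself. It assumes a "reference" window of older items and a "recent" window of the newest items, compares their empirical distributions with the total variation distance every `check_every` arrivals, and flags a drift when the distance exceeds `threshold`. The names `DoubleWindowDetector`, `total_variation`, `window_size`, `check_every`, and `threshold` are hypothetical, and the random-decision-tree ensemble and its update step are not modeled.

```python
from collections import Counter, deque
from collections.abc import Hashable


def total_variation(p: Counter, q: Counter) -> float:
    """Total variation distance between two empirical (count-based) distributions."""
    n_p, n_q = sum(p.values()), sum(q.values())
    if n_p == 0 or n_q == 0:
        return 0.0
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p[k] / n_p - q[k] / n_q) for k in keys)


class DoubleWindowDetector:
    """Hypothetical two-window drift check (an illustration, not DWCDS).

    Items enter the recent window; once it is full, its oldest item slides
    into the reference window. Every `check_every` arrivals, if the reference
    window is full, the two windows' distributions are compared.
    """

    def __init__(self, window_size: int = 100, check_every: int = 50,
                 threshold: float = 0.3) -> None:
        self.reference: deque = deque(maxlen=window_size)
        self.recent: deque = deque(maxlen=window_size)
        self.check_every = check_every
        self.threshold = threshold
        self.seen = 0

    def update(self, item: Hashable) -> bool:
        """Add one item; return True if a drift is flagged at this step."""
        # Move the oldest recent item into the reference window before it is evicted.
        if len(self.recent) == self.recent.maxlen:
            self.reference.append(self.recent[0])
        self.recent.append(item)
        self.seen += 1
        # Only compare periodically, and only once the reference window is full.
        if self.seen % self.check_every != 0 or len(self.reference) < self.reference.maxlen:
            return False
        distance = total_variation(Counter(self.reference), Counter(self.recent))
        if distance > self.threshold:
            # Assumed restart policy: empty both windows so the next comparison
            # is made entirely on data from the new concept.
            self.reference.clear()
            self.recent.clear()
            return True
        return False


if __name__ == "__main__":
    detector = DoubleWindowDetector()
    stream = ["a"] * 300 + ["b"] * 300  # abrupt change after item 300
    print([i for i, x in enumerate(stream, start=1) if detector.update(x)])  # [350]
```

In this toy stream the change is flagged once, at the 350th item, when half of the recent window already comes from the new concept; restarting both windows afterwards prevents the same change from being reported again. Comparing data distributions rather than a single window's error rate is the general idea the abstract describes; how DWCDS actually sizes its windows, what it compares, and how it then rebuilds the ensemble are not specified here.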