Mining concept-drifting data streams using ensemble classifiers

Haixun Wang, Wei Fan, Philip S. Yu, Jiawei Han

Research output: Contribution to conference (Paper)

Abstract

Mining data streams with concept drift for actionable insights has recently become an important and challenging task in a wide range of applications, including credit card fraud protection, target marketing, and network intrusion detection. Conventional knowledge discovery tools face two challenges: the overwhelming volume of streaming data and concept drift. In this paper, we propose a general framework for mining concept-drifting data streams using weighted ensemble classifiers. We train an ensemble of classification models, such as C4.5, RIPPER, and naive Bayesian classifiers, from sequential chunks of the data stream. The classifiers in the ensemble are weighted according to their expected classification accuracy on the test data under the time-evolving environment. Thus, the ensemble approach improves both the efficiency of learning the model and the accuracy of classification. Our empirical study shows that the proposed methods have a substantial advantage over single-classifier approaches in prediction accuracy, and that the ensemble framework is effective for a variety of classification models.
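
The chunk-based weighted ensemble described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the weighting formula, so this sketch assumes each classifier's accuracy on the most recent chunk is a reasonable proxy for its expected accuracy on upcoming data. The names `ThresholdStump`, `WeightedChunkEnsemble`, and `max_models` are hypothetical, and the one-dimensional threshold rule merely stands in for base learners such as C4.5 or RIPPER.

```python
class ThresholdStump:
    """Hypothetical base learner: a 1-D threshold rule (stand-in for C4.5, RIPPER, ...)."""

    def fit(self, xs, ys):
        # Try every observed value as a threshold, in both orientations,
        # and keep the rule with the most correct training predictions.
        best = (-1, 0.0, 1)
        for t in sorted(set(xs)):
            for sign in (1, -1):
                correct = sum(1 for x, y in zip(xs, ys)
                              if self._rule(x, t, sign) == y)
                if correct > best[0]:
                    best = (correct, t, sign)
        _, self.t, self.sign = best
        return self

    @staticmethod
    def _rule(x, t, sign):
        # sign = 1 predicts class 1 when x >= t; sign = -1 when x <= t.
        return 1 if sign * (x - t) >= 0 else 0

    def predict(self, x):
        return self._rule(x, self.t, self.sign)


def accuracy(model, xs, ys):
    """Fraction of examples in (xs, ys) that the model labels correctly."""
    return sum(model.predict(x) == y for x, y in zip(xs, ys)) / len(xs)


class WeightedChunkEnsemble:
    """Keeps at most max_models classifiers, one trained per data chunk."""

    def __init__(self, max_models=3):
        self.max_models = max_models
        self.members = []  # list of (weight, model) pairs

    def update(self, xs, ys):
        # Train one new classifier on the newest chunk.
        models = [m for _, m in self.members] + [ThresholdStump().fit(xs, ys)]
        # Assumption: re-weight every classifier by its accuracy on the newest
        # chunk. For the new model this is training accuracy, which is optimistic.
        scored = [(accuracy(m, xs, ys), m) for m in models]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        self.members = scored[:self.max_models]

    def predict(self, x):
        # Weighted vote: each classifier adds its weight to the label it predicts.
        votes = {}
        for weight, model in self.members:
            label = model.predict(x)
            votes[label] = votes.get(label, 0.0) + weight
        return max(votes, key=votes.get)
```

Calling `update` once per incoming chunk keeps the ensemble current: classifiers trained on an outdated concept score poorly on the newest chunk, so their votes carry little weight and they are eventually dropped once more than `max_models` classifiers exist.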

Original language: English (US)
Pages: 226-235
Number of pages: 10
DOI: https://doi.org/10.1145/956750.956778
State: Published - Dec 1 2003
Event: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03 - Washington, DC, United States
Duration: Aug 24 2003 to Aug 27 2003

Fingerprint

Classifiers
Data mining
Intrusion detection
Marketing

Keywords

  • Classifier
  • Classifier ensemble
  • Concept drift
  • Data streams

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Wang, H., Fan, W., Yu, P. S., & Han, J. (2003). Mining concept-drifting data streams using ensemble classifiers. 226-235. Paper presented at 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, Washington, DC, United States. https://doi.org/10.1145/956750.956778
