A multi-partition multi-chunk ensemble technique to classify concept-drifting data streams

Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, Bhavani Thuraisingham

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We propose a multi-partition, multi-chunk ensemble classifier-based data mining technique to classify concept-drifting data streams. Existing ensemble techniques for classifying concept-drifting data streams follow a single-partition, single-chunk approach, in which a single data chunk is used to train one classifier. In our approach, we train a collection of v classifiers from r consecutive data chunks using v-fold partitioning of the data, and build an ensemble of such classifiers. By introducing this multi-partition, multi-chunk ensemble technique, we significantly reduce classification error compared to the single-partition, single-chunk ensemble approaches. We have theoretically justified the usefulness of our algorithm, and empirically proved its effectiveness over other state-of-the-art stream classification techniques on synthetic data and real botnet traffic.
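The training step described in the abstract (v classifiers built from r consecutive chunks via v-fold partitioning, then combined into an ensemble) can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions that the abstract does not specify: the base learner (a toy nearest-centroid classifier), the rule that each classifier is trained on v-1 of the v folds, the majority-vote combination, and all function names (`train_centroid`, `train_partitioned`, `ensemble_predict`), which are hypothetical.

```python
import random
from collections import Counter


def train_centroid(rows):
    """Toy base learner (an assumption): nearest-centroid on numeric features."""
    sums, counts = {}, Counter()
    for x, y in rows:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, xi in enumerate(x):
            acc[i] += xi
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        # Return the label whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def train_partitioned(chunks, v, seed=0):
    """Train v classifiers from r consecutive chunks using v-fold partitioning.

    Assumption: classifier i is trained on every fold except fold i.
    """
    data = [row for chunk in chunks for row in chunk]  # merge the r chunks
    random.Random(seed).shuffle(data)
    folds = [data[i::v] for i in range(v)]             # v disjoint partitions
    models = []
    for held_out in range(v):
        train = [row for j, fold in enumerate(folds) if j != held_out for row in fold]
        models.append(train_centroid(train))
    return models


def ensemble_predict(models, x):
    """Combine the classifiers by simple majority vote (an assumption)."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# r = 2 consecutive chunks of (features, label) pairs, v = 3 partitions.
chunks = [
    [((0.0, 0.1), "a"), ((1.0, 0.9), "b"), ((0.1, 0.0), "a"), ((0.9, 1.0), "b")],
    [((0.2, 0.1), "a"), ((1.1, 1.0), "b"), ((0.0, 0.2), "a"), ((1.0, 1.1), "b")],
]
models = train_partitioned(chunks, v=3)
label = ensemble_predict(models, (0.05, 0.05))
```

In a streaming setting, a step like this would be repeated as new chunks arrive, with the ensemble updated to track the drifting concept. How classifiers are selected or replaced over time is not stated in the abstract, so it is left out of the sketch.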

Original language: English (US)
Title of host publication: 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009
Pages: 363-375
Number of pages: 13
DOIs: https://doi.org/10.1007/978-3-642-01307-2_34
State: Published - Jul 23 2009
Externally published: Yes
Event: 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009 - Bangkok, Thailand
Duration: Apr 27 2009 to Apr 30 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5476 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009
Country: Thailand
City: Bangkok
Period: 4/27/09 to 4/30/09

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Masud, M. M., Gao, J., Khan, L., Han, J., & Thuraisingham, B. (2009). A multi-partition multi-chunk ensemble technique to classify concept-drifting data streams. In 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2009 (pp. 363-375). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5476 LNAI). https://doi.org/10.1007/978-3-642-01307-2_34