Classifying data streams with skewed class distributions and concept drifts

Jing Gao, Bolin Ding, Wei Fan, Jiawei Han, Gabrielle Dawn Allen

Research output: Contribution to journal › Article › peer-review

Abstract

Classification is an important data analysis tool that uses a model built from historical data to predict class labels for new observations. A growing number of applications produce data streams rather than finite stored data sets, and such streams are a challenge for traditional classification algorithms. Concept drifts and skewed distributions, two common properties of data stream applications, make learning in streams difficult. The authors aim to develop a new approach to classifying skewed data streams that uses an ensemble of models to match the distribution over under-samples of negatives and repeated samples of positives.
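The idea in the last sentence of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid base learner, the function names (`train_centroid_model`, `build_ensemble`, `predict`), the ensemble size `k`, and the synthetic data are all assumptions made for illustration. The sketch assumes positives are pooled across stream chunks (the "repeated samples"), while the negatives of the current chunk are split into disjoint under-samples, one per model, and the model scores are averaged.

```python
import random
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def train_centroid_model(positives, negatives):
    """Hypothetical base learner: score in [0, 1] from distances to class centroids."""
    c_pos, c_neg = centroid(positives), centroid(negatives)

    def score(x):
        d_pos, d_neg = dist(x, c_pos), dist(x, c_neg)
        total = d_pos + d_neg
        # Closer to the positive centroid gives a score nearer to 1.
        return 0.5 if total == 0 else d_neg / total

    return score


def build_ensemble(positive_pool, chunk_negatives, k, rng):
    """Train up to k models: each sees every pooled positive and one disjoint slice of negatives."""
    shuffled = chunk_negatives[:]
    rng.shuffle(shuffled)
    slices = [shuffled[i::k] for i in range(k)]  # disjoint under-samples
    return [train_centroid_model(positive_pool, s) for s in slices if s]


def predict(ensemble, x):
    """Average the scores of all models in the ensemble."""
    return sum(model(x) for model in ensemble) / len(ensemble)


# Synthetic example (assumed data): few positives near (2, 2), many negatives near (0, 0).
rng = random.Random(0)
positive_pool = [(2.0, 2.0), (2.2, 1.8), (1.9, 2.1)]  # positives kept from earlier chunks
chunk_negatives = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(60)]

ensemble = build_ensemble(positive_pool, chunk_negatives, k=5, rng=rng)
p_pos = predict(ensemble, (2.0, 2.0))
p_neg = predict(ensemble, (0.0, 0.0))
```

In this sketch each model trains on a more balanced set than the raw chunk (3 positives against 12 negatives instead of 60), and rebuilding the ensemble on every new chunk is one simple way to follow a drifting concept.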

Original language: English (US)
Pages (from-to): 37-49
Number of pages: 13
Journal: IEEE Internet Computing
Volume: 12
Issue number: 6
State: Published - 2008

Keywords

  • Classification algorithms
  • Companies
  • Concept drifts
  • Data mining
  • Data models
  • Data stream
  • Model averaging
  • Predictive models
  • Presses
  • Skewed distributions
  • Training
  • Training data

ASJC Scopus subject areas

  • Computer Networks and Communications
