Abstract
Classification is an important data analysis tool that uses a model built from historical data to predict class labels for new observations. A growing number of applications produce data streams rather than finite stored data sets, and such streams pose a challenge for traditional classification algorithms. Concept drifts and skewed distributions, two common properties of data stream applications, make learning from streams difficult. The authors aim to develop a new approach to classifying skewed data streams that uses an ensemble of models to match the distribution over under-samples of negatives and repeated samples of positives.
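The idea of an ensemble built over under-samples of negatives and repeated samples of positives can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid base model, the disjoint slicing of negatives, and all function names (`train_ensemble`, `ensemble_score`, and so on) are assumptions chosen to keep the example self-contained, and stream chunking and concept drift handling are not modelled.

```python
import random
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def train_member(positives, negatives):
    """Hypothetical base model: one centroid per class (a stand-in only)."""
    return centroid(positives), centroid(negatives)


def member_score(model, x):
    """Score in [0, 1]; higher means x is closer to the positive centroid."""
    pos_c, neg_c = model
    d_pos, d_neg = dist(x, pos_c), dist(x, neg_c)
    total = d_pos + d_neg
    return 0.5 if total == 0 else d_neg / total


def train_ensemble(positives, negatives, n_members, seed=0):
    """Train members on all positives plus one under-sample of negatives each.

    Assumption: the under-samples are disjoint slices of the shuffled
    negatives, so every member sees a roughly balanced training set.
    """
    rng = random.Random(seed)
    shuffled = negatives[:]
    rng.shuffle(shuffled)
    slices = [shuffled[i::n_members] for i in range(n_members)]
    # The same positives are reused (repeated) for every member.
    return [train_member(positives, s) for s in slices if s]


def ensemble_score(models, x):
    """Average the member scores (simple model averaging)."""
    return sum(member_score(m, x) for m in models) / len(models)


if __name__ == "__main__":
    # Toy skewed data: 3 positives against 100 negatives.
    positives = [(3.0, 3.0), (3.2, 2.8), (2.9, 3.1)]
    negatives = [(0.01 * i, -0.01 * i) for i in range(100)]
    models = train_ensemble(positives, negatives, n_members=5)
    print(ensemble_score(models, (3.0, 3.0)))
    print(ensemble_score(models, (0.5, -0.5)))
```

Averaging over several balanced under-samples lets each member train on a less skewed set while the ensemble as a whole still uses all of the negatives.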
Original language | English (US) |
---|---|
Pages (from-to) | 37-49 |
Number of pages | 13 |
Journal | IEEE Internet Computing |
Volume | 12 |
Issue number | 6 |
State | Published - 2008 |
Keywords
- Classification algorithms
- Companies
- Concept drifts
- Data mining
- Data models
- Data stream
- Model averaging
- Predictive models
- Presses
- Skewed distributions
- Training
- Training data
ASJC Scopus subject areas
- Computer Networks and Communications