Classification and novel class detection of data streams in a dynamic feature space

Mohammad M. Masud, Qing Chen, Jing Gao, Latifur Khan, Jiawei Han, Bhavani Thuraisingham

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Data stream classification poses many challenges, most of which are not addressed by state-of-the-art techniques. We present DXMiner, which addresses four major challenges to data stream classification, namely, infinite length, concept-drift, concept-evolution, and feature-evolution. Data streams are assumed to be infinite in length, which necessitates single-pass incremental learning techniques. Concept-drift occurs in a data stream when the underlying concept changes over time. Most existing data stream classification techniques address only the infinite length and concept-drift problems. However, concept-evolution and feature-evolution are also major challenges, and these are ignored by most of the existing approaches. Concept-evolution occurs in the stream when novel classes arrive, and feature-evolution occurs when new features emerge in the stream. Our previous work addresses the concept-evolution problem in addition to the infinite length and concept-drift problems. Most of the existing data stream classification techniques, including our previous work, assume that the feature space of the data points in the stream is static. This assumption may be impractical for some types of data, for example text data. DXMiner considers the dynamic nature of the feature space and provides an elegant solution for classification and novel class detection when the feature space is dynamic. We show that our approach outperforms state-of-the-art stream classification techniques in classifying and detecting novel classes in real data streams.
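The abstract does not spell out how DXMiner reconciles differing feature spaces. Purely as a hedged illustration of the feature-evolution problem, the sketch below assumes sparse instances stored as feature-name dictionaries (as with term counts in a text stream) and projects both a model trained on an earlier chunk and a newly arriving instance onto the union of their feature sets before comparing them. The `CentroidModel` class, the `union_space` and `homogenize` helpers, and the toy vocabulary are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Set

# Hypothetical representation: a sparse instance maps feature name -> value.
Instance = Dict[str, float]


def union_space(*spaces: Set[str]) -> List[str]:
    """Return a sorted list of every feature seen in any of the given feature sets."""
    combined: Set[str] = set()
    for space in spaces:
        combined |= space
    return sorted(combined)


def homogenize(inst: Instance, space: List[str]) -> List[float]:
    """Project a sparse instance onto an ordered feature list; absent features become 0.0."""
    return [inst.get(f, 0.0) for f in space]


class CentroidModel:
    """Hypothetical nearest-centroid classifier trained on one chunk of the stream."""

    def __init__(self, chunk: List[Instance], labels: List[str]) -> None:
        # Feature space observed in this training chunk only.
        self.features: Set[str] = set().union(*(inst.keys() for inst in chunk))
        sums: Dict[str, Instance] = {}
        counts: Dict[str, int] = {}
        for inst, label in zip(chunk, labels):
            acc = sums.setdefault(label, {})
            for f, v in inst.items():
                acc[f] = acc.get(f, 0.0) + v
            counts[label] = counts.get(label, 0) + 1
        # Mean vector per class, still stored sparsely.
        self.centroids: Dict[str, Instance] = {
            label: {f: v / counts[label] for f, v in acc.items()}
            for label, acc in sums.items()
        }

    def predict(self, inst: Instance) -> Optional[str]:
        # Compare in the union of the model's and the instance's feature spaces,
        # so features that appeared only after training are not silently dropped.
        space = union_space(self.features, set(inst))
        x = homogenize(inst, space)
        best: Optional[str] = None
        best_dist = float("inf")
        for label, centroid in self.centroids.items():
            c = homogenize(centroid, space)
            dist = sum((a - b) ** 2 for a, b in zip(x, c))
            if dist < best_dist:
                best, best_dist = label, dist
        return best


# Toy usage: "referee" never occurs in the training chunk (a newly emerged feature).
chunk = [{"goal": 1.0, "match": 1.0}, {"vote": 1.0, "senate": 1.0}]
model = CentroidModel(chunk, ["sports", "politics"])
print(model.predict({"goal": 1.0, "referee": 1.0}))  # sports
```

The simpler alternative, discarding any feature outside the training chunk's fixed space, avoids recomputing the projection for each instance but throws away information whenever new terms emerge. With this plain Euclidean distance, a feature absent from every centroid adds the same term to each class distance, so exploiting new features would require a learner that weights them.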

Original language: English (US)
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2010, Proceedings
Pages: 337-352
Number of pages: 16
Edition: PART 2
State: Published - 2010
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2010 - Barcelona, Spain
Duration: Sep 20 2010 to Sep 24 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 6322 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2010
Country/Territory: Spain
City: Barcelona
Period: 9/20/10 to 9/24/10

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
