TY - GEN
T1 - Classification and novel class detection of data streams in a dynamic feature space
AU - Masud, Mohammad M.
AU - Chen, Qing
AU - Gao, Jing
AU - Khan, Latifur
AU - Han, Jiawei
AU - Thuraisingham, Bhavani
N1 - Copyright:
Copyright 2010 Elsevier B.V., All rights reserved.
PY - 2010
Y1 - 2010
N2 - Data stream classification poses many challenges, most of which are not addressed by the state-of-the-art. We present DXMiner, which addresses four major challenges to data stream classification, namely, infinite length, concept-drift, concept-evolution, and feature-evolution. Data streams are assumed to be infinite in length, which necessitates single-pass incremental learning techniques. Concept-drift occurs in a data stream when the underlying concept changes over time. Most existing data stream classification techniques address only the infinite length and concept-drift problems. However, concept-evolution and feature- evolution are also major challenges, and these are ignored by most of the existing approaches. Concept-evolution occurs in the stream when novel classes arrive, and feature-evolution occurs when new features emerge in the stream. Our previous work addresses the concept-evolution problem in addition to addressing the infinite length and concept-drift problems. Most of the existing data stream classification techniques, including our previous work, assume that the feature space of the data points in the stream is static. This assumption may be impractical for some type of data, for example text data. DXMiner considers the dynamic nature of the feature space and provides an elegant solution for classification and novel class detection when the feature space is dynamic. We show that our approach outperforms state-of-the-art stream classification techniques in classifying and detecting novel classes in real data streams.
AB - Data stream classification poses many challenges, most of which are not addressed by the state-of-the-art. We present DXMiner, which addresses four major challenges to data stream classification, namely, infinite length, concept-drift, concept-evolution, and feature-evolution. Data streams are assumed to be infinite in length, which necessitates single-pass incremental learning techniques. Concept-drift occurs in a data stream when the underlying concept changes over time. Most existing data stream classification techniques address only the infinite length and concept-drift problems. However, concept-evolution and feature- evolution are also major challenges, and these are ignored by most of the existing approaches. Concept-evolution occurs in the stream when novel classes arrive, and feature-evolution occurs when new features emerge in the stream. Our previous work addresses the concept-evolution problem in addition to addressing the infinite length and concept-drift problems. Most of the existing data stream classification techniques, including our previous work, assume that the feature space of the data points in the stream is static. This assumption may be impractical for some type of data, for example text data. DXMiner considers the dynamic nature of the feature space and provides an elegant solution for classification and novel class detection when the feature space is dynamic. We show that our approach outperforms state-of-the-art stream classification techniques in classifying and detecting novel classes in real data streams.
UR - http://www.scopus.com/inward/record.url?scp=78049367486&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78049367486&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-15883-4_22
DO - 10.1007/978-3-642-15883-4_22
M3 - Conference contribution
AN - SCOPUS:78049367486
SN - 364215882X
SN - 9783642158827
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 337
EP - 352
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2010, Proceedings
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2010
Y2 - 20 September 2010 through 24 September 2010
ER -