TY - GEN
T1 - Rare category detection on time-evolving graphs
AU - Zhou, Dawei
AU - Wang, Kangyang
AU - Cao, Nan
AU - He, Jingrui
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2016/1/5
Y1 - 2016/1/5
N2 - Rare category detection(RCD) is an important topicin data mining, focusing on identifying the initial examples fromrare classes in imbalanced data sets. This problem becomes more challenging when the data is presented as time-evolving graphs, as used in synthetic ID detection and insider threat detection. Most existing techniques for RCD are designed for static data sets, thus not suitable for time-evolving RCD applications. To address this challenge, in this paper, we first proposetwo incremental RCD algorithms, SIRD and BIRD. They arebuilt upon existing density-based techniques for RCD, andincrementally update the detection models, which provide 'timeflexible' RCD. Furthermore, based on BIRD, we propose amodified version named BIRD-LI to deal with the cases wherethe exact priors of the minority classes are not available. Wealso identify a critical task in RCD named query distribution. Itaims to allocate the limited budget into multiple time steps, suchthat the initial examples from the rare classes are detected asearly as possible with the minimum labeling cost. The proposedincremental RCD algorithms and various query distributionstrategies are evaluated empirically on both synthetic and real data.
AB - Rare category detection(RCD) is an important topicin data mining, focusing on identifying the initial examples fromrare classes in imbalanced data sets. This problem becomes more challenging when the data is presented as time-evolving graphs, as used in synthetic ID detection and insider threat detection. Most existing techniques for RCD are designed for static data sets, thus not suitable for time-evolving RCD applications. To address this challenge, in this paper, we first proposetwo incremental RCD algorithms, SIRD and BIRD. They arebuilt upon existing density-based techniques for RCD, andincrementally update the detection models, which provide 'timeflexible' RCD. Furthermore, based on BIRD, we propose amodified version named BIRD-LI to deal with the cases wherethe exact priors of the minority classes are not available. Wealso identify a critical task in RCD named query distribution. Itaims to allocate the limited budget into multiple time steps, suchthat the initial examples from the rare classes are detected asearly as possible with the minimum labeling cost. The proposedincremental RCD algorithms and various query distributionstrategies are evaluated empirically on both synthetic and real data.
KW - Incremental Learning
KW - Rare Category Detection
KW - Time-evolving Graph Mining
UR - http://www.scopus.com/inward/record.url?scp=84963626816&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84963626816&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2015.120
DO - 10.1109/ICDM.2015.120
M3 - Conference contribution
AN - SCOPUS:84963626816
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 1135
EP - 1140
BT - Proceedings - 15th IEEE International Conference on Data Mining, ICDM 2015
A2 - Aggarwal, Charu
A2 - Zhou, Zhi-Hua
A2 - Tuzhilin, Alexander
A2 - Xiong, Hui
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE International Conference on Data Mining, ICDM 2015
Y2 - 14 November 2015 through 17 November 2015
ER -