TY - GEN
T1 - Nearest-neighbor-based active learning for rare category detection
AU - He, Jingrui
AU - Carbonell, Jaime
PY - 2008
Y1 - 2008
N2 - Rare category detection is an open challenge for active learning, especially in the de-novo case (no labeled examples), but of significant practical importance for data mining - e.g. detecting new financial transaction fraud patterns, where normal legitimate transactions dominate. This paper develops a new method for detecting an instance of each minority class via an unsupervised local-density-differential sampling strategy. Essentially a variable-scale nearest neighbor process is used to optimize the probability of sampling tightly-grouped minority classes, subject to a local smoothness assumption of the majority class. Results on both synthetic and real data sets are very positive, detecting each minority class with only a fraction of the actively sampled points required by random sampling and by Pelleg's Interleave method, the prior best technique in the sparse literature on this topic.
AB - Rare category detection is an open challenge for active learning, especially in the de-novo case (no labeled examples), but of significant practical importance for data mining - e.g. detecting new financial transaction fraud patterns, where normal legitimate transactions dominate. This paper develops a new method for detecting an instance of each minority class via an unsupervised local-density-differential sampling strategy. Essentially a variable-scale nearest neighbor process is used to optimize the probability of sampling tightly-grouped minority classes, subject to a local smoothness assumption of the majority class. Results on both synthetic and real data sets are very positive, detecting each minority class with only a fraction of the actively sampled points required by random sampling and by Pelleg's Interleave method, the prior best technique in the sparse literature on this topic.
UR - http://www.scopus.com/inward/record.url?scp=85138489236&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85138489236&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85138489236
SN - 160560352X
SN - 9781605603520
T3 - Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference
BT - Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference
PB - Curran Associates Inc.
T2 - 21st Annual Conference on Neural Information Processing Systems, NIPS 2007
Y2 - 3 December 2007 through 6 December 2007
ER -