TY - CONF
T1 - SPARC: Self-Paced Network Representation for Few-Shot Rare Category Characterization
AU - Zhou, Dawei
AU - He, Jing Rui
AU - Yang, Hongxia
AU - Fan, Wei
N1 - Funding Information:
This work is supported by the United States Air Force and DARPA under contract number FA8750-17-C-0153 §, National Science Foundation under Grant No. IIS-1552654, Grant No. IIS-1813464 and Grant No. CNS-1629888, the U.S. Department of Homeland Security under Grant Award Number 2017-ST-061-QA0001, and an IBM Faculty Award. The views and conclusions are those of the authors and should not be interpreted as representing the official policies of the funding agencies or the government.
Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/8
Y1 - 2018/8
N2 - In the era of big data, it is often the rare categories that are of great interest in many high-impact applications, ranging from financial fraud detection in online transaction networks to emerging trend detection in social networks, from network intrusion detection in computer networks to fault detection in manufacturing. As a result, rare category characterization becomes a fundamental learning task, which aims to accurately characterize the rare categories given limited label information. The unique challenge of rare category characterization, i.e., the non-separability nature of the rare categories from the majority classes, together with the availability of the multi-modal representation of the examples, poses a new research question: how can we learn a salient rare category oriented embedding representation such that the rare examples are well separated from the majority class examples in the embedding space, which facilitates the follow-up rare category characterization? To address this question, inspired by the family of curriculum learning that simulates the cognitive mechanism of human beings, we propose a self-paced framework named SPARC that gradually learns the rare category oriented network representation and the characterization model in a mutually beneficial way by shifting from the 'easy' concept to the target 'difficult' one, in order to facilitate more reliable label propagation to the large number of unlabeled examples. The experimental results on various real data demonstrate that our proposed SPARC algorithm: (1) shows a significant improvement over state-of-the-art graph embedding methods on representing the rare categories that are non-separable from the majority classes; (2) outperforms the existing methods on rare category characterization tasks.
AB - In the era of big data, it is often the rare categories that are of great interest in many high-impact applications, ranging from financial fraud detection in online transaction networks to emerging trend detection in social networks, from network intrusion detection in computer networks to fault detection in manufacturing. As a result, rare category characterization becomes a fundamental learning task, which aims to accurately characterize the rare categories given limited label information. The unique challenge of rare category characterization, i.e., the non-separability nature of the rare categories from the majority classes, together with the availability of the multi-modal representation of the examples, poses a new research question: how can we learn a salient rare category oriented embedding representation such that the rare examples are well separated from the majority class examples in the embedding space, which facilitates the follow-up rare category characterization? To address this question, inspired by the family of curriculum learning that simulates the cognitive mechanism of human beings, we propose a self-paced framework named SPARC that gradually learns the rare category oriented network representation and the characterization model in a mutually beneficial way by shifting from the 'easy' concept to the target 'difficult' one, in order to facilitate more reliable label propagation to the large number of unlabeled examples. The experimental results on various real data demonstrate that our proposed SPARC algorithm: (1) shows a significant improvement over state-of-the-art graph embedding methods on representing the rare categories that are non-separable from the majority classes; (2) outperforms the existing methods on rare category characterization tasks.
KW - Network Embedding
KW - Rare Category Analysis
KW - Self-Paced Learning
UR - http://www.scopus.com/inward/record.url?scp=85051496075&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051496075&partnerID=8YFLogxK
U2 - 10.1145/3219819.3219968
DO - 10.1145/3219819.3219968
M3 - Paper
SP - 2807
EP - 2816
T2 - the 24th ACM SIGKDD International Conference
Y2 - 19 August 2018 through 23 August 2018
ER -