TY - GEN
T1 - Unsupervised Node Clustering via Contrastive Hard Sampling
AU - Cui, Hang
AU - Abdelzaher, Tarek
N1 - Research reported in this paper was sponsored in part by DARPA award HR001121C0165, DARPA award HR00112290105, and DoD Basic Research Office award HQ00342110002. It was also supported in part by ACE, one of the seven centers in JUMP 2.0, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.
PY - 2024
Y1 - 2024
N2 - This paper introduces a fine-grained contrastive learning scheme for unsupervised node clustering. Previous clustering methods focus only on a small set of features (class-dependent features) that exhibit explicit clustering characteristics, ignoring the rest of the feature space (class-invariant features). This paper exploits class-invariant features via graph contrastive learning to discover additional high-quality features for unsupervised clustering. We formulate a novel node-level fine-grained augmentation framework for self-supervised learning, which iteratively identifies competitive contrastive samples from the whole feature space, in the form of positive and negative examples of node relations. While positive examples of node relations are usually expressed as edges under graph homophily, negative examples are implicit, lacking a direct edge. We show, however, that simply sampling nodes beyond the local neighborhood yields less competitive negative pairs that are less effective for contrastive learning. Inspired by counterfactual augmentation, we instead construct competitive negative node relations by creating virtual nodes that inherit (in a self-supervised fashion) class-invariant features while altering class-dependent features, producing contrasting pairs that lie closer to the boundary and offer better contrast. Consequently, our experiments demonstrate significant improvements in unsupervised node clustering over six baselines on six real-world social network datasets.
AB - This paper introduces a fine-grained contrastive learning scheme for unsupervised node clustering. Previous clustering methods focus only on a small set of features (class-dependent features) that exhibit explicit clustering characteristics, ignoring the rest of the feature space (class-invariant features). This paper exploits class-invariant features via graph contrastive learning to discover additional high-quality features for unsupervised clustering. We formulate a novel node-level fine-grained augmentation framework for self-supervised learning, which iteratively identifies competitive contrastive samples from the whole feature space, in the form of positive and negative examples of node relations. While positive examples of node relations are usually expressed as edges under graph homophily, negative examples are implicit, lacking a direct edge. We show, however, that simply sampling nodes beyond the local neighborhood yields less competitive negative pairs that are less effective for contrastive learning. Inspired by counterfactual augmentation, we instead construct competitive negative node relations by creating virtual nodes that inherit (in a self-supervised fashion) class-invariant features while altering class-dependent features, producing contrasting pairs that lie closer to the boundary and offer better contrast. Consequently, our experiments demonstrate significant improvements in unsupervised node clustering over six baselines on six real-world social network datasets.
KW - Clustering
KW - Contrastive learning
UR - http://www.scopus.com/inward/record.url?scp=85203596624&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85203596624&partnerID=8YFLogxK
U2 - 10.1007/978-981-97-5572-1_18
DO - 10.1007/978-981-97-5572-1_18
M3 - Conference contribution
AN - SCOPUS:85203596624
SN - 9789819755714
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 285
EP - 300
BT - Database Systems for Advanced Applications - 29th International Conference, DASFAA 2024, Proceedings
A2 - Onizuka, Makoto
A2 - Lee, Jae-Gil
A2 - Tong, Yongxin
A2 - Xiao, Chuan
A2 - Ishikawa, Yoshiharu
A2 - Lu, Kejing
A2 - Amer-Yahia, Sihem
A2 - Jagadish, H.V.
PB - Springer
T2 - 29th International Conference on Database Systems for Advanced Applications, DASFAA 2024
Y2 - 2 July 2024 through 5 July 2024
ER -
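
The abstract in this record describes an InfoNCE-style graph contrastive objective that takes positives from edges (graph homophily) and builds hard negatives as virtual nodes that keep class-invariant features while altering class-dependent ones. The Python sketch below is only an illustration of that idea, not the authors' implementation: the function name hard_negative_contrastive_loss, the perturb_dims argument, the temperature value, and the use of random noise on a fixed feature subset are all assumptions; in the paper the class-dependent versus class-invariant split is identified iteratively in a self-supervised fashion rather than given in advance.

import torch
import torch.nn.functional as F

def hard_negative_contrastive_loss(z, edge_index, perturb_dims, temperature=0.5):
    # z: (N, d) node embeddings; edge_index: (2, E) positive pairs (graph edges);
    # perturb_dims: indices of the dimensions treated as "class-dependent" in this sketch.
    z = F.normalize(z, dim=1)
    src, dst = edge_index  # positives come from graph homophily (edges)

    # Synthetic hard negatives: copy each anchor node, then alter only the chosen
    # "class-dependent" dimensions (random noise here), keeping the remaining
    # "class-invariant" dimensions intact, in the spirit of counterfactual augmentation.
    neg = z[src].clone()
    neg[:, perturb_dims] = torch.randn(src.numel(), len(perturb_dims), device=z.device)
    neg = F.normalize(neg, dim=1)

    pos_sim = (z[src] * z[dst]).sum(dim=1) / temperature
    neg_sim = (z[src] * neg).sum(dim=1) / temperature

    # InfoNCE-style objective: pull edge-connected pairs together and push each
    # anchor away from its perturbed (counterfactual-style) counterpart.
    logits = torch.stack([pos_sim, neg_sim], dim=1)
    labels = torch.zeros(src.numel(), dtype=torch.long, device=z.device)
    return F.cross_entropy(logits, labels)

# Tiny usage example with random data (all values hypothetical):
z = torch.randn(10, 16)
edge_index = torch.tensor([[0, 1, 2], [3, 4, 5]])
loss = hard_negative_contrastive_loss(z, edge_index, perturb_dims=[0, 1, 2, 3])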