TY - GEN
T1 - Node Generation for Node Classification in Sparsely-Labeled Graphs
AU - Cui, Hang
AU - Abdelzaher, Tarek
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - In the broader machine learning literature, data-generation methods demonstrate promising results by generating additional informative training examples that augment sparse labels. Such methods are less studied in graphs due to the intricate dependencies among nodes in complex topological structures. This paper presents a novel node generation method that infuses a small set of high-quality synthesized nodes into the graph as additional labeled nodes to optimally expand the propagation of labeled information. By simply infusing additional nodes, the framework is orthogonal to the graph learning and downstream classification techniques, and thus is compatible with most popular graph pre-training (self-supervised learning), semi-supervised learning, and meta-learning methods. The contribution lies in designing the generated node set by solving a novel optimization problem. The optimization places the generated nodes in a manner that: (1) minimizes the classification loss to guarantee training accuracy and (2) maximizes label propagation to low-confidence nodes in the downstream task to ensure high-quality propagation. Theoretically, we show that the above dual optimization maximizes the global confidence of node classification. Our experiments demonstrate statistically significant performance improvements over 14 baselines on 10 publicly available datasets.
AB - In the broader machine learning literature, data-generation methods demonstrate promising results by generating additional informative training examples that augment sparse labels. Such methods are less studied in graphs due to the intricate dependencies among nodes in complex topological structures. This paper presents a novel node generation method that infuses a small set of high-quality synthesized nodes into the graph as additional labeled nodes to optimally expand the propagation of labeled information. By simply infusing additional nodes, the framework is orthogonal to the graph learning and downstream classification techniques, and thus is compatible with most popular graph pre-training (self-supervised learning), semi-supervised learning, and meta-learning methods. The contribution lies in designing the generated node set by solving a novel optimization problem. The optimization places the generated nodes in a manner that: (1) minimizes the classification loss to guarantee training accuracy and (2) maximizes label propagation to low-confidence nodes in the downstream task to ensure high-quality propagation. Theoretically, we show that the above dual optimization maximizes the global confidence of node classification. Our experiments demonstrate statistically significant performance improvements over 14 baselines on 10 publicly available datasets.
UR - http://www.scopus.com/inward/record.url?scp=85218467704&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85218467704&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-78541-2_25
DO - 10.1007/978-3-031-78541-2_25
M3 - Conference contribution
AN - SCOPUS:85218467704
SN - 9783031785405
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 405
EP - 421
BT - Social Networks Analysis and Mining - 16th International Conference, ASONAM 2024, Proceedings
A2 - Aiello, Luca Maria
A2 - Chakraborty, Tanmoy
A2 - Gaito, Sabrina
PB - Springer
T2 - 16th International Conference on Social Networks Analysis and Mining, ASONAM 2024
Y2 - 2 September 2024 through 5 September 2024
ER -