TY - GEN
T1 - Using ghost edges for classification in sparsely labeled networks
AU - Gallagher, Brian
AU - Tong, Hanghang
AU - Eliassi-Rad, Tina
AU - Faloutsos, Christos
PY - 2008
Y1 - 2008
N2 - We address the problem of classification in partially labeled networks (a.k.a. within-network classification) where observed class labels are sparse. Techniques for statistical relational learning have been shown to perform well on network classification tasks by exploiting dependencies between class labels of neighboring nodes. However, relational classifiers can fail when unlabeled nodes have too few labeled neighbors to support learning (during training phase) and/or inference (during testing phase). This situation arises in real-world problems when observed labels are sparse. In this paper, we propose a novel approach to within-network classification that combines aspects of statistical relational learning and semi-supervised learning to improve classification performance in sparse networks. Our approach works by adding "ghost edges" to a network, which enable the flow of information from labeled to unlabeled nodes. Through experiments on real-world data sets, we demonstrate that our approach performs well across a range of conditions where existing approaches, such as collective classification and semi-supervised learning, fail. On all tasks, our approach improves area under the ROC curve (AUC) by up to 15 points over existing approaches. Furthermore, we demonstrate that our approach runs in time proportional to L · E, where L is the number of labeled nodes and E is the number of edges.
KW - Collective classification
KW - Random walk
KW - Semi-supervised learning
KW - Statistical relational learning
UR - http://www.scopus.com/inward/record.url?scp=65449133627&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=65449133627&partnerID=8YFLogxK
U2 - 10.1145/1401890.1401925
DO - 10.1145/1401890.1401925
M3 - Conference contribution
AN - SCOPUS:65449133627
SN - 9781605581934
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 256
EP - 264
BT - KDD 2008 - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
T2 - 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008
Y2 - 24 August 2008 through 27 August 2008
ER -