Using ghost edges for classification in sparsely labeled networks

Brian Gallagher, Hanghang Tong, Tina Eliassi-Rad, Christos Faloutsos

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We address the problem of classification in partially labeled networks (a.k.a. within-network classification) where observed class labels are sparse. Techniques for statistical relational learning have been shown to perform well on network classification tasks by exploiting dependencies between class labels of neighboring nodes. However, relational classifiers can fail when unlabeled nodes have too few labeled neighbors to support learning (during training phase) and/or inference (during testing phase). This situation arises in real-world problems when observed labels are sparse. In this paper, we propose a novel approach to within-network classification that combines aspects of statistical relational learning and semi-supervised learning to improve classification performance in sparse networks. Our approach works by adding "ghost edges" to a network, which enable the flow of information from labeled to unlabeled nodes. Through experiments on real-world data sets, we demonstrate that our approach performs well across a range of conditions where existing approaches, such as collective classification and semi-supervised learning, fail. On all tasks, our approach improves area under the ROC curve (AUC) by up to 15 points over existing approaches. Furthermore, we demonstrate that our approach runs in time proportional to L · E, where L is the number of labeled nodes and E is the number of edges.

Original languageEnglish (US)
Title of host publicationKDD 2008 - Proceedings of the 14th ACMKDD International Conference on Knowledge Discovery and Data Mining
Pages256-264
Number of pages9
DOIs
StatePublished - 2008
Externally publishedYes
Event14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008 - Las Vegas, NV, United States
Duration: Aug 24 2008Aug 27 2008

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008
Country/TerritoryUnited States
CityLas Vegas, NV
Period8/24/088/27/08

Keywords

  • Collective classification
  • Random walk
  • Semi-supervised learning
  • Statistical relational learning

ASJC Scopus subject areas

  • Software
  • Information Systems

Fingerprint

Dive into the research topics of 'Using ghost edges for classification in sparsely labeled networks'. Together they form a unique fingerprint.

Cite this