Attribute-Guided Network Sampling Mechanisms

Suhansanu Kumar, Hari Sundaram

Research output: Contribution to journalArticlepeer-review

Abstract

This article introduces a novel task-independent sampler for attributed networks. The problem is important because while data mining tasks on network content are common, sampling on internet-scale networks is costly. Link-trace samplers such as Snowball sampling, Forest Fire, Random Walk, and Metropolis-Hastings Random Walk are widely used for sampling from networks. The design of these attribute-agnostic samplers focuses on preserving salient properties of network structure, and are not optimized for tasks on node content. This article has three contributions. First, we propose a task-independent, attribute aware link-trace sampler grounded in Information Theory. Our sampler greedily adds to the sample the node with the most informative (i.e., surprising) neighborhood. The sampler tends to rapidly explore the attribute space, maximally reducing the surprise of unseen nodes. Second, we prove that content sampling is an NP-hard problem. A well-known algorithm best approximates the optimization solution within 1-1/e, but requires full access to the entire graph. Third, we show through empirical counterfactual analysis that in many real-world datasets, network structure does not hinder the performance of surprise based link-trace samplers. Experimental results over 18 real-world datasets reveal: surprise-based samplers are sample efficient and outperform the state-of-the-art attribute-agnostic samplers by a wide margin (e.g., 45% performance improvement in clustering tasks).

Original languageEnglish (US)
Article number3441445
JournalACM Transactions on Knowledge Discovery from Data
Volume15
Issue number4
DOIs
StatePublished - Jun 2021

Keywords

  • Task-independent sampling
  • attributed networks
  • data mining

ASJC Scopus subject areas

  • General Computer Science

Fingerprint

Dive into the research topics of 'Attribute-Guided Network Sampling Mechanisms'. Together they form a unique fingerprint.

Cite this