Heterogeneous Network Crawling: Reaching Target Nodes by Motif-Guided Navigation

Changyu Wang, Kevin Chang, Pinghui Wang, Tao Qin, Xiaohong Guan

Research output: Contribution to journal › Article › peer-review

Abstract

With numerous nodes on online heterogeneous networks, how to reach and extract target nodes of specific interest is a pressing problem. In this paper, we propose a novel heterogeneous network crawler, MCrawl. It addresses the problem by iteratively crawling an online heterogeneous network through its available APIs, starting from a set of target nodes, i.e., seed nodes. We face two challenges in addressing the problem. First, to navigate within a vast network, how do we start from a small set of target nodes? In other words, which nodes in the current frontier shall we expand, and in which direction, to reach promising target nodes quickly? We propose motif-based crawling to exploit the complex structures and rich semantics of heterogeneous networks. Second, in many scenarios, we do not have a classifier to assess the quality of the harvested nodes and thus of the motifs to expand. We develop a probabilistic inference framework to estimate the yield and harvest rates of motifs, achieving principled bootstrapping for crawling. In our experiments on real networks, MCrawl outperforms baselines by significant margins.
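
The abstract does not give MCrawl's algorithm, so the following is only a toy sketch of the general idea of expanding a frontier in the direction with the best estimated harvest rate. It makes two loud simplifications that are not from the paper: a "motif" is reduced to a single edge type, and target membership is read from a known label set (`TARGETS`) rather than inferred probabilistically. The graph, node names, edge types, and the functions `harvest_rate` and `crawl` are all hypothetical.

```python
from collections import Counter

# Toy heterogeneous graph (hypothetical): node -> list of (edge_type, neighbor).
GRAPH = {
    "u1": [("follows", "u2"), ("posts", "p1")],
    "u2": [("follows", "u3"), ("posts", "p2")],
    "u3": [("posts", "p3")],
    "p1": [], "p2": [], "p3": [],
}
# Stand-in oracle for target nodes. ASSUMPTION: the paper has no such
# classifier and estimates rates by probabilistic inference instead.
TARGETS = {"u1", "u2", "u3"}


def harvest_rate(hits, tries, edge_type):
    """Laplace-smoothed fraction of fetched nodes that were targets."""
    return (hits[edge_type] + 1) / (tries[edge_type] + 2)


def crawl(graph, seeds, is_target, budget):
    """Greedy crawl: always fetch the frontier candidate whose edge type
    currently has the highest estimated harvest rate."""
    hits, tries = Counter(), Counter()
    visited = set(seeds)
    harvested = []
    frontier = [edge for s in seeds for edge in graph.get(s, [])]
    while frontier and budget > 0:
        best = max(
            range(len(frontier)),
            key=lambda i: harvest_rate(hits, tries, frontier[i][0]),
        )
        edge_type, node = frontier.pop(best)
        if node in visited:
            continue  # already fetched; costs no budget
        visited.add(node)
        budget -= 1  # one simulated API call
        tries[edge_type] += 1
        if is_target(node):
            hits[edge_type] += 1
            harvested.append(node)
        frontier.extend(graph.get(node, []))
    rates = {t: harvest_rate(hits, tries, t) for t in tries}
    return harvested, rates


if __name__ == "__main__":
    found, rates = crawl(GRAPH, ["u1"], TARGETS.__contains__, budget=3)
    print(found, rates)
```

In this toy graph the "follows" edges lead to further target nodes while "posts" edges do not, so the estimated rate for "follows" rises after the first fetch and the crawler keeps expanding in that direction before spending budget on "posts".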

Original language: English (US)
Journal: IEEE Transactions on Knowledge and Data Engineering
State: Accepted/In press - 2020

Keywords

  • Blogs
  • Crawlers
  • Heterogeneous network crawling
  • Heterogeneous networks
  • Navigation
  • Semantics
  • Social networking (online)
  • Task analysis
  • harvest and yield rate
  • label propagation
  • network motifs
  • probabilistic inference

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
