Abstract
Similarity search is a fundamental problem in network analysis and can be applied in many applications, such as collaborator recommendation in coauthor networks, friend recommendation in social networks, and relation prediction in medical information networks. In this article, we propose a sampling-based method using random paths to estimate the similarities based on both common neighbors and structural contexts efficiently in very large homogeneous or heterogeneous information networks. We give a theoretical guarantee that the sampling size depends on the error-bound ϵ, the confidence level (1 - δ), and the path length T of each random walk. We perform an extensive empirical study on a Tencent microblogging network of 1,000,000,000 edges. We show that our algorithm can return top-k similar vertices for any vertex in a network 300× faster than the state-of-the-art methods.We develop a prototype system of recommending similar authors to demonstrate the effectiveness of our method.
Original language | English (US) |
---|---|
Article number | 13 |
Journal | ACM Transactions on Information Systems |
Volume | 36 |
Issue number | 2 |
DOIs | |
State | Published - Aug 2017 |
Externally published | Yes |
Keywords
- Heterogeneous information network
- Random path
- Similarity search
- Social network
- Vertex similarity
ASJC Scopus subject areas
- Information Systems
- General Business, Management and Accounting
- Computer Science Applications