TY - GEN
T1 - QUINT
T2 - 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016
AU - Li, Liangyue
AU - Yao, Yuan
AU - Tang, Jie
AU - Fan, Wei
AU - Tong, Hanghang
N1 - This work is partially supported by the National Science Foundation under Grant No. IIS1017415, by DTRA under the grant number HDTRA1-16-0017, by Army Research Office under the contract number W911NF-16-1-0168, by National Institutes of Health under the grant number R01LM011986, Region II University Transportation Center under the project number 49997-33 25 and a Baidu gift. Jie Tang is supported by the National 863 Program (2014AA015103, 2015AA124102) and NSFC (2014CB340506, 2012CB316006)
PY - 2016/8/13
Y1 - 2016/8/13
N2 - Measuring node proximity on large scale networks is a fundamental building block in many application domains, ranging from computer vision, e-commerce, social networks, software engineering, disaster management to biology and epidemiology. The state of the art (e.g., random walk based methods) typically assumes the input network is given a priori, with the known network topology and the associated edge weights. A few recent works aim to further infer the optimal edge weights based on the side information. This paper generalizes the challenge in multiple dimensions, aiming to learn optimal networks for node proximity measures. First (optimization scope), our proposed formulation explores a much larger parameter space, so that it is able to simultaneously infer the optimal network topology and the associated edge weights. This is important as a noisy or missing edge could greatly mislead the network node proximity measures. Second (optimization granularity), while all the existing works assume one common optimal network, be it given as the input or learned by the algorithms, exists for all queries, our method performs optimization at a much finer granularity, essentially being able to infer an optimal network that is specific to a given query. Third (optimization efficiency), we carefully design our algorithms with a linear complexity wrt the neighborhood size of the user preference set. We perform extensive empirical evaluations on a diverse set of 10+ real networks, which show that the proposed algorithms (1) consistently outperform the existing methods on all six commonly used metrics; (2) empirically scale sub-linearly to billion-scale networks and (3) respond in a fraction of a second.
AB - Measuring node proximity on large scale networks is a fundamental building block in many application domains, ranging from computer vision, e-commerce, social networks, software engineering, disaster management to biology and epidemiology. The state of the art (e.g., random walk based methods) typically assumes the input network is given a priori, with the known network topology and the associated edge weights. A few recent works aim to further infer the optimal edge weights based on the side information. This paper generalizes the challenge in multiple dimensions, aiming to learn optimal networks for node proximity measures. First (optimization scope), our proposed formulation explores a much larger parameter space, so that it is able to simultaneously infer the optimal network topology and the associated edge weights. This is important as a noisy or missing edge could greatly mislead the network node proximity measures. Second (optimization granularity), while all the existing works assume one common optimal network, be it given as the input or learned by the algorithms, exists for all queries, our method performs optimization at a much finer granularity, essentially being able to infer an optimal network that is specific to a given query. Third (optimization efficiency), we carefully design our algorithms with a linear complexity wrt the neighborhood size of the user preference set. We perform extensive empirical evaluations on a diverse set of 10+ real networks, which show that the proposed algorithms (1) consistently outperform the existing methods on all six commonly used metrics; (2) empirically scale sub-linearly to billion-scale networks and (3) respond in a fraction of a second.
KW - Node proximity
KW - Optimal networks
UR - http://www.scopus.com/inward/record.url?scp=84985004020&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84985004020&partnerID=8YFLogxK
U2 - 10.1145/2939672.2939768
DO - 10.1145/2939672.2939768
M3 - Conference contribution
AN - SCOPUS:84985004020
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 985
EP - 994
BT - KDD 2016 - Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
Y2 - 13 August 2016 through 17 August 2016
ER -