TY - JOUR
T1 - Metagraph-based learning on heterogeneous graphs
AU - Fang, Yuan
AU - Lin, Wenqing
AU - Zheng, Vincent W.
AU - Wu, Min
AU - Shi, Jiaqi
AU - Chang, Kevin Chen Chuan
AU - Li, Xiao Li
N1 - Funding Information:
This research was supported by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant (Approval No. 18-C220-SMU-006).
Publisher Copyright:
© 1989-2012 IEEE.
PY - 2021/1/1
Y1 - 2021/1/1
N2 - Data in the form of graphs are prevalent, ranging from biological and social networks to citation graphs and the Web. In particular, most real-world graphs are heterogeneous, containing objects of multiple types, which present new opportunities for many problems on graphs. Consider a typical proximity search problem on graphs, which boils down to measuring the proximity between two given nodes. Most earlier studies on homogeneous or bipartite graphs only measure a generic form of proximity, without accounting for different 'semantic classes' - for instance, on a social network two users can be close for different reasons, such as being classmates or family members, which represent two distinct semantic classes. Learning these semantic classes are made possible on heterogeneous graphs through the concept of metagraphs. In this study, we identify metagraphs as a novel and effective means to characterize the common structures for a desired class of proximity. Subsequently, we propose a family of metagraph-based proximity, and employ a learning-to-rank technique that automatically learns the right parameters to suit the desired semantic class. In terms of efficiency, we develop a symmetry-based matching algorithm to speed up the computation of metagraph instances. Empirically, extensive experiments reveal that our metagraph-based proximity substantially outperforms the best competitor by more than 10 percent, and our matching algorithm can reduce matching time by more than half. As a further generalization, we aim to derive a general node and edge representation for heterogeneous graphs, in order to support arbitrary machine learning tasks beyond proximity search. In particular, we propose the finer-grained anchored metagraph, which is capable of discriminating the roles of nodes within the same metagraph. Finally, further experiments on the general representation show that we can outperform the state of the art significantly and consistently across various machine learning tasks.
AB - Data in the form of graphs are prevalent, ranging from biological and social networks to citation graphs and the Web. In particular, most real-world graphs are heterogeneous, containing objects of multiple types, which present new opportunities for many problems on graphs. Consider a typical proximity search problem on graphs, which boils down to measuring the proximity between two given nodes. Most earlier studies on homogeneous or bipartite graphs only measure a generic form of proximity, without accounting for different 'semantic classes' - for instance, on a social network two users can be close for different reasons, such as being classmates or family members, which represent two distinct semantic classes. Learning these semantic classes are made possible on heterogeneous graphs through the concept of metagraphs. In this study, we identify metagraphs as a novel and effective means to characterize the common structures for a desired class of proximity. Subsequently, we propose a family of metagraph-based proximity, and employ a learning-to-rank technique that automatically learns the right parameters to suit the desired semantic class. In terms of efficiency, we develop a symmetry-based matching algorithm to speed up the computation of metagraph instances. Empirically, extensive experiments reveal that our metagraph-based proximity substantially outperforms the best competitor by more than 10 percent, and our matching algorithm can reduce matching time by more than half. As a further generalization, we aim to derive a general node and edge representation for heterogeneous graphs, in order to support arbitrary machine learning tasks beyond proximity search. In particular, we propose the finer-grained anchored metagraph, which is capable of discriminating the roles of nodes within the same metagraph. Finally, further experiments on the general representation show that we can outperform the state of the art significantly and consistently across various machine learning tasks.
KW - Semantic proximity search
KW - graph mining
KW - heterogeneous graph representation
KW - meta-structures
UR - http://www.scopus.com/inward/record.url?scp=85097784698&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097784698&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2019.2922956
DO - 10.1109/TKDE.2019.2922956
M3 - Article
AN - SCOPUS:85097784698
SN - 1041-4347
VL - 33
SP - 154
EP - 168
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 1
M1 - 8744385
ER -