Similarity modeling on heterogeneous networks via automatic path discovery

Carl Yang, Mengxiong Liu, Frank He, Xikun Zhang, Jian Peng, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Heterogeneous networks are widely used to model real-world semi-structured data. The key challenge of learning over such networks is the modeling of node similarity under both network structures and contents. To deal with network structures, most existing works assume a given or enumerable set of meta-paths and then leverage them for the computation of meta-path-based proximities or network embeddings. However, expert knowledge for given meta-paths is not always available, and as the length of considered meta-paths increases, the number of possible paths grows exponentially, which makes the path searching process very costly. On the other hand, while there are often rich contents around network nodes, they have hardly been leveraged to further improve similarity modeling. In this work, to properly model node similarity in content-rich heterogeneous networks, we propose to automatically discover useful paths for pairs of nodes under both structural and content information. To this end, we combine continuous reinforcement learning and deep content embedding into a novel semi-supervised joint learning framework. Specifically, the supervised reinforcement learning component explores useful paths between a small set of example similar pairs of nodes, while the unsupervised deep embedding component captures node contents and enables inductive learning on the whole network. The two components are jointly trained in a closed loop to mutually enhance each other. Extensive experiments on three real-world heterogeneous networks demonstrate the supreme advantages of our algorithm. Code related to this paper is available at: https://github.com/yangji9181/AutoPath.
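The abstract's remark that the number of possible meta-paths grows exponentially with their length can be illustrated with a small counting sketch. The schema below is a hypothetical toy bibliographic schema chosen for illustration, not one of the paper's datasets, and `count_meta_paths` is an assumed helper name; it simply counts node-type sequences and is not the authors' method.

```python
from collections import defaultdict

# Hypothetical toy schema (an assumption, not taken from the paper):
# each node type maps to the node types it can link to.
SCHEMA = {
    "author": ["paper"],
    "paper": ["author", "venue", "term"],
    "venue": ["paper"],
    "term": ["paper"],
}


def count_meta_paths(schema, start_type, length):
    """Count node-type sequences with `length` edges starting at `start_type`."""
    # counts[t] = number of type sequences built so far that end at type t
    counts = {start_type: 1}
    for _ in range(length):
        extended = defaultdict(int)
        for node_type, n in counts.items():
            for neighbor in schema[node_type]:
                extended[neighbor] += n
        counts = dict(extended)
    return sum(counts.values())


# Number of candidate meta-paths starting from "author" for lengths 1..8.
meta_path_counts = [count_meta_paths(SCHEMA, "author", k) for k in range(1, 9)]
print(meta_path_counts)
```

Even in this four-type schema the count triples every two hops, which suggests why enumerating candidate meta-paths quickly becomes costly on richer schemas.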

Original language: English (US)
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2018, Proceedings
Editors: Francesco Bonchi, Michele Berlingerio, Thomas Gärtner, Neil Hurley, Georgiana Ifrim
Publisher: Springer-Verlag
Pages: 37-54
Number of pages: 18
ISBN (Print): 9783030109271
DOI: https://doi.org/10.1007/978-3-030-10928-8_3
State: Published - Jan 1 2019
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML-PKDD 2018 - Dublin, Ireland
Duration: Sep 10 2018 to Sep 14 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11052 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML-PKDD 2018
Country: Ireland
City: Dublin
Period: 9/10/18 to 9/14/18

Fingerprint

Heterogeneous networks
Reinforcement learning
Path
Computer simulation
Modeling
Vertex of a graph
Network Structure
Inductive Learning
Semistructured Data
Similarity
Information Content
Supervised Learning
Experiments
Leverage
Closed-loop
Proximity
Model
Demonstrate

Keywords

  • Deep embedding
  • Heterogeneous networks
  • Similarity modeling

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Yang, C., Liu, M., He, F., Zhang, X., Peng, J., & Han, J. (2019). Similarity modeling on heterogeneous networks via automatic path discovery. In F. Bonchi, M. Berlingerio, T. Gärtner, N. Hurley, & G. Ifrim (Eds.), Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2018, Proceedings (pp. 37-54). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11052 LNAI). Springer-Verlag. https://doi.org/10.1007/978-3-030-10928-8_3

@inproceedings{46165442456f4f9694c9852fcbb77908,
title = "Similarity modeling on heterogeneous networks via automatic path discovery",
keywords = "Deep embedding, Heterogeneous networks, Similarity modeling",
author = "Carl Yang and Mengxiong Liu and Frank He and Xikun Zhang and Jian Peng and Jiawei Han",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/978-3-030-10928-8_3",
language = "English (US)",
isbn = "9783030109271",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer-Verlag",
pages = "37--54",
editor = "Francesco Bonchi and Michele Berlingerio and Thomas G{\"a}rtner and Neil Hurley and Georgiana Ifrim",
booktitle = "Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2018, Proceedings",

}

TY - GEN

T1 - Similarity modeling on heterogeneous networks via automatic path discovery

AU - Yang, Carl

AU - Liu, Mengxiong

AU - He, Frank

AU - Zhang, Xikun

AU - Peng, Jian

AU - Han, Jiawei

PY - 2019/1/1

Y1 - 2019/1/1

KW - Deep embedding

KW - Heterogeneous networks

KW - Similarity modeling

UR - http://www.scopus.com/inward/record.url?scp=85061147723&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85061147723&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-10928-8_3

DO - 10.1007/978-3-030-10928-8_3

M3 - Conference contribution

AN - SCOPUS:85061147723

SN - 9783030109271

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 37

EP - 54

BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2018, Proceedings

A2 - Bonchi, Francesco

A2 - Berlingerio, Michele

A2 - Gärtner, Thomas

A2 - Hurley, Neil

A2 - Ifrim, Georgiana

PB - Springer-Verlag

ER -