TY - GEN
T1 - G-Finder
T2 - 2019 IEEE International Conference on Big Data, Big Data 2019
AU - Liu, Lihui
AU - Du, Boxin
AU - Xu, Jiejun
AU - Tong, Hanghang
N1 - Funding Information:
Research Office under the contract number W911NF-16-1-0168, and by the U.S. Department of Homeland Security under Grant Award Number 2017-ST-061-QA0001. The content of the information in this document does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on. REFERENCES
Funding Information:
Fig. 10. Scalability of G-FINDER-Dynamic. B - Inexact subgraph matching. Inexact subgraph matching focuses on finding approximate subgraphs on a large data graph. There have been extensive studies on this problem. To name a few, Tong et al. [13] propose the best-effort pattern matching, which aims to maintain the topology of the query. Tian et al. [12] propose an approximate subgraph matching tool TALE with efficient indexing. Khan et al. [20] propose a heuristic approach NeMa based on a new definition of matching cost metric. Pientar et al. [19] propose an algorithm called MAGE which is an improved version of G-Ray. Different from G-Ray, it supports graphs with both node and edge attributes. Zhang et al. propose an inexact subgraph matching algorithm SAPPER [26] which utilizes the hybrid neighborhood unit structures in the index. Tian et al. [11] propose an approximate graph matching algorithm SAGA which employs a flexible graph distance model to measure similarities between graphs. He et al. [27] propose an index based algorithm called Closure-Tree to support both subgraph queries and similarity queries. VII. CONCLUSION In this paper, we study the approximate attributed subgraph matching problem and develop an effective and scalable algorithm (G-FINDER). First, we propose a data structure called LOOKUP-TABLE (LTB) to efficiently index the candidates of the query graph. Second, we propose an effective algorithm to build and search LOOKUP-TABLE-GRAPH (LTBG) in order to find the top-k approximate subgraphs. Finally, we conduct extensive experiments on real world data, which demonstrate that the proposed G-FINDER (1) achieves an up to 30% F1-Score improvement over the best baseline method, and (1) scales near-linearly to large data graphs with 1M + vertices and large query graphs with hundreds of nodes. Future work includes generalizing the proposed G-FINDER to (a) interactive subgraph matching (b) knowledge graph matching. VIII. ACKNOWLEDGEMENT This material is supported by the National Science Foundation under Grant No. IIS-1651203, IIS-1715385, IIS-1743040, and CNS-1629888, by DTRA under the grant number HDTRA1-16-0017, by the United States Air Force and DARPA under contract number FA8750-17-C-01539, by Army
Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
N2 - Subgraph matching is a core primitive across a number of disciplines, ranging from data mining, databases, information retrieval, computer vision to natural language processing. Despite decades of efforts, it is still highly challenging to balance between the matching accuracy and the computational efficiency, especially when the query graph and/or the data graph are large. In this paper, we propose an index-based algorithm (G-FINDER) to find the top-k approximate matching subgraphs. At the heart of the proposed algorithm are two techniques, including (1) a novel auxiliary data structure (LOOKUP-TABLE) in conjunction with a neighborhood expansion method to effectively and efficiently index candidate vertices, and (2) a dynamic filtering and refinement strategy to prune the false candidates at an early stage. The proposed G-FINDER bears some distinctive features, including (1) generality, being able to handle different types of inexact matching (e.g., missing nodes, missing edges, intermediate vertices) on node attributed and/or edge attributed graphs or multigraphs; (2) effectiveness, achieving up to 30% Fl-Score improvement over the best known competitor; and (3) efficiency, scaling near-linearly w.r.t. the size of the data graph as well as the query graph.
AB - Subgraph matching is a core primitive across a number of disciplines, ranging from data mining, databases, information retrieval, computer vision to natural language processing. Despite decades of efforts, it is still highly challenging to balance between the matching accuracy and the computational efficiency, especially when the query graph and/or the data graph are large. In this paper, we propose an index-based algorithm (G-FINDER) to find the top-k approximate matching subgraphs. At the heart of the proposed algorithm are two techniques, including (1) a novel auxiliary data structure (LOOKUP-TABLE) in conjunction with a neighborhood expansion method to effectively and efficiently index candidate vertices, and (2) a dynamic filtering and refinement strategy to prune the false candidates at an early stage. The proposed G-FINDER bears some distinctive features, including (1) generality, being able to handle different types of inexact matching (e.g., missing nodes, missing edges, intermediate vertices) on node attributed and/or edge attributed graphs or multigraphs; (2) effectiveness, achieving up to 30% Fl-Score improvement over the best known competitor; and (3) efficiency, scaling near-linearly w.r.t. the size of the data graph as well as the query graph.
KW - approximate matching
KW - subgraph index
KW - subgraph matching
UR - http://www.scopus.com/inward/record.url?scp=85081290751&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081290751&partnerID=8YFLogxK
U2 - 10.1109/BigData47090.2019.9006525
DO - 10.1109/BigData47090.2019.9006525
M3 - Conference contribution
AN - SCOPUS:85081290751
T3 - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
SP - 513
EP - 522
BT - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
A2 - Baru, Chaitanya
A2 - Huan, Jun
A2 - Khan, Latifur
A2 - Hu, Xiaohua Tony
A2 - Ak, Ronay
A2 - Tian, Yuanyuan
A2 - Barga, Roger
A2 - Zaniolo, Carlo
A2 - Lee, Kisung
A2 - Ye, Yanfang Fanny
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 9 December 2019 through 12 December 2019
ER -