Substructure similarity search in graph databases

Xifeng Yan, Gabrielle Dawn Allen, Jiawei Han

Research output: Contribution to journalConference articlepeer-review

Abstract

Advanced database systems face a great challenge raised by the emergence of massive, complex structural data in bioinformatics, chem-informatics, and many other applications. The most fundamental support needed in these applications is the efficient search of complex structured data. Since exact matching is often too restrictive, similarity search of complex structures becomes a vital operation that must be supported efficiently. In this paper, we investigate the issues of substructure similarity search using indexed features in graph databases. By transforming the edge relaxation ratio of a query graph into the maximum allowed missing features, our structural filtering algorithm, called Grafil, can filter many graphs without performing pairwise similarity computations. It is further shown that using either too few or too many features can result in poor filtering performance. Thus the challenge is to design an effective feature set selection strategy for filtering, By examining the effect of different feature selection mechanisms, we develop a multi-filter composition strategy, where each filter uses a distinct and complementary subset of the features, We identify the criteria to form effective feature sets for filtering, and demonstrate that combining features with similar size and selectivity can improve the filtering and search performance significantly. Moreover, the concept presented in Grafil can be applied to searching approximate non-consecutive sequences, trees, and other complicated structures as well.

Original languageEnglish (US)
Pages (from-to)766-777
Number of pages12
JournalProceedings of the ACM SIGMOD International Conference on Management of Data
DOIs
StatePublished - 2005
EventSIGMOD 2005: ACM SIGMOD International Conference on Management of Data - Baltimore, MD, United States
Duration: Jun 14 2005Jun 16 2005

ASJC Scopus subject areas

  • Software
  • Information Systems

Fingerprint

Dive into the research topics of 'Substructure similarity search in graph databases'. Together they form a unique fingerprint.

Cite this