Towards graph containment search and indexing

Chen Chen, Xifeng Yan, Gabrielle Dawn Allen, Jiawei Han, Dong Qing Zhang, Xiaohui Gu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Given a set of model graphs D and a query graph q, containment search aims to find all model graphs g ∈ D such that q contains g (q ⊇ g). Due to the wide adoption of graph models, fast containment search of graph data finds applications in many domains. In comparison to traditional graph search, which retrieves all the graphs containing q (q ⊆ g), containment search has its own indexing characteristics that have not yet been examined. In this paper, we perform a systematic study of these characteristics and propose a contrast subgraph-based indexing model, called cIndex. Contrast subgraphs capture the structural differences between model graphs and query graphs, and their high selectivity makes them well suited to indexing. Using a redundancy-aware feature selection process, cIndex can sort out a set of significant and distinctive contrast subgraphs and maximize its indexing capability. We show that it is NP-complete to choose the best set of indexing features, and that our greedy algorithm can approximate the one-level optimal index within a ratio of 1 − 1/e. Taking this solution as a base indexing model, we further extend it to accommodate hierarchical indexing methodologies and apply data space clustering and sampling techniques to reduce the index construction time. The proposed methodology provides a general solution to containment search and indexing, not only for graphs but also for any data with transitive relations. Experimental results on real test data show that cIndex achieves near-optimal pruning power on various containment search workloads, and confirm its clear advantage over indices built for traditional graph search in this new scenario.
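The pruning idea and the greedy selection mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that sets of labels stand in for graphs (subset stands in for subgraph containment, which the abstract's remark about transitive relations suggests is a fair toy model), that a feature f prunes model g for query q when f ⊆ g but f ⊄ q, and that selection is the standard greedy maximum-coverage heuristic over a training query log. All names (`prunes`, `greedy_select`, `candidates`) and the sample data are hypothetical.

```python
from itertools import product


def contains(big, small):
    """Toy containment test: sets stand in for graphs, subset for subgraph."""
    return small <= big


def prunes(feature, query, model):
    """f prunes g for q when f is in g but not in q, so q cannot contain g."""
    return contains(model, feature) and not contains(query, feature)


def greedy_select(features, queries, models, k):
    """Standard greedy max-coverage: repeatedly take the feature that prunes
    the most (query, model) pairs not pruned by the features chosen so far."""
    pairs = list(product(enumerate(queries), enumerate(models)))
    gains = {
        f: {(qi, gi) for (qi, q), (gi, g) in pairs if prunes(f, q, g)}
        for f in features
    }
    chosen, covered = [], set()
    for _ in range(k):
        remaining = [f for f in features if f not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda f: len(gains[f] - covered))
        if not gains[best] - covered:
            break  # no remaining feature adds pruning power
        chosen.append(best)
        covered |= gains[best]
    return chosen, covered


def candidates(query, models, chosen):
    """Models that survive pruning; each still needs an exact containment check."""
    return [g for g in models if not any(prunes(f, query, g) for f in chosen)]


# Hypothetical toy data: each character is one labelled element of a "graph".
models = [frozenset("ab"), frozenset("abc"), frozenset("cd"), frozenset("d")]
query_log = [frozenset("abd"), frozenset("acd")]
features = [frozenset(s) for s in ("a", "b", "c", "ab", "cd")]

chosen, covered = greedy_select(features, query_log, models, k=2)
survivors = candidates(frozenset("ad"), models, chosen)
```

On this toy data the redundancy awareness is visible in the second round: the feature `ab` prunes exactly the pairs already pruned by `b`, so its marginal gain is zero and it is skipped. The survivors of `candidates` form a superset of the true answers, so an exact containment test is still required for each of them.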

Original language: English (US)
Title of host publication: 33rd International Conference on Very Large Data Bases, VLDB 2007 - Conference Proceedings
Editors: Johannes Gehrke, Christoph Koch, Minos Garofalakis, Karl Aberer, Carl-Christian Kanne, Erich J. Neuhold, Venkatesh Ganti, Wolfgang Klas, Chee-Yong Chan, Divesh Srivastava, Dana Florescu, Anand Deshpande
Publisher: Association for Computing Machinery, Inc
Number of pages: 12
ISBN (Electronic): 9781595936493
State: Published - 2007
Event: 33rd International Conference on Very Large Data Bases, VLDB 2007 - Vienna, Austria
Duration: Sep 23 2007 to Sep 27 2007

Publication series

Name: 33rd International Conference on Very Large Data Bases, VLDB 2007 - Conference Proceedings

Other: 33rd International Conference on Very Large Data Bases, VLDB 2007

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems and Management
  • Information Systems
  • Software

