Efficient keyword-based search for top-K cells in text cube

Bolin Ding, Bo Zhao, Cindy Xide Lin, Jiawei Han, Chengxiang Zhai, Asok Srivastava, Nikunj C. Oza

Research output: Contribution to journalArticle

Abstract

Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked structures (e.g., a set of joined tuples) that are relevant to a given keyword query. Most of them focus on ranking individual tuples from one table or joins of multiple tables containing a set of keywords. In this paper, we study the problem of keyword search in a data cube with text-rich dimension(s) (so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. We define a keyword-based query language and an IR-style relevance model for scoring/ranking cells in the text cube. Given a keyword query, our goal is to find the top-k most relevant cells. We propose four approaches: inverted-index one-scan, document sorted-scan, bottom-up dynamic programming, and search-space ordering. The search-space ordering algorithm explores only a small portion of the text cube for finding the top-k answers, and enables early termination. Extensive experimental studies are conducted to verify the effectiveness and efficiency of the proposed approaches.

Original languageEnglish (US)
Article number5710919
Pages (from-to)1795-1810
Number of pages16
JournalIEEE Transactions on Knowledge and Data Engineering
Volume23
Issue number12
DOIs
StatePublished - Oct 31 2011

Keywords

  • Keyword search
  • data cube
  • multidimensional text data

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics

Fingerprint Dive into the research topics of 'Efficient keyword-based search for top-K cells in text cube'. Together they form a unique fingerprint.

  • Cite this