Keyword search in text cube: Finding top-k relevant cells

Bolin Ding, Yintao Yu, Bo Zhao, Cindy Xide Lin, Jiawei Han, Chengxiang Zhai

Research output: Contribution to conference › Paper › peer-review


We study the problem of keyword search in a data cube with text-rich dimension(s) (a so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (e.g., a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. A cell document is the concatenation of all documents in a cell. Given a keyword query, our goal is to find the top-k most relevant cells (ranked according to the relevance scores of cell documents w.r.t. the given query) in the text cube. We define a keyword-based query language and apply an IR-style relevance model for scoring and ranking cell documents in the text cube. We propose two efficient approaches to find the top-k answers. The proposed approaches support a general class of IR-style relevance scoring formulas that satisfy certain basic and common properties. One approach spends more time on pre-processing and less time answering online queries; the other is more efficient in pre-processing but takes more time for online queries. Experimental studies on the ASRS dataset are conducted to verify the efficiency and effectiveness of the proposed approaches.
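To make the definitions of cell, cell document, and top-k ranking concrete, here is a minimal brute-force sketch in Python. It is not either of the two approaches proposed in the paper: it simply enumerates every cell and scores it. The sample rows, the dimension names `phase` and `weather`, the function names, and the toy scoring formula (query-term frequency normalized by cell-document length) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical multidimensional text database: (attributes, document text).
ROWS = [
    ({"phase": "landing", "weather": "fog"}, "fog fog runway"),
    ({"phase": "landing", "weather": "clear"}, "runway clear"),
    ({"phase": "takeoff", "weather": "fog"}, "engine fog"),
]


def enumerate_cells(rows, dims):
    """Group documents into cells.

    A cell is identified by a tuple of (dimension, value) pairs over some
    subset of dims; the empty tuple is the cell that aggregates every row.
    """
    cells = {}
    for attrs, text in rows:
        for size in range(len(dims) + 1):
            for subset in combinations(dims, size):
                key = tuple((d, attrs[d]) for d in subset)
                cells.setdefault(key, []).append(text)
    return cells


def score(cell_docs, query_terms):
    """Toy relevance score (an assumption, not the paper's formula):
    total query-term frequency divided by cell-document length."""
    # The cell document is the concatenation of all documents in the cell.
    tokens = " ".join(cell_docs).lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[t] for t in query_terms) / len(tokens)


def top_k_cells(rows, dims, query, k):
    """Return the k highest-scoring (score, cell) pairs, best first."""
    terms = query.lower().split()
    cells = enumerate_cells(rows, dims)
    scored = [(score(docs, terms), key) for key, docs in cells.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for s, cell in top_k_cells(ROWS, ["phase", "weather"], "fog", 3):
        print(f"{s:.3f}  {cell}")
```

Exhaustive enumeration visits a number of cells that grows exponentially with the number of dimensions, which is the cost that motivates trading pre-processing time against online query time.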

Original language: English (US)
Number of pages: 15
State: Published - 2010
Event: NASA Conference on Intelligent Data Understanding, CIDU 2010 - Mountain View, CA, United States
Duration: Oct 5 2010 → Oct 6 2010


Other: NASA Conference on Intelligent Data Understanding, CIDU 2010
Country/Territory: United States
City: Mountain View, CA

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software

