Keyword search in text cube: Finding top-k relevant cells

Bolin Ding, Yintao Yu, Bo Zhao, Cindy Xide Lin, Jiawei Han, Chengxiang Zhai

Research output: Contribution to conference › Paper › peer-review

Abstract

We study the problem of keyword search in a data cube with text-rich dimension(s) (a so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (e.g., a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. A cell document is the concatenation of all documents in a cell. Given a keyword query, our goal is to find the top-k most relevant cells (ranked according to the relevance scores of cell documents w.r.t. the given query) in the text cube. We define a keyword-based query language and apply an IR-style relevance model for scoring and ranking cell documents in the text cube. We propose two efficient approaches to find the top-k answers. The proposed approaches support a general class of IR-style relevance scoring formulas that satisfy certain basic and common properties. One approach spends more time on pre-processing and less time answering online queries; the other is more efficient in pre-processing but takes more time for online queries. Experimental studies on the ASRS dataset are conducted to verify the efficiency and effectiveness of the proposed approaches.
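The definitions in the abstract (cell, cell document, top-k relevant cells) can be illustrated with a small brute-force sketch. This is not either of the two approaches proposed in the paper; it simply enumerates every cell, which is the baseline those approaches are designed to avoid. The dimension names, the rows, and the scoring function (query-term count divided by cell-document length) are all hypothetical stand-ins chosen for illustration.

```python
import heapq
from itertools import combinations

# Hypothetical toy database: each row has attribute values and a document.
DIMS = ("phase", "weather")
ROWS = [
    ({"phase": "landing", "weather": "fog"}, "runway incursion during landing"),
    ({"phase": "landing", "weather": "clear"}, "hard landing gear warning"),
    ({"phase": "takeoff", "weather": "fog"}, "engine warning before takeoff"),
]


def cells_of(attrs):
    """Yield every cell containing a row; '*' marks an aggregated dimension."""
    for r in range(len(DIMS) + 1):
        for kept in combinations(DIMS, r):
            yield tuple(attrs[d] if d in kept else "*" for d in DIMS)


def build_cell_documents(rows):
    """Map each non-empty cell to the terms of its cell document."""
    docs = {}
    for attrs, text in rows:
        for cell in cells_of(attrs):
            # Concatenating the row documents gives the cell document.
            docs.setdefault(cell, []).extend(text.lower().split())
    return docs


def score(terms, query):
    """Assumed toy relevance: query-term occurrences / document length."""
    if not terms:
        return 0.0
    return sum(terms.count(q) for q in query) / len(terms)


def top_k_cells(rows, query, k):
    """Brute force: score every cell document and keep the k best."""
    docs = build_cell_documents(rows)
    scored = ((score(terms, query), cell) for cell, terms in docs.items())
    return heapq.nlargest(k, scored)
```

Calling `top_k_cells(ROWS, ["runway"], 2)` returns a list of `(score, cell)` pairs, best first. The number of cells grows exponentially with the number of dimensions, which is why exhaustive scoring of this kind does not scale and a trade-off between pre-processing and online query time becomes relevant.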

Original language: English (US)
Pages: 145-159
Number of pages: 15
State: Published - 2010
Event: NASA Conference on Intelligent Data Understanding, CIDU 2010 - Mountain View, CA, United States
Duration: Oct 5 2010 - Oct 6 2010

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
