Abstract
We study the problem of keyword search in a data cube with one or more text-rich dimensions (a so-called text cube). The text cube is built on a multidimensional text database in which each row is associated with text data (e.g., a document) and with other structural dimensions (attributes). A cell in the text cube aggregates the documents whose attribute values match on a subset of the dimensions. A cell document is the concatenation of all documents in a cell. Given a keyword query, our goal is to find the top-k most relevant cells in the text cube, ranked by the relevance scores of their cell documents with respect to the query. We define a keyword-based query language and apply an IR-style relevance model to score and rank cell documents in the text cube. We propose two efficient approaches to finding the top-k answers. Both support a general class of IR-style relevance scoring formulas that satisfy certain basic, common properties. One approach spends more time on pre-processing and less time answering online queries; the other is more efficient in pre-processing but takes more time to answer online queries. Experimental studies on the ASRS dataset are conducted to verify the efficiency and effectiveness of the proposed approaches.
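The following Python sketch makes the definitions of cells, cell documents and top-k cell ranking concrete. It is a naive brute-force baseline, not either of the two approaches proposed in the paper. The scoring formula is an assumption chosen only for illustration: a length-normalized TF-IDF sum, with IDF computed over the base documents. It is not the paper's relevance model. The function names (`build_cell_documents`, `idf_table`, `score`, `top_k_cells`), the dimensions `model` and `phase`, and the toy rows are all hypothetical and do not come from the ASRS dataset.

```python
import heapq
import math
from collections import Counter
from itertools import combinations

STAR = "*"  # marks a dimension that is aggregated away in a cell

def build_cell_documents(rows, dims):
    """Map every cell to its cell document: the concatenation of the
    documents of all rows whose values match the cell's bound dimensions."""
    cells = {}
    for row in rows:
        # Each subset of bound dimensions yields one cell containing this row.
        for r in range(len(dims) + 1):
            for bound in combinations(range(len(dims)), r):
                key = tuple(row[d] if i in bound else STAR
                            for i, d in enumerate(dims))
                cells.setdefault(key, []).append(row["text"])
    return {key: " ".join(docs) for key, docs in cells.items()}

def idf_table(rows):
    """Hypothetical IDF over the base documents: log(1 + N / df(t))."""
    n = len(rows)
    df = Counter()
    for row in rows:
        df.update(set(row["text"].lower().split()))
    return {t: math.log(1 + n / c) for t, c in df.items()}

def score(cell_doc, query_terms, idf):
    """Illustrative length-normalized TF-IDF score of one cell document."""
    tokens = cell_doc.lower().split()
    if not tokens:
        return 0.0
    tf = Counter(tokens)
    return sum(tf[t] / len(tokens) * idf.get(t, 0.0) for t in query_terms)

def top_k_cells(rows, dims, query, k):
    """Brute force: score every cell document and keep the k best."""
    idf = idf_table(rows)
    terms = query.lower().split()
    docs = build_cell_documents(rows, dims)
    scored = ((cell, score(doc, terms, idf)) for cell, doc in docs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Made-up toy rows: two structural dimensions plus a text field.
rows = [
    {"model": "A320", "phase": "takeoff", "text": "engine fire warning"},
    {"model": "A320", "phase": "landing", "text": "gear warning light"},
    {"model": "B737", "phase": "takeoff", "text": "engine fire engine shutdown"},
    {"model": "B737", "phase": "landing", "text": "runway incursion"},
]

for cell, s in top_k_cells(rows, ("model", "phase"), "engine fire", 3):
    print(cell, round(s, 3))
```

On these toy rows, the three takeoff cells rank highest for the query `engine fire`, with `("B737", "takeoff")` first. The length normalization matters here. Without it, the all-`*` apex cell would contain every query-term occurrence and would always rank first. Enumerating every subset of dimensions for each row produces 2^d cells per row, which grows exponentially with the number of dimensions. This baseline is therefore only practical for toy data.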
Original language | English (US) |
---|---|
Pages | 145-159 |
Number of pages | 15 |
State | Published - 2010 |
Event | NASA Conference on Intelligent Data Understanding, CIDU 2010 - Mountain View, CA, United States (Oct 5 2010 → Oct 6 2010) |
Country/Territory | United States |
City | Mountain View, CA |
Period | 10/5/10 → 10/6/10 |
ASJC Scopus subject areas
- Artificial Intelligence
- Software