Keyword search in text cube: Finding top-k relevant cells

Bolin Ding, Yintao Yu, Bo Zhao, Cindy Xide Lin, Jiawei Han, Chengxiang Zhai

Research output: Contribution to conference › Paper

Abstract

We study the problem of keyword search in a data cube with text-rich dimension(s) (a so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (e.g., a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. A cell document is the concatenation of all documents in a cell. Given a keyword query, our goal is to find the top-k most relevant cells (ranked according to the relevance scores of cell documents w.r.t. the given query) in the text cube. We define a keyword-based query language and apply an IR-style relevance model for scoring and ranking cell documents in the text cube. We propose two efficient approaches to find the top-k answers. The proposed approaches support a general class of IR-style relevance scoring formulas that satisfy certain basic and common properties. One of them uses more time for pre-processing and less time for answering online queries; the other is more efficient in pre-processing and consumes more time for online queries. Experimental studies on the ASRS dataset are conducted to verify the efficiency and effectiveness of the proposed approaches.
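
To make the definitions in the abstract concrete, the following is a minimal brute-force sketch, not either of the two approaches proposed in the paper. It enumerates every cell, builds each cell document by concatenation, and keeps the k best-scoring cells. The toy rows, the function names (`build_cells`, `score`, `top_k_cells`), and the scoring formula (log-damped term frequency with length normalization) are all assumptions made for illustration; the paper's actual query language and relevance formulas are not given in this record.

```python
from collections import Counter, defaultdict
from itertools import combinations
import heapq
import math

# Hypothetical toy database: each row has structural attributes and a document.
ROWS = [
    ({"phase": "landing", "weather": "fog"}, "runway visibility low during approach"),
    ({"phase": "landing", "weather": "clear"}, "hard landing on runway"),
    ({"phase": "takeoff", "weather": "fog"}, "engine warning during takeoff roll"),
]


def build_cells(rows):
    """Map each cell to the tokens of its cell document.

    A cell is a tuple of (dimension, value) pairs over a subset of the
    dimensions; the empty tuple is the cell that aggregates every row.
    """
    cells = defaultdict(list)
    for attrs, text in rows:
        items = sorted(attrs.items())
        for size in range(len(items) + 1):
            for cell in combinations(items, size):
                # Concatenate this row's document into the cell document.
                cells[cell].extend(text.split())
    return dict(cells)


def score(tokens, query):
    """Assumed placeholder relevance: damped term frequency, length-normalized."""
    tf = Counter(tokens)
    return sum(math.log(1 + tf[term]) for term in query) / math.sqrt(len(tokens))


def top_k_cells(rows, query, k):
    """Score every cell exhaustively and return the k highest (score, cell) pairs."""
    cells = build_cells(rows)
    scored = [(score(tokens, query), cell) for cell, tokens in cells.items()]
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    for s, cell in top_k_cells(ROWS, ["runway"], k=2):
        print(f"{s:.4f}", dict(cell))
```

Exhaustive enumeration visits a number of cells that grows exponentially with the number of dimensions, which is the cost the two proposed approaches are meant to avoid by trading pre-processing time against online query time.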

Original language: English (US)
Pages: 145-159
Number of pages: 15
State: Published - Dec 1 2010
Event: NASA Conference on Intelligent Data Understanding, CIDU 2010 - Mountain View, CA, United States
Duration: Oct 5 2010 to Oct 6 2010

Other

Other: NASA Conference on Intelligent Data Understanding, CIDU 2010
Country: United States
City: Mountain View, CA
Period: 10/5/10 to 10/6/10

Fingerprint

Query languages
Processing

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software

Cite this

Ding, B., Yu, Y., Zhao, B., Lin, C. X., Han, J., & Zhai, C. (2010). Keyword search in text cube: Finding top-k relevant cells. 145-159. Paper presented at NASA Conference on Intelligent Data Understanding, CIDU 2010, Mountain View, CA, United States.

Keyword search in text cube: Finding top-k relevant cells. / Ding, Bolin; Yu, Yintao; Zhao, Bo; Lin, Cindy Xide; Han, Jiawei; Zhai, Chengxiang.

2010. 145-159 Paper presented at NASA Conference on Intelligent Data Understanding, CIDU 2010, Mountain View, CA, United States.

Ding, B, Yu, Y, Zhao, B, Lin, CX, Han, J & Zhai, C 2010, 'Keyword search in text cube: Finding top-k relevant cells', Paper presented at NASA Conference on Intelligent Data Understanding, CIDU 2010, Mountain View, CA, United States, 10/5/10 - 10/6/10 pp. 145-159.
Ding B, Yu Y, Zhao B, Lin CX, Han J, Zhai C. Keyword search in text cube: Finding top-k relevant cells. 2010. Paper presented at NASA Conference on Intelligent Data Understanding, CIDU 2010, Mountain View, CA, United States.
Ding, Bolin; Yu, Yintao; Zhao, Bo; Lin, Cindy Xide; Han, Jiawei; Zhai, Chengxiang. / Keyword search in text cube: Finding top-k relevant cells. Paper presented at NASA Conference on Intelligent Data Understanding, CIDU 2010, Mountain View, CA, United States. 15 p.
@conference{6de44fcaf4364d1b95e9d27bb6cfa384,
title = "Keyword search in text cube: Finding top-k relevant cells",
abstract = "We study the problem of keyword search in a data cube with text-rich dimension(s) (a so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (e.g., a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. A cell document is the concatenation of all documents in a cell. Given a keyword query, our goal is to find the top-k most relevant cells (ranked according to the relevance scores of cell documents w.r.t. the given query) in the text cube. We define a keyword-based query language and apply an IR-style relevance model for scoring and ranking cell documents in the text cube. We propose two efficient approaches to find the top-k answers. The proposed approaches support a general class of IR-style relevance scoring formulas that satisfy certain basic and common properties. One of them uses more time for pre-processing and less time for answering online queries; the other is more efficient in pre-processing and consumes more time for online queries. Experimental studies on the ASRS dataset are conducted to verify the efficiency and effectiveness of the proposed approaches.",
author = "Bolin Ding and Yintao Yu and Bo Zhao and Lin, {Cindy Xide} and Jiawei Han and Chengxiang Zhai",
year = "2010",
month = "12",
day = "1",
language = "English (US)",
pages = "145--159",
note = "NASA Conference on Intelligent Data Understanding, CIDU 2010 ; Conference date: 05-10-2010 Through 06-10-2010",

}

TY - CONF

T1 - Keyword search in text cube

T2 - Finding top-k relevant cells

AU - Ding, Bolin

AU - Yu, Yintao

AU - Zhao, Bo

AU - Lin, Cindy Xide

AU - Han, Jiawei

AU - Zhai, Chengxiang

PY - 2010/12/1

Y1 - 2010/12/1

AB - We study the problem of keyword search in a data cube with text-rich dimension(s) (a so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (e.g., a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. A cell document is the concatenation of all documents in a cell. Given a keyword query, our goal is to find the top-k most relevant cells (ranked according to the relevance scores of cell documents w.r.t. the given query) in the text cube. We define a keyword-based query language and apply an IR-style relevance model for scoring and ranking cell documents in the text cube. We propose two efficient approaches to find the top-k answers. The proposed approaches support a general class of IR-style relevance scoring formulas that satisfy certain basic and common properties. One of them uses more time for pre-processing and less time for answering online queries; the other is more efficient in pre-processing and consumes more time for online queries. Experimental studies on the ASRS dataset are conducted to verify the efficiency and effectiveness of the proposed approaches.

UR - http://www.scopus.com/inward/record.url?scp=84877877300&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84877877300&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:84877877300

SP - 145

EP - 159

ER -