TY - GEN
T1 - Ranking objects based on relationships
AU - Chakrabarti, Kaushik
AU - Ganti, Venkatesh
AU - Han, Jiawei
AU - Xin, Dong
PY - 2006
Y1 - 2006
N2 - In many document collections, documents are related to objects such as document authors, products described in the document, or persons referred to in the document. In many applications, the goal is to find these objects that best match a set of keywords. However, the keywords may not necessarily occur in the target objects; they occur only in the documents. For example, in a product review database, a user might search for names of products (say, laptops) using keywords like "lightweight" and "business use" that occur only in the reviews but not in the names of laptops. In order to answer these queries, we need to exploit relationships between documents containing the keywords and the target objects related to those documents. Current keyword query paradigms do not exploit these relationships effectively and hence are inefficient for these queries. In this paper, we consider a class of queries called the "object finder" queries. Our main intuition is to exploit the relationships between searchable documents and related objects and further "aggregate" the document scores from these relationships in order to find the best ranking target objects. Building upon existing keyword search engines such as full text search, we design efficient algorithms that exploit the requirement of only the best k target objects to terminate early. The main challenge here is to push early termination through blocking operators such as group by and aggregation. Our experiments with real datasets and workloads demonstrate the effectiveness of our techniques. Although we present our techniques in the context of keyword search, our techniques apply to other types of ranked searches (e.g., multimedia search) as well.
AB - In many document collections, documents are related to objects such as document authors, products described in the document, or persons referred to in the document. In many applications, the goal is to find these objects that best match a set of keywords. However, the keywords may not necessarily occur in the target objects; they occur only in the documents. For example, in a product review database, a user might search for names of products (say, laptops) using keywords like "lightweight" and "business use" that occur only in the reviews but not in the names of laptops. In order to answer these queries, we need to exploit relationships between documents containing the keywords and the target objects related to those documents. Current keyword query paradigms do not exploit these relationships effectively and hence are inefficient for these queries. In this paper, we consider a class of queries called the "object finder" queries. Our main intuition is to exploit the relationships between searchable documents and related objects and further "aggregate" the document scores from these relationships in order to find the best ranking target objects. Building upon existing keyword search engines such as full text search, we design efficient algorithms that exploit the requirement of only the best k target objects to terminate early. The main challenge here is to push early termination through blocking operators such as group by and aggregation. Our experiments with real datasets and workloads demonstrate the effectiveness of our techniques. Although we present our techniques in the context of keyword search, our techniques apply to other types of ranked searches (e.g., multimedia search) as well.
KW - Aggregation
KW - Early termination
KW - Keyword search
KW - Named entities
KW - Ranking
KW - Relationships
KW - Top-k queries
UR - http://www.scopus.com/inward/record.url?scp=34250705666&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34250705666&partnerID=8YFLogxK
U2 - 10.1145/1142473.1142516
DO - 10.1145/1142473.1142516
M3 - Conference contribution
AN - SCOPUS:34250705666
SN - 1595934340
SN - 9781595934345
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 371
EP - 382
BT - SIGMOD 2006 - Proceedings of the ACM SIGMOD International Conference on Management of Data
T2 - 2006 ACM SIGMOD International Conference on Management of Data
Y2 - 27 June 2006 through 29 June 2006
ER -