Efficient parsing-based search over structured data

Aditya G Parameswaran, Raghav Kaushik, Arvind Arasu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Parsing-based search, i.e., parsing keyword search queries using grammars, is often used to override the traditional "bag-of-words" semantics in web search and enterprise search scenarios. Compared to the "bag-of- words" semantics, the parsing-based semantics is richer and more customizable. While a formalism for parsing-based semantics for keyword search has been proposed in prior work and ad-hoc implementations exist, the problem of designing efficient algorithms to support the semantics is largely unstudied. In this paper, we present a suite of efficient algorithms and auxiliary indexes for this problem. Our algorithms work for a broad classes of grammars used in practice, and cover a variety of database matching functions (set- and substring-containment, approximate and exact equality) and scoring functions (to filter and rank different parses). We formally analyze the time complexity of our algorithms and provide an empirical evaluation over real-world data to show that our algorithms scale well with the size of the database and grammar.

Original languageEnglish (US)
Title of host publicationCIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management
Pages49-58
Number of pages10
DOIs
StatePublished - 2013
Externally publishedYes
Event22nd ACM International Conference on Information and Knowledge Management, CIKM 2013 - San Francisco, CA, United States
Duration: Oct 27 2013Nov 1 2013

Other

Other22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
Country/TerritoryUnited States
CitySan Francisco, CA
Period10/27/1311/1/13

Keywords

  • Efficiency
  • Keyword search
  • Parsing

ASJC Scopus subject areas

  • Decision Sciences(all)
  • Business, Management and Accounting(all)

Fingerprint

Dive into the research topics of 'Efficient parsing-based search over structured data'. Together they form a unique fingerprint.

Cite this