Two-stage hashing for fast document retrieval

Hao Li, Wei Liu, Heng Ji

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This work fulfills sublinear time Nearest Neighbor Search (NNS) in massive-scale document collections. The primary contribution is to propose a two-stage unsupervised hashing framework which harmoniously integrates two state-of-the-art hashing algorithms, Locality Sensitive Hashing (LSH) and Iterative Quantization (ITQ). LSH accounts for neighbor candidate pruning, while ITQ provides an efficient and effective reranking over the neighbor pool captured by LSH. Furthermore, the proposed hashing framework capitalizes on both term and topic similarity among documents, leading to precise document retrieval. The experimental results convincingly show that our hashing based document retrieval approach well approximates the conventional Information Retrieval (IR) method in terms of retrieving semantically similar documents, and meanwhile achieves a speedup of over one order of magnitude in query time.
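
The two-stage idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the Hamming `radius`, the bit lengths, and the random demo data are all assumptions made here. For simplicity it feeds the same feature vectors to both stages, although the abstract indicates the framework draws on both term and topic similarity, and it prunes with a linear scan over the LSH codes where a real system would use hash-table lookups to reach sublinear query time.

```python
import numpy as np


def lsh_codes(X, planes):
    """Random-hyperplane LSH: one bit per hyperplane (sign of the projection)."""
    return (X @ planes > 0).astype(np.uint8)


def train_itq(X, n_bits, n_iter=50, seed=0):
    """Learn ITQ parameters (mean, PCA projection W, rotation R) from data X."""
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA: the top n_bits right singular vectors of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_bits].T
    V = Xc @ W
    # Start from a random orthogonal rotation.
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))
    for _ in range(n_iter):
        # Fix R, update the binary codes B (entries in {-1, +1}).
        B = np.where(V @ R >= 0, 1.0, -1.0)
        # Fix B, update R by solving the orthogonal Procrustes problem
        # min_R ||B - V R||_F, whose solution is U @ Vh for V^T B = U S Vh.
        U, _, Vh = np.linalg.svd(V.T @ B)
        R = U @ Vh
    return mean, W, R


def itq_codes(X, mean, W, R):
    """Binary ITQ codes: center, project with PCA, rotate, threshold at zero."""
    return ((X - mean) @ W @ R >= 0).astype(np.uint8)


def two_stage_search(q, lsh_db, itq_db, planes, itq_params, radius=2, top_k=5):
    """Stage 1 prunes candidates with LSH; stage 2 reranks them with ITQ."""
    # Stage 1: keep documents whose LSH code is within `radius` bits of the query.
    q_lsh = lsh_codes(q[None, :], planes)[0]
    d1 = np.count_nonzero(lsh_db != q_lsh, axis=1)
    cand = np.flatnonzero(d1 <= radius)
    # Stage 2: rerank the surviving candidates by ITQ Hamming distance.
    q_itq = itq_codes(q[None, :], *itq_params)[0]
    d2 = np.count_nonzero(itq_db[cand] != q_itq, axis=1)
    order = np.argsort(d2, kind="stable")[:top_k]
    return cand[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    docs = rng.standard_normal((1000, 64))      # hypothetical document vectors
    planes = rng.standard_normal((64, 16))      # 16-bit LSH codes (assumed size)
    params = train_itq(docs, n_bits=32)         # 32-bit ITQ codes (assumed size)
    lsh_db = lsh_codes(docs, planes)
    itq_db = itq_codes(docs, *params)
    print(two_stage_search(docs[0], lsh_db, itq_db, planes, params))
```

The split mirrors the roles the abstract assigns: the cheap, data-independent LSH codes shrink the pool, and the learned ITQ codes, which are more expensive to train but better at preserving similarity, only have to rank the few candidates that remain.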

Original language: English (US)
Title of host publication: Long Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 495-500
Number of pages: 6
ISBN (Print): 9781937284732
State: Published - Jan 1 2014
Externally published: Yes
Event: 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Baltimore, MD, United States
Duration: Jun 22 2014 to Jun 27 2014

Publication series

Name: 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference
Volume: 2

Other

Other: 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014
Country: United States
City: Baltimore, MD
Period: 6/22/14 to 6/27/14

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language


Cite this

Li, H., Liu, W., & Ji, H. (2014). Two-stage hashing for fast document retrieval. In Long Papers (pp. 495-500). (52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014 - Proceedings of the Conference; Vol. 2). Association for Computational Linguistics (ACL).