Neighbor-sensitive hashing

Yongjoo Park, Michael Cafarella, Barzan Mozafari

Research output: Chapter in Book/Report/Conference proceedingChapter

Abstract

Approximate kNN (κ-nearest neighbor) techniques using binary hash functions are among the most commonly used approaches for overcoming the prohibitive cost of performing exact kNN queries. However, the success of these techniques largely depends on their hash functions' ability to distinguish kNN items; that is, the kNN items retrieved based on data items' hashcodes, should include as many true kNN items as possible. A widely-adopted principle for this process is to ensure that similar items are assigned to the same hashcode so that the items with the hashcodes similar to a query's hashcode are likely to be true neighbors. In this work, we abandon this heavily-utilized principle and pursue the opposite direction for generating more effective hash functions for kNN tasks. That is, we aim to increase the distance between similar items in the hashcode space, instead of reducing it. Our contribution begins by providing theoretical analysis on why this revolutionary and seemingly counter-intuitive approach leads to a more accurate identification of kNN items. Our analysis is followed by a proposal for a hashing algorithm that embeds this novel principle. Our empirical studies confirm that a hashing algorithm based on this counter-intuitive idea significantly improves the efficiency and accuracy of state-of-the-art techniques.

Original languageEnglish (US)
Title of host publicationProceedings of the VLDB Endowment
PublisherAssociation for Computing Machinery
Pages144-155
Number of pages12
Edition3
StatePublished - 2016
Externally publishedYes
Event42nd International Conference on Very Large Data Bases, VLDB 2016 - Delhi, India
Duration: Sep 5 2016Sep 9 2016

Publication series

NameProceedings of the VLDB Endowment
Number3
Volume9
ISSN (Electronic)2150-8097

Other

Other42nd International Conference on Very Large Data Bases, VLDB 2016
Country/TerritoryIndia
CityDelhi
Period9/5/169/9/16

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Computer Science

Fingerprint

Dive into the research topics of 'Neighbor-sensitive hashing'. Together they form a unique fingerprint.

Cite this