### Abstract

We study lower bounds for Locality-Sensitive Hashing (LSH) in the strongest setting: point sets in {0,1}^{d} under the Hamming distance. Recall that H is said to be an (r,cr,p, q)-sensitive hash family if all pairs x,y ∈ {0,1}^{d} with dist(x,y) ≤ r have probability at least p of collision under a randomly chosen h ∈ H, whereas all pairs x,y ∈ {0,1}^{d} with dist(x,y) ≥ cr have probability at most q of collision. Typically, one considers d → ∞, with c > 1 fixed and q bounded away from 0. For its applications to approximate nearest-neighbor search in high dimensions, the quality of an LSH family H is governed by how small its ρ parameter ρ = ln(1/p)/ln(1/q) is as a function of the parameter c. The seminal paper of Indyk and Motwani [1998] showed that for each c ≥ 1, the extremely simple family H = {x → x_{i}: i ∈ [d]} achieves ρ ≤ 1/c. The only known lower bound, due to Motwani et al. [2007], is that ρ must be at least (e^{1/c} - 1)/(e^{1/c} + 1) ≥.46/c (minus o_{d}(1)). The contribution of this article is twofold. (1) We show the "optimal" lower bound for ρ: it must be at least 1/c (minus o_{d}(1)). Our proof is very simple, following almost immediately from the observation that the noise stability of a boolean function at time t is a log-convex function of t. (2) We raise and discuss the following issue: neither the application of LSH to nearest-neighbor search nor the known LSH lower bounds hold as stated if the q parameter is tiny. Here, "tiny" means q = 2^{-Θ(d)}, a parameter range we believe is natural.

Original language | English (US) |
---|---|

Article number | 5 |

Journal | ACM Transactions on Computation Theory |

Volume | 6 |

Issue number | 1 |

DOIs | |

State | Published - Mar 2014 |

Externally published | Yes |

### Fingerprint

### Keywords

- Fourier analysis of boolean functions
- Locality-Sensitive Hashing
- LSH
- Noise sensitivity
- Noise stability

### ASJC Scopus subject areas

- Theoretical Computer Science
- Computational Theory and Mathematics

### Cite this

*ACM Transactions on Computation Theory*,

*6*(1), [5]. https://doi.org/10.1145/2578221

**Optimal lower bounds for locality-sensitive hashing (except when q is tiny).** / O'Donnell, Ryan; Wu, Yi; Zhou, Yuan.

Research output: Contribution to journal › Article

*ACM Transactions on Computation Theory*, vol. 6, no. 1, 5. https://doi.org/10.1145/2578221

}

TY - JOUR

T1 - Optimal lower bounds for locality-sensitive hashing (except when q is tiny)

AU - O'Donnell, Ryan

AU - Wu, Yi

AU - Zhou, Yuan

PY - 2014/3

Y1 - 2014/3

N2 - We study lower bounds for Locality-Sensitive Hashing (LSH) in the strongest setting: point sets in {0,1}d under the Hamming distance. Recall that H is said to be an (r,cr,p, q)-sensitive hash family if all pairs x,y ∈ {0,1}d with dist(x,y) ≤ r have probability at least p of collision under a randomly chosen h ∈ H, whereas all pairs x,y ∈ {0,1}d with dist(x,y) ≥ cr have probability at most q of collision. Typically, one considers d → ∞, with c > 1 fixed and q bounded away from 0. For its applications to approximate nearest-neighbor search in high dimensions, the quality of an LSH family H is governed by how small its ρ parameter ρ = ln(1/p)/ln(1/q) is as a function of the parameter c. The seminal paper of Indyk and Motwani [1998] showed that for each c ≥ 1, the extremely simple family H = {x → xi: i ∈ [d]} achieves ρ ≤ 1/c. The only known lower bound, due to Motwani et al. [2007], is that ρ must be at least (e1/c - 1)/(e1/c + 1) ≥.46/c (minus od(1)). The contribution of this article is twofold. (1) We show the "optimal" lower bound for ρ: it must be at least 1/c (minus od(1)). Our proof is very simple, following almost immediately from the observation that the noise stability of a boolean function at time t is a log-convex function of t. (2) We raise and discuss the following issue: neither the application of LSH to nearest-neighbor search nor the known LSH lower bounds hold as stated if the q parameter is tiny. Here, "tiny" means q = 2-Θ(d), a parameter range we believe is natural.

AB - We study lower bounds for Locality-Sensitive Hashing (LSH) in the strongest setting: point sets in {0,1}d under the Hamming distance. Recall that H is said to be an (r,cr,p, q)-sensitive hash family if all pairs x,y ∈ {0,1}d with dist(x,y) ≤ r have probability at least p of collision under a randomly chosen h ∈ H, whereas all pairs x,y ∈ {0,1}d with dist(x,y) ≥ cr have probability at most q of collision. Typically, one considers d → ∞, with c > 1 fixed and q bounded away from 0. For its applications to approximate nearest-neighbor search in high dimensions, the quality of an LSH family H is governed by how small its ρ parameter ρ = ln(1/p)/ln(1/q) is as a function of the parameter c. The seminal paper of Indyk and Motwani [1998] showed that for each c ≥ 1, the extremely simple family H = {x → xi: i ∈ [d]} achieves ρ ≤ 1/c. The only known lower bound, due to Motwani et al. [2007], is that ρ must be at least (e1/c - 1)/(e1/c + 1) ≥.46/c (minus od(1)). The contribution of this article is twofold. (1) We show the "optimal" lower bound for ρ: it must be at least 1/c (minus od(1)). Our proof is very simple, following almost immediately from the observation that the noise stability of a boolean function at time t is a log-convex function of t. (2) We raise and discuss the following issue: neither the application of LSH to nearest-neighbor search nor the known LSH lower bounds hold as stated if the q parameter is tiny. Here, "tiny" means q = 2-Θ(d), a parameter range we believe is natural.

KW - Fourier analysis of boolean functions

KW - Locality-Sensitive Hashing

KW - LSH

KW - Noise sensitivity

KW - Noise stability

UR - http://www.scopus.com/inward/record.url?scp=84897475487&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84897475487&partnerID=8YFLogxK

U2 - 10.1145/2578221

DO - 10.1145/2578221

M3 - Article

AN - SCOPUS:84897475487

VL - 6

JO - ACM Transactions on Computation Theory

JF - ACM Transactions on Computation Theory

SN - 1942-3454

IS - 1

M1 - 5

ER -