TY - GEN

T1 - Proximity in the age of distraction

T2 - 28th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017

AU - Har-Peled, Sariel

AU - Mahabadi, Sepideh

N1 - Funding Information:
Supported by NSF AF awards CCF-0915984, CCF-1217462 and IIS-1447476.
Publisher Copyright:
Copyright © by SIAM.

PY - 2017

Y1 - 2017

N2 - We introduce a new variant of the nearest neighbor search problem, which allows for some coordinates of the dataset to be arbitrarily corrupted or unknown. Formally, given a dataset of n points P = {x_1, ..., x_n} in high dimensions, and a parameter k, the goal is to preprocess the dataset such that, given a query point q, one can quickly compute a point x ∈ P for which the distance from the query to x is minimized when ignoring the optimal k coordinates. Note that the coordinates being ignored are a function of both the query point and the point returned. We present a general reduction from this problem to answering ANN queries, which is similar in spirit to LSH (locality sensitive hashing) [19]. Specifically, we give a sampling technique which achieves a bi-criterion approximation for this problem. If the distance to the nearest neighbor after ignoring k coordinates is r, the data structure returns a point that is within a distance of O(r) after ignoring O(k) coordinates. We also present other applications and further extensions and refinements of the above result. The new data structures are simple and (arguably) elegant, and should be practical: specifically, all bounds are polynomial in all relevant parameters (including the dimension of the space and the robustness parameter k).

AB - We introduce a new variant of the nearest neighbor search problem, which allows for some coordinates of the dataset to be arbitrarily corrupted or unknown. Formally, given a dataset of n points P = {x_1, ..., x_n} in high dimensions, and a parameter k, the goal is to preprocess the dataset such that, given a query point q, one can quickly compute a point x ∈ P for which the distance from the query to x is minimized when ignoring the optimal k coordinates. Note that the coordinates being ignored are a function of both the query point and the point returned. We present a general reduction from this problem to answering ANN queries, which is similar in spirit to LSH (locality sensitive hashing) [19]. Specifically, we give a sampling technique which achieves a bi-criterion approximation for this problem. If the distance to the nearest neighbor after ignoring k coordinates is r, the data structure returns a point that is within a distance of O(r) after ignoring O(k) coordinates. We also present other applications and further extensions and refinements of the above result. The new data structures are simple and (arguably) elegant, and should be practical: specifically, all bounds are polynomial in all relevant parameters (including the dimension of the space and the robustness parameter k).

UR - http://www.scopus.com/inward/record.url?scp=85016187614&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85016187614&partnerID=8YFLogxK

U2 - 10.1137/1.9781611974782.1

DO - 10.1137/1.9781611974782.1

M3 - Conference contribution

AN - SCOPUS:85016187614

T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms

SP - 1

EP - 15

BT - 28th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017

A2 - Klein, Philip N.

PB - Association for Computing Machinery

Y2 - 16 January 2017 through 19 January 2017

ER -