MLR-index: An index structure for fast and scalable similarity search in high dimensions

Rahul Malik, Sangkyum Kim, Xin Jin, Chandrasekar Ramachandran, Jiawei Han, Indranil Gupta, Klara Nahrstedt

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

High-dimensional indexing has been very popularly used for performing similarity search over various data types such as multimedia (audio/image/video) databases, document collections, time-series data, sensor data and scientific databases. Because of the curse of dimensionality, it is already known that well-known data structures like kd-tree, R-tree, and M-tree suffer in their performance over high-dimensional data space which is inferior to a brute-force approach linear scan. In this paper, we focus on an approximate nearest neighbor search for two different types of queries: r-Range search and k-NN search. Adapting a novel concept of a ring structure, we define a new index structure MLR-Index (Multi-Layer Ring-based Index) in a metric space and propose time and space efficient algorithms with high accuracy. Evaluations through comprehensive experiments comparing with the best-known high-dimensional indexing method LSH show that our approach is faster for a similar accuracy, and shows higher accuracy for a similar response time than LSH.

Original languageEnglish (US)
Title of host publicationScientific and Statistical Database Management - 21st International Conference, SSDBM 2009, Proceedings
Pages167-184
Number of pages18
DOIs
StatePublished - Aug 27 2009
Event21st International Conference on Scientific and Statistical Database Management, SSDBM 2009 - New Orleans, LA, United States
Duration: Jun 2 2009Jun 4 2009

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume5566 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Other

Other21st International Conference on Scientific and Statistical Database Management, SSDBM 2009
CountryUnited States
CityNew Orleans, LA
Period6/2/096/4/09

Fingerprint

Similarity Search
Indexing
Higher Dimensions
High Accuracy
High-dimensional
Kd-tree
Video Databases
Ring
Nearest Neighbor Search
R-tree
Curse of Dimensionality
Image Database
High-dimensional Data
Time Series Data
Response Time
Metric space
Data structures
Multimedia
Multilayer
Time series

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Malik, R., Kim, S., Jin, X., Ramachandran, C., Han, J., Gupta, I., & Nahrstedt, K. (2009). MLR-index: An index structure for fast and scalable similarity search in high dimensions. In Scientific and Statistical Database Management - 21st International Conference, SSDBM 2009, Proceedings (pp. 167-184). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5566 LNCS). https://doi.org/10.1007/978-3-642-02279-1_12

MLR-index : An index structure for fast and scalable similarity search in high dimensions. / Malik, Rahul; Kim, Sangkyum; Jin, Xin; Ramachandran, Chandrasekar; Han, Jiawei; Gupta, Indranil; Nahrstedt, Klara.

Scientific and Statistical Database Management - 21st International Conference, SSDBM 2009, Proceedings. 2009. p. 167-184 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5566 LNCS).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Malik, R, Kim, S, Jin, X, Ramachandran, C, Han, J, Gupta, I & Nahrstedt, K 2009, MLR-index: An index structure for fast and scalable similarity search in high dimensions. in Scientific and Statistical Database Management - 21st International Conference, SSDBM 2009, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5566 LNCS, pp. 167-184, 21st International Conference on Scientific and Statistical Database Management, SSDBM 2009, New Orleans, LA, United States, 6/2/09. https://doi.org/10.1007/978-3-642-02279-1_12
Malik R, Kim S, Jin X, Ramachandran C, Han J, Gupta I et al. MLR-index: An index structure for fast and scalable similarity search in high dimensions. In Scientific and Statistical Database Management - 21st International Conference, SSDBM 2009, Proceedings. 2009. p. 167-184. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-02279-1_12
Malik, Rahul ; Kim, Sangkyum ; Jin, Xin ; Ramachandran, Chandrasekar ; Han, Jiawei ; Gupta, Indranil ; Nahrstedt, Klara. / MLR-index : An index structure for fast and scalable similarity search in high dimensions. Scientific and Statistical Database Management - 21st International Conference, SSDBM 2009, Proceedings. 2009. pp. 167-184 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{e229a02995074258be5f8646f3ea5c63,
title = "MLR-index: An index structure for fast and scalable similarity search in high dimensions",
abstract = "High-dimensional indexing has been very popularly used for performing similarity search over various data types such as multimedia (audio/image/video) databases, document collections, time-series data, sensor data and scientific databases. Because of the curse of dimensionality, it is already known that well-known data structures like kd-tree, R-tree, and M-tree suffer in their performance over high-dimensional data space which is inferior to a brute-force approach linear scan. In this paper, we focus on an approximate nearest neighbor search for two different types of queries: r-Range search and k-NN search. Adapting a novel concept of a ring structure, we define a new index structure MLR-Index (Multi-Layer Ring-based Index) in a metric space and propose time and space efficient algorithms with high accuracy. Evaluations through comprehensive experiments comparing with the best-known high-dimensional indexing method LSH show that our approach is faster for a similar accuracy, and shows higher accuracy for a similar response time than LSH.",
author = "Rahul Malik and Sangkyum Kim and Xin Jin and Chandrasekar Ramachandran and Jiawei Han and Indranil Gupta and Klara Nahrstedt",
year = "2009",
month = "8",
day = "27",
doi = "10.1007/978-3-642-02279-1_12",
language = "English (US)",
isbn = "3642022782",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "167--184",
booktitle = "Scientific and Statistical Database Management - 21st International Conference, SSDBM 2009, Proceedings",

}

TY - GEN

T1 - MLR-index

T2 - An index structure for fast and scalable similarity search in high dimensions

AU - Malik, Rahul

AU - Kim, Sangkyum

AU - Jin, Xin

AU - Ramachandran, Chandrasekar

AU - Han, Jiawei

AU - Gupta, Indranil

AU - Nahrstedt, Klara

PY - 2009/8/27

Y1 - 2009/8/27

N2 - High-dimensional indexing has been very popularly used for performing similarity search over various data types such as multimedia (audio/image/video) databases, document collections, time-series data, sensor data and scientific databases. Because of the curse of dimensionality, it is already known that well-known data structures like kd-tree, R-tree, and M-tree suffer in their performance over high-dimensional data space which is inferior to a brute-force approach linear scan. In this paper, we focus on an approximate nearest neighbor search for two different types of queries: r-Range search and k-NN search. Adapting a novel concept of a ring structure, we define a new index structure MLR-Index (Multi-Layer Ring-based Index) in a metric space and propose time and space efficient algorithms with high accuracy. Evaluations through comprehensive experiments comparing with the best-known high-dimensional indexing method LSH show that our approach is faster for a similar accuracy, and shows higher accuracy for a similar response time than LSH.

AB - High-dimensional indexing has been very popularly used for performing similarity search over various data types such as multimedia (audio/image/video) databases, document collections, time-series data, sensor data and scientific databases. Because of the curse of dimensionality, it is already known that well-known data structures like kd-tree, R-tree, and M-tree suffer in their performance over high-dimensional data space which is inferior to a brute-force approach linear scan. In this paper, we focus on an approximate nearest neighbor search for two different types of queries: r-Range search and k-NN search. Adapting a novel concept of a ring structure, we define a new index structure MLR-Index (Multi-Layer Ring-based Index) in a metric space and propose time and space efficient algorithms with high accuracy. Evaluations through comprehensive experiments comparing with the best-known high-dimensional indexing method LSH show that our approach is faster for a similar accuracy, and shows higher accuracy for a similar response time than LSH.

UR - http://www.scopus.com/inward/record.url?scp=69049102406&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=69049102406&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-02279-1_12

DO - 10.1007/978-3-642-02279-1_12

M3 - Conference contribution

AN - SCOPUS:69049102406

SN - 3642022782

SN - 9783642022784

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 167

EP - 184

BT - Scientific and Statistical Database Management - 21st International Conference, SSDBM 2009, Proceedings

ER -