TY - GEN
T1 - iQAN
T2 - 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2023
AU - Peng, Zhen
AU - Zhang, Minjia
AU - Li, Kai
AU - Jin, Ruoming
AU - Ren, Bin
N1 - The authors would like to thank the anonymous reviewers for their constructive comments and helpful suggestions. This work was supported in part by National Science Foundation (NSF) under the awards of CCF-2047516 (CAREER), CCF-2146873, IIS-2142681, III-2008557, and IIS-2142675. Any errors and opinions are not those of the NSF and are attributable solely to the author(s). The authors also acknowledge William & Mary Research Computing for providing computational resources.
PY - 2023/2/25
Y1 - 2023/2/25
N2 - Vector search has drawn a rapid increase of interest in the research community due to its application in novel AI applications. Maximizing its performance is essential for many tasks but remains preliminary understood. In this work, we investigate the root causes of the scalability bottleneck of using intra-query parallelism to speedup the state-of-the-art graph-based vector search systems on multi-core architectures. Our in-depth analysis reveals several scalability challenges from both system and algorithm perspectives. Based on the insights, we propose iQAN, a parallel search algorithm with a set of optimizations that boost convergence, avoid redundant computations, and mitigate synchronization overhead. Our evaluation results on a wide range of real-world datasets show that iQAN achieves up to 37.7× and 76.6× lower latency than state-of-the-art sequential baselines on datasets ranging from a million to a hundred million datasets. We also show that iQAN achieves outstanding scalability as the graph size or the accuracy target increases, allowing it to outperform the state-of-the-art baseline on two billion-scale datasets by up to 16.0× with up to 64 cores.
AB - Vector search has drawn a rapid increase of interest in the research community due to its application in novel AI applications. Maximizing its performance is essential for many tasks but remains preliminary understood. In this work, we investigate the root causes of the scalability bottleneck of using intra-query parallelism to speedup the state-of-the-art graph-based vector search systems on multi-core architectures. Our in-depth analysis reveals several scalability challenges from both system and algorithm perspectives. Based on the insights, we propose iQAN, a parallel search algorithm with a set of optimizations that boost convergence, avoid redundant computations, and mitigate synchronization overhead. Our evaluation results on a wide range of real-world datasets show that iQAN achieves up to 37.7× and 76.6× lower latency than state-of-the-art sequential baselines on datasets ranging from a million to a hundred million datasets. We also show that iQAN achieves outstanding scalability as the graph size or the accuracy target increases, allowing it to outperform the state-of-the-art baseline on two billion-scale datasets by up to 16.0× with up to 64 cores.
KW - approximate nearest neighbor search
KW - graph-based
KW - intra-query parallelism
KW - vector search
UR - http://www.scopus.com/inward/record.url?scp=85149333001&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149333001&partnerID=8YFLogxK
U2 - 10.1145/3572848.3577527
DO - 10.1145/3572848.3577527
M3 - Conference contribution
AN - SCOPUS:85149333001
T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
SP - 313
EP - 328
BT - PPoPP 2023 - Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming
PB - Association for Computing Machinery
Y2 - 25 February 2023 through 1 March 2023
ER -