TY - JOUR
T1 - BanditPAM++
T2 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
AU - Tiwari, Mo
AU - Kang, Ryan
AU - Lee, Donghyun
AU - Thrun, Sebastian
AU - Piech, Chris
AU - Shomorony, Ilan
AU - Zhang, Martin Jinye
N1 - We would like to thank the anonymous Reviewers, PCs, and ACs for their helpful feedback on our paper. Mo Tiwari was supported by a Stanford Interdisciplinary Graduate Fellowship (SIGF) and a Standard Data Science Scholarship. The work of Ilan Shomorony was supported in part by the National Science Foundation (NSF) under grant CCF-2046991. Martin Zhang is supported by NIH R01 MH115676.
PY - 2023
Y1 - 2023
N2 - Clustering is a fundamental task in data science with wide-ranging applications. In k-medoids clustering, cluster centers must be actual datapoints and arbitrary distance metrics may be used; these features allow for greater interpretability of the cluster centers and the clustering of exotic objects in k-medoids clustering, respectively. k-medoids clustering has recently grown in popularity due to the discovery of more efficient k-medoids algorithms. In particular, recent research has proposed BanditPAM, a randomized k-medoids algorithm with state-of-the-art complexity and clustering accuracy. In this paper, we present BanditPAM++, which accelerates BanditPAM via two algorithmic improvements, and is O(k) faster than BanditPAM in complexity and substantially faster than BanditPAM in wall-clock runtime. First, we demonstrate that BanditPAM has a special structure that allows the reuse of clustering information within each iteration. Second, we demonstrate that BanditPAM has additional structure that permits the reuse of information across different iterations. These observations inspire our proposed algorithm, BanditPAM++, which returns the same clustering solutions as BanditPAM but often several times faster. For example, on the CIFAR10 dataset, BanditPAM++ returns the same results as BanditPAM but runs over 10× faster. Finally, we provide a high-performance C++ implementation of BanditPAM++, callable from Python and R, that may be of interest to practitioners at https://github.com/motiwari/BanditPAM. Auxiliary code to reproduce all of our experiments via a one-line script is available at https://github.com/ThrunGroup/BanditPAM_plusplus_experiments/.
AB - Clustering is a fundamental task in data science with wide-ranging applications. In k-medoids clustering, cluster centers must be actual datapoints and arbitrary distance metrics may be used; these features allow for greater interpretability of the cluster centers and the clustering of exotic objects in k-medoids clustering, respectively. k-medoids clustering has recently grown in popularity due to the discovery of more efficient k-medoids algorithms. In particular, recent research has proposed BanditPAM, a randomized k-medoids algorithm with state-of-the-art complexity and clustering accuracy. In this paper, we present BanditPAM++, which accelerates BanditPAM via two algorithmic improvements, and is O(k) faster than BanditPAM in complexity and substantially faster than BanditPAM in wall-clock runtime. First, we demonstrate that BanditPAM has a special structure that allows the reuse of clustering information within each iteration. Second, we demonstrate that BanditPAM has additional structure that permits the reuse of information across different iterations. These observations inspire our proposed algorithm, BanditPAM++, which returns the same clustering solutions as BanditPAM but often several times faster. For example, on the CIFAR10 dataset, BanditPAM++ returns the same results as BanditPAM but runs over 10× faster. Finally, we provide a high-performance C++ implementation of BanditPAM++, callable from Python and R, that may be of interest to practitioners at https://github.com/motiwari/BanditPAM. Auxiliary code to reproduce all of our experiments via a one-line script is available at https://github.com/ThrunGroup/BanditPAM_plusplus_experiments/.
UR - http://www.scopus.com/inward/record.url?scp=85191155150&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85191155150&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85191155150
SN - 1049-5258
VL - 36
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 10 December 2023 through 16 December 2023
ER -