TY - JOUR
T1 - Landmark diffusion maps (L-dMaps)
T2 - Accelerated manifold learning out-of-sample extension
AU - Long, Andrew W.
AU - Ferguson, Andrew L.
N1 - Publisher Copyright:
© 2017 Elsevier Inc.
PY - 2019/7
Y1 - 2019/7
N2 - Diffusion maps are a nonlinear manifold learning technique based on harmonic analysis of a diffusion process over the data. Out-of-sample extensions with computational complexity O(N), where N is the number of points comprising the manifold, frustrate applications to online learning applications requiring rapid embedding of high-dimensional data streams. We propose landmark diffusion maps (L-dMaps)to reduce the complexity to O(M), where M≪N is the number of landmark points selected using pruned spanning trees or k-medoids. Offering (N/M)speedups in out-of-sample extension, L-dMaps enable the application of diffusion maps to high-volume and/or high-velocity streaming data. We illustrate our approach on three datasets: the Swiss roll, molecular simulations of a C 24 H 50 polymer chain, and biomolecular simulations of alanine dipeptide. We demonstrate up to 50-fold speedups in out-of-sample extension for the molecular systems with less than 4% errors in manifold reconstruction fidelity relative to calculations over the full dataset.
AB - Diffusion maps are a nonlinear manifold learning technique based on harmonic analysis of a diffusion process over the data. Out-of-sample extensions with computational complexity O(N), where N is the number of points comprising the manifold, frustrate applications to online learning applications requiring rapid embedding of high-dimensional data streams. We propose landmark diffusion maps (L-dMaps)to reduce the complexity to O(M), where M≪N is the number of landmark points selected using pruned spanning trees or k-medoids. Offering (N/M)speedups in out-of-sample extension, L-dMaps enable the application of diffusion maps to high-volume and/or high-velocity streaming data. We illustrate our approach on three datasets: the Swiss roll, molecular simulations of a C 24 H 50 polymer chain, and biomolecular simulations of alanine dipeptide. We demonstrate up to 50-fold speedups in out-of-sample extension for the molecular systems with less than 4% errors in manifold reconstruction fidelity relative to calculations over the full dataset.
KW - Diffusion maps
KW - Harmonic analysis
KW - Molecular simulation
KW - Nonlinear dimensionality reduction
KW - Spectral graph theory
UR - http://www.scopus.com/inward/record.url?scp=85028622354&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85028622354&partnerID=8YFLogxK
U2 - 10.1016/j.acha.2017.08.004
DO - 10.1016/j.acha.2017.08.004
M3 - Article
AN - SCOPUS:85028622354
SN - 1063-5203
VL - 47
SP - 190
EP - 211
JO - Applied and Computational Harmonic Analysis
JF - Applied and Computational Harmonic Analysis
IS - 1
ER -