TY - CONF
T1 - Visualization-aware sampling for very large databases
AU - Park, Yongjoo
AU - Cafarella, Michael
AU - Mozafari, Barzan
N1 - Publisher Copyright: © 2016 IEEE. Copyright 2017 Elsevier B.V., All rights reserved.
PY - 2016/6/22
Y1 - 2016/6/22
N2 - Interactive visualizations are crucial in ad hoc data exploration and analysis. However, with the growing number of massive datasets, generating visualizations at interactive timescales is increasingly challenging. One approach to speeding up a visualization tool is data reduction, which lowers the computational overhead at a potential cost in visualization accuracy. Common data reduction techniques, such as uniform and stratified sampling, do not exploit the fact that the sampled tuples will be transformed into a visualization for human consumption. We propose visualization-aware sampling (VAS), a method that guarantees high-quality visualizations with a small subset of the entire dataset. We validate our method on scatter and map plots for three common visualization goals: regression, density estimation, and clustering. The key to our sampling method's success is choosing a set of tuples that minimizes a visualization-inspired loss function. While existing sampling approaches minimize the error of aggregation queries, we focus on a loss function that maximizes the visual fidelity of scatter plots. Our user study confirms that our proposed loss function correlates strongly with user success in using the resulting visualizations. Our experiments show that (i) VAS improves users' success by up to 35% in various visualization tasks, and (ii) VAS can achieve a required visualization quality up to 400× faster.
UR - http://www.scopus.com/inward/record.url?scp=84980373400&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84980373400&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2016.7498287
DO - 10.1109/ICDE.2016.7498287
M3 - Conference contribution
AN - SCOPUS:84980373400
T3 - 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016
SP - 755
EP - 766
BT - 2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 32nd IEEE International Conference on Data Engineering, ICDE 2016
Y2 - 16 May 2016 through 20 May 2016
ER -