TY - GEN
T1 - On efficient processing of subspace skyline queries on high dimensional data
AU - Jin, Wen
AU - Tung, Anthony K.H.
AU - Ester, Martin
AU - Han, Jiawei
PY - 2007
Y1 - 2007
N2 - Recent studies on efficiently answering subspace skyline queries can be separated into two approaches. The first focused on pre-materializing a set of skylines points in various subspaces while the second focus on dynamically answering the queries by using a set of anchors to prune off skyline points through spatial reasoning. Despite effort to compress the pre-materialized subspace skylines through removal of redundancy, the storage space for the first approach remain exponential in the number of dimensions. The query time for the second approach on the other hand also grow substantially for data with higher dimensionality where the pruning power of anchors become much weaker. In this paper, we propose methods for answering subspace skyline query on high dimensional data such that both prematerialization storage and query time can be moderated. We propose novel notions of maximal partial-dominating space, maximal partial-dominated space and the maximal equality space between pairs of skyline objects in the full space and use these concepts as the foundation for answering subspace skyline queries for high dimensional data. Query processing involves mostly simple pruning operations while skyline computation is done only on a small subset of candidate skyline points in the subspace. We also develop a random sampling method to compute the subspace skyline in an on-line fashion. Extensive experiments have been conducted and demonstrated the efficiency and effectiveness of our methods.
AB - Recent studies on efficiently answering subspace skyline queries can be separated into two approaches. The first focused on pre-materializing a set of skylines points in various subspaces while the second focus on dynamically answering the queries by using a set of anchors to prune off skyline points through spatial reasoning. Despite effort to compress the pre-materialized subspace skylines through removal of redundancy, the storage space for the first approach remain exponential in the number of dimensions. The query time for the second approach on the other hand also grow substantially for data with higher dimensionality where the pruning power of anchors become much weaker. In this paper, we propose methods for answering subspace skyline query on high dimensional data such that both prematerialization storage and query time can be moderated. We propose novel notions of maximal partial-dominating space, maximal partial-dominated space and the maximal equality space between pairs of skyline objects in the full space and use these concepts as the foundation for answering subspace skyline queries for high dimensional data. Query processing involves mostly simple pruning operations while skyline computation is done only on a small subset of candidate skyline points in the subspace. We also develop a random sampling method to compute the subspace skyline in an on-line fashion. Extensive experiments have been conducted and demonstrated the efficiency and effectiveness of our methods.
UR - http://www.scopus.com/inward/record.url?scp=46649093161&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=46649093161&partnerID=8YFLogxK
U2 - 10.1109/SSDBM.2007.20
DO - 10.1109/SSDBM.2007.20
M3 - Conference contribution
AN - SCOPUS:46649093161
SN - 0769528686
SN - 9780769528687
T3 - Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM
BT - 19th International Conference on Scientific and Statistical Database Management, SSDBM 2007
T2 - 19th International Conference on Scientific and Statistical Database Management, SSDBM 2007
Y2 - 9 July 2007 through 11 July 2007
ER -