TY - CHAP
T1 - Mining thick skylines over large databases
AU - Jin, Wen
AU - Han, Jiawei
AU - Ester, Martin
N1 - Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2004
Y1 - 2004
N2 - People recently are interested in a new operator, called skyline [3], which returns the objects that are not dominated by any other objects with regard to certain measures in a multi-dimensional space. Recent work on the skyline operator [3, 15, 8, 13, 2] focuses on efficient computation of skylines in large databases. However, such work gives users only thin skylines, i.e., single objects, which may not be desirable in some real applications. In this paper, we propose a novel concept, called thick skyline, which recommends not only skyline objects but also their nearby neighbors within ε-distance. Efficient computation methods are developed including (1) two efficient algorithms, Sampling-and-Pruning and Indexing-and-Estimating, to find such thick skyline with the help of statistics or indexes in large databases, and (2) a highly efficient Microcluster-based algorithm for mining thick skyline. The Microcluster-based method not only leads to substantial savings in computation but also provides a concise representation of the thick skyline in the case of high cardinalities. Our experimental performance study shows that the proposed methods are both efficient and effective.
AB - People recently are interested in a new operator, called skyline [3], which returns the objects that are not dominated by any other objects with regard to certain measures in a multi-dimensional space. Recent work on the skyline operator [3, 15, 8, 13, 2] focuses on efficient computation of skylines in large databases. However, such work gives users only thin skylines, i.e., single objects, which may not be desirable in some real applications. In this paper, we propose a novel concept, called thick skyline, which recommends not only skyline objects but also their nearby neighbors within ε-distance. Efficient computation methods are developed including (1) two efficient algorithms, Sampling-and-Pruning and Indexing-and-Estimating, to find such thick skyline with the help of statistics or indexes in large databases, and (2) a highly efficient Microcluster-based algorithm for mining thick skyline. The Microcluster-based method not only leads to substantial savings in computation but also provides a concise representation of the thick skyline in the case of high cardinalities. Our experimental performance study shows that the proposed methods are both efficient and effective.
UR - http://www.scopus.com/inward/record.url?scp=35048903758&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=35048903758&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-30116-5_25
DO - 10.1007/978-3-540-30116-5_25
M3 - Chapter
AN - SCOPUS:35048903758
SN - 3540231080
SN - 9783540231080
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 255
EP - 266
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
A2 - Boulicaut, Jean-Francois
A2 - Esposito, Floriana
A2 - Giannotti, Fosca
A2 - Pedreschi, Dino
PB - Springer
ER -