TY - GEN
T1 - LinkClus
T2 - 32nd International Conference on Very Large Data Bases, VLDB 2006
AU - Yin, Xiaoxin
AU - Han, Jiawei
AU - Yu, Philip S.
PY - 2006
Y1 - 2006
N2 - Data objects in a relational database are cross-linked with each other via multi-typed links. Links contain rich semantic information that may indicate important relationships among objects. Most current clustering methods rely only on the properties that belong to the objects per se. However, the similarities between objects are often indicated by the links, and desirable clusters cannot be generated using only the properties of objects. In this paper we explore linkage-based clustering, in which the similarity between two objects is measured based on the similarities between the objects linked with them. In comparison with a previous study (SimRank) that computes links recursively on all pairs of objects, we take advantage of the power law distribution of links, and develop a hierarchical structure called SimTree to represent similarities in a multi-granularity manner. This method avoids the high cost of computing and storing pairwise similarities but still thoroughly explores relationships among objects. An efficient algorithm is proposed to compute similarities between objects by avoiding pairwise similarity computations through merging computations that go through the same branches in the SimTree. Experiments show the proposed approach achieves high efficiency, scalability, and accuracy in clustering multi-typed linked objects.
AB - Data objects in a relational database are cross-linked with each other via multi-typed links. Links contain rich semantic information that may indicate important relationships among objects. Most current clustering methods rely only on the properties that belong to the objects per se. However, the similarities between objects are often indicated by the links, and desirable clusters cannot be generated using only the properties of objects. In this paper we explore linkage-based clustering, in which the similarity between two objects is measured based on the similarities between the objects linked with them. In comparison with a previous study (SimRank) that computes links recursively on all pairs of objects, we take advantage of the power law distribution of links, and develop a hierarchical structure called SimTree to represent similarities in a multi-granularity manner. This method avoids the high cost of computing and storing pairwise similarities but still thoroughly explores relationships among objects. An efficient algorithm is proposed to compute similarities between objects by avoiding pairwise similarity computations through merging computations that go through the same branches in the SimTree. Experiments show the proposed approach achieves high efficiency, scalability, and accuracy in clustering multi-typed linked objects.
UR - http://www.scopus.com/inward/record.url?scp=84893853717&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893853717&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84893853717
SN - 1595933859
SN - 9781595933850
T3 - VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases
SP - 427
EP - 438
BT - VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases
PB - Association for Computing Machinery
Y2 - 12 September 2006 through 15 September 2006
ER -