LinkClus: Efficient clustering via heterogeneous semantic links

Xiaoxin Yin, Jiawei Han, Philip S. Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Data objects in a relational database are cross-linked with each other via multi-typed links. Links contain rich semantic information that may indicate important relationships among objects. Most current clustering methods rely only on the properties that belong to the objects per se. However, the similarities between objects are often indicated by the links, and desirable clusters cannot be generated using only the properties of objects. In this paper we explore linkage-based clustering, in which the similarity between two objects is measured based on the similarities between the objects linked with them. In comparison with a previous study (SimRank) that computes links recursively on all pairs of objects, we take advantage of the power law distribution of links, and develop a hierarchical structure called SimTree to represent similarities in a multi-granularity manner. This method avoids the high cost of computing and storing pairwise similarities but still thoroughly explores relationships among objects. An efficient algorithm is proposed to compute similarities between objects by avoiding pairwise similarity computations through merging computations that go through the same branches in the SimTree. Experiments show the proposed approach achieves high efficiency, scalability, and accuracy in clustering multi-typed linked objects.
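
To make the linkage-based similarity idea concrete, here is a minimal sketch of a naive SimRank-style iteration, the all-pairs baseline the abstract compares against. It is not the SimTree algorithm from the paper. The function name `simrank`, the `decay` and `iterations` parameters, and the toy author/venue links are assumptions chosen for illustration only.

```python
from itertools import product


def simrank(neighbors, decay=0.8, iterations=5):
    """Naive SimRank-style similarity on an undirected link graph.

    `neighbors` maps each object to the list of objects it is linked with.
    Two objects are similar when the objects linked with them are similar.
    This stores a score for every pair, which is the quadratic cost that
    the abstract says the SimTree structure is designed to avoid.
    """
    nodes = list(neighbors)
    # Start from the identity: every object is fully similar to itself only.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            na, nb = neighbors[a], neighbors[b]
            if not na or not nb:
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of linked objects, damped.
            total = sum(sim[(x, y)] for x in na for y in nb)
            new_sim[(a, b)] = decay * total / (len(na) * len(nb))
        sim = new_sim
    return sim


# Hypothetical toy data: authors linked to the venues they publish in.
links = {
    "author1": ["venueA", "venueB"],
    "author2": ["venueA", "venueB"],
    "author3": ["venueC"],
    "venueA": ["author1", "author2"],
    "venueB": ["author1", "author2"],
    "venueC": ["author3"],
}
scores = simrank(links)
```

In this toy graph, `author1` and `author2` share both venues and so receive a positive score, while `author3` is linked only to a disconnected venue and stays at zero against the others. The score table grows with the square of the number of objects, which illustrates why an all-pairs approach becomes expensive on large databases.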

Original language: English (US)
Title of host publication: VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases
Pages: 427-438
Number of pages: 12
State: Published - Dec 1 2006
Event: 32nd International Conference on Very Large Data Bases, VLDB 2006 - Seoul, Korea, Republic of
Duration: Sep 12 2006 → Sep 15 2006

Publication series

Name: VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases

Other

Other: 32nd International Conference on Very Large Data Bases, VLDB 2006
Country: Korea, Republic of
City: Seoul
Period: 9/12/06 → 9/15/06

Fingerprint

Semantics
Purging
Scalability
Costs
Experiments
Clustering

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems
  • Software
  • Information Systems and Management

Cite this

Yin, X., Han, J., & Yu, P. S. (2006). LinkClus: Efficient clustering via heterogeneous semantic links. In VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases (pp. 427-438). (VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases).

LinkClus: Efficient clustering via heterogeneous semantic links. / Yin, Xiaoxin; Han, Jiawei; Yu, Philip S.

VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases. 2006. p. 427-438 (VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases).

Yin, X, Han, J & Yu, PS 2006, LinkClus: Efficient clustering via heterogeneous semantic links. in VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases. VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases, pp. 427-438, 32nd International Conference on Very Large Data Bases, VLDB 2006, Seoul, Korea, Republic of, 9/12/06.
Yin X, Han J, Yu PS. LinkClus: Efficient clustering via heterogeneous semantic links. In VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases. 2006. p. 427-438. (VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases).
Yin, Xiaoxin ; Han, Jiawei ; Yu, Philip S. / LinkClus: Efficient clustering via heterogeneous semantic links. VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases. 2006. pp. 427-438 (VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases).
@inproceedings{2768a427a8ea420f8057d6a3fbefda65,
title = "LinkClus: Efficient clustering via heterogeneous semantic links",
abstract = "Data objects in a relational database are cross-linked with each other via multi-typed links. Links contain rich semantic information that may indicate important relationships among objects. Most current clustering methods rely only on the properties that belong to the objects per se. However, the similarities between objects are often indicated by the links, and desirable clusters cannot be generated using only the properties of objects. In this paper we explore linkage-based clustering, in which the similarity between two objects is measured based on the similarities between the objects linked with them. In comparison with a previous study (SimRank) that computes links recursively on all pairs of objects, we take advantage of the power law distribution of links, and develop a hierarchical structure called SimTree to represent similarities in a multi-granularity manner. This method avoids the high cost of computing and storing pairwise similarities but still thoroughly explores relationships among objects. An efficient algorithm is proposed to compute similarities between objects by avoiding pairwise similarity computations through merging computations that go through the same branches in the SimTree. Experiments show the proposed approach achieves high efficiency, scalability, and accuracy in clustering multi-typed linked objects.",
author = "Xiaoxin Yin and Jiawei Han and Yu, {Philip S.}",
year = "2006",
month = "12",
day = "1",
language = "English (US)",
isbn = "1595933859",
series = "VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases",
pages = "427--438",
booktitle = "VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases",

}

TY - GEN

T1 - LinkClus

T2 - Efficient clustering via heterogeneous semantic links

AU - Yin, Xiaoxin

AU - Han, Jiawei

AU - Yu, Philip S.

PY - 2006/12/1

Y1 - 2006/12/1

N2 - Data objects in a relational database are cross-linked with each other via multi-typed links. Links contain rich semantic information that may indicate important relationships among objects. Most current clustering methods rely only on the properties that belong to the objects per se. However, the similarities between objects are often indicated by the links, and desirable clusters cannot be generated using only the properties of objects. In this paper we explore linkage-based clustering, in which the similarity between two objects is measured based on the similarities between the objects linked with them. In comparison with a previous study (SimRank) that computes links recursively on all pairs of objects, we take advantage of the power law distribution of links, and develop a hierarchical structure called SimTree to represent similarities in a multi-granularity manner. This method avoids the high cost of computing and storing pairwise similarities but still thoroughly explores relationships among objects. An efficient algorithm is proposed to compute similarities between objects by avoiding pairwise similarity computations through merging computations that go through the same branches in the SimTree. Experiments show the proposed approach achieves high efficiency, scalability, and accuracy in clustering multi-typed linked objects.

UR - http://www.scopus.com/inward/record.url?scp=84893853717&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84893853717&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84893853717

SN - 1595933859

SN - 9781595933850

T3 - VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases

SP - 427

EP - 438

BT - VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases

ER -