TY - GEN
T1 - Hierarchical web-page clustering via in-page and cross-page link structures
AU - Lin, Cindy Xide
AU - Yu, Yintao
AU - Han, Jiawei
AU - Liu, Bing
PY - 2010
Y1 - 2010
N2 - Despite the wide diversity of web-pages, web-pages residing in a particular organization are, in most cases, organized with semantically hierarchic structures. For example, the website of a computer science department contains pages about its people, courses and research, among which pages of people are categorized into faculty, staff and students, and pages of research diversify into different areas. Uncovering such hierarchic structures could supply users with a convenient way of comprehensive navigation and accelerate other web mining tasks. In this study, we extract a similarity matrix among pages via in-page and cross-page link structures, based on which a density-based clustering algorithm is developed that hierarchically groups densely linked web-pages into semantic clusters. Our experiments show that this method is efficient and effective, and sheds light on mining and exploring web structures.
AB - Despite the wide diversity of web-pages, web-pages residing in a particular organization are, in most cases, organized with semantically hierarchic structures. For example, the website of a computer science department contains pages about its people, courses and research, among which pages of people are categorized into faculty, staff and students, and pages of research diversify into different areas. Uncovering such hierarchic structures could supply users with a convenient way of comprehensive navigation and accelerate other web mining tasks. In this study, we extract a similarity matrix among pages via in-page and cross-page link structures, based on which a density-based clustering algorithm is developed that hierarchically groups densely linked web-pages into semantic clusters. Our experiments show that this method is efficient and effective, and sheds light on mining and exploring web structures.
UR - http://www.scopus.com/inward/record.url?scp=79956304274&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79956304274&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-13672-6_22
DO - 10.1007/978-3-642-13672-6_22
M3 - Conference contribution
AN - SCOPUS:79956304274
SN - 3642136710
SN - 9783642136719
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 222
EP - 229
BT - Advances in Knowledge Discovery and Data Mining - 14th Pacific-Asia Conference, PAKDD 2010, Proceedings
T2 - 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2010
Y2 - 21 June 2010 through 24 June 2010
ER -