Mining knowledge from data: An information network analysis approach

Jiawei Han, Yizhou Sun, Xifeng Yan, Philip S. Yu

Research output: Contribution to journal (Conference article)

Abstract

Most objects and data in the real world are interconnected, forming complex, heterogeneous but often semi-structured information networks. However, many database researchers consider a database merely a data repository that supports storage and retrieval, rather than an information-rich, inter-related, multi-typed information network that supports comprehensive data analysis, while many network researchers focus on homogeneous networks. Departing from both, we view interconnected, semi-structured datasets as heterogeneous, information-rich networks and study how to uncover hidden knowledge in such networks. For example, a university database can be viewed as a heterogeneous information network in which objects of multiple types, such as students, professors, courses, and departments, and relationships of multiple types, such as teach and advise, are intertwined, providing abundant information. In this tutorial, we present an organized picture of mining heterogeneous information networks and introduce a set of interesting, effective, and scalable network mining methods. The topics covered include (i) the database as an information network, (ii) mining information networks: clustering, classification, ranking, similarity search, and meta path-guided analysis, (iii) construction of quality, informative networks by data mining, (iv) trend and evolution analysis in heterogeneous information networks, and (v) research frontiers. We show that heterogeneous information networks are informative and that link analysis on such networks is powerful at uncovering critical knowledge hidden in large semi-structured datasets. Finally, we present a few promising research directions.

Original language: English (US)
Article number: 6228171
Pages (from-to): 1214-1217
Number of pages: 4
Journal: Proceedings - International Conference on Data Engineering
DOI: 10.1109/ICDE.2012.145
State: Published - Jul 30 2012
Event: IEEE 28th International Conference on Data Engineering, ICDE 2012 - Arlington, VA, United States
Duration: Apr 1 2012 to Apr 5 2012

Fingerprint

Electric network analysis
Data mining
Students

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

Cite this

Mining knowledge from data: An information network analysis approach. / Han, Jiawei; Sun, Yizhou; Yan, Xifeng; Yu, Philip S.

In: Proceedings - International Conference on Data Engineering, 30.07.2012, p. 1214-1217.

@article{e997c4506fdd4a6dbff809cf7081f61a,
title = "Mining knowledge from data: An information network analysis approach",
abstract = "Most objects and data in the real world are interconnected, forming complex, heterogeneous but often semi-structured information networks. However, many database researchers consider a database merely a data repository that supports storage and retrieval, rather than an information-rich, inter-related, multi-typed information network that supports comprehensive data analysis, while many network researchers focus on homogeneous networks. Departing from both, we view interconnected, semi-structured datasets as heterogeneous, information-rich networks and study how to uncover hidden knowledge in such networks. For example, a university database can be viewed as a heterogeneous information network in which objects of multiple types, such as students, professors, courses, and departments, and relationships of multiple types, such as teach and advise, are intertwined, providing abundant information. In this tutorial, we present an organized picture of mining heterogeneous information networks and introduce a set of interesting, effective, and scalable network mining methods. The topics covered include (i) the database as an information network, (ii) mining information networks: clustering, classification, ranking, similarity search, and meta path-guided analysis, (iii) construction of quality, informative networks by data mining, (iv) trend and evolution analysis in heterogeneous information networks, and (v) research frontiers. We show that heterogeneous information networks are informative and that link analysis on such networks is powerful at uncovering critical knowledge hidden in large semi-structured datasets. Finally, we present a few promising research directions.",
author = "Han, Jiawei and Sun, Yizhou and Yan, Xifeng and Yu, Philip S.",
year = "2012",
month = "7",
day = "30",
doi = "10.1109/ICDE.2012.145",
language = "English (US)",
pages = "1214--1217",
journal = "Proceedings - International Conference on Data Engineering",
issn = "1084-4627",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Mining knowledge from data

T2 - An information network analysis approach

AU - Han, Jiawei

AU - Sun, Yizhou

AU - Yan, Xifeng

AU - Yu, Philip S.

PY - 2012/7/30

Y1 - 2012/7/30

AB - Most objects and data in the real world are interconnected, forming complex, heterogeneous but often semi-structured information networks. However, many database researchers consider a database merely a data repository that supports storage and retrieval, rather than an information-rich, inter-related, multi-typed information network that supports comprehensive data analysis, while many network researchers focus on homogeneous networks. Departing from both, we view interconnected, semi-structured datasets as heterogeneous, information-rich networks and study how to uncover hidden knowledge in such networks. For example, a university database can be viewed as a heterogeneous information network in which objects of multiple types, such as students, professors, courses, and departments, and relationships of multiple types, such as teach and advise, are intertwined, providing abundant information. In this tutorial, we present an organized picture of mining heterogeneous information networks and introduce a set of interesting, effective, and scalable network mining methods. The topics covered include (i) the database as an information network, (ii) mining information networks: clustering, classification, ranking, similarity search, and meta path-guided analysis, (iii) construction of quality, informative networks by data mining, (iv) trend and evolution analysis in heterogeneous information networks, and (v) research frontiers. We show that heterogeneous information networks are informative and that link analysis on such networks is powerful at uncovering critical knowledge hidden in large semi-structured datasets. Finally, we present a few promising research directions.

UR - http://www.scopus.com/inward/record.url?scp=84864194534&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84864194534&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2012.145

DO - 10.1109/ICDE.2012.145

M3 - Conference article

AN - SCOPUS:84864194534

SP - 1214

EP - 1217

JO - Proceedings - International Conference on Data Engineering

JF - Proceedings - International Conference on Data Engineering

SN - 1084-4627

M1 - 6228171

ER -