Query-Based Outlier Detection in Heterogeneous Information Networks

Jonathan Kuck, Honglei Zhuang, Xifeng Yan, Hasan Cam, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real, high-dimensional data sets, most outliers are hidden in particular combinations of dimensions and are relative to a user's search space and interest. It is often more effective to let users specify outlier queries flexibly and have the system process such mining queries efficiently. In this study, we introduce the concept of query-based outliers in heterogeneous information networks, design a query language that lets users specify such queries flexibly, define a good outlier measure for heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that, with this methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks.

Original language: English (US)
Title of host publication: Advances in database technology: proceedings. International Conference on Extending Database Technology
Pages: 325-336
Number of pages: 12
Volume: 2015
State: Published - Mar 2015
