Summarizing itemset patterns: A profile-based approach

Xifeng Yan, Hong Cheng, Jiawei Han, Dong Xin

Research output: Contribution to conferencePaper

Abstract

Frequent-pattern mining has been studied extensively on scalable methods for mining various kinds of patterns including itemsets, sequences, and graphs. However, the bottleneck of frequent-pattern mining is not at the efficiency but at the interpretability, due to the huge number of patterns generated by the mining process. In this paper, we examine how to summarize a collection of itemset patterns using only K representatives, a small number of patterns that a user can handle easily. The K representatives should not only cover most of the frequent patterns but also approximate their supports. A generative model is built to extract and profile these representatives, under which the supports of the patterns can be easily recovered without consulting the original dataset. Based on the restoration error, we propose a quality measure function to determine the optimal value of parameter K. Polynomial time algorithms are developed together with several optimization heuristics for efficiency improvement. Empirical studies indicate that we can obtain compact summarization in real datasets.

Original languageEnglish (US)
Pages314-323
Number of pages10
DOIs
StatePublished - Dec 1 2005
EventKDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Chicago, IL, United States
Duration: Aug 21 2005Aug 24 2005

Other

OtherKDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
CountryUnited States
CityChicago, IL
Period8/21/058/24/05

Fingerprint

Restoration
Polynomials

Keywords

  • Frequent pattern
  • Probabilistic model
  • Summarization

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Yan, X., Cheng, H., Han, J., & Xin, D. (2005). Summarizing itemset patterns: A profile-based approach. 314-323. Paper presented at KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, United States. https://doi.org/10.1145/1081870.1081907

Summarizing itemset patterns : A profile-based approach. / Yan, Xifeng; Cheng, Hong; Han, Jiawei; Xin, Dong.

2005. 314-323 Paper presented at KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, United States.

Research output: Contribution to conferencePaper

Yan, X, Cheng, H, Han, J & Xin, D 2005, 'Summarizing itemset patterns: A profile-based approach', Paper presented at KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, United States, 8/21/05 - 8/24/05 pp. 314-323. https://doi.org/10.1145/1081870.1081907
Yan X, Cheng H, Han J, Xin D. Summarizing itemset patterns: A profile-based approach. 2005. Paper presented at KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, United States. https://doi.org/10.1145/1081870.1081907
Yan, Xifeng ; Cheng, Hong ; Han, Jiawei ; Xin, Dong. / Summarizing itemset patterns : A profile-based approach. Paper presented at KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, United States.10 p.
@conference{1e25c6f3480148dab9ecdc088f5c16b9,
title = "Summarizing itemset patterns: A profile-based approach",
abstract = "Frequent-pattern mining has been studied extensively on scalable methods for mining various kinds of patterns including itemsets, sequences, and graphs. However, the bottleneck of frequent-pattern mining is not at the efficiency but at the interpretability, due to the huge number of patterns generated by the mining process. In this paper, we examine how to summarize a collection of itemset patterns using only K representatives, a small number of patterns that a user can handle easily. The K representatives should not only cover most of the frequent patterns but also approximate their supports. A generative model is built to extract and profile these representatives, under which the supports of the patterns can be easily recovered without consulting the original dataset. Based on the restoration error, we propose a quality measure function to determine the optimal value of parameter K. Polynomial time algorithms are developed together with several optimization heuristics for efficiency improvement. Empirical studies indicate that we can obtain compact summarization in real datasets.",
keywords = "Frequent pattern, Probabilistic model, Summarization",
author = "Xifeng Yan and Hong Cheng and Jiawei Han and Dong Xin",
year = "2005",
month = "12",
day = "1",
doi = "10.1145/1081870.1081907",
language = "English (US)",
pages = "314--323",
note = "KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining ; Conference date: 21-08-2005 Through 24-08-2005",

}

TY - CONF

T1 - Summarizing itemset patterns

T2 - A profile-based approach

AU - Yan, Xifeng

AU - Cheng, Hong

AU - Han, Jiawei

AU - Xin, Dong

PY - 2005/12/1

Y1 - 2005/12/1

N2 - Frequent-pattern mining has been studied extensively on scalable methods for mining various kinds of patterns including itemsets, sequences, and graphs. However, the bottleneck of frequent-pattern mining is not at the efficiency but at the interpretability, due to the huge number of patterns generated by the mining process. In this paper, we examine how to summarize a collection of itemset patterns using only K representatives, a small number of patterns that a user can handle easily. The K representatives should not only cover most of the frequent patterns but also approximate their supports. A generative model is built to extract and profile these representatives, under which the supports of the patterns can be easily recovered without consulting the original dataset. Based on the restoration error, we propose a quality measure function to determine the optimal value of parameter K. Polynomial time algorithms are developed together with several optimization heuristics for efficiency improvement. Empirical studies indicate that we can obtain compact summarization in real datasets.

AB - Frequent-pattern mining has been studied extensively on scalable methods for mining various kinds of patterns including itemsets, sequences, and graphs. However, the bottleneck of frequent-pattern mining is not at the efficiency but at the interpretability, due to the huge number of patterns generated by the mining process. In this paper, we examine how to summarize a collection of itemset patterns using only K representatives, a small number of patterns that a user can handle easily. The K representatives should not only cover most of the frequent patterns but also approximate their supports. A generative model is built to extract and profile these representatives, under which the supports of the patterns can be easily recovered without consulting the original dataset. Based on the restoration error, we propose a quality measure function to determine the optimal value of parameter K. Polynomial time algorithms are developed together with several optimization heuristics for efficiency improvement. Empirical studies indicate that we can obtain compact summarization in real datasets.

KW - Frequent pattern

KW - Probabilistic model

KW - Summarization

UR - http://www.scopus.com/inward/record.url?scp=32344439809&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=32344439809&partnerID=8YFLogxK

U2 - 10.1145/1081870.1081907

DO - 10.1145/1081870.1081907

M3 - Paper

AN - SCOPUS:32344439809

SP - 314

EP - 323

ER -