CLOSET+: Searching for the best strategies for mining frequent closed itemsets

Jianyong Wang, Jiawei Han, Jian Pei

Research output: Contribution to conference › Paper

Abstract

Mining frequent closed itemsets provides complete and non-redundant results for frequent pattern analysis. Extensive studies have proposed various strategies for efficient frequent closed itemset mining, such as depth-first search vs. breadth-first search, vertical formats vs. horizontal formats, tree structures vs. other data structures, top-down vs. bottom-up traversal, and pseudo projection vs. physical projection of conditional databases. It is the right time to ask "what are the pros and cons of these strategies?" and "which strategies should we pick, and how can we integrate the best of them to achieve higher performance in general cases?" In this study, we answer these questions through a systematic study of the search strategies and develop a winning algorithm, CLOSET+. CLOSET+ integrates the advantages of previously proposed effective strategies as well as several new ones developed here. A thorough performance study on synthetic and real data sets has shown the advantages of these strategies and the improvement of CLOSET+ over existing mining algorithms, including CLOSET, CHARM and OP, in terms of runtime, memory usage and scalability.
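The example below makes "complete and non-redundant" concrete. It is a minimal sketch, not CLOSET+ itself. It swaps in a deliberately small depth-first miner over vertical tid-sets that uses the prefix-preserving closure extension from the LCM algorithm to emit each frequent closed itemset exactly once. The function names `mine_closed_itemsets` and `support_from_closed` are hypothetical, and the code assumes items are hashable and mutually sortable.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set

Itemset = FrozenSet[Hashable]


def mine_closed_itemsets(transactions: Iterable[Iterable[Hashable]],
                         min_sup: int) -> Dict[Itemset, int]:
    """Hypothetical helper: {closed itemset: support} for every non-empty
    frequent closed itemset, via LCM-style prefix-preserving closure
    extension over vertical tid-sets (a sketch, not CLOSET+)."""
    db: List[Itemset] = [frozenset(t) for t in transactions]
    min_sup = max(min_sup, 1)            # assumption: support counts are >= 1
    if not db or len(db) < min_sup:
        return {}

    # Vertical format: item -> ids of the transactions that contain it.
    tidsets: Dict[Hashable, Set[int]] = {}
    for tid, t in enumerate(db):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    order = sorted(tidsets)              # fixed total order on items
    rank = {item: r for r, item in enumerate(order)}

    def closure(tids: Set[int]) -> Itemset:
        # Items shared by every transaction in a non-empty tid-set.
        return frozenset.intersection(*(db[t] for t in tids))

    result: Dict[Itemset, int] = {}

    def expand(closed: Itemset, tids: Set[int], core: int) -> None:
        if closed:                       # do not report the empty itemset
            result[closed] = len(tids)
        for i in range(core + 1, len(order)):
            item = order[i]
            if item in closed:
                continue
            new_tids = tids & tidsets[item]
            if len(new_tids) < min_sup:  # support is anti-monotone: prune
                continue
            q = closure(new_tids)
            # Prefix-preserving test: the closure may not add any item ranked
            # before `item`; otherwise q belongs to a different branch.
            if all(x in closed for x in q if rank[x] < i):
                expand(q, new_tids, i)

    all_tids = set(range(len(db)))
    expand(closure(all_tids), all_tids, -1)
    return result


def support_from_closed(itemset: Iterable[Hashable],
                        closed: Dict[Itemset, int]) -> int:
    # For a non-empty itemset X: support(X) is the largest support among the
    # closed supersets of X, or 0 if X is infrequent.
    x = frozenset(itemset)
    return max((s for c, s in closed.items() if x <= c), default=0)


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "b", "d"], ["c", "d"]]
    closed = mine_closed_itemsets(db, min_sup=2)
    for itemset, sup in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), sup)          # ['a', 'b'] 3 / ['c'] 2 / ['d'] 2
    print(support_from_closed({"a"}, closed))  # 3, recovered from {a, b}
```

In this toy database, {a} and {b} are frequent but not closed, because each always co-occurs with the other. Their supports are still recoverable from the closed set {a, b}, which is the sense in which the closed itemsets are complete yet non-redundant. Depth-first search and the vertical (tid-set) format are just two points on the strategy axes listed above. The `closure` helper naively intersects whole transactions, which is fine for illustration but is not a scalable design.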

Original language: English (US)
Pages: 236-245
Number of pages: 10
DOI: 10.1145/956750.956779
State: Published - Dec 1 2003
Event: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03 - Washington, DC, United States
Duration: Aug 24 2003 - Aug 27 2003

Other

Other: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03
Country: United States
City: Washington, DC
Period: 8/24/03 - 8/27/03

Keywords

  • Association rules
  • Frequent closed itemsets

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Wang, J., Han, J., & Pei, J. (2003). CLOSET+: Searching for the best strategies for mining frequent closed itemsets. 236-245. Paper presented at 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, Washington, DC, United States. https://doi.org/10.1145/956750.956779

@conference{89891083e2934830a8a92d733b7804c0,
title = "CLOSET+: Searching for the best strategies for mining frequent closed itemsets",
abstract = "Mining frequent closed itemsets provides complete and non-redundant results for frequent pattern analysis. Extensive studies have proposed various strategies for efficient frequent closed itemset mining, such as depth-first search vs. breadth-first search, vertical formats vs. horizontal formats, tree structures vs. other data structures, top-down vs. bottom-up traversal, and pseudo projection vs. physical projection of conditional databases. It is the right time to ask {"}what are the pros and cons of these strategies?{"} and {"}which strategies should we pick, and how can we integrate the best of them to achieve higher performance in general cases?{"} In this study, we answer these questions through a systematic study of the search strategies and develop a winning algorithm, CLOSET+. CLOSET+ integrates the advantages of previously proposed effective strategies as well as several new ones developed here. A thorough performance study on synthetic and real data sets has shown the advantages of these strategies and the improvement of CLOSET+ over existing mining algorithms, including CLOSET, CHARM and OP, in terms of runtime, memory usage and scalability.",
keywords = "Association rules, Frequent closed itemsets",
author = "Jianyong Wang and Jiawei Han and Jian Pei",
year = "2003",
month = "12",
day = "1",
doi = "10.1145/956750.956779",
language = "English (US)",
pages = "236--245",
note = "9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03 ; Conference date: 24-08-2003 Through 27-08-2003",

}

TY - CONF

T1 - CLOSET+: Searching for the best strategies for mining frequent closed itemsets

AU - Wang, Jianyong

AU - Han, Jiawei

AU - Pei, Jian

PY - 2003/12/1

Y1 - 2003/12/1

AB - Mining frequent closed itemsets provides complete and non-redundant results for frequent pattern analysis. Extensive studies have proposed various strategies for efficient frequent closed itemset mining, such as depth-first search vs. breadth-first search, vertical formats vs. horizontal formats, tree structures vs. other data structures, top-down vs. bottom-up traversal, and pseudo projection vs. physical projection of conditional databases. It is the right time to ask "what are the pros and cons of these strategies?" and "which strategies should we pick, and how can we integrate the best of them to achieve higher performance in general cases?" In this study, we answer these questions through a systematic study of the search strategies and develop a winning algorithm, CLOSET+. CLOSET+ integrates the advantages of previously proposed effective strategies as well as several new ones developed here. A thorough performance study on synthetic and real data sets has shown the advantages of these strategies and the improvement of CLOSET+ over existing mining algorithms, including CLOSET, CHARM and OP, in terms of runtime, memory usage and scalability.

KW - Association rules

KW - Frequent closed itemsets

UR - http://www.scopus.com/inward/record.url?scp=77952363125&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77952363125&partnerID=8YFLogxK

U2 - 10.1145/956750.956779

DO - 10.1145/956750.956779

M3 - Paper

AN - SCOPUS:77952363125

SP - 236

EP - 245

ER -