TY - GEN
T1 - Mining Frequent Patterns without Candidate Generation
AU - Han, Jiawei
AU - Pei, Jian
AU - Yin, Yiwen
N1 - Funding Information:
The work was supported in part by the Natural Sciences and Engineering Research Council of Canada (grant NSERCA3723), the Networks of Centres of Excellence of Canada (grant NCE/IRIS-3), and the Hewlett-Packard Lab, U.S.A.
Publisher Copyright:
© ACM 2000
PY - 2000
Y1 - 2000
N2 - Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. In this study, we propose a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, much smaller data structure, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent pattern mining methods.
AB - Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. In this study, we propose a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, much smaller data structure, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent pattern mining methods.
UR - http://www.scopus.com/inward/record.url?scp=85086032933&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85086032933&partnerID=8YFLogxK
U2 - 10.1145/342009.335372
DO - 10.1145/342009.335372
M3 - Conference contribution
AN - SCOPUS:85086032933
T3 - SIGMOD 2000 - Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data
SP - 1
EP - 12
BT - SIGMOD 2000 - Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data
PB - Association for Computing Machinery
T2 - 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD 2000
Y2 - 15 May 2000 through 18 May 2000
ER -