TY - GEN
T1 - H-mine
T2 - 1st IEEE International Conference on Data Mining, ICDM'01
AU - Pei, Jian
AU - Han, Jiawei
AU - Lu, Hongjun
AU - Nishio, Shojiro
AU - Tang, Shiwei
AU - Yang, Dongqing
PY - 2001
Y1 - 2001
N2 - Methods for efficient mining of frequent patterns have been studied extensively by many researchers. However, the previously proposed methods still encounter some performance bottlenecks when mining databases with different data characteristics, such as dense vs. sparse, long vs. short patterns, memory-based vs. disk-based, etc. In this study, we propose a simple and novel hyperlinked data structure, H-Struct, and a new mining algorithm, H-mine, which takes advantage of this data structure and dynamically adjusts links in the mining process. A distinct feature of this method is that it has very limited and precisely predictable space overhead and runs really fast in memory-based setting. Moreover, it can be scaled up to very large databases by database partitioning, and when the data set becomes dense, (conditional) FP-trees can be constructed dynamically as part of the mining process. Our study shows that H-mine has high performance in various kinds of data, outperforms the previously developed algorithms in different settings, and is highly scalable in mining large databases. This study also proposes a new data mining methodology, space-preserving mining, which may have strong impact in the future development of efficient and scalable data mining methods.
AB - Methods for efficient mining of frequent patterns have been studied extensively by many researchers. However, the previously proposed methods still encounter some performance bottlenecks when mining databases with different data characteristics, such as dense vs. sparse, long vs. short patterns, memory-based vs. disk-based, etc. In this study, we propose a simple and novel hyperlinked data structure, H-Struct, and a new mining algorithm, H-mine, which takes advantage of this data structure and dynamically adjusts links in the mining process. A distinct feature of this method is that it has very limited and precisely predictable space overhead and runs really fast in memory-based setting. Moreover, it can be scaled up to very large databases by database partitioning, and when the data set becomes dense, (conditional) FP-trees can be constructed dynamically as part of the mining process. Our study shows that H-mine has high performance in various kinds of data, outperforms the previously developed algorithms in different settings, and is highly scalable in mining large databases. This study also proposes a new data mining methodology, space-preserving mining, which may have strong impact in the future development of efficient and scalable data mining methods.
UR - http://www.scopus.com/inward/record.url?scp=78149320187&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78149320187&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:78149320187
SN - 0769511198
SN - 9780769511191
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 441
EP - 448
BT - Proceedings - 2001 IEEE International Conference on Data Mining, ICDM'01
Y2 - 29 November 2001 through 2 December 2001
ER -