H-Mine: Fast and space-preserving frequent pattern mining in large databases

Jian Pei, Jiawei Han, Hongjun Lu, Shojiro Nishio, Shiwei Tang, Dongqing Yang

Research output: Contribution to journal › Article › peer-review

Abstract

In this study, we propose a simple and novel data structure using hyper-links, H-struct, and a new mining algorithm, H-mine, which takes advantage of this data structure and dynamically adjusts links in the mining process. A distinct feature of this method is that it has a very limited and precisely predictable main memory cost and runs very quickly in memory-based settings. Moreover, it can be scaled up to very large databases using database partitioning. When the data set becomes dense, (conditional) FP-trees can be constructed dynamically as part of the mining process. Our study shows that H-mine delivers excellent performance on various kinds of data, outperforms currently available algorithms in different settings, and is highly scalable to mining large databases. This study also proposes a new data mining methodology, space-preserving mining, which may have a major impact on the future development of efficient and scalable data mining methods.
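To make the hyper-link idea concrete, the following is a minimal Python sketch written for this page, not the authors' implementation. It assumes items are mutually comparable (for example, strings) and that the data fits in memory. The function name `h_mine_sketch` and the `(row index, position)` pairs that stand in for hyper-links are illustrative assumptions. Unlike the H-mine described in the abstract, the sketch builds a fresh header table per prefix rather than adjusting links in place, and it omits database partitioning and the switch to FP-trees for dense data.

```python
from collections import defaultdict


def h_mine_sketch(transactions, min_support):
    """Return {frozenset(pattern): support} for every frequent itemset.

    Simplified illustration of mining through links into shared rows;
    all names here are hypothetical and not taken from the paper.
    """
    # Pass 1: count item supports and keep the globally frequent items.
    counts = defaultdict(int)
    for t in transactions:
        for item in set(t):
            counts[item] += 1
    frequent = {i for i, c in counts.items() if c >= min_support}

    # Store each transaction exactly once, as a sorted tuple of its
    # frequent items. The fixed order guarantees each pattern is
    # generated only once.
    rows = [tuple(sorted(i for i in set(t) if i in frequent))
            for t in transactions]

    results = {}

    def mine(prefix, links):
        # links: (row index, start position) pairs pointing into `rows`.
        # Every linked row contains `prefix`; only items from `start`
        # onward can extend it. No transaction is ever copied.
        header = defaultdict(list)
        for r, start in links:
            row = rows[r]
            for pos in range(start, len(row)):
                header[row[pos]].append((r, pos + 1))
        for item in sorted(header):
            queue = header[item]
            if len(queue) >= min_support:
                pattern = prefix + (item,)
                results[frozenset(pattern)] = len(queue)
                mine(pattern, queue)  # recurse on the linked rows only

    mine((), [(r, 0) for r in range(len(rows))])
    return results


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for pattern, support in sorted(h_mine_sketch(db, 3).items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(pattern), support)
```

Because the recursion only passes around integer pairs, the extra memory on top of the stored rows is bounded by the number of item occurrences per level, which loosely mirrors the predictable memory cost the abstract emphasizes.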

Original language: English (US)
Pages (from-to): 593-605
Number of pages: 13
Journal: IIE Transactions (Institute of Industrial Engineers)
Volume: 39
Issue number: 6
State: Published - Jun 2007

Keywords

  • FP-tree
  • Frequent pattern mining
  • Transaction databases

ASJC Scopus subject areas

  • Industrial and Manufacturing Engineering
