TY - GEN
T1 - Mining compressed frequent-pattern sets
AU - Xin, Dong
AU - Han, Jiawei
AU - Yan, Xifeng
AU - Cheng, Hong
PY - 2005
Y1 - 2005
N2 - A major challenge in frequent-pattern mining is the sheer size of its mining results. In many cases, a high min_sup threshold may discover only commonsense patterns but a low one may generate an explosive number of output patterns, which severely restricts its usage. In this paper, we study the problem of compressing frequent-pattern sets. Typically, frequent patterns can be clustered with a tightness measure δ (called δ-cluster), and a representative pattern can be selected for each cluster. Unfortunately, finding a minimum set of representative patterns is NP-Hard. We develop two greedy methods, RP global and RPlocal. The former has the guaranteed compression bound but higher computational complexity. The latter sacrifices the theoretical bounds but is far more efficient. Our performance study shows that the compression quality using RPlocal is very close to RPglobal, and both can reduce the number of closed frequent patterns by almost two orders of magnitude. Furthermore, RPlocal mines even faster than FPClose[11], a very fast closed frequent-pattern mining method. We also show that RPglobal and RPlocal can be combined together to balance the quality and efficiency.
AB - A major challenge in frequent-pattern mining is the sheer size of its mining results. In many cases, a high min_sup threshold may discover only commonsense patterns but a low one may generate an explosive number of output patterns, which severely restricts its usage. In this paper, we study the problem of compressing frequent-pattern sets. Typically, frequent patterns can be clustered with a tightness measure δ (called δ-cluster), and a representative pattern can be selected for each cluster. Unfortunately, finding a minimum set of representative patterns is NP-Hard. We develop two greedy methods, RP global and RPlocal. The former has the guaranteed compression bound but higher computational complexity. The latter sacrifices the theoretical bounds but is far more efficient. Our performance study shows that the compression quality using RPlocal is very close to RPglobal, and both can reduce the number of closed frequent patterns by almost two orders of magnitude. Furthermore, RPlocal mines even faster than FPClose[11], a very fast closed frequent-pattern mining method. We also show that RPglobal and RPlocal can be combined together to balance the quality and efficiency.
UR - http://www.scopus.com/inward/record.url?scp=33745458995&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33745458995&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33745458995
SN - 1595931546
SN - 9781595931542
T3 - VLDB 2005 - Proceedings of 31st International Conference on Very Large Data Bases
SP - 709
EP - 720
BT - VLDB 2005 - Proceedings of 31st International Conference on Very Large Data Bases
T2 - VLDB 2005 - 31st International Conference on Very Large Data Bases
Y2 - 30 August 2005 through 2 September 2005
ER -