TY - JOUR
T1 - Top-down mining of frequent closed patterns from very high dimensional data
AU - Liu, Hongyan
AU - Wang, Xiaoyu
AU - He, Jun
AU - Han, Jiawei
AU - Xin, Dong
AU - Shao, Zheng
N1 - Funding Information:
A preliminary version of this work has been published in the Proceedings of 2006 SIAM Data Mining Conference (SIDM 2006). Substantial new technical materials have been added to this journal submission. This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 70871068, 70890083 and 70621061, and by the US National Science Foundation NSF IIS-03-08215 and IIS-05-13857.
PY - 2009/3/15
Y1 - 2009/3/15
N2 - Frequent pattern mining is an essential theme in data mining. Existing algorithms usually use a bottom-up search strategy. However, for very high dimensional data, this strategy cannot fully utilize the minimum support constraint to prune the rowset search space. In this paper, we propose a new method called top-down mining together with a novel row enumeration tree to make full use of the pruning power of the minimum support constraint. Furthermore, to efficiently check if a rowset is closed, we develop a method called the trace-based method. Based on these methods, an algorithm called TD-Close is designed for mining a complete set of frequent closed patterns. To enhance its performance further, we improve it by using new pruning strategies and new data structures that lead to a new algorithm TTD-Close. Our performance study shows that the top-down strategy is effective in cutting down search space and saving memory space, while the trace-based method facilitates the closeness-checking. As a result, the algorithm TTD-Close outperforms the bottom-up search algorithms such as Carpenter and FPclose in most cases. It also runs faster than TD-Close.
AB - Frequent pattern mining is an essential theme in data mining. Existing algorithms usually use a bottom-up search strategy. However, for very high dimensional data, this strategy cannot fully utilize the minimum support constraint to prune the rowset search space. In this paper, we propose a new method called top-down mining together with a novel row enumeration tree to make full use of the pruning power of the minimum support constraint. Furthermore, to efficiently check if a rowset is closed, we develop a method called the trace-based method. Based on these methods, an algorithm called TD-Close is designed for mining a complete set of frequent closed patterns. To enhance its performance further, we improve it by using new pruning strategies and new data structures that lead to a new algorithm TTD-Close. Our performance study shows that the top-down strategy is effective in cutting down search space and saving memory space, while the trace-based method facilitates the closeness-checking. As a result, the algorithm TTD-Close outperforms the bottom-up search algorithms such as Carpenter and FPclose in most cases. It also runs faster than TD-Close.
KW - Association rules
KW - Data mining
KW - Frequent patterns
KW - High dimensional data
UR - http://www.scopus.com/inward/record.url?scp=58249110789&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=58249110789&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2008.11.033
DO - 10.1016/j.ins.2008.11.033
M3 - Article
AN - SCOPUS:58249110789
SN - 0020-0255
VL - 179
SP - 899
EP - 924
JO - Information Sciences
JF - Information Sciences
IS - 7
ER -