TY - GEN
T1 - Efficient discovery of frequent approximate sequential patterns
AU - Zhu, Feida
AU - Yan, Xifeng
AU - Han, Jiawei
AU - Allen, Gabrielle Dawn
PY - 2007
Y1 - 2007
N2 - We propose an efficient algorithm for mining frequent approximate sequential patterns under the Hamming distance model. Our algorithm gains its efficiency by adopting a "break-down-and-build-up" methodology. The "breakdown" is based on the observation that all occurrences of a frequent pattern can be classified into groups, which we call strands. We developed efficient algorithms to quickly mine out all strands by iterative growth. In the "build-up" stage, these strands are grouped up to form the support sets from which all approximate patterns would be identified. A salient feature of our algorithm is its ability to grow the frequent patterns by iteratively assembling building blocks of significant sizes in a local search fashion. By avoiding incremental growth and global search, we achieve greater efficiency without losing the completeness of the mining result. Our experimental studies demonstrate that our algorithm is efficient in mining globally repeating approximate sequential patterns that would have been missed by existing methods.
AB - We propose an efficient algorithm for mining frequent approximate sequential patterns under the Hamming distance model. Our algorithm gains its efficiency by adopting a "break-down-and-build-up" methodology. The "breakdown" is based on the observation that all occurrences of a frequent pattern can be classified into groups, which we call strands. We developed efficient algorithms to quickly mine out all strands by iterative growth. In the "build-up" stage, these strands are grouped up to form the support sets from which all approximate patterns would be identified. A salient feature of our algorithm is its ability to grow the frequent patterns by iteratively assembling building blocks of significant sizes in a local search fashion. By avoiding incremental growth and global search, we achieve greater efficiency without losing the completeness of the mining result. Our experimental studies demonstrate that our algorithm is efficient in mining globally repeating approximate sequential patterns that would have been missed by existing methods.
UR - http://www.scopus.com/inward/record.url?scp=49749097886&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=49749097886&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2007.75
DO - 10.1109/ICDM.2007.75
M3 - Conference contribution
AN - SCOPUS:49749097886
SN - 0769530184
SN - 9780769530185
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 751
EP - 756
BT - Proceedings of the 7th IEEE International Conference on Data Mining, ICDM 2007
T2 - 7th IEEE International Conference on Data Mining, ICDM 2007
Y2 - 28 October 2007 through 31 October 2007
ER -