Efficient discovery of frequent approximate sequential patterns

Feida Zhu, Xifeng Yan, Jiawei Han, Gabrielle Dawn Allen

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We propose an efficient algorithm for mining frequent approximate sequential patterns under the Hamming distance model. Our algorithm gains its efficiency by adopting a "break-down-and-build-up" methodology. The "breakdown" is based on the observation that all occurrences of a frequent pattern can be classified into groups, which we call strands. We developed efficient algorithms to quickly mine out all strands by iterative growth. In the "build-up" stage, these strands are grouped up to form the support sets from which all approximate patterns would be identified. A salient feature of our algorithm is its ability to grow the frequent patterns by iteratively assembling building blocks of significant sizes in a local search fashion. By avoiding incremental growth and global search, we achieve greater efficiency without losing the completeness of the mining result. Our experimental studies demonstrate that our algorithm is efficient in mining globally repeating approximate sequential patterns that would have been missed by existing methods.

Original languageEnglish (US)
Title of host publicationProceedings of the 7th IEEE International Conference on Data Mining, ICDM 2007
Pages751-756
Number of pages6
DOIs
StatePublished - 2007
Event7th IEEE International Conference on Data Mining, ICDM 2007 - Omaha, NE, United States
Duration: Oct 28 2007Oct 31 2007

Publication series

NameProceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print)1550-4786

Other

Other7th IEEE International Conference on Data Mining, ICDM 2007
Country/TerritoryUnited States
CityOmaha, NE
Period10/28/0710/31/07

ASJC Scopus subject areas

  • Engineering(all)

Fingerprint

Dive into the research topics of 'Efficient discovery of frequent approximate sequential patterns'. Together they form a unique fingerprint.

Cite this