Stream sequential pattern mining with precise error bounds

Luiz F. Mendes, Bolin Ding, Jiawei Han

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Sequential pattern mining is an interesting data mining problem with many real-world applications. This problem has been studied extensively in static databases. However, in recent years, emerging applications have introduced a new form of data called data stream. In a data stream, new elements are generated continuously. This poses additional constraints on the methods used for mining such data: memory usage is restricted, the infinitely flowing original dataset cannot be scanned multiple times, and current results should be available on demand. This paper introduces two effective methods for mining sequential patterns from data streams: the SS-BE method and the SS-MB method. The proposed methods break the stream into batches and only process each batch once. The two methods use different pruning strategies that restrict the memory usage but can still guarantee that all true sequential patterns are output at the end of any batch. Both algorithms scale linearly in execution time as the number of sequences grows, making them effective methods for sequential pattern mining in data streams. The experimental results also show that our methods are very accurate in that only a small fraction of the patterns that are output are false positives. Even for these false positives, SS-BE guarantees that their true support is above a pre-defined threshold.

Original languageEnglish (US)
Title of host publicationProceedings - 8th IEEE International Conference on Data Mining, ICDM 2008
Pages941-946
Number of pages6
DOIs
StatePublished - 2008
Event8th IEEE International Conference on Data Mining, ICDM 2008 - Pisa, Italy
Duration: Dec 15 2008Dec 19 2008

Publication series

NameProceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print)1550-4786

Other

Other8th IEEE International Conference on Data Mining, ICDM 2008
Country/TerritoryItaly
CityPisa
Period12/15/0812/19/08

ASJC Scopus subject areas

  • Engineering(all)

Fingerprint

Dive into the research topics of 'Stream sequential pattern mining with precise error bounds'. Together they form a unique fingerprint.

Cite this