TY - GEN
T1 - Stream sequential pattern mining with precise error bounds
AU - Mendes, Luiz F.
AU - Ding, Bolin
AU - Han, Jiawei
PY - 2008
Y1 - 2008
N2 - Sequential pattern mining is an interesting data mining problem with many real-world applications. This problem has been studied extensively in static databases. However, in recent years, emerging applications have introduced a new form of data called a data stream. In a data stream, new elements are generated continuously. This poses additional constraints on the methods used for mining such data: memory usage is restricted, the infinitely flowing original dataset cannot be scanned multiple times, and current results should be available on demand. This paper introduces two effective methods for mining sequential patterns from data streams: the SS-BE method and the SS-MB method. The proposed methods break the stream into batches and process each batch only once. The two methods use different pruning strategies that restrict the memory usage but can still guarantee that all true sequential patterns are output at the end of any batch. Both algorithms scale linearly in execution time as the number of sequences grows, making them effective methods for sequential pattern mining in data streams. The experimental results also show that our methods are very accurate in that only a small fraction of the patterns that are output are false positives. Even for these false positives, SS-BE guarantees that their true support is above a pre-defined threshold.
AB - Sequential pattern mining is an interesting data mining problem with many real-world applications. This problem has been studied extensively in static databases. However, in recent years, emerging applications have introduced a new form of data called a data stream. In a data stream, new elements are generated continuously. This poses additional constraints on the methods used for mining such data: memory usage is restricted, the infinitely flowing original dataset cannot be scanned multiple times, and current results should be available on demand. This paper introduces two effective methods for mining sequential patterns from data streams: the SS-BE method and the SS-MB method. The proposed methods break the stream into batches and process each batch only once. The two methods use different pruning strategies that restrict the memory usage but can still guarantee that all true sequential patterns are output at the end of any batch. Both algorithms scale linearly in execution time as the number of sequences grows, making them effective methods for sequential pattern mining in data streams. The experimental results also show that our methods are very accurate in that only a small fraction of the patterns that are output are false positives. Even for these false positives, SS-BE guarantees that their true support is above a pre-defined threshold.
UR - http://www.scopus.com/inward/record.url?scp=67049137938&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67049137938&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2008.154
DO - 10.1109/ICDM.2008.154
M3 - Conference contribution
AN - SCOPUS:67049137938
SN - 9780769535029
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 941
EP - 946
BT - Proceedings - 8th IEEE International Conference on Data Mining, ICDM 2008
T2 - 8th IEEE International Conference on Data Mining, ICDM 2008
Y2 - 15 December 2008 through 19 December 2008
ER -