TY - GEN
T1 - Efficient mining of closed repetitive gapped subsequences from a sequence database
AU - Ding, Bolin
AU - Lo, David
AU - Han, Jiawei
AU - Khoo, Siau Cheng
PY - 2009
Y1 - 2009
N2 - There is a huge wealth of sequence data available, for example, customer purchase histories, program execution traces, DNA, and protein sequences. Analyzing this wealth of data to mine important knowledge is certainly a worthwhile goal. In this paper, as a step forward to analyzing patterns in sequences, we introduce the problem of mining closed repetitive gapped subsequences and propose efficient solutions. Given a database of sequences where each sequence is an ordered list of events, the pattern we would like to mine is called repetitive gapped subsequence, which is a subsequence (possibly with gaps between two successive events within it) of some sequences in the database. We introduce the concept of repetitive support to measure how frequently a pattern repeats in the database. Different from the sequential pattern mining problem, repetitive support captures not only repetitions of a pattern in different sequences but also the repetitions within a sequence. Given a user-specified support threshold min_sup, we study finding the set of all patterns with repetitive support no less than min_sup. To obtain a compact yet complete result set and improve the efficiency, we also study finding closed patterns. Efficient mining algorithms to find the complete set of desired patterns are proposed based on the idea of instance growth. Our performance study on various datasets shows the efficiency of our approach. A case study is also performed to show the utility of our approach.
AB - There is a huge wealth of sequence data available, for example, customer purchase histories, program execution traces, DNA, and protein sequences. Analyzing this wealth of data to mine important knowledge is certainly a worthwhile goal. In this paper, as a step forward to analyzing patterns in sequences, we introduce the problem of mining closed repetitive gapped subsequences and propose efficient solutions. Given a database of sequences where each sequence is an ordered list of events, the pattern we would like to mine is called repetitive gapped subsequence, which is a subsequence (possibly with gaps between two successive events within it) of some sequences in the database. We introduce the concept of repetitive support to measure how frequently a pattern repeats in the database. Different from the sequential pattern mining problem, repetitive support captures not only repetitions of a pattern in different sequences but also the repetitions within a sequence. Given a user-specified support threshold min_sup, we study finding the set of all patterns with repetitive support no less than min_sup. To obtain a compact yet complete result set and improve the efficiency, we also study finding closed patterns. Efficient mining algorithms to find the complete set of desired patterns are proposed based on the idea of instance growth. Our performance study on various datasets shows the efficiency of our approach. A case study is also performed to show the utility of our approach.
UR - http://www.scopus.com/inward/record.url?scp=67649646374&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67649646374&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2009.104
DO - 10.1109/ICDE.2009.104
M3 - Conference contribution
AN - SCOPUS:67649646374
SN - 9780769535456
T3 - Proceedings - International Conference on Data Engineering
SP - 1024
EP - 1035
BT - Proceedings - 25th IEEE International Conference on Data Engineering, ICDE 2009
T2 - 25th IEEE International Conference on Data Engineering, ICDE 2009
Y2 - 29 March 2009 through 2 April 2009
ER -