TY - GEN
T1 - Efficient mining of correlated sequential patterns based on null hypothesis
AU - Lin, Cindy Xide
AU - Ji, Ming
AU - Danilevsky, Marina
AU - Han, Jiawei
PY - 2012
Y1 - 2012
AB - Frequent pattern mining has been a widely studied topic in the research area of data mining for more than a decade. However, pattern mining with real datasets is complicated: a huge number of co-occurrence patterns are usually generated, a majority of which are either redundant or uninformative. The true correlation relationships among data objects are buried deep among a large pile of useless information. To overcome this difficulty, mining correlations has been recognized as an important data mining task for its many advantages over mining frequent patterns. In this paper, we formally propose and define the task of mining frequent correlated sequential patterns from a sequential database. With this aim in mind, we re-examine various interestingness measures to select the appropriate one(s), which can disclose succinct relationships of sequential patterns. We then propose PSBSpan, an efficient mining algorithm based on the framework of the pattern-growth methodology which mines frequent correlated sequential patterns. Our experimental study on real datasets shows that our algorithm has outstanding performance in terms of both efficiency and effectiveness.
KW - Correlated pattern mining
KW - Frequent pattern mining
UR - http://www.scopus.com/inward/record.url?scp=84870401905&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84870401905&partnerID=8YFLogxK
U2 - 10.1145/2389656.2389660
DO - 10.1145/2389656.2389660
M3 - Conference contribution
AN - SCOPUS:84870401905
SN - 9781450317115
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 17
EP - 24
BT - Web-KR'12 - Proceedings of the 2012 ACM International Workshop on Web-Scale Knowledge Representation, Retrieval and Reasoning, Co-located with CIKM 2012
T2 - 2012 3rd ACM International Workshop on Web-Scale Knowledge Representation, Retrieval, and Reasoning, Web-KR 2012, Co-located with the 21st ACM International Conference on Information and Knowledge Management, CIKM 2012
Y2 - 29 October 2012
ER -