TSP: Mining top-k closed sequential patterns

Research output: Contribution to journalArticlepeer-review

Abstract

Sequential pattern mining has been studied extensively in the data mining community. Most previous studies require the specification of a min_support threshold for mining a complete set of sequential patterns satisfying the threshold. However, in practice, it is difficult for users to provide an appropriate min_support threshold. To overcome this difficulty, we propose an alternative mining task: mining top-k frequent closed sequential patterns of length no less than min_ℓ, where k is the desired number of closed sequential patterns to be mined and min_ℓ is the minimal length of each pattern. We mine the set of closed patterns because it is a compact representation of the complete set of frequent patterns. An efficient algorithm, called TSP, is developed for mining such patterns without min_support. Starting at (absolute) min_support = 1, the algorithm makes use of the length constraint and the properties of top-k closed sequential patterns to perform dynamic support raising and projected database pruning. Our extensive performance study shows that TSP has high performance. In most cases, it outperforms the efficient closed sequential pattern-mining algorithm, CloSpan, even when the latter is running with the best tuned min_support threshold. Thus, we conclude that, for sequential pattern mining, mining top-k frequent closed sequential patterns without min_support is more preferable than the traditional min_support-based mining. Springer-Verlag London Ltd.

Original languageEnglish (US)
Pages (from-to)438-457
Number of pages20
JournalKnowledge and Information Systems
Volume7
Issue number4
DOIs
StatePublished - May 2005

Keywords

  • Closed pattern
  • Frequent pattern
  • Sequence
  • Top-k

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Hardware and Architecture
  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'TSP: Mining top-k closed sequential patterns'. Together they form a unique fingerprint.

Cite this