FAST sequence mining based on sparse id-lists

Eliana Salvemini, Fabio Fumarola, Donato Malerba, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Sequential pattern mining is an important data mining task with applications in basket analysis, the World Wide Web, medicine and telecommunications. The task is challenging because sequence databases are usually large, containing many long sequences, and the number of possible sequential patterns to mine can be exponential. We propose a new sequential pattern mining algorithm called FAST, which represents the dataset with indexed sparse id-lists in order to count the support of sequential patterns quickly. We also use a lexicographic tree to improve the efficiency of candidate generation. FAST mines the complete set of patterns while greatly reducing the effort required for support counting and candidate sequence generation. Experimental results on artificial and real data show that our method outperforms existing methods from the literature by up to one or two orders of magnitude on large datasets.
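To make the idea of support counting over id-lists concrete, here is a minimal Python sketch. It is not the FAST algorithm from the paper: it assumes sequences of single items rather than itemsets, omits the lexicographic tree, and uses the hypothetical helper names `build_id_lists` and `support`. It only illustrates how a vertical layout that stores, per item, just the sequences and positions where the item occurs lets support be counted without rescanning the whole database.

```python
from bisect import bisect_right
from collections import defaultdict


def build_id_lists(database):
    """Map each item to {sequence id: sorted positions where the item occurs}.

    The mapping is sparse: a sequence id appears under an item only if
    that sequence actually contains the item.
    """
    id_lists = defaultdict(dict)
    for sid, sequence in enumerate(database):
        for pos, item in enumerate(sequence):
            id_lists[item].setdefault(sid, []).append(pos)
    return id_lists


def support(pattern, id_lists):
    """Count sequences that contain `pattern` as a (possibly gapped) subsequence."""
    if not pattern:
        return 0
    # Only sequences that contain every item of the pattern can match.
    candidates = set(id_lists.get(pattern[0], {}))
    for item in pattern[1:]:
        candidates &= set(id_lists.get(item, {}))
    count = 0
    for sid in candidates:
        last = -1  # position of the previously matched item
        for item in pattern:
            positions = id_lists[item][sid]
            # Greedy match: earliest occurrence strictly after `last`.
            k = bisect_right(positions, last)
            if k == len(positions):
                break  # no later occurrence, so this sequence does not match
            last = positions[k]
        else:
            count += 1
    return count


# Toy database (an assumption for illustration): four sequences of single items.
database = [list("abcb"), list("acb"), list("ba"), list("abb")]
id_lists = build_id_lists(database)
print(support(["a", "b"], id_lists))
```

In this toy database the pattern `a` followed by `b` occurs in three of the four sequences, so the call prints 3. Matching each item at its earliest admissible position is sufficient here because any later choice could only shrink the room left for the remaining items.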

Original language: English (US)
Title of host publication: Foundations of Intelligent Systems - 19th International Symposium, ISMIS 2011, Proceedings
Pages: 316-325
Number of pages: 10
State: Published - Jul 14 2011
Event: 19th International Symposium on Methodologies for Intelligent Systems, ISMIS 2011 - Warsaw, Poland
Duration: Jun 28 2011 to Jun 30 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6804 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 19th International Symposium on Methodologies for Intelligent Systems, ISMIS 2011
Country/Territory: Poland
City: Warsaw
Period: 6/28/11 to 6/30/11

Keywords

  • Data Mining
  • Sequential Pattern Discovery
  • Sparse Id-List

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
