CISpan: Comprehensive incremental mining algorithms of closed sequential patterns for multi-versional software mining

Ding Yuan, Kyuhyung Lee, Hong Cheng, Gopal Krishna, Zhenmin Li, Xiao Ma, Yuanyuan Zhou, Jiawei Han

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Recently, frequent sequential pattern mining algorithms have been widely used in software engineering field to mine various source code or specification patterns. In practice, software evolves from one version to another in its life span. The effort of mining frequent sequential patterns across multiple versions of a software can be substantially reduced by efficient incremental mining. This problem is challenging in this domain since the databases are usually updated in all kinds of manners including insertion, various modifications as well as removal of sequences. Also, different mining tools may have various mining constraints, such as low minimum support. None of the existing work can be applied effectively due to various limitations of such work. For example, our recent work, IncSpan, failed solving the problem because it could neither handle low minimum support nor removal of sequences from database. In this paper, we propose a novel, comprehensive incremental mining algorithm for frequent sequential pattern, CISpan (Comprehensive Incremental Sequential Pattern mining). CISpan supports both closed and complete incremental frequent sequence mining, with all kinds of updates to the database. Compared to IncSpan, CISpan tolerates a wide range for minimum support threshold (as low as 2). Our performance study shows that in addition to handling more test cases on which IncSpan fails, CISpan outperforms IncSpan in all test cases which IncSpan could handle, including various sequence length, number of sequences, modification ratio, etc., with an average of 3.4 times speedup. We also tested CISpan's performance on databases transformed from 20 consecutive versions of Linux Kernel source code. On average, CISpan outperforms the non-incremental CloSpan by 42 times.

Original languageEnglish (US)
Title of host publicationSociety for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130
PublisherSociety for Industrial and Applied Mathematics Publications
Pages84-95
Number of pages12
ISBN (Print)9781605603179
DOIs
StatePublished - 2008
Event8th SIAM International Conference on Data Mining 2008, Applied Mathematics 130 - Atlanta, GA, United States
Duration: Apr 24 2008Apr 26 2008

Publication series

NameSociety for Industrial and Applied Mathematics - 8th SIAM International Conference on Data Mining 2008, Proceedings in Applied Mathematics 130
Volume1

Other

Other8th SIAM International Conference on Data Mining 2008, Applied Mathematics 130
Country/TerritoryUnited States
CityAtlanta, GA
Period4/24/084/26/08

Keywords

  • Cross module mining
  • Frequent pattern
  • Incremental mining
  • Software engineering

ASJC Scopus subject areas

  • Information Systems
  • Software
  • Signal Processing
  • Theoretical Computer Science

Fingerprint

Dive into the research topics of 'CISpan: Comprehensive incremental mining algorithms of closed sequential patterns for multi-versional software mining'. Together they form a unique fingerprint.

Cite this