BIDE: Efficient mining of frequent closed sequences

Jianyong Wang, Jiawei Han

Research output: Contribution to conferencePaper

Abstract

Previous studies have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent patterns but only the closed ones because the latter leads to not only more compact yet complete result set but also better efficiency. However, most of the previously developed closed pattern mining algorithms work under the candidate maintenance-and-test paradigm which is inherently costly in both runtime and space usage when the support threshold is low or the patterns become long. In this paper, we present, BIDE, an efficient algorithm for mining frequent closed sequences without candidate maintenance. It adopts a novel sequence closure checking scheme called BI-Directional Extension, and prunes the search space more deeply compared to the previous algorithms by using the BackScan pruning method and the Scan-Skip optimization technique. A thorough performance study with both sparse and dense real-life data sets has demonstrated that BIDE significantly outperforms the previous algorithms: it consumes order(s) of magnitude less memory and can be more than an order of magnitude faster. It is also linearly scalable in terms of database size.

Original languageEnglish (US)
Pages79-90
Number of pages12
StatePublished - Jun 1 2004
EventProceedings - 20th International Conference on Data Engineering - ICDE 2004 - Boston, MA., United States
Duration: Mar 30 2004Apr 2 2004

Other

OtherProceedings - 20th International Conference on Data Engineering - ICDE 2004
CountryUnited States
CityBoston, MA.
Period3/30/044/2/04

Fingerprint

Data storage equipment

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

Cite this

Wang, J., & Han, J. (2004). BIDE: Efficient mining of frequent closed sequences. 79-90. Paper presented at Proceedings - 20th International Conference on Data Engineering - ICDE 2004, Boston, MA., United States.

BIDE : Efficient mining of frequent closed sequences. / Wang, Jianyong; Han, Jiawei.

2004. 79-90 Paper presented at Proceedings - 20th International Conference on Data Engineering - ICDE 2004, Boston, MA., United States.

Research output: Contribution to conferencePaper

Wang, J & Han, J 2004, 'BIDE: Efficient mining of frequent closed sequences', Paper presented at Proceedings - 20th International Conference on Data Engineering - ICDE 2004, Boston, MA., United States, 3/30/04 - 4/2/04 pp. 79-90.
Wang J, Han J. BIDE: Efficient mining of frequent closed sequences. 2004. Paper presented at Proceedings - 20th International Conference on Data Engineering - ICDE 2004, Boston, MA., United States.
Wang, Jianyong ; Han, Jiawei. / BIDE : Efficient mining of frequent closed sequences. Paper presented at Proceedings - 20th International Conference on Data Engineering - ICDE 2004, Boston, MA., United States.12 p.
@conference{7f1683a2f55a47f9ba2535050fa2f637,
title = "BIDE: Efficient mining of frequent closed sequences",
abstract = "Previous studies have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent patterns but only the closed ones because the latter leads to not only more compact yet complete result set but also better efficiency. However, most of the previously developed closed pattern mining algorithms work under the candidate maintenance-and-test paradigm which is inherently costly in both runtime and space usage when the support threshold is low or the patterns become long. In this paper, we present, BIDE, an efficient algorithm for mining frequent closed sequences without candidate maintenance. It adopts a novel sequence closure checking scheme called BI-Directional Extension, and prunes the search space more deeply compared to the previous algorithms by using the BackScan pruning method and the Scan-Skip optimization technique. A thorough performance study with both sparse and dense real-life data sets has demonstrated that BIDE significantly outperforms the previous algorithms: it consumes order(s) of magnitude less memory and can be more than an order of magnitude faster. It is also linearly scalable in terms of database size.",
author = "Jianyong Wang and Jiawei Han",
year = "2004",
month = "6",
day = "1",
language = "English (US)",
pages = "79--90",
note = "Proceedings - 20th International Conference on Data Engineering - ICDE 2004 ; Conference date: 30-03-2004 Through 02-04-2004",

}

TY - CONF

T1 - BIDE

T2 - Efficient mining of frequent closed sequences

AU - Wang, Jianyong

AU - Han, Jiawei

PY - 2004/6/1

Y1 - 2004/6/1

N2 - Previous studies have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent patterns but only the closed ones because the latter leads to not only more compact yet complete result set but also better efficiency. However, most of the previously developed closed pattern mining algorithms work under the candidate maintenance-and-test paradigm which is inherently costly in both runtime and space usage when the support threshold is low or the patterns become long. In this paper, we present, BIDE, an efficient algorithm for mining frequent closed sequences without candidate maintenance. It adopts a novel sequence closure checking scheme called BI-Directional Extension, and prunes the search space more deeply compared to the previous algorithms by using the BackScan pruning method and the Scan-Skip optimization technique. A thorough performance study with both sparse and dense real-life data sets has demonstrated that BIDE significantly outperforms the previous algorithms: it consumes order(s) of magnitude less memory and can be more than an order of magnitude faster. It is also linearly scalable in terms of database size.

AB - Previous studies have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent patterns but only the closed ones because the latter leads to not only more compact yet complete result set but also better efficiency. However, most of the previously developed closed pattern mining algorithms work under the candidate maintenance-and-test paradigm which is inherently costly in both runtime and space usage when the support threshold is low or the patterns become long. In this paper, we present, BIDE, an efficient algorithm for mining frequent closed sequences without candidate maintenance. It adopts a novel sequence closure checking scheme called BI-Directional Extension, and prunes the search space more deeply compared to the previous algorithms by using the BackScan pruning method and the Scan-Skip optimization technique. A thorough performance study with both sparse and dense real-life data sets has demonstrated that BIDE significantly outperforms the previous algorithms: it consumes order(s) of magnitude less memory and can be more than an order of magnitude faster. It is also linearly scalable in terms of database size.

UR - http://www.scopus.com/inward/record.url?scp=2442446148&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=2442446148&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:2442446148

SP - 79

EP - 90

ER -