Abstract

Discovery of sequential patterns is an essential data mining task with broad applications. Among several variations of sequential patterns, closed sequential pattern is the most useful one since it retains all the information of the complete pattern set but is often much more compact than it. Unfortunately, there is no parallel closed sequential pattern mining method proposed yet. In this paper we develop an algorithm, called Par-CSP (Parallel Closed Sequential Pattern mining), to conduct parallel mining of closed sequential patterns on a distributed memory system. Par-CSP partitions the work among the processors by exploiting the divide-and-conquer property so that the overhead of interprocessor communication is minimized. Par-CSP applies dynamic scheduling to avoid processor idling. Moreover, it employs a technique, called selective sampling, to address the load imbalance problem. We implement Par-CSP using MPI on a 64-node Linux cluster. Our experimental results show that Par-CSP attains good parallelization efficiencies on various input datasets.

Original languageEnglish (US)
Pages562-567
Number of pages6
DOIs
StatePublished - Dec 1 2005
EventKDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Chicago, IL, United States
Duration: Aug 21 2005Aug 24 2005

Other

OtherKDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
CountryUnited States
CityChicago, IL
Period8/21/058/24/05

Fingerprint

Parallel algorithms
Data mining
Scheduling
Sampling
Data storage equipment
Communication
Linux

Keywords

  • Load balancing
  • Parallel algorithms
  • Sampling

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Cong, S., Han, J., & Padua, D. (2005). Parallel mining of closed sequential patterns. 562-567. Paper presented at KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, United States. https://doi.org/10.1145/1081870.1081937

Parallel mining of closed sequential patterns. / Cong, Shengnan; Han, Jiawei; Padua, David.

2005. 562-567 Paper presented at KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, United States.

Research output: Contribution to conferencePaper

Cong, S, Han, J & Padua, D 2005, 'Parallel mining of closed sequential patterns', Paper presented at KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, United States, 8/21/05 - 8/24/05 pp. 562-567. https://doi.org/10.1145/1081870.1081937
Cong S, Han J, Padua D. Parallel mining of closed sequential patterns. 2005. Paper presented at KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, United States. https://doi.org/10.1145/1081870.1081937
Cong, Shengnan ; Han, Jiawei ; Padua, David. / Parallel mining of closed sequential patterns. Paper presented at KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, United States.6 p.
@conference{15449a9a6f32406cb62f8536c117906e,
title = "Parallel mining of closed sequential patterns",
abstract = "Discovery of sequential patterns is an essential data mining task with broad applications. Among several variations of sequential patterns, closed sequential pattern is the most useful one since it retains all the information of the complete pattern set but is often much more compact than it. Unfortunately, there is no parallel closed sequential pattern mining method proposed yet. In this paper we develop an algorithm, called Par-CSP (Parallel Closed Sequential Pattern mining), to conduct parallel mining of closed sequential patterns on a distributed memory system. Par-CSP partitions the work among the processors by exploiting the divide-and-conquer property so that the overhead of interprocessor communication is minimized. Par-CSP applies dynamic scheduling to avoid processor idling. Moreover, it employs a technique, called selective sampling, to address the load imbalance problem. We implement Par-CSP using MPI on a 64-node Linux cluster. Our experimental results show that Par-CSP attains good parallelization efficiencies on various input datasets.",
keywords = "Load balancing, Parallel algorithms, Sampling",
author = "Shengnan Cong and Jiawei Han and David Padua",
year = "2005",
month = "12",
day = "1",
doi = "10.1145/1081870.1081937",
language = "English (US)",
pages = "562--567",
note = "KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining ; Conference date: 21-08-2005 Through 24-08-2005",

}

TY - CONF

T1 - Parallel mining of closed sequential patterns

AU - Cong, Shengnan

AU - Han, Jiawei

AU - Padua, David

PY - 2005/12/1

Y1 - 2005/12/1

N2 - Discovery of sequential patterns is an essential data mining task with broad applications. Among several variations of sequential patterns, closed sequential pattern is the most useful one since it retains all the information of the complete pattern set but is often much more compact than it. Unfortunately, there is no parallel closed sequential pattern mining method proposed yet. In this paper we develop an algorithm, called Par-CSP (Parallel Closed Sequential Pattern mining), to conduct parallel mining of closed sequential patterns on a distributed memory system. Par-CSP partitions the work among the processors by exploiting the divide-and-conquer property so that the overhead of interprocessor communication is minimized. Par-CSP applies dynamic scheduling to avoid processor idling. Moreover, it employs a technique, called selective sampling, to address the load imbalance problem. We implement Par-CSP using MPI on a 64-node Linux cluster. Our experimental results show that Par-CSP attains good parallelization efficiencies on various input datasets.

AB - Discovery of sequential patterns is an essential data mining task with broad applications. Among several variations of sequential patterns, closed sequential pattern is the most useful one since it retains all the information of the complete pattern set but is often much more compact than it. Unfortunately, there is no parallel closed sequential pattern mining method proposed yet. In this paper we develop an algorithm, called Par-CSP (Parallel Closed Sequential Pattern mining), to conduct parallel mining of closed sequential patterns on a distributed memory system. Par-CSP partitions the work among the processors by exploiting the divide-and-conquer property so that the overhead of interprocessor communication is minimized. Par-CSP applies dynamic scheduling to avoid processor idling. Moreover, it employs a technique, called selective sampling, to address the load imbalance problem. We implement Par-CSP using MPI on a 64-node Linux cluster. Our experimental results show that Par-CSP attains good parallelization efficiencies on various input datasets.

KW - Load balancing

KW - Parallel algorithms

KW - Sampling

UR - http://www.scopus.com/inward/record.url?scp=32344453032&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=32344453032&partnerID=8YFLogxK

U2 - 10.1145/1081870.1081937

DO - 10.1145/1081870.1081937

M3 - Paper

AN - SCOPUS:32344453032

SP - 562

EP - 567

ER -