Abstract
The goal of data mining algorithms is to discover useful information embedded in large databases. Frequent itemset mining and sequential pattern mining are two important data mining problems with broad applications. Perhaps the most efficient way to solve these problems sequentially is to apply a pattern-growth algorithm, which is a divide-and-conquer algorithm [9, 10]. In this paper, we present a framework for the parallel mining of frequent itemsets and sequential patterns based on the divide-and-conquer strategy of pattern growth. We then discuss the load balancing problem and introduce a sampling technique, called selective sampling, to address it. We implemented parallel versions of both frequent itemset and sequential pattern mining algorithms following our framework. The experimental results show that our parallel algorithms usually achieve excellent speedups.
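The divide-and-conquer structure mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a simple projected-database form of pattern growth for frequent itemsets, uses Python threads only to show that the top-level subproblems are independent, and does not attempt selective sampling or any load balancing. The names `project`, `mine`, and `mine_parallel` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def project(db, item):
    """Projected database for `item`: transactions that contain it,
    restricted to items ordered after it (avoids duplicate patterns)."""
    return [[x for x in t if x > item] for t in db if item in t]


def mine(db, min_support, prefix=()):
    """Sequential pattern growth: return {itemset tuple: support} for every
    frequent itemset that extends `prefix` within `db`."""
    counts = Counter(item for t in db for item in set(t))
    results = {}
    for item in sorted(counts):
        if counts[item] < min_support:
            continue
        pattern = prefix + (item,)
        results[pattern] = counts[item]
        # Divide: recurse on the smaller projected database for this item.
        results.update(mine(project(db, item), min_support, pattern))
    return results


def mine_parallel(db, min_support, workers=4):
    """Assumed parallelization: one task per frequent item at the top level.
    Tasks share no state, so they can run on separate workers."""
    counts = Counter(item for t in db for item in set(t))
    frequent = sorted(i for i, c in counts.items() if c >= min_support)
    results = {(i,): counts[i] for i in frequent}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(mine, project(db, i), min_support, (i,))
            for i in frequent
        ]
        for future in futures:
            results.update(future.result())
    return results


db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
patterns = mine_parallel(db, min_support=3)
```

In this sketch the projected databases differ in size, so the tasks take unequal time; that imbalance is the kind of load balancing problem the abstract refers to.
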
Original language | English (US) |
---|---|
Pages | 255-265 |
Number of pages | 11 |
State | Published - 2005 |
Event | 2005 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 05 - Chicago, IL, United States. Duration: Jun 15 2005 → Jun 17 2005 |
Conference
Conference | 2005 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 05 |
---|---|
Country/Territory | United States |
City | Chicago, IL |
Period | 6/15/05 → 6/17/05 |
Keywords
- Data mining
- Load balancing
- Parallel algorithms
- Sampling
ASJC Scopus subject areas
- General Computer Science