PIC: Partitioned iterative convergence for clusters

Reza Farivar, Anand Raghunathan, Srimat Chakradhar, Harshit Kharbanda, R H Campbell

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Iterative-convergence algorithms are frequently used in a variety of domains to build models from large data sets. Cluster implementations of these algorithms are commonly realized using parallel programming models such as MapReduce. However, these implementations suffer from significant performance bottlenecks, especially due to large volumes of network traffic resulting from intermediate data and model updates during the iterations. To address these challenges, we propose partitioned iterative convergence (PIC), a new approach to programming and executing iterative convergence algorithms on frameworks like MapReduce. In PIC, we execute the iterative-convergence computation in two phases - the best-effort phase, which quickly produces a good initial model and the top-off phase, which further refines this model to produce the final solution. The best-effort phase iteratively performs the following steps: (a) partition the input data and the model to create several smaller, model-building sub-problems, (b) independently solve these sub-problems using iterative convergence computations, and (c) merge solutions of the sub-problems to create the next version of the model. This partitioned, loosely coupled execution of the computation produces a model of good quality, while drastically reducing network traffic due to intermediate data and model updates. The top-off phase further refines this model by employing the original iterative-convergence computation on the entire (un-partitioned) problem until convergence. However, the number of iterations executed in the top-off phase is quite small, resulting in a significant overall improvement in performance. We have implemented a library for PIC on top of the Hadoop MapReduce framework, and evaluated it using five popular iterative-convergence algorithms (Page Rank, K-Means clustering, neural network training, linear equation solver and image smoothing). Our evaluations on clusters ranging from 6 nodes to 256 nodes demonstrate a 2.5X-4X speedup compared to conventional implementations using Hadoop.

Original languageEnglish (US)
Title of host publicationProceedings - 2012 IEEE International Conference on Cluster Computing, CLUSTER 2012
PublisherIEEE Computer Society
Pages391-401
Number of pages11
ISBN (Print)9780768548074
DOIs
StatePublished - Jan 1 2012
Event2012 IEEE International Conference on Cluster Computing, CLUSTER 2012 - Beijing, China
Duration: Sep 24 2012Sep 28 2012

Publication series

NameProceedings - 2012 IEEE International Conference on Cluster Computing, CLUSTER 2012

Other

Other2012 IEEE International Conference on Cluster Computing, CLUSTER 2012
CountryChina
CityBeijing
Period9/24/129/28/12

    Fingerprint

ASJC Scopus subject areas

  • Software

Cite this

Farivar, R., Raghunathan, A., Chakradhar, S., Kharbanda, H., & Campbell, R. H. (2012). PIC: Partitioned iterative convergence for clusters. In Proceedings - 2012 IEEE International Conference on Cluster Computing, CLUSTER 2012 (pp. 391-401). [6337802] (Proceedings - 2012 IEEE International Conference on Cluster Computing, CLUSTER 2012). IEEE Computer Society. https://doi.org/10.1109/CLUSTER.2012.84