Optimal partitioning of a data set based on the p-median model

Michael J. Brusco, Hans Friedrich Köhn

Research output: Contribution to journalArticle

Abstract

Although the K-means algorithm for minimizing the within-cluster sums of squared deviations from cluster centroids is perhaps the most common method for applied cluster analyses, a variety of other criteria are available. The p-median model is an especially well-studied clustering problem that requires the selection of p objects to serve as cluster centers. The objective is to choose the cluster centers such that the sum of the Euclidean distances (or some other dissimilarity measure) of objects assigned to each center is minimized. Using 12 data sets from the literature, we demonstrate that a three-stage procedure consisting of a greedy heuristic, Lagrangian relaxation, and a branch-and-bound algorithm can produce globally optimal solutions for p-median problems of nontrivial size (several hundred objects, five or more variables, and up to 10 clusters). We also report the results of an application of the p-median model to an empirical data set from the telecommunications industry.

Original languageEnglish (US)
Pages (from-to)89-105
Number of pages17
JournalPsychometrika
Volume73
Issue number1
DOIs
StatePublished - Mar 1 2008
Externally publishedYes

Fingerprint

P-median
Cluster Analysis
Partitioning
Telecommunications
Telecommunication industry
Industry
Model
P-median Problem
Dissimilarity Measure
Greedy Heuristics
Lagrangian Relaxation
K-means Algorithm
Branch and Bound Algorithm
Euclidean Distance
Centroid
Datasets
Deviation
Optimal Solution
Choose
Clustering

Keywords

  • Branch and bound
  • Cluster analysis
  • Combinatorial data analysis
  • Heuristics
  • Lagrangian relaxation
  • P-median problem

ASJC Scopus subject areas

  • Psychology(all)
  • Applied Mathematics

Cite this

Optimal partitioning of a data set based on the p-median model. / Brusco, Michael J.; Köhn, Hans Friedrich.

In: Psychometrika, Vol. 73, No. 1, 01.03.2008, p. 89-105.

Research output: Contribution to journalArticle

@article{faeca2ccc32c4aad81f3ae2917cf5f57,
title = "Optimal partitioning of a data set based on the p-median model",
abstract = "Although the K-means algorithm for minimizing the within-cluster sums of squared deviations from cluster centroids is perhaps the most common method for applied cluster analyses, a variety of other criteria are available. The p-median model is an especially well-studied clustering problem that requires the selection of p objects to serve as cluster centers. The objective is to choose the cluster centers such that the sum of the Euclidean distances (or some other dissimilarity measure) of objects assigned to each center is minimized. Using 12 data sets from the literature, we demonstrate that a three-stage procedure consisting of a greedy heuristic, Lagrangian relaxation, and a branch-and-bound algorithm can produce globally optimal solutions for p-median problems of nontrivial size (several hundred objects, five or more variables, and up to 10 clusters). We also report the results of an application of the p-median model to an empirical data set from the telecommunications industry.",
keywords = "Branch and bound, Cluster analysis, Combinatorial data analysis, Heuristics, Lagrangian relaxation, P-median problem",
author = "Brusco, {Michael J.} and K{\"o}hn, {Hans Friedrich}",
year = "2008",
month = "3",
day = "1",
doi = "10.1007/s11336-007-9021-4",
language = "English (US)",
volume = "73",
pages = "89--105",
journal = "Psychometrika",
issn = "0033-3123",
publisher = "Springer New York",
number = "1",

}

TY - JOUR

T1 - Optimal partitioning of a data set based on the p-median model

AU - Brusco, Michael J.

AU - Köhn, Hans Friedrich

PY - 2008/3/1

Y1 - 2008/3/1

N2 - Although the K-means algorithm for minimizing the within-cluster sums of squared deviations from cluster centroids is perhaps the most common method for applied cluster analyses, a variety of other criteria are available. The p-median model is an especially well-studied clustering problem that requires the selection of p objects to serve as cluster centers. The objective is to choose the cluster centers such that the sum of the Euclidean distances (or some other dissimilarity measure) of objects assigned to each center is minimized. Using 12 data sets from the literature, we demonstrate that a three-stage procedure consisting of a greedy heuristic, Lagrangian relaxation, and a branch-and-bound algorithm can produce globally optimal solutions for p-median problems of nontrivial size (several hundred objects, five or more variables, and up to 10 clusters). We also report the results of an application of the p-median model to an empirical data set from the telecommunications industry.

AB - Although the K-means algorithm for minimizing the within-cluster sums of squared deviations from cluster centroids is perhaps the most common method for applied cluster analyses, a variety of other criteria are available. The p-median model is an especially well-studied clustering problem that requires the selection of p objects to serve as cluster centers. The objective is to choose the cluster centers such that the sum of the Euclidean distances (or some other dissimilarity measure) of objects assigned to each center is minimized. Using 12 data sets from the literature, we demonstrate that a three-stage procedure consisting of a greedy heuristic, Lagrangian relaxation, and a branch-and-bound algorithm can produce globally optimal solutions for p-median problems of nontrivial size (several hundred objects, five or more variables, and up to 10 clusters). We also report the results of an application of the p-median model to an empirical data set from the telecommunications industry.

KW - Branch and bound

KW - Cluster analysis

KW - Combinatorial data analysis

KW - Heuristics

KW - Lagrangian relaxation

KW - P-median problem

UR - http://www.scopus.com/inward/record.url?scp=41449106691&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=41449106691&partnerID=8YFLogxK

U2 - 10.1007/s11336-007-9021-4

DO - 10.1007/s11336-007-9021-4

M3 - Article

AN - SCOPUS:41449106691

VL - 73

SP - 89

EP - 105

JO - Psychometrika

JF - Psychometrika

SN - 0033-3123

IS - 1

ER -