Heart of the matter: Discovering the consensus of multiple clustering results

Alex Kosorukoff, Saurabh Sinha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Clustering is widely used by genomics researchers to discover functional patterns in data. The inherent subjectivity and hardness of the clustering task often lead researchers to explore multiple clustering results of the same data, using different algorithms and parameter settings. This further necessitates a method to automatically summarize multiple clustering results. A natural question to ask about several clustering results is "what is the structure they all have in common?" This work presents a computational method to answer this question. We provide a precise formulation of the problem of computing the consensus of several clusterings, examine its computational complexity and find the problem to be NP-hard. We describe a greedy heuristic to solve the problem, and assess its performance on synthetic data. We demonstrate several applications of this algorithm on genomics data. Our program will be freely available for download.

Original language: English (US)
Title of host publication: Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008
Pages: 155-162
Number of pages: 8
State: Published - 2008
Event: 2008 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008 - Philadelphia, PA, United States
Duration: Nov 3 2008 - Nov 5 2008

ASJC Scopus subject areas

  • Molecular Biology
  • Information Systems
  • Biomedical Engineering
