ClusterEnG: An interactive educational web resource for clustering and visualizing high-dimensional data

Mohith Manjunath, Yi Zhang, Yeonsung Kim, Steve H. Yeo, Omar Sobh, Nathan Russell, Christian Followell, Colleen Bushell, Umberto Ravaioli, Jun S. Song

Research output: Contribution to journal (article)

Abstract

Background. Clustering is one of the most common techniques in data analysis and seeks to group together data points that are similar under some measure. Although many computer programs are available for performing clustering, a single web resource that provides several state-of-the-art clustering methods, interactive visualizations and evaluation of clustering results is lacking.

Methods. ClusterEnG (acronym for Clustering Engine for Genomics) provides a web interface for clustering data, along with interactive visualizations that include 3D views, data selection and zoom features. Eighteen clustering validation measures are also presented to help users select a suitable algorithm for their dataset. ClusterEnG also aims to educate users about the similarities and differences between various clustering algorithms, and provides tutorials that demonstrate potential pitfalls of each algorithm.

Conclusions. The web resource will be particularly useful to scientists who are not conversant with computing but want to understand the structure of their data in an intuitive manner. The validation measures facilitate the process of choosing a suitable clustering algorithm among the available options. ClusterEnG is part of a larger project called KnowEnG (Knowledge Engine for Genomics) and is available at http://education.knoweng.org/clustereng.
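To illustrate how a validation measure can help compare clustering results, the following is a minimal sketch that computes the mean silhouette width for two candidate labelings of the same toy dataset and keeps the higher-scoring one. The silhouette width is used here only as a familiar example of an internal validation measure; the abstract does not list which eighteen measures ClusterEnG reports, and the function name, data points and labelings below are hypothetical and not taken from ClusterEnG.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette width over all points (ranges from -1 to 1; higher is better)."""
    # Group points by their assigned cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    widths = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            widths.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it does not affect the sum).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(dist(point, q) for q in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return mean(widths)


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# Two hypothetical clustering results for the same points.
candidates = {
    "labeling_a": [0, 0, 0, 1, 1, 1],  # follows the visible group structure
    "labeling_b": [0, 1, 0, 1, 0, 1],  # mixes the two groups
}

scores = {name: silhouette_score(points, labels)
          for name, labels in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

In practice, several measures are usually consulted together, since any single index can favor particular cluster shapes.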

Original language: English (US)
Article number: e155
Journal: PeerJ Computer Science
Volume: 2018
Issue number: 5
State: Published - Jan 1 2018

Keywords

  • Clustering
  • Education
  • Genomics
  • Validation measures
  • Web interface

ASJC Scopus subject areas

  • Computer Science (all)
