A feature-first approach to clustering for highlighting regions of interest in scientific data

Research output: Contribution to journal › Conference article

Abstract

We present a clustering algorithm that classifies the points of a dataset by a combination of scalar variables' values as well as spatial locations. How heavily the spatial locations impact the algorithm is a tunable parameter. With no impact the algorithm bins the data by calculating a histogram and classifies each point by a bin ID. With full impact, points are bunched together by spatial neighborhood regardless of value. This approach is unsurprisingly very sensitive to this weighting; a sampling of possible values yields a wide variety of classifications. However, we have found that when tuned just right it is indeed possible to extract meaningful features from the resulting clustering. Furthermore, the principles behind our development of this technique are also applicable in both tuning the algorithm as well as in selecting data regions. In this paper we will provide the details of design and implementation and demonstrate using the auto-tuned approach to extract interesting regions of real scientific data. Our target application is the automatic detection of land cover data anomalies in NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) sensors.
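The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm, which the abstract does not specify. It assumes a plain k-means substitute run in a blended feature space, where normalized coordinates are scaled by a weight `spatial_weight` in [0, 1] and the normalized scalar value by `1 - spatial_weight`. All names here (`blended_features`, `cluster`, `spatial_weight`) are hypothetical. At weight 0 the grouping depends on value alone, which resembles binning but is not a histogram. At weight 1 it depends on location alone.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _normalize(column: Sequence[float]) -> List[float]:
    """Rescale a column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]


def _sqdist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def blended_features(
    positions: Sequence[Vector], values: Sequence[float], spatial_weight: float
) -> List[Vector]:
    """Scale coordinates by w and the scalar value by (1 - w) (assumed blend)."""
    if not 0.0 <= spatial_weight <= 1.0:
        raise ValueError("spatial_weight must lie in [0, 1]")
    if not values or len(positions) != len(values):
        raise ValueError("positions and values must be non-empty and equal length")
    dims = len(positions[0])
    coords = [_normalize([p[d] for p in positions]) for d in range(dims)]
    scalars = _normalize(values)
    return [
        tuple(spatial_weight * coords[d][i] for d in range(dims))
        + ((1.0 - spatial_weight) * scalars[i],)
        for i in range(len(values))
    ]


def cluster(
    positions: Sequence[Vector],
    values: Sequence[float],
    spatial_weight: float,
    k: int,
    max_iterations: int = 100,
) -> List[int]:
    """Label each point with a cluster ID using Lloyd's k-means on blended features."""
    features = blended_features(positions, values, spatial_weight)
    # Deterministic farthest-point seeding: start at the first point, then
    # repeatedly add the point farthest from all centers chosen so far.
    centers = [features[0]]
    while len(centers) < k:
        centers.append(
            max(features, key=lambda f: min(_sqdist(f, c) for c in centers))
        )
    labels: List[int] = []
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: _sqdist(f, centers[c])) for f in features
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels
```

On a row of eight points whose values alternate between two levels, calling `cluster` with `spatial_weight=0.0` and `k=2` groups the points by value, while `spatial_weight=1.0` splits the row into a left half and a right half. Intermediate weights trade one behavior against the other, which is where the sensitivity noted in the abstract would show up.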

Original language: English (US)
Pages (from-to): 2207-2216
Number of pages: 10
Journal: Procedia Computer Science
Volume: 51
Issue number: 1
DOI: 10.1016/j.procs.2015.05.497
State: Published - Jan 1 2015
Event: International Conference on Computational Science, ICCS 2002 - Amsterdam, Netherlands
Duration: Apr 21 2002 to Apr 24 2002

Fingerprint

Bins
Clustering algorithms
NASA
Tuning
Sampling
Imaging techniques
Sensors

Keywords

  • Anomaly detection
  • MODIS
  • Parallel computing
  • Visualization

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

A feature-first approach to clustering for highlighting regions of interest in scientific data. / Sisneros, Robert.

In: Procedia Computer Science, Vol. 51, No. 1, 01.01.2015, p. 2207-2216.


@article{8ca803ffc00c4b4d911ec824acfc8ec8,
title = "A feature-first approach to clustering for highlighting regions of interest in scientific data",
abstract = "We present a clustering algorithm that classifies the points of a dataset by a combination of scalar variables' values as well as spatial locations. How heavily the spatial locations impact the algorithm is a tunable parameter. With no impact the algorithm bins the data by calculating a histogram and classifies each point by a bin ID. With full impact, points are bunched together by spatial neighborhood regardless of value. This approach is unsurprisingly very sensitive to this weighting; a sampling of possible values yields a wide variety of classifications. However, we have found that when tuned just right it is indeed possible to extract meaningful features from the resulting clustering. Furthermore, the principles behind our development of this technique are also applicable in both tuning the algorithm as well as in selecting data regions. In this paper we will provide the details of design and implementation and demonstrate using the auto-tuned approach to extract interesting regions of real scientific data. Our target application is the automatic detection of land cover data anomalies in NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) sensors.",
keywords = "Anomaly detection, MODIS, Parallel computing, Visualization",
author = "Robert Sisneros",
year = "2015",
month = "1",
day = "1",
doi = "10.1016/j.procs.2015.05.497",
language = "English (US)",
volume = "51",
pages = "2207--2216",
journal = "Procedia Computer Science",
issn = "1877-0509",
publisher = "Elsevier BV",
number = "1",
}

TY  - JOUR
T1  - A feature-first approach to clustering for highlighting regions of interest in scientific data
AU  - Sisneros, Robert
PY  - 2015/1/1
Y1  - 2015/1/1
N2  - We present a clustering algorithm that classifies the points of a dataset by a combination of scalar variables' values as well as spatial locations. How heavily the spatial locations impact the algorithm is a tunable parameter. With no impact the algorithm bins the data by calculating a histogram and classifies each point by a bin ID. With full impact, points are bunched together by spatial neighborhood regardless of value. This approach is unsurprisingly very sensitive to this weighting; a sampling of possible values yields a wide variety of classifications. However, we have found that when tuned just right it is indeed possible to extract meaningful features from the resulting clustering. Furthermore, the principles behind our development of this technique are also applicable in both tuning the algorithm as well as in selecting data regions. In this paper we will provide the details of design and implementation and demonstrate using the auto-tuned approach to extract interesting regions of real scientific data. Our target application is the automatic detection of land cover data anomalies in NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) sensors.
KW  - Anomaly detection
KW  - MODIS
KW  - Parallel computing
KW  - Visualization
UR  - http://www.scopus.com/inward/record.url?scp=84939153231&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84939153231&partnerID=8YFLogxK
U2  - 10.1016/j.procs.2015.05.497
DO  - 10.1016/j.procs.2015.05.497
M3  - Conference article
AN  - SCOPUS:84939153231
VL  - 51
SP  - 2207
EP  - 2216
JO  - Procedia Computer Science
JF  - Procedia Computer Science
SN  - 1877-0509
IS  - 1
ER  -