High-performance commercial data mining: A multistrategy machine learning application

William H. Hsu, Michael Welge, Tom Redman, David Clutter

Research output: Contribution to journal › Review article

Abstract

We present an application of inductive concept learning and interactive visualization techniques to a large-scale commercial data mining project. This paper focuses on design and configuration of high-level optimization systems (wrappers) for relevance determination and constructive induction, and on integrating these wrappers with elicited knowledge on attribute relevance and synthesis. In particular, we discuss decision support issues for the application (cost prediction for automobile insurance markets in several states) and report experiments using D2K, a Java-based visual programming system for data mining and information visualization, and several commercial and research tools. We describe exploratory clustering, descriptive statistics, and supervised decision tree learning in this application, focusing on a parallel genetic algorithm (GA) system, Jenesis, which is used to implement relevance determination (attribute subset selection). Deployed on several high-performance network-of-workstation systems (Beowulf clusters), Jenesis achieves a linear speedup, due to a high degree of task parallelism. Its test set accuracy is significantly higher than that of decision tree inducers alone and is comparable to that of the best extant search-space based wrappers.
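The wrapper approach to relevance determination described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Jenesis implementation: the synthetic data set, the function names (`make_data`, `fitness`, `select_attributes`), the size penalty, and all GA parameters are hypothetical, and a toy table-lookup majority classifier stands in for the decision tree inducers used in the paper. Each candidate attribute subset is encoded as a bit mask and scored by the hold-out accuracy of the inducer trained on only those attributes.

```python
import random
from collections import Counter, defaultdict


def make_data(n_rows=200, n_attrs=8, seed=0):
    """Synthetic binary data: the label depends only on attributes 0 and 3."""
    rng = random.Random(seed)
    rows = [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(n_rows)]
    labels = [r[0] ^ r[3] for r in rows]
    return rows, labels


def project(row, mask):
    """Keep only the attribute values whose mask bit is 1."""
    return tuple(v for v, keep in zip(row, mask) if keep)


def fitness(mask, train, valid, penalty=0.01):
    """Wrapper fitness: hold-out accuracy of a toy inducer minus a size penalty."""
    (xtr, ytr), (xva, yva) = train, valid
    default = Counter(ytr).most_common(1)[0][0]  # fallback for unseen keys
    table = defaultdict(Counter)
    for x, y in zip(xtr, ytr):
        table[project(x, mask)][y] += 1
    correct = 0
    for x, y in zip(xva, yva):
        votes = table.get(project(x, mask))
        pred = votes.most_common(1)[0][0] if votes else default
        correct += (pred == y)
    return correct / len(yva) - penalty * sum(mask)


def select_attributes(train, valid, n_attrs, pop_size=20, generations=30,
                      p_mut=0.1, seed=1):
    """Generational GA over bit masks with tournament selection and elitism."""
    rng = random.Random(seed)

    def score(mask):
        return fitness(mask, train, valid)

    # Start from the all-attributes baseline plus random masks.
    pop = [[1] * n_attrs] + [[rng.randint(0, 1) for _ in range(n_attrs)]
                             for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        nxt = [ranked[0][:]]  # elitism: carry the best mask forward unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(ranked, 2), key=score)  # binary tournament
            p2 = max(rng.sample(ranked, 2), key=score)
            cut = rng.randrange(1, n_attrs)             # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return best, score(best)


rows, labels = make_data()
train = (rows[:140], labels[:140])
valid = (rows[140:], labels[140:])
best_mask, best_fit = select_attributes(train, valid, n_attrs=8)
print(best_mask, round(best_fit, 3))
```

Within a generation, every call to `fitness` is independent of the others, so the evaluations could in principle be distributed across workers. That is one plausible reading of the task parallelism the abstract credits for the speedup; the sketch itself runs serially.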

Original language: English (US)
Pages (from-to): 361-391
Number of pages: 31
Journal: Data Mining and Knowledge Discovery
Volume: 6
Issue number: 4
State: Published - Dec 1 2002

Keywords

  • Constructive induction
  • Genetic algorithms
  • Real-world decision support applications
  • Relevance determination
  • Scalable high-performance computing
  • Software development environments for knowledge discovery in databases (KDD)

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications
