Repairing fractures between data using genetic programming-based feature extraction: A case study in cancer diagnosis

Jose G. Moreno-Torres, Xavier Llorà, David E. Goldberg, Rohit Bhargava

Research output: Contribution to journal › Article › peer-review

Abstract

Most model-building processes rest on an underlying assumption: a learned classifier should be usable to explain unseen data from the same problem. Although it seems reasonable, this assumption tends to fail when dealing with biological data: classifiers built from data generated using the same protocols in two different laboratories can turn out to be different and non-interchangeable. The process of generating data in the lab usually involves too many uncontrollable variables, in addition to biological variation, and small differences can lead to very different data distributions, with a fracture between data. This paper presents a genetics-based machine learning approach that performs feature extraction on data from one laboratory to help increase the classification performance of an existing classifier built using data from a different laboratory that follows the same protocols, while learning about the shape of the fractures between data that caused the poor performance. The experimental analysis over benchmark problems, together with a real-world problem on prostate cancer diagnosis, shows the good behavior of the proposed algorithm.
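The general idea of repairing a fracture by transforming the new laboratory's data, rather than retraining the existing classifier, can be illustrated with a toy sketch. This is not the algorithm from the paper: it is a mutation-only evolutionary search over small expression trees (a simplification of genetic programming, without crossover), and every name, the one-feature threshold classifier, and the scale-and-shift "fracture" are assumptions made purely for illustration.

```python
import random

# Hypothetical fixed classifier "from lab A": a threshold on a single feature.
def classifier(x):
    return 1 if x > 0.5 else 0

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def random_tree(rng, depth=2):
    """Build a random expression tree over the input x and numeric constants."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else round(rng.uniform(-2, 2), 2)
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Evaluate an expression tree at the feature value x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree  # leaf: the input or a constant

def mutate(tree, rng):
    """Replace one randomly chosen subtree with a freshly generated one."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))

def accuracy(tree, data):
    """Accuracy of the unchanged classifier on data transformed by the tree."""
    hits = sum(classifier(evaluate(tree, x)) == y for x, y in data)
    return hits / len(data)

def evolve(data, rng, pop_size=50, generations=30):
    """Elitist search for a transformation that suits the fixed classifier."""
    # Seed with the identity transform so the result is never worse than it.
    pop = ["x"] + [random_tree(rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda t: accuracy(t, data), reverse=True)
        elite = pop[: pop_size // 5]
        children = [mutate(rng.choice(elite), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=lambda t: accuracy(t, data))

# Synthetic "lab B" data: the same two classes as lab A, but measured on a
# scaled and shifted axis (the assumed fracture).
rng = random.Random(0)
lab_b = []
for _ in range(100):
    y = rng.randint(0, 1)
    x_a = rng.uniform(0.0, 0.4) if y == 0 else rng.uniform(0.6, 1.0)
    lab_b.append((2 * x_a + 1, y))

baseline = accuracy("x", lab_b)   # classifier applied to the raw lab B data
best = evolve(lab_b, rng)         # evolved feature transformation
repaired = accuracy(best, lab_b)  # classifier applied to transformed data
print("baseline:", baseline, "repaired:", repaired, "transform:", best)
```

In this sketch the classifier itself is never modified; only the input feature is re-expressed. The evolved tree can also be inspected directly, which loosely mirrors the abstract's point about learning the shape of the fracture.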

Original language: English (US)
Pages (from-to): 805-823
Number of pages: 19
Journal: Information Sciences
Volume: 222
State: Published - Feb 10 2013

Keywords

  • Biological data
  • Cancer diagnosis
  • Different laboratories
  • Feature extraction
  • Fractures between data
  • Genetic programming

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
