Cancer classification using gene expression data

Ying Lu, Jiawei Han

Research output: Contribution to journalArticlepeer-review


The classification of different tumor types is of great importance in cancer diagnosis and drug discovery. However. most previous cancer classification studies are clinical based and have limited diagnostic ability. Cancer classification using gene expression data is known to contain the keys for addressing the fundamental problems relating to cancer diagnosis and drug discovery. The recent advent of DNA microarray technique has made simultaneous monitoring of thousands of gene expressions possible. With this abundance of gene expression data, researchers have started to explore the possibilities of cancer classification using gene expression data. Quite a number of methods have been proposed in recent years with promising results. But there are still a lot of issues which need to be addressed and understood. In order to gain a deep insight into the cancer classification problem, it is necessary to take a closer look at the problem, the proposed solutions and the related issues all together. In this survey paper, we present a comprehensive overview of various proposed cancer classification methods and evaluate them based on their computation time, classification accuracy and ability to reveal biologically meaningful gene information. We also introduce and evaluate various proposed gene selection methods which we believe should be an integral preprocessing step for cancer classification. In order to obtain a full picture of cancer classification, we also discuss several issues related to cancer classification, including the biological significance vs. statistical significance of a cancer classifier, the asymmetrical classification errors for cancer classifiers, and the gene contamination problem.

Original languageEnglish (US)
Pages (from-to)243-268
Number of pages26
JournalInformation Systems
Issue number4
StatePublished - Jun 2003


  • Cancer classification
  • Gene expression data

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture


Dive into the research topics of 'Cancer classification using gene expression data'. Together they form a unique fingerprint.

Cite this