Random Forests for Survival Analysis and High-Dimensional Data

Ruoqing Zhu, Sarah E. Formentini, Yifan Cui

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

One of the most commonly encountered problems in biomedical studies is the analysis of censored survival data. Survival analysis differs from standard regression problems in one central feature: the event of interest may not be fully observed. Statistical methods used to analyze such data must therefore be adapted to handle the missing information. In this chapter, we provide a brief introduction to right-censored survival data and introduce survival random forest models for analyzing them. Random forests are among the most popular machine learning algorithms, and during the past decade they have seen tremendous success in biomedical studies for prediction and decision-making. In addition to the statistical formulation, we provide details of the tuning parameters commonly considered in practice. An analysis of breast cancer relapse-free survival data is used as a demonstration. We further introduce the variable importance measure, which serves as a variable selection tool in high-dimensional analysis. These examples are carried out using RLT, a newly developed R package available on GitHub.
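To illustrate how a method can be "adapted to handle the missing information," consider the Kaplan-Meier product-limit estimator, a standard estimator of the survival function under right censoring and a common choice for the terminal-node estimate in survival trees (some forest implementations use the closely related Nelson-Aalen cumulative hazard instead). A censored subject contributes to the risk set up to its censoring time but is never counted as an event. The sketch below is illustrative Python under these assumptions; it is not the RLT package's implementation, and the function name `kaplan_meier` is hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Hypothetical helper: Kaplan-Meier survival estimate.

    times  : observed follow-up time for each subject
    events : 1 if the event was observed, 0 if the subject was right-censored
    Returns (t, S(t)) at each distinct time where at least one event occurred.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    # Only observed event times produce a drop in the survival curve.
    event_times = sorted({t for t, d in zip(times, events) if d == 1})

    surv = 1.0
    curve = []
    for t in event_times:
        # Risk set: everyone still under observation just before t.
        # Subjects censored at t are conventionally kept in the risk set.
        at_risk = sum(1 for ti in times if ti >= t)
        # Number of observed events exactly at time t.
        n_events = sum(1 for ti, di in zip(times, events) if ti == t and di == 1)
        # Product-limit update: multiply by the conditional survival at t.
        surv *= 1.0 - n_events / at_risk
        curve.append((t, surv))
    return curve


if __name__ == "__main__":
    # Six subjects; those with event indicator 0 are right-censored.
    for t, s in kaplan_meier([2, 3, 3, 5, 6, 8], [1, 1, 0, 1, 0, 1]):
        print(f"t={t}: S(t)={s:.4f}")
```

In the usage example, the subject censored at time 3 still counts toward the risk set at time 3, and the one censored at time 6 counts toward the risk set at time 5. Neither causes a drop in the curve, which is exactly the information a naive regression on observed times would discard or misuse.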

Original language: English (US)
Title of host publication: Springer Handbooks
Publisher: Springer
Pages: 831-847
Number of pages: 17
State: Published - 2023

Publication series

Name: Springer Handbooks
ISSN (Print): 2522-8692
ISSN (Electronic): 2522-8706

Keywords

  • Brier score
  • C-index
  • High-dimensional data
  • Random forests
  • Random survival forest
  • Right censoring
  • Survival analysis
  • Tree estimator
  • Variable importance
  • Variable selection

ASJC Scopus subject areas

  • General
