Theory and Methods

High-Dimensional Variable Selection for Survival Data

Pages 205-217
Received 01 Nov 2008
Published online: 01 Jan 2012
 

The minimal depth of a maximal subtree is a dimensionless order statistic measuring the predictiveness of a variable in a survival tree. We derive the distribution of the minimal depth and use it for high-dimensional variable selection using random survival forests. In big p and small n problems (where p is the dimension and n is the sample size), the distribution of the minimal depth reveals a “ceiling effect” in which a tree simply cannot be grown deep enough to properly identify predictive variables. Motivated by this limitation, we develop a new regularized algorithm, termed RSF-Variable Hunting. This algorithm exploits maximal subtrees for effective variable selection under such scenarios. Several applications are presented demonstrating the methodology, including the problem of gene selection using microarray data. In this work we focus only on survival settings, although our methodology also applies to other random forests applications, including regression and classification settings. All examples presented here use the R-software package randomSurvivalForest.
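To make the central quantity concrete, the following is a minimal sketch of how the minimal depth of a variable might be read off a single binary tree. It assumes the usual reading of the definition: the minimal depth of a variable is the depth of the node nearest the root that splits on that variable, that node being the root of a maximal subtree for the variable. The `Node` class, the `minimal_depths` function, and the toy tree are hypothetical illustrations and are not part of the randomSurvivalForest package.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical tree node; split_var is None for a terminal node."""
    split_var: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def minimal_depths(root: Optional[Node]) -> Dict[str, int]:
    """Return, for each variable used in a split, the depth of the
    shallowest node that splits on it (assumed definition of minimal depth)."""
    depths: Dict[str, int] = {}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if node is None or node.split_var is None:
            continue  # terminal nodes carry no split variable
        # Breadth-first order visits shallow nodes first, so the first
        # depth recorded for a variable is its minimal depth.
        depths.setdefault(node.split_var, depth)
        queue.append((node.left, depth + 1))
        queue.append((node.right, depth + 1))
    return depths


# Toy tree (illustrative only): x1 splits the root and again at depth 1,
# x2 splits at depth 1, x3 splits at depth 2.
toy_tree = Node(
    "x1",
    left=Node("x2", left=Node(), right=Node("x3", left=Node(), right=Node())),
    right=Node("x1", left=Node(), right=Node()),
)

print(minimal_depths(toy_tree))
```

Variables that never split a node are simply absent from the result. In a shallow tree grown on few observations, most of the p variables would fall into this group, which is one way to picture the ceiling effect the abstract describes.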