aorsf
accelerated oblique random survival forests
By Byron C Jaeger in Machine learning, survival analysis, R package
July 30, 2022





aorsf provides optimized software to fit, interpret, and make predictions with oblique random survival forests (ORSFs).
Why aorsf?
- over 400 times faster than obliqueRSF.
- accurate predictions for time-to-event outcomes.
- negation importance, a novel technique to estimate variable importance for ORSFs.
- intuitive API with formula based interface.
- extensive input checks + informative error messages.
Installation
You can install the development version of aorsf from GitHub with:
# install.packages("remotes")
remotes::install_github("bcjaeger/aorsf")
Example
The orsf() function is used to fit ORSFs. Printing the output from orsf() will give some descriptive statistics about the ensemble.
library(aorsf)
fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id)
print(fit)
#> ---------- Oblique random survival forest
#>
#> N observations: 276
#> N events: 111
#> N trees: 500
#> N predictors total: 17
#> N predictors per node: 5
#> Average leaves per tree: 24
#> Min observations in leaf: 5
#> Min events in leaf: 1
#> OOB stat value: 0.84
#> OOB stat type: Harrell's C-statistic
#>
#> -----------------------------------------
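The OOB stat reported above is Harrell's C-statistic. As a rough illustration of what that number measures, the base R sketch below counts concordant pairs by brute force. The function name harrell_c and the toy data are hypothetical and not part of aorsf, and the package's own computation may differ in details such as tie handling.

```r
# Hypothetical helper (not part of aorsf): Harrell's C-statistic by brute force.
# time:   observed follow-up time
# status: 1 if the event was observed, 0 if censored
# risk:   predicted risk; higher means the event is expected sooner
harrell_c <- function(time, status, risk) {
  concordant <- 0
  comparable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    # only subjects with an observed event can anchor a comparable pair
    if (status[i] != 1) next
    for (j in seq_len(n)) {
      # a pair is comparable when subject i has the event before time[j]
      if (time[i] < time[j]) {
        comparable <- comparable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1      # higher risk failed first
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5    # tied predictions count half
        }
      }
    }
  }
  concordant / comparable
}

# toy data, made up for illustration
time   <- c(5, 8, 12, 20, 25)
status <- c(1, 1, 0, 1, 0)
risk   <- c(0.9, 0.4, 0.6, 0.3, 0.1)

c_stat <- harrell_c(time, status, risk)
print(c_stat)
#> [1] 0.875
```

A value of 0.5 corresponds to random ordering and 1 to perfect ordering, so the 0.84 printed for the forest indicates that most comparable pairs are ranked correctly.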
How about interpreting the fit?
- use orsf_vi_negate() and orsf_vi_anova() for variable importance
orsf_vi_negate(fit)
#>          bili           age       protime       ascites       albumin
#>  0.0129714524  0.0125026047  0.0085955407  0.0063554907  0.0061992082
#>       spiders           sex        copper         edema           ast
#>  0.0057303605  0.0049489477  0.0044280058  0.0027783566  0.0018753907
#>        hepato          trig         stage      alk.phos      platelet
#>  0.0018232965  0.0016670140  0.0005209419 -0.0006772244 -0.0008856012
#>          chol           trt
#> -0.0016670140 -0.0025526151
- use orsf_pd_ice() or orsf_pd_summary() for individual or aggregated partial dependence values.
orsf_pd_summary(fit, pd_spec = list(bili = c(1:5)))
#>    bili      mean        lwr      medn       upr
#> 1:    1 0.2376652 0.01286480 0.1331328 0.8669733
#> 2:    2 0.2904995 0.04154464 0.1889294 0.8943053
#> 3:    3 0.3441193 0.06285072 0.2538335 0.9146453
#> 4:    4 0.3973732 0.10091026 0.3247072 0.9271855
#> 5:    5 0.4424865 0.14005297 0.3807676 0.9315099
- use orsf_summarize_uni() to show the top predictor variables in an ORSF model and the expected predicted risk at specific values of those predictors. (The term ‘uni’ is short for univariate.)
# take a look at the top 5 variables
# for continuous predictors, see expected risk at 25/50/75th quantile
# for categorical predictors, see expected risk in specified category
orsf_summarize_uni(object = fit, n_variables = 5)
#>
#> -- bili (VI Rank: 1) ---------------------------
#>
#>        |---------------- risk ----------------|
#>  Value      Mean    Median     25th %    75th %
#>   0.80 0.2337275 0.1267164 0.04190426 0.3716894
#>   1.40 0.2526511 0.1452163 0.05476829 0.3998048
#>   3.52 0.3730196 0.2841226 0.16033444 0.5673070
#>
#> -- age (VI Rank: 2) ----------------------------
#>
#>        |---------------- risk ----------------|
#>  Value      Mean    Median     25th %    75th %
#>   41.5 0.2736650 0.1332843 0.04270990 0.4575449
#>   49.7 0.2993351 0.1656678 0.04910598 0.5203980
#>   56.6 0.3314766 0.2113831 0.06956163 0.5472662
#>
#> -- protime (VI Rank: 3) ------------------------
#>
#>        |---------------- risk ----------------|
#>  Value      Mean    Median     25th %    75th %
#>   10.0 0.2815397 0.1430703 0.04704506 0.4987604
#>   10.6 0.2947893 0.1526181 0.05252053 0.5393637
#>   11.2 0.3176024 0.1831842 0.06925762 0.5566000
#>
#> -- ascites (VI Rank: 4) ------------------------
#>
#>        |---------------- risk ----------------|
#>  Value      Mean    Median     25th %    75th %
#>      0 0.2954662 0.1462719 0.04778412 0.5409697
#>      1 0.4597940 0.3742798 0.25999819 0.6551916
#>
#> -- albumin (VI Rank: 5) ------------------------
#>
#>        |---------------- risk ----------------|
#>  Value      Mean    Median     25th %    75th %
#>   3.31 0.3176012 0.1789175 0.05525473 0.5785138
#>   3.54 0.2934144 0.1527704 0.04236004 0.5225606
#>   3.77 0.2791547 0.1412617 0.04199867 0.4937002
#>
#>  Predicted risk at time t = 1788 for top 5 predictors
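The idea behind partial dependence can be sketched in a few lines of base R: fix one predictor at a chosen value for every row, predict, and summarize the predictions. The sketch below uses a linear regression on the built-in mtcars data as a stand-in model, not an ORSF. The helper name pd_sketch is hypothetical, and the choice of the 2.5th and 97.5th percentiles for lwr and upr is an assumption made for illustration; this is not a description of how aorsf computes its values internally.

```r
# Stand-in model (not an ORSF): linear regression on a built-in data set.
model <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper: aggregated partial dependence of `model` on one variable.
pd_sketch <- function(model, data, variable, values) {
  rows <- lapply(values, function(v) {
    data_v <- data
    data_v[[variable]] <- v                   # fix the variable at v in every row
    pred <- predict(model, newdata = data_v)  # one prediction per row (individual values)
    data.frame(value = v,
               mean  = mean(pred),
               lwr   = unname(quantile(pred, 0.025)),  # assumed lower percentile
               medn  = unname(median(pred)),
               upr   = unname(quantile(pred, 0.975)))  # assumed upper percentile
  })
  do.call(rbind, rows)                        # one summary row per value
}

pd <- pd_sketch(model, mtcars, variable = "wt", values = c(2, 3, 4))
print(pd)
```

Keeping the per-row predictions instead of summarizing them gives the individual (ICE-style) view; averaging them gives the aggregated view. For this linear stand-in, the mean shifts by exactly the wt coefficient per unit step, which makes the sketch easy to check by hand.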
References
Byron C. Jaeger, D. Leann Long, Dustin M. Long, Mario Sims, Jeff M. Szychowski, Yuan-I Min, Leslie A. McClure, George Howard, Noah Simon (2019). Oblique Random Survival Forests. Ann. Appl. Stat. 13(3): 1847-1883. DOI: 10.1214/19-AOAS1261. URL https://doi.org/10.1214/19-AOAS1261
Funding
The developers of aorsf receive financial support from the Center for Biomedical Informatics, Wake Forest University School of Medicine. We also receive support from the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UL1TR001420.
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.