Making Machine Learning Fast and Interpretable

Risk prediction can help direct treatment and interventions to the patients who are most likely to benefit. The oblique random survival forest, a machine learning algorithm for risk prediction, has been used to develop a risk prediction model for heart failure and to identify the factors that drive predicted heart failure risk among Black adults and among White adults. Adverse social determinants of health were a major driver of predicted risk for Black adults. This talk covers these applications and introduces methods that increase the computational efficiency and interpretability of the oblique random survival forest.

Disseminating Prediction Methods

Statisticians are trained to develop novel statistical techniques for complex problems. However, we are less likely to receive training in software development. Without efficiently coded algorithms, intuitive documentation, and friendly APIs, the methods we ‘share’ in our R packages may cause frustration and turn potential users away (possibly to a less valid method!). In this talk, I focus on writing efficient core algorithms for R packages using Rcpp. I share my experience developing R packages that implement statistical methods and present four ideas that have made a positive impact on my work.

Happier Version Control with Git and GitHub

Git and GitHub are fantastic tools for version control and collaboration. Data scientists increasingly use GitHub as a platform for sharing their work and collaborating, thanks to publicly available guides such as Jenny Bryan’s Happy Git and GitHub for the useR. In this seminar, I walk through the basics of Git and GitHub, beginning with Git’s jargon and working up to submitting pull requests on GitHub.