Last modified: 2024-05-14
Abstract
This work applies survival analysis to a high-dimensional dataset provided by the Gregorio Marañón Health Research Institute. The dataset contains clinical and genetic information from patients with triple-negative breast cancer (TNBC), a type of cancer known for its aggressive nature and low survival rates. All patients in the dataset were treated with a specific type of chemotherapy, and survival time is measured from the start of treatment until death.
The main objective of this work is to classify the variables (genetic and clinical) according to their influence on these patients' survival. To this end, we assess the effectiveness of Cox regression models (Cox, 1972) in a setting with high-dimensional data and a high proportion of censored observations. In this setting, dimensionality reduction is crucial for both model interpretability and predictive accuracy. Two regularization techniques are evaluated: the lasso penalty (Tibshirani, 1997) and the adaptive lasso penalty (Zou, 2006). The main contributions of this work are the proposal of several methods for computing the adaptive weights of the adaptive lasso, and a new procedure for selecting the best model, that is, the best subset of variables, for Cox regression.
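To make the two penalties concrete, the following is a minimal sketch of the objective that penalized Cox regression minimizes: the negative Cox log partial likelihood plus a weighted L1 penalty. It assumes no tied event times (so no Breslow or Efron correction is needed) and uses the common weight choice w_j = 1/|β̃_j|^γ from some initial estimate β̃ (for example, a ridge fit); this is only an illustration and is not one of the weight calculation methods proposed in this work. Setting every weight to 1 recovers the plain lasso. The function names and the `eps` guard are hypothetical, and some software scales the likelihood term by 1/n, which changes the scale of λ.

```python
import math
from typing import Optional, Sequence


def neg_log_partial_likelihood(beta: Sequence[float], X: Sequence[Sequence[float]],
                               time: Sequence[float], event: Sequence[int]) -> float:
    """Negative Cox log partial likelihood, assuming no tied event times."""
    # Linear predictor eta_i = x_i . beta for every subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    total = 0.0
    for i in range(len(time)):
        if event[i]:  # censored subjects contribute only through the risk sets
            # Risk set: subjects still under observation at time[i].
            risk = sum(math.exp(eta[k]) for k in range(len(time)) if time[k] >= time[i])
            total += eta[i] - math.log(risk)
    return -total


def adaptive_weights(beta_init: Sequence[float], gamma: float = 1.0,
                     eps: float = 1e-8) -> list:
    """Hypothetical helper: w_j = 1 / |beta_init_j|^gamma (eps avoids division by zero)."""
    return [1.0 / (abs(b) + eps) ** gamma for b in beta_init]


def penalized_objective(beta: Sequence[float], X: Sequence[Sequence[float]],
                        time: Sequence[float], event: Sequence[int], lam: float,
                        weights: Optional[Sequence[float]] = None) -> float:
    """Negative log partial likelihood plus lam * sum_j w_j |beta_j|."""
    if weights is None:
        weights = [1.0] * len(beta)  # all weights equal to 1: plain lasso
    penalty = lam * sum(w * abs(b) for w, b in zip(weights, beta))
    return neg_log_partial_likelihood(beta, X, time, event) + penalty
```

Coefficients with a small initial estimate receive a large weight and are therefore shrunk to zero more aggressively, which is what lets the adaptive lasso select variables more selectively than the plain lasso.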