Assessing Multiple Myeloma Risk: Differential Expression Analysis
Abstract
Multiple myeloma is an incurable blood cancer affecting more than 20,000 Americans each year. Classifying the heterogeneous subtypes grouped under the broad label of multiple myeloma is likely to improve treatment efficacy. Our study analyzed an open-access transcriptomics dataset to determine how gene expression can predict health outcomes in multiple myeloma, focusing principally on overall survival and physical ability. The Multiple Myeloma Research Foundation’s CoMMpass dataset was generated with RNA sequencing (RNA-Seq), which provides abundance counts for each of more than 20,000 genes expressed in bone marrow. We extracted the 13,000 most abundantly expressed genes at time of diagnosis in 767 patients for whom medication history, time to death, and numerous clinical measures were also available. After normalization and quality control, we performed differential expression analysis and one-way ANOVA to test for associations between expression values and clinical factors. Finally, we constructed a logistic regression model to predict clinical outcome. At a cutoff giving 95% specificity and 30% sensitivity, the model predicts death within 5 years from baseline expression with a precision of 55% for 59 patients, a rate three-fold higher than in the total cohort. This study provides a foundation for personalized medicine in multiple myeloma; future work should determine whether survival can be improved for these high-risk individuals.
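The performance figures quoted above (sensitivity, specificity, precision, and the fold increase over the cohort-wide rate) all follow from a single score cutoff applied to the model's predicted risks. The sketch below illustrates one way such a cutoff could be chosen and evaluated. It is not the authors' code: the function names, the rule for choosing the cutoff, and the ten toy scores and labels are hypothetical and are not drawn from the CoMMpass data.

```python
import math


def confusion_counts(scores, labels, threshold):
    """Count TP, FP, FN, TN when a score >= threshold is called high risk."""
    tp = fp = fn = tn = 0
    for score, died in zip(scores, labels):
        predicted = score >= threshold
        if predicted and died:
            tp += 1
        elif predicted:
            fp += 1
        elif died:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def threshold_for_specificity(scores, labels, target):
    """Return the lowest observed score cutoff with specificity >= target.

    Assumes the labels contain at least one survivor (label 0).
    """
    for cutoff in sorted(set(scores)):
        tp, fp, fn, tn = confusion_counts(scores, labels, cutoff)
        if tn / (tn + fp) >= target:
            return cutoff
    return math.inf  # no observed cutoff is strict enough, so flag nobody


def summarize(scores, labels, threshold):
    """Summarize classifier performance at one cutoff.

    Assumes the labels contain at least one death (1) and one survivor (0).
    """
    tp, fp, fn, tn = confusion_counts(scores, labels, threshold)
    flagged = tp + fp
    precision = tp / flagged if flagged else math.nan
    prevalence = (tp + fn) / len(labels)  # death rate in the whole cohort
    return {
        "flagged": flagged,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": precision,
        "fold_enrichment": precision / prevalence,
    }


# Hypothetical predicted risks and outcomes (1 = died within the window).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

cutoff = threshold_for_specificity(toy_scores, toy_labels, target=0.85)
report = summarize(toy_scores, toy_labels, cutoff)
print(cutoff, report)
```

Here `fold_enrichment` is the ratio of precision to the cohort-wide death rate, which is one reading of the "three-fold higher rate" reported above. Fixing a high specificity first keeps false alarms rare, at the cost of the lower sensitivity the abstract reports.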