Global Dysregulation of Gene Expression and Tumorigenesis: Data Science for Cancer
Dysregulation of gene expression is a hallmark of cancer. Broadly speaking, my research is focused on the changes in gene expression that characterize the transition from normal to cancerous states, i.e., tumorigenesis. To study such changes, I performed integrated analysis of next-generation sequencing data for matched normal and primary tumor samples from hundreds of patients across numerous cancer types. By analyzing these sequencing data, I have been able to explore the global landscape of transcriptional reprogramming in cancer and discover how changes in the regulation of gene expression may be implicated in tumorigenesis. My thesis is focused on four specific areas of transcriptional reprogramming in cancer: (1) changes in the expression and activity of transposable elements (TEs), (2) changes in alternative splicing induced by TEs, (3) allele-specific expression of tumor suppressor genes (TSGs), and (4) gene expression changes that are implicated in cancer drug response. TEs are known to be uniformly overexpressed in cancer, suggesting a possible role for their activity in tumorigenesis. I discovered a class of long interspersed nuclear elements (the LINE-1 family) with elevated levels of expression and activity in three different cancer types, and I showed examples where cancer-specific LINE-1 insertions disrupt enhancers, leading to the down-regulation of TSGs. TEs are also implicated in the creation of novel splicing isoforms, and aberrant alternative splicing has been associated with tumorigenesis for a number of different cancers. Integrated analysis of genome sequence and transcriptome data revealed thousands of TE-generated alternative splice events genome-wide, including close to 5,000 events distributed among cancer-associated genes. I explored the functional implications of specific cases of isoform switching, whereby TE-induced isoforms of cancer-associated genes show elevated levels of relative expression in tumor samples.
A closer look at TSG expression in matched normal and tumor samples indicated that functionally important changes in patterns of allele-specific expression in individuals heterozygous for loss-of-function TSG alleles are a significant factor in cancer onset and progression. These results identified a variety of molecular mechanisms that contribute to the observed changes in allele-specific expression patterns in cancer, with allele-specific alternative splicing mediated by antisense RNA emerging as a predominant factor. Furthermore, analysis of genomic variation in worldwide human populations demonstrates that loss-of-function TSG alleles are segregating at remarkably high frequencies, implying that a significant fraction of otherwise healthy individuals may be predisposed to developing cancer. For the final study of my thesis research, I used gene expression data from primary tumor samples to build predictive models of cancer drug response for two common chemotherapeutics: 5-Fluorouracil and Gemcitabine. My gene expression-based models predict whether patients will respond to individual therapies with up to 86% accuracy. The genes that I found to be most informative for predicting drug response were enriched in well-known cancer signaling pathways, highlighting their potential significance for chemotherapy prognosis.