Blood eQTL detection in structured populations and its application to interpretation of genetic association studies
Expression QTL (eQTL) detection has emerged as an important tool for unravelling the relationship between genetic risk factors and disease or clinical phenotypes. Most studies rely on analyses predicated on the assumption that a single causal variant explains the association signal in each interval. This greatly simplifies the statistical modeling but is liable to bias when multiple linked causal variants are responsible. In this thesis, my primary goal was to assess the prevalence of secondary cis-eQTL signals that locally regulate gene expression in peripheral blood, using two large human cohort studies, each with more than 2,500 samples and accompanying whole-genome genotypes. The CAGE dataset is a compendium of Illumina microarray studies, and the Framingham Heart Study (FHS) is a two-generation Affymetrix dataset. I first describe simulations that reveal how causal variants in regions of linkage disequilibrium (LD) can interfere with one another's association signals. I then describe a Bayesian co-localization analysis of the extent to which cis-eQTL are shared between the two studies and with the BIOS RNA-seq dataset. Stepwise conditional modeling shows that multiple eQTL signals are present for ~40% of over 3,500 eGenes in both microarray datasets, and that the number of loci with additional signals falls by approximately two-thirds with each conditioning step. Although fewer than 20% of the peak signals across platforms fine-map to the same credible interval, the co-localization analysis finds that as many as 50-60% of the primary eQTL are actually shared. Subsequently, co-localization of eQTL signals with GWAS hits identified 1,349 genes whose expression in peripheral blood is associated with 591 human traits or diseases, with enrichment for genes with regulatory functions such as protein kinase activity and DNA binding.
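Stepwise conditional modeling of this kind can be sketched as forward selection: find the SNP most strongly associated with a gene's expression, add it to the regression model, rescan all SNPs conditional on everything selected so far, and stop when no remaining SNP passes a significance threshold. The minimal Python/NumPy sketch below assumes a single gene, a dosage matrix for its cis window, and no covariates or relatedness (FHS, which contains related individuals, would need a mixed model instead). The function names, the 1e-5 threshold, the cap of five signals and the normal approximation to the t distribution are all illustrative assumptions, not the thesis's actual pipeline.

```python
import math

import numpy as np


def residualize(m, x):
    """Residuals of the column(s) of m after OLS regression on design matrix x."""
    beta, *_ = np.linalg.lstsq(x, m, rcond=None)
    return m - x @ beta


def stepwise_conditional_eqtl(genotypes, expression, p_threshold=1e-5, max_signals=5):
    """Hypothetical forward stepwise conditional cis-eQTL scan for one gene.

    genotypes: (n_samples, n_snps) dosage matrix; expression: (n_samples,) vector.
    Returns SNP column indices in the order they were selected.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    expression = np.asarray(expression, dtype=float)
    n, p = genotypes.shape
    selected = []
    for _ in range(max_signals):
        # Design matrix: intercept plus dosages of every SNP already selected.
        x = np.column_stack([np.ones(n)] + [genotypes[:, j] for j in selected])
        ry = residualize(expression, x)
        rg = residualize(genotypes, x)
        # Partial correlation of each SNP with expression given the selected SNPs;
        # by Frisch-Waugh-Lovell this matches adding that SNP to the full model.
        denom = np.linalg.norm(rg, axis=0) * np.linalg.norm(ry)
        r = np.divide(rg.T @ ry, denom, out=np.zeros(p), where=denom > 1e-10)
        r[selected] = 0.0  # SNPs already in the model cannot be reselected
        r = np.clip(r, -0.999999, 0.999999)
        df = n - x.shape[1] - 1  # residual degrees of freedom with one more SNP
        t = r * np.sqrt(df / (1.0 - r ** 2))
        best = int(np.argmax(np.abs(t)))
        # Two-sided p-value from a normal approximation to the t distribution.
        p_value = math.erfc(abs(t[best]) / math.sqrt(2.0))
        if p_value >= p_threshold:
            break
        selected.append(best)
    return selected
```

With simulated data containing two independent causal SNPs, the scan should pick up the stronger signal first and the secondary signal only after conditioning on it. Ranking candidates by |t| rather than by p-value avoids ties when very strong signals underflow to p = 0.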
Only one quarter of the eQTL-GWAS co-localization signals replicate, further highlighting the technological and methodological barriers to reconciling GWAS and eQTL signals. My results are provided as a web-based resource for visualizing multi-site regulation of gene expression and its association with human complex traits and disease states. In addition to the cis-eQTL study, as a member of the eQTLGen consortium, I also conducted trans-eQTL detection in multiple cohorts, including FHS, which contains related individuals, and performed a cis-trans eQTL mediation analysis, which I report as a side project. This thesis provides novel insights into the complexity of gene regulation and the low consistency of fine mapping across studies, and introduces new software, PolyQTL, for co-localization of genetic signals in structured populations.
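For orientation, one widely used Bayesian co-localization approach is the approximate Bayes factor method of Giambartolomei et al. (2014), implemented in the R package coloc. It assumes at most one causal variant per trait in a region, which is precisely the assumption that secondary eQTL signals violate, so it is shown here only to illustrate how posterior support is assigned to five hypotheses (H0: no association; H1/H2: association with one trait only; H3: two distinct causal variants; H4: one shared causal variant) from per-SNP effect sizes and standard errors. This Python sketch is not PolyQTL's method. The priors p1 = p2 = 1e-4 and p12 = 1e-5 and the prior effect SD of 0.15 are assumptions chosen to resemble coloc's usual defaults for a unit-variance quantitative trait.

```python
import math

import numpy as np


def logsumexp(x):
    """Numerically stable log(sum(exp(x))) over all entries of x."""
    x = np.asarray(x, dtype=float)
    m = np.max(x)
    if not np.isfinite(m):
        return float(m)
    return float(m + np.log(np.sum(np.exp(x - m))))


def wakefield_log_abf(beta, se, prior_sd=0.15):
    """Per-SNP log approximate Bayes factor (Wakefield) for association vs. null."""
    z = beta / se
    v = se ** 2
    w = prior_sd ** 2
    r = w / (w + v)
    return 0.5 * (np.log(1.0 - r) + r * z ** 2)


def coloc_abf(beta1, se1, beta2, se2, p1=1e-4, p2=1e-4, p12=1e-5, prior_sd=0.15):
    """Hypothetical helper: posterior probabilities of H0..H4 for two traits."""
    l1 = wakefield_log_abf(np.asarray(beta1, float), np.asarray(se1, float), prior_sd)
    l2 = wakefield_log_abf(np.asarray(beta2, float), np.asarray(se2, float), prior_sd)
    # H3 sums over all pairs of *different* SNPs; building the pair matrix and
    # masking the diagonal avoids cancellation from subtracting two large sums.
    pair = np.add.outer(l1, l2)
    np.fill_diagonal(pair, -np.inf)
    log_h = np.array([
        0.0,                                                 # H0
        math.log(p1) + logsumexp(l1),                        # H1
        math.log(p2) + logsumexp(l2),                        # H2
        math.log(p1) + math.log(p2) + logsumexp(pair),       # H3
        math.log(p12) + logsumexp(l1 + l2),                  # H4
    ])
    return np.exp(log_h - logsumexp(log_h))
```

When both traits peak at the same SNP, nearly all posterior mass should fall on H4; when they peak at different SNPs, it should move to H3. The explicit pair matrix costs O(p²) memory, which is acceptable for a sketch but may need a smarter formulation for very large cis windows.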