Prediction of disease risks across multiple populations using evolutionary genetics
Complex diseases tend to be polygenic, and allele frequencies at disease-associated loci vary widely across the globe. Interestingly, while some disease risks are similar across continents, the incidence and mortality rates of prostate cancer vary greatly across global populations. This dissertation explores the genetic architecture of phenotypic diversity among global populations. The work aims to bridge the gap between evolutionary genetics and genetic epidemiology by addressing three questions: (1) Why do we observe risk allele frequency differences across populations? (2) How well do GWAS signals replicate across populations with different evolutionary histories? (3) How well can discovered genetic associations predict risk across different populations? Thousands of genome-wide association studies (GWAS) have successfully identified genetic associations with common diseases and other traits. However, the vast majority of published GWAS have used samples of European ancestry and genotyping arrays as opposed to whole genome sequencing. By simulating GWAS with different study populations, I found that non-African cohorts (bottlenecked populations) yield disease associations that have biased allele frequencies and that African cohorts yield disease associations that are relatively free of bias. In addition, I found empirical evidence that genotyping arrays and SNP ascertainment bias contribute to continental differences in risk allele frequencies. Next, I studied the replicability of trait associations from a European cohort in the UK Biobank to examine whether continental ancestry impacts the results. By comparing GWAS results from the UK Biobank to a novel sub-Saharan African dataset, I found that trait associations from European GWAS replicate poorly in sub-Saharan Africa. Notably, the converse did not hold: the top hits from African GWAS were enriched for low p-values in the UK Biobank.
GWAS do not always replicate well across populations, and this can cause polygenic risk scores (PRS) to predict disease risks poorly. PRS quantify an individual's chances of having a disease by summing the risk-increasing alleles in that individual's genome. Here, I quantified how well genetic predictions of prostate cancer work in different continental populations using a PRS that was originally generated from GWAS of European Americans. Using African individuals genotyped on the Men of African Descent and Carcinoma of the Prostate (MADCaP) Array and British individuals from the UK Biobank, I found that genetic predictions of case vs. control status were much more effective for European than for African individuals. Similarly, genetic prediction of height performs poorly when European results are generalized to African samples. Most PRS calculations assume additive effects of risk-increasing alleles. Thus, an additional aspect of my dissertation work explores how non-additive models of disease influence risk predictability. Here I incorporated dominance coefficients into PRS calculations and explored models that ranged from complete recessivity of risk alleles (h = 0) to complete dominance of risk alleles (h = 1). In general, additive models (h = 0.5) work well, but genetic predictions are marginally improved by allowing individual risk-increasing alleles to have different dominance coefficients. Together, these studies underscore how genetic variation contributes to health and disease, as well as the benefits of an evolutionary perspective and of correcting for ascertainment bias. As presently calculated, PRS lead to misestimates of hereditary disease risks when they are applied to individuals with different ancestries. To remedy this problem, my research illustrates that PRS can be constructed to correct for existing biases by incorporating allele dosage, SNP effect sizes, and the ancestral or derived state of the risk alleles.
These corrections can partially alleviate challenges that arise when PRS are applied to individuals of African descent. Overall, my work implies that caution must be taken when extrapolating GWAS results from one population to predict disease risks in another population.
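The dominance-coefficient idea described above can be made concrete with a short sketch. The abstract does not state the exact scoring formula, so the encoding below is an assumption chosen for illustration: a heterozygote contributes 2h risk-allele equivalents, so that h = 0.5 reproduces the usual additive dosage (0, 1, 2), h = 0 gives a fully recessive model, and h = 1 a fully dominant one. The function names, genotypes, and effect sizes are hypothetical and are not taken from the dissertation.

```python
import numpy as np


def dominance_dosage(genotypes, h):
    """Convert risk-allele counts (0, 1, 2) into dosages under dominance h.

    Assumed encoding (illustrative only): 0 copies -> 0, 1 copy -> 2h,
    2 copies -> 2. `h` may be a scalar or one value per SNP.
    """
    g = np.asarray(genotypes, dtype=float)
    h = np.asarray(h, dtype=float)
    # Only heterozygotes (g == 1) are affected by the dominance coefficient.
    return np.where(g == 1, 2.0 * h, g)


def polygenic_risk_score(genotypes, effect_sizes, h=0.5):
    """Weighted sum of dominance-adjusted dosages for each individual.

    genotypes: array of shape (n_individuals, n_snps) with risk-allele counts.
    effect_sizes: array of shape (n_snps,), e.g. log odds ratios from a GWAS.
    """
    dosage = dominance_dosage(genotypes, h)
    return dosage @ np.asarray(effect_sizes, dtype=float)


# Hypothetical data: two individuals, three SNPs.
genotypes = np.array([[0, 1, 2],
                      [1, 1, 0]])
effect_sizes = np.array([0.1, 0.2, 0.3])

additive = polygenic_risk_score(genotypes, effect_sizes, h=0.5)
recessive = polygenic_risk_score(genotypes, effect_sizes, h=0.0)
dominant = polygenic_risk_score(genotypes, effect_sizes, h=1.0)
per_snp = polygenic_risk_score(genotypes, effect_sizes, h=[0.0, 0.5, 1.0])
```

Passing a vector for `h` gives each risk allele its own dominance coefficient, which corresponds to the variant of the model that the abstract reports as marginally improving prediction.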