Genome-Wide Association Studies Xiaole Shirley Liu Stat 115/215
GWAS P-values
Family-based association studies (trios with affected child)
Population-based case-control studies
Multiple hypothesis testing?
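One conservative way to handle the multiple-testing question is a Bonferroni correction, which divides the family-wise significance level by the number of tests. The sketch below is illustrative only: the function names and the toy p-values are assumptions, not material from the lecture.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def significant_snps(pvalues, alpha=0.05):
    """Return the SNP ids whose p-value passes the Bonferroni threshold.

    pvalues: dict mapping a (hypothetical) SNP id to its p-value.
    """
    threshold = bonferroni_threshold(alpha, len(pvalues))
    return sorted(snp for snp, p in pvalues.items() if p < threshold)


if __name__ == "__main__":
    toy = {"rs1": 0.001, "rs2": 0.04, "rs3": 0.5}  # made-up values
    print(significant_snps(toy))
    # With about one million tests the per-test threshold is about 5e-8.
    print(bonferroni_threshold(0.05, 1_000_000))
```

With roughly one million independent tests, this gives the familiar genome-wide threshold of about 5e-8.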
Unusual Pvalue distributions Pvalue QQ plot
Unusual Pvalue distributions Pvalue QQ plot Population stratification Balding, Nature Reviews Genetics 2010
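A p-value QQ plot compares the sorted observed p-values with the quantiles expected under a uniform null, usually on a -log10 scale. The sketch below computes only the plotted points. The function name and the (rank - 0.5)/n plotting position are assumptions chosen for illustration.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) pairs of -log10 p-values for a QQ plot.

    All p-values must be strictly positive. Points are ordered from the
    smallest p-value (most significant) to the largest.
    """
    observed = sorted(pvalues)
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        expected = (rank - 0.5) / n  # uniform quantile under the null
        points.append((-math.log10(expected), -math.log10(p)))
    return points


if __name__ == "__main__":
    for exp, obs in qq_points([0.5, 0.05, 0.005]):  # toy values
        print(round(exp, 3), round(obs, 3))
```

Points that rise above the diagonal across the whole range, rather than only in the tail, suggest systematic inflation such as population stratification.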
Population Stratification
– e.g. some SNPs unique to an ethnic group
– Need to make sure sample groups match
– Hidden environmental structure
● Two populations have different disease frequency and different allele frequency.
● Association picks up the fact that they are different populations!
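A small worked example makes the confounding concrete. The counts below are invented for illustration: within each population, carrying the allele is unrelated to disease (odds ratio 1), yet pooling the two populations produces an apparent association.

```python
def odds_ratio(case_carrier, control_carrier, case_non, control_non):
    """Odds ratio of disease for allele carriers versus non-carriers."""
    return (case_carrier * control_non) / (control_carrier * case_non)


# Hypothetical counts: (case carriers, control carriers,
#                       case non-carriers, control non-carriers)
pop_a = (160, 640, 40, 160)  # 80% carriers, 20% disease frequency
pop_b = (10, 190, 40, 760)   # 20% carriers, 5% disease frequency
pooled = tuple(a + b for a, b in zip(pop_a, pop_b))

if __name__ == "__main__":
    print(odds_ratio(*pop_a))   # no association within population A
    print(odds_ratio(*pop_b))   # no association within population B
    print(odds_ratio(*pooled))  # spurious association after pooling
```

The pooled odds ratio is well above 1 only because the population with the higher allele frequency also has the higher disease frequency.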
Genotyping Principal Components (PCs) Can Model Population Stratification Li et al., Science 2008
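As a rough sketch of the idea, the first principal component of a centered genotype matrix can be found by power iteration. This is a toy, pure-Python illustration with made-up genotypes. The function name is an assumption, and real analyses use dedicated tools and many more SNPs.

```python
def top_pc_scores(genotypes, n_iter=200):
    """Return each sample's score on the first principal component.

    genotypes: list of samples, each a list of 0/1/2 allele counts per SNP.
    Assumes at least one SNP varies across samples.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Center each SNP column so the PCs describe variation around the mean.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Power iteration on the sample-by-sample matrix X X^T.
    v = [float(i + 1) for i in range(n)]  # deterministic, non-constant start
    for _ in range(n_iter):
        w = [sum(x[i][j] * v[i] for i in range(n)) for j in range(m)]  # X^T v
        v = [sum(x[i][j] * w[j] for j in range(m)) for i in range(n)]  # X w
        norm = sum(val * val for val in v) ** 0.5
        v = [val / norm for val in v]
    return v


if __name__ == "__main__":
    # Two hypothetical groups with very different allele frequencies.
    group1 = [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]
    group2 = [[2, 2, 2, 1], [2, 2, 1, 2], [2, 1, 2, 2]]
    print([round(s, 3) for s in top_pc_scores(group1 + group2)])
```

Samples from the two groups receive scores of opposite sign, which is why the leading PCs can be used as covariates in the association test.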
European population structure 1,387 samples ~200K SNPs
UK WTCCC1 Study
[Plot labels: Africa; European; Chinese + Japanese; Afro-Caribbean samples; South Asian samples]
Genomic control
Devlin and Roeder (1999) used theoretical arguments to propose that, with population structure, the distribution of Cochran-Armitage trend test statistics, genome-wide, is inflated by a constant multiplicative factor λ.
We can estimate the multiplicative inflation factor using the statistic $\lambda = \mathrm{median}(X_i^2) / 0.456$, where 0.456 is approximately the median of a $\chi^2$ distribution with 1 degree of freedom.
Inflation factor λ > 1 indicates population structure and/or genotyping error.
We can carry out an adjusted test of association that takes account of any mismatching of cases/controls at any SNP using the statistic $X_i^2 / \lambda$.
Inflation factor λ = 1.11. Population outliers and/or structure? True hits?
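The genomic control calculation can be sketched in a few lines. The denominator 0.456 is the approximate median of a chi-square distribution with 1 degree of freedom. The function names, the toy statistics, and the choice never to use a factor below 1 are assumptions for illustration.

```python
from statistics import median

CHI2_1DF_MEDIAN = 0.456  # approximate median of chi-square with 1 df


def genomic_inflation(chi2_stats):
    """Estimate lambda as median(observed statistics) / expected median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_adjust(chi2_stats):
    """Divide every statistic by lambda (assumption: lambda floored at 1)."""
    lam = max(genomic_inflation(chi2_stats), 1.0)
    return [x / lam for x in chi2_stats]


if __name__ == "__main__":
    stats = [0.1, 0.912, 3.0]  # made-up trend-test statistics
    print(genomic_inflation(stats))
    print(gc_adjust(stats))
```

Because every statistic is divided by the same factor, the ranking of SNPs does not change. Only their significance does.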
IBD: Identity By Descent Test
If two individuals share a common ancestor, they will share many SNPs / haplotype blocks on their genome (identical by state: IBS)
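IBS sharing is the raw observation that IBD estimates build on. For genotypes coded as allele counts (0, 1, 2), two individuals share 2 - |g1 - g2| alleles identical by state at each SNP. The sketch below computes only IBS. It does not estimate IBD probabilities, which also require allele frequencies. Function names and genotypes are illustrative assumptions.

```python
def ibs_counts(g1, g2):
    """Count SNPs at which two individuals share 0, 1, or 2 alleles IBS.

    g1, g2: equal-length lists of allele counts (0, 1, or 2) per SNP.
    Returns [n_ibs0, n_ibs1, n_ibs2].
    """
    counts = [0, 0, 0]
    for a, b in zip(g1, g2):
        counts[2 - abs(a - b)] += 1
    return counts


def mean_ibs(g1, g2):
    """Average proportion of alleles shared IBS across SNPs (0 to 1)."""
    c = ibs_counts(g1, g2)
    return (c[1] + 2 * c[2]) / (2 * sum(c))


if __name__ == "__main__":
    a = [0, 1, 2, 2]  # made-up genotypes
    b = [0, 1, 0, 1]
    print(ibs_counts(a, b), mean_ibs(a, b))
```

Pairs with unusually high genome-wide sharing are candidates for relatedness or sample duplication.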
IBD: Identity By Descent Test
Pairwise IBD probability between samples
Probability that two individuals share 0 (Z0), 1 (Z1), or 2 (Z2) haplotypes across the genome.
Remove IBD-related samples
Manolio et al., Clin Invest 2008
Pitfalls of Association Studies
Not very predictive
Explain little heritability
Poor reproducibility
Poor penetrance (fraction of people with the marker who show the trait) and expressivity (severity of the effect)
Focus on common variation
Difficult when several genes affect a quantitative trait
Many associated variants are not causal
No available intervention for many disease risks
Pitfalls of Association Studies Not very predictive
Missing Heritability? Visscher, AJHG 2011
Reproducibility of Association Studies
Most reported associations have not been consistently reproduced
Hirschhorn et al., Genetics in Medicine, 2002, review of association studies
– 603 associations of polymorphisms and disease
– 166 studied in at least three populations
– Only 6 seen in > 75% of studies
Causes for Inconsistency
What explains the lack of reproducibility?
False positives
– Multiple hypothesis testing
– Ethnic admixture / stratification
False negatives
– Lack of power for weak effects
Population differences
– Variable LD with causal SNP
– Population-specific modifiers
Causes for Inconsistency
A sizable fraction (but less than half) of reported associations are likely correct
Genetic effects are generally modest
– Beware the winner’s curse (auction theory)
– In association studies, the first positive report is equivalent to the winning bid
Large study sizes are needed to detect these reliably
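The winner's curse can be illustrated with a small simulation. Many underpowered studies estimate the same modest effect, and only those that reach significance are "reported". Every number here (true effect, standard error, number of studies, seed) is a made-up assumption for illustration.

```python
import random


def simulate_winners_curse(true_effect=0.1, se=0.1, n_studies=2000, seed=0):
    """Return (mean estimate over all studies, mean over significant ones).

    Each study's estimate is drawn from Normal(true_effect, se). A study is
    called significant when estimate / se exceeds 1.96.
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    winners = [b for b in estimates if b / se > 1.96]
    mean_all = sum(estimates) / len(estimates)
    mean_winners = sum(winners) / len(winners)
    return mean_all, mean_winners


if __name__ == "__main__":
    mean_all, mean_winners = simulate_winners_curse()
    print(round(mean_all, 3), round(mean_winners, 3))
```

The average over all studies sits near the true effect, while the average over significant studies is necessarily larger. This is why replication studies often find smaller effects than the first report.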
Should we Believe Association Study Results?
Initial skepticism is warranted
Replication, especially with low p values, is encouraging
Large sample sizes are crucial
E.g. PPARγ Pro12Ala & Diabetes
Replication, Replication, Replication
Meta-analysis of multiple studies to increase GWAS power
Combine data from different platforms / studies
Impute unmeasured or missing genotypes based on LD (e.g. HapMap haplotypes or 1000 Genomes)
Analyze all studies together to increase GWAS power
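One standard way to combine per-study results is a fixed-effect, inverse-variance weighted meta-analysis. The slide does not name a specific method, so this choice, the function name, and the toy inputs are assumptions. The p-value uses a two-sided normal approximation.

```python
from statistics import NormalDist


def fixed_effect_meta(betas, ses):
    """Combine per-study effect estimates with inverse-variance weights.

    betas: effect estimates for one SNP, one per study.
    ses:   their standard errors.
    Returns (combined beta, combined standard error, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = (1.0 / total) ** 0.5
    z = beta / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return beta, se, p


if __name__ == "__main__":
    # Two hypothetical studies, each only marginally significant alone.
    print(fixed_effect_meta([0.2, 0.2], [0.1, 0.1]))
```

Two studies with the same estimate and standard error give a combined standard error smaller by a factor of sqrt(2), which is the sense in which pooling increases power.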
Detection Power of GWAS
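A rough feel for detection power comes from a normal approximation for a single SNP tested against a quantitative trait under an additive model. Everything in this sketch is an assumption for illustration, including the formula for the standard error, the 5e-8 threshold, and the parameter names. It is not the calculation behind the lecture's figure.

```python
from statistics import NormalDist


def gwas_power(n, maf, beta, alpha=5e-8, sigma=1.0):
    """Approximate power of a two-sided test for one SNP.

    n:     sample size
    maf:   minor allele frequency
    beta:  per-allele effect on the trait
    sigma: residual standard deviation of the trait
    Assumes SE(beta_hat) ~ sigma / sqrt(n * 2 * maf * (1 - maf)).
    """
    nd = NormalDist()
    ncp = abs(beta) * (n * 2.0 * maf * (1.0 - maf)) ** 0.5 / sigma
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    return nd.cdf(ncp - z_crit) + nd.cdf(-ncp - z_crit)


if __name__ == "__main__":
    for n in (1_000, 10_000, 50_000):
        print(n, gwas_power(n, maf=0.3, beta=0.1))
```

Under these assumptions, power for a modest effect is close to zero at a thousand samples and approaches one only at tens of thousands.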
Mapping (expression) Quantitative Trait Loci
Rat Recombinant Inbred (RI) Strains
[Figure labels: SHR, BN, F1, F2; Genotype B, Genotype H; Strain Distribution Pattern for Gene X: HBBHBHH]
F1 offspring are identical
F2 offspring are different (due to recombination)
Brother-sister mating over >20 generations to achieve homozygosity at all genetic loci
Mapping of QTLs
Compare strain distribution pattern of every marker with certain traits
[Figure labels: RI strains; SDP for Gene X: BHBBBHH; traits: obesity, mRNA; Linkage]
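Comparing each marker's strain distribution pattern (SDP) with a trait can be sketched as a correlation scan across RI strains. The marker names, trait values, and the use of a Pearson correlation are illustrative assumptions. The SDP strings reuse the two patterns shown on the slides.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists.

    Both lists must vary (a constant list would divide by zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def scan_markers(sdps, trait):
    """Correlate every marker's SDP with a trait measured in the same strains.

    sdps:  dict mapping marker name to an SDP string such as "HBBHBHH".
    trait: list of trait values, one per RI strain, in the same strain order.
    """
    results = {}
    for marker, sdp in sdps.items():
        coded = [1.0 if g == "H" else 0.0 for g in sdp]  # H = 1, B = 0
        results[marker] = pearson(coded, trait)
    return results


if __name__ == "__main__":
    sdps = {"marker1": "HBBHBHH", "marker2": "BHBBBHH"}  # hypothetical names
    trait = [10, 2, 3, 9, 1, 11, 12]  # made-up values for 7 strains
    print(scan_markers(sdps, trait))
```

The marker whose pattern tracks the trait most closely marks the candidate QTL region. For an eQTL scan the trait is simply the mRNA level of a gene.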
(e)QTL Mapping
Many disease-associated genes have been mapped with QTL
eQTL mapping:
– Transcript abundance may act as an intermediate phenotype between genetic loci and the clinical phenotype
– Incorporate genotype, expression, and clinical trait information together to construct regulatory networks and to improve understanding of disease etiologies
eQTL Analysis
cis- and trans-acting eQTLs
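In practice the cis/trans distinction is often drawn by distance. An eQTL on the same chromosome as its target gene and within some window of it is called cis, and anything else trans. The 1 Mb window, the use of the transcription start site, and the function name below are assumptions for illustration. The lecture does not specify a cutoff.

```python
def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_tss, window=1_000_000):
    """Label an eQTL as "cis" or "trans" relative to its target gene.

    window: maximum SNP-to-TSS distance in base pairs counted as cis
            (an assumed, commonly used value, not a fixed standard).
    """
    if snp_chrom == gene_chrom and abs(snp_pos - gene_tss) <= window:
        return "cis"
    return "trans"


if __name__ == "__main__":
    print(classify_eqtl("chr1", 1_500_000, "chr1", 1_000_000))
    print(classify_eqtl("chr1", 5_000_000, "chr1", 1_000_000))
    print(classify_eqtl("chr2", 1_000_000, "chr1", 1_000_000))
```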
trans-eQTLs Hot-spots
eQTL on Human HapMap
– Gene expression
– Histone mark
– DNase-seq
Need to check AA, AB, BB genotypes against gene expression differences
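Checking AA, AB, and BB genotypes against expression can be sketched as a simple linear regression of expression on the number of B alleles. The additive coding, the function name, and the toy data are assumptions for illustration. A real analysis would also report a standard error and a p-value.

```python
DOSAGE = {"AA": 0, "AB": 1, "BB": 2}  # additive coding: count of B alleles


def eqtl_fit(genotypes, expression):
    """Least-squares fit of expression = intercept + slope * dosage.

    genotypes:  list of "AA" / "AB" / "BB" calls, one per individual.
    expression: list of expression values in the same order.
    Requires at least two distinct genotypes among the individuals.
    Returns (slope, intercept).
    """
    x = [DOSAGE[g] for g in genotypes]
    n = len(x)
    mx, my = sum(x) / n, sum(expression) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, expression))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    calls = ["AA", "AB", "BB", "AA", "AB", "BB"]  # made-up individuals
    expr = [1.0, 2.0, 3.0, 1.2, 2.2, 3.2]
    print(eqtl_fit(calls, expr))
```

A slope clearly different from zero means that expression changes with each additional B allele. The same test applies when the measured phenotype is a histone mark or DNase-seq signal instead of mRNA.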
eQTL on TF Binding and Epigenetics McDaniell et al, Science 2010
Summary
Population stratification, IBD
Removing outliers or finding the scaling factor
Predictability, heritability
Reproducibility
QTL and eQTL mapping
Cis- vs trans-eQTL
Acknowledgement
Tim Niu
Kenneth Kidd, Judith Kidd and Glenys Thomson
Joel Hirschhorn
Greg Gibson & Spencer Muse
Jim Stankovich
Teri Manolio
David Evans
Guodong Wu
Enrico Petretto
Wei Wang
Bo Li