Genome-Wide Association Studies Xiaole Shirley Liu Stat 115/215.

GWAS P-values
– Multiple hypothesis testing?
– Family-based association studies (trios with an affected child)
– Population-based case-control studies
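The multiple-testing question on this slide can be made concrete with the simplest correction, Bonferroni. The sketch below assumes roughly one million effectively independent common-SNP tests, which is the usual rationale for the conventional genome-wide threshold of 5 × 10⁻⁸. The function names and p-values are hypothetical, for illustration only.

```python
from __future__ import annotations


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def significant_snps(pvalues: dict[str, float], alpha: float = 0.05,
                     n_tests: int | None = None) -> list[str]:
    """Return the SNP IDs whose p-value falls below the Bonferroni cutoff.

    n_tests defaults to the number of p-values supplied; pass a larger value
    (e.g. 1_000_000) to apply the conventional genome-wide threshold instead.
    """
    m = n_tests if n_tests is not None else len(pvalues)
    cutoff = bonferroni_threshold(alpha, m)
    return sorted(snp for snp, p in pvalues.items() if p < cutoff)


if __name__ == "__main__":
    print(bonferroni_threshold(0.05, 1_000_000))  # about 5e-8
    # Hypothetical p-values, for illustration only
    pvals = {"rsA": 3e-9, "rsB": 2e-6, "rsC": 0.04}
    print(significant_snps(pvals, n_tests=1_000_000))  # ['rsA']
```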

Unusual P-value distributions
– P-value QQ plot

Unusual P-value distributions
– P-value QQ plot: population stratification (Balding, Nature Reviews Genetics 2010)
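A QQ plot compares the sorted observed -log10 p-values with the values expected if every SNP were null (uniform p-values). Below is a minimal sketch of computing the plotted points. It assumes the (i - 0.5)/n plotting positions, which are one common choice among several, and strictly positive p-values.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10 p-values, each sorted ascending.

    Expected values use the plotting positions (i - 0.5) / n, one common
    choice for n p-values that are Uniform(0, 1) under the null hypothesis.
    """
    n = len(pvalues)
    observed = sorted(-math.log10(p) for p in pvalues)
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return expected, observed


if __name__ == "__main__":
    # Hypothetical p-values; plot observed against expected with any plotting library.
    exp, obs = qq_points([0.9, 0.5, 0.2, 0.04, 1e-6])
    for e, o in zip(exp, obs):
        print(f"{e:.2f}\t{o:.2f}")
```

Some points follow the diagonal and rise only at the far right; that pattern is consistent with a few true hits. Other plots depart from the diagonal early and stay above it along the whole range; that pattern suggests genome-wide inflation such as population stratification.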

Population Stratification
Population stratification:
– e.g. a SNP allele found mainly in one ethnic group
– Need to make sure the case and control groups are matched
– Hidden environmental structure
● The two populations have different disease frequencies and different allele frequencies.
● The association test then picks up the fact that they are different populations!
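The two bullets above can be checked with a small worked example. The allele counts below are hypothetical. Within each population, cases and controls have identical allele frequencies. Population 1 has the higher disease frequency, so it contributes most of the cases. A standard allelic chi-square test (Pearson, 1 df, no continuity correction) finds nothing within either population, yet a strong signal appears in the pooled sample.

```python
import math


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 allele-count table."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_1df_pvalue(x):
    """Upper-tail p-value of a 1-df chi-square statistic: P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2))


# Hypothetical allele counts (alt, ref). Within each population, cases and
# controls have the SAME alt-allele frequency (0.7 in pop1, 0.3 in pop2),
# but pop1 has the higher disease frequency and supplies most of the cases.
pop1 = {"case": (420, 180), "ctrl": (140, 60)}
pop2 = {"case": (60, 140), "ctrl": (180, 420)}

if __name__ == "__main__":
    for name, pop in (("pop1", pop1), ("pop2", pop2)):
        print(name, allelic_chi2(*pop["case"], *pop["ctrl"]))  # 0.0 in both
    pooled_case = [x + y for x, y in zip(pop1["case"], pop2["case"])]
    pooled_ctrl = [x + y for x, y in zip(pop1["ctrl"], pop2["ctrl"])]
    stat = allelic_chi2(*pooled_case, *pooled_ctrl)
    print("pooled", stat, chi2_1df_pvalue(stat))  # 64.0 and a tiny p-value
```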

Principal Components (PCs) of Genotypes Can Model Population Stratification (Li et al., Science 2008)

European population structure: 1,387 samples, ~200K SNPs

UK WTCCC1 Study
Figure labels: Africa; European; Chinese + Japanese; Afro-Caribbean samples; South Asian samples
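The PC plots above can be reproduced in miniature. This is a pure-Python sketch on simulated data: two populations with unrelated allele frequencies at each SNP, with all sizes hypothetical. It standardizes the genotypes, builds the n × n relationship matrix and extracts only the first PC by power iteration. Real analyses use dedicated tools such as EIGENSOFT or PLINK on far more SNPs.

```python
import math
import random


def simulate_genotypes(n_per_pop, n_snps, seed=1):
    """Toy data: 0/1/2 genotypes for two populations with unrelated allele frequencies."""
    rng = random.Random(seed)
    freqs = [(rng.uniform(0.1, 0.9), rng.uniform(0.1, 0.9)) for _ in range(n_snps)]
    genotypes, labels = [], []
    for pop in (0, 1):
        for _ in range(n_per_pop):
            # Two independent allele draws per SNP (Hardy-Weinberg within each population)
            genotypes.append([int(rng.random() < f[pop]) + int(rng.random() < f[pop])
                              for f in freqs])
            labels.append(pop)
    return genotypes, labels


def standardize(genotypes):
    """Center each SNP at 2p and scale by sqrt(2p(1-p)); monomorphic SNPs are dropped."""
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * n)
        if 0 < p < 1:
            sd = math.sqrt(2 * p * (1 - p))
            columns.append([(g - 2 * p) / sd for g in col])
    return [list(row) for row in zip(*columns)]  # individuals x SNPs


def top_pc(x, iters=200, seed=0):
    """Leading eigenvector of the relationship matrix X X^T / m, by power iteration."""
    n, m = len(x), len(x[0])
    grm = [[sum(a * b for a, b in zip(x[i], x[k])) / m for k in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(grm[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v  # one PC1 score per individual (the sign is arbitrary)


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


if __name__ == "__main__":
    geno, pop = simulate_genotypes(n_per_pop=30, n_snps=300)
    pc1 = top_pc(standardize(geno))
    print(abs(pearson(pc1, pop)))  # near 1: PC1 separates the two populations
```

In an association analysis, the top few PCs are then typically included as covariates. This keeps allele-frequency differences that merely reflect ancestry from being reported as hits.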

Genomic control
Devlin and Roeder (1999) used theoretical arguments to propose the following: with population structure, the genome-wide distribution of Cochran-Armitage trend test statistics is inflated by a constant multiplicative factor λ.
– We can estimate the inflation factor as $\lambda = \mathrm{median}(X_i^2)/0.455$. Here 0.455 is the median of the $\chi^2_1$ distribution, the null distribution of each trend statistic.
– An inflation factor λ > 1 indicates population structure and/or genotyping error.
– We can carry out an adjusted test of association using the statistic $X_i^2/\lambda$. This test takes account of any mismatching of cases and controls at any SNP.
Example: inflation factor λ = 1.11. Population outliers and/or structure? True hits?
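Here is a sketch of both statistics named on this slide. It assumes genotype counts for 0, 1 and 2 copies of the tested allele, with the usual additive weights (0, 1, 2). The trend statistic uses the textbook Cochran-Armitage formula. Genomic control divides each statistic by λ = median of the statistics / 0.455, where 0.455 ≈ median of $\chi^2_1$. The example statistics are hypothetical, and λ is only meaningful when estimated from a genome-wide set of SNPs.

```python
import math
from statistics import NormalDist, median


def ca_trend_chi2(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test statistic (1 df) for one SNP.

    cases, controls: numbers of individuals with 0, 1, 2 copies of the tested allele.
    """
    r, s, t = cases, controls, weights
    n = [ri + si for ri, si in zip(r, s)]
    R, S = sum(r), sum(s)
    N = R + S
    T = sum(ti * (ri * S - si * R) for ti, ri, si in zip(t, r, s))
    var = sum(ti * ti * ni * (N - ni) for ti, ni in zip(t, n))
    var -= 2 * sum(t[i] * t[j] * n[i] * n[j] for i in range(3) for j in range(i + 1, 3))
    var *= R * S / N
    return T * T / var


# Median of a chi-square with 1 df: (Phi^-1(0.75))^2, about 0.455
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control(chi2_stats):
    """Return lambda = median(X_i^2) / 0.455 and the adjusted statistics X_i^2 / lambda."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    return lam, [x / lam for x in chi2_stats]


def chi2_1df_pvalue(x):
    """Upper-tail p-value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(x / 2))


if __name__ == "__main__":
    print(ca_trend_chi2((10, 20, 30), (30, 20, 10)))  # 20.0
    stats = [0.3, 0.6, 1.2, 0.05, 20.0]  # hypothetical; a real scan has ~10^5-10^6 SNPs
    lam, adjusted = genomic_control(stats)
    print(round(lam, 2), [round(chi2_1df_pvalue(x), 4) for x in adjusted])
```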

IBD: Identity-by-Descent Test
If two individuals share a recent common ancestor, they will share many SNP alleles / haplotype blocks across their genomes (identical by state: IBS).
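Identity by state can be observed directly from genotypes. IBD, by contrast, has to be inferred from IBS sharing and population allele frequencies; tools such as PLINK estimate Z0, Z1 and Z2 this way. Below is a minimal sketch of the observable part, counting shared alleles per SNP for two individuals. It assumes biallelic SNPs coded as allele counts.

```python
def ibs_profile(g1, g2):
    """Fractions of SNPs at which two people share 0, 1 or 2 alleles identical by state.

    g1, g2: allele counts (0, 1, 2) at the same biallelic SNPs; None = missing call.
    """
    counts = [0, 0, 0]
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs missing in either person
        # same genotype -> 2 shared, differ by one allele -> 1, opposite homozygotes -> 0
        counts[2 - abs(a - b)] += 1
    total = sum(counts)
    return tuple(c / total for c in counts)


if __name__ == "__main__":
    print(ibs_profile([0, 1, 2, 2, None], [0, 1, 0, 1, 2]))  # (0.25, 0.25, 0.5)
```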

IBD: Identity-by-Descent Test
– Pairwise IBD probability between samples: the probability that two individuals share 0 (Z0), 1 (Z1) or 2 (Z2) alleles IBD, estimated across the genome.
– Remove related samples (high IBD) before association testing.
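A single relatedness summary is usually computed from the Z estimates as Z2 + 0.5·Z1; PLINK reports this as PI_HAT. Its expected value is about 1 for duplicates or identical twins, 0.5 for first-degree relatives and 0.25 for second-degree relatives. The sketch below removes one sample from each pair above a cutoff. The 0.2 cutoff and the greedy rule (drop the sample involved in the most related pairs) are illustrative assumptions, not a prescribed procedure.

```python
from collections import Counter


def pi_hat(z1, z2):
    """Estimated proportion of the genome shared IBD: P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1


def samples_to_remove(pairs, cutoff=0.2):
    """Pick one sample from each pair whose PI_HAT exceeds cutoff (a hypothetical threshold).

    pairs: iterable of (id1, id2, z1, z2). Within each related pair, the sample
    involved in more related pairs is dropped, so fewer samples are lost overall.
    """
    related = [(a, b) for a, b, z1, z2 in pairs if pi_hat(z1, z2) > cutoff]
    degree = Counter(s for pair in related for s in pair)
    removed = set()
    for a, b in related:
        if a in removed or b in removed:
            continue  # this pair is already resolved
        removed.add(a if degree[a] > degree[b] else b)
    return removed


if __name__ == "__main__":
    pairs = [
        ("s1", "s2", 0.0, 1.0),  # duplicate / identical twins: PI_HAT = 1
        ("s2", "s3", 1.0, 0.0),  # parent-offspring: PI_HAT = 0.5
        ("s4", "s5", 0.1, 0.0),  # PI_HAT = 0.05, treated as unrelated
    ]
    print(samples_to_remove(pairs))  # {'s2'}
```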

(Manolio et al., J Clin Invest 2008)

Pitfalls of Association Studies
– Not very predictive
– Explain little of the heritability
– Poor reproducibility
– Poor penetrance (fraction of people with the marker who show the trait) and expressivity (severity of the effect)
– Focus on common variation
– Difficult when several genes affect a quantitative trait
– Many associated variants are not causal
– No available intervention for many disease risks

Pitfalls of Association Studies
– Not very predictive

Missing Heritability? (Visscher, AJHG 2011)

Reproducibility of Association Studies
Most reported associations have not been consistently reproduced.
Hirschhorn et al., Genetics in Medicine, 2002, review of association studies:
– 603 associations of polymorphisms and disease
– 166 studied in at least three populations
– Only 6 seen in > 75% of studies

Causes for Inconsistency
What explains the lack of reproducibility?
False positives
– Multiple hypothesis testing
– Ethnic admixture / stratification
False negatives
– Lack of power for weak effects
Population differences
– Variable LD with the causal SNP
– Population-specific modifiers

Causes for Inconsistency
A sizable fraction (but less than half) of reported associations are likely correct.
Genetic effects are generally modest:
– Beware the winner's curse (auction theory): the winning bid tends to exceed the item's true value
– In association studies, the first positive report is equivalent to the winning bid, so it tends to overestimate the true effect size
Large study sizes are needed to detect these effects reliably.
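The winner's curse can be seen in a short simulation. Many identical studies estimate the same modest true effect, and only those passing a genome-wide threshold are "reported". All parameter values (true effect 0.05, standard error 0.02, α = 5 × 10⁻⁸) are hypothetical.

```python
import random
from statistics import NormalDist, mean


def winners_curse(true_beta=0.05, se=0.02, alpha=5e-8, n_studies=100_000, seed=1):
    """Simulate many identical studies and average the estimates that reach significance.

    Each study's estimate is drawn as Normal(true_beta, se). All parameter values
    are hypothetical. Returns (number of significant studies, their mean estimate).
    """
    rng = random.Random(seed)
    z_cut = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cutoff, about 5.45
    estimates = (rng.gauss(true_beta, se) for _ in range(n_studies))
    winners = [b for b in estimates if abs(b / se) > z_cut]
    return len(winners), mean(winners)


if __name__ == "__main__":
    n_sig, avg = winners_curse()
    print(n_sig, round(avg, 3))
```

A significant estimate must exceed about 5.45 standard errors (about 0.109 here), so every "winning" estimate is more than twice the true effect. This is why replication samples typically see smaller effects than the first report.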

Should We Believe Association Study Results?
– Initial skepticism is warranted
– Replication, especially with low p-values, is encouraging
– Large sample sizes are crucial
– E.g. PPARγ Pro12Ala & diabetes

Replication, Replication, Replication
– Meta-analysis of multiple studies to increase GWAS power
– Combine data from different platforms / studies
– Impute unmeasured or missing genotypes based on LD (e.g. HapMap haplotypes or 1000 Genomes)
– Analyze all studies together to increase GWAS power
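Meta-analysis across studies is often done with an inverse-variance weighted fixed-effect model, sketched below. It assumes every study reports an effect (e.g. a log odds ratio) and a standard error for the same effect allele. It also assumes that all studies share one true effect. The numbers in the example are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    betas, ses: per-study effect estimates (e.g. log odds ratios) and standard errors,
    assumed to refer to the same effect allele in every study.
    Returns (pooled beta, pooled SE, z, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return beta, se, z, p


if __name__ == "__main__":
    # Hypothetical: two studies, each only nominally significant on its own (z = 2)
    print(fixed_effect_meta([0.10, 0.10], [0.05, 0.05]))
```

Two studies that are each only nominally significant (z = 2) combine to z ≈ 2.83, which shows how pooling increases power.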

Detection Power of GWAS
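Detection power can be approximated for a simple allelic test. This sketch makes several assumptions:
- a multiplicative allelic odds-ratio model;
- a risk-allele frequency of p_control in controls;
- a normal approximation to the difference in allele frequencies;
- the genome-wide α = 5 × 10⁻⁸.

All inputs in the example are hypothetical.

```python
import math
from statistics import NormalDist


def allelic_power(p_control, odds_ratio, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided allelic test (normal approximation).

    Assumes an allelic odds ratio model, 2 alleles per person, and ignores the
    negligible chance of rejecting in the wrong direction.
    """
    odds_case = odds_ratio * p_control / (1 - p_control)
    p_case = odds_case / (1 + odds_case)  # risk-allele frequency in cases
    se = math.sqrt(p_case * (1 - p_case) / (2 * n_cases)
                   + p_control * (1 - p_control) / (2 * n_controls))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p_case - p_control) / se - z_alpha)


if __name__ == "__main__":
    # Hypothetical common variant (frequency 0.3) with a modest odds ratio of 1.2
    for n in (2_000, 5_000, 10_000):
        print(n, round(allelic_power(0.3, 1.2, n, n), 3))
```

For a modest odds ratio of 1.2, power is small with a few thousand cases and controls. It approaches 1 only at around ten thousand of each.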

Mapping (Expression) Quantitative Trait Loci

Rat Recombinant Inbred (RI) Strains
– Parental strains SHR and BN are crossed; F1 offspring are identical
– F2 offspring are different (due to recombination)
– Brother-sister mating over >20 generations achieves homozygosity at all genetic loci
Figure: SHR, BN, F1 and F2 genotypes (genotype B, genotype H) at Gene X; strain distribution pattern (SDP) for Gene X across RI strains: HBBHBHH

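Mapping of QTLs
– Compare the strain distribution pattern (SDP) of every marker with certain traits measured across the RI strains (e.g. obesity, or an mRNA level); a marker whose SDP matches the trait pattern indicates linkage
Figure: SDP for Gene X: BHBBBHH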
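Comparing each marker's SDP with a trait can be sketched as a difference in mean trait value between strains carrying the H and B parental alleles. The SDPs and trait values below are hypothetical (7 strains). A real QTL scan would also assess significance, for example by permutation.

```python
from statistics import mean


def scan_markers(sdps, trait):
    """Compare each marker's strain distribution pattern (SDP) with a trait.

    sdps: marker -> string with one letter per RI strain ('H' or 'B' parental allele).
    trait: one trait value per strain, in the same strain order.
    Returns marker -> mean(trait | H) - mean(trait | B).
    """
    result = {}
    for marker, sdp in sdps.items():
        h = [y for allele, y in zip(sdp, trait) if allele == "H"]
        b = [y for allele, y in zip(sdp, trait) if allele == "B"]
        if h and b:  # skip markers that do not segregate in these strains
            result[marker] = mean(h) - mean(b)
    return result


if __name__ == "__main__":
    # Hypothetical SDPs and trait values for 7 RI strains
    sdps = {"m1": "HBBHBHH", "m2": "BBHHBHB"}
    trait = [10, 2, 3, 11, 2, 9, 12]
    diffs = scan_markers(sdps, trait)
    print(max(diffs, key=lambda m: abs(diffs[m])))  # m1: its SDP tracks the trait
```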

(e)QTL Mapping
Many disease-associated genes have been mapped with QTL mapping.
eQTL mapping:
– Transcript abundance may act as an intermediate phenotype between genetic loci and the clinical phenotype
– Combine genotype, expression and clinical-trait information to construct regulatory networks and to improve understanding of disease etiologies

eQTL Analysis

cis- and trans-acting eQTLs

trans-eQTL Hotspots
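One common way to label eQTLs is by distance: cis if the SNP lies on the same chromosome within a fixed window of the gene, trans otherwise. The sketch below assumes a 1 Mb window and uses the gene start as the gene position. It calls a genomic bin a trans hotspot when its SNPs are trans-eQTLs for at least three distinct genes. The window, bin size, threshold and data are all hypothetical choices.

```python
def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_start, window=1_000_000):
    """'cis' if the SNP lies on the gene's chromosome within `window` bp of it, else 'trans'."""
    if snp_chrom == gene_chrom and abs(snp_pos - gene_start) <= window:
        return "cis"
    return "trans"


def trans_hotspots(eqtls, bin_size=5_000_000, min_genes=3):
    """Group trans-eQTL SNPs into genomic bins and keep bins regulating many genes.

    eqtls: iterable of (snp_chrom, snp_pos, gene, gene_chrom, gene_start).
    Returns {(chrom, bin index): number of distinct target genes}.
    """
    targets = {}
    for snp_chrom, snp_pos, gene, gene_chrom, gene_start in eqtls:
        if classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_start) == "trans":
            targets.setdefault((snp_chrom, snp_pos // bin_size), set()).add(gene)
    return {k: len(v) for k, v in targets.items() if len(v) >= min_genes}


if __name__ == "__main__":
    eqtls = [  # hypothetical associations
        ("chr2", 10_100_000, "g1", "chr5", 1_000),
        ("chr2", 10_200_000, "g2", "chr7", 5_000),
        ("chr2", 11_000_000, "g3", "chr9", 9_000),
        ("chr1", 500_000, "g4", "chr1", 700_000),  # cis: 200 kb from its gene
    ]
    print(trans_hotspots(eqtls))  # {('chr2', 2): 3}
```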

eQTL on Human HapMap
– Gene expression
– Histone marks
– DNase-seq
Need to check the AA, AB and BB genotypes against differences in gene expression.
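Checking AA, AB and BB genotypes against expression is often done by regressing the measurement on allele dosage (AA = 0, AB = 1, BB = 2). This is an additive-model assumption. Below is a minimal least-squares sketch; the same function could take a histone-mark or DNase-seq signal instead of expression. The data are hypothetical. A p-value would come from a t distribution with n - 2 degrees of freedom, which is not computed here.

```python
import math


def eqtl_regression(genotypes, values):
    """Fit values ~ a + b * dosage by least squares, with dosage AA=0, AB=1, BB=2.

    Returns (slope b, t statistic for b). Needs at least 3 samples, at least
    two genotype classes, and a fit that is not exact (nonzero residuals).
    """
    dosage = [{"AA": 0, "AB": 1, "BB": 2}[g] for g in genotypes]
    n = len(dosage)
    mx, my = sum(dosage) / n, sum(values) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, values))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, values))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se_slope


if __name__ == "__main__":
    # Hypothetical expression values for six individuals
    geno = ["AA", "AA", "AB", "AB", "BB", "BB"]
    expr = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
    print(eqtl_regression(geno, expr))  # slope close to 1.0, large positive t
```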

eQTL on TF Binding and Epigenetics (McDaniell et al., Science 2010)

Summary
– Population stratification, IBD
– Removing outliers or finding the scaling factor
– Predictability, heritability
– Reproducibility
– QTL and eQTL mapping
– Cis- vs. trans-eQTLs

Acknowledgements
Tim Niu
Kenneth Kidd, Judith Kidd and Glenys Thomson
Joel Hirschhorn
Greg Gibson & Spencer Muse
Jim Stankovich
Teri Manolio
David Evans
Guodong Wu
Enrico Petretto
Wei Wang
Bo Li