Association Mapping David Evans. Outline Definitions / Terminology What is (genetic) association? How do we test for association? When to use association.

Presentation transcript:

Association Mapping David Evans

Outline
Definitions / Terminology
What is (genetic) association?
How do we test for association?
When to use association
HapMap and tagging
Genome-wide Association
Sequencing and Rare variants

Definitions
Locus: location on the genome
SNP: “Single Nucleotide Polymorphism”, a mutation that produces a single base pair change in the DNA sequence
Alleles: alternate forms of a SNP (mutation)
Genotype: both alleles at a locus form a genotype
Haplotype: the pattern of alleles on a chromosome
QTL: “Quantitative trait locus”, a region of the genome that changes the mean value of a quantitative phenotype
[Figure: example sequences (A C A C G C A A T T) illustrating alleles, genotypes and haplotypes]

What is (genetic) association? Correlation between an allele/genotype/haplotype and a trait of interest


Genetic Association: Three Common Forms
Direct Association
- Mutant or ‘susceptible’ polymorphism
- Allele of interest is itself involved in phenotype
- ~70% of Cystic Fibrosis patients have a deletion of 3 base pairs resulting in the loss of a phenylalanine amino acid at position 508 of the CFTR gene
Indirect Association
- Allele itself is not involved, but a nearby correlated variant changes phenotype

Indirect association and Linkage disequilibrium
[Figure; axis label: time]

Linkage Disequilibrium Linkage disequilibrium means that we don’t need to genotype the exact aetiological variant, but only a variant that is correlated with it
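How strongly two variants are correlated is commonly summarised by the squared correlation r² between their alleles. The following is a minimal sketch, assuming phased haplotype counts are available for two biallelic loci; the function name and the allele labels A/a and B/b are illustrative and not taken from the slides.

```python
def ld_r2(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2) for two biallelic loci from the four haplotype counts.

    D  = p_AB - p_A * p_B            (raw disequilibrium coefficient)
    r2 = D^2 / (p_A p_a p_B p_b)     (squared allelic correlation)
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


# Perfectly correlated loci: genotyping either one captures the other.
print(ld_r2(50, 0, 0, 50))
# Independent loci: one variant carries no information about the other.
print(ld_r2(25, 25, 25, 25))
```

An r² close to 1 means a genotyped marker is almost as informative as the aetiological variant itself.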

Genetic Association: Three Common Forms
Direct Association
- Mutant or ‘susceptible’ polymorphism
- Allele of interest is itself involved in phenotype
Indirect Association
- Allele itself is not involved, but a nearby correlated marker changes phenotype
Spurious association
- Apparent association not related to genetic aetiology (e.g. population stratification)
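A small numerical illustration of how stratification can produce a spurious association. The counts are invented for the example: two hypothetical subpopulations differ in both allele frequency and disease prevalence, and there is no association within either of them.

```python
def allelic_odds_ratio(case_G, case_T, control_G, control_T):
    """Odds ratio for allele G versus allele T, from allele counts."""
    return (case_G * control_T) / (case_T * control_G)


# Hypothetical allele counts: (case_G, case_T, control_G, control_T).
# Subpopulation 1: G is common (0.8) and cases are over-sampled.
pop1 = (80, 20, 40, 10)
# Subpopulation 2: G is rare (0.2) and controls are over-sampled.
pop2 = (10, 40, 40, 160)
# Ignoring ancestry and pooling the two samples.
pooled = tuple(a + b for a, b in zip(pop1, pop2))

print(allelic_odds_ratio(*pop1))    # 1.0, no association within subpopulation 1
print(allelic_odds_ratio(*pop2))    # 1.0, no association within subpopulation 2
print(allelic_odds_ratio(*pooled))  # 3.1875, apparent association after pooling
```

The pooled odds ratio reflects the mixture of ancestries in cases and controls, not any effect of the allele on disease.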

Population Stratification Marchini, Nat Genet. 2004

How do we test for association?

Genetic Case Control Study
[Figure: cases and controls, each labelled with a genotype (T/T, T/G or G/G)]
Allele G is ‘associated’ with disease

Allele-based tests
Each individual contributes two counts to the 2x2 table.
Test of association: the statistic $X^2$ has a $\chi^2$ distribution with 1 degree of freedom under the null hypothesis.

Allele   Cases           Controls        Total
G        $n_{1A}$        $n_{1U}$        $n_{1\cdot}$
T        $n_{0A}$        $n_{0U}$        $n_{0\cdot}$
Total    $n_{\cdot A}$   $n_{\cdot U}$   $n_{\cdot\cdot}$
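The allele-based test can be sketched as below. This assumes the usual Pearson statistic, the sum over cells of (observed - expected)² / expected with expected counts from the table margins; the slide's own formula is not preserved in the transcript. The function name and example counts are illustrative.

```python
import math


def allelic_test(case_G, case_T, control_G, control_T):
    """Pearson chi-square test on a 2x2 table of allele counts (1 df)."""
    observed = [[case_G, control_G], [case_T, control_T]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_G + case_T, control_G + control_T]
    total = sum(row_totals)
    x2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                raise ValueError("table has an empty row or column")
            x2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square with 1 df, in closed form.
    p_value = math.erfc(math.sqrt(x2 / 2))
    return x2, p_value


# 50 cases and 50 controls, i.e. 100 allele counts in each group.
x2, p = allelic_test(case_G=60, case_T=40, control_G=40, control_T=60)
print(x2, p)
```

Counting alleles rather than individuals treats the two alleles of a person as independent, which relies on Hardy-Weinberg equilibrium in the combined sample.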

Genotypic tests
SNP marker data can be represented in a 2x3 table.
Test of association: the statistic $X^2$ has a $\chi^2$ distribution with 2 degrees of freedom under the null hypothesis.

Genotype   Cases           Controls        Total
GG         $n_{2A}$        $n_{2U}$        $n_{2\cdot}$
GT         $n_{1A}$        $n_{1U}$        $n_{1\cdot}$
TT         $n_{0A}$        $n_{0U}$        $n_{0\cdot}$
Total      $n_{\cdot A}$   $n_{\cdot U}$   $n_{\cdot\cdot}$
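A sketch of the genotypic test starting from raw genotype calls, again assuming a Pearson chi-square statistic. The genotype strings, helper names and counts are illustrative assumptions.

```python
import math
from collections import Counter

GENOTYPES = ("GG", "GT", "TT")


def genotype_counts(calls):
    """Count genotypes, treating 'T/G' and 'G/T' as the same heterozygote."""
    counts = Counter("".join(sorted(call.replace("/", ""))) for call in calls)
    return [counts.get(g, 0) for g in GENOTYPES]


def genotypic_test(case_calls, control_calls):
    """Pearson chi-square test on the genotype-by-status table (2 df)."""
    cases = genotype_counts(case_calls)
    controls = genotype_counts(control_calls)
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    x2 = 0.0
    for n_a, n_u in zip(cases, controls):
        row_total = n_a + n_u
        if row_total == 0:
            raise ValueError("a genotype class is empty; the test would not have 2 df")
        for observed, col_total in ((n_a, n_cases), (n_u, n_controls)):
            expected = row_total * col_total / total
            x2 += (observed - expected) ** 2 / expected
    # Survival function of a chi-square with 2 df is exp(-x/2).
    return x2, math.exp(-x2 / 2)


case_calls = ["G/G"] * 30 + ["T/G"] * 50 + ["T/T"] * 20
control_calls = ["G/G"] * 20 + ["G/T"] * 50 + ["T/T"] * 30
print(genotypic_test(case_calls, control_calls))
```

Unlike the allele-based test, this makes no assumption about how risk changes with the number of G alleles, at the cost of an extra degree of freedom.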

Simple Regression Model of Association (Unrelated individuals)
$Y_i = \alpha + \beta X_i + e_i$
where
$Y_i$ = trait value for individual $i$
$X_i$ = number of ‘A’ alleles individual $i$ has
[Figure: trait value Y plotted against allele count X]
Association test is whether $\beta$ differs from zero
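The regression model can be fitted by ordinary least squares. The sketch below uses only the standard library and returns the t statistic for the slope rather than a p-value; the function name and the toy data are assumptions made for illustration.

```python
import math


def regress_trait_on_allele_count(allele_counts, trait):
    """Least-squares fit of trait = alpha + beta * allele_count + e.

    Returns (alpha, beta, se_beta, t), where t = beta / se_beta has
    n - 2 degrees of freedom under the null hypothesis beta = 0.
    """
    n = len(allele_counts)
    if n < 3 or n != len(trait):
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(allele_counts) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in allele_counts)
    if sxx == 0:
        raise ValueError("the marker is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(allele_counts, trait))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(allele_counts, trait))
    se_beta = math.sqrt(rss / (n - 2) / sxx)
    t = beta / se_beta if se_beta > 0 else math.inf
    return alpha, beta, se_beta, t


# Toy data: each extra 'A' allele raises the trait by about 0.5 units.
x = [0, 0, 1, 1, 2, 2]
y = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2]
print(regress_trait_on_allele_count(x, y))
```

Coding the genotype as an allele count assumes an additive effect: each copy of the allele shifts the trait mean by the same amount.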

Transmission Disequilibrium Test
Rationale: related individuals have to be from the same population
Compare the number of times heterozygous parents transmit the “A” vs the “C” allele to affected offspring
[Figure: family trio with genotypes AC, AA and AC]
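A minimal sketch of the TDT, assuming the standard McNemar-type statistic (b - c)² / (b + c), where b and c count how often heterozygous parents transmit each of the two alleles. The input format (one transmitted allele per heterozygous parent) is a simplification chosen for the example.

```python
import math


def tdt(transmitted_alleles, allele="A"):
    """Transmission disequilibrium test.

    transmitted_alleles holds one entry per heterozygous parent: the allele
    that parent passed to the affected offspring. Under the null hypothesis
    each allele is transmitted with probability 1/2.
    """
    b = sum(1 for a in transmitted_alleles if a == allele)
    c = len(transmitted_alleles) - b
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    statistic = (b - c) ** 2 / (b + c)
    # Chi-square with 1 df, survival function in closed form.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# 100 heterozygous AC parents: 60 transmit A, 40 transmit C.
print(tdt(["A"] * 60 + ["C"] * 40))
```

Homozygous parents are uninformative and do not enter the statistic, which is one reason the design needs more genotyping than a case-control study.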

[Figure: family trio with genotypes AC, AA and AC]
Difficult to gather families
Difficult to get parents for late onset / psychiatric conditions
Inefficient for genotyping (particularly GWA)

Case-control versus TDT ($p = 0.1$; $R_{AA} = R_{Aa} = 2$)

When to use association...

Methods of gene hunting
[Figure: effect size plotted against frequency]
Rare, monogenic (linkage)
Common, complex (association)

Association Summary
1. Families or unrelateds
2. Matching/ethnicity crucial
3. Many markers required for genome coverage ($10^5$ to $10^6$ SNPs)
4. Powerful design
5. OK for initial detection; good for fine-mapping
6. Powerful for common variants; rare variants difficult

HapMap and Tagging

Historical gene mapping Glazier et al, Science (2002).

Reasons for Failure?
[Figure labels: Complex Phenotype; Common environment; Individual environment; Polygenic background; Marker; Gene 1; Gene 2; Gene 3; Linkage; Linkage disequilibrium; Association; Mode of inheritance]
Weiss & Terwilliger (2000) Nat Genet
Inadequate Marker Coverage (Candidate gene studies)

Enabling association studies: HapMap

Visualizing empirical LD

Pairwise tagging
SNPs: SNP 1 (A/T), SNP 2 (G/A), SNP 3 (G/C), SNP 4 (T/C), SNP 5 (G/C), SNP 6 (A/C)
[Figure: haplotypes across the six SNPs, with groups of SNPs in high $r^2$]
Tags: SNP 1, SNP 3, SNP 6 (3 in total)
Test for association: SNP 1, SNP 3, SNP 6
Carlson et al. (2004) AJHG 74:106
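One way to choose tags is a greedy search over the pairwise r² matrix: repeatedly pick the SNP that exceeds the r² threshold with the most not-yet-tagged SNPs. The sketch below is a simplified illustration in that spirit, not a reproduction of any published implementation. The threshold of 0.8 and the grouping of the six SNPs in the example are assumptions, since the slide's actual LD structure is not recoverable from the transcript.

```python
def greedy_pairwise_tags(r2, threshold=0.8):
    """Select tag SNPs so every SNP has r2 >= threshold with some tag.

    r2 is a symmetric matrix (list of lists) of pairwise r-squared values.
    Returns the sorted 0-based indices of the chosen tags.
    """
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        def coverage(i):
            # Number of still-untagged SNPs that SNP i would capture.
            return sum(1 for j in untagged if j == i or r2[i][j] >= threshold)

        # Ties are broken in favour of the lowest index.
        best = max(sorted(untagged), key=coverage)
        tags.append(best)
        untagged -= {j for j in untagged if j == best or r2[best][j] >= threshold}
    return sorted(tags)


# Hypothetical LD structure: SNPs {1, 2}, {3, 4, 5} and {6} form three groups.
groups = [[0, 1], [2, 3, 4], [5]]
r2 = [[0.1] * 6 for _ in range(6)]
for group in groups:
    for i in group:
        for j in group:
            r2[i][j] = 1.0

print([i + 1 for i in greedy_pairwise_tags(r2)])  # 1-based SNP numbers
```

Only the tags need to be genotyped and tested; the remaining SNPs are covered indirectly through their correlation with a tag.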

Genome-wide Association

Enabling Genome-wide Association Studies
HAPlotype MAP
High throughput genotyping
Large cohorts

Genome-wide Association Studies The Australo-Anglo-American Ankylosing Spondylitis Consortium (2010) Nature Genetics

Meta-analysis Repapi et al. (2009) Nature Genetics

[Figure: Imputation. Genotypes of cases and controls, with untyped entries shown as “?”, aligned to HapMap Phase II reference haplotypes; recombination rate shown alongside]

Imputation

Genomic control
[Figure: $\chi^2$ statistics at the test locus and at unlinked ‘null’ markers, shown with no stratification and with stratification]
Stratification → adjust test statistic
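A sketch of the adjustment, assuming the common median-based estimate of the inflation factor λ: the median of the 1-df χ² statistics at the null markers divided by the median of a χ² distribution with 1 df (about 0.455). The function name and the example statistics are illustrative.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(test_statistic, null_statistics):
    """Return (lambda, adjusted statistic) using unlinked 'null' markers.

    lambda is floored at 1 so that the test statistic is never inflated.
    """
    lam = statistics.median(null_statistics) / CHI2_1DF_MEDIAN
    lam = max(1.0, lam)
    return lam, test_statistic / lam


# Hypothetical 1-df chi-square statistics at three null markers.
lam, adjusted = genomic_control(10.0, [0.2, 0.9098728, 3.0])
print(lam, adjusted)
```

If the null markers show no inflation, λ stays at 1 and the test statistic is left unchanged.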

PCA

Replication
Replication studies should be of sufficient size to demonstrate the effect
Replication studies should be conducted in independent datasets
Replication should involve the same phenotype
Replication should be conducted in a similar population
The same SNP should be tested
The replicated signal should be in the same direction
Joint analysis should lead to a lower p value than the original report
Well designed negative studies are valuable
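Two of these criteria, same direction of effect and a lower p value in the joint analysis, can be checked numerically. The sketch below assumes the joint analysis is a fixed-effect, inverse-variance weighted combination of the two effect estimates; the slides do not specify a method, and the effect sizes used are invented.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def joint_analysis(beta_disc, se_disc, beta_rep, se_rep):
    """Combine discovery and replication estimates by inverse-variance weighting."""
    w_disc = 1 / se_disc ** 2
    w_rep = 1 / se_rep ** 2
    beta = (w_disc * beta_disc + w_rep * beta_rep) / (w_disc + w_rep)
    se = math.sqrt(1 / (w_disc + w_rep))
    p_disc = two_sided_p(beta_disc / se_disc)
    p_joint = two_sided_p(beta / se)
    return {
        "beta": beta,
        "se": se,
        "p_joint": p_joint,
        "same_direction": beta_disc * beta_rep > 0,
        "p_improved": p_joint < p_disc,
    }


# Hypothetical effect estimates (e.g. log odds ratios) and standard errors.
print(joint_analysis(beta_disc=0.20, se_disc=0.05, beta_rep=0.15, se_rep=0.05))
```

A replication estimate in the opposite direction pulls the combined effect towards zero, so the joint p value gets worse rather than better.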

Programs for performing association analysis
Mx (Neale)
- Fully flexible, ordinal data
- Not ideal for large pedigrees or GWAs
PLINK (Purcell, Neale, Ferreira)
- GWA
Haploview (Barrett)
- Graphical visualization of LD, tagging, basic tests of association
MERLIN, QTDT (Abecasis)
- Association and linkage in families

Sequencing and Rare Variants

Metzker et al (2010) Nature Reviews Genetics

Analysis of Rare Variants
How to combine rare variants?
- “Ordinary” tests of association won’t work
- Collapse across all SNPs?
Which SNPs to include?
- Frequency?
- Function?
How to define a region?
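One simple answer to the "collapse" question is to reduce each individual to a single indicator: does this person carry at least one rare allele anywhere in the region? The sketch below assumes genotypes are coded as minor-allele counts (0, 1 or 2) and uses frequency alone to decide which variants to include. The threshold, function name and data are illustrative assumptions.

```python
def collapse_rare_variants(genotypes, is_case, maf_threshold=0.01):
    """Collapse rare variants in a region into carrier / non-carrier status.

    genotypes: one row per individual, one minor-allele count per variant.
    is_case:   one boolean per individual.
    Returns (rare variant indices, (case carriers, case non-carriers),
             (control carriers, control non-carriers)).
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    rare = []
    for v in range(n_variants):
        freq = sum(row[v] for row in genotypes) / (2 * n)
        if 0 < freq < maf_threshold:
            rare.append(v)
    table = {True: [0, 0], False: [0, 0]}
    for row, case in zip(genotypes, is_case):
        carrier = any(row[v] > 0 for v in rare)
        table[bool(case)][0 if carrier else 1] += 1
    return rare, tuple(table[True]), tuple(table[False])


# Toy region with three variants; the third one is common.
genotypes = [
    [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 2],   # cases
    [0, 0, 1], [0, 0, 0], [0, 0, 1], [0, 0, 2],   # controls
]
is_case = [True] * 4 + [False] * 4
print(collapse_rare_variants(genotypes, is_case, maf_threshold=0.1))
```

The resulting 2x2 table of carrier status by disease status can then be tested like any other contingency table, although with counts this small an exact test would be more appropriate than a chi-square approximation.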

Summary
1. Genetic association studies can be used to locate common genetic variants that increase risk of disease or affect quantitative phenotypes
2. Genome-wide association has been spectacularly successful in identifying common variants underlying complex traits and disease
3. The next challenge is to explain the “missing heritability” in the genome. Genome-wide sequencing and the analysis of rare variants will play a major part in this effort