Shaun Purcell & Pak Sham Advanced Workshop Boulder, CO, 2003

Slides:



Advertisements
Similar presentations
15 The Genetic Basis of Complex Inheritance
Advertisements

Genetic research designs in the real world Vishwajit L Nimgaonkar MD, PhD University of Pittsburgh
Sampling: Final and Initial Sample Size Determination
Multiple Comparisons Measures of LD Jess Paulus, ScD January 29, 2013.
Linkage Analysis: An Introduction Pak Sham Twin Workshop 2001.
Human Genetics Genetic Epidemiology.
Joint Linkage and Linkage Disequilibrium Mapping
Association Mapping David Evans. Outline Definitions / Terminology What is (genetic) association? How do we test for association? When to use association.
Biometrical genetics Manuel Ferreira Shaun Purcell Pak Sham Boulder Introductory Course 2006.
Biometrical genetics Manuel Ferreira Shaun Purcell Pak Sham Boulder Introductory Course 2006.
Power in QTL linkage: single and multilocus analysis Shaun Purcell 1,2 & Pak Sham 1 1 SGDP, IoP, London, UK 2 Whitehead Institute, MIT, Cambridge, MA,
Association analysis Shaun Purcell Boulder Twin Workshop 2004.
Thoughts about the TDT. Contribution of TDT: Finding Genes for 3 Complex Diseases PPAR-gamma in Type 2 diabetes Altshuler et al. Nat Genet 26:76-80, 2000.
Summary of Quantitative Analysis Neuman and Robson Ch. 11
Statistical Power Calculations Boulder, 2007 Manuel AR Ferreira Massachusetts General Hospital Harvard Medical School Boston.
8 - 1 © 2003 Pearson Prentice Hall Chi-Square (  2 ) Test of Variance.
Lecture 5: Segregation Analysis I Date: 9/10/02  Counting number of genotypes, mating types  Segregation analysis: dominant, codominant, estimating segregation.
Statistics for clinical research An introductory course.
Gene, Allele, Genotype, and Phenotype
© 2002 Thomson / South-Western Slide 8-1 Chapter 8 Estimation with Single Samples.
Multifactorial Traits
Family-Based Association Tests
Introduction to Linkage Analysis Pak Sham Twin Workshop 2003.
Lecture 19: Association Studies II Date: 10/29/02  Finish case-control  TDT  Relative Risk.
Quantitative Genetics. Continuous phenotypic variation within populations- not discrete characters Phenotypic variation due to both genetic and environmental.
Type 1 Error and Power Calculation for Association Analysis Pak Sham & Shaun Purcell Advanced Workshop Boulder, CO, 2005.
Quantitative Genetics
Power of linkage analysis Egmond, 2006 Manuel AR Ferreira Massachusetts General Hospital Harvard Medical School Boston.
Regression-Based Linkage Analysis of General Pedigrees Pak Sham, Shaun Purcell, Stacey Cherny, Gonçalo Abecasis.
Tutorial #10 by Ma’ayan Fishelson. Classical Method of Linkage Analysis The classical method was parametric linkage analysis  the Lod-score method. This.
1 B-b B-B B-b b-b Lecture 2 - Segregation Analysis 1/15/04 Biomath 207B / Biostat 237 / HG 207B.
Lecture 3: Statistics Review I Date: 9/3/02  Distributions  Likelihood  Hypothesis tests.
10.5 Testing Claims about the Population Standard Deviation.
Lecture 21: Quantitative Traits I Date: 11/05/02  Review: covariance, regression, etc  Introduction to quantitative genetics.
Association analysis Genetics for Computer Scientists Biomedicum & Department of Computer Science, Helsinki Päivi Onkamo.
Population structure at QTL d A B C D E Q F G H a b c d e q f g h The population content at a quantitative trait locus (backcross, RIL, DH). Can be deduced.
A Transmission/disequilibrium Test for Ordinal Traits in Nuclear Families and a Unified Approach for Association Studies Heping Zhang, Xueqin Wang and.
Genetic Theory Pak Sham SGDP, IoP, London, UK. Theory Model Data Inference Experiment Formulation Interpretation.
Epistasis / Multi-locus Modelling Shaun Purcell, Pak Sham SGDP, IoP, London, UK.
Association mapping for mendelian, and complex disorders January 16Bafna, BfB.
Practical With Merlin Gonçalo Abecasis. MERLIN Website Reference FAQ Source.
Linkage Disequilibrium Mapping of Complex Binary Diseases Two types of complex traits Quantitative traits–continuous variation Dichotomous traits–discontinuous.
Lecture 22: Quantitative Traits II
Lecture 23: Quantitative Traits III Date: 11/12/02  Single locus backcross regression  Single locus backcross likelihood  F2 – regression, likelihood,
Powerful Regression-based Quantitative Trait Linkage Analysis of General Pedigrees Pak Sham, Shaun Purcell, Stacey Cherny, Gonçalo Abecasis.
Introduction to Genetic Theory
Biometrical Genetics Shaun Purcell Twin Workshop, March 2004.
QTL Mapping Using Mx Michael C Neale Virginia Institute for Psychiatric and Behavioral Genetics Virginia Commonwealth University.
Increasing Power in Association Studies by using Linkage Disequilibrium Structure and Molecular Function as Prior Information Eleazar Eskin UCLA.
Association Mapping in Families Gonçalo Abecasis University of Oxford.
Power and Meta-Analysis Dr Geraldine M. Clarke Wellcome Trust Advanced Courses; Genomic Epidemiology in Africa, 21 st – 26 th June 2015 Africa Centre for.
Lecture 17: Model-Free Linkage Analysis Date: 10/17/02  IBD and IBS  IBD and linkage  Fully Informative Sib Pair Analysis  Sib Pair Analysis with Missing.
Power in QTL linkage analysis
The Chi Square Test A statistical method used to determine goodness of fit Chi-square requires no assumptions about the shape of the population distribution.
Regression Models for Linkage: Merlin Regress
Genome Wide Association Studies using SNP
The Genetic Basis of Complex Inheritance
I Have the Power in QTL linkage: single and multilocus analysis
Regression-based linkage analysis
Linkage in Selected Samples
Power to detect QTL Association
Genetic Epidemiology Association Studies and Power Considerations
Chapter 6 Confidence Intervals.
Pak Sham & Shaun Purcell Twin Workshop, March 2002
Statistical Power and Meta-analysis
Lecture 9: QTL Mapping II: Outbred Populations
Power Calculation for QTL Association
BOULDER WORKSHOP STATISTICS REVIEWED: LIKELIHOOD MODELS
Association Design Begins with KNOWN polymorphism theoretically expected to be associated with the trait (e.g., DRD2 and schizophrenia). Genotypes.
Presentation transcript:

Shaun Purcell & Pak Sham Advanced Workshop Boulder, CO, 2003 Power calculation for QTL association (discrete and quantitative traits) Shaun Purcell & Pak Sham Advanced Workshop Boulder, CO, 2003

Discrete Threshold Quantitative Case-control Case-control Variance components Aff UnAff A n1 n2 a n3 n4 High Low A n1 n2 a n3 n4 TDT TDT Tr UnTr A n1 n2 a n3 n4 Tr UnTr A n1 n2 a n3 n4

Discrete trait calculation p Frequency of high-risk allele K Prevalence of disease RAA Genotypic relative risk for AA genotype RAa Genotypic relative risk for Aa genotype N, ,  Sample size, Type I & II error rate

Risk is P(D|G) gAA = RAA gaa gAa = RAa gaa K = p2 gAA + 2pq gAa + q2 gaa gaa = K / ( p2 RAA + 2pq RAa + q2 ) Odds ratios (e.g. for AA genotype) = gAA / (1- gAA ) gaa / (1- gaa )

Need to calculate P(G|D) Expected proportion d of genotypes in cases dAA = gAA p2 / (gAAp2 + gAa2pq + gaaq2 ) dAa = gAa 2pq / (gAAp2 + gAa2pq + gaaq2 ) daa = gaa q2 / (gAAp2 + gAa2pq + gaaq2 ) Expected number of A alleles for cases 2NCase ( dAA + dAa / 2 ) Expected proportion c of genotypes in controls cAA = (1-gAA) p2 / ( (1-gAA) p2 + (1-gAa) 2pq + (1-gaa) q2 )

Full contingency table “A” allele “a” allele Case 2NCase ( dAA + dAa / 2 ) 2NCase ( daa + dAa / 2 ) Control 2NControl ( cAA + cAa / 2 ) 2NControl ( caa + cAa / 2 )

Threshold selection Genotype AA Aa aa Frequency q2 2pq p2 Trait mean -a d a Trait variance 2 2 2

P(X) = GP(X|G)P(G) P(X) Aa aa AA X

P(G|X<T) = P(X<T|G)P(G) / P(X<T) Nb. the cumulative standard normal distribution gives the area under the curve, P(X < T) Aa T AA X

Incomplete LD Effect of incomplete LD between QTL and marker A a M pm1 + δ qm1 - δ m pm2 – δ qm2 + δ δ = D’ × DMAX DMAX = min{pm2 , qm1} Note that linkage disequilibrium will depend on both D’ and QTL & marker allele frequencies

Incomplete LD Consider genotypic risks at marker: P(D|MM) = [ (pm1+ δ)2 P(D|AA) + 2(pm1+ δ)(qm1- δ) P(D|Aa) + (qm1- δ)2 P(D|aa) ] / m12 Calculation proceeds as before, but at the marker Geno. Haplo. AAMM AM/AM AM/aM or aM/AM AaMM aaMM aM/aM MM

Discrete TDT calculation Calculate probability of parental mating type given affected offspring Calculate probability of offspring genotype given parental mating type and affected Calculate overall probability of heterozygous parents transmitting allele A as opposed to a Calculate TDT test statistic, power

Fulker association model The genotypic score (1,0,-1) for sibling i is decomposed into between and within components: sibship genotypic mean deviation from sibship genotypic mean

NCPs of B and W tests Approximation for between test Approximation for within test Sham et al (2000) AJHG 66

Practical Exercise Calculation of power for simple case-control study. DATA : frequency of risk factor in 30 cases and 30 controls TEST : 2-by-2 contingency table : chi-squared (1 degree of freedom)

Step 1 : determine expected chi-squared Hypothetical risk factor frequencies Case Control A allele present 20 10 A allele absent 10 20 Chi-squared statistic = 6.666

P(T) Critical value T Step 2. Determine the critical value for a given type I error rate,  - inverse central chi-squared distribution P(T) Critical value  T

Step 3. Determine the power for a given critical value and non-centrality parameter - non-central chi-squared distribution P(T) Critical value T

Calculating Power 1. Calculate critical value (Inverse central 2) Alpha 0 (under the null) 2. Calculate power (Non-central 2) Crit. value Expected NCP

http://workshop.colorado.edu/~pshaun/gpc/pdf.html df = 1 , NCP = 0  X 0.05 0.01 0.001 3.84146 6.63489 10.82754

Determining power df = 1 , NCP = 6.666  X Power 0.05 3.84146 0.05 3.84146 0.01 6.6349 0.001 10.827 0.73 0.50 0.24

1. Planning a study Candidate gene study A disease occurs in 2% of the population Assume multiplicative model genotype risk ratio Aa = 2 genotype risk ratio AA = 4 100 cases, 100 controls What if the risk allele is rare vs common?

2. Interpreting a negative result Negative candidate gene TDT study, 82 affected offspring trios “affection” = scoring >2 SD above mean candidate gene SNP allele frequency 0.25 Desired 80% power, 5% type I error rate What is the minimum detectable QTL variance (assume additivity)?

Planning a study p N cases (=N controls) 0.01 1144 0.05 247 0.2 83 0.01 1144 0.05 247 0.2 83 0.5 66 0.8 126 0.95 465 0.99 2286

Interpreting a negative result QTL Power 0.00 0.05 0.01 0.34 0.02 0.60 0.03 0.78 0.04 0.88 0.05 0.94

Exploring power of association using GPC Linkage versus association difference in required sample sizes for specific QTL size TDT versus case-control difference in efficiency? Quantitative versus binary traits loss of power from artificial dichotomisation?

Linkage versus association QTL linkage: 500 sib pairs, r=0.5 QTL association: 1000 individuals

Case-control versus TDT p = 0.1; RAA = RAa = 2

Quantitative versus discrete K=0.05 K=0.2 K=0.5 To investigate: use threshold-based association Fixed QTL effect (additive, 5%, p=0.5) 500 individuals For prevalence K Group 1 has N and T Group 2 has N and T

Quantitative versus discrete K T (SD) 0.01 2.326 0.05 1.645 0.10 1.282 0.20 0.842 0.25 0.674 0.50 0.000

Quantitative versus discrete

Incomplete LD what is the impact of D’ values less than 1? does allele frequency affect the power of the test? (using discrete case-control calculator) Family-based VC association: between and within tests what is the impact of sibship size? sibling correlation? (using QTL VC association calculator)

Incomplete LD Case-control for discrete traits Disease K = 0.1 QTL RAA = RAa = 2 p = 0.05 Marker1 m = 0.05 D’ = { 1, 0.8, 0.6, 0.4, 0.2, 0} Marker2 m = 0.25 D’ = { 1, 0.8, 0.6, 0.4, 0.2, 0} Sample 250 cases, 250 controls

Incomplete LD Genotypic risk at marker1 (left) and marker2 (right) as a function of D’

Incomplete LD Expected likelihood ratio test as a function of D’

Family-based association Sibship type 1200 individuals, 600 pairs, 400 trios, 300 quads Sibling correlation r = 0.2, 0.5, 0.8 QTL (diallelic, equal allele frequency) 2%, 10% of trait variance

Family-based association NCP proportional to variance explained Between test ↓ with ↑ sibship size and ↑ sibling correlation Within test 0 for s=1, ↑ with ↑ sibship size and ↑ sibling correlation

Between-sibship association

Within-sibship association

Total association

GPC Usual URL for GPC http://statgen.iop.kcl.ac.uk/gpc/ Purcell S, Cherny SS, Sham PC. (2003) Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics, 19(1):149-50