Power in QTL linkage analysis


Power in QTL linkage analysis Shaun Purcell & Pak Sham SGDP, IoP, London, UK F:\pshaun\power.ppt

Power primer
Statistics (e.g. chi-squared, z-score) are continuous measures of support for a certain hypothesis.
[Figure: test statistic axis divided into NO and YES regions]
YES OR NO decision-making : significance testing
Inevitably leads to two types of mistake :
- false positive (YES instead of NO) (Type I)
- false negative (NO instead of YES) (Type II)

Hypothesis testing Null hypothesis : no effect A ‘significant’ result means that we can reject the null hypothesis A ‘nonsignificant’ result means that we cannot reject the null hypothesis

Statistical significance
The ‘p-value’ : the probability of a false positive error if the null were in fact true
Typically, we are willing to incorrectly reject the null 5% or 1% of the time (Type I error)

Misunderstandings
p-VALUES
- that the p value is the probability of the null hypothesis being true
- that highly significant (small) p values mean large and important effects
NULL HYPOTHESIS
- that nonrejection of the null implies its truth

Limitations
IF A RESULT IS SIGNIFICANT
- leads to the conclusion that the null is false
- BUT, this may be trivial
IF A RESULT IS NONSIGNIFICANT
- leads only to the conclusion that it cannot be concluded that the null is false

Alternate hypothesis Neyman & Pearson (1928) ALTERNATE HYPOTHESIS specifies a precise, non-null state of affairs with associated risk of error

[Figure: two panels of P(T) against T with the critical value marked. One shows the sampling distribution if H0 were true, the other the sampling distribution if HA were true, with α and β marked.]

STATISTICS vs. REALITY
H0 true, rejection of H0 : Type I error at rate α
H0 true, nonrejection of H0 : nonsignificant result
HA true, rejection of H0 : significant result, POWER = (1 - β)
HA true, nonrejection of H0 : Type II error at rate β

Power
The probability of rejection of a false null-hypothesis depends on
- the significance criterion (α)
- the sample size (N)
- the effect size (NCP)
“The probability of detecting a given effect size in a population from a sample of size N, using significance criterion α”

Impact of alpha
[Figure: P(T) against T with the critical value, α and β marked]

Impact of effect size, N
[Figure: P(T) against T with the critical value, α and β marked]

Applications
EXPERIMENTAL DESIGN
- avoiding false positives vs. dealing with false negatives
MAGNITUDE VS. SIGNIFICANCE
- highly significant ≠ very important
INTERPRETING NONSIGNIFICANT RESULTS
- nonsignificant results only meaningful if power is high
POWER SURVEYS / META-ANALYSES
- low power undermines the confidence that can be placed in statistically significant results

Practical Exercise 1 Calculation of power for simple case-control association study. DATA : allele frequency of “A” allele for cases and controls TEST : 2-by-2 contingency table : chi-squared (1 degree of freedom)

Step 1 : determine expected chi-squared Hypothetical allele frequencies Cases P(A) = 0.68 Controls P(A) = 0.54 Sample 150 cases, 150 controls Excel spreadsheet : faculty drive:\pshaun\chisq.xls Chi-squared statistic = 12.36
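The expected chi-squared in Step 1 can be reproduced without the spreadsheet. This is a minimal sketch, assuming the 2-by-2 table holds expected allele counts (two alleles per person), which is the assumption that reproduces the value on the slide. The function name is illustrative and is not taken from chisq.xls.

```python
def expected_allelic_chisq(p_case, p_control, n_cases, n_controls):
    """Pearson chi-squared (1 df) for a 2-by-2 table of expected allele counts."""
    # Assumption: each individual contributes two alleles to the table.
    case_alleles = 2 * n_cases
    control_alleles = 2 * n_controls
    observed = [
        [p_case * case_alleles, (1 - p_case) * case_alleles],
        [p_control * control_alleles, (1 - p_control) * control_alleles],
    ]
    total = case_alleles + control_alleles
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    chisq = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chisq += (observed[i][j] - expected) ** 2 / expected
    return chisq


# Slide values: P(A) = 0.68 in cases, 0.54 in controls, 150 of each.
print(round(expected_allelic_chisq(0.68, 0.54, 150, 150), 2))  # 12.36
```

Because the table is built from expected counts, this statistic is used as the non-centrality parameter in the later steps.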

Step 2. Determine the critical value for a given type I error rate, α
- inverse central chi-squared distribution
[Figure: P(T) against T with the critical value and α marked]

http://workshop.colorado.edu/~pshaun/gpc/pdf.html
df = 1 , NCP = 0
α        X
0.05     3.84146
0.01     6.63489
0.001    10.82754

Step 3. Determine the power for a given critical value and non-centrality parameter
- non-central chi-squared distribution
[Figure: P(T) against T with the critical value marked]

Determining power
df = 1 , NCP = 12.36
α        X          Power
0.05     3.84146    0.94
0.01     6.6349     0.83
0.001    10.827     0.59
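Steps 2 and 3 can be sketched with the Python standard library alone for the 1 df case. This relies on two facts: a central chi-squared with 1 df is the square of a standard normal, and a non-central chi-squared with 1 df and non-centrality NCP is the square of a normal with mean sqrt(NCP) and unit variance. The function names are illustrative; this is not the online calculator's code.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def critical_value_1df(alpha):
    """Critical value of the central chi-squared distribution with 1 df."""
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return z * z


def power_1df(ncp, alpha):
    """P(non-central chi-squared with 1 df exceeds the critical value)."""
    c = sqrt(critical_value_1df(alpha))
    d = sqrt(ncp)
    # (Z + d)^2 > c^2 when Z > c - d or Z < -c - d.
    return (1 - _STD_NORMAL.cdf(c - d)) + _STD_NORMAL.cdf(-c - d)


for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(critical_value_1df(alpha), 3), round(power_1df(12.36, alpha), 2))
```

For tests with more than 1 df this shortcut does not apply and a non-central chi-squared routine is needed.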

Exercises
Using the spreadsheet and the chi-squared calculator, what is the power (for the 3 levels of alpha)
1. … if the sample size were 300 for each group?
2. … if allele frequencies were 0.24 and 0.18 for 750 cases and 750 controls?

Answers
1. NCP = 24.72
α        Power
0.05     1.00
0.01     0.99
0.001    0.95
2. NCP = 16.27
α        Power
0.05     0.98
0.01     0.93
0.001    0.77
nb. Stata : di 1-nchi(df,NCP,invchi(df,α))

QTL linkage
[Diagram of factors bearing on POWER: Type I errors, Type II errors, Sample N, Effect Size, Variance explained, Allele frequencies, Genetic values]

Power of tests
For chi-squared tests on large samples, power is determined by non-centrality parameter (λ) and degrees of freedom (df)
λ = E(2lnL1 - 2lnL0) = E(2lnL1) - E(2lnL0)
where expectations are taken at asymptotic values of maximum likelihood estimates (MLE) under an assumed true model

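Linkage test
[Equations: elements of the sibship trait covariance matrix under HA and under H0, each given separately for i = j and for i ≠ j; the expressions did not survive extraction]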

Expected log likelihood under H0
Expectation of the quadratic product is simply s, the sibship size (note: standardised trait)

Expected log likelihood under HA

Linkage test
Expected NCP
For sib-pairs under complete marker information
Determinant of 2-by-2 standardised covariance matrix = 1 - r²
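The NCP expression on this slide did not survive extraction. The sketch below assumes the form implied by the determinant 1 - r²: the expected NCP per sib pair is ln(1 - r_S²) minus the IBD-weighted average of ln(1 - r_i²), with IBD probabilities 1/4, 1/2, 1/4. The sample-size step uses a normal approximation that is also an assumption. Under these assumptions, a purely additive QTL with variance 0.5 needs about 265 pairs for 90% power at α = 0.05, which agrees with the GPC figure quoted later in these slides. All names are illustrative.

```python
from math import log
from statistics import NormalDist


def ncp_per_pair(va, vd=0.0, vs=0.0):
    """Expected NCP per sib pair at the QTL for a standardised trait.

    va, vd, vs: proportions of variance due to additive QTL effects,
    dominance QTL effects and shared residual effects (assumed form).
    """
    r_by_ibd = {0: vs, 1: va / 2 + vs, 2: va + vd + vs}
    ibd_probs = {0: 0.25, 1: 0.5, 2: 0.25}
    r_null = va / 2 + vd / 4 + vs  # correlation ignoring IBD (H0)
    ncp = log(1 - r_null ** 2)
    for ibd, prob in ibd_probs.items():
        ncp -= prob * log(1 - r_by_ibd[ibd] ** 2)
    return ncp


def pairs_required(ncp, alpha=0.05, power=0.9):
    """Approximate number of pairs for a 1 df test (normal approximation)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return z * z / ncp


ncp = ncp_per_pair(va=0.5)
print(round(ncp, 5), round(pairs_required(ncp)))
```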

Approximation of NCP NCP per sib pair is proportional to - the # of pairs in the sibship (large sibships are powerful) - the square of the additive QTL variance (decreases rapidly for QTL of v. small effect) - the sibling correlation (structure of residual variance is important)

QTL linkage
[Diagram of factors bearing on POWER: Type I errors, Type II errors, Sample N, Effect Size, Variance explained, Allele frequencies, Genetic values, Marker vs functional variant, Recombination fraction]

Incomplete linkage The previous calculations assumed analysis was performed at the QTL. - imagine that the test locus is not the QTL but is linked to it. Calculate sib-pair IBD distribution at the QTL, conditional on IBD at test locus, - a function of recombination fraction

[Table: π at the QTL (0, 1/2, 1) against π at the marker M (0, 1/2, 1); cell entries not recovered]

Use conditional probabilities to calculate the sib correlation conditional on IBD sharing at the test marker. For example : for IBD 0 at marker :
π at QTL         0     1/2            1
r                VS    VA / 2 + VS    VA + VD + VS
P(M=0 | QTL)     (values not recovered)
C0 = [probability-weighted sum of VS, VA / 2 + VS and VA + VD + VS] / (VA + VD + VS + [final term not recovered])

The noncentrality parameter per sib pair is then given by

If the QTL is additive, then attenuation of the NCP is by a factor of $(1-2\theta)^4$ = square of the correlation between the proportions of alleles IBD at two loci with recombination fraction θ
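The attenuation can be checked numerically. The sketch below assumes the standard sib-pair result that IBD at the QTL given IBD at a linked marker depends on ψ = θ² + (1 - θ)², and the same assumed NCP form as for complete linkage (ln(1 - r²) terms weighted by marker IBD probabilities). With additive QTL variance 0.5 and θ = 0.1 it gives roughly 666 pairs for 90% power at α = 0.05, in line with the GPC figure in these slides, and shows that the (1 - 2θ)^4 factor is close but not exact. Names are illustrative.

```python
from math import log


def qtl_ibd_given_marker(theta):
    """Rows: marker IBD 0, 1, 2. Columns: P(QTL IBD = 0, 1, 2 | marker IBD)."""
    psi = theta ** 2 + (1 - theta) ** 2
    return [
        [psi ** 2, 2 * psi * (1 - psi), (1 - psi) ** 2],
        [psi * (1 - psi), 1 - 2 * psi * (1 - psi), psi * (1 - psi)],
        [(1 - psi) ** 2, 2 * psi * (1 - psi), psi ** 2],
    ]


def ncp_per_pair_at_marker(theta, va, vd=0.0, vs=0.0):
    """Expected NCP per sib pair when testing at a marker linked to the QTL."""
    r_by_qtl_ibd = [vs, va / 2 + vs, va + vd + vs]
    r_null = va / 2 + vd / 4 + vs
    ncp = log(1 - r_null ** 2)
    for p_marker, row in zip([0.25, 0.5, 0.25], qtl_ibd_given_marker(theta)):
        # Sib correlation conditional on IBD at the marker.
        r_marker = sum(p * r for p, r in zip(row, r_by_qtl_ibd))
        ncp -= p_marker * log(1 - r_marker ** 2)
    return ncp


theta = 0.1
exact = ncp_per_pair_at_marker(theta, va=0.5)
approx = (1 - 2 * theta) ** 4 * ncp_per_pair_at_marker(0.0, va=0.5)
print(round(exact, 5), round(approx, 5))
```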

Effect of incomplete linkage

Comparison to H-E
Amos & Elston (1989) H-E regression
- 90% power (at significance level 0.05)
- QTL variance 0.5
- marker and major gene are completely linked
→ 320 sib pairs
→ 778 sib pairs if θ = 0.1

GPC input parameters Proportions of variance additive QTL variance dominance QTL variance residual variance (shared / nonshared) Recombination fraction ( 0 - 0.5 ) Sample size & Sibship size ( 2 - 5 ) Type I error rate Type II error rate

GPC output parameters Expected sibling correlations - by IBD status at the QTL - by IBD status at the marker Expected NCP per sibship Power - at different levels of alpha given sample size Sample size - for specified power at different levels of alpha given power

From GPC
Modelling additive effects only
                     Sibships     Individuals
Pairs                265 (320)    530
Pairs (θ = 0.1)      666 (778)    1332
Trios (θ = 0.1)      220          660
Quads (θ = 0.1)      110          440
Quints (θ = 0.1)     67           335

Practical Exercise 2 What is the effect on power to detect linkage of : 1. QTL variance? 2. residual sibling correlation? 3. marker-QTL recombination fraction?

Pairs required (θ=0, p=0.05, power=0.8)

Effect of residual correlation
QTL additive effects account for 10% trait variance
Sample size required for 80% power (α=0.05)
No dominance
θ = 0.1
A residual correlation 0.35
B residual correlation 0.50
C residual correlation 0.65

Individuals required

Selective genotyping
[Figure panels: Unselected; Proband Selection; EDAC; Maximally Dissimilar; ASP; Extreme Discordant; Mahalanobis Distance]

Selective genotyping
The power calculations so far assume an unselected population.
- calculate expected NCP per sibship
If we have a sample with trait scores
- calculate expected NCP for each sibship conditional on trait values
- this quantity can be used to rank order the sample for genotyping

Sibship informativeness : sib pairs
[Figure: sibship NCP plotted against sib 1 trait and sib 2 trait, each from -4 to 4]

Sibship informativeness : sib pairs
[Figure: three panels of sibship NCP plotted against sib 1 trait and sib 2 trait, each from -4 to 4. Recovered panel labels: dominance; unequal allele frequencies; rare recessive]

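Selective genotyping
Column headers : p, d/a, SEL T, ASP, PS, ED, EDAC, MaxD, MDis, SEL B
Only one value per row was recovered, and its column is not identifiable :
p      d/a    value
.5            15.82
.1            17.10
.25           15.45
.1     1      16.88
.25    1      15.76
.5     1      18.89
.75    1      27.64
.9     1      43.16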

Impact of selection

QTL linkage
[Diagram of factors bearing on POWER: Type I errors, Type II errors, Sample N, Effect Size, Variance explained, Allele frequencies, Genetic values, Marker vs functional variant, Recombination fraction, Locus informativeness, PIC]

Indices of marker informativeness: Markers should be highly polymorphic - alleles inherited from different sources are likely to be distinguishable Heterozygosity (H) Polymorphism Information Content (PIC) - measure number and frequency of alleles at a locus

Heterozygosity
$H = 1 - \sum_{i=1}^{n} p_i^2$
n = number of alleles, p_i = frequency of the ith allele.
H = probability that an individual is heterozygous

Heterozygosity
Allele    Frequency
1         0.20
2         0.35
3         0.05
4         0.40
Genotype  Frequency
11        0.04
12        0.14
13        0.02
14        0.16
22        0.1225
23        0.035
24        0.28
33        0.0025
34        0.04
44        0.16
H = 0.675
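The value H = 0.675 can be checked in two ways: as one minus the sum of squared allele frequencies, or by summing the heterozygous genotype frequencies in the table (which assume random mating). A minimal sketch with illustrative function names:

```python
from itertools import combinations


def heterozygosity(freqs):
    """H = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def heterozygosity_from_genotypes(freqs):
    """Sum of heterozygous genotype frequencies, 2 * p_i * p_j for i < j."""
    return sum(2 * pi * pj for pi, pj in combinations(freqs, 2))


allele_freqs = [0.20, 0.35, 0.05, 0.40]  # alleles 1 to 4 on the slide
print(round(heterozygosity(allele_freqs), 3))                 # 0.675
print(round(heterozygosity_from_genotypes(allele_freqs), 3))  # 0.675
```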

Polymorphism information content IF a parent is heterozygous, their gametes will usually be informative. BUT if both parents & child are heterozygous for the same genotype, origins of child’s alleles are ambiguous IF C = the probability of this occurring, PIC = H - C

Polymorphism information content
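The formula on this slide was lost in extraction. The sketch below assumes the standard Botstein et al. form, PIC = H - C with C equal to the sum over allele pairs i < j of 2 p_i² p_j². For allele frequencies 0.1, 0.2 and 0.7 (marker M1 in the multipoint example later in these slides) it gives 0.41, matching the PIC listed there. The function name is illustrative.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content, PIC = H - C (assumed standard form)."""
    h = 1.0 - sum(p * p for p in freqs)
    # C: probability of the uninformative matching-heterozygote configuration.
    c = sum(2 * (pi * pj) ** 2 for pi, pj in combinations(freqs, 2))
    return h - c


print(round(pic([0.1, 0.2, 0.7]), 2))  # 0.41
```

PIC never exceeds H, and the gap is largest for markers with few alleles.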

Possible IBD configurations given parental genotypes
Configuration   Parental Mating Type   IBD values (two columns, headers not recovered)   Probability
1               Hom × Hom              1/4    1/2                                        (1-H)²
2               Hom × Het              0      1/4                                        H(1-H)
3               Hom × Het              1/2    3/4                                        H(1-H)
4               Het × Het              0      1/2                                        H² / 2
5               Het × Het              0      0                                          (H² - C)/4
6               Het × Het              1      1                                          (H² - C)/4
7               Het × Het              1/2    1/2                                        C/2
Example genotypes for each configuration :
1: AA AA / AA AA
2: AA AB / AA AB
3: AA AB / AA AA
4: AB AB / AA AB
5: AB AB / AA BB
6: AB AB / AA AA
7: AB AB / AB AB

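PIC & NCP for linkage
From the table of possible IBD configurations given parental genotypes, summing the squared deviations of the estimated proportion IBD from 1/2 over the seven configurations gives a variance of (H - C)/8 = PIC/8.
Therefore, NCP is attenuated in proportion to PIC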

QTL linkage
[Diagram of factors bearing on POWER: Type I errors, Type II errors, Sample N, Effect Size, Variance explained, Allele frequencies, Genetic values, Marker vs functional variant, Recombination fraction, Locus informativeness, PIC, Multipoint, Marker density, MPIC]

Multipoint IBD Estimates IBD sharing at any arbitrary point along a chromosomal region, using all available marker information on a chromosome simultaneously.

, ,  and PIC ^

1. Calculate PIC for each marker
Markers M1 to M5 are spaced 5cM apart.
Marker   Allele frequencies (as recovered, incomplete)   PIC
M1       0.1 0.2 0.7                                      0.41
M2       0.2                                              0.77
M3       0.1 0.1 0.1 0.2                                  0.84
M4       0.2 0.1                                          0.79
M5       0.2                                              (not recovered)
2. Convert map distances into recombination fractions
Haldane map function : $\theta = \frac{1}{2}\left(1 - e^{-2m}\right)$ (m = map distance in Morgans)
5 cM --> θ = 0.04758
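The conversion in step 2 is a one-liner. This sketch uses the Haldane map function named on the slide, θ = (1 - exp(-2m))/2 with m in Morgans, and reproduces θ = 0.04758 for 5 cM. The function name is illustrative.

```python
from math import exp


def haldane_theta(distance_cm):
    """Recombination fraction from map distance in centimorgans (Haldane)."""
    m = distance_cm / 100.0  # centimorgans to Morgans
    return 0.5 * (1.0 - exp(-2.0 * m))


print(round(haldane_theta(5), 5))  # 0.04758
```

The function saturates at θ = 0.5 for loci far apart, which corresponds to free recombination.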

3. Calculate covariance matrix between pi-hat at markers
Σ_MM =
       M1      M2      M3      M4      M5
M1     0.051   0.032   0.035   0.033   0.032
M2     0.032   0.096   0.066   0.062   0.061
M3     0.035   0.066   0.105   0.068   0.066
M4     0.033   0.062   0.068   0.099   0.062
M5     0.032   0.061   0.066   0.062   0.096

4. Consider each multipoint position
At each position along the chromosome, calculate covariance between trait locus and each of the markers
[Figure: markers M1 to M5 at 5cM spacing, with distances of 10cM to 30cM from the test position]
         M1       M2       M3       M4       M5
MD       10       15       20       25       30
RF       0.091    0.130    0.165    0.197    0.226
PIC      0.41     0.77     0.84     0.79     (not recovered)
Σ_MT     0.0344   0.0528   0.0472   0.0363   0.0290
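The covariances on this slide are consistent with the rule cov = (1 - 2θ)² × PIC / 8, where θ comes from the Haldane map function and PIC/8 is the variance of pi-hat at the marker. That rule is inferred from the numbers rather than stated in the transcript, so treat the sketch below as an assumption. The fifth PIC is missing from the transcript and is taken here to equal M2's 0.77, since the two markers share the same variance (0.096) in the marker covariance matrix.

```python
from math import exp


def marker_trait_covariance(pic, distance_cm):
    """Assumed covariance between pi at the test position and pi-hat at a marker."""
    theta = 0.5 * (1.0 - exp(-2.0 * distance_cm / 100.0))  # Haldane
    return (1.0 - 2.0 * theta) ** 2 * pic / 8.0


pics = [0.41, 0.77, 0.84, 0.79, 0.77]  # last value is an assumption
distances_cm = [10, 15, 20, 25, 30]    # test position to M1 ... M5

sigma_mt = [marker_trait_covariance(p, d) for p, d in zip(pics, distances_cm)]
print([round(c, 4) for c in sigma_mt])
```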

Fulker et al multipoint
If π̂ is a vector of single marker IBD estimates, then a multipoint IBD estimate at test position t is given by : [equation not recovered]
Conditional on π̂, the variance of π at the test position is reduced by a quantity which can be thought of as a multipoint PIC

5. Calculate MPIC
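The MPIC expression itself is not in the transcript. One reading, offered here as an assumption, is the usual linear-regression result: the variance of π at the test position explained by the marker estimates is Σ_MT' Σ_MM⁻¹ Σ_MT, and dividing by the unconditional variance 1/8 puts it on the PIC scale. The sketch applies this to the covariance values from steps 3 and 4; the small solver is a generic helper, not part of any linkage package.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def mpic(sigma_mm, sigma_mt):
    """Assumed multipoint PIC: 8 * Sigma_MT' Sigma_MM^-1 Sigma_MT."""
    weights = solve(sigma_mm, sigma_mt)  # regression weights on marker pi-hats
    explained = sum(w * c for w, c in zip(weights, sigma_mt))
    return 8.0 * explained


SIGMA_MM = [  # values from step 3
    [0.051, 0.032, 0.035, 0.033, 0.032],
    [0.032, 0.096, 0.066, 0.062, 0.061],
    [0.035, 0.066, 0.105, 0.068, 0.066],
    [0.033, 0.062, 0.068, 0.099, 0.062],
    [0.032, 0.061, 0.066, 0.062, 0.096],
]
SIGMA_MT = [0.0344, 0.0528, 0.0472, 0.0363, 0.0290]  # values from step 4

print(round(mpic(SIGMA_MM, SIGMA_MT), 3))
```

As a sanity check, a single marker sitting exactly at the test position gives back its own PIC under this definition.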

10 cM map

5 cM map

Exclusion mapping
Exclusion : support for the hypothesis that a QTL of at least a certain effect is absent at that position
Normally, the LRT compares the likelihood at the MLE and the null.
In exclusion mapping, the LRT compares the likelihood of a fixed effect size against the null and therefore can be negative.

Conclusions
Factors influencing power :
- QTL variance
- Sib correlation
- Sibship size
- Marker informativeness
- Marker density
- Phenotypic selection