Published by Evan Kelly. Modified over 9 years ago.
1
Association Analysis of Rare Genetic Variants
Qunyuan Zhang, Division of Statistical Genomics
Course M21-621: Computational Statistical Genetics
2
Rare Variants
Low allele frequency: usually less than 1%.
Low power: for most analyses, because the observations show little variation.
High false positive rate: for some model-based analyses, due to sparsely distributed data, unstable or biased parameter estimation, and inflated p-values.
3
An Example of Low Power (Jonathan C. Cohen et al., Science 305, 869 (2004))
4
An Example of High False Positive Rate (Q-Q plots from GWAS data, unpublished)
[Figure panels: N=~2500, MAF>0.03; N=~2500, MAF<0.03; N=~2500, MAF<0.03, permuted; N=50000, MAF<0.03, bootstrapped]
5
Three Levels of Rare Variant Data
Level 1: Individual-level
Level 2: Summarized over subjects
Level 3: Summarized over both subjects and variants
6
Level 1: Individual-level ("." marks a missing genotype)
Subject  V1  V2  V3  V4  Trait-1  Trait-2
1        1   0   0   0   90.1     1
2        0   1   0   .   99.2     1
3        0   0   0   0   105.9    0
4        0   0   0   0   89.5     0
5        0   .   0   0   97.6     0
6        0   0   0   0   110.5    0
7        0   0   1   0   88.8     0
8        0   0   0   1   95.4     1
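To make the three data levels concrete, the sketch below reduces the individual-level table to Level 2 and Level 3 summaries. It assumes, for illustration only, that Trait-2 defines the two groups and that each genotype entry is a count of variant alleles; the names `ROWS`, `summarize_by_group`, and `summarize_over_variants` are hypothetical and not from the slides.

```python
# Individual-level (Level 1) data copied from the slide.
# None marks a missing genotype (shown as "." on the slide).
ROWS = [
    # subject, V1, V2, V3, V4, trait1, trait2
    (1, 1, 0, 0, 0, 90.1, 1),
    (2, 0, 1, 0, None, 99.2, 1),
    (3, 0, 0, 0, 0, 105.9, 0),
    (4, 0, 0, 0, 0, 89.5, 0),
    (5, 0, None, 0, 0, 97.6, 0),
    (6, 0, 0, 0, 0, 110.5, 0),
    (7, 0, 0, 1, 0, 88.8, 0),
    (8, 0, 0, 0, 1, 95.4, 1),
]
VARIANTS = ["V1", "V2", "V3", "V4"]


def summarize_by_group(rows):
    """Level 2: per variant and group, (variant count, subjects genotyped)."""
    level2 = {name: {} for name in VARIANTS}
    for row in rows:
        genotypes, group = row[1:5], row[6]  # assumption: Trait-2 is the group
        for name, g in zip(VARIANTS, genotypes):
            count, n = level2[name].get(group, (0, 0))
            if g is not None:  # missing genotypes add nothing to either tally
                count, n = count + g, n + 1
            level2[name][group] = (count, n)
    return level2


def summarize_over_variants(level2):
    """Level 3: total variant count per group, summed over all variants."""
    level3 = {}
    for per_group in level2.values():
        for group, (count, _n) in per_group.items():
            level3[group] = level3.get(group, 0) + count
    return level3


level2 = summarize_by_group(ROWS)
level3 = summarize_over_variants(level2)
print(level2["V4"], level3)
```

Keeping the number of genotyped subjects per variant at Level 2 is what later allows different sample sizes for different variants.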
7
Level 2: Summarized over subjects (by group) (Jonathan C. Cohen et al., Science 305, 869 (2004))
8
Level 3: Summarized over subjects (by group) and variants (usually by gene)
                 Variant allele number  Reference allele number  Total
Low-HDL group    20                     236                      256
High-HDL group   2                      254                      256
Total            22                     490                      512
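A 2x2 allele-count table like the one on this slide can be tested directly. The sketch below uses a one-sided Fisher exact test built from the hypergeometric distribution; the slides do not say which test they apply to Level 3 data, so this choice and the function names are assumptions.

```python
from math import comb


def hypergeom_pmf(a, n_variant, n_group, n_total):
    """P(a variant alleles fall in a group of n_group alleles), margins fixed."""
    return (comb(n_variant, a) * comb(n_total - n_variant, n_group - a)
            / comb(n_total, n_group))


def fisher_upper_tail(a, n_variant, n_group, n_total):
    """One-sided p-value: probability of a or more variant alleles in the group."""
    hi = min(n_variant, n_group)
    return sum(hypergeom_pmf(x, n_variant, n_group, n_total)
               for x in range(a, hi + 1))


# Counts from the slide: 20 of the 22 variant alleles are in the low-HDL group,
# which contributes 256 of the 512 alleles.
p_value = fisher_upper_tail(20, 22, 256, 512)
print(p_value)
```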
9
Methods For Level 3 Data
10
Single-variant Test vs Total Freq. Test (TFT) (Jonathan C. Cohen et al., Science 305, 869 (2004))
11
What we have learned …
Single-variant tests of rare variants have very low power to detect association because allele frequencies are extremely low (usually < 0.01).
Testing the collective effect of a set of rare variants may increase power (sum test, collective test, group test, collapsing test, burden test, …).
12
Methods For Level 2 Data
Allow different sample sizes for different variants.
Different variants can be weighted differently.
13
CAST: A cohort allelic sums test (Morgenthaler and Thilly, Mutation Research 615 (2007) 28–56)
S: variant number; N: sample size
Under H0: $\frac{S_{cases}}{2N_{cases}} - \frac{S_{controls}}{2N_{controls}} = 0$
$T = S_{cases} - S_{controls}\,\frac{N_{cases}}{N_{controls}} = S_{cases} - S^{*}_{controls}$
(S can be calculated variant by variant and weighted differently; the final statistic is $T = \sum_i W_i S_i$.)
$Z = T / \sqrt{\mathrm{Var}(T)} \sim N(0,1)$
$\mathrm{Var}(T) = \mathrm{Var}(S_{cases} - S^{*}_{controls}) = \mathrm{Var}(S_{cases}) + \mathrm{Var}(S^{*}_{controls}) = \mathrm{Var}(S_{cases}) + \mathrm{Var}(S_{controls})\left[\frac{N_{cases}}{N_{controls}}\right]^{2}$
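The CAST statistic can be computed in a few lines. The slide does not state how Var(S) is estimated; the sketch below assumes a Poisson-like variance, Var(S) = S, which is an assumption made here for illustration. The function name `cast_z` is hypothetical.

```python
from math import sqrt


def cast_z(s_cases, s_controls, n_cases, n_controls):
    """Z = T / sqrt(Var(T)) with T = S_cases - S_controls * N_cases / N_controls."""
    ratio = n_cases / n_controls
    t = s_cases - s_controls * ratio
    # Assumption: Var(S) is approximated by S itself (Poisson-like counts).
    var_t = s_cases + s_controls * ratio ** 2
    return t / sqrt(var_t)


# Variant counts 20 vs 2 with 128 subjects per group (256 alleles each),
# taking the low-HDL group as "cases".
z = cast_z(20, 2, 128, 128)
print(z)
```

With equal group sizes the scaling factor is 1, so T is simply the difference in variant counts.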
14
C-alpha (PLOS Genetics, 2011 | Volume 7 | Issue 3 | e1001322)
Effect direction problem
15
C-alpha
16
QQ Plots of Existing Methods (under the null)
[Figure panels: CAST, C-alpha, EFT, TFT]
EFT and C-alpha: inflated with false positives.
TFT and CAST: no inflation, but they assume a single effect direction.
Objective: more general, powerful methods …
17
More Generalized Methods For Level 2 Data
18
Structure of Level 2 data
[Figure: one table per variant: variant 1, variant 2, variant 3, …, variant i, …, variant k]
Strategy: instead of testing the total frequency or number, we test the randomness of all tables.
19
Exact Probability Test (EPT)
1. Calculating the probability of each table based on the hypergeometric distribution
2. Calculating the log-transformed joint probability (L) for all k tables
3. Enumerating all possible tables and L scores
4. Calculating p-value: P = Prob.( )
(ASHG Meeting 1212, Zhang)
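The four EPT steps can be sketched for a small number of variants as follows. The slide's p-value formula did not survive extraction, so the definition used here (the total probability of all table sets whose joint log-probability L is no larger than the observed one) is an assumption, as are the function names. Full enumeration is only feasible for small tables.

```python
from itertools import product
from math import comb, exp, log


def table_log_prob(a, n_variant, n_case, n_total):
    """Step 1: log hypergeometric probability of one variant's 2x2 table."""
    p = (comb(n_variant, a) * comb(n_total - n_variant, n_case - a)
         / comb(n_total, n_case))
    return log(p)


def support(n_variant, n_case, n_total):
    """All feasible counts of variant alleles among cases, margins fixed."""
    lo = max(0, n_variant - (n_total - n_case))
    return range(lo, min(n_variant, n_case) + 1)


def exact_probability_test(tables, tol=1e-9):
    """tables: one (a, n_variant, n_case, n_total) tuple per variant."""
    # Step 2: observed joint log-probability L over all k tables.
    l_obs = sum(table_log_prob(*t) for t in tables)
    # Step 3: enumerate every possible set of k tables and its L score.
    per_table = [[table_log_prob(a, m, nc, nt) for a in support(m, nc, nt)]
                 for (_a, m, nc, nt) in tables]
    # Step 4 (assumed definition): sum probabilities of sets with L <= L_obs.
    p_value = 0.0
    for combo in product(*per_table):
        l_score = sum(combo)
        if l_score <= l_obs + tol:
            p_value += exp(l_score)
    return p_value


# Two variants with opposite directions: all 3 variant alleles in cases for the
# first variant, none in cases for the second (10 case alleles out of 20).
p_two = exact_probability_test([(3, 3, 10, 20), (0, 3, 10, 20)])
print(p_two)
```

Because each table contributes through its probability rather than its direction, opposite effects do not cancel as they would in a total-count test.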
20
Likelihood Ratio Test (LRT)
Binomial distribution
(ASHG Meeting 1212, Zhang)
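The slide names only the binomial distribution, so the following is one plausible form of such a test rather than the presenter's exact formulation: for each variant, a null model with a single allele frequency shared by cases and controls is compared with an alternative that gives each group its own frequency, and the per-variant statistics are summed. The function names and the tuple layout are hypothetical.

```python
from math import log


def binom_loglik(x, n, p):
    """Binomial log-likelihood without the constant binomial coefficient."""
    ll = 0.0
    if x > 0:
        ll += x * log(p)
    if n - x > 0:
        ll += (n - x) * log(1 - p)
    return ll


def lrt_statistic(tables):
    """tables: one (x_case, n_case, x_control, n_control) tuple per variant."""
    stat = 0.0
    for x1, n1, x0, n0 in tables:
        pooled = (x1 + x0) / (n1 + n0)  # MLE under H0 (shared frequency)
        ll_null = binom_loglik(x1, n1, pooled) + binom_loglik(x0, n0, pooled)
        ll_alt = binom_loglik(x1, n1, x1 / n1) + binom_loglik(x0, n0, x0 / n0)
        stat += 2.0 * (ll_alt - ll_null)
    return stat


# Allele counts 20/256 vs 2/256 treated as a single table.
stat = lrt_statistic([(20, 256, 2, 256)])
print(stat)
```

Under standard asymptotics such a statistic would be referred to a chi-square distribution with one degree of freedom per variant; with sparse rare-variant counts that approximation may be poor.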
21
Q-Q Plots of EPT and LRT (under the null)
[Figure panels: EPT N=500; EPT N=3000; LRT N=500; LRT N=3000]
22
Power Comparison (significance level = 0.00001)
Variant proportion: positive causal 80%, neutral 20%, negative causal 0%
[Figure: three panels of power vs. sample size]
23
Power Comparison (significance level = 0.00001)
Variant proportion: positive causal 60%, neutral 20%, negative causal 20%
[Figure: power vs. sample size]
24
Power Comparison (significance level = 0.00001)
Variant proportion: positive causal 40%, neutral 20%, negative causal 40%
[Figure: power vs. sample size]
25
Methods For Level 1 Data
Including covariates
Extension to quantitative traits
Better control for population structure
More sophisticated models
26
Collapsing (C) test (Li and Leal, The American Journal of Human Genetics 2008(83): 311–321)
Step 1
Step 2: logit(y) = a + b*X + e (logistic regression)
27
Variant Collapsing
(+): V1, V2; (.): V3, V4
Subject  V1  V2  V3  V4  Collapsed  Trait
1        1   0   0   0   1          1
2        0   1   0   0   1          1
3        0   0   0   0   0          0
4        0   0   0   0   0          0
5        0   0   0   0   0          0
6        0   0   0   0   0          0
7        0   0   1   0   1          0
8        0   0   0   1   1          1
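The collapsing step in the table above amounts to an indicator of carrying at least one rare variant. A minimal sketch, with hypothetical names `GENOTYPES`, `TRAIT`, and `collapse`; it stops before the regression step and only tabulates carriers by trait status:

```python
# Genotypes (V1..V4) and binary trait for the eight subjects on the slide.
GENOTYPES = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
TRAIT = [1, 1, 0, 0, 0, 0, 0, 1]


def collapse(genotypes):
    """Return 1 for a subject carrying any variant in the set, else 0."""
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]


collapsed = collapse(GENOTYPES)

# Cross-tabulate collapsed carrier status against the trait.
carriers_affected = sum(1 for c, y in zip(collapsed, TRAIT) if c and y)
carriers_total = sum(collapsed)
print(collapsed, carriers_affected, carriers_total)
```

The collapsed indicator is then the single predictor X in the regression step.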
28
WSS
29
WSS
30
WSS
31
Weighted Sum Test
Collapsing test (Li & Leal, 2008): w_i = 1, and s = 1 if s > 1
Weighted-sum test (Madsen & Browning, 2009): w_i calculated based on allele frequency in the control group
aSum, adaptive sum test (Han & Pan, 2010): w_i = -1 if b < 0 and p < 0.1, otherwise w_i = 1
KBAC (Liu and Leal, 2010): w_i = left-tail p-value
RBT (Ionita-Laza et al., 2011): w_i = log-scaled probability
PWST, p-value weighted sum test (Zhang et al., 2011): w_i = rescaled left-tail p-value, incorporating both significance and direction
EREC (Lin et al., 2011): w_i = estimated effect size
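All of these methods share the score s = Σ w_i g_i and differ only in how the weights are chosen. The sketch below computes frequency-based weights in the spirit of the weighted-sum test. The specific formula (control frequency q_i = (m_i + 1)/(2 n_controls + 2) and weight 1/sqrt(n q_i (1 − q_i))) is recalled from Madsen & Browning (2009), is not given on the slide, and should be checked against the paper; the function names are hypothetical.

```python
from math import sqrt


def frequency_weights(genotypes, is_control):
    """One weight per variant from its allele frequency among controls."""
    n_total = len(genotypes)
    n_controls = sum(is_control)
    weights = []
    for i in range(len(genotypes[0])):
        m = sum(row[i] for row, c in zip(genotypes, is_control) if c)
        q = (m + 1) / (2 * n_controls + 2)  # assumed smoothed control frequency
        weights.append(1.0 / sqrt(n_total * q * (1 - q)))
    return weights


def weighted_sum_scores(genotypes, weights):
    """Per-subject score s = sum_i w_i * g_i."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


GENOTYPES = [
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0],
    [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
]
IS_CONTROL = [False, False, True, True, True, True, True, False]

weights = frequency_weights(GENOTYPES, IS_CONTROL)
scores = weighted_sum_scores(GENOTYPES, weights)
print(weights, scores)
```

Variants seen in controls receive smaller weights, so V3 (carried by one control) counts for less than V1. Setting every weight to 1 and capping the score at 1 recovers the collapsing test listed first.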
32
When there are only causal (+) variants …
(+): V1, V2
Subject  V1  V2  Collapsed  Trait
1        1   0   1          3.00
2        0   1   1          3.10
3        0   0   0          1.95
4        0   0   0          2.00
5        0   0   0          2.05
6        0   0   0          2.10
Collapsing (Li & Leal, 2008) works well; power is increased.
33
When there are causal (+) and non-causal (.) variants …
(+): V1, V2; (.): V3, V4
Subject  V1  V2  V3  V4  Collapsed  Trait
1        1   0   0   0   1          3.00
2        0   1   0   0   1          3.10
3        0   0   0   0   0          1.95
4        0   0   0   0   0          2.00
5        0   0   0   0   0          2.05
6        0   0   0   0   0          2.10
7        0   0   1   0   1          2.00
8        0   0   0   1   1          2.10
Collapsing still works, but power is reduced.
34
When there are causal (+), non-causal (.), and causal (-) variants …
(+): V1, V2; (.): V3, V4; (-): V5, V6
Subject  V1  V2  V3  V4  V5  V6  Collapsed  Trait
1        1   0   0   0   0   0   1          3.00
2        0   1   0   0   0   0   1          3.10
3        0   0   0   0   0   0   0          1.95
4        0   0   0   0   0   0   0          2.00
5        0   0   0   0   0   0   0          2.05
6        0   0   0   0   0   0   0          2.10
7        0   0   1   0   0   0   1          2.00
8        0   0   0   1   0   0   1          2.10
9        0   0   0   0   1   0   1          0.95
10       0   0   0   0   0   1   1          1.00
The power of the collapsing test drops significantly.
35
P-value Weighted Sum Test (PWST)
(+): V1, V2; (.): V3, V4; (-): V5, V6
Subject     V1    V2    V3     V4    V5     V6     Collapsed  pSum   Trait
1           1     0     0      0     0      0      1          0.86   3.00
2           0     1     0      0     0      0      1          0.90   3.10
3           0     0     0      0     0      0      0          0.00   1.95
4           0     0     0      0     0      0      0          0.00   2.00
5           0     0     0      0     0      0      0          0.00   2.05
6           0     0     0      0     0      0      0          0.00   2.10
7           0     0     1      0     0      0      1          -0.02  2.00
8           0     0     0      1     0      0      1          0.08   2.10
9           0     0     0      0     1      0      1          -0.90  0.95
10          0     0     0      0     0      1      1          -0.88  1.00
t           1.61  1.84  -0.04  0.11  -1.84  -1.72
p(x≤t)      0.93  0.95  0.49   0.54  0.05   0.06
2*(p-0.5)   0.86  0.90  -0.02  0.08  -0.90  -0.88
The rescaled left-tail p-value, in [-1, 1], is used as the weight.
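The weighting step on this slide can be reproduced from the tabulated left-tail p-values. The sketch below takes those p-values as given (it does not recompute the t statistics) and forms each subject's pSum; the names `LEFT_TAIL_P`, `rescale`, and `p_sum` are hypothetical.

```python
# Left-tail p-values p(x <= t) for V1..V6, copied from the slide.
LEFT_TAIL_P = [0.93, 0.95, 0.49, 0.54, 0.05, 0.06]

# Genotypes of the ten subjects (rows) at V1..V6 (columns).
GENOTYPES = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]


def rescale(p):
    """Map a left-tail p-value from [0, 1] to a signed weight in [-1, 1]."""
    return 2 * (p - 0.5)


weights = [rescale(p) for p in LEFT_TAIL_P]
p_sum = [sum(w * g for w, g in zip(weights, row)) for row in GENOTYPES]
print([round(w, 2) for w in weights])
print([round(s, 2) for s in p_sum])
```

Weights near +1 or -1 mark variants with strong positive or negative evidence, while weights near 0 effectively switch off neutral variants.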
36
P-value Weighted Sum Test (PWST)
The power of the collapsing test is retained even when there are bidirectional effects.
37
PWST: Q-Q Plots Under the Null
Direct test: inflation of type I error
Corrected by permutation test (permutation of phenotype)
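A phenotype-permutation test of the kind mentioned here can be sketched generically. The statistic used below (absolute difference in mean trait between carriers and non-carriers) is a simple stand-in, not the PWST statistic, and the function names, permutation count, and seed are assumptions for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def carrier_mean_difference(collapsed, trait):
    """Stand-in statistic: |mean trait of carriers - mean trait of others|."""
    carriers = [t for c, t in zip(collapsed, trait) if c]
    others = [t for c, t in zip(collapsed, trait) if not c]
    return abs(mean(carriers) - mean(others))


def permutation_p_value(collapsed, trait, statistic, n_perm=2000, seed=0):
    """Shuffle the phenotype, recompute the statistic, count values >= observed."""
    rng = random.Random(seed)
    observed = statistic(collapsed, trait)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point ties with the observed value count.
        if statistic(collapsed, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Six-subject example with two carriers of causal (+) variants.
COLLAPSED = [1, 1, 0, 0, 0, 0]
TRAIT = [3.00, 3.10, 1.95, 2.00, 2.05, 2.10]

p_perm = permutation_p_value(COLLAPSED, TRAIT, carrier_mean_difference)
print(p_perm)
```

For a data-driven method, the `statistic` callable should re-estimate the weights from each permuted phenotype; reusing weights learned from the real phenotype would carry the inflation into the permutation distribution.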
38
Generalized Linear Mixed Model (GLMM) & Weighted Sum Test (WST)
39
GLMM & WST
Model (form implied by the term definitions): $Y = \alpha + \beta \sum_{i=1}^{m} w_i g_i + X\tau + Z\gamma + \varepsilon$
Y: quantitative trait or logit(binary trait)
α: intercept
β: regression coefficient of the weighted sum
m: number of RVs to be collapsed
w_i: weight of variant i
g_i: genotype (recoded) of variant i
Σ w_i g_i: weighted sum (WS)
X: covariate(s), such as population structure variable(s)
τ: fixed effect(s) of X
Z: design matrix corresponding to γ
γ: random polygenic effects for individual subjects, γ ~ N(0, G), G = 2σ²K, where K is the kinship matrix and σ² the additive polygenic variance
ε: residual
40
Weight
Based on allele frequency: binary (0, 1) or continuous, fixed or variable threshold
Based on functional annotation/prediction: SIFT, PolyPhen, etc.
Based on sequencing quality: coverage, mapping quality, genotyping quality, etc.
Data-driven: using both genotype and phenotype data, learning weights from data or adaptive selection, permutation test
Any combination …
41
Application 1: Family Data
Adjusting for relatedness in family data for non-data-driven tests of rare variants.
γ ~ N(0, 2σ²K)
Unadjusted:
Adjusted:
42
Q-Q Plots of –log10(P) under the Null (from Zhang et al., 2011, BMC Proc.)
Li & Leal’s collapsing test, ignoring family structure: inflation of type I error
Li & Leal’s collapsing test, modeling family structure via GLMM: inflation is corrected
43
Application 2: Permuting Family Data
MMPT: Mixed Model-based Permutation Test
Adjusting for relatedness in family data for data-driven permutation tests of rare variants.
γ ~ N(0, 2σ²K)
[Figure labels: Permuted; Non-permuted, subject IDs fixed]
44
Q-Q Plots under the Null (from Zhang et al., 2011, IGES Meeting)
[Figure panels: WSS, SPWST, PWST, aSum]
Permutation test, ignoring family structure: inflation of type I error
45
Q-Q Plots under the Null (from Zhang et al., 2011, IGES Meeting)
[Figure panels: WSS, SPWST, PWST, aSum]
Mixed model-based permutation test (MMPT), modeling family structure: inflation corrected
46
Burden Test vs. Non-burden Test
Burden test: T-test, likelihood ratio test, F-test, score test, …
Non-burden test: SKAT (sequence kernel association test)
48
Extension of SKAT to Family Data (Han Chen et al., 2012, Genetic Epidemiology)
[Equation labels: kinship matrix; polygenic heritability of the trait; residual]
49
Other problems
Missing genotypes & imputation
Genotyping errors & QC (family consistency, sequence review)
Population stratification
Inherited variants and de novo mutations
Family data & linkage information
Variant validation and association validation
Public databases
And more …