1 Association Analysis of Rare Genetic Variants Qunyuan Zhang Division of Statistical Genomics Course M21-621 Computational Statistical Genetics.

2 Rare Variants
- Low allele frequency: usually less than 1%
- Low power: for most analyses, due to less variation of observations
- High false positive rate: for some model-based analyses, due to sparse distribution of data, unstable/biased parameter estimation and inflated p-values

3 An Example of Low Power
Jonathan C. Cohen et al., Science 305, 869 (2004)

An Example of High False Positive Rate (Q-Q plots from GWAS data, unpublished)
Panels: N≈2500, MAF>0.03; N≈2500, MAF<0.03; N≈2500, MAF<0.03, permuted; N=50000, MAF<0.03, bootstrapped

5 Three Levels of Rare Variant Data
- Level 1: Individual-level
- Level 2: Summarized over subjects
- Level 3: Summarized over both subjects and variants

6 Level 1: Individual-level

Subject  V1  V2  V3  V4  Trait-1  Trait-2
1        1   0   0   0   90.1     1
2        0   1   0   .   99.2     1
3        0   0   0   0   105.9    0
4        0   0   0   0   89.5     0
5        0   .   0   0   97.6     0
6        0   0   0   0   110.5    0
7        0   0   1   0   88.8     0
8        0   0   0   1   95.4     1

(. = missing genotype)

7 Level 2: Summarized over subjects (by group)
Jonathan C. Cohen et al., Science 305, 869 (2004)

Level 3: Summarized over subjects (by group) and variants (usually by gene)

                 Variant allele number  Reference allele number  Total
Low-HDL group    20                     236                      256
High-HDL group   2                      254                      256
Total            22                     490                      512

9 Methods For Level 3 Data

10 Single-variant Test vs. Total Freq. Test (TFT)
Jonathan C. Cohen et al., Science 305, 869 (2004)
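A total frequency test on Level 3 data reduces to a test on one 2x2 table of allele counts. The following is a minimal sketch, assuming a two-sided Fisher exact test is used for that table (the slides do not say which test statistic was applied); the function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    ASSUMPTION: the total frequency test is illustrated here with Fisher's
    exact test; other 2x2 tests (e.g. chi-square) could be used instead.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def pmf(x):
        # Hypergeometric probability of x variant alleles in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(col1, row1)
    p_obs = pmf(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Allele counts summed over variants: low-HDL (20 variant, 236 reference)
# versus high-HDL (2 variant, 254 reference).
p_total = fisher_exact_two_sided(20, 236, 2, 254)
print(p_total)
```

Applied to a single rare variant, the same function sees only a handful of variant alleles, which is why the per-variant test has so little power.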

11 What we have learned …
- Single-variant test of rare variants has very low power for detecting association, due to extremely low frequency (usually < 0.01)
- Testing the collective effect of a set of rare variants may increase the power (sum test, collective test, group test, collapsing test, burden test, …)

12 Methods For Level 2 Data
- Allowing different sample sizes for different variants
- Different variants can be weighted differently

13 CAST: A cohort allelic sums test
Morgenthaler and Thilly, Mutation Research 615 (2007) 28–56

S: variant number; N: sample size

Under H0: $\frac{S_{cases}}{2N_{cases}} - \frac{S_{controls}}{2N_{controls}} = 0$

$T = S_{cases} - S_{controls}\,\frac{N_{cases}}{N_{controls}} = S_{cases} - S^{*}_{controls}$

(S can be calculated variant by variant and can be weighted differently; the final $T = \sum_i W_i S_i$)

$Z = T / \sqrt{\mathrm{Var}(T)} \sim N(0,1)$

$\mathrm{Var}(T) = \mathrm{Var}(S_{cases} - S^{*}_{controls}) = \mathrm{Var}(S_{cases}) + \mathrm{Var}(S^{*}_{controls}) = \mathrm{Var}(S_{cases}) + \mathrm{Var}(S_{controls}) \times \left[\frac{N_{cases}}{N_{controls}}\right]^{2}$
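The CAST statistic above can be sketched in a few lines. The slide does not say how Var(S) is estimated, so this sketch assumes a binomial variance under a pooled allele frequency; that choice, and the function name `cast_z`, are illustrative assumptions rather than the published method.

```python
from math import sqrt

def cast_z(s_cases, n_cases, s_controls, n_controls):
    """Z statistic for T = S(cases) - S(controls) * N(cases) / N(controls)."""
    ratio = n_cases / n_controls
    t = s_cases - s_controls * ratio
    # ASSUMPTION: Var(S) is binomial, 2N * p * (1 - p), with p pooled over groups.
    p = (s_cases + s_controls) / (2 * (n_cases + n_controls))
    var_cases = 2 * n_cases * p * (1 - p)
    var_controls = 2 * n_controls * p * (1 - p)
    # Var(T) = Var(S_cases) + Var(S_controls) * [N_cases / N_controls]^2
    var_t = var_cases + var_controls * ratio ** 2
    return t / sqrt(var_t)

# 20 variant alleles among 128 low-HDL subjects, 2 among 128 high-HDL subjects.
z = cast_z(20, 128, 2, 128)
print(round(z, 2))
```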

14 C-alpha PLOS Genetics, 2011 | Volume 7 | Issue 3 | e1001322 Effect direction problem

15 C-alpha

QQ Plots of Existing Methods (under the null)
Panels: CAST, C-alpha, EFT, TFT
- EFT and C-alpha: inflated with false positives
- TFT and CAST: no inflation, but assuming a single effect direction
Objective: more general, powerful methods …

17 More Generalized Methods For Level 2 Data

Structure of Level 2 data
One table per variant: variant 1, variant 2, variant 3, …, variant i, …, variant k
Strategy: instead of testing total freq./number, we test the randomness of all tables.

Exact Probability Test (EPT)
1. Calculating the probability of each table based on the hypergeometric distribution
2. Calculating the log-transformed joint probability (L) for all k tables
3. Enumerating all possible tables and L scores
4. Calculating p-value: P = Prob.( )
ASHG Meeting 1212, Zhang
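The four EPT steps can be sketched as follows for a small number of variants. The condition inside Prob.( ) is not recoverable from the slide, so this sketch assumes the p-value is the total probability of all table sets whose joint log probability L is no larger than the observed L. The function names and the brute-force enumeration are illustrative only; the enumeration grows exponentially with k.

```python
from itertools import product
from math import comb, exp, log

def hypergeom_pmf(a, n_cases, n_controls, m):
    """P(a of the m variant observations fall in cases), margins fixed."""
    return comb(n_cases, a) * comb(n_controls, m - a) / comb(n_cases + n_controls, m)

def ept_pvalue(case_counts, totals, n_cases, n_controls):
    """case_counts[i]: variant count in cases; totals[i]: count in all subjects."""
    # Step 1: probability of every possible table, variant by variant.
    supports = []
    for m in totals:
        lo, hi = max(0, m - n_controls), min(m, n_cases)
        supports.append([log(hypergeom_pmf(a, n_cases, n_controls, m))
                         for a in range(lo, hi + 1)])
    # Step 2: observed joint log probability L over the k tables.
    l_obs = sum(log(hypergeom_pmf(a, n_cases, n_controls, m))
                for a, m in zip(case_counts, totals))
    # Steps 3-4: enumerate all table sets; ASSUMPTION: p = P(L <= L_observed).
    p = 0.0
    for combo in product(*supports):
        l = sum(combo)
        if l <= l_obs + 1e-12:
            p += exp(l)
    return min(1.0, p)

# Two variants, each seen twice and only in cases, with 5 cases and 5 controls.
p_ept = ept_pvalue([2, 2], [2, 2], 5, 5)
print(p_ept)
```

Because each table is scored by its probability rather than by the direction of its imbalance, variants enriched in cases and variants enriched in controls both push L down.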

Likelihood Ratio Test (LRT) Binomial distribution ASHG Meeting 1212, Zhang

Q-Q Plots of EPT and LRT (under the null)
Panels: EPT N=500; EPT N=3000; LRT N=500; LRT N=3000

Power Comparison (significance level = 0.00001)
Variant proportion: positive causal 80%, neutral 20%, negative causal 0%
Plots: power vs. sample size

Power Comparison (significance level = 0.00001)
Variant proportion: positive causal 60%, neutral 20%, negative causal 20%
Plot: power vs. sample size

Power Comparison (significance level = 0.00001)
Variant proportion: positive causal 40%, neutral 20%, negative causal 40%
Plot: power vs. sample size

25 Methods For Level 1 Data
- Including covariates
- Extended to quantitative traits
- Better control for population structure
- More sophisticated models

26 Collapsing (C) test
Step 1
Step 2: logit(y) = a + b*X + e (logistic regression)
Li and Leal, The American Journal of Human Genetics 2008 (83): 311–321

27 Variant Collapsing

         (+)     (.)
Subject  V1  V2  V3  V4  Collapsed  Trait
1        1   0   0   0   1          1
2        0   1   0   0   1          1
3        0   0   0   0   0          0
4        0   0   0   0   0          0
5        0   0   0   0   0          0
6        0   0   0   0   0          0
7        0   0   1   0   1          0
8        0   0   0   1   1          1

28 WSS

29 WSS

30 WSS

31 Weighted Sum Test
- Collapsing test (Li & Leal, 2008): w i = 1, and s = 1 if s > 1
- Weighted-sum test (Madsen & Browning, 2009): w i calculated based on allele freq. in control group
- aSum, adaptive sum test (Han & Pan, 2010): w i = -1 if b < 0 and p < 0.1, otherwise w i = 1
- KBAC (Liu and Leal, 2010): w i = left tail p value
- RBT (Ionita-Laza et al., 2011): w i = log scaled probability
- PWST, p-value weighted sum test (Zhang et al., 2011): w i = rescaled left tail p value, incorporating both significance and directions
- EREC (Lin et al., 2011): w i = estimated effect size
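All of the methods in this list share one per-subject score, s = Σ w i g i, and differ only in how the weights are chosen. A minimal sketch of that shared step follows; the function name `weighted_sum` and the `cap_at_one` flag are hypothetical names introduced for illustration.

```python
def weighted_sum(genotypes, weights, cap_at_one=False):
    """Per-subject score s = sum_i w_i * g_i over the variants in a set.

    With all weights equal to 1 and cap_at_one=True this reduces to the
    collapsing indicator (s = 1 if s > 1).
    """
    s = sum(w * g for w, g in zip(weights, genotypes))
    return min(s, 1) if cap_at_one else s

# A subject carrying two of four rare variants.
genotypes = [1, 0, 1, 0]
collapsed = weighted_sum(genotypes, [1, 1, 1, 1], cap_at_one=True)
custom = weighted_sum(genotypes, [0.5, 1.0, 2.0, 1.0])
print(collapsed, custom)
```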

32 When there are only causal (+) variants …

         (+)
Subject  V1  V2  Collapsed  Trait
1        1   0   1          3.00
2        0   1   1          3.10
3        0   0   0          1.95
4        0   0   0          2.00
5        0   0   0          2.05
6        0   0   0          2.10

Collapsing (Li & Leal, 2008) works well, power increased

33 When there are causal (+) and non-causal (.) variants …

         (+)     (.)
Subject  V1  V2  V3  V4  Collapsed  Trait
1        1   0   0   0   1          3.00
2        0   1   0   0   1          3.10
3        0   0   0   0   0          1.95
4        0   0   0   0   0          2.00
5        0   0   0   0   0          2.05
6        0   0   0   0   0          2.10
7        0   0   1   0   1          2.00
8        0   0   0   1   1          2.10

Collapsing still works, power reduced

34 When there are causal (+), non-causal (.) and causal (-) variants …

         (+)     (.)     (-)
Subject  V1  V2  V3  V4  V5  V6  Collapsed  Trait
1        1   0   0   0   0   0   1          3.00
2        0   1   0   0   0   0   1          3.10
3        0   0   0   0   0   0   0          1.95
4        0   0   0   0   0   0   0          2.00
5        0   0   0   0   0   0   0          2.05
6        0   0   0   0   0   0   0          2.10
7        0   0   1   0   0   0   1          2.00
8        0   0   0   1   0   0   1          2.10
9        0   0   0   0   1   0   1          0.95
10       0   0   0   0   0   1   1          1.00

Power of collapsing test significantly down

35 P-value Weighted Sum Test (PWST)

           (+)          (.)          (-)
Subject    V1     V2    V3     V4    V5     V6     Collapsed  pSum   Trait
1          1      0     0      0     0      0      1          0.86   3.00
2          0      1     0      0     0      0      1          0.90   3.10
3          0      0     0      0     0      0      0          0.00   1.95
4          0      0     0      0     0      0      0          0.00   2.00
5          0      0     0      0     0      0      0          0.00   2.05
6          0      0     0      0     0      0      0          0.00   2.10
7          0      0     1      0     0      0      1          -0.02  2.00
8          0      0     0      1     0      0      1          0.08   2.10
9          0      0     0      0     1      0      1          -0.90  0.95
10         0      0     0      0     0      1      1          -0.88  1.00
t          1.61   1.84  -0.04  0.11  -1.84  -1.72
p(x≤t)     0.93   0.95  0.49   0.54  0.05   0.06
2*(p-0.5)  0.86   0.90  -0.02  0.08  -0.90  -0.88

The rescaled left-tail p-value, in [-1, 1], is used as the weight.
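The last two rows of the table can be reproduced directly. This is a minimal sketch that takes the left-tail p-values p(x≤t) as given, since the slide does not state how the t statistics were computed; the function names are hypothetical.

```python
def pwst_weights(left_tail_p):
    """Rescale left-tail p-values from [0, 1] to weights in [-1, 1]."""
    return [2 * (p - 0.5) for p in left_tail_p]

def p_sum(genotypes, weights):
    """Per-subject p-value weighted sum over the variants."""
    return sum(w * g for w, g in zip(weights, genotypes))

# Left-tail p-values for V1..V6, copied from the p(x≤t) row of the table.
weights = pwst_weights([0.93, 0.95, 0.49, 0.54, 0.05, 0.06])

subject_1 = p_sum([1, 0, 0, 0, 0, 0], weights)   # carries V1, a (+) variant
subject_9 = p_sum([0, 0, 0, 0, 1, 0], weights)   # carries V5, a (-) variant
print(round(subject_1, 2), round(subject_9, 2))
```

Variants with opposite effects receive weights of opposite sign, so their contributions no longer cancel as they do in the unweighted collapsed indicator.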

36 P-value Weighted Sum Test (PWST)
Power of the collapsing test is retained even when there are bidirectional effects

37 PWST: Q-Q Plots Under the Null
- Direct test: inflation of type I error
- Corrected by permutation test (permutation of phenotype)
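A phenotype permutation test of this kind can be sketched as follows. The statistic used here (difference in mean trait between carriers and non-carriers) is a stand-in chosen for illustration, and all function names are hypothetical. For a data-driven method such as PWST, the weights would have to be re-estimated inside `statistic` for every permuted phenotype.

```python
import random

def mean_difference(scores, trait):
    """Illustrative statistic: mean trait of carriers minus non-carriers."""
    carriers = [y for s, y in zip(scores, trait) if s != 0]
    others = [y for s, y in zip(scores, trait) if s == 0]
    return sum(carriers) / len(carriers) - sum(others) / len(others)

def permutation_pvalue(scores, trait, statistic, n_perm=2000, seed=0):
    """Two-sided p-value from shuffling the phenotype against fixed genotypes."""
    rng = random.Random(seed)
    observed = abs(statistic(scores, trait))
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Count permutations at least as extreme as the observed statistic.
        if abs(statistic(scores, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Toy data: two carriers with high trait values among six subjects.
collapsed = [1, 1, 0, 0, 0, 0]
trait = [3.00, 3.10, 1.95, 2.00, 2.05, 2.10]
p_perm = permutation_pvalue(collapsed, trait, mean_difference)
print(p_perm)
```

With only six subjects there are 15 ways to assign the two carrier positions, so the smallest attainable p-value is about 1/15 however strong the effect is.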

38 Generalized Linear Mixed Model (GLMM) & Weighted Sum Test (WST)

39 GLMM & WST

Model implied by the terms below: $Y = \alpha + \beta \sum_{i=1}^{m} w_i g_i + X\tau + Z\gamma + \varepsilon$

- Y: quantitative trait or logit(binary trait)
- α: intercept
- β: regression coefficient of weighted sum
- m: number of RVs to be collapsed
- w_i: weight of variant i
- g_i: genotype (recoded) of variant i
- Σ w_i g_i: weighted sum (WS)
- X: covariate(s), such as population structure variable(s)
- τ: fixed effect(s) of X
- Z: design matrix corresponding to γ
- γ: random polygene effects for individual subjects, γ ~ N(0, G), G = 2σ²K, where K is the kinship matrix and σ² the additive polygenic variance
- ε: residual

40 Weight
- Based on allele frequency: binary (0,1) or continuous, fixed or variable threshold
- Based on function annotation/prediction: SIFT, PolyPhen, etc.
- Based on sequencing quality (coverage, mapping quality, genotyping quality, etc.)
- Data-driven: using both genotype and phenotype data, learning weights from data or adaptive selection, permutation test
- Any combination …

41 Application 1: Family Data
Adjusting relatedness in family data for non-data-driven test of rare variants.
γ ~ N(0, 2σ²K)
Unadjusted:
Adjusted:

42 Q-Q Plots of –log10(P) under the Null
- Li & Leal’s collapsing test, ignoring family structure: inflation of type-1 error
- Li & Leal’s collapsing test, modeling family structure via GLMM: inflation is corrected
(From Zhang et al., 2011, BMC Proc.)

43 Application 2: Permuting Family Data
MMPT: Mixed Model-based Permutation Test
Adjusting relatedness in family data for data-driven permutation test of rare variants.
γ ~ N(0, 2σ²K)
Model terms: permuted; non-permuted, subject IDs fixed

44 Q-Q Plots under the Null
Panels: WSS, SPWST, PWST, aSum
Permutation test, ignoring family structure: inflation of type-1 error
(From Zhang et al., 2011, IGES Meeting)

Q-Q Plots under the Null
Panels: WSS, SPWST, PWST, aSum
Mixed model-based permutation test (MMPT), modeling family structure: inflation corrected
(From Zhang et al., 2011, IGES Meeting)

46 Burden Test vs. Non-burden Test
- Burden test: T-test, Likelihood Ratio Test, F-test, score test, …
- Non-burden test: SKAT (sequence kernel association test)

Extension of SKAT to Family Data
Model annotations: kinship matrix; polygenic heritability of the trait; residual
Han Chen et al., 2012, Genetic Epidemiology

49 Other problems
- Missing genotypes & imputation
- Genotyping errors & QC (family consistency, sequence review)
- Population stratification
- Inherited variants and de novo mutation
- Family data & linkage information
- Variant validation and association validation
- Public databases
- And more …
