1 Cladistic Clustering of Haplotypes in Association Analysis
Jung-Ying Tzeng, Aug 27, 2004
Department of Statistics & Bioinformatics Research Center, North Carolina State University

2 Simple Disorder vs. Complex Disorder
Peltonen and McKusick (2001), Science

3 Complex Disorders
- Liability genes = genes containing variants that increase disease liability
- Goal: look for such genes
  - Rely more on epidemiological evidence
Association analysis
- Case-control studies
- Detect liability genes by searching for association between disease status and genetic variants

4 Genetic Markers
- Instead of studying whole DNA sequences, we look at a subset of them: genetic markers
- SNP: Single Nucleotide Polymorphism
  - Pro: dense, roughly one every 100-300 bp
  - Con: binary variants; resolved by considering adjacent SNPs jointly

5 Haplotype-based Association Analysis
- Haplotype = marker sequence
- Haplotype-based association analysis
[Figure: example haplotypes TCTC and CACA; table of haplotypes Hap 1, Hap 2, Hap 3, ..., Hap k by Case and Control]
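
A minimal sketch of how the case-control haplotype table on this slide could be tested, assuming a plain Pearson chi-square test of homogeneity on a k x 2 table. The function name `chi_square_homogeneity` and the counts are hypothetical and are not taken from the talk; the point is only that such a test carries k - 1 degrees of freedom, so it grows with the number of distinct haplotypes.

```python
def chi_square_homogeneity(case_counts, control_counts):
    """Pearson chi-square statistic and degrees of freedom for a k x 2 table.

    Each position in the two lists is one haplotype; the values are the
    numbers of case and control chromosomes carrying that haplotype.
    """
    if len(case_counts) != len(control_counts):
        raise ValueError("case and control lists must have the same length")
    n_case = sum(case_counts)
    n_control = sum(control_counts)
    if n_case == 0 or n_control == 0:
        raise ValueError("both groups need at least one observation")
    total = n_case + n_control

    stat = 0.0
    used_rows = 0
    for a, b in zip(case_counts, control_counts):
        row = a + b
        if row == 0:
            continue  # a haplotype never observed contributes nothing
        used_rows += 1
        expected_case = row * n_case / total
        expected_control = row * n_control / total
        stat += (a - expected_case) ** 2 / expected_case
        stat += (b - expected_control) ** 2 / expected_control
    return stat, used_rows - 1


# Hypothetical counts for four haplotypes (illustration only).
cases = [120, 150, 90, 40]
controls = [150, 140, 80, 30]
statistic, df = chi_square_homogeneity(cases, controls)
print(f"chi-square = {statistic:.3f} on {df} degrees of freedom")
```

With k haplotypes the statistic is referred to a chi-square distribution with k - 1 degrees of freedom, which is the dimensionality that the later slides try to reduce.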

6 Haplotype-based Association Analysis
- Problem: findings are not replicable
  - Under-powered (Lohmueller et al. 2003; Neale and Sham 2004)
- Solution:
  1. Use large samples (Lohmueller et al. 2003)
  2. Reduce the dimension of the parameter space

7 Dimensionality
- Haplotype distribution within a block: Daly et al. (2001), Nature Genetics
- Method I: Truncating (tag SNPs)

8 Method II: Clustering (Molitor et al. 2003; Durrant et al. 2004)
- Evolutionary tree of haplotypes
- Minimize the haplotype distance within clusters
[Figure: tree of the haplotypes 000000, 100000, 100001, 100011, 100101, 101001, 110000, 010000, 011001, 000100, 011000, 111000]
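
A small sketch of the distance idea on this slide, assuming that "haplotype distance" means the Hamming distance (number of differing sites) between two 0/1 haplotype strings. The slide does not name the metric, so this is an assumption, and the helper names are hypothetical. The haplotype list is the one shown in the figure.

```python
from itertools import combinations

# Haplotypes shown in the slide's tree figure.
HAPLOTYPES = [
    "000000", "100000", "100001", "100011", "100101", "101001",
    "110000", "010000", "011001", "000100", "011000", "111000",
]


def hamming(a, b):
    """Number of sites at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def one_step_pairs(haplotypes):
    """All pairs that differ at exactly one site (candidate tree edges)."""
    return [(a, b) for a, b in combinations(haplotypes, 2) if hamming(a, b) == 1]


def within_cluster_distance(cluster):
    """Sum of pairwise Hamming distances inside one cluster."""
    return sum(hamming(a, b) for a, b in combinations(cluster, 2))


for a, b in one_step_pairs(HAPLOTYPES):
    print(a, "-", b)
```

A clustering that aims to "minimize the haplotype distance within clusters" could compare candidate partitions by summing `within_cluster_distance` over their clusters.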

9 Method II: Clustering
[Figure: haplotypes 000000, 100000, 100001, 100011, 100101, 101001, 110000, 010000, 011001, 000100, 011000, 111000]

10 Method II: Clustering
[Figure: haplotypes 000000, 100000, 100001, 100011, 100101, 101001, 110000, 010000, 011001, 000100, 011000, 111000]

11 Method III: Cladistic Grouping (Templeton 1995; Seltman et al. 2003)
- Observed Hap = { 000, 001, 010, 100, 110, 101, 011, 111 }
[Figure: cladogram connecting the haplotypes 001, 101, 110, 010, 011, 000, 111, 100]

12 Desired Features
- Include all samples
- Incorporate both haplotype distance and age
  - High frequency → ancient (Crandall & Templeton 1995)
  - Low frequency → young
- Allow uncertainty in inferring the underlying evolutionary relationship

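13 Proposed Approach: Cladistic Clustering
Possible Hap = { 000, 001, 010, 100, 110, 101, 011, 111 }
[Figure: cladogram split into levels $\pi_{(2)}$: { 110 }; $\pi_{(1)}$: { 000, 010, 111, 100 }; $\pi_{(0)}$: { 001, 011, 101 }. $B_{(2)}$ moves level 2 into level 1 and $B_{(1)}$ moves level 1 into level 0; arrows carry allocation probabilities such as $p$, $1-p$ and $q_1$, $q_2$, $1-q_1-q_2$]
$$\pi^{*t}_{(i)} = \pi^{t}_{(i)} + \pi^{*t}_{(i+1)} B_{(i+1)}, \qquad \pi^{*}_{(2)} = \pi_{(2)} \text{ at the top level}$$
$$\pi^{*t} = \pi^{t} B = \begin{pmatrix} \pi^{t}_{(0)} & \pi^{t}_{(1)} & \pi^{t}_{(2)} \end{pmatrix} \begin{pmatrix} I \\ B_{(1)} \\ B_{(2)} B_{(1)} \end{pmatrix}$$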
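
To make the regrouping formula concrete, here is a minimal numeric sketch of $\pi^{*t} = \pi^{t}_{(0)} + \pi^{t}_{(1)} B_{(1)} + \pi^{t}_{(2)} B_{(2)} B_{(1)}$ for the three-SNP example. All frequencies and allocation probabilities below are hypothetical, as is the assignment of p and q to particular arrows; only the level structure (major nodes 001, 011, 101; next level 000, 010, 100, 111; top level 110) comes from the slide.

```python
def vec_mat(v, m):
    """Row vector times matrix, with the matrix given as a list of rows."""
    if len(v) != len(m):
        raise ValueError("vector length must match number of matrix rows")
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def mat_mat(a, b):
    """Matrix product of two matrices given as lists of rows."""
    return [vec_mat(row, b) for row in a]


def clustered_frequencies(pi0, pi1, pi2, b1, b2):
    """pi* = pi0 + pi1 B1 + pi2 B2 B1, all as plain Python lists."""
    from_level1 = vec_mat(pi1, b1)
    from_level2 = vec_mat(pi2, mat_mat(b2, b1))
    return [x + y + z for x, y, z in zip(pi0, from_level1, from_level2)]


# Hypothetical haplotype frequencies (they sum to 1).
PI0 = [0.30, 0.25, 0.20]        # major nodes: 001, 011, 101
PI1 = [0.08, 0.06, 0.04, 0.05]  # level 1:     000, 010, 100, 111
PI2 = [0.02]                    # level 2:     110

P = 0.6  # hypothetical probability that 111 is allocated to 011
# Rows 000, 010, 100, 111; columns 001, 011, 101.
B1 = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, P, 1.0 - P],
]
# Row 110; columns 000, 010, 100, 111 (000 is two steps away, so weight 0).
B2 = [[0.0, 0.4, 0.3, 0.3]]

PI_STAR = clustered_frequencies(PI0, PI1, PI2, B1, B2)
print([round(x, 4) for x in PI_STAR])
```

Because every row of `B1` and `B2` sums to one, the clustered frequencies still sum to one: no sample is discarded, the rarer haplotypes are only reassigned to the major nodes.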

14 Issues
1. Determine the major nodes $\pi_{(0)}$
2. Construct the conditional allocating matrix $B_{(i)}$

15 Conditional Allocating Matrix $B_{(i)}$
[Figure: cladogram split into levels $\pi_{(2)}$: { 110 }; $\pi_{(1)}$: { 000, 010, 100, 111 }; $\pi_{(0)}$: { 001, 011, 101 }, with $B_{(2)}$ linking 110 to 111, 010, 100 and 000]
$$B_{(2)} = C = \begin{pmatrix} c_{000} & c_{010} & c_{100} & c_{111} \end{pmatrix} \quad \text{(row: 110; columns: 000, 010, 100, 111)}$$
Each entry $c \in [0,1]$ is the likelihood of a one-step movement.
$$\pi^{*t}_{(1)} = \pi^{t}_{(2)} B_{(2)} + \pi^{t}_{(1)}$$
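
The slide says only that each entry of the allocating matrix lies in [0,1] and reflects the likelihood of a one-step movement; the transcript does not show how the entries are computed. The sketch below is therefore an illustration under a stated assumption, not the talk's method: a haplotype is allocated only to haplotypes one mutation away, with weights proportional to the frequencies of those neighbours. The function names and frequencies are hypothetical.

```python
def hamming(a, b):
    """Number of sites at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def allocation_row(source, targets, freq):
    """One row of a conditional allocating matrix for `source`.

    Assumption (not from the slides): only one-step neighbours receive
    weight, and the weight is proportional to the neighbour's frequency.
    """
    weights = [freq[t] if hamming(source, t) == 1 else 0.0 for t in targets]
    total = sum(weights)
    if total == 0:
        raise ValueError(f"{source} has no one-step neighbour among the targets")
    return [w / total for w in weights]


# Level-1 haplotypes from the slide, with hypothetical frequencies.
LEVEL1 = ["000", "010", "100", "111"]
FREQ = {"000": 0.08, "010": 0.06, "100": 0.04, "111": 0.05}

B2_ROW = allocation_row("110", LEVEL1, FREQ)
print(dict(zip(LEVEL1, (round(c, 3) for c in B2_ROW))))
```

Under this assumption 000 receives weight zero from 110, because the two differ at two sites, and the remaining three entries sum to one.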

16 Conditional Allocating Matrix $B_{(i)}$
[Figure: cladogram of the haplotypes 110, 001, 101, 011, 000, 111, 010, 100]
[Matrix $B_{(1)}$: rows 000, 010, 100, 111; columns 001, 011, 101]
$$\pi^{*t} = \pi^{t}_{(0)} + \pi^{t}_{(1)} B_{(1)} + \pi^{t}_{(2)} B_{(2)} B_{(1)}$$

17 Determine $\pi_{(0)}$
- Information criteria
- Net Information (Shannon's information content)
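
The transcript does not give the definition of "net information", so the sketch below covers only the Shannon quantities it is said to build on: the information content $-\log_2 p$ of a haplotype with frequency $p$, and the entropy of a frequency vector. The function names and the example frequencies are hypothetical; how these quantities are combined to choose $\pi_{(0)}$ is not shown here.

```python
import math


def information_content(p):
    """Shannon information content, in bits, of an event with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


def shannon_entropy(freqs):
    """Expected information content of a frequency vector (zeros are skipped)."""
    return sum(p * information_content(p) for p in freqs if p > 0)


# Hypothetical haplotype frequencies, sorted from common to rare.
frequencies = [0.30, 0.25, 0.20, 0.08, 0.06, 0.05, 0.04, 0.02]
for p in frequencies:
    print(f"p = {p:.2f}  information = {information_content(p):.2f} bits")
print(f"entropy = {shannon_entropy(frequencies):.3f} bits")
```

Rare haplotypes carry more bits each but contribute little to the total, which is one way to see why an information criterion might be used to decide how many haplotypes to keep as major nodes.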

18 Net Information and $\pi_{(0)}$

19 Association Analysis Based on $\pi^{*}$
- Coalescent simulation (Hudson 2002):
  - Prevalence = 0.01
  - Relative risk = 2
  - Frequencies of liability allele = (0.1, 0.3, 0.5)
  - Location of liability allele = (hot spot, blocky, very blocky)
  - Draw 200 cases and 200 controls
- Test of homogeneity based on $\pi^{*}_{cs}$ (cases) and $\pi^{*}_{cn}$ (controls)

20 Power and Type I error
[Figure panels: Gene Pelc; Gene IL01RB]

21 Summary
- Provide a mechanism of cladistic clustering by $\pi^{*t} = \pi^{t} B$
  - Combine the ideas of truncating and clustering
  - Based on evolutionary relationship without reconstructing the cladogram
  - Incorporate haplotype frequencies and distance in cluster assignment
  - One-step conditional regrouping can accommodate multiple-step regrouping: self-repeating, algebraically multiplicative
  - Reserve $\pi_{(0)}$ based on information criteria
- $\pi^{*}$ increases test efficiency
  - Increased power even for large samples and haplotypes in block regions

22 End of Slides

23 Approach
- Two stages:
  - Stage I (Where): identify the susceptible regions across the genome (multiple testing problem)
    - Approaches based on haplotype similarity
  - Stage II (Which): determine and pinpoint the specific liability variants
    - Study individual effects of groups of haplotypes

24 Strategies of Reducing Degrees of Freedom
I. Haplotype Similarity (Van der Meulen and te Meerman 1997; Bourgain et al. 2000-2002; Tzeng et al. 2003a,b)
- Search for extra haplotype sharing among cases
- Pro: 1 degree of freedom
- Con: does not study individual haplotype effects
- Usage: good for genome screening

25 Strategies of Reducing Degrees of Freedom
II. Haplotype Tagging (Johnson et al. 2001)
Hap  Sequence        Freq (%)
1    ACACCCCCGGGCCG  45
2    ...........A..  20
3    CTTG.TATTA....  13.25
4    .............A  11.25
5    C.T.T.A...AA..  3.75
6    ............T.  3.50
7    C.............  1.50
8    C.T.T.A.......  0.50
9    .TTG.TATTA....
(a dot marks the same allele as in haplotype 1)
Tag SNP patterns: 1 ACG; 2 .A.; 3 T..; 4 ..A; 5 TA.; remaining rows: (1) ..., (1) ..., 6 T.., (6) T..
- Pro: efficiently captures the major diversity
- Con: discards rare haplotypes

26 Strategies of Reducing Degrees of Freedom
III. Haplotype Clustering (Molitor et al. 2003; Seltman et al. 2001, 2003; Durrant et al. 2004)
- Similar haplotypes induce similar liability effects
- Cluster haplotypes and perform analysis based on clusters of haplotypes
- Pro: incorporates all data
- Con: may cluster two major haplotypes in the same group

27 Approach
- Two stages:
  - Stage I (Where): identify the susceptible regions across the genome (multiple testing problem)
    - Approaches based on haplotype similarity
  - Stage II (Which): determine and pinpoint the specific liability variants
    - Study individual effects of groups of haplotypes

28 Haplotype Grouping
- Focus on Stage II
- Combine the pros of haplotype tagging and clustering

29 Power and Type I error
[Figure panels: Gene Pelc; Gene IL01RB]

