1 Cladistic Clustering of Haplotypes in Association Analysis. Jung-Ying Tzeng, Aug 27, 2004. Department of Statistics & Bioinformatics Research Center, North Carolina State University.
2 Simple Disorder vs. Complex Disorder. Peltonen and McKusick (2001), Science.
3 Complex Disorders. Liability genes = genes containing variants that increase disease liability. Goal: look for such genes. Rely more on epidemiological evidence: association analysis, case-control studies. Detect liability genes by searching for association between disease status and genetic variants.
4 Genetic Markers. Instead of studying whole DNA sequences, we look at a subset of them: genetic markers. SNP: Single Nucleotide Polymorphism. Pro: dense (about one every 100-300 bp). Con: binary variants; resolved by considering adjacent SNPs jointly.
5 Haplotype-based Association Analysis. Haplotype = marker sequence (e.g., TCTC, CACA). Haplotype-based association analysis: tabulate haplotypes Hap 1, Hap 2, Hap 3, ..., Hap k by case and control status. [Table: haplotypes (rows) by Case / Control (columns)]
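The case/control table of haplotypes can be sketched in a few lines. This is a minimal illustration, not the speaker's code: the function name `haplotype_table` and the toy haplotype lists are hypothetical, and haplotypes are assumed to be already phased and written as strings.

```python
from collections import Counter

def haplotype_table(case_haps, control_haps):
    """Return (haplotypes, case_counts, control_counts) for a k x 2 table."""
    case_n = Counter(case_haps)
    ctrl_n = Counter(control_haps)
    haps = sorted(set(case_n) | set(ctrl_n))  # every haplotype seen in either group
    return haps, [case_n[h] for h in haps], [ctrl_n[h] for h in haps]

# Hypothetical toy data: each string is one observed haplotype.
cases = ["TCTC", "TCTC", "CACA", "TCTC", "CACA", "TCCA"]
controls = ["CACA", "CACA", "TCTC", "CACA", "CACA", "TCCA"]

haps, case_counts, control_counts = haplotype_table(cases, controls)
for h, a, b in zip(haps, case_counts, control_counts):
    print(h, a, b)
```

Each row of the printed table is one haplotype, so the table grows with the number of distinct haplotypes k.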
6 Haplotype-based Association Analysis. Problem: findings are not replicable; studies are under-powered (Lohmueller et al. 2003; Neale and Sham 2004). Solution: 1. Use large samples (Lohmueller et al. 2003). 2. Reduce the dimension of the parameter space.
7 Dimensionality. Method I: Truncating (tag SNPs). [Figure: haplotype distribution within a block, from Daly et al. (2001), Nature Genetics]
8 Method II: Clustering (Molitor et al. 2003; Durrant et al. 2004). Minimize the haplotype distance within clusters. [Figure: evolutionary tree of the haplotypes 000000, 100000, 100001, 100011, 100101, 101001, 110000, 010000, 011001, 000100, 011000, 111000]
9 Method II: Clustering. [Figure: haplotypes 000000, 100000, 100001, 100011, 100101, 101001, 110000, 010000, 011001, 000100, 011000, 111000]
10 Method II: Clustering. [Figure: haplotypes 000000, 100000, 100001, 100011, 100101, 101001, 110000, 010000, 011001, 000100, 011000, 111000]
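The haplotype distance behind Method II can be illustrated with the twelve haplotypes from the slide. The sketch below assumes the distance is the Hamming distance (number of differing markers); the cited clustering papers may use other measures, and the helper names are hypothetical.

```python
from itertools import combinations

# The twelve haplotypes shown on the clustering slides.
HAPLOTYPES = [
    "000000", "100000", "100001", "100011", "100101", "101001",
    "110000", "010000", "011001", "000100", "011000", "111000",
]

def hamming(a, b):
    """Number of marker positions at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))

def one_step_neighbors(hap, pool):
    """Haplotypes in `pool` that differ from `hap` at exactly one marker."""
    return [h for h in pool if hamming(hap, h) == 1]

def within_cluster_distance(clusters):
    """Sum of pairwise Hamming distances inside each cluster."""
    return sum(hamming(a, b) for c in clusters for a, b in combinations(c, 2))

print(one_step_neighbors("000000", HAPLOTYPES))

# Hypothetical partition, used only to show the objective being evaluated.
partition = [HAPLOTYPES[:6], HAPLOTYPES[6:]]
print(within_cluster_distance(partition))
```

A clustering method in this spirit would search over partitions for one with a small within-cluster distance; the search itself is left out here.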
11 Method III: Cladistic Grouping (Templeton 1995; Seltman et al. 2003). Observed Hap = {000, 001, 010, 100, 110, 101, 011, 111}. [Figure: cladogram of the observed haplotypes]
12 Desired Features. Include all samples. Incorporate both haplotype distance and age: high frequency, ancient (Crandall & Templeton 1995); low frequency, young. Allow uncertainty in inferring the underlying evolutionary relationship.
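13 Proposed Approach: Cladistic Clustering. Possible Hap = {000, 001, 010, 100, 110, 101, 011, 111}, arranged in levels: (2) {110}; (1) {000, 010, 111, 100}; (0) {001, 011, 101}. [Figure: cladogram in which each node allocates its frequency to nodes one level down through $B_{(2)}$ and $B_{(1)}$, with allocation probabilities such as $p$, $1-p$ and $q_1$, $q_2$, $1-q_1-q_2$] Recursion: $\pi^{*t}_{(i)} = \pi^{t}_{(i)} + \pi^{*t}_{(i+1)} B_{(i+1)}$, where $\pi$ is the vector of haplotype frequencies and $\pi^{*}_{(2)} = \pi_{(2)}$ at the top level. Overall: $\pi^{*t} = \pi^{t} B = \begin{pmatrix} \pi^{t}_{(0)} & \pi^{t}_{(1)} & \pi^{t}_{(2)} \end{pmatrix} \begin{pmatrix} I \\ B_{(1)} \\ B_{(2)} B_{(1)} \end{pmatrix}$.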
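A minimal sketch of the regrouping step, read as: the regrouped frequency of a level is its own frequency plus whatever the level above passes down through its allocating matrix. Only the level structure ({110} above {000, 010, 100, 111} above {001, 011, 101}) comes from the slides; the numeric frequencies, the entries of the two matrices, and the function names are hypothetical values chosen for illustration.

```python
def vec_mat(v, m):
    """Row vector times matrix, with the matrix given as a list of rows."""
    return [sum(v[r] * m[r][c] for r in range(len(v))) for c in range(len(m[0]))]

def regroup(pi_levels, b_mats):
    """Collapse all levels onto level 0.

    pi_levels[i]: frequencies of the haplotypes at level i (0 = major nodes).
    b_mats[i]: allocating matrix from level i to level i-1 (b_mats[0] is unused).
    """
    star = list(pi_levels[-1])  # top level: nothing above it to receive
    for i in range(len(pi_levels) - 1, 0, -1):
        moved = vec_mat(star, b_mats[i])  # mass passed down one level
        star = [own + m for own, m in zip(pi_levels[i - 1], moved)]
    return star

# Hypothetical frequencies (they sum to 1 across levels).
pi0 = [0.30, 0.25, 0.20]        # level 0: 001, 011, 101
pi1 = [0.08, 0.06, 0.05, 0.04]  # level 1: 000, 010, 100, 111
pi2 = [0.02]                    # level 2: 110

# Hypothetical allocating matrices; each row sums to 1.
B2 = [[0.0, 0.4, 0.3, 0.3]]     # 110 -> 000, 010, 100, 111
B1 = [
    [1.0, 0.0, 0.0],            # 000 -> 001
    [0.0, 1.0, 0.0],            # 010 -> 011
    [0.0, 0.0, 1.0],            # 100 -> 101
    [0.0, 0.5, 0.5],            # 111 -> 011 or 101
]

pi_star = regroup([pi0, pi1, pi2], [None, B1, B2])
print(pi_star, sum(pi_star))
```

Because every row of each allocating matrix sums to 1, the regrouped vector still sums to 1: no frequency is discarded, it is only moved onto the major nodes.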
14 Issues. 1. Determine the major nodes, level (0). 2. Construct the conditional allocating matrix $B_{(i)}$.
15 Conditional Allocating Matrix $B_{(i)}$. Levels: (2) {110}; (1) {000, 010, 100, 111}; (0) {001, 011, 101}. $B_{(2)} = C = (c\ c\ c\ c)$, with row 110 and columns 000, 010, 100, 111; entries in [0,1], the likelihood of a one-step movement. $\pi^{*t}_{(1)} = \pi^{t}_{(1)} + \pi^{t}_{(2)} B_{(2)}$. [Figure: 110 allocating through $B_{(2)}$ to 111, 010, 100, 000]
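16 Conditional Allocating Matrix $B_{(i)}$. $B_{(1)}$ allocates level (1) to level (0). $\pi^{*t} = \pi^{t}_{(0)} + \pi^{t}_{(1)} B_{(1)} + \pi^{t}_{(2)} B_{(2)} B_{(1)}$. [Figure: cladogram of 110, 001, 101, 011, 000, 111, 010, 100, with 111, 010, 000 allocating through $B_{(1)}$ to 101, 011, 001]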
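The transcript does not preserve how the entries of the allocating matrices are computed. The sketch below is one possible construction, stated as an assumption: a haplotype sends its frequency only to haplotypes one mutation step away in the next level down, in proportion to the frequencies of those targets. The function names and frequencies are hypothetical.

```python
def hamming(a, b):
    """Number of marker positions at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))

def allocation_matrix(sources, targets, freq):
    """Row s, column t: share of the frequency of s that is sent to t.

    Assumption (not from the slides): only one-step targets receive mass,
    weighted by their own frequency, and each row is normalised to sum to 1.
    """
    rows = []
    for s in sources:
        weights = [freq[t] if hamming(s, t) == 1 else 0.0 for t in targets]
        total = sum(weights)
        if total == 0:
            raise ValueError(f"{s} has no one-step neighbour among the targets")
        rows.append([w / total for w in weights])
    return rows

# Hypothetical frequencies for levels (0) and (1).
freq = {"001": 0.30, "011": 0.25, "101": 0.20,
        "000": 0.08, "010": 0.06, "100": 0.05, "111": 0.04}

level0 = ["001", "011", "101"]
level1 = ["000", "010", "100", "111"]
level2 = ["110"]

b2 = allocation_matrix(level2, level1, freq)  # 1 x 4
b1 = allocation_matrix(level1, level0, freq)  # 4 x 3
print(b2)
print(b1)
```

Under this assumption 110 sends nothing to 000, which is two steps away, and 111 splits its frequency between 011 and 101.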
17 Determine the major nodes: information criteria. Net information (Shannon's information content).
18 Net Information and level (0)
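Slide 17 names Shannon's information content as the basis of the criterion; the net-information formula itself is not preserved in the transcript. The sketch below therefore shows only the Shannon part, with hypothetical frequencies and hypothetical function names.

```python
import math

def information_content(p):
    """Shannon information content, in bits, of an outcome with probability p."""
    return -math.log2(p)

def shannon_information(freqs):
    """Expected information content (entropy, in bits) of a frequency vector."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)

# Hypothetical haplotype frequencies.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.70, 0.20, 0.05, 0.05]

print(shannon_information(uniform))
print(shannon_information(skewed))
print(information_content(0.05))
```

A rare haplotype has high information content per observation but contributes little to the total because it is weighted by its frequency, which is the kind of trade-off an information criterion for choosing the major nodes has to balance.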
19 Association Analysis Based on $\pi^{*}$. Coalescent simulation (Hudson 2002): prevalence = 0.01; relative risk = 2; frequencies of liability allele = (0.1, 0.3, 0.5); location of liability allele = (hot spot, blocky, very blocky). Draw 200 cases and 200 controls. Test of homogeneity based on $\pi^{*}_{cs}$ and $\pi^{*}_{cn}$.
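A test of homogeneity between case and control frequencies can be sketched with the Pearson chi-square statistic. This assumes the regrouped frequencies have been turned back into counts per group; the slides do not say which homogeneity statistic was used, and the counts and the function name are hypothetical.

```python
def homogeneity_chi2(case_counts, control_counts):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table."""
    n_case = sum(case_counts)
    n_ctrl = sum(control_counts)
    n = n_case + n_ctrl
    stat = 0.0
    used = 0  # number of haplotype groups with at least one observation
    for a, b in zip(case_counts, control_counts):
        col = a + b
        if col == 0:
            continue  # an empty group carries no information
        used += 1
        for observed, row_total in ((a, n_case), (b, n_ctrl)):
            expected = row_total * col / n
            stat += (observed - expected) ** 2 / expected
    return stat, used - 1

# Hypothetical counts for three haplotype groups.
case_counts = [90, 70, 40]
control_counts = [110, 60, 30]

stat, df = homogeneity_chi2(case_counts, control_counts)
print(stat, df)
```

The degrees of freedom equal the number of groups minus one, so regrouping many haplotypes onto a few major nodes directly lowers the degrees of freedom of the test.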
20 Power and Type I error. [Figure panels: Gene Pelc; Gene IL01RB]
21 Summary. Provide a mechanism of cladistic clustering via $\pi^{*}$ and $B$. Combine the ideas of truncating and clustering. Based on evolutionary relationship without reconstructing the cladogram. Incorporate haplotype frequencies and distance in cluster assignment. One-step conditional regrouping can accommodate multiple-step regrouping: self-repeating, algebraically multiplicative. Reserve level (0) based on information criteria. $\pi^{*}$ increases test efficiency: increased power even for large samples and for haplotypes in block regions.
22 End of Slides
23 Approach. Two stages. Stage I (Where): identify the susceptible regions across the genome (multiple testing problem); approaches based on haplotype similarity. Stage II (Which): determine and pinpoint the specific liability variants; study individual effects of groups of haplotypes.
24 Strategies of Reducing Degrees of Freedom. I. Haplotype Similarity (Van der Meulen and te Meerman 1997; Bourgain et al. 2000-2002; Tzeng et al. 2003a,b). Search for extra haplotype sharing among cases. Pro: 1 degree of freedom. Con: does not study individual haplotype effects. Usage: good for genome screening.
25 Strategies of Reducing Degrees of Freedom. II. Haplotype Tagging (Johnson et al. 2001). Pro: efficiently captures the major diversity. Con: discards rare haplotypes.
Hap  Sequence        Freq (%)
1    ACACCCCCGGGCCG  45
2    ...........A..  20
3    CTTG.TATTA....  13.25
4    .............A  11.25
5    C.T.T.A...AA..  3.75
6    ............T.  3.50
7    C.............  1.50
8    C.T.T.A.......  0.50
9    .TTG.TATTA....
(A dot marks the same allele as in haplotype 1.)
Tag SNP view: 1 ACG; 2 .A.; 3 T..; 4 ..A; 5 TA.; (1) ...; 6 T..; (6) T..
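The cost of discarding rare haplotypes can be quantified with the frequencies listed on the slide. The sketch below uses a plain frequency cutoff as a simplified stand-in for truncation; it is not a tag-SNP selection algorithm, the 5% cutoff and the function name are hypothetical, and haplotype 9 is left out because its frequency is not given.

```python
# Frequencies (%) of haplotypes 1-8 as listed on the slide.
FREQ_PERCENT = {1: 45.0, 2: 20.0, 3: 13.25, 4: 11.25,
                5: 3.75, 6: 3.50, 7: 1.50, 8: 0.50}

def truncate(freqs, min_freq):
    """Split haplotypes into those kept (>= min_freq) and the total mass dropped."""
    kept = {h: f for h, f in freqs.items() if f >= min_freq}
    dropped = sum(f for f in freqs.values() if f < min_freq)
    return kept, dropped

# Hypothetical cutoff of 5%.
kept, dropped = truncate(FREQ_PERCENT, 5.0)
print(sorted(kept), sum(kept.values()), dropped)
```

With this cutoff the four common haplotypes are kept and the four rare ones listed are dropped, which illustrates the "Con" on the slide: the rare haplotypes never enter the test.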
26 Strategies of Reducing Degrees of Freedom. III. Haplotype Clustering (Molitor et al. 2003; Seltman et al. 2001, 2003; Durrant et al. 2004). Similar haplotypes induce similar liability effects. Cluster haplotypes and perform the analysis based on clusters of haplotypes. Pro: incorporates all data. Con: may cluster two major haplotypes in the same group.
27 Approach. Two stages. Stage I (Where): identify the susceptible regions across the genome (multiple testing problem); approaches based on haplotype similarity. Stage II (Which): determine and pinpoint the specific liability variants; study individual effects of groups of haplotypes.
28 Haplotype Grouping. Focus on Stage II. Combine the pros of haplotype tagging and clustering.
29 Power and Type I error. [Figure panels: Gene Pelc; Gene IL01RB]