Published by Blake Colden. Modified over 10 years ago.
1
Association Studies, Haplotype Blocks and Tagging SNPs
Prof. Sorin Istrail
2
Association studies
[Figure: two groups of individuals, disease/responder versus control/non-responder, each individual marked by its allele at marker A (allele 0 or allele 1).]
Marker A is associated with the phenotype.
3
Association studies
Evaluate whether nucleotide polymorphisms associate with phenotype.
T A G A A
C G G A A
C G T A A
T A T C G
T G T A G
T G G A G
4
Association studies
T A G A A
C G G A A
C G T A A
T A T C G
T G T A G
T G G A G
5
Hypothesis: Haplotype Blocks?
The genome consists largely of blocks of common SNPs with relatively little recombination within the blocks (Patil et al., Science, 2001; Jeffreys et al., Nature Genetics, 2001; Daly et al., Nature Genetics, 2001).
6
Haplotype Block Structure: LD-Blocks and 4-Gamete Test Blocks
[Figure: a 200 kb region of DNA showing sense genes, antisense genes, SNPs, and haplotype blocks 1 to 4.]
7
One definition of block
Based on the four-gamete test. Intuition: when all four gametes (00, 01, 10, 11) are observed for a pair of SNPs, a recombination must have occurred somewhere between the two sites, assuming each site mutated only once.
8
Four Gamete Block Test (Hudson and Kaplan, 1985)
A segment of SNPs is a block if, for every pair of SNPs, at most 3 of the 4 gametes (00, 01, 10, 11) are observed.
BLOCK:
0 0 1
0 1 1
1 1 0
1 1 1
VIOLATES THE BLOCK DEFINITION:
0 0 1
0 1 1
1 1 0
1 0 1
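The block test above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `gametes` and `is_four_gamete_block` are hypothetical, haplotypes are assumed to be equal-length strings over two alleles per site, and the two inputs are the 4 x 3 matrices read off the slide.

```python
from itertools import combinations

def gametes(haplotypes, i, j):
    """Return the set of allele pairs (gametes) observed at sites i and j."""
    return {(h[i], h[j]) for h in haplotypes}

def is_four_gamete_block(haplotypes):
    """True if no pair of sites shows all four gametes (assumes biallelic sites)."""
    n_sites = len(haplotypes[0])
    return all(len(gametes(haplotypes, i, j)) < 4
               for i, j in combinations(range(n_sites), 2))

# The two example matrices from the slide, one haplotype per row.
block = ["001", "011", "110", "111"]
not_block = ["001", "011", "110", "101"]

print(is_four_gamete_block(block))      # True
print(is_four_gamete_block(not_block))  # False: sites 1 and 2 show 00, 01, 11, 10
```

The check is quadratic in the number of sites, which is fine for short segments; it is meant only to make the definition concrete.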
9
Finding Recombination Hotspots: Many Possible Partitions into Blocks
A C T A G A T A G C C T
G T T C G A C A A C A T
A C T C T A T G A T C G
G T T A T A C G A C A T
A C T C T A T A G T A T
A C T A G C T G G C A T
All four gametes are present: [figure marking the pairs of sites at which all four gametes occur]
10
A C T A G A T A G C C T
G T T C G A C A A C A T
A C T C T A T G A T C G
G T T A T A C G A C A T
A C T C T A T A G T A T
A C T A G C T G G C A T
Each pair of sites showing all four gametes is a constraint. Find the left-most right endpoint of any constraint and mark the site just before it as a recombination site. Eliminate every constraint that crosses that site. Repeat until all constraints are gone. The final result is a minimum-size set of sites crossing all constraints.
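The greedy procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the lecture's code: the names `four_gamete_constraints` and `min_breakpoints` are hypothetical, sites are 0-indexed, and a breakpoint `k` is taken to mean "recombination between site `k-1` and site `k`", so it crosses a constraint `(i, j)` when `i < k <= j`.

```python
from itertools import combinations

def four_gamete_constraints(haplotypes):
    """All site pairs (i, j), i < j, at which all four gametes are observed."""
    n_sites = len(haplotypes[0])
    return [(i, j) for i, j in combinations(range(n_sites), 2)
            if len({(h[i], h[j]) for h in haplotypes}) == 4]

def min_breakpoints(constraints):
    """Greedy: scan constraints by right endpoint; break just before an uncrossed one."""
    breaks = []
    for i, j in sorted(constraints, key=lambda c: c[1]):
        # The last break is the right-most one so far; it crosses (i, j) iff i < break.
        if not breaks or breaks[-1] <= i:
            breaks.append(j)
    return breaks

# The six haplotypes from the slide, one per row.
haplotypes = [
    "ACTAGATAGCCT",
    "GTTCGACAACAT",
    "ACTCTATGATCG",
    "GTTATACGACAT",
    "ACTCTATAGTAT",
    "ACTAGCTGGCAT",
]
constraints = four_gamete_constraints(haplotypes)
print(min_breakpoints(constraints))
```

Placing each break as far right as the current constraint allows is the standard interval-stabbing argument: it crosses the current constraint and as many later ones as possible, which is why the greedy result is minimum-size.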
11
Tagging SNPs
ACGATCGATCATGAT
GGTGATTGCATCGAT
ACGATCGGGCTTCCG
ACGATCGGCATCCCG
GGTGATTATCATGAT

A------A---TG--
G------G---CG--
A------G---TC--
A------G---CC--
G------A---TG--
An example of a real data set and its haplotype block structure. Colors refer to the founding population, one color for each founding haplotype. Only 4 SNPs are needed to tag all the different haplotypes.
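As a small check of the example above, the following sketch projects the five haplotypes onto the four marked positions and confirms that the projections are all different. The helper names `mask` and `tags_distinguish` are hypothetical, positions are 0-indexed, and the sketch only verifies that the marked SNPs separate the haplotypes; it does not attempt to show that the set is the smallest possible.

```python
def mask(haplotype, tag_sites):
    """Keep alleles at the tag sites and replace every other position with '-'."""
    return "".join(a if k in tag_sites else "-" for k, a in enumerate(haplotype))

def tags_distinguish(haplotypes, tag_sites):
    """True if distinct haplotypes stay distinct when restricted to the tag sites."""
    projected = {tuple(h[k] for k in tag_sites) for h in haplotypes}
    return len(projected) == len(set(haplotypes))

haplotypes = [
    "ACGATCGATCATGAT",
    "GGTGATTGCATCGAT",
    "ACGATCGGGCTTCCG",
    "ACGATCGGCATCCCG",
    "GGTGATTATCATGAT",
]
tag_sites = [0, 7, 11, 12]  # the four columns left visible on the slide

for h in haplotypes:
    print(mask(h, tag_sites))
print(tags_distinguish(haplotypes, tag_sites))  # True
```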
12
Informativeness
A measure of the “information” a SNP contains about another SNP. Useful for designing SNP arrays and for selecting tagging SNPs.
[Figure: two haplotypes, h1 and h2, with rows 0 1 0 0 1 and 0 1 1 0 0, and a SNP s marked.]
13
Informativeness
s1 s2 s3 s4 s5
1  0  0  0  0
0  1  0  0  1
0  1  1  0  0
1  0  1  1  1
I(s1, s2) = 2/4 = 1/2
14
Informativeness
s1 s2 s3 s4 s5
1  0  0  0  0
0  1  0  0  1
0  1  1  0  0
1  0  1  1  1
I({s1, s2}, s4) = 3/4
15
Informativeness
s1 s2 s3 s4 s5
1  0  0  0  0
0  1  0  0  1
0  1  1  0  0
1  0  1  1  1
I({s3, s4}, {s1, s2, s5}) = 3
S = {s3, s4} is a Minimal Informative Subset.
16
Informativeness: graph theory insight
Minimum Set Cover = Minimum Informative Subset
s1 s2 s3 s4 s5
1  0  0  0  0
0  1  0  0  1
0  1  1  0  0
1  0  1  1  1
[Figure: graph relating SNP nodes s1 to s5 to edge nodes e1 to e6.]
17
Informativeness: graph theory insight
Minimum Set Cover {s3, s4} = Minimum Informative Subset
s1 s2 s3 s4 s5
1  0  0  0  0
0  1  0  0  1
0  1  1  0  0
1  0  1  1  1
[Figure: graph relating SNP nodes s1 to s5 to edge nodes e1 to e6.]
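To make the set-cover view concrete, here is a minimal brute-force sketch of minimum set cover. The function name `minimum_set_cover` is hypothetical, and so is the assignment of edges to SNPs in `covers`: the slide's graph is not recoverable from the transcript, so these edge sets are made up for illustration and are not derived from the matrix above. Exhaustive search is exponential in the number of SNPs and is only sensible for toy inputs.

```python
from itertools import combinations

def minimum_set_cover(universe, covers):
    """Smallest set of keys of `covers` whose sets jointly contain `universe`."""
    names = sorted(covers)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            covered = set().union(*(covers[n] for n in subset))
            if covered >= universe:
                return set(subset)
    return None  # no cover exists

# Hypothetical edge sets: which edges e1..e6 each SNP is assumed to cover.
edges = {"e1", "e2", "e3", "e4", "e5", "e6"}
covers = {
    "s1": {"e1", "e2"},
    "s2": {"e2", "e3"},
    "s3": {"e1", "e3", "e5"},
    "s4": {"e2", "e4", "e6"},
    "s5": {"e5", "e6"},
}
print(minimum_set_cover(edges, covers))
```

With these made-up edge sets the smallest cover is the pair s3 and s4, since e4 is covered only by s4 and s3 is the only SNP covering the remaining three edges.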