Fast Tag SNP Selection Wang Yue Joint work with Postdoc Guimei Liu and Prof Limsoon Wong.


1 Fast Tag SNP Selection
Wang Yue
Joint work with Postdoc Guimei Liu and Prof Limsoon Wong

2 NUS Presentation Title 2006 Outline
– Some preliminary definitions
– Previous work
– Our work: FastTagger
– Experiment results
– Tag SNP application
– Future work

3 SNP (single-nucleotide polymorphism)

4 Why research SNPs?
– Variation among humans can affect how they develop certain diseases and respond to pathogens, chemicals, drugs, vaccines, and other agents.
– Example: researchers found that people with specific alterations (SNPs) have a 50% higher relative risk of developing glioblastoma, a type of brain cancer.
– A promising route to realizing "personalized medicine".
– Important in crop and livestock breeding programs.

5 Tag SNP
– A tag SNP is a representative SNP in a region of the genome with high linkage disequilibrium.
– Tag SNPs make it possible to identify genetic variation without genotyping every SNP in a chromosomal region.
– Tag SNPs are useful in whole-genome SNP association studies, in which hundreds of thousands of SNPs across the entire genome are genotyped.

6 Tag SNP: linkage disequilibrium
– In population genetics, linkage disequilibrium is the non-random association of alleles at two or more loci, not necessarily on the same chromosome.
– It is usually measured with r²:
$$r^2 = \frac{\left(P(XY)\,P(xy) - P(Xy)\,P(xY)\right)^2}{P(X)\,P(x)\,P(Y)\,P(y)}$$
where P(XY), P(Xy), P(xY), and P(xy) are the frequencies of the four possible haplotypes, and P(X) = P(XY) + P(Xy), P(x) = P(xY) + P(xy), P(Y) = P(XY) + P(xY), P(y) = P(Xy) + P(xy).
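As a small illustration of the r² measure on this slide, the sketch below computes pairwise r² from the four haplotype frequencies using the standard definition r² = D² / (P(X) P(x) P(Y) P(y)), with D = P(XY) P(xy) − P(Xy) P(xY). The function name `r_squared` and its argument names are hypothetical, chosen only to mirror the slide's notation.

```python
def r_squared(p_XY: float, p_Xy: float, p_xY: float, p_xy: float) -> float:
    """Pairwise r^2 from the four haplotype frequencies (expected to sum to 1)."""
    # Marginal allele frequencies at the two loci.
    p_X = p_XY + p_Xy
    p_x = p_xY + p_xy
    p_Y = p_XY + p_xY
    p_y = p_Xy + p_xy
    denom = p_X * p_x * p_Y * p_y
    if denom == 0:
        # A locus with only one allele present has no defined correlation.
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    d = p_XY * p_xy - p_Xy * p_xY  # linkage disequilibrium coefficient D
    return d * d / denom
```

Two perfectly correlated SNPs (only haplotypes XY and xy observed) give r² = 1, while four equally frequent haplotypes give r² = 0.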

7 Tag SNP selection
– Given a dataset, we can find a huge number of tagging relations among SNPs, as long as we can enumerate the r² values between SNPs.
– In practice, we want to select the smallest set of high-quality SNPs that can tag the remaining SNPs. In other words, if we know this smallest set of SNPs, we can infer the rest based on the r² values.

8 Tag SNP Selection: a more formal description
Given a set S of SNPs, find the smallest set of tag SNPs S_tag such that for every SNP_j ∈ S − S_tag there is at least one SNP set S_j ⊆ S_tag such that
– r²(S_j, SNP_j) ≥ min_r²
– |S_j| ≤ max_size
– the distance between every pair of SNPs in S_j ∪ {SNP_j} is no larger than max_dist
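The three constraints above can be read as a validity check on a candidate tag set. The following is a minimal sketch under stated assumptions, not the authors' code: the `Rule` tuple, the `is_valid_tag_set` function, and the idea that multi-marker r² values arrive precomputed in a rule list are all hypothetical. SNP positions are integer coordinates on one chromosome, so the pairwise distance limit reduces to "maximum position minus minimum position".

```python
from typing import Dict, FrozenSet, Iterable, NamedTuple, Set


class Rule(NamedTuple):
    lhs: FrozenSet[str]  # S_j, the tagging SNPs
    rhs: str             # SNP_j, the tagged SNP
    r2: float            # precomputed r^2(S_j, SNP_j)


def is_valid_tag_set(positions: Dict[str, int], tags: Set[str],
                     rules: Iterable[Rule], min_r2: float,
                     max_size: int, max_dist: int) -> bool:
    """True if every non-tag SNP is covered by at least one admissible rule."""
    covered = set(tags)
    for rule in rules:
        if rule.rhs in covered:
            continue
        if rule.r2 < min_r2 or len(rule.lhs) > max_size:
            continue
        if not rule.lhs <= tags:  # S_j must be a subset of S_tag
            continue
        coords = [positions[s] for s in rule.lhs] + [positions[rule.rhs]]
        # On a line, all pairwise distances <= max_dist iff the span is.
        if max(coords) - min(coords) <= max_dist:
            covered.add(rule.rhs)
    return covered >= set(positions)
```

For example, with rules {a} -> b and {a, b} -> c, the set {a} is not a valid tag set (c needs both a and b as tags), while {a, b} is valid if the three SNPs lie within max_dist of each other.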

9 Previous Work
– Step 1: Calculate correlations between SNPs within a certain distance
– Step 2: Find the smallest set of tag SNPs using the correlations calculated in Step 1
– Most algorithms use a greedy approach in Step 2 to find a near-optimal set of tag SNPs
– Earlier tag SNP selection methods rely on pairwise correlations
– MultiTag and MMTagger find multi-marker rules
  – {SNP_1, SNP_2, SNP_3} -> SNP_x
  – Cannot handle more than 100k SNPs
  – MultiTag takes hundreds of hours for 30k SNPs
  – MMTagger takes hours and 1 GB of memory for 30k SNPs

10 FastTagger
The same two major steps:
– Step 1: Borrow typical data mining techniques to mine tagging rules based on r² values
– Step 2: Use a greedy algorithm to select a small set of tag SNPs from the tagging rules generated in Step 1
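The slide does not spell out the greedy step, so the sketch below shows only the generic set-cover style greedy heuristic, simplified to single-marker rules (max_size = 1). It should not be read as FastTagger's actual procedure. The function name `greedy_tag_snps` and the `(tag, target)` rule format are hypothetical, and the rules are assumed to have already passed the r² and distance filters.

```python
from typing import Dict, Iterable, List, Set, Tuple


def greedy_tag_snps(snps: Iterable[str],
                    pair_rules: Iterable[Tuple[str, str]]) -> List[str]:
    """Greedy cover: repeatedly pick the SNP that tags the most uncovered SNPs.

    pair_rules holds (tag, target) pairs that already satisfy the thresholds.
    """
    # Every SNP trivially covers itself, so the loop always makes progress.
    covers: Dict[str, Set[str]] = {s: {s} for s in snps}
    for tag, target in pair_rules:
        covers[tag].add(target)

    uncovered = set(covers)
    chosen: List[str] = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (first name wins).
        best = max(sorted(covers), key=lambda s: len(covers[s] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen
```

Handling multi-marker rules greedily is harder, because adding one SNP of a two-SNP left-hand side covers nothing until its partner is also selected; this sketch deliberately leaves that case out.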

11 Why does FastTagger beat previous work?
Previous methods such as MMTagger generate many redundant tagging relations. FastTagger avoids this in four ways:
1. Merge nearby equivalent SNPs
2. Prune redundant correlation rules
3. Skip rules whose right-hand side (RHS) has already been covered many times
4. If the total size of the rules exceeds memory, divide the chromosome into blocks and find tag SNPs within each block
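To make the first technique concrete, here is a rough sketch of merging nearby equivalent SNPs. The slide does not define "equivalent"; this example assumes it means perfectly correlated (r² = 1), which for two polymorphic biallelic SNPs coded 0/1 across the same haplotypes means their allele columns are identical or exact complements. The function name, the input layout, and the rule that each SNP must lie within `max_dist` of its group's first member are all assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Snp = Tuple[str, int, Tuple[int, ...]]  # (name, position, 0/1 allele per haplotype)


def merge_equivalent_snps(snps: Sequence[Snp], max_dist: int) -> List[List[str]]:
    """Group SNPs whose allele columns are identical or complementary.

    Assumes every SNP has a non-empty, polymorphic allele column.
    """
    groups: List[List[str]] = []
    # canonical column -> (index of latest group, position of its first member)
    latest: Dict[Tuple[int, ...], Tuple[int, int]] = {}
    for name, pos, alleles in sorted(snps, key=lambda s: s[1]):
        # Flip columns so the first haplotype is 0: identical and
        # complementary columns then share one dictionary key.
        key = alleles if alleles[0] == 0 else tuple(1 - a for a in alleles)
        hit = latest.get(key)
        if hit is not None and pos - hit[1] <= max_dist:
            groups[hit[0]].append(name)
        else:
            latest[key] = (len(groups), pos)
            groups.append([name])
    return groups
```

Each merged group can then be treated as a single SNP when mining rules, which is one plausible reason the number of rules and the runtime drop.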

12 Experiment Setting
Japanese and Han in HapMap release 21
– 45 unrelated individuals
– 6 chromosomes

13 Experiment Results: running time and number of tag SNPs
Comparison with the state-of-the-art method, MMTagger

14 Experiment Results: memory consumption
– MMTagger consumes much more memory and failed on large chromosomes when max_size = 3
– Step 2 of FastTagger consumes much more memory than Step 1, because Step 2 must keep the generated rules in memory

15 Effectiveness of Merging Nearby Equivalent SNPs
The number of rules, the number of tag SNPs, and the runtime are all significantly reduced

16 Effectiveness of Skipping Rules
Memory usage and runtime are significantly reduced, while the number of tag SNPs increases only marginally

17 Effectiveness of Pruning Redundant Rules
Memory usage and the number of rules are significantly reduced

18 Conclusions
Compared with existing genome-wide tag SNP selection algorithms that use multi-marker correlations, FastTagger
– is many times faster
– consumes much less memory
– can work on chromosomes with more than 100k SNPs
Merging equivalent SNPs is the most effective technique for reducing running time and memory consumption

19 Tag SNP Application
– Use the tagging rules generated by our data mining technique to infer extra SNPs from an existing SNP list
– We obtained SNP lists from two major SNP chip companies:
  – Illumina: 1145784 SNPs
  – Affymetrix: 927654 SNPs
– How many extra SNPs can we infer?

20 Experiment Setting
– Our rules are generated from the Japanese and Han data set in HapMap release 21; unlike the previous experiment, we use 22 chromosomes
– Two factors determine how many extra SNPs we can infer:
1. r² threshold: commonly set to 0.8 empirically; we use 0.80, 0.85, 0.90, and 0.95
2. Rule size: we use 1 and 2
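The counting behind this experiment can be sketched roughly as follows: a SNP that is not on the chip counts as inferable if some rule predicts it from SNPs that are all on the chip, subject to the r² threshold and rule-size limit. This is only an illustration of the idea under assumed data structures; the function name `inferable_snps` and the `(lhs, rhs, r2)` rule format are hypothetical, not the authors' implementation.

```python
from typing import FrozenSet, Iterable, Set, Tuple

TaggingRule = Tuple[FrozenSet[str], str, float]  # (lhs SNPs, rhs SNP, r^2)


def inferable_snps(chip: Set[str], rules: Iterable[TaggingRule],
                   min_r2: float, max_len: int) -> Set[str]:
    """SNPs absent from the chip that some admissible rule can infer from it."""
    extra: Set[str] = set()
    for lhs, rhs, r2 in rules:
        if rhs in chip:
            continue  # already genotyped, nothing to infer
        if r2 >= min_r2 and len(lhs) <= max_len and lhs <= chip:
            extra.add(rhs)
    return extra
```

Raising `min_r2` or lowering `max_len` can only shrink the returned set, so the count is expected to fall as the threshold tightens and to be larger for rule size 2 than for rule size 1.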

21 Extra SNPs inferred, by r² threshold and rule length

r² threshold   Chip         Length 2   Length 1
0.80           Affymetrix   1006866    382962
0.80           Illumina      993321    417107
0.85           Affymetrix    927026    310671
0.85           Illumina      923971    340070
0.90           Affymetrix    821042    226306
0.90           Illumina      827858    248512
0.95           Affymetrix    118994    112896
0.95           Illumina      131263    125200

22 Future Work
– Test the accuracy of our selected SNPs against state-of-the-art work
– Support adaptive user requirements for selecting SNPs, such as "I have only 1 million, just give me the 1000 most informative SNPs"
– Study how the division of chromosomes influences the number of tag SNPs
– More to explore

23 Many thanks to
– My supervisor: Prof Limsoon Wong
– My senior: Guimei Liu
Some slides are adapted from Prof Wong's notes and Wikipedia.
Thank you for listening. Q&A

