Fast test for multiple locus mapping By Yi Wen Nisha Rajagopal.

1 Fast test for multiple locus mapping By Yi Wen Nisha Rajagopal

2 Introduction Single nucleotide polymorphisms (SNPs) can serve as genetic markers that are correlated with the incidence of a disease. For many complex diseases, multiple loci (genes) jointly confer susceptibility to the disease. Identifying multiple SNPs correlated with a disease is a computational challenge (e.g. detecting 5 SNPs in a dataset of 1M SNPs and 10^4 individuals takes about 10^4 × 10^(5×6) = 10^34 computations).
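The operation count on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are illustrative, and the assumption is that every k-tuple of SNPs is checked once in every individual.

```python
from math import comb

def approx_cost(n_snps, n_individuals, k):
    """Slide-style estimate: all ordered k-tuples of SNPs, each checked in every individual."""
    return n_individuals * n_snps ** k

def exact_cost(n_snps, n_individuals, k):
    """Same count using unordered k-subsets of SNPs (binomial coefficient)."""
    return n_individuals * comb(n_snps, k)

if __name__ == "__main__":
    # 1M SNPs, 10^4 individuals, 5-SNP combinations
    print(approx_cost(10**6, 10**4, 5) == 10**34)
    print(exact_cost(10**6, 10**4, 5))
```

Counting unordered subsets lowers the estimate by a factor of about 5! = 120, which leaves exhaustive search just as infeasible.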

3 Rana's Paper "Population-based sample reveals gene-gender interactions in blood pressure in White Americans" (Hypertension, 49:96–106, Jan 2007). Samples were obtained from people in the highest and lowest percentiles of diastolic blood pressure (DBP). Genotypes were analysed by statistical methods (power analysis and ANOVA). They scored ~60000 genotypes and found 48 SNPs at 33 autosomal and 2 X-linked genes in BP regulation pathways. They detected gene-by-gene, gender-specific interactions of SNPs.

4 Two-stage two-locus models in genome-wide association (Evans et al.) Epistasis may play a role in complex diseases, so pairwise detection may be better than the single-locus method. Two 2-stage strategies: (1) all loci above a low threshold in the single-locus test are selected and then compared pairwise; (2) loci above the threshold in the single-locus test are tested pairwise with all other loci.
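The two strategies differ only in which candidate pairs reach the pairwise test. A minimal sketch follows, assuming each locus already has a single-locus score where higher means stronger association; the function names and the score list are hypothetical and not taken from Evans et al.

```python
from itertools import combinations

def strategy_one(scores, threshold):
    """Pairs formed only among loci that pass the single-locus threshold."""
    selected = [i for i, s in enumerate(scores) if s > threshold]
    return set(combinations(selected, 2))

def strategy_two(scores, threshold):
    """Each locus that passes the threshold is paired with every other locus."""
    selected = {i for i, s in enumerate(scores) if s > threshold}
    return {
        (min(i, j), max(i, j))
        for i in selected
        for j in range(len(scores))
        if i != j
    }

if __name__ == "__main__":
    scores = [0.9, 0.1, 0.7, 0.2]  # made-up single-locus scores
    print(sorted(strategy_one(scores, 0.5)))
    print(sorted(strategy_two(scores, 0.5)))
```

With m loci passing out of n, the first strategy tests on the order of m² pairs and the second on the order of m·n, compared with n² for the exhaustive search.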

5 Models MA M1 M27 51 bi-allelic cases Power: Number of loci/pairs that are detected compared to the total number of loci/pairs. Epistatic variance: Amount of genetic variance not accounted for by the single-locus components.

6

7 Partitioning of the Variance for Four Quantitative Trait Models

8 Results Power to detect both loci is higher, even when there is no epistatic interaction! Power to detect either locus is higher in the single-locus search when there is no epistatic interaction. The exhaustive two-locus strategy is better than the two-stage strategy: a better filtering strategy is needed!

9 Risk factor searching heuristics for SNP control studies (Brinza et al.) Atomic risk factor: a diplotype, i.e. a subset of the SNP columns of S with fixed values. Odds ratio = (|d(C)| / |d(D−C)|) / (|h(C)| / |h(H−C)|). MORARF (Maximum Odds Ratio Atomic Risk Factor problem): given a genotype case/control study on a sample population S, find the atomic risk factor with the maximum odds ratio. Heuristics: Complementary Greedy Solution (CGS), k-relaxed CGS, Weighted CGS.
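To make the odds ratio concrete, here is a small sketch. It assumes genotypes are tuples of values in {0, 1, 2}, that an atomic risk factor is a dictionary mapping SNP index to a fixed value, and that d(D−C) and h(H−C) count the diseased and healthy individuals outside the cluster C. All names and the toy data are hypothetical.

```python
def matches(genotype, factor):
    """True if the genotype carries every fixed SNP value of the risk factor."""
    return all(genotype[s] == v for s, v in factor.items())

def odds_ratio(diseased, healthy, factor):
    """Odds of matching the factor among cases divided by the odds among controls."""
    d_in = sum(matches(g, factor) for g in diseased)
    h_in = sum(matches(g, factor) for g in healthy)
    d_out = len(diseased) - d_in
    h_out = len(healthy) - h_in
    num = d_in * h_out  # cross-product form of (d_in/d_out) / (h_in/h_out)
    den = d_out * h_in
    if den == 0:
        # Convention chosen for this sketch: infinite if any case matches, else undefined.
        return float("inf") if num > 0 else float("nan")
    return num / den

if __name__ == "__main__":
    diseased = [(2, 0), (2, 1), (0, 1)]
    healthy = [(2, 0), (0, 0), (1, 1)]
    print(odds_ratio(diseased, healthy, {0: 2}))        # SNP 0 fixed to value 2
    print(odds_ratio(diseased, healthy, {0: 2, 1: 1}))  # control-free cluster
```

A control-free cluster has an infinite odds ratio under this convention, which is why the greedy heuristics aim for clusters containing no controls.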

10 Algorithm for CGS
Input: Sample population S partitioned into subsets H and D
Output: Control-free cluster C
1. C ← S
2. Repeat while h(C) > 0:
3.   Find the 1-SNP combination X = (s, i), where s is a SNP and i ∈ {0, 1, 2}, minimizing (d(C) − d(C ∩ X))/(h(C) − h(C ∩ X))
4.   C ← C ∩ X
5. Output C
Fig. 1. Complementary Greedy Search (CG
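The greedy loop can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: genotypes are assumed to be tuples over {0, 1, 2}, ties are broken by taking the first minimum, and candidates that remove no controls are skipped so that the ratio is always defined.

```python
def cgs(diseased, healthy):
    """Greedily add (snp, value) constraints until no controls remain in the cluster."""
    n_snps = len(diseased[0])
    D, H = list(diseased), list(healthy)  # cases and controls still in the cluster C
    chosen = []
    while H:
        best = None
        for s in range(n_snps):
            for v in (0, 1, 2):
                h_removed = sum(g[s] != v for g in H)  # h(C) - h(C ∩ X)
                if h_removed == 0:
                    continue  # this constraint removes no controls
                d_removed = sum(g[s] != v for g in D)  # d(C) - d(C ∩ X)
                ratio = d_removed / h_removed
                if best is None or ratio < best[0]:
                    best = (ratio, s, v)
        _, s, v = best
        D = [g for g in D if g[s] == v]
        H = [g for g in H if g[s] == v]
        chosen.append((s, v))
    return chosen, D, H

if __name__ == "__main__":
    diseased = [(2, 0, 1), (2, 1, 1), (2, 2, 0)]
    healthy = [(0, 0, 1), (1, 1, 0), (2, 0, 0)]
    factor, cases_left, controls_left = cgs(diseased, healthy)
    print(factor, cases_left, controls_left)
```

Every iteration removes at least one control, so the loop terminates. The resulting cluster may still contain few or no cases, so its risk factor should be validated separately.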

11 Algorithm for k-CGS
Input: Sample population S partitioned into subsets H and D, and a positive integer k
Output: Control-free cluster C of k-RARF with diplotype x
1. C ← S, x ← 0
2. Repeat while h(C) > 0:
     C_k ← the set of all genotypes in C with exactly k mismatches with x
3.   Find a SNP s and its value v ∈ {0, 1, 2}, with cluster C_{s=v}, minimizing (d(C_k) − d(C_k ∩ C_{s=v}))/(h(C_k) − h(C_k ∩ C_{s=v}))
4.   C ← C − (C_k − C_{s=v})
5. Output C
Fig. 1. k-Complementary Greedy Search (k-CGS)

12 Project strategy Get data → Filter SNPs interacting with disease → Find interacting pairs of SNPs → Find SNP motif for disease

13 The Dataset The ms coalescent simulator (Hudson, 2002) is used to generate the dataset. Parameters: mutation rate; recombination rate. Fixed population size. Number of SNPs: ~1M

14 Project Goal: To filter SNPs, find epistatic interactions between pairs, and find a SNP motif for the disease. I) Filtering Strategy: 1. Use modified k-CGS/W-CGS to find SNPs that interact with the phenotype. 2. Use simple single-locus correlation.
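The second filtering option, simple single-locus correlation, might look like the sketch below. It assumes genotypes coded 0/1/2, a binary phenotype (1 = diseased), and Pearson correlation as the score; the slides do not say which correlation measure was intended, so that choice and the function names are assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def top_snps(genotypes, phenotype, m):
    """Indices of the m SNPs with the largest absolute correlation to the phenotype."""
    n_snps = len(genotypes[0])
    scores = [
        abs(pearson([g[s] for g in genotypes], phenotype)) for s in range(n_snps)
    ]
    return sorted(range(n_snps), key=lambda s: -scores[s])[:m]

if __name__ == "__main__":
    genotypes = [(0, 1, 2), (1, 1, 0), (2, 0, 2), (2, 2, 0)]
    phenotype = [0, 0, 1, 1]
    print(top_snps(genotypes, phenotype, 1))
```

A filter of this kind scores each SNP independently, so it can miss SNPs whose effect appears only in combination with another locus.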

15 II) Pair-finding Strategy: 1. Paired SNP algorithm 2. Pairwise correlation (Evans et al.) Use different combinations of filtering and pair-finding → using W-CGS, see if paired SNPs have similar weights.

16 Find paired interacting SNPs by Paired SNP Algorithm Labeled Hamming distance Paired SNP algorithm

17 Paired SNP Algorithm (cont'd) The probability that a pair passes the filter in a single iteration is p = e^(−kd). In l iterations, the expected number of times a pair appears together is μ = l·e^(−kd). A pair of SNPs with a high Hamming distance therefore has a lower expected count.
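A quick numerical check of the two formulas on this slide, as a sketch only: the slide does not state the scale of d (raw or normalized Hamming distance) or typical values of k and l, so the numbers below are made up for illustration.

```python
from math import exp

def pass_probability(k, d):
    """p = e^(-k*d): chance that a pair at distance d passes the filter in one iteration."""
    return exp(-k * d)

def expected_count(l, k, d):
    """mu = l * e^(-k*d): expected co-occurrences of the pair over l iterations."""
    return l * pass_probability(k, d)

if __name__ == "__main__":
    # Illustrative values: l = 1000 iterations, k = 2, increasing distance d
    for d in (0.0, 0.1, 0.5):
        print(d, expected_count(1000, 2, d))
```

Because μ decays exponentially in d, pairs that co-occur far more often than their distance predicts stand out as candidates.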

18 III) Motif Finding Algorithm Assume diseased genotypes are enriched with a set of SNP motifs. Implement existing motif-finding algorithms to search for potentially interacting SNPs. May need to reduce the number of SNPs to below 10^3, using: correlations; k-CGS/W-CGS.

19 Testing method 1. Cross-validation with different values. 2. Random validation. 3. Compute the statistical significance of the risk factors.

20 Thanks to Dr. Vineet Bafna for his guidance and encouragement! Thank you all for your attention! Any questions?

