
Linkage stuff. Vibhav Gogate.


1 Linkage stuff Vibhav Gogate

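2 A Review of the Genetic Model [Figure: two chain-structured models indexed 1, 2, 3, ..., i-1, i, i+1. One has nodes X_1, X_2, X_3, ..., X_{i-1}, X_i, X_{i+1} with Y_1, Y_2, Y_3, ..., Y_{i-1}, Y_i, Y_{i+1}; the other has the same X nodes with S_1, S_2, S_3, ..., S_{i-1}, S_i, S_{i+1}. Figure labels: "All except yellow nodes", "Our View", "E A Thompson et al.'s view".]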

3 A little more notation. Divide the S variables: S_{i,j} denotes the inheritance indicator in meiosis i at locus j. S_{i,j} = 0 if the DNA at meiosis i, locus j is the parent's maternal DNA; S_{i,j} = 1 if it is the parent's paternal DNA. S_{.,j} = {S_{i,j} | i = 1, ..., m} and S_{i,.} = {S_{i,j} | j = 1, ..., l}, assuming that there are m meioses and l loci.
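The two ways of slicing the indicators can be sketched in a few lines of Python. This is only an illustration of the notation: the nested-list layout, the function names `locus_block` and `meiosis_block`, and the example values are assumptions, and the code uses 0-based indices where the slide uses 1-based ones.

```python
# S[i][j] is the inheritance indicator for meiosis i at locus j
# (0 = parent's maternal DNA, 1 = parent's paternal DNA).
# Hypothetical example with m = 2 meioses and l = 3 loci.
S = [
    [0, 1, 1],
    [1, 1, 0],
]


def locus_block(indicators, j):
    """Return S_{.,j}: the indicators of every meiosis at locus j."""
    return [row[j] for row in indicators]


def meiosis_block(indicators, i):
    """Return S_{i,.}: the indicators of meiosis i at every locus."""
    return list(indicators[i])


print(locus_block(S, 1))    # indicators at the second locus
print(meiosis_block(S, 0))  # indicators of the first meiosis
```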

4 More on S [Figure: network fragment with nodes L_21f, L_21m, X_21, L_22f, L_22m, X_22, L_23f, L_23m, X_23, S_23f, S_23m.] The variables in the circle are S_{.,2}, i.e. the set of meiosis indicators at locus 2.

5 Gibbs sampling: Review. Generate T samples {x^t} from P(X|e): t=1: x^1 = {x^1_1, x^1_2, ..., x^1_k}; t=2: x^2 = {x^2_1, x^2_2, ..., x^2_k}; ... After sampling, average: [Figure: network over nodes X1, X2, ..., X9.]
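A minimal sketch of the sample-then-average recipe on a toy problem. Everything here is an assumption made for illustration: the two binary variables A and B, the unnormalised table `WEIGHTS` standing in for P(A, B | e), the burn-in length, and the choice to average a simple indicator of A = 1.

```python
import random

# Hypothetical unnormalised target P(A, B | e) over two binary variables.
WEIGHTS = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 5.0}


def sample_bit(weight0, weight1, rng):
    """Draw 0 or 1 with probability proportional to the two weights."""
    return 1 if rng.random() < weight1 / (weight0 + weight1) else 0


def gibbs_marginal_a(weights, num_samples, burn_in=500, seed=0):
    """Estimate P(A = 1 | e) by averaging over Gibbs samples."""
    rng = random.Random(seed)
    a, b = 0, 0
    count_a1 = 0
    for t in range(burn_in + num_samples):
        # Resample each variable from its conditional given the other.
        a = sample_bit(weights[(0, b)], weights[(1, b)], rng)
        b = sample_bit(weights[(a, 0)], weights[(a, 1)], rng)
        if t >= burn_in:
            count_a1 += a
    return count_a1 / num_samples


estimate = gibbs_marginal_a(WEIGHTS, 20000)
exact = (WEIGHTS[(1, 0)] + WEIGHTS[(1, 1)]) / sum(WEIGHTS.values())
print(estimate, exact)
```

The exact marginal is available here only because the toy table is tiny; in the linkage setting the estimate is all one gets.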

6 Gibbs Properties: Review. Good: Gibbs sampling is guaranteed to converge to P(X|e) as long as the Gibbs chain for P(X|e) is ergodic. Shortcomings: it is hard to estimate how many samples are enough; the variance is too large in high dimensions; convergence to P(X|e) is not guaranteed when the model contains deterministic information.
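The last shortcoming can be seen on a two-variable toy case. The sketch below is an assumed illustration, not taken from the slides: when the table forces A = B, a single-variable Gibbs sampler started at (0, 0) can never reach (1, 1), even though both states carry half the probability mass.

```python
import random


def visited_states(weights, start, sweeps, seed=0):
    """Run single-variable Gibbs sweeps and return the set of states visited."""
    rng = random.Random(seed)
    a, b = start
    visited = set()
    for _ in range(sweeps):
        w0, w1 = weights[(0, b)], weights[(1, b)]
        a = 1 if rng.random() < w1 / (w0 + w1) else 0
        w0, w1 = weights[(a, 0)], weights[(a, 1)]
        b = 1 if rng.random() < w1 / (w0 + w1) else 0
        visited.add((a, b))
    return visited


# Deterministic constraint A = B: the off-diagonal states have zero weight.
deterministic = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}
# Same shape, but the constraint is only soft, so the chain is irreducible.
soft = {(0, 0): 1.0, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 1.0}

stuck = visited_states(deterministic, (0, 0), 1000)
mixed = visited_states(soft, (0, 0), 1000)
print(stuck)
print(sorted(mixed))
```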

7 Rao-Blackwellisation (RB) (Casella & Robert, 1996). Rao-Blackwellisation provides salvation in some cases: partition X into C and Z such that we can compute P(c|e) and P(Z|c,e) efficiently. Sample C and sum out Z (Rao-Blackwellisation). Rao-Blackwellised estimate:
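The formula on this slide is not part of the transcript. For orientation, the commonly used form of a Rao-Blackwellised estimate is written out here; it is the standard textbook expression, assumed rather than copied from the slide, with c^t denoting the t-th sample of C and z a value of (a subset of) Z.

```latex
\hat{P}(z \mid e) \;=\; \frac{1}{T} \sum_{t=1}^{T} P(z \mid c^{t}, e),
\qquad c^{t} \sim P(C \mid e)
```

Each term is an exact conditional probability rather than a 0/1 indicator, which is where the variance reduction comes from.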

8 Gibbs sampling with RB on Linkage (E A Thompson et al.). Two versions. L-sampler (locus sampler): the RB set is chosen from S_{.,j}. M-sampler (meiosis sampler): the RB set is chosen from S_{i,.}.

9

10 L-sampler. A single locus j is selected, and the inheritance indicators at that locus are updated conditional on the genotype data at all loci and on the current realization of the inheritance indicators at all loci other than j.

11 L-sampler. The L-sampler can be implemented on any pedigree on which single-locus peeling is feasible. Provided each inter-locus recombination fraction is strictly positive, the sampler is clearly irreducible. However, if the loci are tightly linked, mixing performance will be poor.
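A rough way to see why tight linkage hurts mixing is to look at one meiosis and one interior locus in isolation. The sketch below rests on simplifying assumptions that are not in the slides: genotype data are ignored, the recombination fraction theta is the same on both sides of the locus, recombination events are independent, and the two neighbouring indicators currently share the same value. Under those assumptions the chance that a locus update changes the indicator is theta^2 / ((1 - theta)^2 + theta^2).

```python
def flip_probability(theta):
    """P(S_j differs from its neighbours | both neighbours agree), data ignored.

    Staying equal to both neighbours needs no recombination on either side;
    flipping needs a recombination on both sides.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    stay = (1.0 - theta) ** 2
    flip = theta ** 2
    return flip / (stay + flip)


for theta in (0.5, 0.1, 0.01):
    print(theta, flip_probability(theta))
```

With unlinked loci the update is a fair coin flip, while at theta = 0.01 the indicator changes roughly once in ten thousand updates, so the chain stays irreducible but moves very slowly.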

12 M-sampler. At each iteration a single meiosis is selected, and the inheritance indicators for that meiosis are updated conditional on the genotype data at all loci and on the current realization of the inheritance indicators for all other meioses.

13 Other Advances. Use of a Metropolis-Hastings step; Restart; Sequential Imputation.

14 The Actual Bayesnet output by Superlink

15 The Variables output by Superlink. Genetic Loci: for each individual i and locus j, two random variables G_{i,jp} and G_{i,jm} whose values are the specific alleles at locus j in individual i's paternal and maternal haplotypes respectively. Marker Phenotypes: for each individual i and marker locus j, a random variable P_{i,j} whose value is the specific unordered pair of alleles measured at locus j of individual i. Disease Phenotypes: for each individual i, a binary random variable P_i whose values are affected or unaffected. Selector Variables: for each individual i and marker locus j, two binary random variables S_{i,jp} and S_{i,jm}, the values of which are determined as follows. If a denotes i's father and b denotes i's mother, then S_{i,jp} = 0 if G_{i,jp} = G_{a,jp}, and S_{i,jp} = 1 if G_{i,jp} = G_{a,jm}. S_{i,jm} is defined in a similar way, with b replacing a.
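The selector rule can be read as a small lookup: the selector says which of the parent's two alleles the child received. The function name `inherited_allele` and the string allele labels below are hypothetical, chosen only to illustrate the definition.

```python
def inherited_allele(parent_paternal, parent_maternal, selector):
    """Return the allele passed on by a parent at one locus.

    selector = 0 picks the parent's paternal allele (G_{a,jp} for the father),
    selector = 1 picks the parent's maternal allele (G_{a,jm} for the father).
    """
    if selector not in (0, 1):
        raise ValueError("selector must be 0 or 1")
    return parent_paternal if selector == 0 else parent_maternal


# Father carries alleles "A1" (paternal haplotype) and "A2" (maternal haplotype).
child_paternal_allele = inherited_allele("A1", "A2", selector=1)
print(child_paternal_allele)
```

The same function covers S_{i,jm} by passing the mother's two alleles instead of the father's.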

16 LOD SCORE

17 SampleSearch to compute the LOD score. (1) Compute P(e|θ_1) using SampleSearch + Importance Sampling. (2) Compute P(e|θ=0.5) using SampleSearch + Importance Sampling. Computing (1) and (2) is the same as computing the probability of evidence.
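Once the two probabilities of evidence are available, combining them is a one-liner. The sketch assumes the usual definition of the LOD score as the base-10 logarithm of the likelihood ratio against θ = 0.5; the function name and the numeric inputs are made up for illustration and do not come from any real pedigree.

```python
import math


def lod_score(p_evidence_theta1, p_evidence_unlinked):
    """log10 of P(e | theta_1) / P(e | theta = 0.5)."""
    if p_evidence_theta1 <= 0 or p_evidence_unlinked <= 0:
        raise ValueError("probabilities of evidence must be positive")
    # Subtract logs instead of dividing: these probabilities are often tiny.
    return math.log10(p_evidence_theta1) - math.log10(p_evidence_unlinked)


# Hypothetical estimates of the two probabilities of evidence.
print(lod_score(1e-7, 1e-10))
```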

18 SampleSearch-LB. SampleSearch-LB computes a lower bound on the probability of evidence (i.e. on both the numerator and the denominator). Use: bounding the LOD score. If the LOD score > 3, then the location is significant. If we know that a lower bound on the LOD score exceeds 3, then we have our location. Unfortunately SampleSearch-LB alone is not enough, as we need an upper bound on the denominator. Use SampleSearch in conjunction with Bozhena's bounding techniques to upper bound the denominator.
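The reason an upper bound on the denominator is needed follows from the ratio: the LOD score can only be bounded from below by pairing the smallest possible numerator with the largest possible denominator. The sketch below assumes the base-10 log-ratio definition of the LOD score; the function names and the numbers are hypothetical.

```python
import math


def lod_lower_bound(numerator_lower, denominator_upper):
    """Lower bound on log10(P(e | theta_1) / P(e | theta = 0.5)).

    numerator_lower:   lower bound on P(e | theta_1)
    denominator_upper: upper bound on P(e | theta = 0.5)
    """
    if numerator_lower <= 0 or denominator_upper <= 0:
        raise ValueError("bounds must be positive")
    return math.log10(numerator_lower) - math.log10(denominator_upper)


def is_certified_significant(numerator_lower, denominator_upper, threshold=3.0):
    """True when the bounds alone prove that the LOD score exceeds the threshold."""
    return lod_lower_bound(numerator_lower, denominator_upper) > threshold


print(is_certified_significant(1e-6, 1e-10))  # bound of 4 on the LOD score
print(is_certified_significant(1e-9, 1e-10))  # bound of 1: inconclusive
```

A lower bound on the denominator would push the ratio the wrong way, which is why a second bounding technique is brought in for it.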

