Tutorial #11 by Anna Tzemach
Background – Lander & Green's HMM
Recombinations across successive intervals are independent, which enables sequential computation across loci using the forward-backward algorithm.
The algorithm computing the probability of the data given an inheritance vector is linear in the number of founders.
We need to sum over all possible inheritance vectors (exponential in the number of non-founders).
Complexity: linear in the number of loci and in the number of founders; exponential in the number of non-founders.
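The sequential computation across loci can be sketched as a forward pass over inheritance vectors. This is a minimal illustration, not the tutorial's implementation. It assumes the per-locus probabilities P(data at locus | v) are already available in a hypothetical `emissions` list of dictionaries, that all inheritance vectors are equally likely at the first locus, and that each meiosis recombines independently with probability theta between adjacent loci. The names `forward_likelihood`, `emissions`, and `thetas` are made up for this example.

```python
from itertools import product


def forward_likelihood(emissions, thetas, num_meioses):
    """Forward pass over loci; returns P(all marker data).

    emissions[l][v] is assumed to hold P(data at locus l | inheritance vector v).
    thetas[l] is the recombination fraction between locus l and locus l + 1.
    """
    vectors = list(product((0, 1), repeat=num_meioses))
    prior = 1.0 / len(vectors)  # assumption: uniform prior at the first locus
    alpha = {v: prior * emissions[0][v] for v in vectors}
    for locus in range(1, len(emissions)):
        theta = thetas[locus - 1]
        new_alpha = {}
        for w in vectors:
            total = 0.0
            for v in vectors:
                # d = number of meioses that recombined between v and w
                d = sum(a != b for a, b in zip(v, w))
                total += alpha[v] * theta ** d * (1 - theta) ** (num_meioses - d)
            new_alpha[w] = total * emissions[locus][w]
        alpha = new_alpha
    return sum(alpha.values())


# Toy usage: 2 meioses, 2 loci, every vector explains each locus with probability 0.5.
toy_vectors = list(product((0, 1), repeat=2))
toy_emissions = [{v: 0.5 for v in toy_vectors} for _ in range(2)]
print(forward_likelihood(toy_emissions, [0.1], 2))
```

The work grows linearly with the number of loci, while the number of vectors (and therefore each step) grows exponentially with the number of meioses, which matches the complexity stated above. The naive double loop over vector pairs is used here only for readability.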
Reminder – Compute marker data probability given inheritance data
Assume that the descent graph vertices below represent the pedigree on the left.
v = (person 12; person 13; person 21; person 22; person 23; person 24)
v = ( 1,1; 0,0; 1,1; 1,1; 1,1; 0,0 )
[Pedigree figure: genotype labels a/b, a/b, b/d, a/c; individuals 23 and 24 labeled]
Reminder (cont)
[Figure: Founder Graph, with labels (b,d), (a,b), (a,c), (a,b)]
[Figure: Descent Graph, with labels (a,b), (a,c), (b,d), (a,b)]
Reminder (cont)
Alternatively, we can rewrite the function as
Simultaneous calculation of all vectors
1. j = 0
2. For each meiosis j:
– Duplicate the founder states
– v[j] = 0: add the corresponding edge to the first set of founder alleles
– v[j] = 1: add the corresponding edge to the second set of founder alleles
– If j < 2 * number of non-founders, go to 2
3. Calculate the probability of all sets
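The duplication step above can be sketched as a breadth-first expansion in which every partial vector is copied once per meiosis. This is a simplified illustration that tracks only the bits of the inheritance vector; the founder-graph edges that the slide attaches to each copy are omitted, and the function name `expand_all_vectors` is made up for this example.

```python
def expand_all_vectors(num_nonfounders):
    """Grow all inheritance vectors together, one meiosis at a time."""
    states = [()]  # a single empty partial vector before any meiosis
    for j in range(2 * num_nonfounders):  # two meioses per non-founder
        next_states = []
        for v in states:
            next_states.append(v + (0,))  # copy with v[j] = 0
            next_states.append(v + (1,))  # copy with v[j] = 1
        states = next_states
    return states


vectors = expand_all_vectors(3)
print(len(vectors))  # 64
```

After meiosis j the list holds 2^(j+1) partial vectors, so the final list for three non-founders has 2^6 = 64 entries.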
Possible outcome
[Figure legend: node with zero likelihood]
Cutoff of zero nodes
For each founder allele: a_i = ∅
For each meiosis j (person p = j/2):
– k_j = alleles of person p
– If a_i = ∅ (a_i is the allele assignment of the corresponding founder allele): a_i = k_j
– Else: a_i = a_i ∩ k_j; if a_i = ∅: return 0
– Go to meiosis j+1
– If j == 2 * number of non-founders: compute the vector probability
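The intersection test above can be sketched as follows. This is a minimal illustration under one simplifying assumption: the mapping from each meiosis to the founder allele it transmits is supplied as input, as a hypothetical `constraints` list of (founder allele index, observed alleles of the person) pairs. `None` marks a founder allele that has not been constrained yet, so it cannot be confused with an empty intersection. The function name is made up for this example.

```python
def compatible_alleles(constraints, num_founder_alleles):
    """Return the allowed allele set per founder allele, or None on a zero node."""
    allowed = [None] * num_founder_alleles  # None = not yet constrained
    for i, observed in constraints:
        if allowed[i] is None:
            allowed[i] = set(observed)
        else:
            allowed[i] = allowed[i] & set(observed)
        if not allowed[i]:
            return None  # empty intersection: this vector has zero likelihood
    return allowed


# Founder allele 0 is passed to two people, typed a/b and a/c.
print(compatible_alleles([(0, {"a", "b"}), (0, {"a", "c"})], 1))  # [{'a'}]
# Founder allele 0 is passed to people typed a/b and c/d: inconsistent.
print(compatible_alleles([(0, {"a", "b"}), (0, {"c", "d"})], 1))  # None
```

Returning as soon as an intersection is empty is what allows a whole subtree of vectors sharing the same prefix to be skipped.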
Possible outcome
[Figure legend: node with zero likelihood]
Cutoff of symmetric nodes
A node is symmetric if parent p_i is known to be homozygous, or if offspring o_i and all of its descendants are not genotyped. If we cannot distinguish between v[i] = 0 and v[i] = 1, then P(data | v = ____1_____) = P(data | v = ____0_____). We do not want to calculate such nodes twice.
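One way to avoid the duplicate work is to fix each symmetric bit to 0 and double the result. The sketch below illustrates this for a plain, unweighted sum of P(data | v) at a single locus. The `likelihood` callback and the `symmetric_bits` argument are hypothetical stand-ins: deciding which bits are symmetric requires the pedigree and is not shown.

```python
from itertools import product


def sum_likelihood(likelihood, num_bits, symmetric_bits=()):
    """Sum likelihood(v) over all vectors, visiting each symmetric pair once."""
    symmetric = set(symmetric_bits)
    free = [i for i in range(num_bits) if i not in symmetric]
    total = 0.0
    for bits in product((0, 1), repeat=len(free)):
        v = [0] * num_bits  # symmetric bits stay fixed at 0
        for i, b in zip(free, bits):
            v[i] = b
        total += likelihood(tuple(v))
    # Each symmetric bit stands for two vectors with identical likelihood.
    return total * 2 ** len(symmetric)


# Toy likelihood that ignores bit 1, so bit 1 is symmetric.
def toy(v):
    return 0.1 + 0.2 * v[0] + 0.3 * v[2]


print(sum_likelihood(toy, 3))                      # visits 8 vectors
print(sum_likelihood(toy, 3, symmetric_bits=[1]))  # visits 4 vectors, same sum
```

Both calls give the same total, but the second evaluates the likelihood half as often.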
Possible outcome
[Figure legend: node with zero likelihood; node identical to sibling]
Founder Reduction
There is no way to determine the ordered genotype of a founder. So, if we look at a child of the founder, we cannot distinguish between v[i] = 0 and v[i] = 1. For each founder, 1 bit in the inheritance vector is constant. Total time/space saving: a factor of 2^f (f = number of founders).
Recombination speed-up
The transition probability between vector i and vector j depends on the number of recombinations between them. For small θ, only a small number of recombinations makes sense.
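The slide's formula is not reproduced here. As a sketch, assuming the usual Lander-Green model in which each meiosis recombines independently with probability θ, the transition probability between two inheritance vectors is θ^d (1 − θ)^(m − d), where m is the number of meioses and d is the number of positions at which the vectors differ. The function name below is made up for this example.

```python
def transition_probability(v_i, v_j, theta):
    """P(v_j at the next locus | v_i at this locus), assuming independent meioses."""
    m = len(v_i)
    d = sum(a != b for a, b in zip(v_i, v_j))  # number of recombinations
    return theta ** d * (1 - theta) ** (m - d)


theta = 0.01
start = (0, 0, 0, 0)
for target in [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (1, 1, 1, 1)]:
    print(target, transition_probability(start, target, theta))
```

Each additional recombination multiplies the probability by θ / (1 − θ), so for small θ the vectors that are many recombinations away contribute very little, which is the basis for skipping them.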
Example
Number of non-founders = 3
Number of inheritance vectors = 2^(2*3) = 64
Number of founders = 3
Founder reduction = 2^3 = 8
Reduced number of vectors = 64 / 8 = 8
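The counting in this example can be checked with a few lines of code. The function name is made up for this illustration; it only restates the arithmetic above (two bits per non-founder, one constant bit per founder).

```python
def reduced_vector_count(num_nonfounders, num_founders):
    """Number of inheritance vectors left after founder reduction."""
    total = 2 ** (2 * num_nonfounders)  # two meioses per non-founder
    reduction = 2 ** num_founders       # one constant bit per founder
    return total // reduction


print(reduced_vector_count(3, 3))  # 8
```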
Example (cont)
v[1] = 0: a1 = {a,b}. No need for v[1] = 1 -> founder reduction
v[2] = 0: a3 = {a,b}. No need for v[2] = 1 -> founder reduction
Example (cont)
v[3] = 0 => 000: a1 = {a,b} ∩ {a,c} = {a}
v[3] = 1 => 001: a3 = {a,b} ∩ {a,c} = {a}
v[4] = 0 => 0000: a5 = {a,c}
v[4] = 0 => 0010: a5 = {a,c}
No need for v[4] = 1 -> founder reduction