Problem Set 2 Solutions Tree Reconstruction Algorithms
Marc A. Schaub, February 22nd, 2008
CS 262 Problem Session
Based on slides by:
- Andreas Sundquist and George Asimenos (problem 1)
- Serafim Batzoglou (tree reconstruction)
Problem 1(a)
Problem 1(b) Baum-Welch:
Suppose
Forward:
Similar for Backward
Problem 1(b) Baum-Welch:
Problem 1(b) Baum-Welch:
Problem 1(b) Baum-Welch:
Given
Inductive step:
After training:
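The Baum-Welch equations on these slides did not survive extraction. Assuming, as the Viterbi slide suggests, that the exercise concerns two states k and k′ with identical parameters, the sketch below illustrates the induction numerically. The three-state HMM, its parameter values, the test sequence, and the names `forward`, `backward` and `reestimate_transitions` are all hypothetical; there is no explicit end state, and only the transition probabilities are re-estimated.

```python
# Hypothetical HMM (not from the problem set): states 0 and 1 are twins.
START = [0.3, 0.3, 0.4]
# TRANS[k][l] = a_kl. Rows 0 and 1 are equal (identical outgoing transitions),
# and TRANS[k][0] == TRANS[k][1] for every k (identical incoming transitions).
TRANS = [
    [0.2, 0.2, 0.6],
    [0.2, 0.2, 0.6],
    [0.35, 0.35, 0.3],
]
EMIT = [                 # EMIT[k][b] = e_k(b); states 0 and 1 emit identically
    {"x": 0.7, "y": 0.3},
    {"x": 0.7, "y": 0.3},
    {"x": 0.1, "y": 0.9},
]
N_STATES = len(START)


def forward(seq):
    """f[i][l]: probability of the first i+1 symbols, ending in state l."""
    f = [[START[l] * EMIT[l][seq[0]] for l in range(N_STATES)]]
    for sym in seq[1:]:
        prev = f[-1]
        f.append([EMIT[l][sym] * sum(prev[k] * TRANS[k][l] for k in range(N_STATES))
                  for l in range(N_STATES)])
    return f


def backward(seq):
    """b[i][k]: probability of the symbols after position i, given state k at i."""
    b = [[1.0] * N_STATES]                    # last position: b = 1
    for sym in reversed(seq[1:]):
        nxt = b[0]
        b.insert(0, [sum(TRANS[k][l] * EMIT[l][sym] * nxt[l] for l in range(N_STATES))
                     for k in range(N_STATES)])
    return b


def reestimate_transitions(seq):
    """One Baum-Welch step: expected counts A_kl, then a'_kl = A_kl / sum_l A_kl."""
    f, b = forward(seq), backward(seq)
    px = sum(f[-1])                           # P(x), no end state
    A = [[0.0] * N_STATES for _ in range(N_STATES)]
    for i in range(len(seq) - 1):
        for k in range(N_STATES):
            for l in range(N_STATES):
                A[k][l] += f[i][k] * TRANS[k][l] * EMIT[l][seq[i + 1]] * b[i + 1][l] / px
    return [[a / sum(row) for a in row] for row in A]


seq = "xyyxxyx"
f = forward(seq)
new_trans = reestimate_transitions(seq)
print(all(row[0] == row[1] for row in f))   # True: twins have equal forward values
print(new_trans[0] == new_trans[1])         # True: and equal re-estimated rows
```

Because the twins receive the same incoming transitions and emit identically, their forward values agree position by position; identical outgoing rows make their backward values agree as well, so the expected counts, and therefore the re-estimated rows and columns, come out identical.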
Problem 1(b) Viterbi: the Viterbi parse may arbitrarily choose state k over state k′, so the counts $A_{kl}$ and $A_{k'l}$, and hence the re-estimated $a'_{kl}$ and $a'_{k'l}$, need not stay equal
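To see how Viterbi training can break that symmetry, here is a sketch with a hypothetical two-state HMM whose states are exact copies. Python's built-in `max` returns the first maximal item, so ties go to the lower state index; that deterministic rule stands in for the arbitrary choice mentioned on the slide. The function names and parameters are illustrative only.

```python
def viterbi(seq, start, trans, emit):
    """Most probable state path; ties are broken toward the lower state index."""
    n = len(start)
    v = [start[l] * emit[l][seq[0]] for l in range(n)]
    back = []                                   # back[t][l] = best predecessor of l
    for sym in seq[1:]:
        # max() returns the first maximal k, so equal scores favor state 0
        ptr = [max(range(n), key=lambda k: v[k] * trans[k][l]) for l in range(n)]
        v = [emit[l][sym] * v[ptr[l]] * trans[ptr[l]][l] for l in range(n)]
        back.append(ptr)
    path = [max(range(n), key=lambda k: v[k])]
    for ptr in reversed(back):                  # traceback
        path.append(ptr[path[-1]])
    return path[::-1]


def viterbi_counts(path, n):
    """Viterbi-training counts: A_kl = number of k -> l steps on the path."""
    A = [[0] * n for _ in range(n)]
    for k, l in zip(path, path[1:]):
        A[k][l] += 1
    return A


# Hypothetical two-state HMM whose states are exact copies of each other.
start = [0.5, 0.5]
trans = [[0.5, 0.5], [0.5, 0.5]]
emit = [{"x": 0.6, "y": 0.4}, {"x": 0.6, "y": 0.4}]

path = viterbi("xyxx", start, trans, emit)
counts = viterbi_counts(path, 2)
print(path)     # [0, 0, 0, 0]
print(counts)   # [[3, 0], [0, 0]]
```

Re-estimating from these counts gives a′_00 = 1 and a′_01 = 0, while state 1 collects no counts at all, so the two formerly identical states end up with different parameters.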
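Problem 1(c)
Tables (values only partly recoverable): transition probabilities $a_{kl}$ (rows k = 0, 1; columns l = 1, 2; entries include 1 and 1/2), expected transition counts $A_{kl}$, emission probabilities $e_k(b)$ (b = x, y; k = 1, 2), and expected emission counts $E_k(b)$ (entries include 3, 2, 1).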
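Problem 1(c) Viterbi
Viterbi matrix (column headers recoverable only as x, y):
  state 1: .9, .045, .3645, .1640
  state 2: .405 (remaining entries not recoverable)
Parameter tables: $a_{kl}$ (k = 0, 1; l = 1, 2; entries include 1 and 1/2) and $e_k(b)$ (b = x, y; k = 1, 2)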
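Problem 1(c) Viterbi
Viterbi matrix (column headers recoverable only as x, y):
  state 1: .75, .1688, .1139, .0769
  state 2: .0375, .0084, .0057
Parameters: $a_{kl}$ (l = 1, 2): k = 0: 1; k = 1: 0.9, 0.1. $e_k(b)$ (b = x, y): k = 1: 0.75, 0.25; k = 2: 0.5
Updated parameters: $a_{kl}$ (l = 1, 2): k = 0: 1. $e_k(b)$ (b = x, y): k = 1: 0.75, 0.25; k = 2: ?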
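As a hand check on the first two columns of this Viterbi matrix, here is the recurrence $v_l(i+1) = e_l(x_{i+1}) \max_k v_k(i)\, a_{kl}$ worked out. It assumes a reading of the flattened parameter tables that is not certain: $a_{01} = 1$ (so $a_{02} = 0$ and $v_2(1) = 0$), $a_{11} = 0.9$, $a_{12} = 0.1$, $e_1(x) = 0.75$, $e_1(y) = 0.25$, $e_2(y) = 0.5$, and a sequence beginning x, y.

```latex
\begin{aligned}
v_1(1) &= a_{01}\, e_1(x) = 1 \times 0.75 = 0.75 \\
v_1(2) &= e_1(y) \max\{v_1(1)\, a_{11},\; v_2(1)\, a_{21}\} = 0.25 \times 0.75 \times 0.9 = 0.16875 \approx .1688 \\
v_2(2) &= e_2(y) \max\{v_1(1)\, a_{12},\; v_2(1)\, a_{22}\} = 0.5 \times 0.75 \times 0.1 = 0.0375
\end{aligned}
```

Under that reading the hand computation reproduces the .75, .1688 and .0375 entries.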
Additive Distances
[Figure: example tree with numbered nodes 1 to 12, highlighting the leaf-to-leaf distance $d_{1,4}$]
Given a tree, a distance measure is additive if the distance between any pair of leaves is the sum of the lengths of the edges connecting them.
Given a tree T and additive distances $d_{ij}$, we can uniquely reconstruct the edge lengths:
- Find two neighboring leaves i, j with common parent k
- Place parent node k at distance $d_{km} = \tfrac{1}{2}(d_{im} + d_{jm} - d_{ij})$ from any node m ≠ i, j
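A minimal sketch of the parent-placement step, assuming distances are stored as a nested dict `d[a][b]`. The example tree is hypothetical: leaves 1 and 2 hang off internal node u by edges of length 0.1 and 0.4, leaves 3 and 4 hang off internal node w by edges of length 0.1 and 0.4, and u and w are joined by an edge of length 0.1. The helper names `symmetric` and `place_parent` are illustrative.

```python
def symmetric(pairs):
    """Build a nested dict with d[a][b] == d[b][a] from {(a, b): distance}."""
    d = {}
    for (a, b), dist in pairs.items():
        d.setdefault(a, {})[b] = dist
        d.setdefault(b, {})[a] = dist
    return d


def place_parent(d, i, j):
    """For neighboring leaves i, j with common parent k, return
    (d_km for every other leaf m, edge length d_ik, edge length d_jk)."""
    d_km = {m: 0.5 * (d[i][m] + d[j][m] - d[i][j]) for m in d if m not in (i, j)}
    m = next(iter(d_km))   # for additive d, any m != i, j gives the same edge lengths
    return d_km, d[i][m] - d_km[m], d[j][m] - d_km[m]


# Hypothetical additive distances from the four-leaf tree described above.
d = symmetric({(1, 2): 0.5, (1, 3): 0.3, (1, 4): 0.6,
               (2, 3): 0.6, (2, 4): 0.9, (3, 4): 0.5})
d_km, d_ik, d_jk = place_parent(d, 1, 2)
print({m: round(x, 6) for m, x in d_km.items()}, round(d_ik, 6), round(d_jk, 6))
# {3: 0.2, 4: 0.5} 0.1 0.4
```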
Neighbor-Joining
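Guaranteed to produce the correct tree if the distance is additive; may produce a good tree even when the distance is not additive.
Step 1: Finding neighboring leaves
Define $D_{ij} = (N-2)\,d_{ij} - \sum_{k \ne i} d_{ik} - \sum_{k \ne j} d_{jk}$, where N is the number of leaves
Claim: the above "magic trick" ensures that, for additive distances, a pair i, j minimizing $D_{ij}$ is a pair of neighbors
[Figure: four-leaf tree with leaves 1 to 4 and edge lengths 0.1, 0.1, 0.1, 0.4, 0.4]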
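The sketch below computes $D_{ij}$ for every pair on a four-leaf tree. It assumes a reading of the figure (not certain from the extracted text) in which leaves 1 and 2 share one internal node, leaves 3 and 4 share the other, the central edge has length 0.1, leaves 1 and 3 hang on edges of 0.1, and leaves 2 and 4 on edges of 0.4. Under that reading 1 and 3 are the closest pair but not neighbors, which is exactly what the correction terms in $D_{ij}$ fix. The function names are hypothetical.

```python
def symmetric(pairs):
    """Build a nested dict with d[a][b] == d[b][a] from {(a, b): distance}."""
    d = {}
    for (a, b), dist in pairs.items():
        d.setdefault(a, {})[b] = dist
        d.setdefault(b, {})[a] = dist
    return d


def nj_criterion(d):
    """D_ij = (N - 2) d_ij - sum_{k != i} d_ik - sum_{k != j} d_jk for every leaf pair."""
    leaves = list(d)
    n = len(leaves)
    r = {i: sum(d[i][k] for k in leaves if k != i) for i in leaves}
    return {(i, j): (n - 2) * d[i][j] - r[i] - r[j]
            for a, i in enumerate(leaves) for j in leaves[a + 1:]}


d = symmetric({(1, 2): 0.5, (1, 3): 0.3, (1, 4): 0.6,
               (2, 3): 0.6, (2, 4): 0.9, (3, 4): 0.5})
D = nj_criterion(d)
closest = min(D, key=lambda p: d[p[0]][p[1]])   # smallest raw distance
best = min(D, key=D.get)                        # smallest D_ij
print({p: round(x, 3) for p, x in D.items()})
# {(1, 2): -2.4, (1, 3): -2.2, (1, 4): -2.2, (2, 3): -2.2, (2, 4): -2.2, (3, 4): -2.4}
print(closest)                                  # (1, 3)
```

Both neighbor pairs reach the minimum of −2.4, while the raw closest pair (1, 3) scores only −2.2.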
Algorithm: Neighbor-joining
Initialization: define T to be the set of leaf nodes, one per sequence. Let L = T.
Iteration:
- Pick i, j such that $D_{ij}$ is minimal
- Define a new node k, and set $d_{km} = \tfrac{1}{2}(d_{im} + d_{jm} - d_{ij})$ for all m ∈ L, m ≠ i, j
- Add k to T, with edges of lengths $d_{ik} = \tfrac{1}{2}(d_{ij} + r_i - r_j)$ and $d_{jk} = d_{ij} - d_{ik}$, where $r_i = \frac{1}{N-2} \sum_{k \ne i} d_{ik}$ and N is the current number of nodes in L
- Remove i, j from L; add k to L
Termination: when L consists of two nodes i, j, add the edge between them, of length $d_{ij}$
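A compact sketch of this loop in Python, assuming a nested-dict distance matrix and naming new internal nodes `k0`, `k1`, ... (both choices are hypothetical). Since $\sum_{k \ne i} d_{ik} = (N-2)\,r_i$, minimizing $D_{ij}$ is the same as minimizing $d_{ij} - r_i - r_j$, which is what the code does; ties are broken by `min`, which keeps the first pair it meets. The example reuses a hypothetical four-leaf additive tree (leaves 1, 2 on one internal node with edges 0.1, 0.4; leaves 3, 4 on the other with edges 0.1, 0.4; central edge 0.1).

```python
import itertools


def symmetric(pairs):
    """Build a nested dict with d[a][b] == d[b][a] from {(a, b): distance}."""
    d = {}
    for (a, b), dist in pairs.items():
        d.setdefault(a, {})[b] = dist
        d.setdefault(b, {})[a] = dist
    return d


def neighbor_joining(dist):
    """Return a list of (node, node, edge_length); internal nodes are "k0", "k1", ..."""
    d = {a: dict(row) for a, row in dist.items()}    # work on a copy
    L = list(d)
    edges = []
    count = 0
    while len(L) > 2:
        n = len(L)
        r = {i: sum(d[i][m] for m in L if m != i) / (n - 2) for i in L}
        # D_ij = (n - 2) * (d_ij - r_i - r_j), so this minimizes D_ij
        i, j = min(itertools.combinations(L, 2),
                   key=lambda p: d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        k = f"k{count}"
        count += 1
        d_ik = 0.5 * (d[i][j] + r[i] - r[j])
        edges += [(i, k, d_ik), (j, k, d[i][j] - d_ik)]
        d[k] = {}
        for m in L:
            if m not in (i, j):
                d[k][m] = d[m][k] = 0.5 * (d[i][m] + d[j][m] - d[i][j])
        L = [m for m in L if m not in (i, j)] + [k]
    a, b = L                                          # termination: join the last two
    edges.append((a, b, d[a][b]))
    return edges


d = symmetric({(1, 2): 0.5, (1, 3): 0.3, (1, 4): 0.6,
               (2, 3): 0.6, (2, 4): 0.9, (3, 4): 0.5})
edges = neighbor_joining(d)
print(round(sum(w for _, _, w in edges), 6))          # 1.1
```

On these additive distances the pendant edges come back as 0.1, 0.4, 0.1, 0.4 and the internal edge as 0.1, whichever tied pair happens to be joined first.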
Parsimony – direct method not using distances
One of the most popular methods:
GIVEN a multiple alignment
FIND a tree and a history of substitutions explaining the alignment
Idea: find the tree that explains the observed sequences with a minimal number of substitutions.
Two computational subproblems:
- Find the parsimony cost of a given tree (easy)
- Search through all tree topologies (hard)
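To make the hard subproblem concrete: the number of unrooted binary tree topologies on n ≥ 3 labeled leaves is (2n − 5)!! (a standard result), so exhaustive search becomes infeasible quickly. The function name below is illustrative.

```python
def unrooted_topologies(n):
    """Number of unrooted binary tree topologies on n >= 3 labeled leaves: (2n - 5)!!."""
    count = 1
    for odd in range(3, 2 * n - 4, 2):   # 3 * 5 * ... * (2n - 5)
        count *= odd
    return count


for n in (4, 5, 10, 20):
    print(n, unrooted_topologies(n))
# 4 3
# 5 15
# 10 2027025
# 20 221643095476699771875
```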
Example: Parsimony cost of one column
[Figure: tree for one alignment column; leaves labeled {A}, {B}, {A}, {A}; internal nodes labeled {A, B} (where Cost C += 1) and {A}; final cost C = 1]
Parsimony Scoring
Given a tree and an alignment column u, label the internal nodes to minimize the number of required substitutions.
Initialization: set cost C = 0; node k = 2N − 1 (the root)
Iteration:
- If k is a leaf, set $R_k = \{x_k[u]\}$ // R_k is simply the character of the kth species at column u
- If k is not a leaf, let i, j be its daughter nodes and compute R_i, R_j:
  - set $R_k = R_i \cap R_j$ if the intersection is nonempty
  - set $R_k = R_i \cup R_j$ and C += 1 if the intersection is empty
Termination: minimal cost of tree for column u = C
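The scoring procedure above is Fitch's algorithm. Below is a recursive sketch that starts at the root and computes R_i, R_j for the daughters before R_k. It assumes a tree written as nested pairs of leaf names and a column given as a dict from leaf name to character; both representations, and the leaf names, are hypothetical.

```python
def fitch(tree, column):
    """Return (R_root, cost) for one alignment column.

    tree: a leaf name (str) or a pair (left, right) of subtrees.
    column: dict mapping leaf name -> character at column u.
    """
    if isinstance(tree, str):                  # leaf: R_k = {x_k[u]}
        return {column[tree]}, 0
    left, right = tree
    r_i, c_i = fitch(left, column)
    r_j, c_j = fitch(right, column)
    common = r_i & r_j
    if common:                                 # nonempty intersection: no substitution
        return common, c_i + c_j
    return r_i | r_j, c_i + c_j + 1            # empty intersection: union, C += 1


col = {"s1": "A", "s2": "B", "s3": "A", "s4": "A"}
print(fitch((("s1", "s2"), ("s3", "s4")), col))   # ({'A'}, 1)

col2 = {"s1": "A", "s2": "B", "s3": "A", "s4": "B"}
print(fitch((("s1", "s2"), ("s3", "s4")), col2)[1])   # 2
print(fitch((("s1", "s3"), ("s2", "s4")), col2)[1])   # 1
```

The second column shows that the same characters can cost 1 or 2 substitutions depending on the topology, which is why the search over topologies matters.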
Example
[Figure: worked parsimony-scoring example on a tree whose leaves carry characters A and B; internal nodes labeled with sets {A}, {B}, {A, B}]