Discussion section #2
- HW1 questions?
- HW2: highest-weighted path on a DAG
- Shortest-path finding algorithms
- Memoization
HW2: highest-weighted path on a DAG
HW2: highest-weight path on a DAG
- Create input file with one line for each vertex and edge
- Find highest-weight path on the graph
- Output:
  - Path length
  - Beginning and end vertex labels
  - The labels of all the edges on the path, in order
- For a given DAG and a genomic sequence
HW2: highest-weight path on a DAG
- How to store the graph after reading it in?
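One possible answer to the storage question is an adjacency list: each vertex maps to the list of edges leaving it. The sketch below is only an illustration. The `Edge` and `Graph` names, their fields, and the idea that vertices carry weights are all assumptions, since the HW2 file format is not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One directed, weighted edge (field names are hypothetical)."""
    label: str
    source: str
    target: str
    weight: float


class Graph:
    """Adjacency-list storage: each vertex maps to its outgoing edges."""

    def __init__(self):
        self.vertex_weight = {}             # vertex label -> weight (assumed)
        self.out_edges = defaultdict(list)  # vertex label -> list of Edge
        self.in_degree = defaultdict(int)   # vertex label -> incoming edge count

    def add_vertex(self, label, weight=0.0):
        self.vertex_weight[label] = weight

    def add_edge(self, label, source, target, weight):
        # Register endpoints that were never declared as vertices.
        for v in (source, target):
            self.vertex_weight.setdefault(v, 0.0)
        self.out_edges[source].append(Edge(label, source, target, weight))
        self.in_degree[target] += 1


g = Graph()
g.add_vertex("A", 1.0)
g.add_vertex("B", 2.0)
g.add_edge("e1", "A", "B", 5.0)
```

Keeping the in-degree of each vertex makes it easy to order the vertices later, and keeping the edge label on each edge makes it easy to report the path.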
Shortest path algorithms!
Shortest path on DAG
- Pretty much the same algorithm as the homework
- Just look for the minimum instead of the maximum
- Have to choose a specific source and/or destination (with non-negative weights, the shortest path overall is always a single vertex, length 0)
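A minimal sketch of the idea above, assuming the graph is given as a list of `(u, v, weight)` tuples (a made-up input format, not the HW2 one). It orders the vertices topologically, then relaxes each edge once, taking the minimum instead of the maximum.

```python
from collections import defaultdict, deque
import math


def dag_shortest_path(edges, source, target):
    """Return (distance, vertex path) from source to target in a DAG.

    Returns (math.inf, []) if target cannot be reached from source.
    """
    out_edges = defaultdict(list)
    in_degree = defaultdict(int)
    vertices = {source, target}
    for u, v, w in edges:
        out_edges[u].append((v, w))
        in_degree[v] += 1
        vertices.update((u, v))

    # Topological order: repeatedly remove vertices with no incoming edges.
    queue = deque(v for v in vertices if in_degree[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in out_edges[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                queue.append(v)

    dist = {v: math.inf for v in vertices}
    parent = {}
    dist[source] = 0
    for u in order:
        if dist[u] == math.inf:
            continue  # u is not reachable from the source
        for v, w in out_edges[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                parent[v] = u

    if dist[target] == math.inf:
        return math.inf, []
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return dist[target], path[::-1]


example_edges = [("A", "B", 1), ("B", "C", 2), ("A", "C", 5), ("C", "D", 1)]
distance, path = dag_shortest_path(example_edges, "A", "D")
```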
What if graph has cycles?
- More difficult (can’t order nodes by depth)
- Bellman-Ford algorithm
  - Choose source node and set its distance to 0
  - Set distance to all other nodes to infinity
  - For each edge u->v, if v’s distance can be reduced by taking that edge, update v’s distance
  - Cycle through all edges in this way |V|-1 times
  - To get shortest paths between all pairs, repeat with every vertex as the source node
  - (Can also check for a negative-weight cycle with one extra iteration)
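The Bellman-Ford steps above can be sketched for a single source as follows. The function name and the `(u, v, weight)` edge format are assumptions made for the example.

```python
import math


def bellman_ford(vertices, edges, source):
    """Single-source shortest distances; edges are (u, v, weight) tuples.

    Raises ValueError if a negative-weight cycle is reachable from source.
    """
    dist = {v: math.inf for v in vertices}
    dist[source] = 0

    # Relax every edge |V|-1 times.
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w

    # One extra pass: any further improvement means a negative-weight cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative-weight cycle reachable from source")
    return dist


bf_vertices = ["S", "A", "B", "C"]
bf_edges = [("S", "A", 4), ("S", "B", 5), ("B", "A", -3),
            ("A", "C", 2), ("C", "B", 4)]
bf_dist = bellman_ford(bf_vertices, bf_edges, "S")
```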
What if graph has no negative edges?
- Less difficult (we can visit some nodes fewer times)
- Dijkstra’s algorithm
  - Choose source node and set its distance to 0
  - Set distance to all other nodes to infinity
  - Set source node to current
  - Make distance offers to all unvisited neighbors of current, which are accepted if they’re less than the previous best offer
  - Mark current as visited (it will never be updated again)
  - Select the unvisited node with the smallest distance, set it to current, and repeat
  - (When destination node has been marked visited, stop)
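A sketch of the steps above. It uses a binary heap (`heapq`) to pick the unvisited node with the smallest distance, which is an implementation choice and not something the slide specifies. The adjacency-dictionary format is also an assumption.

```python
import heapq
import math


def dijkstra(adjacency, source, target=None):
    """Shortest distances from source; adjacency maps u -> [(v, weight), ...].

    If target is given, stop as soon as it is marked visited; distances of
    nodes not yet visited at that point may still be tentative.
    """
    dist = {source: 0}
    visited = set()
    heap = [(0, source)]  # (tentative distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry for an already finalized node
        visited.add(u)  # u's distance will never be updated again
        if u == target:
            break
        for v, w in adjacency.get(u, []):
            if w < 0:
                raise ValueError("Dijkstra requires non-negative edge weights")
            offer = d + w
            if offer < dist.get(v, math.inf):  # accept only a better offer
                dist[v] = offer
                heapq.heappush(heap, (offer, v))
    return dist


dj_graph = {
    "A": [("B", 7), ("C", 9), ("F", 14)],
    "B": [("C", 10), ("D", 15)],
    "C": [("D", 11), ("F", 2)],
    "D": [("E", 6)],
    "F": [("E", 9)],
}
dj_dist = dijkstra(dj_graph, "A")
```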
Two approaches to dynamic programming

Tabulation (“dynamic programming”)
- What we’ve been doing
- Bottom-up approach

    function fib(n)
        table = [0, 1]
        for i in 2..n
            table[i] = table[i-1] + table[i-2]
        return table[n]

Memoization
- Lazier (?)
- Top-down approach

    function fib(n, solutions)
        if n in solutions
            return solutions[n]
        if n == 0
            return 0
        if n == 1
            return 1
        current = fib(n-1, solutions) + fib(n-2, solutions)
        solutions[n] = current
        return current

This doesn’t seem useful…
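For reference, here is one way the two Fibonacci versions might look as runnable Python. The function names `fib_tabulated` and `fib_memoized` are chosen for this example only.

```python
def fib_tabulated(n):
    """Bottom-up: fill the table from the smallest subproblem upward."""
    table = [0, 1]
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])
    return table[n]


def fib_memoized(n, solutions=None):
    """Top-down: recurse, but store each answer the first time it is computed."""
    if solutions is None:
        solutions = {}
    if n in solutions:
        return solutions[n]
    if n < 2:
        return n
    current = fib_memoized(n - 1, solutions) + fib_memoized(n - 2, solutions)
    solutions[n] = current
    return current
```

Both compute each subproblem once. The tabulated version decides the order up front, while the memoized version only computes the subproblems the recursion actually asks for.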
RNA folding
AUGCUAUAUAAACGCGAUACUAUACGCGAUAAUCGCGCGAGA
..........................................
RNA folding
AUGCUAUAUAAACGCGAUACUAUACGCGAUAAUCGCGCGAGA
..........................................
AUGCUAUAUAAACGCGAUACUAUACGCGAUAAUCGCGCGAGA
..........................................
We’ve already solved this problem!
RNA folding

    function max_folds(seq)
        if length(seq) <= 1
            return 0
        current_max = 0
        for i in 1..length(seq)
            for j in i+1..length(seq)
                if complement(seq[i], seq[j])
                    left = seq[1:i-1]
                    middle = seq[i+1:j-1]
                    right = seq[j+1:length(seq)]
                    num_folds = 1 + max_folds(left + right) + max_folds(middle)
                    if num_folds > current_max
                        current_max = num_folds
        return current_max
RNA folding

    function max_folds(seq, solutions)
        if seq in solutions
            return solutions[seq]
        if length(seq) <= 1
            return 0
        current_max = 0
        for i in 1..length(seq)
            for j in i+1..length(seq)
                if complement(seq[i], seq[j])
                    left = seq[1:i-1]
                    middle = seq[i+1:j-1]
                    right = seq[j+1:length(seq)]
                    num_folds = 1 + max_folds(left + right, solutions) + max_folds(middle, solutions)
                    if num_folds > current_max
                        current_max = num_folds
        solutions[seq] = current_max
        return current_max
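A Python sketch of the memoized version, under some simplifying assumptions that are not from the slides: only Watson-Crick pairs (A-U, G-C) count as complementary, there is no minimum loop length, and the bases strictly between a paired i and j form the middle subproblem.

```python
# Assumed pairing rule: Watson-Crick pairs only (no G-U wobble pairs).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def complement(a, b):
    return (a, b) in PAIRS


def max_folds(seq, solutions=None):
    """Maximum number of base pairs in seq, memoized on the subsequence string."""
    if solutions is None:
        solutions = {}
    if seq in solutions:
        return solutions[seq]
    if len(seq) <= 1:
        return 0
    current_max = 0
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            if complement(seq[i], seq[j]):
                outside = seq[:i] + seq[j + 1:]  # left + right of the pair
                middle = seq[i + 1:j]            # strictly between the pair
                num_folds = (1 + max_folds(outside, solutions)
                             + max_folds(middle, solutions))
                if num_folds > current_max:
                    current_max = num_folds
    solutions[seq] = current_max
    return current_max
```

Using the subsequence string itself as the dictionary key means any two recursive calls that reach the same string share one stored answer, regardless of where in the original sequence it came from.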
RNA folding
AUGCUAUAUAAACGCGAUACUAUACGCGAUAAUCGCGCGAGA
..........................................
- With memoization:
  - ~36 seconds
  - 323,143 recursive calls (partial solutions stored)
- Without memoization:
  - Before it had finished running, I wrote the memoization code…and then got tired of waiting for it to finish on the whole sequence
  - 912,843 recursive calls on 16 (out of the 42) bases