A Note on Useful Algorithmic Strategies Kun-Mao Chao (趙坤茂) Department of Computer Science and Information Engineering National Taiwan University, Taiwan WWW: http://www.csie.ntu.edu.tw/~kmchao
Greedy Algorithm A greedy method always makes a locally optimal (greedy) choice. the greedy-choice property: a globally optimal solution can be reached by a greedy choice. optimal substructures
Huffman Codes (1952) David Huffman (August 9, 1925 – October 7, 1999)
Huffman Codes Expected number of bits per character = 3x0.1+3x0.1+2x0.3+1x0.5 = 1.7 (vs. 2 bits by a simple scheme)
An example 20 characters; 34 bits in total Sequence: GTTGTTATCGTTTATGTGGC By Huffman Coding: 0111011100010010111100010110101001 20 characters; 34 bits in total
Divide-and-Conquer Divide the problem into smaller subproblems. Conquer each subproblem recursively. Combine the solutions to the child subproblems into the solution for the parent problem.
Merge Sort (Invented in 1938; Coded in 1945) John von Neumann (December 28, 1903 – February 8, 1957 )
Merge Sort (Merge two solutions into one.)
Merge Sort
Dynamic Programming Dynamic programming is a class of solution methods for solving sequential decision problems with a compositional cost structure. Richard Bellman was one of the principal founders of this approach. Richard Ernest Bellman (1920–1984)
Two key ingredients Two key ingredients for an optimization problem to be suitable for a dynamic-programming solution: 2. overlapping subproblems 1. optimal substructures Subproblems are dependent. (otherwise, a divide-and-conquer approach is the choice.) Each substructure is optimal. (Principle of optimality)
Three basic components The development of a dynamic-programming algorithm has three basic components: The recurrence relation (for defining the value of an optimal solution); The tabular computation (for computing the value of an optimal solution); The traceback (for delivering an optimal solution).
Fibonacci numbers The Fibonacci numbers are defined by the following recurrence: . for 2 1 - + = i>1 i F Leonardo of Pisa (c. 1170 – c. 1250) 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, ...
How to compute F10? F10 F9 F8 F7 F6 ……
Tabular computation The tabular computation can avoid recompuation. F0 1 2 3 5 8 13 21 34 55
Longest increasing subsequence(LIS) The longest increasing subsequence is to find a longest increasing subsequence of a given sequence of distinct integers a1a2…an . e.g. 9 2 5 3 7 11 8 10 13 6 3 7 7 10 13 7 11 3 5 11 13 are increasing subsequences. We want to find a longest one. are not increasing subsequences.
A naive approach for LIS Let L[i] be the length of a longest increasing subsequence ending at position i. L[i] = 1 + max j = 0..i-1{L[j] | aj < ai} (use a dummy a0 = minimum, and L[0]=0) 9 2 5 3 7 11 8 10 13 6 L[i] 1 1 2 2 3 4 ?
A naive approach for LIS L[i] = 1 + max j = 0..i-1 {L[j] | aj < ai} 9 2 5 3 7 11 8 10 13 6 L[i] 1 1 2 2 3 4 4 5 6 3 The maximum length The subsequence 2, 3, 7, 8, 10, 13 is a longest increasing subsequence. This method runs in O(n2) time.
An O(n log n) method for LIS Define BestEnd[k] to be the smallest number of an increasing subsequence of length k. 9 2 5 3 7 11 8 10 13 6 9 2 2 2 2 2 2 2 2 BestEnd[1] 5 3 3 3 3 3 3 BestEnd[2] 7 7 7 7 7 BestEnd[3] 11 8 8 8 BestEnd[4] 10 10 BestEnd[5] 13 BestEnd[6]
An O(n log n) method for LIS Define BestEnd[k] to be the smallest number of an increasing subsequence of length k. 9 2 5 3 7 11 8 10 13 6 9 2 2 2 2 2 2 2 2 2 BestEnd[1] 5 3 3 3 3 3 3 3 BestEnd[2] 7 7 7 7 7 6 BestEnd[3] 11 8 8 8 8 BestEnd[4] For each position, we perform a binary search to update BestEnd. Therefore, the running time is O(n log n). 10 10 10 BestEnd[5] 13 13 BestEnd[6]
Longest Common Subsequence (LCS) A subsequence of a sequence S is obtained by deleting zero or more symbols from S. For example, the following are all subsequences of “president”: pred, sdn, predent. The longest common subsequence problem is to find a maximum-length common subsequence between two sequences.
LCS For instance, Sequence 1: president Sequence 2: providence Its LCS is priden. president providence
LCS Another example: Sequence 1: algorithm Sequence 2: alignment One of its LCS is algm. a l g o r i t h m a l i g n m e n t
How to compute LCS? Let A=a1a2…am and B=b1b2…bn . len(i, j): the length of an LCS between a1a2…ai and b1b2…bj With proper initializations, len(i, j)can be computed as follows.
Longest Common Increasing Subsequence Proposed by Yang, Huang and Chao IPL 2005 2 5 3 7 11 8 10 13 6 6 5 2 8 3 7 4 10 1 13