Algorithms and data structures
Divide and conquer, dynamic programming, greedy (naive) algorithms
Divide and conquer
Split the problem into subproblems (preferably of the same or similar size); solve the subproblems (repeating the split procedure until we reach unit problems); merge the partial results.
Examples: most recursive algorithms, e.g. merge sort, factorial (recursive), Fibonacci sequence (recursive)*.
* Less efficient than the other approaches.
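A minimal Python sketch of merge sort, the first example above, showing the three steps: split into two halves of nearly equal size, solve each half recursively down to unit problems, and merge the partial results. The function name `merge_sort` is illustrative.

```python
def merge_sort(xs):
    """Divide and conquer: split, sort both halves recursively, merge."""
    if len(xs) <= 1:                 # unit problem: already sorted
        return list(xs)
    mid = len(xs) // 2               # split into halves of (nearly) equal size
    left = merge_sort(xs[:mid])
    right = merge_sort(xs[mid:])
    # merge the two sorted partial results
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])          # at most one of these is non-empty
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

Here the merge step is cheap (linear in the input size), which is why merge sort is a good case for divide and conquer.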
Divide and conquer – bad cases
High cost of merging the subproblem solutions; many similar (potentially identical) subproblems, e.g.:
Fib(n) = Fib(n-1) + Fib(n-2) = Fib(n-2) + Fib(n-3) + Fib(n-3) + Fib(n-4) = Fib(n-3) + Fib(n-4) + Fib(n-3) + Fib(n-3) + Fib(n-4) = …
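To make the repeated subproblems visible, the sketch below counts how many calls the naive recursive Fibonacci makes. It assumes the base cases Fib(0) = 0 and Fib(1) = 1, which the slides do not state; `fib_with_count` is a hypothetical helper name.

```python
def fib_with_count(n):
    """Naive recursive Fibonacci; returns (Fib(n), total number of calls)."""
    if n < 2:                       # assumed base cases: Fib(0) = 0, Fib(1) = 1
        return n, 1
    a, calls_a = fib_with_count(n - 1)
    b, calls_b = fib_with_count(n - 2)
    return a + b, calls_a + calls_b + 1   # +1 for the current call

print(fib_with_count(10))  # (55, 177)
print(fib_with_count(20))  # (6765, 21891)
```

The number of calls grows as fast as Fib(n) itself (it equals 2·Fib(n+1) − 1), although only n + 1 distinct subproblems exist.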
Dynamic programming
How to apply: give a recursive definition of the solution; construct the optimal solution bottom-up (i.e. from small problems towards bigger ones), storing the result of each subproblem so that it is computed only once.
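Applied to the Fibonacci recursion, the bottom-up construction computes Fib(0), Fib(1), …, Fib(n) in order, so each value is computed exactly once. This is a minimal sketch, again assuming Fib(0) = 0 and Fib(1) = 1; since each step needs only the two previous values, it keeps just those two instead of a full array.

```python
def fib_bottom_up(n):
    """Bottom-up dynamic programming for Fib(n)."""
    if n < 2:
        return n
    prev, curr = 0, 1                    # Fib(0), Fib(1)
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr   # Fib(k) from Fib(k-1) and Fib(k-2)
    return curr

print(fib_bottom_up(20))  # 6765
```

This runs in O(n) steps, compared with the exponential number of calls of the naive recursion.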
Knapsack problem – discrete version
Formulation: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit K and the total value is as large as possible. The amounts of items may or may not be limited by $m_i$. Items cannot be split.
I.e. for $Z = \{(p_1, w_1, m_1), (p_2, w_2, m_2), \dots, (p_n, w_n, m_n)\}$ (price, weight, limit) find $L = (l_1, l_2, \dots, l_n)$ such that $\sum_{i=1}^{n} l_i w_i \le K$, $l_i \in \mathbb{Z}_{\ge 0}$, $l_i \le m_i$ and $\sum_{i=1}^{n} l_i p_i$ is maximal.
Knapsack problem – discrete version
Example (no limits): Items (weight, value): Z = {(1, 1), (2, 1), (3, 11), (4, 16), (5, 24)}; knapsack: K = 7
Optimal solution: KN(K, Z) = 27, L = (0, 0, 1, 1, 0), i.e. the items of weight 3 and 4 (11 + 16 = 27)
Knapsack problem – discrete version
Solution: Optimal substructure – if we split the knapsack into two smaller ones, the solution for each of them has to be optimal. Fill the helper array KN cell by cell in the order k = 1 .. K, starting from KN(0) = 0:
$KN(k) = \max\left( KN(k-1),\ \max\{\, p_i + KN(k - w_i) : 1 \le i \le n,\ w_i \le k \,\} \right)$
where K is the knapsack size and n is the number of item types. Besides the knapsack values, the structure of each partial solution (the vector L) can be stored in a separate array; this is needed when the item limits $m_i$ have to be checked.
Knapsack problem – discrete version
Unlimited items (weight, value): (1, 1), (2, 1), (3, 11), (4, 16), (5, 24)
k | KN(k) | L
1 | 1 | 1,0,0,0,0
2 | 2 | 2,0,0,0,0
3 | 11 | 0,0,1,0,0
4 | 16 | 0,0,0,1,0
5 | 24 | 0,0,0,0,1
6 | 25 | 1,0,0,0,1
7 | 27 | 0,0,1,1,0
8 | 35 | 0,0,1,0,1
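The table can be reproduced with a short sketch of the recurrence that also stores the item-count vector L for every capacity, as suggested on the previous slide. The function name `knapsack_table` and the (weight, value) pair layout are assumptions matching the example, not the slides' code.

```python
def knapsack_table(items, K):
    """Unbounded discrete knapsack for all capacities 0..K.

    items: list of (weight, value) pairs.
    Returns (KN, L): KN[k] is the best value for capacity k and
    L[k] is one item-count vector that achieves it.
    """
    n = len(items)
    KN = [0] * (K + 1)
    L = [[0] * n for _ in range(K + 1)]
    for k in range(1, K + 1):
        # default: the best solution for capacity k-1 (one unit left unused)
        KN[k], L[k] = KN[k - 1], list(L[k - 1])
        for i, (w, v) in enumerate(items):
            if w <= k and v + KN[k - w] > KN[k]:
                KN[k] = v + KN[k - w]
                L[k] = list(L[k - w])
                L[k][i] += 1          # add one more copy of item i
    return KN, L

items = [(1, 1), (2, 1), (3, 11), (4, 16), (5, 24)]
KN, L = knapsack_table(items, 8)
for k in range(1, 9):
    print(k, KN[k], L[k])
```

For this data the printed rows match the table above (e.g. `7 27 [0, 0, 1, 1, 0]`).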
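Knapsack problem – algorithm
class Good:
    def __init__(self, weight, price):
        self.weight = weight
        self.price = price

def Knapsack(goods, KSize):
    # KTmp[k] = best total price for capacity k (unlimited copies of each item)
    KTmp = [0] * (KSize + 1)
    for i in range(1, KSize + 1):
        KTmp[i] = KTmp[i - 1]            # leaving one unit of capacity unused is allowed
        for g in goods:
            # item g fits and improves the best value for capacity i
            if g.weight <= i and g.price + KTmp[i - g.weight] > KTmp[i]:
                KTmp[i] = g.price + KTmp[i - g.weight]
    return KTmp[KSize]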
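When item amounts are limited by $m_i$, one standard approach (a swapped-in technique, not the method from the slides) is to treat each available copy as a separate item that is either taken or not (0/1 knapsack) and to iterate the capacity downwards, so that every copy is used at most once. The (weight, value, limit) triple layout and the name `bounded_knapsack` are assumptions for this sketch.

```python
def bounded_knapsack(items, K):
    """Discrete knapsack with limits; items are (weight, value, limit) triples.

    Each of the `limit` copies of an item is handled as a separate 0/1 item.
    Iterating k downwards ensures a single copy is counted at most once.
    """
    best = [0] * (K + 1)                 # best[k] = best value for capacity k
    for w, v, m in items:
        for _ in range(m):               # one 0/1 pass per available copy
            for k in range(K, w - 1, -1):
                if best[k - w] + v > best[k]:
                    best[k] = best[k - w] + v
    return best[K]

items = [(1, 1, 1), (2, 1, 1), (3, 11, 1), (4, 16, 1), (5, 24, 1)]
print(bounded_knapsack(items, 7))  # 27 (weights 3 + 4)
print(bounded_knapsack(items, 6))  # 25 (weights 1 + 5)
```

The running time is O(K · Σ m_i); for large limits, splitting each limit into powers of two is a common refinement.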
Matrix chain multiplication
Description: Given a sequence of matrices <A1, A2, …, An>, the goal is to find the most efficient way to multiply these matrices. The problem is not to actually perform the multiplications, but merely to decide the order (parenthesization) in which they are performed.
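Two-matrix multiplication – sample implementation
def MulMatrix(A, B):
    # A is p x q and B must be q x r; the result C is p x r
    if len(A[0]) != len(B):
        raise ValueError("Columns(A) must equal Rows(B)")
    C = [[0] * len(B[0]) for _ in range(len(A))]
    for i in range(len(A)):
        for j in range(len(B[0])):
            for k in range(len(A[0])):
                C[i][j] += A[i][k] * B[k][j]   # p*q*r scalar multiplications in total
    return C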
Matrix chain multiplication – example
Multiplying a [p, q] matrix by a [q, r] matrix gives a [p, r] matrix and costs p*q*r scalar multiplications:
[p, q] × [q, r] = [p, r], O([p, q] × [q, r]) = p*q*r
A1 = [10, 100], A2 = [100, 3], A3 = [3, 50]
O((A1 × A2) × A3) = 10*100*3 + 10*3*50 = 4500
O(A1 × (A2 × A3)) = 100*3*50 + 10*100*50 = 65000
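The two costs above can be checked by evaluating a parenthesization recursively: a product of [p, q] and [q, r] costs p*q*r plus the cost of computing both operands. In this sketch a single matrix is written as a `(rows, cols)` tuple and a product as a two-element list; this encoding and the name `chain_cost` are hypothetical.

```python
def chain_cost(expr):
    """Return (rows, cols, scalar_multiplications) of a parenthesized product.

    expr is either a (rows, cols) tuple for one matrix,
    or a list [left, right] meaning (left x right).
    """
    if isinstance(expr, tuple):
        rows, cols = expr
        return rows, cols, 0           # a single matrix costs nothing
    left, right = expr
    p, q, cost_left = chain_cost(left)
    q2, r, cost_right = chain_cost(right)
    if q != q2:
        raise ValueError("incompatible dimensions")
    return p, r, cost_left + cost_right + p * q * r   # [p, q] x [q, r] costs p*q*r

A1, A2, A3 = (10, 100), (100, 3), (3, 50)
print(chain_cost([[A1, A2], A3]))  # (10, 50, 4500)
print(chain_cost([A1, [A2, A3]]))  # (10, 50, 65000)
```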
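Matrix chain multiplication
Solution: Optimal substructure – if we split the expression into two parts, the parenthesization of each part has to be optimal. With matrix Ai of size [p(i-1), p(i)]:
$m[i, j] = \min_{i \le k < j} \left( m[i, k] + m[k+1, j] + p_{i-1} p_k p_j \right)$ for $i < j$, and $m[i, i] = 0$.
Fill m[i, j] by increasing chain length: first m[i, i], then m[i, i+1], m[i, i+2], etc., until m[1, n]. Besides the optimal values, the structure of each sub-solution (the optimal split point k) can be stored in a separate table.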
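Matrix chain multiplication – solution
A1 = [30, 35], A2 = [35, 15], A3 = [15, 5], A4 = [5, 10], A5 = [10, 20], A6 = [20, 25]
m[i, j] (minimal number of scalar multiplications for Ai … Aj):
i \ j | 1 | 2 | 3 | 4 | 5 | 6
1 | 0 | 15750 | 7875 | 9375 | 11875 | 15125
2 | | 0 | 2625 | 4375 | 7125 | 10500
3 | | | 0 | 750 | 2500 | 5375
4 | | | | 0 | 1000 | 3500
5 | | | | | 0 | 5000
6 | | | | | | 0
optdiv[i, j] (optimal split point k):
i \ j | 2 | 3 | 4 | 5 | 6
1 | 1 | 1 | 3 | 3 | 3
2 | | 2 | 3 | 3 | 3
3 | | | 3 | 3 | 3
4 | | | | 4 | 5
5 | | | | | 5
Optimal cost: m[1, 6] = 15125, i.e. ((A1(A2A3))((A4A5)A6))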
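Matrix chain multiplication – implementation
def MatrixChain(p):
    n = len(p) - 1                                   # number of matrices; Ai is p[i-1] x p[i]
    m = [[0] * (n + 1) for _ in range(n + 1)]        # m[i][i] = 0
    optdiv = [[0] * (n + 1) for _ in range(n + 1)]
    for h in range(2, n + 1):                        # h = length of the subchain
        for i in range(1, n - h + 2):
            j = i + h - 1
            m[i][j] = -1                             # -1 = no value yet
            for k in range(i, j):                    # split between Ak and Ak+1
                tmp = m[i][k] + m[k+1][j] + p[i-1]*p[k]*p[j]
                if m[i][j] < 0 or tmp < m[i][j]:
                    m[i][j] = tmp
                    optdiv[i][j] = k
    return m, optdiv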
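The optdiv table is what lets us print the optimal order of multiplications. The sketch below is self-contained: it repeats the bottom-up loop (with `None` instead of −1 as the "not computed yet" marker) and adds a hypothetical helper `parens` that reads the parenthesization out of optdiv recursively. It reproduces the values of the 6-matrix example.

```python
def matrix_chain(p):
    """Bottom-up matrix chain order; matrix A_i has shape p[i-1] x p[i]."""
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    optdiv = [[0] * (n + 1) for _ in range(n + 1)]
    for h in range(2, n + 1):                  # subchain length
        for i in range(1, n - h + 2):
            j = i + h - 1
            m[i][j] = None                     # not computed yet
            for k in range(i, j):
                cost = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if m[i][j] is None or cost < m[i][j]:
                    m[i][j] = cost
                    optdiv[i][j] = k
    return m, optdiv

def parens(optdiv, i, j):
    """Rebuild the optimal parenthesization of A_i..A_j from optdiv."""
    if i == j:
        return f"A{i}"
    k = optdiv[i][j]
    return "(" + parens(optdiv, i, k) + parens(optdiv, k + 1, j) + ")"

p = [30, 35, 15, 5, 10, 20, 25]
m, optdiv = matrix_chain(p)
print(m[1][6])               # 15125
print(parens(optdiv, 1, 6))  # ((A1(A2A3))((A4A5)A6))
```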
Matrix chain multiplication – recursive implementation
def RecursiveMatrixChain(p, i, j):
    if i == j:
        return 0                                     # a single matrix needs no multiplication
    best = -1
    for k in range(i, j):
        q = (RecursiveMatrixChain(p, i, k) + RecursiveMatrixChain(p, k+1, j)
             + p[i-1]*p[k]*p[j])
        if best < 0 or q < best:
            best = q
    return best

RecursiveMatrixChain(p, 1, len(p) - 1)
Memoization
Memoization stores the results of function calls and returns the stored result when the same inputs occur again. Although related to caching, memoization refers to this specific case of the optimization, distinguishing it from forms of caching such as buffering or page replacement. In the context of some logic programming languages, memoization is also known as tabling (-> lookup table).
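In Python, memoization of a pure function is available through the standard-library decorator `functools.lru_cache`. The sketch applies it to the recursive Fibonacci from the divide-and-conquer slides (assuming Fib(0) = 0, Fib(1) = 1): the recursion stays top-down, but each Fib(k) is computed only once and later calls are table lookups.

```python
from functools import lru_cache

@lru_cache(maxsize=None)             # unbounded memo table: n -> Fib(n)
def fib_memo(n):
    """Top-down recursive Fibonacci with memoization."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

print(fib_memo(30))                  # 832040
print(fib_memo.cache_info())         # hit/miss statistics of the memo table
```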
Matrix chain multiplication – memoized version
def MemoizedMatrixChain(p):
    n = len(p) - 1
    m = [[-1] * (n + 1) for _ in range(n + 1)]       # -1 = not computed yet
    return LookupMatrixChain(p, m, 1, n)
LookupMatrixChain – implementation
def LookupMatrixChain(p, m, i, j):
    if m[i][j] >= 0:                                 # already computed: reuse the stored value
        return m[i][j]
    if i == j:
        m[i][j] = 0
    else:
        for k in range(i, j):
            q = (LookupMatrixChain(p, m, i, k) + LookupMatrixChain(p, m, k+1, j)
                 + p[i-1]*p[k]*p[j])
            if m[i][j] < 0 or q < m[i][j]:
                m[i][j] = q
    return m[i][j]
Dynamic programming – sample applications
Longest common subsequence of two sequences; longest monotonically increasing subsequence; polygon triangulation with minimal total length of sides; RNA structure prediction and protein-DNA binding.
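As an illustration of the first application, a minimal bottom-up sketch of the longest common subsequence: c[i][j] holds the LCS length of the first i symbols of a and the first j symbols of b, and one LCS is recovered by walking back through the table. The function name `lcs` is illustrative, and the final `join` assumes both inputs are strings.

```python
def lcs(a, b):
    """Longest common subsequence of strings a and b (bottom-up DP)."""
    n, m = len(a), len(b)
    c = [[0] * (m + 1) for _ in range(n + 1)]   # c[i][j] = LCS length of a[:i], b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1            # extend a common subsequence
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])  # skip a symbol of a or of b
    # walk back from c[n][m] to recover one LCS
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(len(lcs("ABCBDAB", "BDCABA")))  # 4
```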
Greedy (naive) algorithms
An algorithm can be considered as a chain of decisions (optimisations); each step is considered separately, i.e. the best local solution is picked. This strategy does not always give the best global solution.
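The discrete knapsack example from earlier (K = 7, unlimited items) shows the last point. A greedy rule that repeatedly takes as many whole copies as fit of the item with the best value/weight ratio is a plausible local choice, assumed here for illustration (`greedy_knapsack` is a hypothetical name), yet it misses the optimum of 27.

```python
def greedy_knapsack(items, K):
    """Greedy discrete knapsack with unlimited items (not always optimal).

    items: (weight, value) pairs. Items are visited in decreasing
    value/weight order and as many whole copies as fit are taken.
    """
    total = 0
    taken = [0] * len(items)
    order = sorted(range(len(items)),
                   key=lambda i: items[i][1] / items[i][0], reverse=True)
    for i in order:                  # best local choice first
        w, v = items[i]
        count = K // w               # whole copies only: items cannot be split
        taken[i] = count
        total += count * v
        K -= count * w
    return total, taken

items = [(1, 1), (2, 1), (3, 11), (4, 16), (5, 24)]
print(greedy_knapsack(items, 7))  # (26, [2, 0, 0, 0, 1])
```

Taking the item of weight 5 first (ratio 4.8) leaves 2 units that only cheap items can fill, whereas the optimal L = (0, 0, 1, 1, 0) is worth 27.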
Knapsack problem – continuous version
Formulation: Given a set of items, each with a weight and a value, determine the amount of each item to include in a collection so that the total weight is less than or equal to a given limit K and the total value is as large as possible. The amounts of items may or may not be limited by $m_i$. Items CAN be split.
I.e. for $Z = \{(p_1, w_1, m_1), (p_2, w_2, m_2), \dots, (p_n, w_n, m_n)\}$ (price, weight, limit) find $L = (l_1, l_2, \dots, l_n)$ such that $\sum_{i=1}^{n} l_i w_i \le K$, $l_i \in \mathbb{R}_{\ge 0}$, $l_i \le m_i$ and $\sum_{i=1}^{n} l_i p_i$ is maximal.
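Knapsack problem – continuous version
Example (without limits): Items (price, weight): Z = {(3, 1), (60, 10), (80, 15), (210, 30), (270, 45)}; knapsack: K = 45
Optimal selection: KN(K, Z) = 315
Item amounts: L = (0, 0, 0, 1.5, 0), since item 4 has the best price/weight ratio (210/30 = 7) and 1.5 × 210 = 315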
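Knapsack problem – continuous version
Example (with limits): Items (price, weight, limit): Z = {(3, 1, 1), (60, 10, 1), (80, 15, 1), (210, 30, 1), (270, 45, 1)}; knapsack: K = 45
Optimal selection: KN(K, Z) = 300
Item amounts: L = (0, 0, 0, 1, 1/3) or (0, 1, 0, 1, 1/9); item 4 (ratio 7) is taken whole, and the remaining capacity of 15 is filled from items 2 and 5, which tie at ratio 6 (15 × 6 = 90).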
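Both examples can be checked with a greedy sketch that visits items in decreasing price/weight order and takes as much of each as fits and as its limit allows. Items are (price, weight, limit) triples with `limit=None` meaning unlimited; this layout and the name `fractional_knapsack` are assumptions. `fractions.Fraction` keeps the amounts exact, so 1/9 does not turn into a rounded float.

```python
from fractions import Fraction

def fractional_knapsack(items, K):
    """Greedy continuous knapsack; returns (value, amounts).

    items: (price, weight, limit) triples, limit=None means unlimited.
    """
    amounts = [Fraction(0)] * len(items)
    value = Fraction(0)
    K = Fraction(K)
    # stable sort: among equal ratios the earlier item comes first
    order = sorted(range(len(items)),
                   key=lambda i: Fraction(items[i][0], items[i][1]), reverse=True)
    for i in order:
        if K <= 0:                           # knapsack is full
            break
        price, weight, limit = items[i]
        amount = K / weight                  # as much as still fits
        if limit is not None:
            amount = min(amount, Fraction(limit))
        amounts[i] = amount
        value += amount * price
        K -= amount * weight
    return value, amounts

Z = [(3, 1), (60, 10), (80, 15), (210, 30), (270, 45)]
print(fractional_knapsack([(p, w, None) for p, w in Z], 45))  # value 315, amount 3/2 of item 4
print(fractional_knapsack([(p, w, 1) for p, w in Z], 45))     # value 300, amounts (0, 1, 0, 1, 1/9)
```

Because items 2 and 5 tie at ratio 6 and Python's sort is stable, this sketch returns the second solution listed above; preferring item 5 on ties would give (0, 0, 0, 1, 1/3) instead.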
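Continuous knapsack problem – algorithm (unlimited items)
def Knapsack(goods, KSize):
    # greedy: fill the whole knapsack with the item of the best price/weight ratio
    i = GetBestPriceToWeightItem(goods)      # helper: index of that item
    KValue = KSize / goods[i].weight * goods[i].price
    return KValue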
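Continuous knapsack problem – algorithm (limited items)
def Knapsack(goods, KSize):
    # goods[i].max = available amount of item i
    KValue = 0
    while KSize > 0:
        i = GetMaxPriceToWeightItem(goods)   # helper: best ratio among items with max > 0
        if i is None:                        # no items left
            break
        if KSize < goods[i].weight * goods[i].max:
            cnt = KSize / goods[i].weight    # take the fraction that fills the knapsack
        else:
            cnt = goods[i].max               # take all available amount of the item
        KSize -= cnt * goods[i].weight
        KValue += cnt * goods[i].price
        goods[i].max = 0                     # the item is used up (or the knapsack is full)
    return KValue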
Greedy algorithms – sample applications
Task assignment to resources (with mutual exclusion); minimum spanning tree of a graph; Huffman’s coding.
Theory: greedy-choice property; matroid theory.
Huffman’s coding
Fixed-length coding – the codes of all symbols have the same length, e.g. ASCII: A = 01000001, B = 01000010, C = 01000011, …
Variable-length coding – symbol codes can have different lengths (more frequent symbols can get shorter codes).
Prefix code – no symbol’s code is a prefix of another symbol’s code.
Huffman’s coding (1952)
Each time, the two trees with the smallest total weight are picked and merged.
Text: NIE PIEPRZ PIETRZE WIEPRZA PIEPRZEM
Frequencies: N: 1, T: 1, W: 1, A: 1, M: 1, R: 4, Z: 4, space: 4, I: 5, P: 6, E: 7
[Figure: merge steps I, II, III: N + T → 2, A + W → 2, M + (A, W) → 3]
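Huffman’s coding (1952)
Text: NIE PIEPRZ PIETRZE WIEPRZA PIEPRZEM
[Figure: final Huffman tree with root weight 35 and subtrees of weight 15 and 20; leaves E, R, Z, space, I, P, W, A, M, T, N; the space character gets the code 100]
ASCII coding: 35 × 8 = 280 bits
After compression: 110 bits, i.e. about 3.14 bits per character on average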
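The numbers on this slide can be reproduced with a short sketch of Huffman's algorithm built on Python's `heapq`: each heap entry is a tree represented as a dictionary from symbol to partial code, and the two lightest trees are merged by prefixing 0 and 1. The function name and the tie-breaking counter are assumptions of this sketch, so individual codes may differ from the tree in the figure, but the total length is the same for every Huffman tree.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build Huffman codes by repeatedly merging the two lightest trees."""
    freq = Counter(text)
    if len(freq) == 1:                       # degenerate case: one distinct symbol
        return {sym: "0" for sym in freq}
    # heap entries: (weight, tie_breaker, {symbol: code_so_far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)                      # unique tie-breaker for merged trees
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # the two trees with smallest weight
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

text = "NIE PIEPRZ PIETRZE WIEPRZA PIEPRZEM"
codes = huffman_codes(text)
bits = sum(len(codes[ch]) for ch in text)
print(len(text) * 8, bits, round(bits / len(text), 2))  # 280 110 3.14
```

The tie-breaker in each heap entry keeps `heapq` from ever comparing two dictionaries when weights are equal.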