Matrix Chain Multiplication
Matrix-chain Multiplication
Suppose we have a sequence (chain) $A_1, A_2, \ldots, A_n$ of n matrices to be multiplied; that is, we want to compute the product $A_1 A_2 \cdots A_n$. There are many possible ways (parenthesizations) to compute the product.
Matrix-chain Multiplication
To compute the number of scalar multiplications necessary, we must know:
- the algorithm used to multiply two matrices
- the matrix dimensions
Can you write the algorithm to multiply two matrices?
Algorithm to Multiply 2 Matrices
Input: matrices $A_{p \times q}$ and $B_{q \times r}$ (with dimensions p×q and q×r)
Result: matrix $C_{p \times r}$ resulting from the product A·B
MATRIX-MULTIPLY(A, B)
1. for i ← 1 to p
2.     for j ← 1 to r
3.         C[i, j] ← 0
4.         for k ← 1 to q
5.             C[i, j] ← C[i, j] + A[i, k] · B[k, j]
6. return C
The scalar multiplication in line 5 dominates the time to compute C.
Number of scalar multiplications = pqr
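The pseudocode above can be sketched in Python as follows. This is a minimal illustration, assuming matrices are stored as lists of rows; the function name `matrix_multiply` and the returned counter are assumptions added here to make the pqr count visible.

```python
def matrix_multiply(A, B):
    """Multiply a p x q matrix A by a q x r matrix B (both lists of rows).

    Returns (C, count), where count is the number of scalar multiplications.
    """
    p, q, r = len(A), len(B), len(B[0])
    if any(len(row) != q for row in A):
        raise ValueError("inner dimensions do not match")
    C = [[0] * r for _ in range(p)]
    count = 0
    for i in range(p):
        for j in range(r):
            for k in range(q):
                C[i][j] += A[i][k] * B[k][j]
                count += 1  # one scalar multiplication per innermost step
    return C, count


A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
B = [[7, 8], [9, 10], [11, 12]]   # 3 x 2
C, count = matrix_multiply(A, B)
print(C, count)  # count equals p*q*r = 2*3*2
```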
Matrix Chain Multiplication
A is a k×m matrix and B is an m×n matrix.
C = AB is defined as the k×n matrix with elements $c_{ij} = \sum_{l=1}^{m} a_{il}\, b_{lj}$.
[Figure: numeric example of a matrix product]
Matrix Chain Multiplication
A is a k×m matrix, B is an m×n matrix, and C = AB.
- Each element of C can be computed in time Θ(m).
- The matrix C can be computed in time Θ(kmn).
- Matrix multiplication is associative, i.e. A(BC) = (AB)C.
Matrix Chain Multiplication
A is a 2×10000 matrix, B is a 10000×5 matrix, and C is a 5×50 matrix.
(AB)C:
- AB: 2 · 10000 · 5 = 100,000 scalar multiplications
- (AB)C: 2 · 5 · 50 = 500 scalar multiplications
- Total: 100,500 scalar multiplications
A(BC):
- BC: 10000 · 5 · 50 = 2,500,000 scalar multiplications
- A(BC): 2 · 10000 · 50 = 1,000,000 scalar multiplications
- Total: 3,500,000 scalar multiplications
How we parenthesize a chain of matrices can have a dramatic impact on the cost of evaluating the product.
Matrix Chain Multiplication - Problem
For a given sequence of matrices, find a parenthesization that gives the smallest number of multiplications necessary to compute the product of all matrices in the sequence.
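The two totals can be reproduced with a few lines of Python. This is only a sketch: matrices are represented by their (rows, cols) shapes, and the helper name `multiply_cost` is an assumption introduced for illustration.

```python
def multiply_cost(left, right):
    """Return (cost, shape) for multiplying two matrices given as (rows, cols)."""
    (p, q), (q2, r) = left, right
    if q != q2:
        raise ValueError("inner dimensions must agree")
    return p * q * r, (p, r)


A, B, C = (2, 10000), (10000, 5), (5, 50)

# (AB)C: multiply A and B first, then the result by C.
cost_ab, ab = multiply_cost(A, B)
cost_ab_c, _ = multiply_cost(ab, C)
left_first = cost_ab + cost_ab_c

# A(BC): multiply B and C first, then A by the result.
cost_bc, bc = multiply_cost(B, C)
cost_a_bc, _ = multiply_cost(A, bc)
right_first = cost_bc + cost_a_bc

print(left_first, right_first)  # 100500 3500000
```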
Counting Number of Parenthesizations
Catalan Numbers
Multiplying n Numbers
Objective: find C(n), the number of ways to compute the product $x_1 \cdot x_2 \cdots x_n$.
n   multiplication orders
2   (x1 · x2)
3   (x1 · (x2 · x3)), ((x1 · x2) · x3)
4   (x1 · (x2 · (x3 · x4))), (x1 · ((x2 · x3) · x4)), ((x1 · x2) · (x3 · x4)), ((x1 · (x2 · x3)) · x4), (((x1 · x2) · x3) · x4)
n   C(n)
1   1
2   1
3   2
4   5
5   14
6   42
7   132
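Multiplying n Numbers - small n
Recursive equation (where is the last multiplication?): $C(1) = 1$ and $C(n) = \sum_{k=1}^{n-1} C(k)\, C(n-k)$ for $n \ge 2$
Catalan numbers: $C(n) = \frac{1}{n} \binom{2n-2}{n-1}$
Asymptotic value: $C(n) = \Theta(4^n / n^{3/2})$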
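As a quick check, the count of multiplication orders can be computed two ways: from the recurrence "choose where the last multiplication happens", C(n) = C(1)C(n-1) + C(2)C(n-2) + … + C(n-1)C(1) with C(1) = 1, and from the closed form of the Catalan numbers. The function names below are assumptions chosen for this sketch.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_orders(n):
    """Number of ways to fully parenthesize a product of n factors."""
    if n == 1:
        return 1
    # The last multiplication splits the product into k and n - k factors.
    return sum(count_orders(k) * count_orders(n - k) for k in range(1, n))


def catalan_closed_form(n):
    """Closed form (1/n) * binomial(2n - 2, n - 1) for the same count."""
    return comb(2 * n - 2, n - 1) // n


for n in range(1, 8):
    print(n, count_orders(n), catalan_closed_form(n))
```

The growth is exponential, which is why trying every parenthesization is not a practical way to find the cheapest one.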
Applying Dynamic Programming
1. Characterize the structure of an optimal solution
2. Recursively define the value of an optimal solution
3. Compute the value of an optimal solution in a bottom-up fashion
4. Construct an optimal solution from the information computed in Step 3
Matrix-Chain Multiplication
Given a chain of matrices $A_1, A_2, \ldots, A_n$, where for i = 1, 2, …, n matrix $A_i$ has dimensions $p_{i-1} \times p_i$, fully parenthesize the product $A_1 A_2 \cdots A_n$ in a way that minimizes the number of scalar multiplications.
Dimensions: $A_1$ is $p_0 \times p_1$, $A_2$ is $p_1 \times p_2$, …, $A_i$ is $p_{i-1} \times p_i$, $A_{i+1}$ is $p_i \times p_{i+1}$, …, $A_n$ is $p_{n-1} \times p_n$
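The dimension sequence p can be built from a list of matrix shapes while checking that neighbouring matrices are compatible. A minimal sketch; the helper name `dims_from_shapes` and the (rows, cols) tuple representation are assumptions made for illustration.

```python
def dims_from_shapes(shapes):
    """Turn [(rows, cols), ...] for A_1..A_n into the list [p_0, p_1, ..., p_n]."""
    p = [shapes[0][0]]
    for index, (rows, cols) in enumerate(shapes, start=1):
        if rows != p[-1]:
            # A_index must have as many rows as the previous matrix has columns.
            raise ValueError(f"A{index} has {rows} rows, expected {p[-1]}")
        p.append(cols)
    return p


p = dims_from_shapes([(10, 100), (100, 5), (5, 50)])
print(p)  # [10, 100, 5, 50], so A_i has dimensions p[i-1] x p[i]
```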
1. The Structure of an Optimal Parenthesization
Step 1: Characterize the structure of an optimal solution
$A_{i..j}$: the matrix that results from evaluating the product $A_i A_{i+1} A_{i+2} \cdots A_j$, $i \le j$
An optimal parenthesization of the product $A_1 A_2 \cdots A_n$
- splits the product between $A_k$ and $A_{k+1}$, for some $1 \le k < n$: $A_{1..n} = (A_1 A_2 A_3 \cdots A_k) \cdot (A_{k+1} A_{k+2} \cdots A_n)$
- i.e., first compute $A_{1..k}$ and $A_{k+1..n}$, and then multiply these two
The cost of this optimal parenthesization = cost of computing $A_{1..k}$ + cost of computing $A_{k+1..n}$ + cost of multiplying $A_{1..k} \cdot A_{k+1..n}$
1. The Structure of an Optimal Parenthesization…
$A_{i..j} = A_{i..k}\, A_{k+1..j}$
Key observation: given an optimal parenthesization $(A_1 A_2 A_3 \cdots A_k) \cdot (A_{k+1} A_{k+2} \cdots A_n)$, the parenthesization of the sub-chain $A_1 A_2 \cdots A_k$ and the parenthesization of the sub-chain $A_{k+1} A_{k+2} \cdots A_n$ must both be optimal if the parenthesization of the chain $A_1 A_2 \cdots A_n$ is optimal (why?).
That is, the optimal solution to the problem contains within it optimal solutions to subproblems, i.e. the problem has optimal substructure.
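One way to answer the "why?" is the usual cut-and-paste argument, sketched below for the left sub-chain (the right one is symmetric). The symbols $c$, $c'$ and $\mathrm{cost}$ are notation introduced only for this sketch.

```latex
\begin{align*}
\mathrm{cost}(A_{1..n}) &= c + \mathrm{cost}(A_{k+1..n}) + p_0\, p_k\, p_n
  && \text{($c$ = cost of the chosen parenthesization of $A_{1..k}$)} \\
c' < c \;\Longrightarrow\;
c' + \mathrm{cost}(A_{k+1..n}) + p_0\, p_k\, p_n &< \mathrm{cost}(A_{1..n})
  && \text{(swap in the cheaper parenthesization of $A_{1..k}$)}
\end{align*}
```

A cheaper sub-chain parenthesization would therefore yield a cheaper parenthesization of the whole chain, contradicting the assumption that the original one was optimal.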
2. A Recursive Solution
Step 2: Define the value of an optimal solution recursively
Subproblem: determine the minimum cost of parenthesizing $A_{i..j} = A_i A_{i+1} \cdots A_j$ for $1 \le i \le j \le n$
Let m[i, j] = the minimum number of multiplications needed to compute $A_{i..j}$
Full problem ($A_{1..n}$): m[1, n]
2. A Recursive Solution ….
Define m recursively, using $A_{i..j} = A_{i..k}\, A_{k+1..j}$:
- i = j: $A_{i..i} = A_i$, so m[i, i] = 0 for i = 1, 2, …, n, since the sub-chain contains just one matrix and needs no multiplication at all.
- i < j: assume that the optimal parenthesization splits the product $A_i A_{i+1} \cdots A_j$ between $A_k$ and $A_{k+1}$, $i \le k < j$. Then m[i, j] = m[i, k] + m[k+1, j] + $p_{i-1}\, p_k\, p_j$
2. A Recursive Solution …..
Consider the subproblem of parenthesizing $A_{i..j} = A_i A_{i+1} \cdots A_j$ for $1 \le i \le j \le n$, split as $A_{i..k}\, A_{k+1..j}$ for $i \le k < j$.
m[i, j] = the minimum number of multiplications needed to compute the product $A_{i..j}$
m[i, j] = m[i, k] + m[k+1, j] + $p_{i-1}\, p_k\, p_j$
- m[i, k]: min # of multiplications to compute $A_{i..k}$
- m[k+1, j]: min # of multiplications to compute $A_{k+1..j}$
- $p_{i-1}\, p_k\, p_j$: # of multiplications to compute $A_{i..k}\, A_{k+1..j}$
2. A Recursive Solution ….
m[i, j] = m[i, k] + m[k+1, j] + $p_{i-1}\, p_k\, p_j$
We do not know the value of k. There are j - i possible values for k: k = i, i+1, …, j-1
Minimizing the cost of parenthesizing the product $A_i A_{i+1} \cdots A_j$ becomes:
$$m[i, j] = \begin{cases} 0 & \text{if } i = j \\ \min_{i \le k < j} \{\, m[i, k] + m[k+1, j] + p_{i-1}\, p_k\, p_j \,\} & \text{if } i < j \end{cases}$$
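The recurrence translates almost literally into a recursive function. The sketch below caches results with `functools.lru_cache` so that each pair (i, j) is solved once; the function name `min_multiplications` is an assumption, and p is the dimension list [p_0, …, p_n].

```python
from functools import lru_cache


def min_multiplications(p):
    """Minimum number of scalar multiplications for the chain with dimensions p."""
    n = len(p) - 1  # number of matrices in the chain

    @lru_cache(maxsize=None)
    def m(i, j):
        if i == j:
            return 0  # a single matrix needs no multiplication
        # Try every split point k and keep the cheapest.
        return min(
            m(i, k) + m(k + 1, j) + p[i - 1] * p[k] * p[j]
            for k in range(i, j)
        )

    return m(1, n)


print(min_multiplications([10, 100, 5, 50]))  # 7500
```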
Reconstructing the Optimal Solution
Additional information to maintain: s[i, j] = a value of k at which we can split the product $A_i A_{i+1} \cdots A_j$ in order to obtain an optimal parenthesization
3. Computing the Optimal Costs
$$m[i, j] = \begin{cases} 0 & \text{if } i = j \\ \min_{i \le k < j} \{\, m[i, k] + m[k+1, j] + p_{i-1}\, p_k\, p_j \,\} & \text{if } i < j \end{cases}$$
A recursive algorithm may encounter each subproblem many times in different branches of the recursion: the subproblems overlap.
Compute a solution using a tabular bottom-up approach.
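To see how much the subproblems overlap, one can count the calls made by a plain recursion with no table and compare that with the number of distinct pairs (i, j). The sketch below only counts calls and does not compute costs; both function names are assumptions for illustration.

```python
def naive_call_count(n):
    """Number of calls a plain (unmemoized) recursion makes for a chain of n matrices."""
    calls = 0

    def visit(i, j):
        nonlocal calls
        calls += 1
        for k in range(i, j):      # every split point spawns two subproblems
            visit(i, k)
            visit(k + 1, j)

    visit(1, n)
    return calls


def distinct_subproblems(n):
    """Number of pairs (i, j) with 1 <= i <= j <= n."""
    return n * (n + 1) // 2


for n in (4, 8, 12):
    print(n, naive_call_count(n), distinct_subproblems(n))
```

The call count grows as 3^(n-1), while there are only n(n+1)/2 distinct subproblems, so storing each answer in a table pays off quickly.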
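3. Computing the Optimal Costs …
$$m[i, j] = \begin{cases} 0 & \text{if } i = j \\ \min_{i \le k < j} \{\, m[i, k] + m[k+1, j] + p_{i-1}\, p_k\, p_j \,\} & \text{if } i < j \end{cases}$$
Fill the table in order of increasing j - i:
- Length = 0: i = j, i = 1, 2, …, n (computed first)
- Length = 1: j = i + 1, i = 1, 2, …, n-1 (computed second)
m[1, n] gives the optimal cost for the whole problem.
Compute rows from bottom to top and from left to right.
In a similar matrix s we keep the optimal values of k.
[Figure: triangular table m with columns i = 1, …, n and rows j = 1, …, n]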
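Example: computing m[2, 5]
m[i, j] = $\min_{i \le k < j} \{\, m[i, k] + m[k+1, j] + p_{i-1}\, p_k\, p_j \,\}$
m[2, 5] = min of:
- k = 2: m[2, 2] + m[3, 5] + $p_1 p_2 p_5$
- k = 3: m[2, 3] + m[4, 5] + $p_1 p_3 p_5$
- k = 4: m[2, 4] + m[5, 5] + $p_1 p_4 p_5$
Values m[i, j] depend only on values that have been previously computed.
[Figure: triangular table m with columns i = 1, …, 6 and rows j = 1, …, 6]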
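Example
Compute $A_1 A_2 A_3$
- $A_1$: 10 × 100 ($p_0 \times p_1$)
- $A_2$: 100 × 5 ($p_1 \times p_2$)
- $A_3$: 5 × 50 ($p_2 \times p_3$)
m[i, i] = 0 for i = 1, 2, 3
m[1, 2] = m[1, 1] + m[2, 2] + $p_0 p_1 p_2$ = 0 + 0 + 10 · 100 · 5 = 5,000 ($A_1 A_2$)
m[2, 3] = m[2, 2] + m[3, 3] + $p_1 p_2 p_3$ = 0 + 0 + 100 · 5 · 50 = 25,000 ($A_2 A_3$)
m[1, 3] = min of:
- m[1, 1] + m[2, 3] + $p_0 p_1 p_3$ = 0 + 25,000 + 10 · 100 · 50 = 75,000 ($A_1 (A_2 A_3)$)
- m[1, 2] + m[3, 3] + $p_0 p_2 p_3$ = 5,000 + 0 + 10 · 5 · 50 = 7,500 ($(A_1 A_2) A_3$)
Resulting table entries: m[1, 2] = 5,000 with s[1, 2] = 1; m[2, 3] = 25,000 with s[2, 3] = 2; m[1, 3] = 7,500 with s[1, 3] = 2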
Algorithm
Chain-Matrix-Order(p)
    n ← length[p] - 1
    for i ← 1 to n do
        m[i, i] ← 0
    for l ← 2 to n do                (l is the chain length)
        for i ← 1 to n - l + 1 do
            j ← i + l - 1
            m[i, j] ← ∞
            for k ← i to j - 1 do
                q ← m[i, k] + m[k+1, j] + p_{i-1} · p_k · p_j
                if q < m[i, j] then
                    m[i, j] ← q
                    s[i, j] ← k
    return m and s
Layout of the table m for n = 4:
        i = 1    i = 2    i = 3    i = 4
j = 4   m[1,4]   m[2,4]   m[3,4]   m[4,4]
j = 3   m[1,3]   m[2,3]   m[3,3]
j = 2   m[1,2]   m[2,2]
j = 1   m[1,1]
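A direct Python rendering of Chain-Matrix-Order might look like the sketch below. It keeps the 1-based indices of the pseudocode by leaving row and column 0 of each table unused; the name `matrix_chain_order` and that padding choice are assumptions of this sketch, not something the slides prescribe.

```python
import math


def matrix_chain_order(p):
    """Bottom-up tables m (costs) and s (split points) for dimensions p[0..n]."""
    n = len(p) - 1
    # Index 0 is unused so that m[i][j] and s[i][j] match the 1-based pseudocode.
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):            # chain length l
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = math.inf
            for k in range(i, j):             # split between A_k and A_{k+1}
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s


m, s = matrix_chain_order([10, 100, 5, 50, 20])
print(m[1][4], s[1][4])  # 11000 2
```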
Example: Dynamic Programming
Problem: compute the optimal multiplication order for a chain of four matrices with the dimensions given below.
$p_0 = 10$, $p_1 = 100$, $p_2 = 5$, $p_3 = 50$, $p_4 = 20$
Table to fill: m[i, j] for 1 ≤ i ≤ j ≤ 4
Main Diagonal
m[i, i] = 0 for i = 1, 2, 3, 4. Entries still to compute: m[1,2], m[2,3], m[3,4], m[1,3], m[2,4], m[1,4]
Computing m[3, 4]
m[3, 4] = 0 + 0 + 5 · 50 · 20 = 5000
s[3, 4] = k = 3
Computing m[2, 3]
m[2, 3] = 0 + 0 + 100 · 5 · 50 = 25000
s[2, 3] = k = 2
Computing m[1, 2]
m[1, 2] = 0 + 0 + 10 · 100 · 5 = 5000
s[1, 2] = k = 1
Computing m[2, 4]
m[2, 4] = min(0 + 5000 + 100 · 5 · 20, 25000 + 0 + 100 · 50 · 20) = min(15000, 125000) = 15000
s[2, 4] = k = 2
Computing m[1, 3]
m[1, 3] = min(0 + 25000 + 10 · 100 · 50, 5000 + 0 + 10 · 5 · 50) = min(75000, 7500) = 7500
s[1, 3] = k = 2
Computing m[1, 4]
m[1, 4] = min(0 + 15000 + 10 · 100 · 20, 5000 + 5000 + 10 · 5 · 20, 7500 + 0 + 10 · 50 · 20) = min(35000, 11000, 17500) = 11000
s[1, 4] = k = 2
Final Cost Matrix and Its Order of Computation
Final cost matrix m[i, j], with the split s[i, j] in parentheses:
        i = 1       i = 2       i = 3      i = 4
j = 4   11000 (2)   15000 (2)   5000 (3)   0
j = 3   7500 (2)    25000 (2)   0
j = 2   5000 (1)    0
j = 1   0
Order of computation:
        i = 1   i = 2   i = 3   i = 4
j = 4   10      8       5       4
j = 3   9       6       3
j = 2   7       2
j = 1   1
k (s) Values Leading to the Minimum m[i, j]
s[1, 4] = 2, s[3, 4] = 3, s[1, 2] = 1
These splits give the parenthesization $((A_1 A_2)(A_3 A_4))$.
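For a chain this short, the result can be cross-checked by brute force: enumerate every full parenthesization of the four-matrix example with p = (10, 100, 5, 50, 20) and compare costs. This is a sketch of exhaustive search, not the dynamic programming algorithm; the generator name `all_costs` is an assumption.

```python
def all_costs(p, i, j):
    """Yield (cost, expression) for every full parenthesization of A_i..A_j."""
    if i == j:
        yield 0, f"A{i}"
        return
    for k in range(i, j):                       # position of the last multiplication
        for left_cost, left_expr in all_costs(p, i, k):
            for right_cost, right_expr in all_costs(p, k + 1, j):
                cost = left_cost + right_cost + p[i - 1] * p[k] * p[j]
                yield cost, f"({left_expr}{right_expr})"


p = [10, 100, 5, 50, 20]
candidates = sorted(all_costs(p, 1, 4))
for cost, expr in candidates:
    print(cost, expr)
best_cost, best_expr = candidates[0]
```

The cheapest of the five candidates costs 11000 and splits the chain after the second matrix, which agrees with m[1, 4] and s[1, 4] = 2.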
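Entries from a different chain (here $p_2 = 15$, $p_3 = 5$, $p_4 = 10$, $p_5 = 20$):
l = 3: m[3, 5] = min of:
- k = 4: m[3, 4] + m[5, 5] + 15 · 10 · 20 = 750 + 0 + 3000 = 3750
- k = 3: m[3, 3] + m[4, 5] + 15 · 5 · 20 = 0 + 1000 + 1500 = 2500
so m[3, 5] = 2500
l = 2:
- 10 · 20 · 25 = 5000
- 35 · 15 · 5 = 2625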
Analysis
Chain-Matrix-Order(p) has three nested loops (over the chain length l, the start index i, and the split point k), each running at most n times.
- Takes $O(n^3)$ time
- Requires $O(n^2)$ space for the tables m and s
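The cubic bound can be made concrete by counting how often the innermost loop body runs. The sketch below repeats only the loop structure of Chain-Matrix-Order and compares the count with (n^3 - n)/6; the function name is an assumption.

```python
def inner_iterations(n):
    """Count executions of the innermost loop body for a chain of n matrices."""
    count = 0
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            for k in range(i, j):
                count += 1
    return count


for n in (4, 10, 50):
    print(n, inner_iterations(n), (n ** 3 - n) // 6)
```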
4. Construct the Optimal Solution
Our algorithm computes the minimum-cost table m and the split table s.
The optimal solution can be constructed from the split table s.
Each entry s[i, j] = k shows where to split the product $A_i A_{i+1} \cdots A_j$ for the minimum cost.
4. Construct the Optimal Solution …
s[i, j] = value of k such that the optimal parenthesization of $A_i A_{i+1} \cdots A_j$ splits the product between $A_k$ and $A_{k+1}$
$A_{1..n} = A_{1..s[1, n]}\, A_{s[1, n]+1..n}$
Example with n = 6:
- s[1, 6] = 3: $A_{1..6} = A_{1..3}\, A_{4..6}$
- s[1, 3] = 1: $A_{1..3} = A_{1..1}\, A_{2..3}$
- s[4, 6] = 5: $A_{4..6} = A_{4..5}\, A_{6..6}$
[Figure: split table s for n = 6, columns i and rows j]
4. Construct the Optimal Solution …
PRINT-OPT-PARENS(s, i, j)
    if i = j
        then print “A”i
        else print “(”
             PRINT-OPT-PARENS(s, i, s[i, j])
             PRINT-OPT-PARENS(s, s[i, j] + 1, j)
             print “)”
Example: $A_1 \cdots A_6$
Result: ( ( A1 ( A2 A3 ) ) ( ( A4 A5 ) A6 ) )
Trace of PRINT-OPT-PARENS (abbreviated P-O-P) on s[1..6, 1..6]:
P-O-P(s, 1, 6): i = 1, j = 6, s[1, 6] = 3, print “(”
    P-O-P(s, 1, 3): i = 1, j = 3, s[1, 3] = 1, print “(”
        P-O-P(s, 1, 1): print “A1”
        P-O-P(s, 2, 3): i = 2, j = 3, s[2, 3] = 2, print “(”
            P-O-P(s, 2, 2): print “A2”
            P-O-P(s, 3, 3): print “A3”
            print “)”
    …
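The same recursion can be written in Python so that it returns the parenthesized string instead of printing piece by piece. In this sketch the split table is a dictionary keyed by (i, j) that holds only the entries the recursion visits for the six-matrix example; the function name `opt_parens` and the dictionary representation are assumptions, and s[4, 5] = 4 is filled in because it is the only possible split of a two-matrix chain.

```python
def opt_parens(s, i, j):
    """Return the optimal parenthesization of A_i..A_j encoded by split table s."""
    if i == j:
        return f"A{i}"
    k = s[(i, j)]  # split between A_k and A_{k+1}
    return "(" + opt_parens(s, i, k) + opt_parens(s, k + 1, j) + ")"


# Only the entries reached from (1, 6) are needed.
s = {(1, 6): 3, (1, 3): 1, (2, 3): 2, (4, 6): 5, (4, 5): 4}
result = opt_parens(s, 1, 6)
print(result)  # ((A1(A2A3))((A4A5)A6))
```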
Elements of Dynamic Programming
When should we look for a DP solution to an optimization problem?
Two key ingredients for the problem:
- Optimal substructure
- Overlapping subproblems
Optimal Substructure
Overlapping Subproblems