Lecture 2: More Examples
CS 341: Algorithms
Thursday, May 5th
Tuesday Recap
1. Administrative Information
2. Overview of CS
Example 1: Sorting, Merge Sort, Divide & Conquer
CS 341 Assumptions
1. Worst-case Runtime Analysis
2. "Sloppy" in counting
3. Interested in very large inputs
A Comment About Tractability/Intractability
Computational Problem | Input | Output
Sorting (tractable) | An array of integers in arbitrary order | The same array of integers in increasing order
Matrix Multiplication (tractable) | Two n×n matrices A, B | C = A*B
Traveling Salesman Problem (intractable) | A set S of cities, and distances between each pair of cities | Minimum distance of a tour starting from city X, visiting each city once, and coming back to X
Traveling Salesman Problem
Surprising that a problem that is so easy to understand is so difficult to solve.
[Figure: five cities c1 to c5 connected by edges labeled with travel times in hours]
Outline For Today
1. Max Subarray Problem: Bentley's Alg (Dynamic Programming)
2. Scheduling: Greedy Algorithms
3. Maybe: Shortest Paths: Dijkstra's Alg (Greedy Algorithms)
Max Subarray Problem
Input: An array X of numbers $a_1, a_2, \dots, a_n$ (integers or decimals)
Output: A contiguous subarray $a_i, a_{i+1}, a_{i+2}, \dots, a_j$ such that $\sum_{k=i}^{j} a_k$ is maximum.
Empty subarrays are OK (their sum is 0). Output just the sum.
Example
Input: An array of numbers (integers or decimals)
Output: Contiguous subarray with the maximum sum
[Figure: example input array]
Output: sum = 9
Algorithm 1: Brute Force
    procedure msaBruteForce(Array X of size n):
        maxSum = 0
        for i = 1 to n:
            for j = i to n:
                sum = 0
                for k = i to j:
                    sum += X[k]
                maxSum = max(maxSum, sum)
        return maxSum
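A minimal Python sketch of msaBruteForce, assuming 0-based indexing (so the slide's i = 1 to n becomes range(n)). The function name msa_brute_force and the sample array are made up for illustration; the array is not the one from the example slide.

```python
def msa_brute_force(x):
    """Max subarray sum by trying every pair (i, j); the empty subarray counts as 0."""
    n = len(x)
    max_sum = 0  # empty subarray is allowed, so the answer is never below 0
    for i in range(n):
        for j in range(i, n):
            s = 0
            for k in range(i, j + 1):  # re-sum x[i..j] from scratch: O(n) work per pair
                s += x[k]
            max_sum = max(max_sum, s)
    return max_sum


# Hypothetical input (not the array from the lecture's example slide).
print(msa_brute_force([-3, 2, 5, -1, 3, -4]))  # prints 9 (from 2 + 5 - 1 + 3)
```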
Analysis of Brute Force
Cost of each line of msaBruteForce, summed over the whole run:
- for i = 1 to n: n ops
- for j = i to n: n(n+1)/2 ops (one per pair i ≤ j)
- sum = 0: n(n+1)/2 ops
- for k = i to j: sum += X[k]: O(n^3) ops (analyzed below)
- maxSum = max(maxSum, sum): n(n+1)/2 ops

Analysis of Inner Most Loop
For each i ∈ 1…n and j ∈ i…n, the k loop costs (j − i + 1). For a fixed i, the cost is $\sum_{j=i}^{n} (j-i+1) = 1 + 2 + \dots + (n-i+1) = \frac{(n-i+1)(n-i+2)}{2}$.
- when i = 1: $1 + \dots + n = n(n+1)/2$, so $n^2/8 \le \text{Cost} \le n^2$
- when i = 2: $1 + \dots + (n-1) = (n-1)n/2$, so $n^2/8 \le \text{Cost} \le n^2$
- …
- when i = n/2: $1 + \dots + (n/2+1) = (n/2+1)(n/2+2)/2$, so $n^2/8 \le \text{Cost} \le n^2$
- when i = n/2+1: $1 + \dots + n/2$, so $\text{Cost} \le n^2$
- …
- when i = n: $1$, so $\text{Cost} \le n^2$
Each of the n/2 values i ≤ n/2 contributes at least $n^2/8$, and every value of i contributes at most $n^2$, so
$n^3/16 \le$ Total Cost of Inner Most Loop $\le n^3$

Total Cost: $n + 3n(n+1)/2 + O(n^3) = O(n^3)$
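The bounds above can be sanity-checked numerically. This sketch (the helper name inner_loop_iterations is hypothetical) counts how often `sum += X[k]` runs for a given n and compares the count with the closed form $n(n+1)(n+2)/6$, which follows from summing $(n-i+1)(n-i+2)/2$ over i, and with the $n^3/16$ and $n^3$ bounds.

```python
def inner_loop_iterations(n):
    """Count how many times `sum += X[k]` runs in msaBruteForce for input size n."""
    count = 0
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            count += j - i + 1  # the k loop runs over i..j
    return count


for n in (8, 16, 64):
    c = inner_loop_iterations(n)
    # closed form: sum over all i <= j of (j - i + 1) = n(n+1)(n+2)/6
    assert c == n * (n + 1) * (n + 2) // 6
    print(n, c, n**3 / 16 <= c <= n**3)  # e.g. "8 120 True"
```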
Brute Force Simulation
[Figure: step-by-step trace of the indices i, j, k and the running sum on the example array]
Observation: we don't need to sum X[i..j] from scratch for each j. We can simply add the last item X[j] to the previous sum of X[i..j−1].
Algorithm 2: Smart Brute Force
    procedure msaSBruteForce(Array X of size n):
        maxSum = 0
        for i = 1 to n:
            sum = 0
            for j = i to n:
                sum += X[j]
                maxSum = max(maxSum, sum)
        return maxSum
Outer loop: n ops in total. Inner loop body: O(n^2) ops in total.
Total Cost: O(n^2)
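A Python sketch of msaSBruteForce under the same 0-based indexing assumption; the function name is illustrative. The only change from the brute force is that the running sum for x[i..j] is built from the sum for x[i..j-1] in O(1).

```python
def msa_smart_brute_force(x):
    """O(n^2) max subarray sum; the empty subarray counts as 0."""
    n = len(x)
    max_sum = 0
    for i in range(n):
        s = 0
        for j in range(i, n):
            s += x[j]  # extend x[i..j-1] to x[i..j] in O(1)
            max_sum = max(max_sum, s)
    return max_sum


print(msa_smart_brute_force([-3, 2, 5, -1, 3, -4]))  # prints 9
```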
Algorithm 3: Divide & Conquer
Split the array into L = $a_1, \dots, a_{n/2}$ and R = $a_{n/2+1}, \dots, a_n$.
A Claim that doesn't Require a Proof: the max subarray is either:
1) entirely in L; or
2) entirely in R; or
3) spans L and R (call this case C), i.e., includes both $a_{n/2}$ and $a_{n/2+1}$
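Algorithm 3: Divide & Conquer
Claim 2: if the max subarray spans L & R, its sum is CL + CR, where
CL = max sum of a subarray ending at $a_{n/2}$
CR = max sum of a subarray starting from $a_{n/2+1}$
Why? A spanning subarray $a_i, \dots, a_j$ with $i \le n/2 < j$ splits into a part ending at $a_{n/2}$ and a part starting at $a_{n/2+1}$. The two parts can be chosen independently, so the best total uses the best choice for each part.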
Algorithm 3: Divide & Conquer
    procedure msaDC(Array X of size n):
        if n == 0: return 0
        if n == 1: return max(0, X[1])
        L = msaDC(X[1,…,n/2])
        R = msaDC(X[n/2+1,…,n])
        CL = sum = 0
        for i = n/2 down to 1:
            sum += X[i]
            CL = max(CL, sum)
        CR = sum = 0
        for i = n/2+1 to n:
            sum += X[i]
            CR = max(CR, sum)
        return max(L, R, CL + CR)
8n = O(n) ops outside the recursive calls
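A Python sketch of msaDC, assuming 0-based half-open ranges x[lo:hi] so that no subarray copies are made. The function name msa_dc and its lo/hi parameters are illustrative. The base cases handle sizes 0 and 1, and CL and CR start at 0 because the empty subarray is allowed.

```python
def msa_dc(x, lo=0, hi=None):
    """Max subarray sum of x[lo:hi] by divide and conquer (empty subarray counts as 0)."""
    if hi is None:
        hi = len(x)
    n = hi - lo
    if n == 0:
        return 0
    if n == 1:
        return max(0, x[lo])
    mid = lo + n // 2  # L = x[lo:mid], R = x[mid:hi]
    left = msa_dc(x, lo, mid)
    right = msa_dc(x, mid, hi)
    # CL: best sum of a subarray ending at x[mid - 1] (the last item of L)
    cl = s = 0
    for i in range(mid - 1, lo - 1, -1):
        s += x[i]
        cl = max(cl, s)
    # CR: best sum of a subarray starting at x[mid] (the first item of R)
    cr = s = 0
    for i in range(mid, hi):
        s += x[i]
        cr = max(cr, s)
    return max(left, right, cl + cr)


print(msa_dc([-3, 2, 5, -1, 3, -4]))  # prints 9
```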
Analysis of Algorithm 3
Recursion tree:
- msaDC(n): 8n ops
- 2 calls to msaDC(n/2): 2(8n/2) = 8n ops
- 4 calls to msaDC(n/4): 4(8n/4) = 8n ops
- …
- n calls to msaDC(1): n(8) = 8n ops
Total = 8n · #levels = $8n \log_2(n)$
**msaDC takes $8n\log_2(n) = O(n\log n)$ time.**
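A quick check of the recursion-tree count, assuming n is a power of two and using the slide's cost model (8n ops per call outside the recursion, 8 ops per size-1 call); msa_dc_work is a hypothetical helper. The exact count comes out to $8n(\log_2 n + 1)$: the extra 8n is the leaf level, which the "sloppy" count above folds into $8n\log_2 n$. Either way it is $O(n\log n)$.

```python
import math


def msa_dc_work(n):
    """Ops under the slide's model: 8n outside the two recursive calls, 8 for a size-1 call."""
    if n == 1:
        return 8
    return 2 * msa_dc_work(n // 2) + 8 * n


for k in range(1, 11):
    n = 2 ** k  # assume n is a power of two so every split is exact
    work = msa_dc_work(n)
    # log2(n) levels of 8n each, plus one more 8n for the n leaves
    assert work == 8 * n * (k + 1)
    print(n, work, 8 * n * math.log2(n))
```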
CS 341 Diagram
Fundamental (& Fast) Algorithms to Tractable Problems: MergeSort, Strassen's MM, BFS/DFS, Dijkstra's SSSP, Kosaraju's SCC, Kruskal's MST, Floyd-Warshall APSP, Topological Sort, …
Common Algorithm Design Paradigms: Divide-and-Conquer, Greedy, Dynamic Programming
Mathematical Tools to Analyze Algorithms: Big-oh notation, Recursion Tree, Master method, Substitution method, Exchange Arguments, Greedy-stays-ahead Arguments
Intractable Problems: P vs NP, Poly-time Reductions, Undecidability
Other (Last Lecture): Randomized/Online/Parallel Algorithms
Can We Do O(n) Time?
Reason about what the optimal solution looks like **in terms of optimal solutions to sub-problems**.
Consider the following subproblem: let $P_j$ = the max sum of any contiguous subarray ending at location X[j].
Suppose we have computed each of $P_1, P_2, \dots, P_n$.
Q: What is maxSum?
A: maxSum = $\max_j P_j$ (or 0 if the max subarray is the empty set)
Can We Solve $P_j$ in Terms of Other $P_i$'s?
What is $P_j$ in terms of $P_{j-1}$?
Q: What is $P_4$ in terms of $P_3$?
A: $P_4 = \max(a_4, P_3 + a_4)$
Why? $P_4$ is the max sum ending exactly at $a_4$. So the subarray is either (1) $a_4$ itself; or (2) $a_4$ appended to a subarray ending at $a_3$, and the best such subarray has sum $P_3$ (the max sum ending at $a_3$). In general, $P_j = \max(a_j, P_{j-1} + a_j)$.
Bentley's Algorithm
    procedure msaBentley(Array X of size n):
        P = array indexed 0…n, initialized to 0
        for i = 1 to n:
            P[i] = max(X[i], X[i] + P[i-1])
        return max(0, max_{j ≥ 1} P[j])
(P[0] = 0 is a base case, so P[1] = X[1].) The loop takes n ops and the final max takes n ops.
**msaBentley takes O(n) time.**
Bentley's Alg is an example of "Dynamic Programming". Informally: DP algs solve a problem P in terms of solutions to subproblems $P_i$.
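A Python sketch of msaBentley, assuming 0-based input so that p[i] corresponds to the slide's $P_i$ for the item x[i - 1]. The function name is illustrative.

```python
def msa_bentley(x):
    """O(n) max subarray sum via P_j = max(a_j, P_{j-1} + a_j); empty subarray counts as 0."""
    n = len(x)
    # p[i] = max sum of a nonempty subarray ending exactly at x[i - 1];
    # p[0] = 0 is a base case so that p[1] = x[0].
    p = [0] * (n + 1)
    for i in range(1, n + 1):
        p[i] = max(x[i - 1], x[i - 1] + p[i - 1])
    return max([0] + p[1:])


print(msa_bentley([-3, 2, 5, -1, 3, -4]))  # prints 9
```

Since each p[i] only reads p[i - 1], the array could be replaced by a single running variable for O(1) extra space; the array form is kept here to mirror the slide.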
Scheduling Problem
Input: A set J of n jobs. Each job i has length $l_i$.
[Figure: Job 1, Job 2, …, Job n drawn as bars of lengths $l_1, l_2, \dots, l_n$]
Output: A schedule of the jobs on a processor such that $\sum_{i=1}^{n} C_i$ is minimum over all n! possible schedules, where $C_i$ is the completion time of job i.
Problem from Operating Systems & Data Center Management Systems
Completion Time of Job i
Definition: the time when job i finishes, i.e., the sum of the scheduled job lengths up to and including job i.
[Figure: schedule S1 of jobs J1 (length 3), J2 (length 5), J3 (length 1), J4 (length 1) on a timeline]
Total Cost of S1 = 26
Another Example Schedule
[Figure: schedule S1 on a timeline] Total Cost of S1 = 26
[Figure: schedule S2 of the same jobs on a timeline] Total Cost of S2 = 20
Goal is to find the min cost schedule!
Let's Start Simple
Jobs: J1 (length 3), J2 (length 5). What are all possible schedules?
S1: J1 then J2. Completion times 3 and 8. Total Cost = 3 + 8 = 11
S2: J2 then J1. Completion times 5 and 8. Total Cost = 5 + 8 = 13
Why Put One Job In Front of Another?
Observation: a job's length is added to the completion time of every job scheduled after it, so shorter jobs have less impact on the completion times of future jobs.
Greedy Scheduling Algorithm
Schedule jobs by increasing lengths.
    procedure greedySchedule(Array J of size n):
        return J sorted by increasing length
Run-time: O(n log(n)), dominated by the sort!
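A minimal sketch of greedySchedule plus a cost function, assuming jobs are given as a dict from a job name to its length (a representation chosen here for illustration). The lengths are those of the lecture's four-job example. Python's sort is stable, so the tied jobs J3 and J4 keep their input order.

```python
from itertools import accumulate


def greedy_schedule(lengths):
    """Shortest-job-first: job names ordered by increasing length (an O(n log n) sort)."""
    return sorted(lengths, key=lambda job: lengths[job])


def total_completion_time(order, lengths):
    """Sum of completion times; each C_i is the running total of lengths up to job i."""
    return sum(accumulate(lengths[job] for job in order))


jobs = {"J1": 3, "J2": 5, "J3": 1, "J4": 1}  # lengths from the lecture's example
s_g = greedy_schedule(jobs)
print(s_g, total_completion_time(s_g, jobs))  # ['J3', 'J4', 'J1', 'J2'] 18
```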
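Greedy Scheduling Algorithm
Ex: jobs J1 (length 3), J2 (length 5), J3 (length 1), J4 (length 1).
S_g runs J3 and J4 (length 1) first, then J1 (length 3), then J2 (length 5): completion times 1, 2, 5, 10.
Total Cost of S_g = 1 + 2 + 5 + 10 = 18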
Comparing S_g to Previous Schedules
[Figure: timelines of S_g, S1, and S2 on the same four jobs] S_g costs 18, versus 26 for S1 and 20 for S2.
Proof of Correctness (1)
"Greedy stays ahead" proof: induct on the cost of the first k jobs executed, and argue that S_g beats everyone else at each step.
Let S[i] be the i-th job that a schedule S executes; e.g., S_g[1] is the first job S_g executes. Inside sums, S[i] stands for that job's length.
Let Cost(S, i) be the sum of the completion times of the first i jobs that schedule S executes. E.g., Cost(S_g, 3) is the sum of the completion times of S_g[1], S_g[2], S_g[3]:
$S_g[1] + (S_g[1] + S_g[2]) + (S_g[1] + S_g[2] + S_g[3])$
Goal: argue $\forall S$, Cost(S_g, n) ≤ Cost(S, n) by inducting on i.
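Proof of Correctness (2)
Base Case: $\forall S$, Cost(S_g, 1) = S_g[1] ≤ S[1] = Cost(S, 1), since S_g[1] is the shortest length job.
Inductive Hypothesis: Cost(S_g, k−1) ≤ Cost(S, k−1)
Inductive Step:
$\text{Cost}(S_g, k) = \text{Cost}(S_g, k-1) + \sum_{t=1}^{k} S_g[t] \le \text{Cost}(S, k-1) + \sum_{t=1}^{k} S_g[t]$ (by inductive hypothesis)
$\le \text{Cost}(S, k-1) + \sum_{t=1}^{k} S[t] = \text{Cost}(S, k)$ (by greedy criterion of S_g: its first k jobs are the k shortest jobs)
QED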
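The proof covers all inputs; for small inputs the claim can also be checked exhaustively. This sketch (helper names are hypothetical) compares the shortest-first order against every one of the n! orders. It is only a sanity check on a few cases, not a substitute for the proof.

```python
from itertools import accumulate, permutations


def cost(lengths_in_order):
    """Sum of completion times when jobs run in the given order."""
    return sum(accumulate(lengths_in_order))


def greedy_matches_brute_force(lengths):
    """Compare shortest-first against every one of the n! orders (small n only)."""
    best = min(cost(order) for order in permutations(lengths))
    return cost(sorted(lengths)) == best


print(greedy_matches_brute_force([3, 5, 1, 1]))  # True (minimum cost is 18)
```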
Greedy Algorithms
Informally: algorithms that make myopic/local decisions (with the hope that the decisions are globally optimal).
Edsger Dijkstra (1930-2002)
Legendary Dutch computer scientist
1972 Turing Award winner
Significant contributions to algorithms, operating systems, compilers, distributed systems, and other subdisciplines of CS
Handwrote his EWD reports.
Shortest Paths From A Single Source
Input: A directed/undirected graph G(V, E) with n nodes (one is the source s) and m edges (u,v) with costs $c_{u,v}$.
[Figure: example graph with nodes S, A, B, C, X, Y]
Output: For each node v in the graph, the shortest-path distance from s to v (a shortest path is a series of edges).
Assumption 1: The graph is connected (s has a path to every vertex).
Assumption 2: Edge costs are non-negative, i.e., $c_{u,v} \ge 0$.
Shortest Path Example
[Figure: INPUT graph with nodes S, A, B, C, X, Y and the OUTPUT shortest-path distances from S]
Dijkstra's Algorithm (Example)
[Figure: step-by-step run of Dijkstra's algorithm on a graph with nodes A to H, starting from A with distance 0]
Dijkstra's Algorithm
    procedure dijkstra(G(V,E), s, costs c_{u,v}):
        L = {s}; R = V − {s}
        shortestDist = array of size n initialized to null
        shortestDist[s] = 0
        while R is not empty:
            for all v in R:
                distSoFar[v] = min over edges (u,v) with u ∈ L of shortestDist[u] + c_{u,v}   (∞ if there is no such edge)
            let v* = the v ∈ R minimizing distSoFar[v]
            remove v* from R and put it into L
            shortestDist[v*] = distSoFar[v*]
        return shortestDist
Will formally prove Dijkstra's alg's correctness & analyze its run time in about two months.
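A direct Python sketch of the procedure above, with no priority queue: each iteration scans every edge crossing from L to R, so it takes roughly O(n·m) time. The representation (a dict from a directed edge (u, v) to its cost) and the function name are assumptions for illustration; an undirected edge would be listed in both directions. Like the slide, it relies on Assumption 1: if some node were unreachable, no crossing edge would be found and the removal would fail.

```python
import math


def dijkstra(nodes, costs, s):
    """
    Shortest-path distances from s, following the slide's L/R procedure (no heap).
    nodes: iterable of node names.
    costs: dict mapping a directed edge (u, v) to c_uv >= 0.
    Assumes every node is reachable from s.
    """
    shortest_dist = {s: 0}
    explored = {s}                 # the set L on the slide
    unexplored = set(nodes) - {s}  # the set R on the slide
    while unexplored:
        # v* minimizes shortest_dist[u] + c_uv over edges (u, v) with u in L, v in R
        best_v, best_d = None, math.inf
        for (u, v), c in costs.items():
            if u in explored and v in unexplored and shortest_dist[u] + c < best_d:
                best_v, best_d = v, shortest_dist[u] + c
        unexplored.remove(best_v)
        explored.add(best_v)
        shortest_dist[best_v] = best_d
    return shortest_dist


# Hypothetical 4-node directed graph (not the lecture's example).
edges = {("S", "A"): 1, ("S", "B"): 4, ("A", "B"): 2, ("B", "C"): 1, ("A", "C"): 6}
print(dijkstra(["S", "A", "B", "C"], edges, "S"))  # {'S': 0, 'A': 1, 'B': 3, 'C': 4}
```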