Richard Anderson Lecture 1


CSE 421 Algorithms Richard Anderson Lecture 1

Course Introduction Instructor Teaching Assistant Richard Anderson, anderson@cs.washington.edu Teaching Assistant Yiannis Giotas, giotas@cs.washington.edu

All of Computer Science is the Study of Algorithms

Mechanics It’s on the web Weekly homework Midterm Final exam Subscribe to the mailing list

Textbook: Algorithm Design, Jon Kleinberg and Éva Tardos Read Chapters 1 & 2

How to study algorithms Zoology Mine is faster than yours is Algorithmic ideas Where algorithms apply What makes an algorithm work Algorithmic thinking

Introductory Problem: Stable Matching Setting: Assign TAs to Instructors Avoid having TAs and Instructors wanting changes E.g., Prof A. would rather have student X than her current TA, and student X would rather work for Prof A. than his current instructor.

Formal notions Perfect matching Ranked preference lists Stability

Examples m1: w1 w2 m2: w2 w1 w1: m1 m2 w2: m2 m1 m1: w1 w2 m2: w1 w2

Examples m1: w1 w2 m2: w2 w1 w1: m2 m1 w2: m1 m2

Intuitive Idea for an Algorithm m proposes to w If w is unmatched, w accepts If w is matched to m2 If w prefers m to m2, w accepts If w prefers m2 to m, w rejects Unmatched m proposes to highest w on its preference list

Algorithm
Initially all m in M and w in W are free
While there is a free m
  Let w be the highest on m's list that m has not proposed to
  if w is free, then match (m, w)
  else suppose (m2, w) is matched
    if w prefers m to m2
      unmatch (m2, w)
      match (m, w)

Does this work? Does it terminate? Is the result a stable matching? Begin by identifying invariants and measures of progress m’s proposals get worse Once w is matched, w stays matched w’s partners get better

Claim: The algorithm stops in at most n^2 steps Why?

The algorithm terminates with a perfect matching Why?

The resulting matching is stable Suppose m1 prefers w2 to w1 and w2 prefers m1 to m2 How could this happen?

CSE 421 Algorithms Richard Anderson Lecture 2

Announcements Office Hours Homework Reading Richard Anderson, CSE 582 Monday, 10:00 – 11:00 Friday, 11:00 – 12:00 Yiannis Giotas, CSE 220 Monday, 2:30-3:20 Friday, 2:30-3:20 Homework Assignment 1, Due Wednesday, October 5 Reading Read Chapters 1 & 2

Stable Matching Find a perfect matching with no instabilities Instability: (m1, w1) and (m2, w2) matched, m1 prefers w2 to w1, and w2 prefers m1 to m2

Intuitive Idea for an Algorithm m proposes to w If w is unmatched, w accepts If w is matched to m2 If w prefers m to m2, w accepts If w prefers m2 to m, w rejects Unmatched m proposes to highest w on its preference list

Algorithm
Initially all m in M and w in W are free
While there is a free m
  Let w be the highest on m's list that m has not proposed to
  if w is free, then match (m, w)
  else suppose (m2, w) is matched
    if w prefers m to m2
      unmatch (m2, w)
      match (m, w)

Does this work? Does it terminate? Is the result a stable matching? Begin by identifying invariants and measures of progress m’s proposals get worse Once w is matched, w stays matched w’s partners get better

Claim: The algorithm stops in at most n^2 steps Why? Each m asks each w at most once

The algorithm terminates with a perfect matching Why? If m is free, there is a w that has not been proposed to

The resulting matching is stable Suppose m1 prefers w2 to w1 and w2 prefers m1 to m2 How could this happen? m1 proposed to w2 before w1, so w2 rejected m1 for some m3, meaning w2 prefers m3 to m1 Since w2's partners only get better, w2 prefers m2 to m3, hence m2 to m1, a contradiction

Result Simple, O(n^2) algorithm to compute a stable matching Corollary: A stable matching always exists
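The proposal algorithm above can be sketched in Python. This is only an illustration of the slides' pseudocode; the function name `stable_matching` and the dict-of-preference-lists encoding are my own, not from the lecture.

```python
def stable_matching(m_prefs, w_prefs):
    """Proposal (Gale-Shapley) algorithm. m_prefs[m] is m's ranked list of
    w's; w_prefs[w] is w's ranked list of m's. Returns a dict m -> w."""
    # rank[w][m] = position of m on w's list, for O(1) "w prefers" tests
    rank = {w: {m: i for i, m in enumerate(ps)} for w, ps in w_prefs.items()}
    next_prop = {m: 0 for m in m_prefs}   # index of next w for m to try
    partner = {}                          # current match: w -> m
    free = list(m_prefs)
    while free:
        m = free.pop()
        w = m_prefs[m][next_prop[m]]      # highest w not yet proposed to
        next_prop[m] += 1
        if w not in partner:              # w is free: accept
            partner[w] = m
        elif rank[w][m] < rank[w][partner[w]]:  # w prefers m: switch
            free.append(partner[w])
            partner[w] = m
        else:                             # w rejects m; m stays free
            free.append(m)
    return {m: w for w, m in partner.items()}
```

Each m proposes to each w at most once, giving the O(n^2) bound claimed on the slide.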

A closer look Stable matchings are not necessarily fair m1: w1 w2 w3 w1: m2 m3 m1 w2: m3 m1 m2 w3: m1 m2 m3 (remaining preference lists and matching diagram from the slide omitted)

Algorithm underspecified Many different ways of picking m's to propose Surprising result: all orderings of picking free m's give the same result Proving this type of result: Reordering argument Prove the algorithm is computing something more specific Show a property of the solution, so it computes a specific stable matching

The proposal algorithm finds the best possible solution for M And the worst possible for W (m, w) is valid if (m, w) is in some stable matching best(m): the highest ranked w for m such that (m, w) is valid S* = {(m, best(m))} Every execution of the proposal algorithm computes S*

Proof Argument by contradiction Suppose the algorithm computes a matching S different from S* There must be some m rejected by a valid partner. Let m be the first man rejected by a valid partner w. w rejects m for m1. w = best(m)

Let S+ be a stable matching including (m, w) Suppose m1 is paired with w1 in S+ m1 prefers w to w1, and w prefers m1 to m Hence, (m1, w) is an instability in S+ m1 prefers w to w1 because m1 could not have been rejected by w1 at this point, since (m, w) was the first valid pair rejected, and (m1, w1) is valid because it is in S+

The proposal algorithm is worst case for W In S*, each w is paired with her worst valid partner Suppose (m, w) is in S* but m is not the worst valid partner of w Let S- be a stable matching pairing w with her worst valid partner Let (m1, w) be in S-: w prefers m to m1, because m1 is w's worst valid partner Let (m, w1) be in S-: m prefers w to w1, because S* gives every m his best valid partner Hence (m, w) is an instability in S-

Could you do better? Is there a fair matching? Design a configuration for a problem of size n: M-proposal algorithm: all m's get first choice, all w's get last choice W-proposal algorithm: all w's get first choice, all m's get last choice There is a stable matching where everyone gets their second choice

Key ideas Formalizing real world problem Model: graph and preference lists Mechanism: stability condition Specification of algorithm with a natural operation Proposal Establishing termination of process through invariants and progress measure Underspecification of algorithm Establishing uniqueness of solution

CSE 421 Algorithms Richard Anderson Lecture 3

Classroom Presenter Project Understand how to use Pen Computing to support classroom instruction Writing on electronic slides Distributed presentation Student submissions Classroom Presenter 2.0, started January 2002 www.cs.washington.edu/education/dl/presenter/ Classroom Presenter 3.0, started June 2005

Key ideas for Stable Matching Formalizing real world problem Model: graph and preference lists Mechanism: stability condition Specification of algorithm with a natural operation Proposal Establishing termination of process through invariants and progress measure Underspecification of algorithm Establishing uniqueness of solution

Question Goodness of a stable matching: add up the ranks of all the matched pairs M-rank, W-rank Suppose that the preferences are completely random If there are n M's and n W's, what is the expected value of the M-rank and the W-rank?

What is the run time of the Stable Matching Algorithm?
Initially all m in M and w in W are free
While there is a free m                     (executed at most n^2 times)
  Let w be the highest on m's list that m has not proposed to
  if w is free, then match (m, w)
  else suppose (m2, w) is matched
    if w prefers m to m2
      unmatch (m2, w)
      match (m, w)

O(1) time per iteration Find a free m Find the next available w If w is matched, determine m2 Test if w prefers m to m2 Update the matching

What does it mean for an algorithm to be efficient?

Definitions of efficiency Fast in practice Qualitatively better worst case performance than a brute force algorithm

Polynomial time efficiency An algorithm is efficient if it has a polynomial run time Run time as a function of problem size Run time: count number of instructions executed on an underlying model of computation T(n): maximum run time for all problems of size at most n

Polynomial Time Algorithms with polynomial run time have the property that increasing the problem size by a constant factor increases the run time by at most a constant factor (depending on the algorithm)

Why Polynomial Time? Generally, polynomial time seems to capture the algorithms which are efficient in practice The class of polynomial time algorithms has many good, mathematical properties

Ignoring constant factors Express run time as O(f(n)) Emphasize algorithms with slower growth rates Fundamental idea in the study of algorithms Basis of Tarjan/Hopcroft Turing Award

Why ignore constant factors? Constant factors are arbitrary Depend on the implementation Depend on the details of the model Determining the constant factors is tedious and provides little insight

Why emphasize growth rates? The algorithm with the lower growth rate will be faster for all but a finite number of cases Performance is most important for larger problem size As memory prices continue to fall, bigger problem sizes become feasible Improving growth rate often requires new techniques

Formalizing growth rates T(n) is O(f(n)) [T : Z+ → R+] For sufficiently large n, T(n) is bounded by a constant multiple of f(n): there exist c, n0 such that for n > n0, T(n) < c f(n) T(n) is O(f(n)) will be written as T(n) = O(f(n)) Be careful with this notation

Prove 3n^2 + 5n + 20 is O(n^2) Choose c = 6, n0 = 5
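As a quick sanity check of the slide's witnesses c = 6, n0 = 5, one can verify the inequality numerically over a finite range. This only illustrates the chosen constants; it is not a substitute for the algebraic proof.

```python
# Check 3n^2 + 5n + 20 <= 6n^2 for n beyond n0 = 5 (equivalently,
# 3n^2 >= 5n + 20). A finite check illustrates the witnesses only.
assert all(3*n*n + 5*n + 20 <= 6*n*n for n in range(6, 10_000))
```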

Lower bounds T(n) is Ω(f(n)): T(n) is at least a constant multiple of f(n), i.e., there exist n0 and ε > 0 such that T(n) > ε f(n) for all n > n0 (Warning: definitions of Ω vary) T(n) is Θ(f(n)) if T(n) is O(f(n)) and T(n) is Ω(f(n))

Useful Theorems If lim (f(n) / g(n)) = c for c > 0, then f(n) = Θ(g(n)) If f(n) is O(g(n)) and g(n) is O(h(n)), then f(n) is O(h(n)) If f(n) is O(h(n)) and g(n) is O(h(n)), then f(n) + g(n) is O(h(n))

Ordering growth rates For b > 1 and x > 0, log_b n is O(n^x) For r > 1 and d > 0, n^d is O(r^n)

CSE 421 Algorithms Richard Anderson Lecture 4

Announcements Homework 2, Due October 12, 1:30 pm. Reading Chapter 3 Start on Chapter 4

Polynomial time efficiency An algorithm is efficient if it has a polynomial run time Run time as a function of problem size Run time: count number of instructions executed on an underlying model of computation T(n): maximum run time for all problems of size at most n

Polynomial Time Algorithms with polynomial run time have the property that increasing the problem size by a constant factor increases the run time by at most a constant factor (depending on the algorithm)

Why Polynomial Time? Generally, polynomial time seems to capture the algorithms which are efficient in practice The class of polynomial time algorithms has many good, mathematical properties

Constant factors and growth rates Express run time as O(f(n)) Ignore constant factors Prefer algorithms with slower growth rates Fundamental ideas in the study of algorithms Basis of Tarjan/Hopcroft Turing Award

Why ignore constant factors? Constant factors are arbitrary Depend on the implementation Depend on the details of the model Determining the constant factors is tedious and provides little insight

Why emphasize growth rates? The algorithm with the lower growth rate will be faster for all but a finite number of cases Performance is most important for larger problem size As memory prices continue to fall, bigger problem sizes become feasible Improving growth rate often requires new techniques

Formalizing growth rates T(n) is O(f(n)) [T : Z+ → R+] For sufficiently large n, T(n) is bounded by a constant multiple of f(n): there exist c, n0 such that for n > n0, T(n) < c f(n) T(n) is O(f(n)) will be written as T(n) = O(f(n)) Be careful with this notation

Prove 3n^2 + 5n + 20 is O(n^2) Choose c = 6, n0 = 5

Lower bounds T(n) is Ω(f(n)): T(n) is at least a constant multiple of f(n), i.e., there exist n0 and ε > 0 such that T(n) > ε f(n) for all n > n0 (Warning: definitions of Ω vary) T(n) is Θ(f(n)) if T(n) is O(f(n)) and T(n) is Ω(f(n))

Useful Theorems If lim (f(n) / g(n)) = c for c > 0, then f(n) = Θ(g(n)) If f(n) is O(g(n)) and g(n) is O(h(n)), then f(n) is O(h(n)) If f(n) is O(h(n)) and g(n) is O(h(n)), then f(n) + g(n) is O(h(n))

Ordering growth rates For b > 1 and x > 0, log_b n is O(n^x) For r > 1 and d > 0, n^d is O(r^n)

Graph Theory G = (V, E) V – vertices E – edges Undirected graphs: edges are sets of two vertices {u, v} Directed graphs: edges are ordered pairs (u, v) Many other flavors: edge/vertex weights, parallel edges, self loops By default |V| = n and |E| = m (Some review from 326)

Definitions Path: v1, v2, …, vk, with (vi, vi+1) in E Simple Path Cycle Simple Cycle Distance Connectivity: Undirected, Directed (strong connectivity) Trees: Rooted, Unrooted

Graph search: Find a path from s to t
S = {s}
While there exists (u, v) in E with u in S and v not in S
  Pred[v] = u
  Add v to S
  if (v = t) then path found

Breadth first search Explore vertices in layers s in layer 1 Neighbors of s in layer 2 Neighbors of layer 2 in layer 3 . . .

Key observation All edges go between vertices on the same layer or adjacent layers

Bipartite A graph G = (V, E) is bipartite if V can be partitioned into V1, V2 such that all edges go between V1 and V2 A graph is bipartite if it can be two-colored

Testing Bipartiteness If a graph contains an odd cycle, it is not bipartite

Algorithm Run BFS Color odd layers red, even layers blue If no edges between the same layer, the graph is bipartite If edge between two vertices of the same layer, then there is an odd cycle, and the graph is not bipartite
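The BFS two-coloring test above can be sketched in Python. This is my own rendering of the slide's algorithm; `two_color` and the adjacency-dict encoding are assumptions, not names from the lecture.

```python
from collections import deque

def two_color(adj, s):
    """BFS layering from s, coloring by layer parity. adj maps each vertex
    to its list of neighbors (undirected). Returns a color dict, or None
    if some edge joins two vertices of the same layer (odd cycle)."""
    color = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in color:
                color[v] = 1 - color[u]   # next layer gets the other color
                q.append(v)
            elif color[v] == color[u]:    # same-layer edge: odd cycle
                return None
    return color
```

A triangle yields None (not bipartite); any even cycle yields a valid two-coloring.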

CSE 421 Algorithms Richard Anderson Lecture 5 Graph Theory

Announcements Monday’s class will be held in CSE 305 Reading Chapter 3 Start on Chapter 4

Graph Theory G = (V, E) V – vertices E – edges Undirected graphs: edges are sets of two vertices {u, v} Directed graphs: edges are ordered pairs (u, v) Many other flavors: edge/vertex weights, parallel edges, self loops By default |V| = n and |E| = m (Some review from 326)

Definitions Path: v1, v2, …, vk, with (vi, vi+1) in E Simple Path Cycle Simple Cycle Distance Connectivity: Undirected, Directed (strong connectivity) Trees: Rooted, Unrooted

Graph search: Find a path from s to t
S = {s}
While there exists (u, v) in E with u in S and v not in S
  Pred[v] = u
  Add v to S
  if (v = t) then path found

Breadth first search Explore vertices in layers s in layer 1 Neighbors of s in layer 2 Neighbors of layer 2 in layer 3 . . .

Key observation All edges go between vertices on the same layer or adjacent layers

Bipartite A graph G = (V, E) is bipartite if V can be partitioned into V1, V2 such that all edges go between V1 and V2 A graph is bipartite if it can be two-colored

Testing Bipartiteness If a graph contains an odd cycle, it is not bipartite

Algorithm Run BFS Color odd layers red, even layers blue If no edges between the same layer, the graph is bipartite If edge between two vertices of the same layer, then there is an odd cycle, and the graph is not bipartite

Corollary A graph is bipartite if and only if it has no odd-length cycle

Depth first search Explore vertices from most recently visited

Recursive DFS DFS(u) Mark u as “Explored” Foreach v in Neighborhood of u If v is not “Explored”, DFS(v)
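The recursive DFS above is direct to transcribe. A minimal sketch (the explored-set return value is my addition, to make the function testable):

```python
def dfs(adj, u, explored=None):
    """Recursive DFS from the slide: mark u as explored, then recurse on
    each unexplored neighbor. Returns the set of vertices reached from u."""
    if explored is None:
        explored = set()
    explored.add(u)
    for v in adj[u]:
        if v not in explored:
            dfs(adj, v, explored)
    return explored
```

Note that Python's default recursion limit caps the depth this handles; an explicit stack avoids that for large graphs.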

Key observation Each edge goes between vertices on the same branch of the DFS tree No cross edges

Connected Components Undirected Graphs

Strongly Connected Components (Directed Graphs) There is an O(n+m) algorithm that we will not be covering

CSE 421 Algorithms Richard Anderson Lecture 6 Graph Theory

Draw a picture of David Notkin To submit your drawing, press the button

Describe an algorithm to determine if an undirected graph has a cycle

Cycle finding Does a graph have a cycle? Find a cycle Find a cycle through a specific vertex v Linear runtime: O(n+m)

Find a cycle through a vertex v Not obvious how to do this with BFS from vertex v v

Depth First Search Each edge goes between vertices on the same branch No cross edges

A DFS from vertex v gives a simple algorithm for finding a cycle containing v How does this algorithm work and why?

Connected Components Undirected Graphs

Directed Graphs A Strongly Connected Component is a subset of the vertices with paths between every pair of vertices.

Identify the Strongly Connected Components There is an O(n+m) algorithm that we will not be covering

Topological Sort Given a set of tasks with precedence constraints, find a linear order of the tasks (course prerequisite graph from the slide omitted)

Find a topological order for the following graph (graph from the slide omitted)

If a graph has a cycle, there is no topological sort Consider the first vertex of the cycle to appear in the topological order: it must have an incoming edge from a cycle vertex that appears later, a contradiction

Lemma: If a graph is acyclic, it has a vertex with in-degree 0 Proof: Pick a vertex v1; if it has in-degree 0, then done If not, let (v2, v1) be an edge; if v2 has in-degree 0, then done If not, let (v3, v2) be an edge . . . If this process continues for more than n steps, we have a repeated vertex, so we have a cycle

Topological Sort Algorithm While there exists a vertex v with in-degree 0 Output vertex v Delete the vertex v and all outgoing edges (example graph from the slide omitted)

Details for O(n+m) implementation Maintain a list of vertices of in-degree 0 Each vertex keeps track of its in-degree Update in-degrees and the list when edges are removed m edge removals at O(1) cost each
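The O(n+m) implementation described above can be sketched in Python. The name `topological_sort` and the adjacency-dict encoding are mine; the logic is the slide's repeat-delete-in-degree-0 loop.

```python
from collections import deque

def topological_sort(adj):
    """Repeatedly output and delete a vertex of in-degree 0, maintaining a
    list of in-degree-0 vertices for O(n + m) total work. adj maps each
    vertex to its successors. Returns an order, or None if there is a cycle."""
    indeg = {v: 0 for v in adj}
    for u in adj:
        for v in adj[u]:
            indeg[v] += 1
    zero = deque(v for v in adj if indeg[v] == 0)   # in-degree-0 list
    order = []
    while zero:
        u = zero.popleft()
        order.append(u)
        for v in adj[u]:        # "delete" u's outgoing edges
            indeg[v] -= 1
            if indeg[v] == 0:
                zero.append(v)
    # if some vertex was never output, the remaining graph has a cycle
    return order if len(order) == len(adj) else None
```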

CSE 421 Algorithms Richard Anderson Lecture 7 Greedy Algorithms

Greedy Algorithms Solve problems with the simplest possible algorithm The hard part: showing that something simple actually works Pseudo-definition An algorithm is Greedy if it builds its solution by adding elements one at a time using a simple rule

Scheduling Theory Tasks: processing requirements, release times, deadlines Processors Precedence constraints Objective function: jobs scheduled, lateness, total execution time

Interval Scheduling Tasks occur at fixed times Single processor Maximize the number of tasks completed Tasks {1, 2, . . ., N} Start and finish times s(i), f(i)

Simple heuristics (counterexamples exist for each): Schedule earliest available task Schedule shortest available task Schedule task with fewest conflicts

Schedule available task with the earliest deadline Let A be the set of tasks computed by this algorithm, and let O be an optimal set of tasks. We want to show that |A| = |O| Let A = {i1, . . ., ik}, O = {j1, . . ., jm}, both in increasing order of finish times

Correctness Proof A always stays ahead of O, f(ir) <= f(jr) Induction argument f(i1) <= f(j1) If f(ir-1) <= f(jr-1) then f(ir) <= f(jr)
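The greedy rule from the slides above (repeatedly take the available task that finishes earliest) can be sketched in Python; `max_tasks` and the (start, finish) tuple encoding are my own names.

```python
def max_tasks(intervals):
    """Earliest-finish-time greedy for interval scheduling: sort by finish
    time and accept each task that starts no earlier than the last accepted
    task finishes. intervals: list of (start, finish) pairs."""
    chosen = []
    last_finish = float('-inf')
    for s, f in sorted(intervals, key=lambda t: t[1]):
        if s >= last_finish:        # compatible with everything chosen
            chosen.append((s, f))
            last_finish = f
    return chosen
```

The "stays ahead" proof sketched on the slide shows this returns a maximum-size compatible set.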

Scheduling all intervals Minimize number of processors to schedule all intervals

Lower bound In any instance of the interval partitioning problem, the number of processors is at least the depth of the set of intervals

Algorithm Sort by start times Suppose maximum depth is d, create d slots Schedule items in increasing order, assign each item to an open slot Correctness proof: When we reach an item, we always have an open slot
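The interval partitioning algorithm above can be sketched with a heap of processor free times; this is a hedged sketch (the heap replaces the slide's explicit d slots, and `min_processors` is my name), but it assigns items in start order to open slots exactly as described.

```python
import heapq

def min_processors(intervals):
    """Sort by start time; greedily reuse the processor that frees up
    earliest, opening a new one only when none is free. Returns the number
    of processors used, which equals the depth of the interval set."""
    free_at = []                # min-heap of finish times, one per processor
    for s, f in sorted(intervals):
        if free_at and free_at[0] <= s:     # an open slot exists: reuse it
            heapq.heapreplace(free_at, f)
        else:                               # depth increases: new processor
            heapq.heappush(free_at, f)
    return len(free_at)
```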

Scheduling tasks Each task has a length ti and a deadline di All tasks are available at the start One task may be worked on at a time All tasks must be completed Goal minimize maximum lateness Lateness = fi – di if fi >= di

Example Show the schedule 2, 3, 4, 5 first and compute the lateness (task length/deadline table from the slide omitted)

Greedy Algorithm Earliest deadline first Order jobs by deadline This algorithm is optimal This result may be surprising, since it ignores the job lengths

Analysis Suppose the jobs are ordered by deadlines, d1 <= d2 <= . . . <= dn A schedule has an inversion if job j is scheduled before i where j > i The schedule A computed by the greedy algorithm has no inversions. Let O be the optimal schedule, we want to show that A has the same maximum lateness as O

Proof Lemma: There is an optimal schedule with no idle time. Lemma: There is an optimal schedule with no inversions and no idle time. Let O be an optimal schedule with k inversions; we construct a new optimal schedule with k-1 inversions

If there is an inversion, there is an inversion of adjacent jobs Interchange argument: Suppose there is a pair of jobs i and j, with i < j, and j scheduled immediately before i. Interchanging i and j does not increase the maximum lateness. Recall, di <= dj

Summary Simple algorithms for scheduling problems Correctness proofs Method 1: Identify an invariant and establish by induction that it holds Method 2: Show that the algorithm’s solution is as good as an optimal one by converting the optimal solution to the algorithm’s solution while preserving value
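The earliest-deadline-first schedule and its maximum lateness can be computed in a few lines. A sketch only: `edf_max_lateness` and the (length, deadline) encoding are mine; the rule (order jobs by deadline, no idle time) is the slides'.

```python
def edf_max_lateness(jobs):
    """Earliest-deadline-first schedule. jobs: list of (length, deadline).
    Runs jobs back to back in deadline order and returns the maximum
    lateness max(0, f_i - d_i) over all jobs."""
    t = 0                      # current finish time; no idle time
    max_late = 0
    for length, deadline in sorted(jobs, key=lambda j: j[1]):
        t += length            # this job occupies [t - length, t]
        max_late = max(max_late, t - deadline)
    return max_late
```

By the exchange argument above, no schedule achieves a smaller maximum lateness.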

CSE 421 Algorithms Richard Anderson Lecture 8 Greedy Algorithms: Homework Scheduling and Optimal Caching

Announcements Monday, October 17 Class will meet in CSE 305 Tablets again! Read sections 4.4 and 4.5 before class Lecture will be designed with the assumption that you have read the text

Greedy Algorithms Solve problems with the simplest possible algorithm The hard part: showing that something simple actually works Today's problems: Homework Scheduling, Optimal Caching

Homework Scheduling Tasks to perform Deadlines on the tasks Freedom to schedule tasks in any order

Scheduling tasks Each task has a length ti and a deadline di All tasks are available at the start One task may be worked on at a time All tasks must be completed Goal minimize maximum lateness Lateness = fi – di if fi >= di

Example Show the schedule 2, 3, 4, 5 first and compute the lateness (task length/deadline table from the slide omitted)

Greedy Algorithm Earliest deadline first Order jobs by deadline This algorithm is optimal This result may be surprising, since it ignores the job lengths

Analysis Suppose the jobs are ordered by deadlines, d1 <= d2 <= . . . <= dn A schedule has an inversion if job j is scheduled before i where j > i The schedule A computed by the greedy algorithm has no inversions. Let O be the optimal schedule, we want to show that A has the same maximum lateness as O

Proof Lemma: There is an optimal schedule with no idle time. Lemma: There is an optimal schedule with no inversions and no idle time. Let O be an optimal schedule with k inversions; we construct a new optimal schedule with k-1 inversions

If there is an inversion, there is an inversion of adjacent jobs Interchange argument: Suppose there is a pair of jobs i and j, with i < j, and j scheduled immediately before i. Interchanging i and j does not increase the maximum lateness. Recall, di <= dj

Result Earliest Deadline First algorithm constructs a schedule that minimizes the maximum lateness

Extensions What if the objective is to minimize the sum of the lateness? EDF does not seem to work If the tasks have release times and deadlines, and are non-preemptable, the problem is NP-complete What about the case with release times and deadlines where tasks are preemptable?

Optimal Caching Caching problem: Maintain collection of items in local memory Minimize number of items fetched

Caching example A, B, C, D, A, E, B, A, D, A, C, B, D, A

Optimal Caching If you know the sequence of requests, what is the optimal replacement pattern? Note – it is rare to know what the requests are in advance – but we still might want to do this: Some specific applications, the sequence is known Competitive analysis, compare performance on an online algorithm with an optimal offline algorithm

Farthest in the future algorithm Discard element used farthest in the future A, B, C, A, C, D, C, B, C, A, D

Correctness Proof Sketch Start with Optimal Solution O Convert to Farthest in the Future Solution F-F Look at the first place where they differ Convert O to evict F-F element There are some technicalities here to ensure the caches have the same configuration . . .
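The farthest-in-the-future rule can be simulated directly on a request sequence. A sketch under my own encoding (`fitf_misses`, requests as a list, unbounded lookahead); the quadratic scan for the next use is for clarity, not efficiency.

```python
def fitf_misses(requests, cache_size):
    """Simulate farthest-in-the-future eviction: on a miss with a full
    cache, evict the cached item whose next request is latest (or that is
    never requested again). Returns the number of fetches (misses)."""
    cache, misses = set(), 0
    for i, x in enumerate(requests):
        if x in cache:
            continue                      # hit: no fetch needed
        misses += 1
        if len(cache) == cache_size:
            def next_use(y):              # index of y's next request
                for j in range(i + 1, len(requests)):
                    if requests[j] == y:
                        return j
                return float('inf')       # never used again: evict first
            cache.remove(max(cache, key=next_use))
        cache.add(x)
    return misses
```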

CSE 421 Algorithms Richard Anderson Lecture 9 Dijkstra's algorithm

Who was Dijkstra? What were his major contributions?

Edsger Wybe Dijkstra http://www.cs.utexas.edu/users/EWD/

Single Source Shortest Path Problem Given a graph and a start vertex s Determine the distance of every vertex from s Identify shortest paths to each vertex Express concisely as a "shortest paths tree": each vertex has a pointer to a predecessor on its shortest path

Construct the Shortest Path Tree from s (example weighted graph from the slide omitted)

Warmup If P is a shortest path from s to v, and if t is on the path P, the segment from s to t is a shortest path between s and t WHY? (submit an answer)

Careful Proof Suppose s-v is a shortest path, and suppose the s-t segment is not a shortest path Then replacing the s-t segment with a shorter s-t path gives a shorter s-v path, so s-v is not a shortest path, a contradiction Therefore s-t is a shortest path

Prove if s-t not a shortest path then s-v is not a shortest path

Dijkstra's Algorithm
S = {}; d[s] = 0; d[v] = inf for v != s
While S != V
  Choose v in V-S with minimum d[v]
  Add v to S
  For each w in the neighborhood of v
    d[w] = min(d[w], d[v] + c(v, w))
(example graph from the slide omitted)

Simulate Dijkstra's algorithm (starting from s) on the graph (example graph and round-by-round table from the slide omitted)

Dijkstra’s Algorithm as a greedy algorithm Elements committed to the solution by order of minimum distance

Correctness Proof Elements in S have the correct label Key to proof: when v is added to S, it has the correct distance label.

Proof Let Pv be the path of length d[v], ending with the edge (u, v) Let P be some other path to v, and suppose P first leaves S on the edge (x, y) Then Len(P) = Len(Psx) + c(x,y) + Len(Pyv) Len(Psx) + c(x,y) >= d[y] and Len(Pyv) >= 0 So Len(P) >= d[y] >= d[v], since v was chosen with minimum label
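The pseudocode above can be realized with a binary heap, replacing "choose v in V-S with minimum d[v]" by lazy deletion from a priority queue. A sketch under my own adjacency-list encoding (u -> list of (v, cost) pairs):

```python
import heapq

def dijkstra(adj, s):
    """Dijkstra's algorithm with a heap. adj maps each vertex to a list of
    (neighbor, cost) pairs; costs must be non-negative. Returns d, the
    distance of each reachable vertex from s."""
    d = {s: 0}
    heap = [(0, s)]
    done = set()                       # the set S of finished vertices
    while heap:
        du, u = heapq.heappop(heap)    # v in V-S with minimum d[v]
        if u in done:                  # stale heap entry: skip
            continue
        done.add(u)
        for v, c in adj[u]:            # relax edges out of u
            if v not in d or du + c < d[v]:
                d[v] = du + c
                heapq.heappush(heap, (d[v], v))
    return d
```

This gives the O((n + m) log n) bound behind the "n Extract-Min, m Heap-Update" accounting later in the notes.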

Negative Cost Edges Draw a small example with a negative cost edge and show that Dijkstra's algorithm fails on this example

Bottleneck Shortest Path Define the bottleneck distance for a path to be the maximum cost edge along the path (example graph from the slide omitted)

Compute the bottleneck shortest paths (example weighted graph from the slide omitted)

How do you adapt Dijkstra's algorithm to handle bottleneck distances? Does the correctness proof still apply?
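One natural answer to the question above, offered as an assumption rather than the lecture's: replace the relaxation d[v] + c(v, w) with max(d[v], c(v, w)), so d[v] becomes the smallest achievable maximum edge cost on an s-v path (valid for non-negative edge costs).

```python
import heapq

def bottleneck_paths(adj, s):
    """Dijkstra variant for bottleneck distances: d[v] is the minimum, over
    s-v paths, of the maximum edge cost on the path. adj maps each vertex
    to (neighbor, cost) pairs; assumes non-negative costs."""
    d = {s: 0}                 # bottleneck of the empty path
    heap = [(0, s)]
    done = set()
    while heap:
        du, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, c in adj[u]:
            b = max(du, c)     # bottleneck of extending the path by (u, v)
            if v not in d or b < d[v]:
                d[v] = b
                heapq.heappush(heap, (d[v], v))
    return d
```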

CSE 421 Algorithms Richard Anderson Lecture 10 Minimum Spanning Trees

Announcements Homework 3 is due now Homework 4, due 10/26, available now Reading Chapter 5 (Sections 4.8, 4.9 will not be covered in class) Guest lecturers (10/28 – 11/4) Anna Karlin Venkat Guruswami

Shortest Paths Negative Cost Edges Dijkstra's algorithm assumes positive cost edges For some applications, negative cost edges make sense Shortest path not well defined if a graph has a negative cost cycle (example graph from the slide omitted)

Negative Cost Edge Preview Topological Sort can be used for solving the shortest path problem in directed acyclic graphs Bellman-Ford algorithm finds shortest paths in a graph with negative cost edges (or reports the existence of a negative cost cycle).

Dijkstra's Algorithm Implementation and Runtime
S = {}; d[s] = 0; d[v] = inf for v != s
While S != V
  Choose v in V-S with minimum d[v]        (n Extract-Min heap operations)
  Add v to S
  For each w in the neighborhood of v
    d[w] = min(d[w], d[v] + c(v, w))       (m Heap-Update operations)
Edge costs are assumed to be non-negative

Bottleneck Shortest Path Define the bottleneck distance for a path to be the maximum cost edge along the path (example graph from the slide omitted)

Compute the bottleneck shortest paths (example weighted graph from the slide omitted)

How do you adapt Dijkstra's algorithm to handle bottleneck distances? Does the correctness proof still apply?

Minimum Spanning Tree (example weighted graph from the slide omitted)

Greedy Algorithm 1 Prim's Algorithm Extend a tree by including the cheapest outgoing edge (example graph omitted)

Greedy Algorithm 2 Kruskal's Algorithm Add the cheapest edge that joins disjoint components (example graph omitted)

Greedy Algorithm 3 Reverse-Delete Delete the most expensive edge that does not disconnect the graph 15 t a 6 9 14 3 4 e 10 13 c s 11 20 5 17 2 7 g b 8 f 22 u 12 1 16 v

Why do the greedy algorithms work? For simplicity, assume all edge costs are distinct. Let S be a subset of V, and let e = (u, v) be the minimum cost edge of E with u in S and v in V-S. Then e is in every minimum spanning tree

Proof Suppose T is a spanning tree that does not contain e. Adding e to T creates a cycle. The cycle must have some edge e1 = (u1, v1) with u1 in S and v1 in V-S. T1 = T – {e1} + {e} is a spanning tree with lower cost. Hence, T is not a minimum spanning tree

Optimality Proofs Prim’s Algorithm computes a MST Kruskal’s Algorithm computes a MST

Reverse-Delete Algorithm Lemma: The most expensive edge on a cycle is never in a minimum spanning tree

Dealing with the assumption Force the edge weights to be distinct Add small quantities to the weights Give a tie breaking rule for equal weight edges

CSE 421 Algorithms Richard Anderson Lecture 11 Minimum Spanning Trees

Announcements Monday – Class in EE1 003 (no tablets)

Foreign Exchange Arbitrage [figure: exchange-rate table and graphs over USD, EUR, and CAD with rates 0.6, 0.8, 1.2, 1.6]

Minimum Spanning Tree 15 t a 6 9 14 3 4 e 10 13 c s 11 20 5 17 2 7 g b 8 f 22 u 12 1 16 v Temporary Assumption: Edge costs distinct

Why do the greedy algorithms work? For simplicity, assume all edge costs are distinct. Let S be a subset of V, and let e = (u, v) be the minimum cost edge of E with u in S and v in V-S. Then e is in every minimum spanning tree. Or equivalently, if e is not in T, then T is not a minimum spanning tree e S V - S

Proof Suppose T is a spanning tree that does not contain e. Adding e to T creates a cycle. The cycle must have some edge e1 = (u1, v1) with u1 in S and v1 in V-S. T1 = T – {e1} + {e} is a spanning tree with lower cost. Hence, T is not a minimum spanning tree S V - S e Why is e lower cost than e1?

Optimality Proofs Prim’s Algorithm computes a MST Kruskal’s Algorithm computes a MST

Reverse-Delete Algorithm Lemma: The most expensive edge on a cycle is never in a minimum spanning tree

Dealing with the distinct cost assumption Force the edge weights to be distinct Add small quantities to the weights Give a tie breaking rule for equal weight edges

MST Fun Facts The minimum spanning tree is determined only by the order of the edges – not by their magnitude Finding a maximum spanning tree is just like finding a minimum spanning tree

Divide and Conquer
Array Mergesort(Array a){
    n = a.Length;
    if (n <= 1) return a;
    b = Mergesort(a[0 .. n/2-1]);
    c = Mergesort(a[n/2 .. n-1]);
    return Merge(b, c);
}

Algorithm Analysis Cost of Merge Cost of Mergesort

T(n) = 2T(n/2) + cn; T(1) = c;

Recurrence Analysis Solution methods Unrolling recurrence Guess and verify Plugging in to a “Master Theorem”

A better mergesort (?) Divide into 3 subarrays and recursively sort Apply 3-way merge

T(n) = aT(n/b) + f(n)

T(n) = T(n/2) + cn

T(n) = 4T(n/2) + cn

T(n) = 2T(n/2) + n^2

T(n) = 2T(n/2) + n^(1/2)

Recurrences Three basic behaviors Dominated by the top level (initial case) Dominated by the base cases All cases equal – we care about the depth

CSE 421 Algorithms Richard Anderson Lecture 12 Recurrences

Announcements Wednesday class will meet in CSE 305.

Divide and Conquer
Array Mergesort(Array a){
    n = a.Length;
    if (n <= 1) return a;
    b = Mergesort(a[0 .. n/2-1]);
    c = Mergesort(a[n/2 .. n-1]);
    return Merge(b, c);
}

Algorithm Analysis Cost of Merge Cost of Mergesort

T(n) = 2T(n/2) + cn; T(1) = c;

Recurrence Analysis Solution methods Unrolling recurrence Guess and verify Plugging in to a “Master Theorem”

A better mergesort (?) Divide into 3 subarrays and recursively sort Apply 3-way merge

T(n) = aT(n/b) + f(n)

T(n) = T(n/2) + cn

T(n) = 4T(n/2) + cn

T(n) = 2T(n/2) + n^2

T(n) = 2T(n/2) + n^(1/2)

Recurrences Three basic behaviors Dominated by the top level (initial case) Dominated by the base cases All cases equal – we care about the depth

CSE 421 Algorithms Richard Anderson Lecture 13 Divide and Conquer

Announcements Guest Lecturers Anna Karlin (10/31, 11/2) Venkat Guruswami (10/28, 11/4) Homework 5 and Homework 6 are available I’m going to try to be clear when student submissions are expected Instructor Example Student Submission

What is the solution to: [sum shown on slide] What are the asymptotic bounds for x < 1 and x > 1? Student Submission

Solve by unrolling T(n) = n + 3T(n/4) Instructor Example

Solve by unrolling T(n) = n + 5T(n/2) Student Submission Answer: Θ(n^(log₂ 5))

A non-linear additive term Instructor Example T(n) = n^2 + 3T(n/2)

What you really need to know about recurrences Work per level changes geometrically with the level Geometrically increasing (x > 1) The bottom level wins Geometrically decreasing (x < 1) The top level wins Balanced (x = 1) Equal contribution

Classify the following recurrences (Increasing, Decreasing, Balanced) Student Submission T(n) = n + 5T(n/8) T(n) = n + 9T(n/8) T(n) = n^2 + 4T(n/2) T(n) = n^3 + 7T(n/2) T(n) = n^(1/2) + 3T(n/4)

Divide and Conquer Algorithms Split into sub problems Recursively solve the problem Combine solutions Make progress in the split and combine stages Quicksort – progress made at the split step Mergesort – progress made at the combine step

Closest Pair Problem Given a set of points find the pair of points p, q that minimizes dist(p, q)

Divide and conquer If we solve the problem on two subsets, does it help? (Separate by median x coordinate) 1 2

Student Submission Packing Lemma Suppose that the minimum distance between points is at least δ; what is the maximum number of points that can be packed in a ball of radius δ?

Combining Solutions Suppose the minimum separation from the sub problems is δ In looking for cross set closest pairs, we only need to consider points within δ of the boundary How many cross border interactions do we need to test?

A packing lemma bounds the number of distances to check.

Details Preprocessing: sort points by y Merge step Select points in boundary zone For each point in the boundary Find highest point on the other side that is at most δ above Find lowest point on the other side that is at most δ below Compare with the points in this interval (there are at most 6)

Identify the pairs of points that are compared in the merge step following the recursive calls Student Submission

Algorithm run time After preprocessing: T(n) = cn + 2 T(n/2)

Counting Inversions 11 12 4 1 7 2 3 15 9 5 16 8 6 13 10 14 Count inversions on lower half Count inversions on upper half Count the inversions between the halves

Count the Inversions 11 12 4 1 7 2 3 15 9 5 16 8 6 13 10 14 9 5 16 8 6 13 10 14 11 12 4 1 7 2 3 15 11 12 4 1 7 2 3 15 9 5 16 8 6 13 10 14

Problem – how do we count inversions between sub problems in O(n) time? Solution – Count inversions while merging 1 2 3 4 7 11 12 15 5 6 8 9 10 13 14 16 Standard merge algorithms – add to inversion count when an element is moved from the upper array to the solution Instructor Example
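The merge-and-count idea above can be sketched in Python: run mergesort, and whenever an element moves from the upper (right) half before remaining elements of the lower (left) half, all of those remaining left elements form inversions with it. This is a sketch of the standard technique, with hypothetical names.

```python
def sort_and_count(a):
    """Mergesort that also counts inversions: pairs i < j with a[i] > a[j].
    Returns (sorted list, inversion count)."""
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_l = sort_and_count(a[:mid])
    right, inv_r = sort_and_count(a[mid:])
    merged, i, j, cross = [], 0, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
            cross += len(left) - i   # right[j] beats every remaining left item
    merged += left[i:] + right[j:]
    return merged, inv_l + inv_r + cross
```

The merge step is O(n), so the whole count runs in O(n log n), matching the mergesort recurrence.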

Use the merge algorithm to count inversions 1 4 11 12 2 3 7 15 5 8 9 16 6 10 13 14 Student Submission

CSE 421 Algorithms Richard Anderson Lecture 18 Dynamic Programming

Announcements Wednesday class will meet in CSE 305.

Dynamic Programming The most important algorithmic technique covered in CSE 421 Key ideas Express solution in terms of a polynomial number of sub problems Order sub problems to avoid recomputation

Today's Examples Optimal Billboard Placement (Text, Solved Exercise, Pg 307) Linebreaking with hyphenation (Compare with HW problem 6, Pg 317) String concatenation (Text, Solved Exercise, Page 309)

Billboard Placement Maximize income in placing billboards Billboards (pi, vi): vi is the value of placing a billboard at position pi Constraint: at most one billboard every five miles Example {(6,5), (8,6), (12, 5), (14, 1)}

Opt[k] What are the sub problems?

Opt[k] = fun(Opt[0],…,Opt[k-1]) How is the solution determined from sub problems?

Solution
j = 0;  // j is five miles behind the current position:
        // the last valid location for a billboard, if one is placed at P[k]
for k := 1 to n
    while (P[j] < P[k] – 5)
        j := j + 1;
    j := j – 1;
    Opt[k] = Max(Opt[k-1], V[k] + Opt[j]);
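The recurrence above, as runnable Python (a sketch: arrays are 0-indexed here, the names P and V follow the slides, and the 5-mile spacing is a parameter):

```python
def billboard(P, V, d=5):
    """P: sorted billboard positions, V: their values.
    Any two chosen positions must be more than d apart."""
    n = len(P)
    opt = [0] * (n + 1)          # opt[k]: best value using the first k positions
    for k in range(1, n + 1):
        # j = number of positions strictly more than d before P[k-1]
        j = 0
        while j < n and P[j] < P[k-1] - d:
            j += 1
        # either skip position k-1, or take it plus the best compatible prefix
        opt[k] = max(opt[k-1], V[k-1] + opt[j])
    return opt[n]
```

On the slide's example {(6,5), (8,6), (12,5), (14,1)} the optimum takes the billboards at positions 8 and 12 wait, rather at positions 8 and 14? Checking the recurrence: it takes positions 6 and 12 for a total of 10.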

Optimal line breaking and hyphenation Problem: break lines and insert hyphens to make lines as balanced as possible Typographical considerations: Avoid excessive white space Limit number of hyphens Avoid widows and orphans Etc.

Penalty Function Pen(i, j) – penalty of starting a line at position i and ending at position j Key technical idea Number the breaks between words/syllables Opt-i-mal line break-ing and hyph-en-a-tion is com-put-ed with dy-nam-ic pro-gram-ming

Opt[k] What are the sub problems?

Opt[k] = fun(Opt[0],…,Opt[k-1]) How is the solution determined from sub problems?

Solution
for k := 1 to n
    Opt[k] := infinity;
    for j := 0 to k-1
        Opt[k] := Min(Opt[k], Opt[j] + Pen(j, k));

But what if you want to lay out the text? And not just know the minimum penalty?

Solution
for k := 1 to n
    Opt[k] := infinity;
    for j := 0 to k-1
        temp := Opt[j] + Pen(j, k);
        if (temp < Opt[k])
            Opt[k] := temp;  Best[k] := j;
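The Best array above lets us recover the actual layout by walking backwards from the last break. A Python sketch (the penalty function pen is supplied by the caller; the quadratic pen used in the test below is a made-up example, not the course's penalty):

```python
def line_break(n, pen):
    """opt[k]: minimum total penalty to lay out breaks 1..k.
    pen(j, k): penalty of a line starting at break j and ending at break k.
    Returns (opt[n], list of (start, end) lines)."""
    INF = float('inf')
    opt = [0] + [INF] * n
    best = [0] * (n + 1)         # best[k]: start of the last line ending at k
    for k in range(1, n + 1):
        for j in range(k):
            t = opt[j] + pen(j, k)
            if t < opt[k]:
                opt[k], best[k] = t, j
    # walk best[] backwards to recover the line breaks
    lines, k = [], n
    while k > 0:
        lines.append((best[k], k))
        k = best[k]
    return opt[n], lines[::-1]
```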

String approximation Given a string S, and a library of strings B = {b1, …bk}, construct an approximation of the string S by using copies of strings in B. B = {abab, bbbaaa, ccbb, ccaacc} S = abaccbbbaabbccbbccaabab

Formal Model Strings from B assigned to non-overlapping positions of s Strings from B may be used multiple times Cost of δ for each unmatched character in s Cost of γ for each mismatched character in s MisMatch(i, j) – number of mismatched characters of bj when aligned starting with position i in s.

Opt[k] What are the sub problems?

Opt[k] = fun(Opt[0],…,Opt[k-1]) How is the solution determined from sub problems?

Solution
for i := 1 to n
    Opt[i] = Opt[i-1] + δ;
    for j := 1 to |B|
        p = i – len(bj);
        Opt[i] = min(Opt[i], Opt[p-1] + γ·MisMatch(p, j));

CSE 421 Algorithms Richard Anderson Lecture 19 Longest Common Subsequence

Longest Common Subsequence C=c1…cg is a subsequence of A=a1…am if C can be obtained by removing elements from A (but retaining order) LCS(A, B): A maximum length sequence that is a subsequence of both A and B ocurranec occurrence attacggct tacgacca Instructor Example

Determine the LCS of the following strings BARTHOLEMEWSIMPSON KRUSTYTHECLOWN Student Submission

String Alignment Problem Align sequences with gaps Charge δx if character x is unmatched Charge αxy if character x is matched to character y CAT TGA AT CAGAT AGGA

LCS Optimization A = a1a2…am B = b1b2…bn Opt[j, k] is the length of LCS(a1a2…aj, b1b2…bk)

Optimization recurrence If aj = bk, Opt[j,k] = 1 + Opt[j-1, k-1] If aj != bk, Opt[j,k] = max(Opt[j-1,k], Opt[j,k-1])

Give the Optimization Recurrence for the String Alignment Problem Charge δx if character x is unmatched Charge αxy if character x is matched to character y Student Submission

Dynamic Programming Computation

Write the code to compute Opt[j,k] Student Submission

Storing the path information
A[1..m], B[1..n]
for i := 1 to m  Opt[i, 0] := 0;
for j := 1 to n  Opt[0, j] := 0;
Opt[0, 0] := 0;
for i := 1 to m
    for j := 1 to n
        if A[i] = B[j] { Opt[i,j] := 1 + Opt[i-1,j-1]; Best[i,j] := Diag; }
        else if Opt[i-1, j] >= Opt[i, j-1] { Opt[i,j] := Opt[i-1,j]; Best[i,j] := Left; }
        else { Opt[i,j] := Opt[i,j-1]; Best[i,j] := Down; }
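The table computation above as Python (a sketch; instead of a separate Best array, this version reconstructs one LCS by walking the Opt table backwards, which is equivalent):

```python
def lcs(a, b):
    """Longest common subsequence via the Opt[j, k] table.
    opt[i][j] = length of LCS(a[:i], b[:j])."""
    m, n = len(a), len(b)
    opt = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i-1] == b[j-1]:
                opt[i][j] = opt[i-1][j-1] + 1
            else:
                opt[i][j] = max(opt[i-1][j], opt[i][j-1])
    # trace back from opt[m][n] to recover one LCS
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i-1] == b[j-1]:
            out.append(a[i-1]); i -= 1; j -= 1
        elif opt[i-1][j] >= opt[i][j-1]:
            i -= 1
        else:
            j -= 1
    return ''.join(reversed(out))
```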

How good is this algorithm? Is it feasible to compute the LCS of two strings of length 100,000 on a standard desktop PC? Why or why not. Student Submission

Observations about the Algorithm The computation can be done in O(m+n) space if we only need one column of the Opt values or Best Values The algorithm can be run from either end of the strings
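The first observation, that one row (or column) of Opt suffices to get the length, looks like this in Python (a sketch: rows of the table are kept two at a time, giving O(n) space instead of O(mn)):

```python
def lcs_length(a, b):
    """LCS length keeping only the previous row of the Opt table."""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i-1] == b[j-1]:
                cur[j] = prev[j-1] + 1
            else:
                cur[j] = max(prev[j], cur[j-1])
        prev = cur           # discard everything older than one row
    return prev[-1]
```

This gives the length only; recovering the subsequence itself in small space is exactly what the divide and conquer algorithm on the following slides achieves.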

Divide and Conquer Algorithm Where does the best path cross the middle column? For a fixed i, and for each j, compute the LCS that has ai matched with bj

Constrained LCS LCSi,j(A,B): the LCS such that a1,…,ai is paired with elements of b1,…,bj and ai+1,…,am is paired with elements of bj+1,…,bn Example: LCS4,3(abbacbb, cbbaa)

A = RRSSRTTRTS B=RTSRRSTST Compute LCS5,1(A,B), LCS5,2(A,B),…,LCS5,9(A,B) Student Submission

A = RRSSRTTRTS B=RTSRRSTST Compute LCS5,1(A,B), LCS5,2(A,B),…,LCS5,9(A,B) j left right 3 1 2 4 5 6 7 8 9 Instructor Example

Computing the middle column From the left, compute LCS(a1…am/2,b1…bj) From the right, compute LCS(am/2+1…am,bj+1…bn) Add values for corresponding j’s Note – this is space efficient

Divide and Conquer A = a1,…,am B = b1,…,bn Find j such that LCS(a1…am/2, b1…bj) and LCS(am/2+1…am, bj+1…bn) yield an optimal solution, then recurse on both halves

Algorithm Analysis T(m,n) = T(m/2, j) + T(m/2, n-j) + cnm

Prove by induction that T(m,n) <= 2cmn Instructor Example

CSE 421 Algorithms Richard Anderson Lecture 20 Space Efficient LCS

Longest Common Subsequence C=c1…cg is a subsequence of A=a1…am if C can be obtained by removing elements from A (but retaining order) LCS(A, B): A maximum length sequence that is a subsequence of both A and B Wednesday’s Result: O(mn) time, O(mn) space LCS Algorithm Today’s Result: O(mn) time, O(m+n) space LCS Algorithm

Digression: String Alignment Problem Align sequences with gaps Charge δx if character x is unmatched Charge αxy if character x is matched to character y Find alignment to minimize sum of costs CAT TGA AT CAGAT AGGA

Optimization Recurrence for the String Alignment Problem Charge δx if character x is unmatched Charge αxy if character x is matched to character y A = a1a2…am; B = b1b2…bn Opt[j, k] is the value of the minimum cost alignment of a1a2…aj and b1b2…bk

Dynamic Programming Computation

Storing the path information
A[1..m], B[1..n]
for i := 1 to m  Opt[i, 0] := 0;
for j := 1 to n  Opt[0, j] := 0;
Opt[0, 0] := 0;
for i := 1 to m
    for j := 1 to n
        if A[i] = B[j] { Opt[i,j] := 1 + Opt[i-1,j-1]; Best[i,j] := Diag; }
        else if Opt[i-1, j] >= Opt[i, j-1] { Opt[i,j] := Opt[i-1,j]; Best[i,j] := Left; }
        else { Opt[i,j] := Opt[i,j-1]; Best[i,j] := Down; }

Observations about the Algorithm The computation can be done in O(m+n) space if we only need one column of the Opt values or Best Values The algorithm can be run from either end of the strings

Divide and Conquer Algorithm Where does the best path cross the middle column? For a fixed i, and for each j, compute the LCS that has ai matched with bj

Constrained LCS LCSi,j(A,B): the LCS such that a1,…,ai is paired with elements of b1,…,bj and ai+1,…,am is paired with elements of bj+1,…,bn Example: LCS4,3(abbacbb, cbbaa)

A = RRSSRTTRTS B=RTSRRSTST Compute LCS5,0(A,B), LCS5,1(A,B), LCS5,2(A,B),…,LCS5,9(A,B)

A = RRSSRTTRTS B=RTSRRSTST Compute LCS5,0(A,B), LCS5,1(A,B), LCS5,2(A,B),…,LCS5,9(A,B) j left right 4 1 2 3 5 6 7 8 9

Computing the middle column From the left, compute LCS(a1…am/2,b1…bj) From the right, compute LCS(am/2+1…am,bj+1…bn) Add values for corresponding j’s Note – this is space efficient

Divide and Conquer A = a1,…,am B = b1,…,bn Find j such that LCS(a1…am/2, b1…bj) and LCS(am/2+1…am, bj+1…bn) yield an optimal solution, then recurse on both halves

Algorithm Analysis T(m,n) = T(m/2, j) + T(m/2, n-j) + cnm

Prove by induction that T(m,n) <= 2cmn

Shortest Path Problem Dijkstra’s Single Source Shortest Paths Algorithm O(m log n) time, positive cost edges General case – handling negative edges If there exists a negative cost cycle, the shortest path is not defined Bellman-Ford Algorithm O(mn) time for graphs with negative cost edges

Lemma If a graph has no negative cost cycles, then the shortest paths are simple paths Shortest paths have at most n-1 edges

Shortest paths with a fixed number of edges Find the shortest path from v to w with exactly k edges

Express as a recurrence Optk(w) = minx [Optk-1(x) + cxw] Opt0(w) = 0 if v=w and infinity otherwise

Algorithm, Version 1
foreach w: M[0, w] = infinity;
M[0, v] = 0;
for i = 1 to n-1
    foreach w: M[i, w] = min_x(M[i-1, x] + cost[x, w]);

Algorithm, Version 2
foreach w: M[0, w] = infinity;
M[0, v] = 0;
for i = 1 to n-1
    foreach w: M[i, w] = min(M[i-1, w], min_x(M[i-1, x] + cost[x, w]))

Algorithm, Version 3
foreach w: M[w] = infinity;
M[v] = 0;
for i = 1 to n-1
    foreach w: M[w] = min(M[w], min_x(M[x] + cost[x, w]))
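Version 3 (a single M array updated in place) is the usual form of Bellman-Ford. A Python sketch, with the predecessor array P from the next slide included (edge-list representation is an assumption of this example):

```python
def bellman_ford(vertices, edges, s):
    """Bellman-Ford, Version 3 style: one distance array M, updated in place.
    edges: list of (x, w, cost) directed edges. P[w] records the vertex
    that last updated M[w] (the pointer graph of the correctness proof)."""
    M = {v: float('inf') for v in vertices}
    M[s] = 0
    P = {}
    for _ in range(len(vertices) - 1):   # n-1 rounds over all edges
        for x, w, c in edges:
            if M[x] + c < M[w]:
                M[w] = M[x] + c
                P[w] = x
    return M, P
```

One more round of relaxations after this loop would detect a negative cost cycle: any edge that can still be improved lies on or reaches one.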

Correctness Proof for Algorithm 3 Key lemma – at the end of iteration i, for all w, M[w] <= M[i, w]; Reconstructing the path: Set P[w] = x, whenever M[w] is updated from vertex x

If the pointer graph has a cycle, then the graph has a negative cost cycle If P[w] = x then M[w] >= M[x] + cost(x,w) (equal right after the update; M[x] may have been reduced since) Let v1, v2,…,vk be a cycle in the pointer graph with (vk,v1) the last edge added Just before the update M[vj] >= M[vj+1] + cost(vj+1, vj) for j < k M[vk] > M[v1] + cost(v1, vk) Adding everything up 0 > cost(v1,v2) + cost(v2,v3) + … + cost(vk, v1) v1 v4 v2 v3

Negative Cycles If the pointer graph has a cycle, then the graph has a negative cycle Therefore: if the graph has no negative cycles, then the pointer graph has no cycles

Finding negative cost cycles What if you want to find negative cost cycles?

CSE 421 Algorithms Richard Anderson Lecture 21 Shortest Path Network Flow Introduction

Announcements Friday, 11/18, Class will meet in CSE 305 Reading 7.1-7.3, 7.5-7.6 Section 7.4 will not be covered

Find the shortest paths from v with exactly k edges 7 v y 5 1 -2 -2 3 1 x 3 z

Express as a recurrence Optk(w) = minx [Optk-1(x) + cxw] Opt0(w) = 0 if v=w and infinity otherwise

Algorithm, Version 1
foreach w: M[0, w] = infinity;
M[0, v] = 0;
for i = 1 to n-1
    foreach w: M[i, w] = min_x(M[i-1, x] + cost[x, w]);

Algorithm, Version 2
foreach w: M[0, w] = infinity;
M[0, v] = 0;
for i = 1 to n-1
    foreach w: M[i, w] = min(M[i-1, w], min_x(M[i-1, x] + cost[x, w]))

Algorithm, Version 3
foreach w: M[w] = infinity;
M[v] = 0;
for i = 1 to n-1
    foreach w: M[w] = min(M[w], min_x(M[x] + cost[x, w]))

Algorithm 2 vs Algorithm 3 x y z 7 v y 5 1 -2 -2 3 1 x 3 z i v x y z

Correctness Proof for Algorithm 3 Key lemma – at the end of iteration i, for all w, M[w] <= M[i, w]; Reconstructing the path: Set P[w] = x, whenever M[w] is updated from vertex x 7 v y 5 1 -2 -2 3 1 x 3 z

Negative Cost Cycle example 7 i v x y z v y 3 1 -2 -2 3 3 x -2 z

If the pointer graph has a cycle, then the graph has a negative cost cycle If P[w] = x then M[w] >= M[x] + cost(x,w) (equal right after the update; M[x] may have been reduced since) Let v1, v2,…,vk be a cycle in the pointer graph with (vk,v1) the last edge added Just before the update M[vj] >= M[vj+1] + cost(vj+1, vj) for j < k M[vk] > M[v1] + cost(v1, vk) Adding everything up 0 > cost(v1,v2) + cost(v2,v3) + … + cost(vk, v1) v1 v4 v2 v3

Negative Cycles If the pointer graph has a cycle, then the graph has a negative cycle Therefore: if the graph has no negative cycles, then the pointer graph has no cycles

Finding negative cost cycles What if you want to find negative cost cycles?

Network Flow

Network Flow Definitions Capacity Source, Sink Capacity Condition Conservation Condition Value of a flow

Flow Example u 20 10 30 s t 10 20 v

Residual Graph u u 15/20 0/10 5 10 15 15/30 s t 15 15 s t 5 5/10 20/20 v v

CSE 421 Algorithms Richard Anderson Lecture 22 Network Flow

Outline Network flow definitions Flow examples Augmenting Paths Residual Graph Ford Fulkerson Algorithm Cuts Maxflow-MinCut Theorem

Network Flow Definitions Flowgraph: Directed graph with distinguished vertices s (source) and t (sink) Capacities on the edges, c(e) >= 0 Problem: assign flows f(e) to the edges such that: 0 <= f(e) <= c(e) Flow is conserved at vertices other than s and t Flow conservation: flow going into a vertex equals the flow going out The flow leaving the source is as large as possible

Flow Example a d g s b e h t c f i 20 20 5 20 5 5 10 5 5 20 5 20 20 30

Find a maximum flow a d g s b e h t c f i Student Submission 20 20 5 30 20 30 s b e h t 20 5 5 10 20 5 25 20 c f i 20 10 Student Submission

Find a maximum flow a d g s b e h t c f i 20 20 20 20 20 5 20 5 5 20 5 25 30 20 20 30 s b e h t 20 20 30 5 20 5 5 15 10 15 20 5 25 5 20 c f i 20 20 10 10 Discussion slide

Augmenting Path Algorithm Vertices v1,v2,…,vk v1 = s, vk = t Possible to add b units of flow between vj and vj+1 for j = 1 … k-1 u 10/20 0/10 10/30 s t 5/10 15/20 v

Find two augmenting paths 2/5 2/2 0/1 2/4 3/4 3/4 3/3 3/3 1/3 1/3 s t 3/3 3/3 2/2 1/3 3/3 1/3 2/2 1/3 Student Submission

Residual Graph Flow graph showing the remaining capacity Flow graph G, Residual Graph GR G: edge e from u to v with capacity c and flow f GR: edge e’ from u to v with capacity c – f GR: edge e’’ from v to u with capacity f

Residual Graph u u 15/20 0/10 5 10 15 15/30 s t 15 15 s t 5 5/10 20/20 v v

Build the residual graph 3/5 d g 2/4 2/2 1/5 1/5 s 1/1 t 3/3 2/5 e h 1/1 Student Submission

Augmenting Path Lemma Let P = v1, v2, …, vk be a path from s to t with minimum capacity b in the residual graph. b units of flow can be added along the path P in the flow graph. u 15/20 0/10 15/30 s t 5/10 20/20 v

CSE 421 Algorithms Richard Anderson Lecture 23 Network Flow

Review Network flow definitions Flow examples Augmenting Paths Residual Graph Ford Fulkerson Algorithm Cuts Maxflow-MinCut Theorem

Network Flow Definitions Flowgraph: Directed graph with distinguished vertices s (source) and t (sink) Capacities on the edges, c(e) >= 0 Problem: assign flows f(e) to the edges such that: 0 <= f(e) <= c(e) Flow is conserved at vertices other than s and t Flow conservation: flow going into a vertex equals the flow going out The flow leaving the source is as large as possible

Find a maximum flow a d g s b e h t c f i 20/20 20/20 5 20/20 5 5 5/5 5 5/5 20 25/30 20/20 30/30 s b e h t 20/20 5 5/5 10 20/20 5 15/20 15/25 c f i 20/20 10/10

Residual Graph Flow graph showing the remaining capacity Flow graph G, Residual Graph GR G: edge e from u to v with capacity c and flow f GR: edge e’ from u to v with capacity c – f GR: edge e’’ from v to u with capacity f

Augmenting Path Lemma Let P = v1, v2, …, vk be a path from s to t with minimum capacity b in the residual graph. b units of flow can be added along the path P in the flow graph. u u 5 10 15/20 0/10 15 15 15 s t 15/30 s t 5 5 20 5/10 20/20 v v

Proof Add b units of flow along the path P What do we need to verify to show we have a valid flow after we do this? Student Submission

Ford-Fulkerson Algorithm (1956)
while not done
    Construct residual graph GR
    Find an s-t path P in GR with capacity b > 0
    Add b units of flow along P in G
If the sum of the capacities of edges leaving s is at most C, then the algorithm takes at most C iterations
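A Python sketch of the loop above. The slides leave the path-finding method open; this version picks the augmenting path by BFS in the residual graph (the Edmonds-Karp variant, an implementation choice, not the only one). The graph is an assumed dict-of-dicts of capacities.

```python
from collections import deque

def max_flow(cap, s, t):
    """Ford-Fulkerson with BFS augmenting paths.
    cap: cap[u][v] = capacity of edge (u, v); every vertex is a key of cap."""
    # residual capacities; reverse edges start at 0
    res = {u: dict(vs) for u, vs in cap.items()}
    for u in cap:
        for v in cap[u]:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for an s-t path with positive residual capacity
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow            # residual graph disconnected: done
        # recover the path, find its bottleneck b, push b units
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= b
            res[v][u] += b         # adding flow creates reverse residual capacity
        flow += b
```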

Cuts in a graph Cut: Partition of V into disjoint sets S, T with s in S and t in T. Cap(S,T) – sum of the capacities of edges from S to T Flow(S,T) – net flow out of S Sum of flows out of S minus sum of flows into S Flow(S,T) <= Cap(S,T)

What is Cap(S,T) and Flow(S,T) S={s, a, b, e, h}, T = {c, f, i, d, g, t} 20/20 20/20 a d g 5 20/20 5 5 20/20 5/5 5 5/5 20 25/30 20/20 30/30 s b e h t 20/20 5 5/5 10 20/20 5 15/20 15/25 c f i 20/20 10/10 Student Submission

Minimum value cut u 10 40 10 s t 10 40 v

Find a minimum value cut 6 6 5 8 10 3 6 2 t s 7 4 5 3 8 5 4 Student Submission

Find a minimum value cut 6 6 5 8 10 3 6 2 t s 7 4 5 3 8 5 4 Duplicate slide For discussion

MaxFlow – MinCut Theorem There exists a flow which has the same value as the minimum cut Proof: Consider a flow where the residual graph has no s-t path with positive capacity Let S be the set of vertices in GR reachable from s with paths of positive capacity s t

Let S be the set of vertices in GR reachable from s with paths of positive capacity u v t T S What is Cap(u,v) in GR? What can you say about Cap(u,v) and Flow(u,v) in G? What can you say about Cap(v,u) and Flow(v,u) in G? Student Submission

Max Flow - Min Cut Theorem Ford-Fulkerson algorithm finds a flow where the residual graph is disconnected, hence FF finds a maximum flow. If we want to find a minimum cut, we begin by looking for a maximum flow.

Performance The worst case performance of the Ford-Fulkerson algorithm is horrible u 1000 1000 1 s t 1000 1000 v

Better methods of finding augmenting paths Find the maximum capacity augmenting path O(m² log C) time Find the shortest augmenting path O(m²n) Find a blocking flow in the residual graph O(mn log n)

CSE 421 Algorithms Richard Anderson Lecture 24 Maxflow MinCut Theorem

Find a minimum value cut 5/6 5/6 5/5 8 3/10 3/3 5/6 t s 6/7 2/2 4/4 5 3/3 3/8 5 3/4

MaxFlow – MinCut Theorem There exists a flow which has the same value as the minimum cut Proof: Consider a flow where the residual graph has no s-t path with positive capacity Let S be the set of vertices in GR reachable from s with paths of positive capacity s t

Let S be the set of vertices in GR reachable from s with paths of positive capacity u v t T S What is Cap(u,v) in GR? What can you say about Cap(u,v) and Flow(u,v) in G? What can you say about Cap(v,u) and Flow(v,u) in G?

Max Flow - Min Cut Theorem Ford-Fulkerson algorithm finds a flow where the residual graph is disconnected, hence FF finds a maximum flow. If we want to find a minimum cut, we begin by looking for a maximum flow.

Performance The worst case performance of the Ford-Fulkerson algorithm is horrible u 1000 1000 1 s t 1000 1000 v

Better methods of finding augmenting paths Find the maximum capacity augmenting path O(m² log C) time Find the shortest augmenting path O(m²n) Find a blocking flow in the residual graph O(mn log n)

Bipartite Matching A graph G=(V,E) is bipartite if the vertices can be partitioned into disjoint sets X, Y such that every edge has one endpoint in X and one in Y A matching M is a subset of the edges that does not share any vertices Find a matching as large as possible

Application A collection of teachers A collection of courses And a graph showing which teachers can teach which course RA 303 PB 321 CC 326 DG 401 AK 421

Converting Matching to Network Flow s t

Finding edge disjoint paths

Theorem The maximum number of edge disjoint paths equals the minimum number of edges whose removal separates s from t

CSE 421 Algorithms Richard Anderson Lecture 25 Network Flow Applications

Today’s topics Problem Reductions Circulations Lowerbound constraints on flows Survey design Airplane scheduling

Problem Reduction Reduce Problem A to Problem B Convert an instance of Problem A to an instance of Problem B Use a solution of Problem B to get a solution to Problem A Practical: use a program for Problem B to solve Problem A Theoretical: show that Problem B is at least as hard as Problem A

Problem Reduction Examples Reduce the problem of finding the Maximum of a set of integers to finding the Minimum of a set of integers
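The Max-to-Min reduction is tiny but shows both halves of the pattern: transform the instance (negate the numbers), then transform the answer back (negate again). A Python sketch, where min_of stands in for the assumed "Problem B" solver:

```python
def min_of(nums):
    """The 'Problem B' solver we assume we already have."""
    m = nums[0]
    for x in nums[1:]:
        if x < m:
            m = x
    return m

def max_of(nums):
    """Reduce Max to Min: negate the inputs, solve Min, negate the answer."""
    return -min_of([-x for x in nums])
```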

Undirected Network Flow Undirected graph with edge capacities Flow may go either direction along the edges (subject to the capacity constraints) u 10 20 5 s t 20 10 v

Circulation Problem Directed graph with capacities c(e) on the edges and demands d(v) on the vertices Find a flow function that satisfies the capacity constraints and the vertex demands 0 <= f(e) <= c(e) fin(v) – fout(v) = d(v) Circulation facts: Feasibility problem d(v) < 0: source; d(v) > 0: sink Must have Σv d(v) = 0 to be feasible 10 u 20 10 -25 10 s t 20 20 10 v -5

Find a circulation in the following graph 2 3 4 b e 3 2 6 5 3 -4 -2 2 7 2 1 a c f h 4 9 3 2 8 d g 5 -5 4 10

Reducing the circulation problem to Network flow

Formal reduction Add source node s, and sink node t For each node v with d(v) < 0, add an edge from s to v with capacity -d(v) For each node v with d(v) > 0, add an edge from v to t with capacity d(v) Find a maximum s-t flow. If this flow has size Σv cap(s,v) then the flow gives a circulation satisfying the demands
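The construction above as Python (a sketch that only builds the max-flow instance; the names s* and t* for the added source and sink are made up here to avoid clashing with existing vertex names). Feed the returned capacities to any max-flow routine: the circulation is feasible iff the max s*-t* flow equals supply.

```python
def circulation_to_flow(demands, cap_edges):
    """Reduce a circulation problem to s-t max flow.
    demands: {v: d(v)} with d < 0 for sources and d > 0 for sinks;
    cap_edges: {(u, v): capacity}. Returns (cap dict-of-dicts, supply)."""
    cap = {'s*': {}, 't*': {}}
    for (u, v), c in cap_edges.items():
        cap.setdefault(u, {})[v] = c
        cap.setdefault(v, {})
    supply = 0
    for v, d in demands.items():
        cap.setdefault(v, {})
        if d < 0:
            cap['s*'][v] = -d      # s* supplies -d(v) units to v
            supply += -d
        elif d > 0:
            cap[v]['t*'] = d       # t* absorbs d(v) units from v
    return cap, supply
```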

Circulations with lowerbounds on flows on edges Each edge has a lowerbound l(e). The flow f must satisfy l(e) <= f(e) <= c(e) -2 3 1,2 a x 0,2 1,4 b y 2,4 -4 3

Removing lowerbounds on edges Lowerbounds can be shifted to the demands b y 2,5 -4 3 b y 3 -2 1

Formal reduction Lin(v): sum of lowerbounds on incoming edges Lout(v): sum of lowerbounds on outgoing edges Create new demands d’ and capacities c’ on vertices and edges d’(v) = d(v) + lout(v) – lin(v) c’(e) = c(e) – l(e)
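The shift d'(v) = d(v) + Lout(v) – Lin(v), c'(e) = c(e) – l(e) is purely mechanical, so it is easy to sketch in Python (edge bounds given as (lo, hi) pairs is an assumption of this example; the test reproduces the b→y figure from the earlier slide):

```python
def remove_lower_bounds(demands, edges):
    """Shift edge lower bounds into the vertex demands.
    demands: {v: d(v)}; edges: {(u, v): (lo, hi)}.
    Returns (new demands d', new capacities c') with zero lower bounds."""
    d2 = dict(demands)
    c2 = {}
    for (u, v), (lo, hi) in edges.items():
        c2[(u, v)] = hi - lo              # c'(e) = c(e) - l(e)
        d2[u] = d2.get(u, 0) + lo         # lo units forced out of u
        d2[v] = d2.get(v, 0) - lo         # lo units forced into v
    return d2, c2
```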

Application Customized surveys Ask customers about products Only ask customers about products they use Limited number of questions you can ask each customer Need to ask a certain number of customers about each product Information available about which products each customer has used

Details Customer C1, . . ., Cn Products P1, . . ., Pm Si is the set of products used by Ci Customer Ci can be asked between ci and c’i questions Questions about product Pj must be asked on between pj and p’j surveys

Circulation construction

Airplane Scheduling Given an airline schedule, and starting locations for the planes, is it possible to use a fixed set of planes to satisfy the schedule? Schedule [segments] Departure, arrival pairs (cities and times)

Compatible segments Segments S1 and S2 are compatible if the same plane can be used on S1 and S2 End of S1 equals start of S2, and enough time for turn around between arrival and departure times End of S1 is different from the start of S2, but there is enough time to fly between cities

Graph representation Each segment, Si, is represented as a pair of vertices (di, ai, for departure and arrival), with an edge between them. Add an edge between ai and dj if Si is compatible with Sj. di ai ai dj

Setting up a flow problem 1,1 di ai 0,1 ai dj 1 -1 P’i Pi

Result The planes can satisfy the schedule iff there is a feasible circulation

CSE 421 Algorithms Richard Anderson Lecture 26 Open Pit Mining

Today’s topics Open Pit Mining Problem Task Selection Problem Reduction to Min cut problem

Open Pit Mining Each unit of earth has a profit (possibly negative) Getting to the ore below the surface requires removing the dirt above Test drilling gives reasonable estimates of costs Plan an optimal mining operation

Determine an optimal mine -5 -3 -4 -2 2 -4 -6 -1 -7 -10 -1 -3 -3 3 -2 4 -3 -10 -10 -10 4 7 -2 7 3 -10 -10 -10 -10 6 8 6 -3 -10 -10 -10 -10 -10 -10 1 4 4 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 Student Submission

Generalization Precedence graph G=(V,E) Each v in V has a profit p(v) A set F is feasible if whenever w is in F and (v,w) is in E, then v is in F. Find a feasible set to maximize the profit -4 6 -2 -3 5 -1 -10 4

Min cut algorithm for profit maximization Construct a flow graph where the minimum cut identifies a feasible set that maximizes profit

Precedence graph construction Precedence graph G=(V,E) Each edge in E has infinite capacity Add vertices s, t Each vertex in V is attached to s and t with finite capacity edges

Show a finite value cut with at least two vertices on each side of the cut Student Submission

The sink side of the cut is a feasible set No edges permitted from S to T If a vertex is in T, all of its ancestors are in T

Setting the costs
If p(v) > 0: cap(v, t) = p(v), cap(s, v) = 0
If p(v) < 0: cap(s, v) = -p(v), cap(v, t) = 0
If p(v) = 0: cap(s, v) = cap(v, t) = 0
3 3 1 1 -3 -1 -3 3 2 1 2 3 t

Enumerate all finite s,t cuts and show their capacities 2 2 -2 1 2 -2 1 1 2 1 t Student Submission

Minimum cut gives optimal solution Why? 2 2 -2 1 2 -2 1 1 2 1 t

Computing the Profit Cost(W) = Σ{w in W: p(w) < 0} –p(w) Benefit(W) = Σ{w in W: p(w) > 0} p(w) Profit(W) = Benefit(W) – Cost(W) Maximum cost and benefit: C = Cost(V), B = Benefit(V)

Express Cap(S,T) in terms of B, C, Cost(T), Benefit(T), and Profit(T) Student Submission

Cap(S,T) = B – Profit(T)
Cap(S,T) = Cost(T) + Ben(S)
         = Cost(T) + Ben(S) + Ben(T) – Ben(T)
         = B + Cost(T) – Ben(T)
         = B – Profit(T)

Summary Construct flow graph: infinite capacity for precedence edges, capacities to source/sink based on cost/benefit. Finite cut gives a feasible set of tasks. Minimizing the cut corresponds to maximizing the profit. Find a minimum cut with a network flow algorithm
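The whole construction can be sketched in Python. This is a minimal sketch, not the lecture's own code: the names (`max_profit`), the small Edmonds-Karp max-flow, and the example profits and precedence constraints are all illustrative assumptions.

```python
from collections import deque

def max_flow(cap, s, t, n):
    """Edmonds-Karp max flow on an adjacency-matrix capacity graph."""
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow
        # find the bottleneck capacity along the path, then augment
        bottleneck = float('inf')
        v = t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

def max_profit(profit, prec):
    """profit: dict vertex -> p(v); prec: edges (v, w) meaning
    'w in the feasible set requires v in the feasible set'."""
    verts = list(profit)
    idx = {v: i + 1 for i, v in enumerate(verts)}   # 0 = source, n+1 = sink
    n = len(verts) + 2
    s, t = 0, n - 1
    cap = [[0] * n for _ in range(n)]
    B = 0
    for v, p in profit.items():
        if p > 0:
            cap[idx[v]][t] = p           # cap(v, t) = p(v)
            B += p
        elif p < 0:
            cap[s][idx[v]] = -p          # cap(s, v) = -p(v)
    for v, w in prec:
        cap[idx[v]][idx[w]] = float('inf')   # infinite precedence edge
    return B - max_flow(cap, s, t, n)    # Profit(T) = B - Cap(S, T)

# hypothetical instance: ore block c requires removing dirt a and b
print(max_profit({'a': -2, 'b': -3, 'c': 6, 'd': -10, 'e': 4},
                 [('a', 'c'), ('b', 'c'), ('d', 'e')]))   # 1
```

The best feasible set here is {a, b, c} with profit 6 - 2 - 3 = 1; taking d and e as well would lose 6 more.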

CSE 421 Algorithms Richard Anderson Lecture 27 Network Flow Applications

Today’s topics More network flow reductions Airplane scheduling Image segmentation Baseball elimination

Airplane Scheduling Given an airline schedule and starting locations for the planes, is it possible to use a fixed set of planes to satisfy the schedule? Schedule [segments]: departure, arrival pairs (cities and times) Approach: construct a circulation problem where paths of flow give the segments flown by each plane

Compatible segments Segments S1 and S2 are compatible if the same plane can be used on S1 and S2: the end of S1 equals the start of S2, and there is enough time for turnaround between the arrival and departure times, or the end of S1 differs from the start of S2, but there is enough time to fly between the cities

Graph representation Each segment, Si, is represented as a pair of vertices (di, ai, for departure and arrival), with an edge between them. Add an edge between ai and dj if Si is compatible with Sj.

Setting up a flow problem Each segment edge (di, ai) gets lower bound 1 and capacity 1; each compatibility edge (ai, dj) gets lower bound 0 and capacity 1. Each plane's starting vertex Pi supplies one unit of flow and its ending vertex P'i absorbs one unit.

Result The planes can satisfy the schedule iff there is a feasible circulation

Image Segmentation Separate foreground from background

Image analysis ai: value of assigning pixel i to the foreground bi: value of assigning pixel i to the background pij: penalty for assigning i to the foreground, j to the background or vice versa A: foreground, B: background Q(A,B) = sum over {i in A} of ai + sum over {j in B} of bj – sum over {(i,j) in E, i in A, j in B} of pij
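A brute-force evaluation of Q(A,B) on a tiny pixel graph shows what the min-cut construction must optimize. This is a sketch on a hypothetical 2x2 grid; the values in `a`, `b`, and `p` are made up for illustration.

```python
from itertools import product

def Q(a, b, p, edges, in_A):
    """Segmentation value: foreground values over A plus background
    values over B, minus separation penalties on cut edges."""
    value = sum(a[i] if in_A[i] else b[i] for i in range(len(a)))
    value -= sum(p[e] for e in edges if in_A[e[0]] != in_A[e[1]])
    return value

# hypothetical 2x2 pixel grid, pixels 0..3, with 4-neighbor edges
a = [9, 6, 6, 1]                 # foreground likelihoods
b = [1, 5, 5, 9]                 # background likelihoods
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
p = {e: 2 for e in edges}        # uniform separation penalty

# try every foreground/background assignment
best = max(product([True, False], repeat=4),
           key=lambda assign: Q(a, b, p, edges, assign))
print(best, Q(a, b, p, edges, best))   # pixels 0,1,2 foreground, Q = 26
```

The min-cut construction on the next slides finds this optimum without enumeration: maximizing Q(A,B) corresponds to minimizing the capacity of the cut.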

Pixel graph to flow graph [slide shows the pixel grid with added source s and sink t]

Mincut Construction For each pixel v: edge (s, v) with capacity av and edge (v, t) with capacity bv. For each neighboring pair u, v: edges (u, v) and (v, u) with capacities puv and pvu.

Baseball elimination Can the Dung Beetles win the league? Remaining games: AB, AC, AD, AD, AD, BC, BC, BC, BD, CD W L Ants 4 2 Bees Cockroaches 3 Dung Beetles 1 5

Baseball elimination Can the Fruit Flies win the league? Remaining games: AC, AD, AD, AD, AF, BC, BC, BC, BC, BC, BD, BE, BE, BE, BE, BF, CE, CE, CE, CF, CF, DE, DF, EF, EF W L Ants 17 12 Bees 16 7 Cockroaches Dung Beetles 14 13 Earthworms 10 Fruit Flies 15

Assume Fruit Flies win their remaining games. The Fruit Flies are tied for first place if no team wins more than 19 games. Allowable additional wins: Ants (2), Bees (3), Cockroaches (3), Dung Beetles (5), Earthworms (5). 18 games left to play: AC, AD, AD, AD, BC, BC, BC, BC, BC, BD, BE, BE, BE, BE, CE, CE, CE, DE. Standings: W L Ants 17 13 Bees 16 8 Cockroaches 9 Dung Beetles 14 Earthworms 12 Fruit Flies 19 15

Remaining games: AC, AD, AD, AD, BC, BC, BC, BC, BC, BD, BE, BE, BE, BE, CE, CE, CE, DE [slide shows the flow network: source s, game vertices AC, AD, BC, BD, BE, CE, DE, team vertices A, B, C, D, E, and sink T]

Network flow applications summary Bipartite Matching Disjoint Paths Airline Scheduling Survey Design Baseball Elimination Project Selection Image Segmentation

CSE 421 Algorithms Richard Anderson Lecture 28 NP Completeness

Announcements Final Exam: Monday, December 12, 2:30-4:20 pm, EE1 003; closed book, closed notes; practice final and answer key available. HW 10, due Friday, 1:30 pm. This week's topic: NP-completeness. Reading: 8.1-8.8: skim the chapter, and pay more attention to particular points emphasized in class

Algorithms vs. Lower bounds Algorithmic theory: what we can compute. "I can solve problem X with resources R." Proofs almost always give an algorithm that meets the resource bounds.

Lower bounds How do you show that something can’t be done? Establish evidence that a whole bunch of smart people can’t do something!

Theory of NP Completeness Most significant mathematical theory associated with computing

The Universe [diagram of P, NP, and NP-Complete]

Polynomial Time P: Class of problems that can be solved in polynomial time Corresponds with problems that can be solved efficiently in practice Right class to work with “theoretically”

Polynomial time reductions Y is polynomial time reducible to X: solve problem Y with a polynomial number of computation steps and a polynomial number of calls to a black box that solves X. Notation: Y <P X

Lemma Suppose Y <P X. If X can be solved in polynomial time, then Y can be solved in polynomial time.

Lemma Suppose Y <P X. If Y cannot be solved in polynomial time, then X cannot be solved in polynomial time.

Sample Problems Independent Set Graph G = (V, E), a subset S of the vertices is independent if there are no edges between vertices in S [slide shows an example graph with seven vertices]

Vertex Cover Graph G = (V, E), a subset S of the vertices is a vertex cover if every edge in E has at least one endpoint in S [slide shows an example graph with seven vertices]

Decision Problems Theory developed in terms of yes/no problems. Independent set: given a graph G and an integer K, does G have an independent set of size at least K? Vertex cover: given a graph G and an integer K, does the graph have a vertex cover of size at most K?

IS <P VC Lemma: A set S is independent iff V – S is a vertex cover. To reduce IS to VC, we show that we can determine whether a graph has an independent set of size K by testing for a vertex cover of size n – K
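The lemma can be checked exhaustively on a small graph. A minimal sketch, with a 5-cycle chosen as a hypothetical example:

```python
from itertools import combinations

def is_independent(edges, S):
    """No edge has both endpoints inside S."""
    return not any(u in S and v in S for u, v in edges)

def is_vertex_cover(edges, S):
    """Every edge has at least one endpoint inside S."""
    return all(u in S or v in S for u, v in edges)

V = {1, 2, 3, 4, 5}
E = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]   # a 5-cycle

# Lemma check: S is independent  iff  V - S is a vertex cover
for r in range(len(V) + 1):
    for S in map(set, combinations(V, r)):
        assert is_independent(E, S) == is_vertex_cover(E, V - S)
# every subset of the 5-cycle satisfies the lemma
```

On the 5-cycle the largest independent set has size 2, matching the smallest vertex cover of size n - 2 = 3.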

Satisfiability Given a boolean formula, does there exist a truth assignment to the variables to make the expression true

Definitions Boolean variable: x1, …, xn Term: xi or its negation !xi Clause: disjunction of terms t1 or t2 or … tj Problem: Given a collection of clauses C1, . . ., Ck, does there exist a truth assignment that makes all the clauses true (x1 or !x2), (!x1 or !x3), (x2 or !x3)

3-SAT Each clause has exactly 3 terms Variables x1, . . ., xn Clauses C1, . . ., Ck Cj = (tj1 or tj2 or tj3) Fact: Every instance of SAT can be converted in polynomial time to an equivalent instance of 3-SAT
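The SAT-to-3-SAT conversion cited above can be sketched as follows. The literal encoding (negation written `!x`) and the fresh variable names `y0, y1, ...` are assumptions for illustration: short clauses are padded by repeating a literal, long clauses are chained through auxiliary variables.

```python
def to_3sat(clauses, next_aux=0):
    """Convert clauses (tuples of literals, '!x' for negation) into
    equisatisfiable clauses of exactly 3 literals."""
    out = []
    for clause in clauses:
        lits = list(clause)
        if len(lits) <= 3:
            while len(lits) < 3:
                lits.append(lits[0])        # duplication preserves meaning
            out.append(tuple(lits))
        else:
            # (l1 l2 y0), (!y0 l3 y1), ..., (!y_last l_{k-1} l_k)
            y = f"y{next_aux}"; next_aux += 1
            out.append((lits[0], lits[1], y))
            for l in lits[2:-2]:
                y2 = f"y{next_aux}"; next_aux += 1
                out.append((f"!{y}", l, y2))
                y = y2
            out.append((f"!{y}", lits[-2], lits[-1]))
    return out

print(to_3sat([('x1', '!x2', 'x3', 'x4', 'x5')]))
# [('x1', '!x2', 'y0'), ('!y0', 'x3', 'y1'), ('!y1', 'x4', 'x5')]
```

A k-literal clause becomes k - 2 clauses, so the conversion is linear in the size of the formula.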

Theorem: 3-SAT <P IS Build a graph that represents the 3-SAT instance Vertices yi, zi with edges (yi, zi) Truth setting Vertices uj1, uj2, and uj3 with edges (uj1, uj2), (uj2,uj3), (uj3, uj1) Truth testing Connections between truth setting and truth testing: If tjl = xi, then put in an edge (ujl, zi) If tjl = !xi, then put in an edge (ujl, yi)

Example C1 = x1 or x2 or !x3 C2 = x1 or !x2 or x3 C3 = !x1 or x2 or x3

Thm: 3-SAT instance is satisfiable iff there is an IS of size n + k
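The reduction can be exercised on the example clauses above, checking for an independent set of size n + k = 6 by brute force. A sketch, with the literal encoding (+i for xi, -i for !xi) chosen here as an assumption:

```python
from itertools import combinations

def sat_to_is(n, clauses):
    """Build the 3-SAT -> Independent Set graph of the reduction."""
    edges = []
    for i in range(1, n + 1):
        edges.append((('y', i), ('z', i)))           # truth-setting edge
    for j, clause in enumerate(clauses):
        u = [('u', j, l) for l in range(3)]
        edges += [(u[0], u[1]), (u[1], u[2]), (u[2], u[0])]   # triangle
        for l, t in enumerate(clause):
            # picking u[l] asserts literal t; conflict with opposite value
            if t > 0:
                edges.append((u[l], ('z', t)))       # t = x_i
            else:
                edges.append((u[l], ('y', -t)))      # t = !x_i
    verts = {v for e in edges for v in e}
    return verts, edges, n + len(clauses)

def has_independent_set(verts, edges, k):
    """Brute-force: does any k-subset of vertices avoid all edges?"""
    return any(not any(u in S and v in S for u, v in edges)
               for S in map(set, combinations(verts, k)))

# C1 = (x1 or x2 or !x3), C2 = (x1 or !x2 or x3), C3 = (!x1 or x2 or x3)
V, E, k = sat_to_is(3, [(1, 2, -3), (1, -2, 3), (-1, 2, 3)])
print(has_independent_set(V, E, k))   # True: the formula is satisfiable
```

Setting every variable true satisfies all three clauses, and the corresponding independent set picks y1, y2, y3 plus one triangle vertex per clause.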

CSE 421 Algorithms Richard Anderson Lecture 29 NP-Completeness

Sample Problems Independent Set Graph G = (V, E), a subset S of the vertices is independent if there are no edges between vertices in S [slide shows an example graph with seven vertices]

Satisfiability Given a boolean formula, does there exist a truth assignment to the variables to make the expression true

Definitions Boolean variable: x1, …, xn Term: xi or its negation !xi Clause: disjunction of terms t1 or t2 or … tj Problem: Given a collection of clauses C1, . . ., Ck, does there exist a truth assignment that makes all the clauses true (x1 or !x2), (!x1 or !x3), (x2 or !x3)

3-SAT Each clause has exactly 3 terms Variables x1, . . ., xn Clauses C1, . . ., Ck Cj = (tj1 or tj2 or tj3) Fact: Every instance of SAT can be converted in polynomial time to an equivalent instance of 3-SAT

Theorem: 3-SAT <P IS Build a graph that represents the 3-SAT instance Vertices yi, zi with edges (yi, zi) Truth setting Vertices uj1, uj2, and uj3 with edges (uj1, uj2), (uj2,uj3), (uj3, uj1) Truth testing Connections between truth setting and truth testing: If tjl = xi, then put in an edge (ujl, zi) If tjl = !xi, then put in an edge (ujl, yi)

Example C1 = x1 or x2 or !x3 C2 = x1 or !x2 or x3 C3 = !x1 or x2 or x3

Thm: 3-SAT instance is satisfiable iff there is an IS of size n + k

What is NP? Problems solvable in non-deterministic polynomial time . . . Problems where “yes” instances have polynomial time checkable certificates

Certificate examples Independent set of size K: the independent set. Satisfiable formula: truth assignment to the variables. Hamiltonian Circuit Problem: a cycle including all of the vertices. K-coloring a graph: assignment of colors to the vertices

NP-Completeness A problem X is NP-complete if: X is in NP, and for every Y in NP, Y <P X. X is a “hardest” problem in NP. If X is NP-Complete, Z is in NP, and X <P Z, then Z is NP-Complete

Cook’s Theorem The Circuit Satisfiability Problem is NP-Complete

Garey and Johnson

History Jack Edmonds: identified NP. Steve Cook: Cook’s Theorem – NP-Completeness. Dick Karp: identified the “standard” collection of NP-Complete problems. Leonid Levin: independent discovery of NP-Completeness in the USSR

Populating the NP-Completeness Universe Circuit Sat <P 3-SAT 3-SAT <P Independent Set Independent Set <P Vertex Cover 3-SAT <P Hamiltonian Circuit Hamiltonian Circuit <P Traveling Salesman 3-SAT <P Integer Linear Programming 3-SAT <P Graph Coloring 3-SAT <P Subset Sum Subset Sum <P Scheduling with Release times and deadlines

Hamiltonian Circuit Problem Hamiltonian Circuit – a simple cycle including all the vertices of the graph

Traveling Salesman Problem Given a complete graph with edge weights, determine the shortest tour that includes all of the vertices (visit each vertex exactly once, and get back to the starting point) [slide highlights the minimum cost tour]

Thm: HC <P TSP
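The reduction behind this theorem is the standard one: give original edges weight 1 and non-edges weight 2; then G has a Hamiltonian circuit iff the optimal tour costs exactly n. A minimal sketch (helper names and the brute-force tour search are illustrative assumptions):

```python
from itertools import permutations

def hc_to_tsp(n, edges):
    """HC -> TSP: complete graph where original edges get weight 1
    and non-edges weight 2."""
    E = {frozenset(e) for e in edges}
    return {frozenset((u, v)): (1 if frozenset((u, v)) in E else 2)
            for u in range(n) for v in range(u + 1, n)}

def tsp_cost(n, w):
    """Brute-force optimal tour cost, fixing vertex 0 as the start."""
    best = float('inf')
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        cost = sum(w[frozenset((tour[i], tour[i + 1]))] for i in range(n))
        best = min(best, cost)
    return best

# the 4-cycle 0-1-2-3-0 has a Hamiltonian circuit: optimal tour cost = 4
w = hc_to_tsp(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
print(tsp_cost(4, w))   # 4
```

For a graph with no Hamiltonian circuit (e.g. a star), every tour must use at least one weight-2 edge, so the optimal cost exceeds n.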

CSE 421 Algorithms Richard Anderson Lecture 30 NP-Completeness

NP-Completeness A problem X is NP-complete if: X is in NP, and for every Y in NP, Y <P X. X is a “hardest” problem in NP. To show X is NP-complete, we must show how to reduce every problem in NP to X

Cook’s Theorem The Circuit Satisfiability Problem is NP-Complete: given a boolean circuit, determine if there is an assignment of boolean values to the inputs that makes the output true

Circuit SAT Satisfying assignment: x1 = T, x2 = F, x3 = F, x4 = T, x5 = T [slide shows a circuit of AND, OR, and NOT gates over inputs x1, ..., x5]

Proof of Cook’s Theorem Reduce an arbitrary problem Y in NP to X. Let A be a non-deterministic polynomial time algorithm for Y. Convert A to a circuit, so that Y is a Yes instance if and only if the circuit is satisfiable

Garey and Johnson

History Jack Edmonds: identified NP. Steve Cook: Cook’s Theorem – NP-Completeness. Dick Karp: identified the “standard” collection of NP-Complete problems. Leonid Levin: independent discovery of NP-Completeness in the USSR

Populating the NP-Completeness Universe Circuit Sat <P 3-SAT 3-SAT <P Independent Set Independent Set <P Vertex Cover 3-SAT <P Hamiltonian Circuit Hamiltonian Circuit <P Traveling Salesman 3-SAT <P Integer Linear Programming 3-SAT <P Graph Coloring 3-SAT <P Subset Sum Subset Sum <P Scheduling with Release times and deadlines

Hamiltonian Circuit Problem Hamiltonian Circuit – a simple cycle including all the vertices of the graph

Traveling Salesman Problem Given a complete graph with edge weights, determine the shortest tour that includes all of the vertices (visit each vertex exactly once, and get back to the starting point) [slide highlights the minimum cost tour]

Thm: HC <P TSP

Number Problems Subset sum problem: given natural numbers w1, . . ., wn and a target number W, is there a subset that adds up to exactly W? The subset sum problem is NP-Complete. The Subset Sum problem can be solved in O(nW) time

Subset sum problem The reduction showing Subset Sum is NP-complete involves numbers with n digits. In that case, the O(nW) algorithm is exponential in the input size, in both time and space
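The O(nW) algorithm referred to above is the standard dynamic program over reachable sums; a minimal sketch:

```python
def subset_sum(weights, W):
    """O(nW) dynamic program: reachable[w] is True iff some subset
    of the weights seen so far sums to exactly w."""
    reachable = [False] * (W + 1)
    reachable[0] = True                 # the empty subset sums to 0
    for x in weights:
        for w in range(W, x - 1, -1):   # descending: each item used once
            if reachable[w - x]:
                reachable[w] = True
    return reachable[W]

print(subset_sum([2, 7, 11, 15], 18))   # True: 7 + 11
print(subset_sum([2, 7, 11, 15], 6))    # False
```

The table has W + 1 entries and each of the n weights sweeps it once, giving O(nW) time; since W takes only log W bits to write down, this is pseudo-polynomial, not polynomial.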

Course summary What did we cover in the last 30 lectures? Stable Matching Models of computation and efficiency Basic graph algorithms BFS, Bipartiteness, SCC, Cycles, Topological Sort Greedy Algorithms Interval Scheduling, HW Scheduling Correctness proofs Dijkstra’s Algorithm Minimum Spanning Trees Recurrences Divide and Conquer Algorithms Closest Pair, FFT Dynamic Programming Weighted interval scheduling, subset sum, knapsack, longest common subsequence, shortest paths Network Flow Ford Fulkerson, Maxflow/mincut, Applications NP-Completeness