Graph Theory and Algorithm 02 COP 6726: New Directions in Database Systems
A* algorithm
A* algorithm
f(n) = g(n) + h(n)
g(n) = cost of the path from the starting node to n
h(n) = estimate of the cost of the cheapest path from n to the goal node
A* generates an optimal solution if h(n) is an admissible heuristic and the search space is a tree.
h(n) is admissible if it never overestimates the cost to reach the destination node. An admissible heuristic is optimistic: its estimate is never larger than the true remaining cost.
Example: h(n) = Euclidean distance to the destination
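The definitions of f, g, and h can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the search space is a small grid with unit-cost moves in four directions, '#' marks a blocked cell, and the names `a_star` and `euclidean` are made up for the example.

```python
import heapq
import math

def euclidean(a, b):
    """h(n): straight-line distance, never longer than any grid path."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def a_star(grid, start, goal):
    """Return the cost of a cheapest path from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    g = {start: 0}                                  # g(n): best known cost from start
    frontier = [(euclidean(start, goal), start)]    # ordered by f(n) = g(n) + h(n)
    while frontier:
        f, node = heapq.heappop(frontier)
        if node == goal:
            return g[node]
        if f > g[node] + euclidean(node, goal):
            continue                                # stale entry, a cheaper one exists
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_g = g[node] + 1                 # assumed unit step cost
                if new_g < g.get(nxt, math.inf):
                    g[nxt] = new_g
                    heapq.heappush(frontier, (new_g + euclidean(nxt, goal), nxt))
    return None

# Hypothetical 3x4 grid with two blocked cells in the middle row.
GRID = ["....",
        ".##.",
        "...."]
cost = a_star(GRID, (0, 0), (2, 3))
```

Because the straight-line distance never exceeds the number of grid steps, h is admissible here, so the returned cost is that of a cheapest path.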
Clustering Graph
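Markov Chains
[Figure: two-state chain with states A and B. A stays in A with probability 0.6 and moves to B with probability 0.4; B stays in B with probability 0.8 and moves to A with probability 0.2.]
Transition matrix M (columns ordered A, B; column j holds the probabilities of leaving state j; rows are separated by ";"):
M = [ .6 .2 ; .4 .8 ]
M^2 = M × M = [ .44 .28 ; .56 .72 ]
M^4 = M^2 × M^2 ≈ [ .35 .32 ; .65 .68 ]
M^8 = M^4 × M^4 ≈ [ .33 .33 ; .67 .67 ]
M^16 = M^8 × M^8 ≈ [ .33 .33 ; .67 .67 ]
Repeated squaring converges: every column approaches the stationary distribution (1/3, 2/3).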
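The repeated squaring of the transition matrix can be checked with a short script. This is a minimal sketch in plain Python; the helper name `mat_mul` is an assumption, and the matrix is written column-stochastic (each column sums to 1).

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Column A = (0.6, 0.4), column B = (0.2, 0.8).
M = [[0.6, 0.2],
     [0.4, 0.8]]

M2 = mat_mul(M, M)          # two steps of the chain
P = M
for _ in range(4):          # M^2, M^4, M^8, M^16
    P = mat_mul(P, P)

for row in P:
    print([round(x, 4) for x in row])
```

After four squarings both columns are numerically indistinguishable from (1/3, 2/3), so the starting state no longer matters.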
Clustering on Graphs
Adjacency matrix:
     A  B  C  D  E  F  G
A    0, 1, 1, 1, 0, 0, 0
B    1, 0, 1, 1, 1, 0, 0
C    1, 1, 0, 1, 0, 0, 0
D    1, 1, 1, 0, 0, 0, 0
E    0, 1, 0, 0, 0, 1, 1
F    0, 0, 0, 0, 1, 0, 1
G    0, 0, 0, 0, 1, 1, 0
[Figure: the graph drawn with nodes A, B, C, D, E, F, G]
Random Walk
[Figure: step-by-step random walk on the graph with nodes A, B, C, D, E, F, G]
Node B has the highest value. Nodes A, C, D, and E have the second-highest value.
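One plausible reading of the "value" of a node is the long-run probability that a random walker is at that node. The sketch below assumes that reading, uses the seven-node adjacency from the clustering slide, and picks a uniform-random neighbor at each step; the function names are hypothetical.

```python
# Neighbors of each node, taken from the adjacency matrix of the lecture graph.
ADJ = {"A": "BCD", "B": "ACDE", "C": "ABD", "D": "ABC",
       "E": "BFG", "F": "EG", "G": "EF"}

def step(prob):
    """Move the probability mass one step along uniformly chosen edges."""
    nxt = {v: 0.0 for v in ADJ}
    for u, p in prob.items():
        for v in ADJ[u]:
            nxt[v] += p / len(ADJ[u])
    return nxt

def walk(start, steps):
    """Distribution of a walker that starts at `start` after `steps` moves."""
    prob = {v: 0.0 for v in ADJ}
    prob[start] = 1.0
    for _ in range(steps):
        prob = step(prob)
    return prob

p = walk("A", 1000)
ranking = sorted(p, key=p.get, reverse=True)
```

Under this assumption the long-run probability of a node is proportional to its degree, which agrees with the slide: B has four neighbors, A, C, D, and E have three, and F and G have two.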
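Markov Cluster Algorithm (MCL)
Normalization: scale each column so that its entries sum to 1.
Inflation: square each entry of a column, then normalize the column again. Inflation is responsible for both strengthening strong currents and weakening weak ones.
Example column: (1/2, 1/6, 1/3) → square → (1/4, 1/36, 1/9) → normalize → (9/14, 1/14, 4/14)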
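The square-then-normalize example can be reproduced exactly with rational arithmetic. A minimal sketch; the function name `inflate` and the parameter `r` (the inflation power, 2 on the slide) are naming assumptions.

```python
from fractions import Fraction

def inflate(column, r=2):
    """Raise each entry to the power r, then rescale so the column sums to 1."""
    powered = [x ** r for x in column]
    total = sum(powered)
    return [x / total for x in powered]

column = [Fraction(1, 2), Fraction(1, 6), Fraction(1, 3)]
result = inflate(column)
print(result)   # Fraction reduces 4/14 to 2/7
```

The largest entry grows from 1/2 to 9/14 while the smallest shrinks from 1/6 to 1/14, which is the strengthening and weakening effect described above.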
Random Walk [Figure: graph with nodes A, B, C, D]
MCL [Figure: the same graph with nodes A, B, C, D]
MCL
Adjacency matrix:
     A  B  C  D  E  F  G
A    0, 1, 1, 1, 0, 0, 0
B    1, 0, 1, 1, 1, 0, 0
C    1, 1, 0, 1, 0, 0, 0
D    1, 1, 1, 0, 0, 0, 0
E    0, 1, 0, 0, 0, 1, 1
F    0, 0, 0, 0, 1, 0, 1
G    0, 0, 0, 0, 1, 1, 0
[Figure: the graph drawn with nodes A, B, C, D, E, F, G]
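A compact sketch of the MCL loop on the seven-node graph follows. Several details are assumptions rather than something the slides specify: self-loops are added before normalizing, the expansion step squares the matrix, the inflation power is 2, the loop runs a fixed 30 iterations, and clusters are read off the nonzero entries of each row.

```python
NODES = "ABCDEFG"
ADJ = [[0, 1, 1, 1, 0, 0, 0],
       [1, 0, 1, 1, 1, 0, 0],
       [1, 1, 0, 1, 0, 0, 0],
       [1, 1, 1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 1, 1],
       [0, 0, 0, 0, 1, 0, 1],
       [0, 0, 0, 0, 1, 1, 0]]

def normalize(m):
    """Scale every column so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]

def expand(m):
    """Expansion: square the matrix (two random-walk steps)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inflate(m, r):
    """Inflation: entrywise power followed by column normalization."""
    return normalize([[x ** r for x in row] for row in m])

def mcl(adj, r=2, iterations=30):
    n = len(adj)
    # Assumption: add a self-loop to every node before normalizing.
    m = normalize([[adj[i][j] + (1 if i == j else 0) for j in range(n)]
                   for i in range(n)])
    for _ in range(iterations):
        m = inflate(expand(m), r)
    return m

def clusters(m, eps=1e-6):
    """Each row with nonzero entries names one cluster (its nonzero columns)."""
    found = set()
    for row in m:
        members = frozenset(NODES[j] for j, x in enumerate(row) if x > eps)
        if members:
            found.add(members)
    return found

result = mcl(ADJ)
print(sorted(sorted(c) for c in clusters(result)))
```

The grouping that is printed depends on the inflation power: larger values of `r` tend to produce more, smaller clusters.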
Graph Partitioning
Local Search
Suffix Tree
Suffix Tree
Suffix trees can be used to solve the exact matching problem in linear time.
The exact matching problem: given a pattern P of length n and a text T of length m, find all occurrences of P in T in O(n + m) time.
Example string set: {aeef, ad, bbfe, bbfg, c, aeef}
Suffix Tree
[Figure: the tree is built by inserting the strings of {aeef, ad, bbfe, bbfg, c, aeef} one at a time. Inserting aeef creates the path a-e-e-f; ad adds d under a; bbfe adds the path b-b-f-e; bbfg adds g under b-b-f; c adds a new branch c; the second aeef adds no new nodes.]
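The insertion sequence in the figure can be sketched with nested dictionaries. The code below builds the tree for the example string set, then reuses the same insert routine to build a naive tree of all suffixes of a text and look up a pattern in it. The function names, the "$" end marker, and the text "banana" are assumptions; this naive construction takes quadratic time and space, unlike the linear-time suffix tree algorithms (such as Ukkonen's) that the slide's bound refers to.

```python
def insert(root, word):
    """Walk down from the root, creating child nodes as needed."""
    node = root
    for ch in word:
        node = node.setdefault(ch, {})
    node["$"] = True                      # assumed end-of-string marker

def count_nodes(node):
    """Number of character nodes below `node` (markers are not counted)."""
    return sum(1 + count_nodes(child)
               for key, child in node.items() if key != "$")

def build_suffix_trie(text):
    """Naive construction: insert every suffix of the text."""
    root = {}
    for i in range(len(text)):
        insert(root, text[i:])
    return root

def contains(root, pattern):
    """A pattern occurs in the text iff it is a path from the root."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True

trie = {}
for w in ["aeef", "ad", "bbfe", "bbfg", "c", "aeef"]:
    insert(trie, w)

suffixes = build_suffix_trie("banana")    # hypothetical text
```

Once the tree exists, a lookup costs time proportional to the pattern length only, independent of the text length.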
Max Flow
Max Flow
[Figure: flow network with Source, Sink, and intermediate nodes A, B, C, D; each edge is labeled with its capacity. Successive frames update the edge labels as flow is pushed from Source to Sink.]
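Since the network in the figure cannot be recovered from the text, the sketch below runs on a small hypothetical network. It uses the Edmonds-Karp variant of augmenting paths (breadth-first search in the residual graph), which is one standard way to compute a maximum flow and is not necessarily the method shown on the slides.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly augment along a shortest residual path."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)   # reverse edges
    residual.setdefault(source, {})
    residual.setdefault(sink, {})

    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow                      # no augmenting path is left
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical network, not the one from the figure.
CAPACITY = {"s": {"a": 3, "b": 2},
            "a": {"b": 1, "t": 2},
            "b": {"t": 3}}
value = max_flow(CAPACITY, "s", "t")
```

The reverse edges let a later path undo part of an earlier choice, which is what makes the greedy augmentation reach the true maximum.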
Max Flow-Min Cut
Max Flow / Min Cut
Question: what is the maximum flow?
[Figure: flow network with Source, Sink, and nodes A, B, C, D; edge labels 10, 8, 1, 6, 7]
Max Flow / Min Cut
An (S,T)-cut in a flow network G = (V,E) is a partition of the vertices V into two disjoint subsets S and T such that s ∈ S and t ∈ T.
The capacity of a cut (S,T) is CAP(S,T) = Σ_{u ∈ S} Σ_{v ∈ T} c(u,v)
[Figure: the network from the question with flow/capacity labels 9/10, 8/8, 1/1, 6/6, 7/10]
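The definition of CAP(S,T) translates directly into code. The sketch below evaluates it for every cut of a small hypothetical network by brute force, which is only feasible for tiny graphs; the edge list and function names are assumptions for illustration.

```python
from itertools import combinations

def cut_capacity(capacity, S):
    """CAP(S,T): total capacity of edges that go from S to its complement T."""
    return sum(c for (u, v), c in capacity.items() if u in S and v not in S)

def min_cut(capacity, source, sink):
    """Try every S that contains the source but not the sink."""
    nodes = {x for edge in capacity for x in edge}
    inner = sorted(nodes - {source, sink})
    best = None
    for k in range(len(inner) + 1):
        for extra in combinations(inner, k):
            S = {source, *extra}
            cap = cut_capacity(capacity, S)
            if best is None or cap < best[0]:
                best = (cap, S)
    return best

# Hypothetical network: (u, v) -> capacity of the directed edge u -> v.
CAP = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
       ("a", "t"): 2, ("b", "t"): 3}
best_capacity, best_S = min_cut(CAP, "s", "t")
```

Only edges leaving S count: for S = {s, b} the edge a → b points back into S and is ignored. By the max-flow min-cut theorem, the smallest cut capacity found this way equals the maximum flow value of the same network.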
Bipartite matching
Bipartite Matching
[Figure: bipartite graph with nodes a1, a2, a3, a4 on one side and b1, b2, b3, b4, b5 on the other. The intermediate frames add a source s and a sink t, turning the matching problem into a flow problem; the final frame shows the graph without s and t.]
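The edges of the figure are not recoverable, so the sketch below uses a hypothetical edge set on the same node names. It finds a maximum matching with augmenting paths, which corresponds to running an augmenting-path max-flow algorithm on the unit-capacity network obtained by adding s and t.

```python
def max_bipartite_matching(adj):
    """adj maps each left node to the right nodes it may be matched with."""
    match_right = {}                      # right node -> left node

    def try_augment(a, seen):
        for b in adj[a]:
            if b in seen:
                continue
            seen.add(b)
            # Take b if it is free, or if its partner can move elsewhere.
            if b not in match_right or try_augment(match_right[b], seen):
                match_right[b] = a
                return True
        return False

    for a in adj:
        try_augment(a, set())
    return {a: b for b, a in match_right.items()}

# Hypothetical edges, not the ones from the figure.
EDGES = {"a1": ["b1", "b2"],
         "a2": ["b1"],
         "a3": ["b2", "b3"],
         "a4": ["b3"]}
matching = max_bipartite_matching(EDGES)
```

In this example the four left nodes share only three right-hand neighbors, so no matching can cover all of them.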
Shortest path Algorithm
Shortest Path Algorithm
Factories: A, E, K
Warehouses: H, I
[Figure: weighted graph with nodes arranged in three rows (A B C D E, F G H I J, K L M N O); edges are labeled with weights 1, 2, or 3]
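The slide does not state the exact question, and the edge weights cannot be recovered from the text. Assuming the task is to find, for each factory, the distance to its nearest warehouse, one option is to run Dijkstra's algorithm once with every warehouse as a source. The edge list below is hypothetical.

```python
import heapq

def build_graph(edges):
    """Undirected weighted graph as node -> list of (neighbor, weight)."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph

def multi_source_dijkstra(graph, sources):
    """Distance from every reachable node to its nearest source."""
    dist = {s: 0 for s in sources}
    heap = [(0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                          # outdated queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical edges and weights, not the ones from the figure.
EDGES = [("A", "B", 1), ("B", "C", 2), ("C", "H", 1), ("E", "D", 1),
         ("D", "I", 1), ("K", "L", 3), ("L", "M", 1), ("M", "H", 2)]
dist = multi_source_dijkstra(build_graph(EDGES), ["H", "I"])
factory_distances = {f: dist[f] for f in ("A", "E", "K")}
```

Starting from all warehouses at once costs a single run, instead of one run per factory.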
Open Pit Mining
Open Pit Mining
[Figure: two node sets on a directed graph, one labeled "Closed set" and one labeled "Not a closed set"]
Open Pit Mining
Maximize the profit.
[Figure: graph with node profits -2, 1, 2, -2, 1]
Open Pit Mining
Construct a flow graph where the minimum cut identifies a feasible set that maximizes profit.
Each edge in E has infinite capacity.
Add nodes s and t.
Each node is attached to s or t by a finite-capacity edge whose capacity is the absolute value of the node's profit.
[Figure: the example graph with s and t added; the original edges have capacity ∞, and the edges at s and t have capacities 2, 2 and 2, 1, 1]
Open Pit Mining
The minimum cut gives the optimal solution.
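The construction can be sketched end to end on a small hypothetical mine. The code assumes one common orientation of the network (s feeds the positive-profit nodes, negative-profit nodes drain into t, and an edge u → v with infinite capacity means "taking u requires taking v"); the figure on the slide may orient s and t the other way. The block names, profits, and the function name `max_closure` are all made up for the example.

```python
from collections import deque

INF = float("inf")

def max_closure(profit, requires):
    """Return (best profit, chosen set) for a maximum-weight closed set."""
    res = {n: {} for n in list(profit) + ["s", "t"]}   # residual capacities

    def add_edge(u, v, cap):
        res[u][v] = res[u].get(v, 0) + cap
        res[v].setdefault(u, 0)

    for n, p in profit.items():
        if p > 0:
            add_edge("s", n, p)          # finite edge, capacity = profit
        elif p < 0:
            add_edge(n, "t", -p)         # finite edge, capacity = |profit|
    for u, v in requires:
        add_edge(u, v, INF)              # a cut can never separate u from v

    def reachable():
        parent = {"s": None}
        queue = deque(["s"])
        while queue:
            u = queue.popleft()
            for v, cap in res[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if "t" not in parent:
            break
        path, v = [], "t"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= b
            res[v][u] += b
        flow += b

    chosen = set(parent) - {"s"}         # source side of the minimum cut
    best = sum(p for p in profit.values() if p > 0) - flow
    return best, chosen

# Hypothetical mine: ore blocks lie under overburden blocks with negative profit.
PROFIT = {"ore1": 5, "ore2": 2, "top1": -2, "top2": -1, "top3": -4}
REQUIRES = [("ore1", "top1"), ("ore1", "top2"),
            ("ore2", "top2"), ("ore2", "top3")]
best, chosen = max_closure(PROFIT, REQUIRES)
```

Here digging ore1 pays for the two blocks above it, while ore2 is not worth the cost of removing top3, so it stays on the sink side of the cut.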
Open Pit Mining
[Figure: second example with node profits 8, -9, -1, 7, followed by its flow network with source s, sink t, and infinite-capacity (∞) edges between the original nodes]
NP-hard problems
NP-hard problem
The class of problems that are at least as hard as the hardest problems in NP.
Problems that are NP-hard do not have to be elements of NP; indeed, they may not even be decidable.
A decision problem is NP-complete when it is both in NP and NP-hard.
A decision problem C is NP-complete if:
C is in NP, and
every problem in NP is reducible to C in polynomial time.
C can be shown to be in NP by demonstrating that a candidate solution to C can be verified in polynomial time.
NP-complete examples (decision versions): Boolean satisfiability problem (SAT), subgraph isomorphism problem, independent set problem, knapsack problem, subset sum problem, dominating set problem, Hamiltonian path problem, clique problem, graph coloring problem, travelling salesman problem, vertex cover problem.
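Membership in NP rests on fast verification of a candidate solution. As a small illustration, the sketch below checks a proposed vertex cover in time linear in the number of edges; the graph and the function name are hypothetical.

```python
def is_vertex_cover(edges, cover, k):
    """Verify a certificate: at most k vertices that touch every edge."""
    cover = set(cover)
    return len(cover) <= k and all(u in cover or v in cover for u, v in edges)

# Hypothetical graph: a triangle A-B-C with an extra edge C-D.
EDGES = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]

ok = is_vertex_cover(EDGES, {"A", "C"}, 2)      # every edge is touched
bad = is_vertex_cover(EDGES, {"C"}, 1)          # edge A-B is left uncovered
```

Checking a given cover is easy; what makes the problem hard is that no polynomial-time method is known for finding a cover of size k among the exponentially many candidates.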
Take Home Message Graph Algorithms