UCS 406 – Data Structures & Algorithms Graphs
What is a graph? A data structure that consists of a set of nodes (vertices) and a set of edges that relate the nodes to each other The set of edges describes relationships among the vertices
Formal definition of graphs A graph G is defined as follows: G=(V,E) V(G): a finite, nonempty set of vertices E(G): a set of edges (pairs of vertices)
Directed vs. undirected graphs When the edges in a graph have no direction, the graph is called undirected
Directed vs. undirected graphs (cont.) When the edges in a graph have a direction, the graph is called directed (or a digraph). Warning: if the graph is directed, the order of the vertices in each edge is important! E(Graph2) = {(1,3), (3,1), (5,9), (9,11), (5,7)}
Trees vs graphs Trees are special cases of graphs!!
Graph terminology Adjacent nodes: two nodes are adjacent if they are connected by an edge. Path: a sequence of vertices that connects two nodes in a graph. Complete graph: a graph in which every vertex is directly connected to every other vertex. Example (directed edge from 5 to 7): 5 is adjacent to 7; 7 is adjacent from 5.
Graph terminology The outdegree of a node u in G [outdeg(u)] is the number of edges beginning at u. The indegree of u [indeg(u)] is the number of edges ending at u. A node u is called a source if it has a positive outdegree but zero indegree. A node u is called a sink if it has a zero outdegree but a positive indegree.
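The degree definitions above can be checked with a short Python sketch. It assumes the graph is given as a list of directed (u, v) pairs, and the function names `degrees` and `sources_and_sinks` are illustrative, not part of the course material. The sample edge list is E(Graph2) from the earlier slide.

```python
from collections import defaultdict

def degrees(edges):
    """Return (outdeg, indeg) dictionaries for a list of directed edges."""
    outdeg = defaultdict(int)
    indeg = defaultdict(int)
    for u, v in edges:
        outdeg[u] += 1   # the edge begins at u
        indeg[v] += 1    # the edge ends at v
    return outdeg, indeg

def sources_and_sinks(edges):
    """Classify nodes that appear in at least one edge."""
    outdeg, indeg = degrees(edges)
    nodes = set(outdeg) | set(indeg)
    sources = {n for n in nodes if outdeg[n] > 0 and indeg[n] == 0}
    sinks = {n for n in nodes if outdeg[n] == 0 and indeg[n] > 0}
    return sources, sinks

edges = [(1, 3), (3, 1), (5, 9), (9, 11), (5, 7)]
sources, sinks = sources_and_sinks(edges)
```

For this edge list, node 5 has outdegree 2 and indegree 0, so it is the only source; nodes 7 and 11 are sinks.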
Graph Terminology
End vertices of an edge: u and v are the endpoints of a
Edges incident on a vertex: a, d, b are incident on v
Adjacent vertices: u and v are adjacent
Degree of a vertex: x has degree 5
In an undirected graph, the sum of the degrees of all vertices is twice the number of edges
Parallel edges: h and i are parallel edges
Self-loop: j is a self-loop
Graph terminology (cont.) What is the number of edges in a complete directed graph with N vertices? N * (N-1)
Graph terminology (cont.) What is the number of edges in a complete undirected graph with N vertices? N * (N-1) / 2
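The two edge counts can be confirmed by enumeration. This is a minimal Python sketch that assumes vertices are numbered 0 to N-1 and that self-loops are excluded; the function names are illustrative only.

```python
from itertools import combinations, permutations

def complete_directed_edges(n):
    """All ordered pairs (u, v) with u != v: N * (N - 1) edges."""
    return list(permutations(range(n), 2))

def complete_undirected_edges(n):
    """All unordered pairs {u, v} with u != v: N * (N - 1) / 2 edges."""
    return list(combinations(range(n), 2))

n = 5
directed_count = len(complete_directed_edges(n))
undirected_count = len(complete_undirected_edges(n))
```

Each undirected edge corresponds to two directed edges, one per direction, which is where the factor of 2 comes from.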
Graph terminology (cont.) Weighted graph: a graph in which each edge carries a value
Cycle
Cycle: a circular sequence of alternating vertices and edges, in which each edge is preceded and followed by its endpoints
Simple cycle: a cycle in which all vertices and edges are distinct
v, b, x, g, y, f, w, c, u, a, v is a simple cycle
u, c, w, e, x, g, y, f, w, d, v, a, u is not a simple cycle
Graph implementation Array-based implementation A 1D array is used to represent the vertices A 2D array (adjacency matrix) is used to represent the edges
Array-based implementation
Graph implementation (cont.) Linked-list implementation A 1D array is used to represent the vertices A list is used for each vertex v which contains the vertices which are adjacent from v (adjacency list)
Linked-list implementation
Adjacency matrix vs. adjacency list representation
Adjacency matrix: Good for dense graphs, |E| ~ O(|V|^2). Memory requirements: O(|V|^2), regardless of the number of edges. Connectivity between two vertices can be tested quickly.
Adjacency list: Good for sparse graphs, |E| ~ O(|V|). Memory requirements: O(|V| + |E|), which is O(|V|) for a sparse graph. Vertices adjacent to another vertex can be found quickly.
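To make the comparison concrete, here is a small Python sketch that builds both representations from the same edge list. It assumes a directed graph with vertices numbered 0 to n-1; `build_matrix` and `build_adj_list` are hypothetical helper names.

```python
def build_matrix(n, edges):
    """n x n matrix of 0/1 entries; always uses n * n cells."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
    return matrix

def build_adj_list(n, edges):
    """One list per vertex holding the vertices adjacent from it."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    return adj

edges = [(0, 1), (0, 2), (2, 3)]
matrix = build_matrix(4, edges)
adj = build_adj_list(4, edges)

has_edge_0_2 = matrix[0][2] == 1   # constant-time edge test
neighbours_of_0 = adj[0]           # direct access to the neighbours
```

With 4 vertices and 3 edges the matrix stores 16 cells while the lists store only 3 entries, which illustrates why lists suit sparse graphs.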
Adjacency matrix and Path Matrix
Suppose G is a simple directed graph with m nodes, and suppose the nodes of G have been ordered and are called v1, v2, ..., vm. Then the adjacency matrix A = (a_ij) of G is the m x m matrix defined as follows: a_ij = 1 if there is an edge (vi, vj), and a_ij = 0 otherwise. Such a matrix, with entries of only 0 and 1, is called a bit matrix or boolean matrix.
The path matrix or reachability matrix of G is the m-square matrix P = (p_ij) defined as follows: p_ij = 1 if there is a path from vi to vj, and p_ij = 0 otherwise.
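One common way to obtain the path matrix from the adjacency matrix is Warshall's algorithm. The slide does not name a method, so treat the choice as an assumption. The Python sketch below uses 0-based indices, and the sample matrix is made up for illustration.

```python
def path_matrix(A):
    """Warshall's algorithm: P[i][j] = 1 iff a path of length >= 1 leads from i to j."""
    m = len(A)
    P = [row[:] for row in A]          # start from the direct edges
    for k in range(m):                 # allow vertex k as an intermediate stop
        for i in range(m):
            if P[i][k]:
                for j in range(m):
                    if P[k][j]:
                        P[i][j] = 1
    return P

# Hypothetical graph: 0 -> 1 -> 2 -> 0 (a cycle) and 2 -> 3
A = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
P = path_matrix(A)
```

Vertices 0, 1 and 2 lie on a cycle, so each of them reaches every vertex including itself, while vertex 3 has no outgoing edges and reaches nothing.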
Adjacency matrix
Adjacency List
Graph searching Problem: find a path between two nodes of the graph (e.g., Austin and Washington) Methods: Depth-First-Search (DFS) or Breadth-First-Search (BFS)
Depth-First-Search (DFS) What is the idea behind DFS? Travel as far as you can down a path Back up as little as possible when you reach a "dead end" (i.e., next vertex has been "marked" or there is no next vertex) DFS on a graph with n vertices and m edges takes O(n + m ) time. DFS can be implemented efficiently using a stack
DFS Algorithm
1. Initialize all nodes to the ready state (STATUS=1)
2. Push the starting node A onto STACK and change its status to the waiting state (STATUS=2)
3. Repeat steps 4 and 5 until STACK is empty
4. Pop the top node N of STACK. Process N and change its status to the processed state (STATUS=3)
5. Push onto STACK all the neighbors of N that are still in the ready state (STATUS=1) and change their status to the waiting state (STATUS=2)
6. Exit
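The steps above can be sketched in Python as follows. The graph is assumed to be a dict mapping each node to a list of its neighbours. The constant names mirror the STATUS values on the slide, and the sample graph is hypothetical.

```python
READY, WAITING, PROCESSED = 1, 2, 3

def dfs(adj, start):
    """Stack-based traversal following the status scheme; returns the processing order."""
    status = {node: READY for node in adj}     # step 1
    order = []
    stack = [start]                            # step 2
    status[start] = WAITING
    while stack:                               # step 3
        node = stack.pop()                     # step 4
        order.append(node)
        status[node] = PROCESSED
        for neighbour in adj[node]:            # step 5
            if status[neighbour] == READY:
                stack.append(neighbour)
                status[neighbour] = WAITING
    return order

adj = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
order = dfs(adj, "A")
```

Because a node is marked as waiting when it is pushed, each node enters the stack at most once, which keeps the running time at O(n + m).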
Breadth-First-Search (BFS) What is the idea behind BFS? Look at all possible paths at the same depth before you go to a deeper level. Back up as far as possible when you reach a "dead end" (i.e., next vertex has been "marked" or there is no next vertex)
BFS algorithm
1. Initialize all nodes to the ready state (STATUS=1)
2. Insert the starting node A into QUEUE and change its status to the waiting state (STATUS=2)
3. Repeat steps 4 and 5 until QUEUE is empty
4. Remove the front node N of QUEUE. Process N and change its status to the processed state (STATUS=3)
5. Add to the rear of QUEUE all the neighbors of N that are still in the ready state (STATUS=1) and change their status to the waiting state (STATUS=2)
6. Exit
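A Python sketch of the same steps is given below, with `collections.deque` standing in for QUEUE. The dict-of-lists graph format and the sample graph are assumptions made for illustration.

```python
from collections import deque

READY, WAITING, PROCESSED = 1, 2, 3

def bfs(adj, start):
    """Queue-based traversal following the status scheme; returns the processing order."""
    status = {node: READY for node in adj}     # step 1
    order = []
    queue = deque([start])                     # step 2
    status[start] = WAITING
    while queue:                               # step 3
        node = queue.popleft()                 # step 4: remove the front node
        order.append(node)
        status[node] = PROCESSED
        for neighbour in adj[node]:            # step 5: add to the rear
            if status[neighbour] == READY:
                queue.append(neighbour)
                status[neighbour] = WAITING
    return order

adj = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
order = bfs(adj, "A")
```

The only difference from the stack-based traversal is the container: a first-in, first-out queue makes the search visit nodes level by level.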
Operations on Graph Adding a node Deleting a node Adding an edge Deleting an edge
Node Search
Adding Node
Deleting Node with Item
Delete Node
Find Edge
Insert Edge
Delete Edge
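The operations listed above can be sketched on an adjacency-list style structure. This Python version assumes a directed graph stored as a dict that maps each node to a set of successors. The class and method names are illustrative and are not taken from the slides.

```python
class Graph:
    """Directed graph: each node maps to the set of nodes adjacent from it."""

    def __init__(self):
        self.adj = {}

    def has_node(self, node):            # node search
        return node in self.adj

    def add_node(self, node):            # no effect if the node already exists
        self.adj.setdefault(node, set())

    def delete_node(self, node):         # also removes edges pointing at the node
        self.adj.pop(node, None)
        for successors in self.adj.values():
            successors.discard(node)

    def has_edge(self, u, v):            # find edge
        return u in self.adj and v in self.adj[u]

    def add_edge(self, u, v):            # creates missing endpoints first
        self.add_node(u)
        self.add_node(v)
        self.adj[u].add(v)

    def delete_edge(self, u, v):
        if u in self.adj:
            self.adj[u].discard(v)

g = Graph()
g.add_edge(1, 2)
g.add_edge(2, 3)
g.delete_node(3)
```

Deleting a node has to scan every successor set to remove incoming edges, so it costs O(|V| + |E|) in this layout, while the other operations take expected constant time.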
Heaps and Heap Sort
Special Types of Trees
Def: Full binary tree = a binary tree in which each node is either a leaf or has degree exactly 2.
Def: Complete binary tree = a binary tree in which every level, except possibly the last, is completely filled.
[Figures: an example full binary tree and an example complete binary tree]
Definitions
Height of a node = the number of edges on the longest simple path from the node down to a leaf
Level of a node = the length of a path from the root to the node
Height of tree = height of root node
Example (from the figure): height of root = 3; height of node 2 = 1; level of node 10 = 2
The Heap Data Structure
Def: A heap is a nearly complete binary tree with the following two properties:
Structural property: all levels are full, except possibly the last one, which is filled from left to right
Order (heap) property: for any node x other than the root, Parent(x) ≥ x
From the heap property, it follows that: “The root is the maximum element of the heap!”
A heap is a binary tree that is filled in order
[Figure: example heap]
Array Representation of Heaps A heap can be stored as an array A. Root of tree is A[1]. Left child of A[i] = A[2i]. Right child of A[i] = A[2i + 1]. Parent of A[i] = A[⌊i/2⌋]. Heapsize[A] ≤ length[A]. The elements in the subarray A[(⌊n/2⌋+1) .. n] are leaves
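The index arithmetic can be tried out directly. The Python sketch below keeps the slide's 1-based convention by leaving index 0 of the list unused, which is an assumption made for readability. The sample heap and the helper names are illustrative.

```python
def parent(i):
    return i // 2          # floor of i / 2

def left(i):
    return 2 * i

def right(i):
    return 2 * i + 1

def is_max_heap(A, n):
    """Check A[parent(i)] >= A[i] for every non-root index of a 1-based heap."""
    return all(A[parent(i)] >= A[i] for i in range(2, n + 1))

def leaf_indices(n):
    """Indices floor(n/2) + 1 .. n hold the leaves."""
    return list(range(n // 2 + 1, n + 1))

A = [None, 8, 7, 4, 5, 2]  # index 0 unused; the heap occupies A[1..5]
n = 5
```

Here A[2] = 7 has children A[4] = 5 and A[5] = 2, and the leaves sit at indices 3, 4 and 5.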
Heap Types Max-heaps (largest element at root), have the max-heap property: for all nodes i, excluding the root: A[PARENT(i)] ≥ A[i] Min-heaps (smallest element at root), have the min-heap property: A[PARENT(i)] ≤ A[i]
Adding/Deleting Nodes New nodes are always inserted at the bottom level (left to right) Nodes are removed from the bottom level (right to left)
Operations on Heaps
Maintain/restore the max-heap property: MAX-HEAPIFY
Create a max-heap from an unordered array: BUILD-MAX-HEAP
Sort an array in place: HEAPSORT
Priority queues
Maintaining the Heap Property
Suppose a node i is smaller than one of its children, while the left and right subtrees of i are max-heaps
To eliminate the violation: exchange the node with its larger child, move down the tree, and continue until the node is not smaller than its children
Example MAX-HEAPIFY(A, 2, 10)
A[2] violates the heap property: exchange A[2] ↔ A[4]
A[4] violates the heap property: exchange A[4] ↔ A[9]
Heap property restored
Maintaining the Heap Property
Alg: MAX-HEAPIFY(A, i, n)
l ← LEFT(i)
r ← RIGHT(i)
if l ≤ n and A[l] > A[i]
  then largest ← l
  else largest ← i
if r ≤ n and A[r] > A[largest]
  then largest ← r
if largest ≠ i
  then exchange A[i] ↔ A[largest]
       MAX-HEAPIFY(A, largest, n)
Assumptions: Left and Right subtrees of i are max-heaps; A[i] may be smaller than its children
Building a Heap
Convert an array A[1 … n] into a max-heap (n = length[A])
The elements in the subarray A[(⌊n/2⌋+1) .. n] are leaves
Apply MAX-HEAPIFY on elements between 1 and ⌊n/2⌋
Alg: BUILD-MAX-HEAP(A)
n = length[A]
for i ← ⌊n/2⌋ downto 1
  do MAX-HEAPIFY(A, i, n)
Example input A: 4 1 3 2 16 9 10 14 8 7
Example: A = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
[Figure: the tree after MAX-HEAPIFY is applied at i = 5, 4, 3, 2, 1]
Heapsort
Goal: Sort an array using heap representations
Idea:
Build a max-heap from the array
Swap the root (the maximum element) with the last element in the array
“Discard” this last node by decreasing the heap size
Call MAX-HEAPIFY on the new root
Repeat this process until only one node remains
Example: A=[7, 4, 3, 1, 2] [Figure: successive steps, including MAX-HEAPIFY(A, 1, 3) and MAX-HEAPIFY(A, 1, 2)]
Alg: HEAPSORT(A)
BUILD-MAX-HEAP(A)    [O(n)]
for i ← length[A] downto 2    [n-1 times]
  do exchange A[1] ↔ A[i]
     MAX-HEAPIFY(A, 1, i - 1)    [O(lg n)]
Running time: O(n lg n); can be shown to be Θ(n lg n)
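A self-contained Python sketch of HEAPSORT follows. It assumes 0-based indexing and an iterative sift-down, and it sorts the list in place before returning it for convenience. None of these choices come from the slides.

```python
def max_heapify(a, i, n):
    """Sift a[i] down inside the heap a[0:n]."""
    while True:
        l, r = 2 * i + 1, 2 * i + 2
        largest = i
        if l < n and a[l] > a[largest]:
            largest = l
        if r < n and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def heapsort(a):
    """Sort the list a in place in ascending order."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):     # BUILD-MAX-HEAP: O(n)
        max_heapify(a, i, n)
    for end in range(n - 1, 0, -1):         # n - 1 iterations
        a[0], a[end] = a[end], a[0]         # move the current maximum to its final slot
        max_heapify(a, 0, end)              # restore the heap on a[0:end]: O(lg n)
    return a

result = heapsort([7, 4, 3, 1, 2])
```

After each swap the sorted suffix grows by one element and the heap shrinks by one, so no extra array is needed.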
Minimum Spanning Tree
Spanning Forest
Spanning subgraph: a subgraph of G containing all vertices of G
Spanning tree: a spanning subgraph that is itself a tree
Minimum Spanning Tree (MST): a subgraph of an undirected graph that spans (includes) all nodes, is connected, is acyclic, and has minimum total edge weight
Spanning forest: a subgraph that consists of a spanning tree in each connected component of a graph
Minimum Spanning Tree Given a connected graph G = (V, E) with real-valued edge weights, an MST is a subset of the edges T ⊆ E such that T is a spanning tree whose sum of edge weights is minimized.
Prim Algorithm Use a priority queue. Maintain a set of explored nodes S. For each unexplored node v, maintain the attachment cost a[v] = cost of the cheapest edge from v to a node in S. Running time: O(V^2) with an array; O(E log V) with a binary heap
Prim Algorithm
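A minimal Python sketch of Prim's algorithm with a binary heap is shown below. It assumes an undirected, connected graph stored as a dict of (neighbour, weight) lists. Instead of updating a[v] in place, it pushes candidate edges onto the heap and skips stale entries when they are popped. That "lazy" variant is a substitution made here because Python's `heapq` has no decrease-key operation. The sample graph is hypothetical.

```python
import heapq

def prim_mst(adj, start):
    """Return (total_weight, tree_edges) of an MST grown from start."""
    explored = {start}                       # the set S
    tree_edges = []
    total = 0
    heap = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(heap)
    while heap and len(explored) < len(adj):
        w, u, v = heapq.heappop(heap)        # cheapest edge leaving S, possibly stale
        if v in explored:
            continue                         # both endpoints already in S
        explored.add(v)
        tree_edges.append((u, v, w))
        total += w
        for x, wx in adj[v]:
            if x not in explored:
                heapq.heappush(heap, (wx, v, x))
    return total, tree_edges

adj = {
    "A": [("B", 1), ("C", 3)],
    "B": [("A", 1), ("C", 2), ("D", 5)],
    "C": [("A", 3), ("B", 2), ("D", 4)],
    "D": [("B", 5), ("C", 4)],
}
total, tree_edges = prim_mst(adj, "A")
```

Each edge is pushed at most twice, so the heap holds O(E) entries and the running time stays at O(E log V).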
Kruskal Algorithm It finds an MST for a connected weighted graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. If the graph is not connected, then it finds a minimum spanning forest (a minimum spanning tree for each connected component). Kruskal's algorithm is an example of a greedy algorithm. Complexity: O(E log V). Prim's algorithm grows a single connected tree, whereas Kruskal's partial solution may be a forest of several disconnected components until later edges join them.
Kruskal Algorithm
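Kruskal's algorithm can be sketched in Python with a union-find structure for cycle detection. The slides do not specify that data structure, so it is an assumption here. Vertices are taken to be numbered 0 to n-1 and edges are given as (weight, u, v) tuples; both sample edge lists are made up.

```python
def kruskal(num_nodes, edges):
    """Return (total_weight, chosen_edges) of a minimum spanning forest."""
    parent = list(range(num_nodes))          # union-find: each node starts alone

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving keeps trees shallow
            x = parent[x]
        return x

    chosen = []
    total = 0
    for w, u, v in sorted(edges):            # greedy: cheapest edge first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                 # the edge joins two components, so no cycle
            parent[root_u] = root_v
            chosen.append((u, v, w))
            total += w
    return total, chosen

connected = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]
total, chosen = kruskal(4, connected)

disconnected = [(1, 0, 1), (2, 2, 3)]
forest_total, forest_edges = kruskal(4, disconnected)
```

On the disconnected input the result has two edges for four vertices, which is a spanning forest with one tree per component.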
Shortest Path Algorithms
Single-source shortest-path problem There are multiple paths from a source vertex to a destination vertex Shortest path: the path whose total weight (i.e., sum of edge weights) is minimum Examples: Austin->Houston->Atlanta->Washington: 1560 miles Austin->Dallas->Denver->Atlanta->Washington: 2980 miles
Single-source shortest-path problem (cont.) Common algorithms: Dijkstra's algorithm, Bellman-Ford algorithm. BFS can be used to solve the shortest-path problem when the graph is unweighted or all the weights are the same (mark vertices before Enqueue)
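The BFS remark can be illustrated with a short Python sketch that records the number of edges on a shortest path from the source. The dict-of-lists graph and the function name `bfs_distances` are assumptions. A vertex counts as marked once it has an entry in `dist`, which happens before it is enqueued.

```python
from collections import deque

def bfs_distances(adj, source):
    """Edge-count distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # mark before enqueueing
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

adj = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
dist = bfs_distances(adj, "A")
```

Vertices unreachable from the source never receive an entry, so their absence from the result signals that no path exists.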
Dijkstra’s algorithm Dijkstra's algorithm is a solution to the single-source shortest path problem in graph theory. It works on both directed and undirected graphs. However, all edges must have nonnegative weights. Input: Weighted graph G=(V,E) and source vertex v∈V, such that all edge weights are nonnegative. Output: Lengths of shortest paths (or the shortest paths themselves) from the given source vertex v∈V to all other vertices
Approach The algorithm computes for each vertex u the distance to u from the start vertex v, that is, the weight of a shortest path between v and u. The algorithm keeps track of the set of vertices for which the distance has been computed, called the cloud C. Every vertex has a label D associated with it. For any vertex u, D[u] stores an approximation of the distance between v and u. The algorithm updates a D[u] value when it finds a shorter path from v to u. When a vertex u is added to the cloud, its label D[u] is equal to the actual (final) distance between the starting vertex v and vertex u.
Dijkstra Algorithm
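The cloud-and-label approach can be sketched in Python with `heapq` as the priority queue. The graph format (dict of (neighbour, weight) lists), the sample graph, and the use of stale heap entries in place of a decrease-key operation are assumptions made for this sketch.

```python
import heapq

def dijkstra(adj, source):
    """Return the label D[u] for every vertex; requires nonnegative weights."""
    D = {u: float("inf") for u in adj}
    D[source] = 0
    cloud = set()                            # vertices whose distance is final
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)           # extract-min
        if u in cloud:
            continue                         # stale entry for a finished vertex
        cloud.add(u)
        for v, w in adj[u]:
            if v not in cloud and d + w < D[v]:
                D[v] = d + w                 # shorter path found: update the label
                heapq.heappush(heap, (D[v], v))
    return D

adj = {
    "s": [("a", 4), ("b", 1)],
    "a": [("c", 1)],
    "b": [("a", 2), ("c", 5)],
    "c": [],
}
D = dijkstra(adj, "s")
```

In this graph the label of a first becomes 4 through the direct edge and is then lowered to 3 through b, which shows a label acting as an approximation until its vertex joins the cloud.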
Example
Time Complexity The shortest-path tree is the union of all shortest paths from the source. Complexity = |V| extract-min operations + |E| decrease-key operations. Array: O(V^2). Binary heap: O((V + E) lg V). Fibonacci heap: O(E + V lg V). Dijkstra's algorithm is equivalent to BFS if the weights of all the edges are taken as one.
Thank You !