Graphs CSE 2320 – Algorithms and Data Structures Alexandra Stefan
and Vassilis Athitsos University of Texas at Arlington

References
Recommended: CLRS
Graph definition and representations:
- CLRS (3rd edition) - Chapter 22.1 (pg 589)
- Sedgewick - Ch 3.7 (pg 120): 2D arrays and lists
Graph traversal:
- CLRS: BFS, DFS - 22.3
- Sedgewick, Ch 5.8
The code used in the slides is from Sedgewick. See other links on the Lectures page.

Graphs
A graph is a fundamental data type.
Graphs are at the core of many algorithms. Examples:
- road networks
- computer networks
- social networks
- game-playing algorithms (e.g., for chess)
- problem-solving algorithms (e.g., for automated proofs)
(Figure: example undirected graph.)

Graphs
A graph is formally defined as G = (V,E) where:
- V: set of vertices.
- E: set of edges. Each edge is a pair of two vertices in V: e = (v1,v2).
Key terms:
- Path
- Cycle
- Connected components
- Degree of a vertex
- Directed / undirected
- Sparse / dense
- Representation: adjacency matrix, adjacency list
(Figure: example undirected graph.)

Graphs
How many graphs do we have here?
Vertices? V = ?
Edges? E = ?
(Figure: example undirected graph.)

Graphs
How many graphs do we have here? One.
Vertices? V = {0, 1, 2, 3, 4, 5, 6, 7}, |V| = 8
Edges? E = {(0,2), (0,6), (0,7), (3,4), (3,5), (4,5), (6,7)}
(Figure: example undirected graph.)

Paths, Connected Components
Are 2 and 7 connected? Yes: via the paths 2, 0, 7 or 2, 0, 6, 7.
Are 1 and 3 connected? No.
Connected components? 3: {0,2,6,7}, {1}, {3,4,5}.
Degree of a vertex: the number of edges incident to the vertex. Examples: degree(0) = 3, degree(1) = 0, degree(5) = 2.
(Figure: example undirected graph.)
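The answers above can be checked mechanically. The following is a minimal sketch, not code from the slides: it hard-codes the example edge set E in an adjacency matrix and uses hypothetical helpers (buildExample, degree, labelFrom, countComponents) to count degrees and label connected components with a depth-first walk, a technique the slides introduce later.

```c
#define N 8  /* vertices 0..7 of the example graph */

/* Fill adj with the example's edge set E (undirected, so symmetric). */
void buildExample(int adj[N][N]) {
    static const int edges[7][2] = {
        {0, 2}, {0, 6}, {0, 7}, {3, 4}, {3, 5}, {4, 5}, {6, 7}
    };
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            adj[i][j] = 0;
    for (int e = 0; e < 7; e++) {
        adj[edges[e][0]][edges[e][1]] = 1;
        adj[edges[e][1]][edges[e][0]] = 1;
    }
}

/* Degree of v = number of 1s in row v. */
int degree(int adj[N][N], int v) {
    int count = 0;
    for (int w = 0; w < N; w++) count += adj[v][w];
    return count;
}

/* Give every vertex reachable from v the component label id. */
static void labelFrom(int adj[N][N], int v, int id, int comp[N]) {
    comp[v] = id;
    for (int w = 0; w < N; w++)
        if (adj[v][w] && comp[w] == -1) labelFrom(adj, w, id, comp);
}

/* Returns the number of connected components; comp[v] is v's label. */
int countComponents(int adj[N][N], int comp[N]) {
    int count = 0;
    for (int v = 0; v < N; v++) comp[v] = -1;
    for (int v = 0; v < N; v++)
        if (comp[v] == -1) labelFrom(adj, v, count++, comp);
    return count;
}
```

Two vertices are connected exactly when they receive the same label, so comparing comp[2] with comp[7] answers the first question on the slide.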

Directed vs Undirected
Graphs can be directed or undirected.
In a directed graph, edge (A, B) means that we can go (using that edge) from A to B, but not from B to A. We will have both edge (A, B) and edge (B, A) if we want to show that A and B are linked in both directions.
In an undirected graph, edge (A, B) means that we can go (using that edge) both from A to B and from B to A. We have already seen an example.
(Figure: example directed graph.)

Graph Representations
G = (V,E). Let |V| = N and |E| = M, where |V| is the size of set V, i.e. the number of vertices in the graph. Similarly for |E|.
Vertices are integers from 0 to N-1.
- The graph object just needs to know how many vertices it contains, that is, the number N.
- Example: if graph G has 10 vertices, we know that those vertices are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
2 representations for edges:
- Adjacency matrix: 2D matrix of size NxN
- Adjacency lists: 1D array of lists

Representing Edges
2 representations for edges:
- Adjacency matrix: 2D matrix of size NxN
- Adjacency lists: 1D array of lists
Sparse graph: a graph with |E| << |V|^2.
- Max possible edges in an undirected graph with 10^6 nodes? About 10^6 * 10^6 / 2.
- Actual number of edges if there are 10 edges per node: 10 * 10^6 / 2.
- Use adjacency lists.

Graph representation: Adjacency Matrix

Adjacency Matrix
- Space complexity: O(N*N)
- Time complexity for add/remove/check edge: O(1)
Assume N vertices: 0, 1, ..., N-1.
Represent edges using a 2D binary matrix M, of size N*N.
- M[v1][v2] = 1 if and only if there is an edge from v1 to v2.
- M[v1][v2] = 0 otherwise (there is no edge from v1 to v2).
Note: the adjacency matrix of an undirected graph is symmetric.
(Figure: example graph and its adjacency matrix.)

Defining a Graph
How do we define in C a data type for a graph, using the adjacency matrix representation?
    // structure
    typedef struct struct_graph * graph;
    struct struct_graph { ... };

    // operations
    int edgeExists(graph g, int v1, int v2) ...
    void addEdge(graph g, int v1, int v2) ...
    void removeEdge(graph g, int v1, int v2) ...

Defining a Graph
    typedef struct struct_graph * graph;
    struct struct_graph {
        int number_of_vertices;
        int ** adjacencies;
    };

    int edgeExists(graph g, int v1, int v2) {
        return g->adjacencies[v1][v2];
    }

Defining a Graph
    void addEdge(graph g, int v1, int v2) {
        g->adjacencies[v1][v2] = 1;
        g->adjacencies[v2][v1] = 1;
    }

    void removeEdge(graph g, int v1, int v2) {
        g->adjacencies[v1][v2] = 0;
        g->adjacencies[v2][v1] = 0;
    }
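The slides do not show how the matrix is allocated. Below is a minimal sketch that puts the pieces together for an undirected graph. The constructor newGraph and the destructor destroyGraph are hypothetical names introduced here for illustration; the posted graph_matrix.c may organize this differently.

```c
#include <stdlib.h>

typedef struct struct_graph *graph;
struct struct_graph {
    int number_of_vertices;
    int **adjacencies;  /* adjacencies[v1][v2] == 1 iff edge (v1,v2) exists */
};

/* Hypothetical helper: allocate a graph with n vertices and no edges.
   Returns NULL if memory runs out. */
graph newGraph(int n) {
    graph g = malloc(sizeof *g);
    if (g == NULL) return NULL;
    g->number_of_vertices = n;
    g->adjacencies = malloc((size_t)n * sizeof *g->adjacencies);
    if (g->adjacencies == NULL) { free(g); return NULL; }
    for (int i = 0; i < n; i++) {
        g->adjacencies[i] = calloc((size_t)n, sizeof **g->adjacencies);
        if (g->adjacencies[i] == NULL) {
            while (i-- > 0) free(g->adjacencies[i]);  /* undo earlier rows */
            free(g->adjacencies);
            free(g);
            return NULL;
        }
    }
    return g;
}

/* Hypothetical helper: release everything newGraph allocated. */
void destroyGraph(graph g) {
    if (g == NULL) return;
    for (int i = 0; i < g->number_of_vertices; i++) free(g->adjacencies[i]);
    free(g->adjacencies);
    free(g);
}

int edgeExists(graph g, int v1, int v2) {
    return g->adjacencies[v1][v2];
}

/* Undirected graph: keep the matrix symmetric. */
void addEdge(graph g, int v1, int v2) {
    g->adjacencies[v1][v2] = 1;
    g->adjacencies[v2][v1] = 1;
}

void removeEdge(graph g, int v1, int v2) {
    g->adjacencies[v1][v2] = 0;
    g->adjacencies[v2][v1] = 0;
}
```

All three edge operations touch a fixed number of matrix cells, which is where the O(1) time bound comes from; the O(N*N) space is paid once in newGraph.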

New graph representation: Adjacency Lists

Adjacency Lists
Represent the edges of the graph by an array of lists.
- Let's name that array A.
- A[v1] is a list containing the neighbors of vertex v1.
(Figure: example graph and its array A of adjacency lists.)

Adjacency Lists
Space?
- For A:
- For nodes:
Time to check if an edge exists or not?
- Worst case:
Time to remove an edge?
Time to add an edge?

Adjacency Lists
Space: O(E + V)
- If the graph is relatively sparse, |E| << |V|^2, this can be a significant advantage.
Time to check if an edge exists or not:
- Worst case: O(V). Each vertex can have up to V-1 neighbors, and we may need to go through all of them to see if an edge exists.
- For sparse graphs, the behavior can be much better. If, say, each vertex has at most 10 neighbors, then we can check if an edge exists much faster.
- Either way, slower than using adjacency matrices.
Time to remove an edge? Same as for checking if an edge exists.
Time to add an edge? Same as for checking if an edge exists. Why? Because if the edge already exists, we should not duplicate it.

Defining a Graph
How do we define in C a data type for a graph, using the adjacency list representation?
Note: the same interface as for the adjacency matrix.
    typedef struct struct_graph * graph;
    struct struct_graph { ... };

    int edgeExists(graph g, int v1, int v2) ...
    void addEdge(graph g, int v1, int v2) ...
    void removeEdge(graph g, int v1, int v2) ...

Defining a Graph
Defining the object type itself:
    typedef struct struct_graph * graph;
    struct struct_graph {
        int number_of_vertices;
        list * adjacencies;
    };

Defining a Graph
Check if an edge exists:
    int edgeExists(graph g, int v1, int v2) {
        link n;
        for (n = getFirst(g->adjacencies[v1]); n != NULL; n = getLinkNext(n)) {
            if (getLinkItem(n) == v2) return 1;
        }
        return 0;
    }

Defining a Graph: add an edge
Add a new edge:
    void addEdge(graph g, int v1, int v2) {
        if (!edgeExists(g, v1, v2)) {
            insertAtBeginning(g->adjacencies[v1], newLink(v2));
            insertAtBeginning(g->adjacencies[v2], newLink(v1));
        }
    }

Defining a Graph: remove edge
Removing an edge: see posted file graph_list.c
Pseudocode:
    removeEdge(V1, V2)
        Go through adjacency list of V1, remove link corresponding to V2.
        Go through adjacency list of V2, remove link corresponding to V1.
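The three operations can be sketched end to end as follows. This is an illustration under stated assumptions, not the posted graph_list.c: instead of the list/link helpers used in the slides (getFirst, newLink, insertAtBeginning), it uses a bare singly linked struct node, and the names newGraph, pushFront and removeFromList are hypothetical.

```c
#include <stdlib.h>

/* Assumption: a plain linked-list node replaces the slides' list/link types. */
struct node { int vertex; struct node *next; };

typedef struct struct_graph *graph;
struct struct_graph {
    int number_of_vertices;
    struct node **adjacencies;  /* adjacencies[v] = head of v's neighbor list */
};

/* Hypothetical helper: n vertices, every adjacency list empty. */
graph newGraph(int n) {
    graph g = malloc(sizeof *g);
    if (g == NULL) return NULL;
    g->number_of_vertices = n;
    g->adjacencies = malloc((size_t)n * sizeof *g->adjacencies);
    if (g->adjacencies == NULL) { free(g); return NULL; }
    for (int v = 0; v < n; v++) g->adjacencies[v] = NULL;
    return g;
}

/* Walk v1's list looking for v2: O(length of the list). */
int edgeExists(graph g, int v1, int v2) {
    for (struct node *n = g->adjacencies[v1]; n != NULL; n = n->next)
        if (n->vertex == v2) return 1;
    return 0;
}

static void pushFront(struct node **head, int v) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL) return;  /* out of memory: the entry is skipped */
    n->vertex = v;
    n->next = *head;
    *head = n;
}

/* Check first so that an existing edge is never duplicated. */
void addEdge(graph g, int v1, int v2) {
    if (edgeExists(g, v1, v2)) return;
    pushFront(&g->adjacencies[v1], v2);
    if (v1 != v2) pushFront(&g->adjacencies[v2], v1);
}

/* Unlink and free the first node holding v, if there is one. */
static void removeFromList(struct node **head, int v) {
    for (struct node **p = head; *p != NULL; p = &(*p)->next) {
        if ((*p)->vertex == v) {
            struct node *dead = *p;
            *p = dead->next;
            free(dead);
            return;
        }
    }
}

void removeEdge(graph g, int v1, int v2) {
    removeFromList(&g->adjacencies[v1], v2);
    removeFromList(&g->adjacencies[v2], v1);
}
```

The pointer-to-pointer walk in removeFromList avoids a special case for removing the head of a list. Each operation scans at most one or two adjacency lists, matching the O(V) worst case discussed earlier.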

Adjacency Matrices vs. Adjacency Lists
Suppose we have an undirected graph with: 10 million vertices. Each vertex has at most 20 neighbors. Which of the two graph representations would you choose?

Space Analysis: Adjacency Matrices vs. Adjacency Lists
Suppose we have an undirected graph with:
- 10 million vertices.
- Each vertex has at most 20 neighbors.
Adjacency matrices: we need at least 100 trillion bits of memory, so at least 12.5 TB of memory.
Adjacency lists: in total, they would store at most 200 million nodes (each vertex lists up to 20 neighbors). With 16 bytes per node (as an example), plus the array of list pointers, this takes at most 3.28 gigabytes.
We'll see next how to compute/verify such answers.

Steps for Solving This Problem
Suppose we have an undirected graph with:
- 10 million vertices.
- Each vertex has at most 20 neighbors.
Adjacency matrices: we need at least 100 trillion bits of memory, so at least 12.5 TB of memory.
Adjacency lists: in total, they would store at most 200 million nodes. With 16 bytes per node (as an example), plus the array of list pointers, this takes at most 3.28 gigabytes.
Find 'keywords', understand numbers:
- 10 million vertices => 10 * 10^6
- Trillion = 10^12
- 1 TB (terabyte) = 10^12 bytes
- 1 GB = 10^9 bytes
- 100 trillion bits vs 12.5 TB

Solving: Adjacency Matrix
Suppose we have a graph with:
- 10 million vertices => |V| = 10*10^6 = 10^7
- Each vertex has at most 20 neighbors.
Adjacency matrix representation for the graph:
- The smallest possible matrix: a 2D array of bits.
- The matrix size will be |V| x |V| x 1 bit = (10*10^6) * (10*10^6) * 1 bit = 100*10^12 bits = 100 trillion bits.
- Bits => bytes: 1 byte = 8 bits, so 100*10^12 bits = (100/8)*10^12 bytes = 12.5*10^12 bytes.
- 10^12 bytes = 1 TB, so 12.5*10^12 bytes = 12.5 TB (final result).

Solving: Adjacency List
Suppose we have an undirected graph with:
- 10 million vertices => |V| = 10*10^6
- Each vertex has at most 20 neighbors.
Adjacency lists representation of graphs: for each vertex, keep a list of edges (a list of neighboring vertices).
- Space for the adjacency list array: ≤ 10 million vertices * 8 bytes (memory address) = 80*10^6 bytes = 0.08*10^9 bytes = 0.08 GB
- Space for all the lists for all the vertices: ≤ 10*10^6 vertices * 20 neighbors/vertex = 200*10^6 nodes
- Assume 16 bytes per node: 8 bytes for the next pointer, and 8 bytes for the data (vertex): 200*10^6 nodes * 16 bytes/node = 3200*10^6 bytes = 3.2*10^9 bytes = 3.2 GB
- Total: 3.2 GB + 0.08 GB = 3.28 GB (10^9 bytes = 1 GB (gigabyte))
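The two calculations can be double-checked with a few lines of C. This is a small sketch with hypothetical helper names (matrixBytes, listBytes); the pointer and node sizes are passed in as parameters because the 8-byte and 16-byte figures are the slides' example values, not fixed properties of every machine.

```c
#include <stdint.h>

/* Bytes needed for an n x n matrix that stores one bit per vertex pair
   (rounded up to a whole byte). */
uint64_t matrixBytes(uint64_t n) {
    return (n * n + 7) / 8;
}

/* Upper bound on bytes for adjacency lists: an array of n list pointers
   plus one list node per (vertex, neighbor) entry. */
uint64_t listBytes(uint64_t n, uint64_t maxNeighbors,
                   uint64_t pointerBytes, uint64_t nodeBytes) {
    return n * pointerBytes + n * maxNeighbors * nodeBytes;
}
```

With n = 10,000,000, matrixBytes(n) gives 12,500,000,000,000 bytes (12.5 TB) and listBytes(n, 20, 8, 16) gives 3,280,000,000 bytes (3.28 GB), matching the results derived by hand.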

Check Out Posted Code
- graph.h: defines an abstract interface for basic graph functions.
- graph_matrix.c: implements the abstract interface of graph.h, using an adjacency matrix.
- graph_list.c: also implements the abstract interface of graph.h, using adjacency lists.
- graph_main: a test program that can be compiled with either graph_matrix.c or graph_list.c.

Trees vs Graphs A tree is a special case of graphs.
A tree is a connected graph with no cycles.

Strongly Connected Components (Directed Graphs)
Let G = (V,E) be a directed graph.
How many “connected components” does this graph have?
- Can you get from 0 to every other vertex?
- Can you get from 3 to 6?
Connected components do not apply to directed graphs.
For directed graphs we define strongly connected components: a subset of vertices such that for any two vertices u, v in the subset we can get from u to v and from v to u.
There is only one strongly connected component in this graph: {0,1,4,5}. (Note that {6,2,3} is not strongly connected.)
(Figure: example directed graph.)

Graph Traversal

Graph Traversal - Graph Search
Overall, we will use the terms "graph traversal" and "graph search" almost interchangeably. However, there is a small difference:
- "Traversal": visit every node in the graph.
- "Search": visit nodes until we find what we are looking for. E.g.:
    - A node labeled "New York".
    - A node containing integer 2014.
Graph traversal:
- Input: start/source vertex.
- Output: a sequence of nodes resulting from graph traversal.

Graph traversals
Breadth-First Search (BFS) - O(V+E) (for the adjacency list representation)
- Explores vertices in the order: root, nodes 1 edge away from the root, nodes 2 edges away from the root, and so on until all nodes have been visited.
- If the graph is a tree, BFS corresponds to the level-order traversal.
- Useful for finding shortest paths from a source vertex. Here the length of a path is the number of edges on it.
Depth-First Search (DFS) - Θ(V+E) (for the adjacency list representation)
- Explores the vertices by following a path down as far as possible, backtracking by one vertex, continuing to follow paths from there, and so on.
- Useful for:
    - Finding and labelling connected components (easy to implement)
    - Finding cycles
    - Topological sorting of DAGs (Directed Acyclic Graphs)

Graph Traversal
BFS traversal examples and DFS traversal examples. (Figures: example graphs; the traversal orders are left as exercises.)

BFS (Breadth-First Search)
When BFS discovers a new node, it gives one of the shortest paths to that node.
Example: starting at 0:
- Nodes traversed in order: 0, 1, 4, 5, 6, 7, 2, 3
- Yellow nodes are 1 edge away from 0, orange nodes are 2 edges away from 0, purple nodes are 3 edges away from 0.
- The shortest path from 0 to 3 has length 3. (There may be longer paths from 0 to 3 in the graph. See: 0, 1, 4, 5, 6, 3.)

Vertex coloring while searching
Vertices will be in one of 3 states while searching, and we will assign a color to each state:
- White: undiscovered
- Gray: discovered, but the algorithm is not done processing it
- Black: discovered, and the algorithm has finished processing it

Breadth-First Search (BFS) CLRS 22.2
    BFS(G,s) // search graph G starting from vertex s.
        For each vertex u of G
            color[u] = WHITE
            dist[u] = inf // distance from s to u
            pred[u] = NIL // predecessor of u on the path from s to u
        color[s] = GRAY
        dist[s] = 0
        pred[s] = NIL
        Initialize empty queue Q
        Enqueue(Q,s) // s goes to the end of Q
        While Q is not empty
            u = Dequeue(Q) // removes u from the front of Q
            For each v adjacent to u // explore edge (u,v)
                If color[v] == WHITE
                    color[v] = GRAY
                    dist[v] = dist[u]+1
                    pred[v] = u
                    Enqueue(Q,v)
            color[u] = BLACK
Runtime: O(V+E). (Depends on the graph implementation.)
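A direct C translation of the pseudocode might look like the sketch below. The assumptions are mine, not the slides': the graph is passed as a fixed-size adjacency matrix (capacity MAXV) to keep the example short, INT_MAX stands in for inf, and -1 stands in for NIL. With a matrix, scanning the neighbors of each vertex costs O(V), so this version runs in O(V^2) rather than O(V+E).

```c
#include <limits.h>

#define MAXV 16  /* assumed capacity for this sketch */
enum color { WHITE, GRAY, BLACK };

/* BFS from s over an n-vertex graph (n <= MAXV) given as an adjacency matrix.
   Fills dist[] (INT_MAX = unreachable) and pred[] (-1 = NIL). */
void bfs(int n, int adj[MAXV][MAXV], int s, int dist[], int pred[]) {
    enum color color[MAXV];
    int queue[MAXV];          /* each vertex is enqueued at most once */
    int head = 0, tail = 0;   /* dequeue at head, enqueue at tail */

    for (int u = 0; u < n; u++) {
        color[u] = WHITE;
        dist[u] = INT_MAX;
        pred[u] = -1;
    }
    color[s] = GRAY;
    dist[s] = 0;
    queue[tail++] = s;

    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < n; v++) {      /* explore edge (u,v) */
            if (adj[u][v] && color[v] == WHITE) {
                color[v] = GRAY;
                dist[v] = dist[u] + 1;
                pred[v] = u;
                queue[tail++] = v;
            }
        }
        color[u] = BLACK;
    }
}
```

Because a vertex turns gray the moment it is enqueued, it can never enter the queue twice, which is why an array of MAXV slots is enough. Following pred[] backwards from any reached vertex recovers one shortest path to s.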

Breadth-First Search (BFS) CLRS 22.2
Note that the code above assumes that you will only call BFS(G,s) once for s, and not attempt to find other connected components by calling it again for unvisited nodes.

Depth-First Search (DFS) – simpler version
    DFS(G)
        For each vertex u of G
            color[u] = WHITE
            pred[u] = NIL
        For each vertex u of G
            If color[u] == WHITE
                DFS_visit(G,u)

    DFS_visit(G,u)
        color[u] = GRAY
        For each v adjacent to u // explore edge (u,v)
            If color[v] == WHITE
                pred[v] = u
                DFS_visit(G,v)
        color[u] = BLACK

DFS with time stamps
Next, DFS with 'time stamps': the time when a node u was first discovered (d[u]) and the time when the algorithm finished processing that node (finish[u]).
The time is needed for edge labeling (to distinguish between forward and cross edges):
- tree edge: (u,v) if v is first discovered by this edge (v was white)
- backward edge: (u,v) if v is an ancestor of u (v is gray)
- forward edge: (u,v) if v started and finished while u was still gray (still being processed) (v is black)
- cross edge: (u,v) if v finished before u started (processed in disjoint time intervals). Cross edges connect 2 trees or 2 subtrees; neither endpoint is an ancestor of the other. (v is black)
Note that this pseudocode does not specify the details of the implementation (e.g. whether time is a global variable or is passed as an argument).
Note that the time should never decrease (in C it should be passed as a pointer).

Depth-First Search (DFS) (with time stamps) - CLRS
    DFS(G)
        For each vertex u of G
            color[u] = WHITE
            pred[u] = NIL
            d[u] = -1
        time = 0
        For each vertex u of G
            If color[u] == WHITE // u is in an undiscovered connected component
                DFS_visit(G,u)

    DFS_visit(G,u) // search graph G starting from vertex u.
        time = time + 1
        d[u] = time // time when u was discovered
        color[u] = GRAY
        For each v adjacent to u // explore edge (u,v)
            If color[v] == WHITE
                pred[v] = u
                DFS_visit(G,v)
        color[u] = BLACK
        time = time + 1
        finish[u] = time // time when u was finished
Node u will be:
- WHITE before time d[u],
- GRAY between d[u] and finish[u],
- BLACK after finish[u].
Runtime (based on representation): Θ(V+E) with adjacency lists, Θ(V^2) with an adjacency matrix.
(Table to fill in for each vertex: pred, d, finish.)
(See CLRS page 605 for a step-by-step example.)
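One way to realize the time-stamped DFS in C is sketched below. The choices here are assumptions for illustration: the graph is a fixed-size adjacency matrix, all bookkeeping lives in a hypothetical struct dfs_state that is passed by pointer (so the clock never moves backwards across recursive calls), -1 stands in for NIL, and, following CLRS, the clock is advanced once at discovery and once more just before finish[u] is recorded.

```c
#define MAXV 16  /* assumed capacity for this sketch */
enum color { WHITE, GRAY, BLACK };

/* All DFS bookkeeping in one place, shared by pointer. */
struct dfs_state {
    int n;                    /* number of vertices */
    int (*adj)[MAXV];         /* adjacency matrix */
    enum color color[MAXV];
    int pred[MAXV];           /* -1 = NIL */
    int d[MAXV];              /* discovery time */
    int finish[MAXV];         /* finish time */
    int time;                 /* the clock; only ever increases */
};

static void dfsVisit(struct dfs_state *st, int u) {
    st->d[u] = ++st->time;            /* u discovered */
    st->color[u] = GRAY;
    for (int v = 0; v < st->n; v++) { /* explore edge (u,v) */
        if (st->adj[u][v] && st->color[v] == WHITE) {
            st->pred[v] = u;
            dfsVisit(st, v);
        }
    }
    st->color[u] = BLACK;
    st->finish[u] = ++st->time;       /* u finished */
}

/* Runs DFS over the whole graph, restarting from every undiscovered vertex. */
void dfs(struct dfs_state *st, int n, int adj[MAXV][MAXV]) {
    st->n = n;
    st->adj = adj;
    st->time = 0;
    for (int u = 0; u < n; u++) {
        st->color[u] = WHITE;
        st->pred[u] = -1;
        st->d[u] = -1;
        st->finish[u] = -1;
    }
    for (int u = 0; u < n; u++)
        if (st->color[u] == WHITE) dfsVisit(st, u);
}
```

For the directed graph with edges 0->1, 1->2, 0->2 and an isolated vertex 3, visiting neighbors in increasing order gives start/end stamps 1/6 for vertex 0, 2/5 for 1, 3/4 for 2, and 7/8 for 3: intervals of descendants nest inside those of their ancestors.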

Practice
Run DFS(G) on each of the two drawings (note that 0 moved in the second one) and fill in the time stamps for every vertex.
Convention: start/end, written as ___ /___ next to each vertex.
(Figures: two drawings of the practice graph.)

Practice
Try various directions for the arrows.
(Figure: practice graph.)

Edge Classification
In a DFS traversal, we classify edge (u,v) as follows:
- Tree edge: v is a descendant of u and v is white (just discovered).
- Backward edge: v is an ancestor of u and v is gray (still being processed).
- Forward edge: v is a descendant of u and v is black (finished processing).
- Cross edge: v is unrelated to u and v is black (v finished processing before u started being processed).
Note that these classifications depend on the order in which vertices are discovered.
(Figure: example graph, DFS started at vertex 0. Edge labels: forward (if 2 before 3), cross (if 6 before 5).)

Edge Classification
The edge classification depends on the order in which vertices are discovered.
- DFS(G,0): 0, 1, 5, 4, 6, 3, 2. Visit neighbors in increasing order, but at 6 visit 3 before 2.
- DFS(G,0): 0, 1, 6, 2, 3, 5, 4. Visit neighbors clockwise, starting at 3 o'clock.
(Figures: the same graph under both orders. Edge labels: tree (2 before 3), cross (3 before 2), forward (if 2 before 3), cross (if 6 before 5).)
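The color-based rules translate almost line for line into code. The sketch below is an illustration for directed graphs under assumptions of my own: an adjacency matrix of capacity MAXV, neighbors visited in increasing order, file-scope bookkeeping variables, and a hypothetical output matrix kind[u][v]. A black endpoint is split into forward versus cross by comparing discovery times: if u was discovered before v, then v is a descendant of u.

```c
#define MAXV 16  /* assumed capacity for this sketch */
enum color { WHITE, GRAY, BLACK };
enum edge_kind { NO_EDGE, TREE, BACKWARD, FORWARD, CROSS };

static enum color color[MAXV];
static int d[MAXV];      /* discovery times */
static int timeNow;      /* the DFS clock */

static void visit(int n, int adj[MAXV][MAXV],
                  enum edge_kind kind[MAXV][MAXV], int u) {
    d[u] = ++timeNow;
    color[u] = GRAY;
    for (int v = 0; v < n; v++) {
        if (!adj[u][v]) continue;
        if (color[v] == WHITE) {          /* v discovered through (u,v) */
            kind[u][v] = TREE;
            visit(n, adj, kind, v);
        } else if (color[v] == GRAY) {    /* v is an ancestor still in progress */
            kind[u][v] = BACKWARD;
        } else if (d[u] < d[v]) {         /* black, discovered after u */
            kind[u][v] = FORWARD;
        } else {                          /* black, finished before u started */
            kind[u][v] = CROSS;
        }
    }
    color[u] = BLACK;
    ++timeNow;                            /* finish time; not stored here */
}

/* Classifies every edge of a directed graph; non-edges get NO_EDGE. */
void classifyEdges(int n, int adj[MAXV][MAXV],
                   enum edge_kind kind[MAXV][MAXV]) {
    timeNow = 0;
    for (int u = 0; u < n; u++) {
        color[u] = WHITE;
        for (int v = 0; v < n; v++) kind[u][v] = NO_EDGE;
    }
    for (int u = 0; u < n; u++)
        if (color[u] == WHITE) visit(n, adj, kind, u);
}
```

Changing the order in which neighbors are scanned can change the labels, which is the point the two traversals on this slide illustrate.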

Edge Classification for Undirected Graphs
An undirected graph will only have:
- Tree edges
- Back edges
As there is no direction on the edges, the forward and cross edges are explored as backward and tree edges:
- Forward => backward
- Cross => tree

Detecting Cycles in a Graph
A graph has a cycle if and only if a DFS traversal finds a backward edge.
This applies to both directed and undirected graphs.
A Directed Acyclic Graph (DAG) is a directed graph that has no cycles.
(Figure: example directed graph.)
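For directed graphs, the test amounts to stopping the DFS as soon as it reaches a gray vertex. Here is a minimal sketch with hypothetical names (hasCycleDirected, visitFindsBackEdge), assuming an adjacency matrix of capacity MAXV. It is written for directed graphs only; an undirected version would additionally have to ignore the edge leading straight back to the parent.

```c
#define MAXV 16  /* assumed capacity for this sketch */
enum color { WHITE, GRAY, BLACK };

/* Returns 1 as soon as the DFS from u meets a gray vertex (a backward edge). */
static int visitFindsBackEdge(int n, int adj[MAXV][MAXV],
                              enum color color[], int u) {
    color[u] = GRAY;
    for (int v = 0; v < n; v++) {
        if (!adj[u][v]) continue;
        if (color[v] == GRAY) return 1;   /* v is an ancestor of u */
        if (color[v] == WHITE && visitFindsBackEdge(n, adj, color, v))
            return 1;
    }
    color[u] = BLACK;
    return 0;
}

/* Returns 1 if the directed graph contains a cycle, 0 if it is a DAG. */
int hasCycleDirected(int n, int adj[MAXV][MAXV]) {
    enum color color[MAXV];
    for (int u = 0; u < n; u++) color[u] = WHITE;
    for (int u = 0; u < n; u++)
        if (color[u] == WHITE && visitFindsBackEdge(n, adj, color, u))
            return 1;
    return 0;
}
```

An edge to a black vertex is deliberately not treated as a cycle: it is a forward or cross edge, and a DAG can contain both.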

Topological Sorting
A topological sort of a directed acyclic graph (DAG) G is a linear ordering of its vertices such that if (u,v) is an edge in G, then u is listed before v (all edges point from left to right).
Applications:
- Task ordering:
    - Vertices represent tasks.
    - Edge (u,v) indicates that task u must finish before task v starts.
    - Topological sorting gives a feasible order for completing the tasks.
- Identifying strongly connected components.
Algorithm:
- Initialize an empty list L.
- Call DFS(G).
- When a vertex finishes (turns black), add it at the beginning of list L.
- Return L.
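The algorithm above can be sketched in C as follows. The assumptions are mine: the graph is an adjacency matrix of capacity MAXV, the input is taken to be a DAG (no cycle check is done), and the list L is an output array filled from the back, so that each finished vertex lands in front of everything that finished before it.

```c
#define MAXV 16  /* assumed capacity for this sketch */

/* DFS from u; when u finishes, it is placed in the next free slot
   counting down from the end of order[]. */
static void visit(int n, int adj[MAXV][MAXV], int visited[],
                  int order[], int *next, int u) {
    visited[u] = 1;
    for (int v = 0; v < n; v++)
        if (adj[u][v] && !visited[v]) visit(n, adj, visited, order, next, v);
    order[--(*next)] = u;   /* "add at the beginning of list L" */
}

/* Writes a topological order of an n-vertex DAG into order[0..n-1]. */
void topologicalSort(int n, int adj[MAXV][MAXV], int order[]) {
    int visited[MAXV] = {0};
    int next = n;           /* order[next..n-1] is the list built so far */
    for (int u = 0; u < n; u++)
        if (!visited[u]) visit(n, adj, visited, order, &next, u);
}
```

Filling the array from the end produces the vertices in decreasing order of finish time without a separate reversal step. A DAG usually has several valid orders; which one comes out depends on the order in which neighbors are visited.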

Topological Sorting - Example
A topological sort of a directed acyclic graph (DAG) G is a linear ordering of its vertices such that if (u,v) is an edge in G, then u is listed before v (all edges point from left to right).
Example 1: Topological order: ?
Example 2: Topological order: ?
(Figures: two example DAGs.)

Topological Sorting - Answer
A topological sort of a directed acyclic graph (DAG) G is a linear ordering of its vertices such that if (u,v) is an edge in G, then u is listed before v (all edges point from left to right).
Print the vertices in reverse order of their DFS finish times.
Example 1: Topological order: 4, 0, 1, 5, 6, 2, 3 ( 0, 4, 1, 5, 6, 2, 3 )
Example 2: Topological order: 6, 4, 2, 3, 0, 1, 5 ( 0, 4, 1, 6, 2, 3, 5 ) ( 0, 4, 1, 6, 2, 5, 3 )
(Figures: two example DAGs.)

DFS – Non-Recursive
Extra, not required (Fall 2016)
Sedgewick, Figure 5.34, page 244
Use a stack instead of recursion.
- Visit the nodes in the same order as the recursive version:
    - Put nodes on the stack in reverse order.
    - Mark a node as visited when you start processing it from the stack, not when you put it on the stack.
    - 2 versions based on what they put on the stack:
        - only the vertex
        - the vertex and a node reference in the adjacency list for that vertex
- Visit the nodes in a different order, but still depth-first.
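A sketch of the "only the vertex" variant is given below. It is not Sedgewick's code: it assumes an adjacency matrix of capacity MAXV and a hypothetical function dfsIterative that records the visit order. Neighbors are pushed in decreasing order so that the smallest one is popped first, and a vertex is marked visited when it is popped, not when it is pushed.

```c
#define MAXV 16  /* assumed capacity for this sketch */

/* Stack-based DFS from s. Writes the visit order into order[] and
   returns how many vertices were visited. */
int dfsIterative(int n, int adj[MAXV][MAXV], int s, int order[]) {
    int visited[MAXV] = {0};
    int stack[MAXV * MAXV];  /* a vertex may be pushed once per incoming edge */
    int top = 0, count = 0;

    stack[top++] = s;
    while (top > 0) {
        int u = stack[--top];
        if (visited[u]) continue;      /* stale copy pushed earlier */
        visited[u] = 1;                /* mark on pop, not on push */
        order[count++] = u;
        for (int v = n - 1; v >= 0; v--)   /* reverse order */
            if (adj[u][v] && !visited[v]) stack[top++] = v;
    }
    return count;
}
```

Since a vertex can sit on the stack several times, the stack is sized by the number of edges rather than the number of vertices; the `if (visited[u]) continue;` line discards the extra copies.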
