Graphs and Trees Graph theory Purpose: Overall goals

In CS, data can be either linear or nonlinear.
Nonlinear data is used to depict relationships or a hierarchy.
A graph is a general way to model nonlinear data.
Overall goals:
  Definition, examples, properties
  Representation (will look familiar to you!)
  Special types of graphs
  Ways to manipulate and analyze graphs

Graph Not like the graphs you see in calculus.
These graphs are more like “connect the dots.”
Many applications in CS:
  Finite automata and other computer models
  Networking
  Compiler design: finding loops, checking syntax, tracking variable use
  Data structures and algorithms: sometimes the things we want to manipulate are relationships among data, instead of numbers
You have seen graphs when evaluating Fibonacci numbers, which creates a “tree” of recursive calls.

Graph A nonlinear data structure
Useful to model any kind of network, or set of relationships.
Questions we may want to ask:
  How many vertices / edges are there?
  Does an edge exist from x to y?
  How far apart are x and y?
  How many edges are incident on x? (i.e. find the degree)
  How many nodes are within some distance from x?
  Is y reachable from x?
  Is there a systematic way to visit every node and return back to the beginning?

Definition & examples
A graph has 2 sets: vertices and edges.
The purpose of an edge is to “connect” two vertices to make them adjacent to each other.
Three example graphs:
  Graph 1. Vertices: A, B, C. Edges: AB, BC, AC
  Graph 2. Vertices: A, B, C, D. Edges: AB, AD
  Graph 3. Vertices: A, B, C, D, E, F. Edges: AB, BC, CD, DE, EF, FA, AD, BE, CF

Representations
Adjacency list
  For each vertex in the graph, we maintain a list (e.g. linked list or array list) of the other vertices that are directly connected to it.
Adjacency matrix
  A 2-d array.
  The vertices are in some order, such as alphabetical order (A, B, C, …).
  The entry at a given row and column indicates whether there is an edge between those two vertices (1) or not (0).
  Elegantly handles weighted and directed graphs.

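Internal rep’n
Inside the computer, a graph is usually represented as an adjacency matrix.
As a check, the matrix of an undirected graph should be symmetric.
(Figure: adjacency matrix for a graph on vertices A to H, with 1 marking each edge. Blank entries are really 0.)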
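A minimal sketch of both representations, built for the six-vertex example graph (edges AB, BC, CD, DE, EF, FA, AD, BE, CF). The helper names `is_symmetric` and `degree` are illustrative and do not come from the slides.

```python
# Six-vertex example graph from the definition slide: 6 vertices, 9 edges.
vertices = ["A", "B", "C", "D", "E", "F"]
edges = ["AB", "BC", "CD", "DE", "EF", "FA", "AD", "BE", "CF"]

# Adjacency list: each vertex maps to the vertices it is directly connected to.
adj_list = {v: [] for v in vertices}
for u, v in edges:
    adj_list[u].append(v)
    adj_list[v].append(u)  # undirected, so record both directions

# Adjacency matrix: rows and columns follow the order of `vertices`.
index = {v: i for i, v in enumerate(vertices)}
n = len(vertices)
adj_matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    adj_matrix[index[u]][index[v]] = 1
    adj_matrix[index[v]][index[u]] = 1


def is_symmetric(m):
    """Check that entry (i, j) always equals entry (j, i)."""
    return all(m[i][j] == m[j][i] for i in range(len(m)) for j in range(len(m)))


def degree(v):
    """Number of edges incident to v, read off the adjacency list."""
    return len(adj_list[v])
```

The list answers "who are my neighbors?" quickly, while the matrix answers "is there an edge from x to y?" with a single lookup.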

Degree sequence Degree of vertex = # of edges incident to it
Degree sequence: list of all degrees.
For example, the 2nd example graph (vertices A, B, C, D; edges AB, AD): 0, 1, 1, 2
Tells a lot about a graph (though not everything).
# edges in graph = ½ (sum of degrees)
Are these possible degree sequences?
  2, 3, 3, 4, 4, 5
  2, 3, 4, 4, 5
  5, 4, 3, 3, 3, 2
  4, 3, 3, 2, 2
  1, 1, 3, 3, 5, 5
Try this game: A gives a degree sequence; B tries to produce such a graph.
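One way to check a proposed degree sequence by machine is the Havel–Hakimi procedure. The slides do not name it; it is brought in here as an assumption. The sketch only covers simple graphs (no loops, no parallel edges), and the function name `is_graphical` is illustrative.

```python
def is_graphical(degrees):
    """Return True if some simple graph has this degree sequence (Havel-Hakimi)."""
    seq = sorted(degrees, reverse=True)
    if sum(seq) % 2 == 1:      # sum of degrees = 2 * (number of edges), so it must be even
        return False
    while seq and seq[0] > 0:
        d = seq.pop(0)         # take the vertex with the largest remaining degree
        if d > len(seq):       # not enough other vertices to connect to
            return False
        for i in range(d):     # connect it to the d next-largest vertices
            seq[i] -= 1
            if seq[i] < 0:
                return False
        seq.sort(reverse=True)
    return True
```

For instance, `is_graphical([0, 1, 1, 2])` accepts the sequence of the 2nd example graph, and any sequence with an odd sum is rejected immediately by the ½ (sum of degrees) rule.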

Properties of graphs Graph properties Path Cycle Bipartiteness
Isomorphic to another graph Pseudograph, multigraph, subgraph Path Cycle Hamiltonian Euler

Bipartite Property that a graph may have
Useful to find a variable’s usage in a program.
Bipartite = it is possible to partition the set of vertices into 2 subsets, such that no two vertices within the same subset are adjacent.
As a consequence, you won’t see triangles anywhere.
To determine: try to partition the vertices. Adjacent vertices must go to different camps.
Are these bipartite?
Also: tripartite, n-partite.
Can you draw a graph that is NOT tripartite?
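The "adjacent vertices must go to different camps" test can be sketched as a breadth-first two-coloring. The graphs `triangle` and `square` and the function name `two_color` are made up for illustration.

```python
from collections import deque


def two_color(adj):
    """Try to split the vertices into camps 0 and 1; return the camps, or None if impossible."""
    camp = {}
    for start in adj:                      # also handles disconnected graphs
        if start in camp:
            continue
        camp[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in camp:
                    camp[v] = 1 - camp[u]  # a neighbor goes to the other camp
                    queue.append(v)
                elif camp[v] == camp[u]:   # two adjacent vertices in the same camp
                    return None
    return camp


# Hypothetical test graphs, given as adjacency lists.
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
square = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
```

The triangle fails, which matches the remark that a bipartite graph cannot contain a triangle; the square splits into the camps {A, C} and {B, D}.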

Isomorphism Meaning: same shape
How can we tell if 2 graphs are essentially the same?
Definition: the vertices can be put into a 1-1 correspondence that preserves adjacency. (Then both graphs have the same adjacency matrix, once the vertices are listed in corresponding order.)
Easier to disprove when not isomorphic. Checklist:
  Same # vertices and same # edges
  Same degree sequence
  Connectedness (both are, or both are not)
  Existence of cycles
  Adjacency of conspicuous vertices
  Consider the complement if e > n(n – 1) / 4

Examples Isomorphic since A, B, C, D correspond to W, Z, Y, X
Check out adjacency of degree-2 vertices!
Question: can a graph be isomorphic to its complement?

How many graphs…
How many nonisomorphic graphs have 4 vertices and 3 edges?
For this question, let’s expand the definition of “graph” to include:
  Loop(s) on a single vertex: pseudograph
  Multiple edges between the same pair of vertices: multigraph
I think the answer is 20. Can you draw all of them?
  3 loops (3); 2 loops (5); 1 loop (4)
  Parallel (3); parallel and loop (2)
  Simple (3)

Subgraph Analogous to a subset
To obtain a subgraph from an existing graph, feel free to remove edges/vertices. But you can’t remove a vertex if some edge needs it!
What are the subgraphs of a triangle? It’s convenient to classify by the number of edges.
  e = 0: You only have vertices. 8 subsets of 3 elements.
  e = 1: Choose which vertex is not connected. For each case, you can even remove that vertex.
  e = 2: Just choose which edge not to draw.
  e = 3: The entire graph itself.

Paths and Cycles
A path is simply a sequence of edges that allows us to “travel” from one vertex to another.
Formally, each edge’s first vertex must match the second vertex of the previous edge. And analogously, each edge’s second vertex must match the next edge’s first vertex. In other words, you can’t just list the edges in random order. (Should not repeat vertices or edges along the way!)
Cycle = a path where first vertex = last vertex.
There are 2 interesting types of cycles:
  Hamiltonian: passes thru every vertex exactly once
  Euler: includes every edge exactly once

Hamiltonian Can refer to a path or a cycle that…
goes thru every vertex exactly once.
And, in the case of a cycle, returns home.
When does a graph contain one?
  There needs to be a subgraph where all vertices have degree 2, such as a polygon.
  The trick is to remove edges we don’t really need, and recognize edges that are essential.
  Essential edges → we may be forced to visit a vertex twice.
  Removing unnecessary edges → graph may become disconnected.
*** Examples in book

Euler Can refer to a path or a cycle that…
Goes thru every edge exactly once.
Euler said it was impossible to cross all 7 bridges of Königsberg and return to the same point.
Key: for an Euler cycle, the graph must be connected and every vertex must have even degree.
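A small sketch of the even-degree test, applied to the Königsberg bridges. The land masses are labelled A to D here, which is an assumption (A is the island touched by 5 bridges). Both functions assume the graph is already connected. The path rule, at most 2 vertices of odd degree, is the standard companion fact and is not stated on the slide.

```python
from collections import Counter


def degrees(edges):
    """Count how many edge endpoints land on each vertex (parallel edges count separately)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def has_euler_cycle(edges):
    """Even-degree test for an Euler cycle; assumes the graph is connected."""
    return all(d % 2 == 0 for d in degrees(edges).values())


def has_euler_path(edges):
    """An Euler path exists when 0 or 2 vertices have odd degree; assumes connectivity."""
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    return odd in (0, 2)


# The 7 bridges as a multigraph on 4 land masses (labels are hypothetical).
konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]
triangle = [("A", "B"), ("B", "C"), ("C", "A")]
```

All four land masses have odd degree (5, 3, 3, 3), so neither a cycle nor a path can use every bridge exactly once.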

Tree A Tree is a special kind of graph Definition Why they are used
Huffman code Binary search tree Creating a tree using BFS or DFS Mathematical expressions

Tree Connected acyclic graph If n vertices, then n – 1 edges
Vertices partitioned into 2 types:
  Internal
  External (leaf)
Rooted tree: one specific vertex identified as special.
Otherwise it’s called a “free” tree.
Important terminology for rooted trees:
  Parent, child, sibling, ancestor, descendant, (uncle, niece)

Some tree applications
Any hierarchical classification system
Structure of a document
File system
A method for compressing data: Huffman code
Efficient data structure: Binary search tree
Visiting vertices of a graph systematically
Mathematical expression
Computer program / Call graph / Find loops
Depicting relationships among data

Huffman code example
Suppose you want to send a message, and you know the only letters you need are A, D, E, L, N, P, S.
A Huffman code might look like this table:
  Letter  Code
  A       001
  D       100
  E       01
  L       101
  N       0001
  P       0000
  S       11
How would you decode this message?
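Decoding with the table above can be sketched as follows. The message on the slide did not survive extraction, so the word "SEND" below is a made-up example, and the names `encode` and `decode` are illustrative.

```python
# Code table copied from the slide.
code = {"A": "001", "D": "100", "E": "01", "L": "101",
        "N": "0001", "P": "0000", "S": "11"}


def encode(text):
    """Concatenate the code of each letter."""
    return "".join(code[ch] for ch in text)


def decode(bits):
    """Read bits left to right; emit a letter as soon as the bits read so far match a code."""
    lookup = {v: k for k, v in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in lookup:  # no code is a prefix of another, so the first match is right
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("leftover bits: " + current)
    return "".join(out)
```

For example, `encode("SEND")` gives `11010001100`, and `decode` turns that bit string back into `SEND`. No separators are needed between letters because no code is the beginning of another code.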

How to create code
We’re given the set of letters used for the message, and their frequencies.
  Ex. A = 5, B = 10, C = 20, D = 25, E = 30
  Ex. P = 5, N = 10, D = 10, L = 15, A = 20, S = 20, E = 30
It’s convenient to arrange the frequencies in order.
Group the letters in pairs, always looking for the smallest sum of frequencies.
The resulting structure is a “tree”.
Each left arm = “0” in the code; each right arm is a “1”.
When done, let’s compute average # bits per symbol.
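The pairing step can be sketched with a priority queue. This version only tracks how long each letter's code becomes, which is enough to compute the average # bits per symbol; the exact 0/1 patterns depend on how ties are broken. The function names are illustrative.

```python
import heapq
from itertools import count


def huffman_lengths(freq):
    """Return {symbol: code length}, repeatedly merging the two groups with the smallest sums."""
    tiebreak = count()  # unique counter so equal frequencies never compare the lists
    heap = [(f, next(tiebreak), [s]) for s, f in freq.items()]
    heapq.heapify(heap)
    length = {s: 0 for s in freq}
    while len(heap) > 1:
        f1, _, group1 = heapq.heappop(heap)
        f2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:  # every symbol in the merged group moves one level deeper
            length[s] += 1
        heapq.heappush(heap, (f1 + f2, next(tiebreak), group1 + group2))
    return length


def average_bits(freq):
    """Weighted average code length: sum(frequency * length) / sum(frequency)."""
    length = huffman_lengths(freq)
    return sum(freq[s] * length[s] for s in freq) / sum(freq.values())


first = {"A": 5, "B": 10, "C": 20, "D": 25, "E": 30}
second = {"P": 5, "N": 10, "D": 10, "L": 15, "A": 20, "S": 20, "E": 30}
```

For the first example the merges are 5+10, 15+20, 25+30, 35+55, so A and B get 3 bits and C, D, E get 2 bits, for an average of 195/90 ≈ 2.17 bits per symbol.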

Fun with trees Binary search tree Tree traversals
Breadth first search Depth first search Trees to model expressions Traversals on binary trees Inorder Preorder Postorder

Binary search tree
Each vertex in the tree has a key value. Can be number or text (ASCII code).
For each vertex: left child ≤ you ≤ right child
Or better yet: all elements in L subtree ≤ you ≤ all elements in R subtree
How do we …?
  Find a value that may be in the tree
  Insert a value
  Find the highest/lowest key values
  Find the range of values that can go in a vacant child location

Traversing a tree 2 basic strategies Breadth-first search (BFS)
  Start at the root (top) of the tree.
  Fan out in all directions simultaneously.
  Good if you think what you’re looking for is near the top.
Depth-first search (DFS)
  Start at the root, as usual.
  Go as far as you can down one path of the tree.
  If not found, back up and try another path.
  Good if you have an idea what area to search first.

Example Suppose we’re looking for file at node 9.
We visit the nodes of the tree in the following order.
BFS: 1, 2, 3, 4, 5, 6, 7, 8, 9
DFS: 1, 2, 4 (back up), 5 (back up), 3, 6, 7, 9
(Figure: tree with nodes 1 to 11, rooted at node 1.)
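Both strategies can be sketched with a queue (BFS) and a stack (DFS). The tree drawing did not survive, so the `tree` dictionary below is an assumed shape, chosen only because it reproduces the two visiting orders listed above.

```python
from collections import deque

# Assumed tree: each node maps to its children, left to right.
tree = {1: [2, 3], 2: [4, 5], 3: [6], 4: [], 5: [], 6: [7, 8],
        7: [9, 10], 8: [11], 9: [], 10: [], 11: []}


def bfs(root, target):
    """Visit level by level, using a first-in first-out queue."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        if node == target:
            break
        queue.extend(tree[node])
    return order


def dfs(root, target):
    """Follow one path as deep as possible, using a last-in first-out stack."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        if node == target:
            break
        stack.extend(reversed(tree[node]))  # reversed so the leftmost child is popped first
    return order
```

The only difference between the two functions is which end of the waiting list the next node is taken from.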

Expression as tree Arithmetic expression is inherently hierarchical
We also have linear/text representations: infix, prefix, postfix.
Note: prefix and postfix do not need grouping symbols.
Example: turn (25 – 5) * (6 + 7) into a tree.
  Which is the last operator performed? This is the root. And we can deduce where the left and right subtrees are.
  For (25 – 5) * (6 + 7), the last op is the *, so this is the root; its left subtree is (25 – 5) and its right subtree is (6 + 7).
  Next, repeat for each subtree: find its last op, which becomes the “root” of that subtree.
Note: Numbers are leaves; operators are internal. This is why the tree drawing is straightforward.
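The example expression can be written as a small tree and read back in all three notations. The class name `Op` and the function names are illustrative, and only the four basic operators are handled in this sketch.

```python
class Op:
    """Internal node: an operator with a left and a right subtree. Leaves are plain numbers."""
    def __init__(self, symbol, left, right):
        self.symbol, self.left, self.right = symbol, left, right


def prefix(t):
    if not isinstance(t, Op):
        return str(t)
    return f"{t.symbol} {prefix(t.left)} {prefix(t.right)}"


def postfix(t):
    if not isinstance(t, Op):
        return str(t)
    return f"{postfix(t.left)} {postfix(t.right)} {t.symbol}"


def infix(t):
    if not isinstance(t, Op):
        return str(t)
    return f"({infix(t.left)} {t.symbol} {infix(t.right)})"  # infix needs grouping symbols


def evaluate(t):
    if not isinstance(t, Op):
        return t
    a, b = evaluate(t.left), evaluate(t.right)
    return {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[t.symbol]


# (25 - 5) * (6 + 7): the last operator performed, *, is the root.
tree = Op("*", Op("-", 25, 5), Op("+", 6, 7))
```

Here `prefix(tree)` is `* - 25 5 + 6 7` and `postfix(tree)` is `25 5 - 6 7 + *`, neither of which needs parentheses, while `evaluate(tree)` gives 260.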

Tree & traversal Given a (binary) tree, we can find its traversals. √
How about the other way?
A mathematical expression had enough context information that 1 traversal would be enough. But in general, we need 2 traversals, one of them being inorder.
Example: Draw the binary tree having these traversals.
  Postorder: S C X H R J Q T
  Inorder: S R C H X T J Q
Hint: The end of the postorder is the root of the tree. Find where the root lies in the inorder. This will show you the 2 subtrees. Continue with each subtree, finding its root and subtrees, etc.
Exercise: Find 2 distinct binary trees t1 and t2 where preorder(t1) = preorder(t2) and postorder(t1) = postorder(t2).
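The hint translates almost directly into a recursive function. This sketch assumes all keys are distinct and represents a tree as a nested tuple `(root, left, right)`; both choices, and the function names, are illustrative.

```python
def build(postorder, inorder):
    """Rebuild a binary tree from its postorder and inorder traversals (distinct keys assumed)."""
    if not postorder:
        return None
    root = postorder[-1]     # the end of the postorder is the root
    k = inorder.index(root)  # everything before position k in the inorder is the left subtree
    left = build(postorder[:k], inorder[:k])
    right = build(postorder[k:-1], inorder[k + 1:])
    return (root, left, right)


def preorder(tree):
    """Root first, then the left subtree, then the right subtree."""
    if tree is None:
        return []
    root, left, right = tree
    return [root] + preorder(left) + preorder(right)


post = "S C X H R J Q T".split()
ino = "S R C H X T J Q".split()
tree = build(post, ino)
```

Because the left subtree has `k` nodes, the first `k` entries of the postorder belong to it and the rest (minus the root) belong to the right subtree. Calling `preorder(tree)` is a convenient way to check a hand-drawn answer.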

Other graph types Directed graphs Weighted graph
Application: finding a loop in code Weighted graph Finding the shortest path Finding the cheapest network Practice: expr trees->infix,prefix,postfix

Directed graph Also called digraphs Each edge has a direction
Adjacency matrix rep’n: not necessarily symmetric.
Used by a compiler to represent control flow, or for analyzing relations.
How do you find a loop?

Where are the loops? Not hard for us
But control-flow data just has a sequence of blocks.
Need to gather info about transitions between blocks.
(Figure: control-flow graph with blocks 1 to 7.)

Finding loops For each block, determine Successors
  Where can I go immediately after this block?
Predecessors
  Where could I have just come from?
Dominators
  Where must I have been, to reach here?

Example
  block  pred  succ  dom
  1      -
  2      1,6
  3            4,6
  4            3,5
  5
  6            2,7
  7

Example
  block  pred  succ  dom
  1      -
  2      1,6         1,2
  3            4,6   1,2,3
  4            3,5   1,2,3,4
  5                  1,2,3,4,5
  6            2,7   1,2,3,6
  7                  1,2,3,6,7

Aha! A loop We have a loop whenever a block can say: “One of my successors is also one of my dominators.” In other words, I’m going to a place I’ve already been. Hence a back edge, and a loop.
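The rule "one of my successors is also one of my dominators" can be sketched as code. The successor table below is an assumption: the entries for blocks 3, 4 and 6 come from the example, the edges 1→2 and 2→3 follow from the dominator sets, and 5→6 is a guess that happens to reproduce the dominators listed earlier. The iterative set-intersection method is one standard way to compute dominators and is not necessarily what the slides used.

```python
# Assumed control-flow graph: block -> successors (5 -> 6 is a guess, see above).
succ = {1: [2], 2: [3], 3: [4, 6], 4: [3, 5], 5: [6], 6: [2, 7], 7: []}


def predecessors(succ):
    """Invert the successor table."""
    pred = {b: [] for b in succ}
    for b, targets in succ.items():
        for t in targets:
            pred[t].append(b)
    return pred


def dominators(succ, entry=1):
    """dom[b] = blocks that lie on every path from the entry to b (b itself included).

    Assumes every block other than the entry has at least one predecessor.
    """
    pred = predecessors(succ)
    dom = {b: set(succ) for b in succ}  # start with "everything", then shrink
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in succ:
            if b == entry:
                continue
            new = set.intersection(*(dom[p] for p in pred[b])) | {b}
            if new != dom[b]:
                dom[b], changed = new, True
    return dom


def back_edges(succ, entry=1):
    """Edges b -> s where s dominates b: each one closes a loop."""
    dom = dominators(succ, entry)
    return [(b, s) for b in succ for s in succ[b] if s in dom[b]]
```

On this graph the back edges are 4→3 and 6→2, so there is an inner loop headed by block 3 and an outer loop headed by block 2.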

Weighted graph
Each edge is labelled with a number, implying some distance or cost.
The adjacency matrix stores these values.
How do we represent that 2 vertices are not adjacent?
Two big questions for a weighted graph:
  Cheapest path to go from one vertex to another: this is solved by Dijkstra’s algorithm.
  Cheapest “network”, i.e. spanning tree.
BFS & DFS don’t care about the weight of edges, so we need a different approach.

Adjacency matrix
The values inside the adjacency matrix have a different meaning if the graph is weighted vs. unweighted. Here is what the numbers mean:
  Situation              Unweighted   Weighted
  Vertices adjacent      1            Non-zero number
  Vertices not adjacent  0            Infinity
  Vertex itself          Not a special case

Dijkstra’s algorithm How do you find the shortest path in a network?
General case solved by Edsger Dijkstra, 1959.
(Figure: weighted graph with a number on each edge.)

Let’s say we want to go from “A” to “Z”.
The idea is to label each vertex with a number: its best known distance from A. As we work, we may find a cheaper distance, until we “mark” or finalize the vertex.
1. Label A with 0, and mark A.
2. Label A’s neighbors with their distances from A.
3. Find the lowest unmarked vertex and mark it. Let’s call this vertex “B”.
4. Recalculate distances for B’s neighbors via B. Some of these neighbors may now have a shorter known distance.
5. Repeat steps 3 and 4 until you mark Z.
(Figure: graph with edges A–B = 4, A–C = 7, B–C = 2, B–Z = 3, C–Z = 4.)

First, we label A with 0. Mark A as final.
The neighbors of A are B and C. Label B = 4 and C = 7.
Now, the unmarked vertices are B=4 and C=7. The lowest of these is B.
Mark B, and recalculate B’s neighbors via B. The neighbors of B are C and Z.
  If we go to C via B, the total distance is 4+2 = 6. This is better than the old distance of 7. So re-label C = 6.
  If we go to Z via B, the total distance is 4+3 = 7.

Now, the unmarked vertices are C=6 and Z=7. The lowest of these is C.
Mark C, and recalculate C’s neighbors via C. The only unmarked neighbor of C is Z.
  If we go to Z via C, the total distance is 6+4 = 10. This is worse than the current distance to Z, so Z’s label is unchanged.
The only unmarked vertex now is Z, so we mark it and we are done. Its label is the shortest distance from A.

Postscript. I want to clarify something…
The idea is to label each vertex with a number: its best known distance from A. As we work, we may find a cheaper distance, until we “mark” or finalize the vertex.
When you mark a vertex and look to recalculate distances to its neighbors:
  We don’t need to recalculate the distance for a vertex if it is marked. So, only consider unmarked neighbors.
  We only update a vertex’s distance if it is an improvement: if it’s shorter than what we previously had.
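The labelling procedure can be sketched on the walkthrough graph (A–B = 4, A–C = 7, B–C = 2, B–Z = 3, C–Z = 4). One swapped-in detail: instead of scanning for the lowest unmarked vertex, this version keeps candidate labels in a priority queue, a common implementation choice that the slides do not mention. It also runs until every vertex is marked rather than stopping at Z.

```python
import heapq

# Walkthrough graph: vertex -> {neighbor: edge weight}.
graph = {
    "A": {"B": 4, "C": 7},
    "B": {"A": 4, "C": 2, "Z": 3},
    "C": {"A": 7, "B": 2, "Z": 4},
    "Z": {"B": 3, "C": 4},
}


def dijkstra(graph, source):
    """Return {vertex: shortest distance from source}."""
    dist = {source: 0}
    marked = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)   # lowest label still waiting in the queue
        if u in marked:
            continue                 # stale entry: u was already finalized with a better label
        marked.add(u)
        for v, w in graph[u].items():
            if v in marked:
                continue             # only consider unmarked neighbors
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w      # update only if it is an improvement
                heapq.heappush(heap, (d + w, v))
    return dist
```

`dijkstra(graph, "A")` labels B = 4, C = 6 (via B) and Z = 7, matching the hand computation.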

Graph applications Shortest paths: Cheapest network
Practice Dijkstra’s algorithm Traveling salesman problem Cheapest network a.k.a. “minimum spanning tree” Kruskal’s algorithm Prim’s algorithm

Shortest Paths Dijkstra’s algorithm:
What is the shortest distance between 2 points in a network/graph?
A related problem: What is the shortest distance for me to visit all the points in the graph and return home? This is called the traveling salesman problem.
Nobody knows an efficient way to solve this problem exactly! Every known exact method, like an exhaustive search, takes time that grows exponentially with the number of points in the worst case.
Open question in CS: why is this problem so hard?

(Figure: weighted graph on vertices A, B, C, D, E.)
For the traveling salesman problem, consider this: does it matter where you start?

Min spanning tree MST also known as the shortest network problem
Want to connect all vertices with minimum total length of edges.
Applications:
  Sources of oil need to be connected to pipelines. Want to minimize total mileage.
  Private telecom networks are billed according to total mileage of the network. Client should not have to pay for phone company’s inefficiency.
Some algorithms:
  O. Borůvka (1926): published a paper in Czech in an obscure journal (first known solution)
  J. B. Kruskal (1956): AT&T Bell Labs
  R. C. Prim (1957): also from Bell Labs

How to make one Kruskal’s algorithm Prim’s algorithm
Kruskal’s algorithm
  For n vertices, we want the n – 1 cheapest edges that do not form a cycle.
  Repeatedly add edges from low to high, skipping any edge that would create a cycle, until you have added n – 1 edges.
Prim’s algorithm
  Start with any vertex.
  Tree grows during algorithm…
  Add the cheapest edge that brings a new vertex into the tree.
In both cases, make sure you never create a cycle!
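Both algorithms can be sketched in a few lines. The test graph reuses the edge weights from the Dijkstra walkthrough, which is a choice made here for illustration. In Kruskal's version the cycle check uses a union-find structure, and Prim's version uses a priority queue; neither detail is specified on the slides.

```python
import heapq

vertices = ["A", "B", "C", "Z"]
edges = [("A", "B", 4), ("A", "C", 7), ("B", "C", 2), ("B", "Z", 3), ("C", "Z", 4)]


def kruskal(vertices, edges):
    """Add edges from low to high, skipping any edge that would close a cycle."""
    parent = {v: v for v in vertices}

    def find(v):  # representative of v's component
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:          # different components, so this edge cannot create a cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


def prim(vertices, edges, start):
    """Grow one tree from `start`, always adding the cheapest edge that reaches a new vertex."""
    adj = {v: [] for v in vertices}
    for u, v, w in edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))
    in_tree = {start}
    heap = list(adj[start])
    heapq.heapify(heap)
    tree = []
    while heap and len(in_tree) < len(vertices):
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue          # both ends already in the tree: adding it would create a cycle
        in_tree.add(v)
        tree.append((u, v, w))
        for e in adj[v]:
            heapq.heappush(heap, e)
    return tree
```

On this graph both functions pick the edges B–C (2), B–Z (3) and A–B (4), for a total length of 9 using n – 1 = 3 edges.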
