Introduction to Graphs
15-111 Advanced Programming Concepts/Data Structures
Ananda Gunawardena
1/16/2016


2 An Airline Route Map


4 Introduction
  Many real-world problems can be modeled using graphs
  –Airline route map
     What is the fastest way to get from Pittsburgh to St. Louis?
     What is the cheapest way to get from Pittsburgh to St. Louis?
  –Electric circuits
     Circuit elements: transistors, resistors, capacitors
     Is everything connected together?
       –Depends on interconnections (wires)
     If this circuit is built, will it work?
       –Depends on the wires and the objects they connect

5 Graphs
  More applications
  –Job scheduling
     Interconnections indicate which jobs must be performed before others
     When should each task be performed?
  All these questions can be answered using a mathematical structure called a "graph".
  We will answer the questions
  –What are graphs?
  –What are their basic properties?

6 Graph Definitions
  Graph
  –A set of vertices (nodes) V = {v_1, v_2, …, v_n}
  –A set of edges (arcs) that connect the vertices, E = {e_1, e_2, …, e_m}
  –Each edge e_i is a pair (v, w) where v, w in V
  –|V| = number of vertices (cardinality)
  –|E| = number of edges
  Graphs can be
  –Directed (order of (v, w) matters)
  –Undirected (order of (v, w) doesn't matter)
  Edges can be
  –Weighted (cost associated with the edge)
  –e.g., neural network, airline route map (Vanguard Airlines)

7 Graph Representation
  How do we represent a graph internally?
  Two ways
  –Adjacency matrix
  –Adjacency list
  Adjacency matrix
  –Use matrix entries to represent edges in the graph
  Adjacency list
  –Use an array of lists to represent edges in the graph (we will discuss this later)

8 Adjacency Matrix
  –For each edge (v, w) in E, set A[v][w] = edge_cost
  –Mark nonexistent edges with a logical infinity
  Cost of implementation
  –O(|V|^2) time for initialization
  –O(|V|^2) space
     OK for dense graphs
     Unacceptable for sparse graphs
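
The matrix rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the `directed` flag, and the 3-vertex example graph are assumptions, not part of the slides.

```python
import math

INF = math.inf  # the "logical infinity" that marks a nonexistent edge


def build_adjacency_matrix(num_vertices, edges, directed=True):
    """Return a |V| x |V| matrix A where A[v][w] is the cost of edge (v, w)."""
    # Initialization alone costs O(|V|^2) time and space.
    matrix = [[INF] * num_vertices for _ in range(num_vertices)]
    for v, w, cost in edges:
        matrix[v][w] = cost
        if not directed:
            matrix[w][v] = cost  # undirected: order of (v, w) does not matter
    return matrix


# Hypothetical weighted, directed graph on vertices 0, 1, 2.
A = build_adjacency_matrix(3, [(0, 1, 5), (1, 2, 2)])
print(A[0][1])  # cost of edge (0, 1)
print(A[1][0])  # inf: there is no edge (1, 0)
```

With only two edges, seven of the nine entries hold infinity, which is the wasted space that makes the matrix a poor fit for sparse graphs.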

9 Adjacency List
  –Ideal solution for sparse graphs
  –For each vertex, keep a list of all adjacent vertices
  –Adjacent vertices are the vertices connected to the vertex directly by an edge
  –Example: [figure: adjacency lists List 0, List 1 and List 2]

10 Adjacency List
  The number of list nodes equals the number of edges (twice that for an undirected graph, since each edge appears in two lists)
  –O(|E|) space
  Space is also required to store the lists
  –O(|V|) for |V| lists
  Note that the number of edges is at least ⌈|V|/2⌉
  –Assuming each vertex is in some edge
  –Therefore disregard any O(|V|) term when O(|E|) is present
  An adjacency list can be constructed in linear time (with respect to the number of edges)
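
A minimal Python sketch of the "array of lists" idea follows. The function name, the `directed` flag, and the triangle example are assumptions made for illustration.

```python
def build_adjacency_list(num_vertices, edges, directed=False):
    """Return a list of lists: adj[v] holds every vertex adjacent to v."""
    adj = [[] for _ in range(num_vertices)]  # O(|V|) space for the |V| lists
    for v, w in edges:                       # one pass over the edges: O(|E|)
        adj[v].append(w)
        if not directed:
            adj[w].append(v)                 # an undirected edge appears in two lists
    return adj


# Hypothetical undirected triangle on vertices 0, 1, 2.
adj = build_adjacency_list(3, [(0, 1), (0, 2), (1, 2)])
print(adj)  # [[1, 2], [0, 2], [0, 1]]
```

Construction touches each edge once, which is where the linear-time claim comes from.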

11 Breadth First Traversal
  Algorithm
  –Start from any node in the graph
  –Traverse its neighbors (nodes that are directly connected to it), in an order chosen by some heuristic
  –Next traverse the neighbors of the neighbors, and so on, until some limit is reached or all the nodes in the graph are visited
  –Use a queue to perform the breadth first traversal
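
One possible queue-based version of these steps, assuming the graph is stored as an adjacency list (a list of neighbor lists). The function name and the 5-vertex example graph are hypothetical, and neighbors are simply visited in list order rather than by any particular heuristic.

```python
from collections import deque


def breadth_first(adj, start):
    """Return the vertices reachable from start in breadth-first order."""
    visited = [False] * len(adj)
    visited[start] = True
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()          # oldest discovered vertex comes out first
        order.append(v)
        for w in adj[v]:
            if not visited[w]:       # mark on discovery so nothing is queued twice
                visited[w] = True
                queue.append(w)
    return order


# Hypothetical undirected graph: 0-1, 0-2, 1-3, 2-3, 3-4.
graph = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3]]
print(breadth_first(graph, 0))  # [0, 1, 2, 3, 4]
```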

12 Depth First Traversal
  Algorithm
  –Start from any node in the graph
  –Traverse deeper and deeper until a dead end is reached
  –Backtrack and traverse other nodes that have not been visited
  –Use a stack to perform the depth first traversal
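
The same idea with an explicit stack, again as a sketch over an adjacency list. The function name and example graph are assumptions; a recursive version would use the call stack instead.

```python
def depth_first(adj, start):
    """Return the vertices reachable from start in depth-first order."""
    visited = [False] * len(adj)
    order = []
    stack = [start]
    while stack:
        v = stack.pop()              # most recently discovered vertex comes out first
        if visited[v]:
            continue                 # a vertex may be pushed more than once
        visited[v] = True
        order.append(v)
        # Push in reverse so the first-listed neighbor is explored first.
        for w in reversed(adj[v]):
            if not visited[w]:
                stack.append(w)
    return order


# Hypothetical undirected graph: 0-1, 0-2, 1-3, 2-3, 3-4.
graph = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3]]
print(depth_first(graph, 0))  # [0, 1, 3, 2, 4]
```

Popping from the stack after a dead end is the backtracking step: the traversal resumes from the most recent vertex that still has unvisited neighbors.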

13 Web as a Graph
  [figure: graph with nodes URL 1 through URL 7]

14 Web Algorithms

15 Web Algorithms
  Search
  –Google, MSN, AltaVista
  Image search
  –Games
  Routing
  Distributed computing
  Shortest path algorithms
  –Google Maps, MapQuest
  Semantic Web
  –XML metadata
  Etc.

16 Web Search Engines
  A Cool Application of Graphs

17 Building a Search Engine
  Crawl the web
  Build a web index
  Then, when we build or search, we may have to sort the index
  –Google sorts more than 100 billion index items
  Novel algorithms, novel data structures, distributed computing

18 A Basic Search Engine Architecture

19 Google Architecture

20 Google's Server Farm

21 Web Crawlers
  Start with an initial page P_0. Find the URLs on P_0 and add them to a queue
  When done with P_0, pass it to an indexing program, get a page P_1 from the queue, and repeat
  Can be specialized (e.g., only look for email addresses)
  Issues
  –Which page to look at next? (Special subjects, recency)
  –How deep within a site do you go (depth of search)?
  –How frequently to visit pages?
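
The crawl loop can be sketched without any network access by letting a dictionary stand in for the web. Everything named here (`crawl`, `toy_web`, `max_pages`, the page names) is a hypothetical stand-in; a real crawler would fetch and parse pages and would also have to address the issues listed above.

```python
from collections import deque


def crawl(web, start_url, max_pages=100):
    """Visit pages queue-first, starting at start_url.

    web maps each URL to the list of URLs found on that page; it stands in
    for fetching and parsing. Returns pages in the order they were "indexed".
    """
    queue = deque([start_url])
    seen = {start_url}               # avoid queueing the same URL twice
    indexed = []
    while queue and len(indexed) < max_pages:
        page = queue.popleft()
        for url in web.get(page, []):
            if url not in seen:
                seen.add(url)
                queue.append(url)
        indexed.append(page)         # stand-in for handing the page to an indexer
    return indexed


toy_web = {"P0": ["P1", "P2"], "P1": ["P2", "P3"], "P2": ["P0"], "P3": []}
print(crawl(toy_web, "P0"))  # ['P0', 'P1', 'P2', 'P3']
```

Because the frontier is a queue, this is a breadth first traversal of the link graph; the `max_pages` cap plays the role of the "some limit" mentioned for breadth first traversal.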

22 So, Why Spider the Web?
  Refresh the collection by deleting dead links
  –OK if index is slightly smaller
  –Done every 1-2 weeks in the best engines
  Find new sites
  –Respider the entire web
  –Done every 2-4 weeks in the best engines

23 Cost of Spidering
  A spider can (and does) run in parallel on hundreds of servers
  Very high network connectivity (e.g., T3 line)
  Servers can migrate from spidering to query processing depending on time-of-day load
  Running a full web spider takes days even with hundreds of dedicated servers

24 Indexing
  Arrangement of data (data structure) to permit fast searching
  Which list is easier to search?
     sow fox pig eel yak hen ant cat dog hog
     ant cat dog eel fox hen hog pig sow yak
  Sorting helps. Why?
  –Permits binary search: about log_2 n probes into the list
     log_2(1 billion) ~ 30
  –Permits interpolation search: about log_2(log_2 n) probes
     log_2 log_2(1 billion) ~ 5
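
To make the probe counts concrete, here is a small binary search over the sorted word list from the slide, with a probe counter added for illustration. The function name and the counter are assumptions; the last two lines just check the slide's arithmetic.

```python
import math


def binary_search(sorted_items, target):
    """Return (index of target or -1, number of probes made)."""
    lo, hi = 0, len(sorted_items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1                      # one look into the list
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            lo = mid + 1                 # target can only be in the upper half
        else:
            hi = mid - 1                 # target can only be in the lower half
    return -1, probes


words = sorted("sow fox pig eel yak hen ant cat dog hog".split())
index, probes = binary_search(words, "pig")
print(index, probes)                             # 7 2
print(round(math.log2(10**9)))                   # 30
print(round(math.log2(math.log2(10**9))))        # 5
```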

25 Inverted Files
  A file is a list of words by position
  –First entry is the word in position 1 (first word)
  –Entry 4562 is the word in position 4562 (4562nd word)
  –Last entry is the last word
  An inverted file is a list of positions by word!
  [figure: the FILE with a position ruler (POS 1, 10, 20, 30, 36) beside its INVERTED FILE]
  INVERTED FILE
     a          (1, 4, 40)
     entry      (11, 20, 31)
     file       (2, 38)
     list       (5, 41)
     position   (9, 16, 26)
     positions  (44)
     word       (14, 19, 24, 29, 35, 45)
     words      (7)
     4562       (21, 27)
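
Inverting a file takes one pass over its words. The sketch below assumes words are separated by whitespace, compared case-insensitively, and numbered from position 1 as on the slide; the function name is hypothetical.

```python
from collections import defaultdict


def invert(text):
    """Map each word to the list of positions (1-based) where it occurs."""
    positions = defaultdict(list)
    for pos, word in enumerate(text.lower().split(), start=1):
        positions[word].append(pos)
    return dict(positions)


inverted = invert("A file is a list of words by position")
print(inverted["a"])         # [1, 4]
print(inverted["position"])  # [9]
```

Looking up a word is now a single dictionary access instead of a scan through the whole file.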

26 Inverted Files for Multiple Documents
  [figure: a LEXICON of words pointing into a WORD INDEX whose entries hold DOCID, OCCUR, POS 1, POS 2, ...]
  "jezebel" occurs 6 times in document 34, 3 times in document 44, 4 times in document 56...
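
One way to extend the inverted file to several documents is to keep, for each word, a posting per document holding its positions; the occurrence count is then the length of that list. This is only a sketch: the nested-dictionary layout, the function name, and the two tiny documents are assumptions, not the lexicon and word index layout of any real engine.

```python
from collections import defaultdict


def build_index(documents):
    """Map word -> {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for pos, word in enumerate(text.lower().split(), start=1):
            index[word].setdefault(doc_id, []).append(pos)
    return index


# Hypothetical documents; the ids echo the slide but the text is made up.
docs = {34: "jezebel spoke and jezebel left", 44: "no one saw jezebel"}
index = build_index(docs)
occurrences = {doc_id: len(pos) for doc_id, pos in index["jezebel"].items()}
print(occurrences)          # {34: 2, 44: 1}
print(index["jezebel"][34]) # [1, 4]
```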

27 Ranking (Scoring) Hits
  Hits must be presented in some order
  What order?
  –Relevance, recency, popularity, reliability, alphabetic?
  Some ranking methods
  –Presence of keywords in title of document
  –Closeness of keywords to start of document
  –Frequency of keyword in document
  –Link popularity (how many pages point to this one)
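
Link popularity ties ranking back to graphs: it is the in-degree of a page in the web graph. The sketch below counts in-links and sorts hits by that count alone. The function names, the tie-break by page name, and the four-page link graph are assumptions; real engines combine many signals.

```python
from collections import Counter


def link_popularity(links):
    """Count, for each page, how many other pages point to it."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):      # a linking page counts once
            if target != source:         # ignore self-links
                counts[target] += 1
    return counts


def rank_hits(hits, links):
    """Order hits by descending link popularity, then by name."""
    popularity = link_popularity(links)
    return sorted(hits, key=lambda page: (-popularity[page], page))


# Hypothetical link graph: page -> pages it links to.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c", "a"]}
print(rank_hits(["a", "b", "c", "d"], links))  # ['c', 'a', 'b', 'd']
```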

