Searching in Graphs.

Google: lifetime of a query. All web pages need to be in Google’s index, which covers over 20 billion web pages. New ones are constantly being added. How can Google keep searching for new web pages?

Web Crawlers. First crawler: Web Wanderer from MIT, 1993, built to measure the growth of the web. Well-known crawlers: GoogleBot, MSNBot, Slurp (from Yahoo!), Teoma (from AskJeeves).

Crawler Architecture. [Figure: block diagram with the labels Internet, seed URLs, URL Frontier, Scheduler, Retrievers (DNS, HTTP), Parser, HREFs extractor and normalizer, Duplicate URL Eliminator, Filter, Hosts, Load Monitor, Crawl Metadata, Citations.]

Web Crawler Architecture. High-level structure: start with a set of URLs; repeatedly get web pages and scan them for outlinks. Issues: latency of several seconds per page; DNS lookup delays; duplicate pages; “spider traps” (hyperlinks constructed to trap the crawler); crashing the server due to overload; delays in server response.
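
The high-level loop above can be sketched as a breadth-first traversal over a frontier of URLs. This is a minimal sketch, not a real crawler: the page names in `TOY_WEB`, the `crawl` function, and the `max_pages` cap are all hypothetical, and an in-memory dictionary stands in for fetching pages over HTTP.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the outlinks found on that page.
# A real crawler would fetch the page over HTTP and extract its HREFs instead.
TOY_WEB = {
    "vt.edu": ["vt.edu/academics", "vt.edu/sports"],
    "vt.edu/academics": ["vt.edu/engineering", "vt.edu"],
    "vt.edu/engineering": ["vt.edu/cs", "vt.edu/academics"],
    "vt.edu/cs": ["vt.edu"],
    "vt.edu/sports": [],
}


def crawl(seed_urls, get_outlinks, max_pages=1000):
    """Visit pages reachable from the seeds and return them in visit order."""
    frontier = deque(seed_urls)   # URLs waiting to be fetched
    seen = set(seed_urls)         # duplicate URL elimination
    visited = []
    # max_pages is a crude guard against unbounded crawls (e.g. spider traps).
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_outlinks(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl(["vt.edu"], lambda url: TOY_WEB.get(url, []))
print(order)
```

The `seen` set is what keeps the crawler from fetching the same page twice when several pages link to it.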

Web Crawler Architecture. Example robots.txt: www.vt.edu/robots.txt. Spider traps (dummy links), for example: http://www.troutbums.com/Flyfactory/flyfactory/flyfactory/hatchline/hatchline/flyfactory/hatchline/flyfactory/hatchline/flyfactory/flyfactory/flyfactory/hatchline/flyfactory/hatchline/ Basic web crawl: searching a graph.
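
Both ideas on this slide can be checked before a URL enters the frontier. The sketch below uses Python's standard `urllib.robotparser` for the robots.txt test. The robots.txt content, the `ToyBot` user agent, and the `example.com` URLs are hypothetical. The depth-based trap test is only an illustrative heuristic chosen here, not a standard rule.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download the site's own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_by_robots(robots_text, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


def looks_like_spider_trap(url, max_depth=10):
    """Heuristic (an assumption, not a standard): flag unusually deep paths."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return len(segments) > max_depth


# The spider-trap URL quoted on the slide, joined into one string.
TRAP_URL = (
    "http://www.troutbums.com/Flyfactory/flyfactory/flyfactory/"
    "hatchline/hatchline/flyfactory/hatchline/flyfactory/hatchline/"
    "flyfactory/flyfactory/flyfactory/hatchline/flyfactory/hatchline/"
)

print(allowed_by_robots(ROBOTS_TXT, "ToyBot", "http://www.example.com/index.html"))
print(allowed_by_robots(ROBOTS_TXT, "ToyBot", "http://www.example.com/private/data.html"))
print(looks_like_spider_trap(TRAP_URL))
```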

Graph Theory: Basic Definitions and Applications Section 3.1 of [KT]

Connections between web links. [Figure: linked web pages, including the VT home page, College of Engineering, Academics, Computer Science, and Sports.]

Road Map

Airline routes

Directed Graphs. Directed graph: G = (V, E). V = nodes. E = edges between pairs of nodes. Captures pairwise relationship between objects. Graph size parameters: n = |V|, m = |E|. Maximum number of distinct edges = $O(n^2)$. Edges are asymmetric: edge (1,4) but not (4,1). Example: V = { 1, 2, 3, 4 }, E = { (1,2), (1,3), (1,4), (2,4), (4,2), (4,3) }, n = 4, m = 6.
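
The example graph can be written down directly as a set of ordered pairs. A minimal sketch, assuming no self-loops, so that the maximum edge count is n(n-1), which is $O(n^2)$:

```python
# The directed example from the slide: nodes and ordered (source, target) pairs.
V = {1, 2, 3, 4}
E = {(1, 2), (1, 3), (1, 4), (2, 4), (4, 2), (4, 3)}

n, m = len(V), len(E)
max_edges = n * (n - 1)    # ordered pairs (u, v) with u != v; no self-loops assumed

print(n, m, max_edges)          # 4 6 12
print((1, 4) in E, (4, 1) in E)  # True False: edges are asymmetric
```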

Adjacencies. In(v) = { u : (u,v) is an edge }, Indegree(v) = |In(v)|. Out(v) = { w : (v,w) is an edge }, Outdegree(v) = |Out(v)|. Maximum indegree, outdegree = O(n). [Figure: the four-node example graph, highlighting Outdegree(1) and Indegree(2).]
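
In(v) and Out(v) can be built in one pass over the edge list. The sketch below does this for the four-node directed example. The function name `in_out_sets` is an illustrative choice.

```python
from collections import defaultdict

# Edges of the four-node directed example.
E = [(1, 2), (1, 3), (1, 4), (2, 4), (4, 2), (4, 3)]


def in_out_sets(edges):
    """Return (ins, outs), where ins[v] = In(v) and outs[v] = Out(v)."""
    ins, outs = defaultdict(set), defaultdict(set)
    for u, v in edges:
        outs[u].add(v)   # v is reachable from u by one edge
        ins[v].add(u)    # u has an edge into v
    return ins, outs


ins, outs = in_out_sets(E)
print(sorted(outs[1]), len(outs[1]))  # [2, 3, 4] 3  -> Outdegree(1) = 3
print(sorted(ins[2]), len(ins[2]))    # [1, 4] 2     -> Indegree(2) = 2
```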

Undirected Graphs. Undirected graph: G = (V, E). V = nodes. E = edges between pairs of nodes. Captures symmetric pairwise relationship between objects. Graph size parameters: n = |V|, m = |E|. Example: V = { 1, 2, 3, 4, 5, 6, 7, 8 }, E = { (1,2), (1,3), (2,3), (2,4), (2,5), (3,5), (3,7), (3,8), (4,5), (5,6), (7,8) }, n = 8, m = 11.
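
A common way to store an undirected graph is an adjacency list in which each edge is recorded in both directions. This is a minimal sketch. The edge list includes (7, 8), taken here as an assumption about the slide's figure so that the count agrees with m = 11.

```python
from collections import defaultdict

# Undirected example; (7, 8) is an assumed edge that makes the count match m = 11.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5),
         (3, 7), (3, 8), (4, 5), (5, 6), (7, 8)]


def adjacency_list(edges):
    """Map each node to the set of its neighbours (each edge stored both ways)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


adj = adjacency_list(EDGES)
n, m = len(adj), len(EDGES)
degree_sum = sum(len(neighbours) for neighbours in adj.values())

print(n, m)               # 8 11
print(degree_sum)         # 22, i.e. 2 * m: every edge contributes to two degrees
print(sorted(adj[3]))     # [1, 2, 5, 7, 8]
```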

Some Graph Applications
Graph | Nodes | Edges
transportation | street intersections | highways
communication | computers | fiber optic cables
World Wide Web | web pages | hyperlinks
social | people | relationships
food web | species | predator-prey
software systems | functions | function calls
scheduling | tasks | precedence constraints
circuits | gates | wires

World Wide Web. Web graph: directed graph. Node: web page. Edge: hyperlink from one page to another. [Figure: example pages cnn.com, netscape.com, novell.com, cnnsi.com, timewarner.com, hbo.com, sorpranos.com.]

Ecological Food Web. Food web graph: directed graph. Node = species. Edge = from prey to predator. Reference: http://www.twingroves.district96.k12.il.us/Wetlands/Salamander/SalGraphics/salfoodweb.giff

Road Map. Nodes: intersections. Edges: roads.

Other graphs in the real world. Airline routes: nodes are cities, edges are flights. Yeast protein network: nodes are proteins, edges are interacting pairs.

Other graphs in the real world: sexual interaction network; high school dating network.

Phylogeny Trees. Phylogeny trees describe the evolutionary history of species. Biologists draw their trees from left to right. The phylogeny states that there was an ancestral species that gave rise to mammals and birds, but not to the other species shown in the tree (that is, mammals and birds share a common ancestor that they do not share with other species on the tree), that all animals are descended from an ancestor not shared with mushrooms, trees, and bacteria, and so on.

GUI Containment Hierarchy. Describes the organization of GUI widgets. Reference: http://java.sun.com/docs/books/tutorial/uiswing/overview/anatomy.html

Paths and Connectivity. Def. A path in an undirected graph G = (V, E) is a sequence P of nodes $v_1, v_2, \ldots, v_{k-1}, v_k$ with the property that each consecutive pair $v_i, v_{i+1}$ is joined by an edge in E. Def. A path is simple if all nodes are distinct. Def. An undirected graph is connected if for every pair of nodes u and v, there is a path between u and v.
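
The three definitions translate almost word for word into code. The sketch below uses a few edges from the undirected example, plus a hypothetical extra edge (9, 10) to show a disconnected case. The helper names are illustrative, and connectivity is tested with a breadth-first search from an arbitrary start node.

```python
from collections import deque


def build_adj(edges):
    """Adjacency sets for an undirected graph given as a list of pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_path(adj, nodes):
    """Each consecutive pair of nodes must be joined by an edge."""
    return all(b in adj.get(a, ()) for a, b in zip(nodes, nodes[1:]))


def is_simple(nodes):
    """A path is simple if all of its nodes are distinct."""
    return len(set(nodes)) == len(nodes)


def is_connected(adj):
    """Connected if a search from any one node reaches every node."""
    if not adj:
        return True
    start = next(iter(adj))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)


edges = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (4, 5), (5, 6)]
adj = build_adj(edges)

print(is_path(adj, [1, 2, 4, 5, 6]), is_simple([1, 2, 4, 5, 6]))  # True True
print(is_path(adj, [1, 2, 3, 1, 2]), is_simple([1, 2, 3, 1, 2]))  # True False
print(is_path(adj, [1, 4]))                                       # False
print(is_connected(adj))                                          # True
print(is_connected(build_adj(edges + [(9, 10)])))                 # False
```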

Cycles. Def. A cycle is a path $v_1, v_2, \ldots, v_{k-1}, v_k$ in which $v_1 = v_k$, $k > 2$, and the first $k-1$ nodes are all distinct. Example: cycle C = 1-2-4-5-3-1.
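
The cycle definition can be checked mechanically. The sketch below follows the definition as stated and tests the example cycle against the edges of the undirected example that it uses. The function name `is_cycle` is an illustrative choice.

```python
def is_cycle(adj, nodes):
    """Check the definition: a path with v_1 = v_k, k > 2, first k-1 nodes distinct."""
    k = len(nodes)
    return (
        k > 2
        and nodes[0] == nodes[-1]
        and len(set(nodes[:-1])) == k - 1
        and all(b in adj.get(a, ()) for a, b in zip(nodes, nodes[1:]))
    )


# Adjacency sets for the part of the undirected example that the cycle touches.
adj = {
    1: {2, 3},
    2: {1, 3, 4, 5},
    3: {1, 2, 5},
    4: {2, 5},
    5: {2, 3, 4},
}

print(is_cycle(adj, [1, 2, 4, 5, 3, 1]))  # True
print(is_cycle(adj, [1, 2, 4, 5]))        # False: does not return to the start
print(is_cycle(adj, [1, 2, 3, 2, 1]))     # False: node 2 repeats
```

Read literally, the condition $k > 2$ also admits a back-and-forth walk such as 1-2-1. Stricter treatments require at least three distinct nodes.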

Trees. Def. An undirected graph is a tree if it is connected and does not contain a cycle. Theorem. Let G be an undirected graph on n nodes. Any two of the following statements imply the third: (1) G is connected; (2) G does not contain a cycle; (3) G has n-1 edges.
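
The theorem gives a cheap tree test: rather than searching for cycles, check connectivity and count edges. This is a minimal sketch on a hypothetical five-node graph. The helper names are illustrative.

```python
from collections import deque


def is_connected(nodes, edges):
    """Breadth-first search from the first node; connected if all nodes are reached."""
    if not nodes:
        return True
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, queue = {nodes[0]}, deque([nodes[0]])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(nodes)


def is_tree(nodes, edges):
    """Connected with n-1 edges; by the theorem, this implies there is no cycle."""
    return is_connected(nodes, edges) and len(edges) == len(nodes) - 1


nodes = [1, 2, 3, 4, 5]                       # hypothetical example
tree_edges = [(1, 2), (1, 3), (3, 4), (3, 5)]

print(is_tree(nodes, tree_edges))                             # True
print(is_tree(nodes, tree_edges + [(4, 5)]))                  # False: connected, but n edges
print(is_tree(nodes, [(1, 2), (3, 4), (3, 5), (4, 5)]))       # False: n-1 edges, not connected
```

The last case has n-1 edges yet is not a tree: it is disconnected and contains the cycle 3-4-5-3, which is consistent with the theorem.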

Rooted Trees. Rooted tree: given a tree T, choose a root node r and orient each edge away from r. Importance: models hierarchical structure. By rooting a tree, it is easy to see that it has n-1 edges: exactly one edge leads upward from each non-root node. [Figure: a tree and the same tree rooted at 1, with labels for the root r, a node v, the parent of v, and a child of v.]
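
Rooting a tree amounts to a traversal from r that records, for each node, the neighbour through which it was first reached. The sketch below uses a hypothetical seven-node tree, not the one in the slide's figure, and a hypothetical function name `root_tree`.

```python
from collections import deque


def root_tree(adj, r):
    """Return {node: parent} for the tree rooted at r; the root maps to None."""
    parent = {r: None}
    queue = deque([r])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:    # first time w is reached, so u is its parent
                parent[w] = u
                queue.append(w)
    return parent


# Hypothetical tree given as adjacency lists.
adj = {1: [2, 5, 7], 2: [1, 3, 4], 3: [2], 4: [2], 5: [1, 6], 6: [5], 7: [1]}

parent = root_tree(adj, 1)
upward_edges = [(v, p) for v, p in parent.items() if p is not None]
children_of_2 = sorted(v for v, p in parent.items() if p == 2)

print(parent[3], parent[6])              # 2 5
print(children_of_2)                     # [3, 4]
print(len(upward_edges), len(adj) - 1)   # 6 6: one upward edge per non-root node
```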

Binary Trees. Binary tree: a rooted tree in which every node has either two or zero children. Complete binary tree: all leaf nodes are at the same level. How many nodes does a complete binary tree with k levels have? [Figure: example binary tree on nodes 1 to 5.]
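
One way to approach the closing question is to count level by level. Under the definitions above, and numbering levels from 0 at the root, level i of a complete binary tree holds $2^i$ nodes, so k levels give $2^0 + 2^1 + \cdots + 2^{k-1} = 2^k - 1$ nodes. A minimal sketch that checks the closed form against the level-by-level sum (the function name is an illustrative choice):

```python
def nodes_in_complete_binary_tree(k):
    """Sum the nodes level by level: level i (0-based) holds 2**i nodes."""
    return sum(2 ** i for i in range(k))


for k in range(1, 6):
    # Level-by-level count next to the closed form 2**k - 1.
    print(k, nodes_in_complete_binary_tree(k), 2 ** k - 1)
```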