6CCS3WSN--7CCSMWAL Algorithms for WWW and Social Networks Algorithmic Issues in the WWW Lecture 1.

Lecturer’s Contact Lecturer: Colin Cooper Email: colin.cooper@kcl.ac.uk Office: S6.23 Office hours: Thursday 1:15-3:15pm

Course Information: No coursework, simply a final exam. Course web site: announcements, lecture slides (also on KEATS), exercises, past exams, papers (references).

Full Info at http://www.inf.kcl.ac.uk/staff/ccooper/teachingmaterial/index.html   To access these notes the user name and password are: username: lecturenotes  passwd: PalPdaWal

Brief Syllabus: Structure of large networks such as the WWW. Algorithmic issues related to information retrieval. Main topics: webpage ranking, text processing, information retrieval, clustering data, recommender systems.

Internet and WWW: Internet ≠ World Wide Web (WWW). The Internet is a network of computers, linked by undirected (two-way) data connections. The WWW is a network of web pages, linked by directed hyperlinks.

Characteristics of WWW. DYNAMIC (Cho & Garcia-Molina): 40% of all web pages changed weekly; 23% of the .com pages changed daily. SELF-ORGANISED: anyone can post a webpage; there is no standard, structure, or format. HUGE: Google indexes at least 4.5 billion pages. Looks unstructured/random from a global perspective.

Graph Models of Networks. Definition of a graph: G = (V, E), where V is a set of vertices and E is a set of pairs (v1, v2) for some vertices v1 and v2 in V. Directed graphs (or digraphs): pairs (v1, v2) and (v2, v1) are not equivalent. Undirected graphs: pairs (v1, v2) and (v2, v1) are equivalent; (v,w) is also written {v,w} or vw.

Directed Graph: A set of hyperlinked web pages can be represented by a directed graph G1 = (V1, E1), with V1 = { P1, P2, P3, P4 } and E1 = { (P1, P2), (P1, P4), (P3, P1), (P3, P2), (P4, P1) }. (Figure: web pages P1, P2, P3, P4 joined by hyperlinks.)

Undirected Graph: A computer network can be represented by an undirected graph G2 = (V2, E2), with V2 = { S1, S2, S3, S4 } and E2 = { (S1, S2), (S1, S3), (S1, S4), (S2, S3) }. (S1, S2) and (S2, S1) are equivalent, so only one is shown in E2. (Figure: computers S1, S2, S3, S4 joined by links.)

The Internet graph

The graph of main subdomains of www.kcl.ac.uk (Note: hyperlinks are shown as undirected edges)

Example: Wikipedia page links (wikimedia.org/wiki/User:Chris73)

Graph Representation: list of (vertices and) edges; adjacency list; adjacency matrix. Examples follow.

Graph Representation, adjacency-list representation: For i = 1 to the number of vertices, Adj[i] = list of vertices adjacent to vertex i. (Example figure: undirected graph.)

Graph Representation, adjacency-list representation: For i = 1 to the number of vertices, Adj[i] = list of vertices adjacent to vertex i. (Example figure: directed graph.)

Graph Representation, adjacency-matrix representation: Let n be the number of vertices. A is an n × n matrix where A(i, j) = 1 if edge (i, j) exists, and A(i, j) = 0 otherwise. (Example figure: undirected graph.)

Graph Representation, adjacency-matrix representation: Let n be the number of vertices. A is an n × n matrix where A(i, j) = 1 if edge (i, j) exists, and A(i, j) = 0 otherwise. (Example figure: directed graph.)
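
The two representations can be sketched in Python. This is a minimal illustration, not the course's own code: it builds an adjacency list for the directed graph G1 and an adjacency matrix for the undirected graph G2 defined earlier. The function names and the `directed` flag are assumptions made for this example.

```python
def adjacency_list(vertices, edges, directed=True):
    """Map each vertex to the list of vertices adjacent to it."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)  # an undirected edge is stored in both lists
    return adj


def adjacency_matrix(vertices, edges, directed=True):
    """Return an n x n 0/1 matrix; rows and columns follow the order of `vertices`."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[index[u]][index[v]] = 1
        if not directed:
            A[index[v]][index[u]] = 1  # symmetric for undirected graphs
    return A


# G1 (directed) and G2 (undirected) as given in the slides
V1 = ["P1", "P2", "P3", "P4"]
E1 = [("P1", "P2"), ("P1", "P4"), ("P3", "P1"), ("P3", "P2"), ("P4", "P1")]
V2 = ["S1", "S2", "S3", "S4"]
E2 = [("S1", "S2"), ("S1", "S3"), ("S1", "S4"), ("S2", "S3")]

adj1 = adjacency_list(V1, E1, directed=True)
A2 = adjacency_matrix(V2, E2, directed=False)
```

The list uses space proportional to the number of vertices plus edges, while the matrix always uses n × n entries but answers "is (i, j) an edge?" in one lookup.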

Terminology in Graphs: Degree of a vertex v, d(v). Undirected graphs: d(v) is the number of edges incident with v. E.g. d(1) = 2, d(5) = 3.

Terminology in Graphs, directed graphs: In-degree d–(v) = number of edges (x, v) for x in the vertex set. E.g. d–(1) = 0, d–(4) = 2. Out-degree d+(v) = number of edges (v, x) for x in the vertex set. E.g. d+(1) = 2, d+(4) = 1. Total degree d(v) = d–(v) + d+(v).
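
As a small sketch of these definitions, the in-degrees and out-degrees of the directed graph G1 from the earlier slide can be counted directly from its edge list. The figure used for the slide's own numbers is not available here, so the values below are for G1 instead; the helper name `degree` is an assumption.

```python
from collections import Counter

# Edge set E1 of the web-page digraph G1 from the slides
E1 = [("P1", "P2"), ("P1", "P4"), ("P3", "P1"), ("P3", "P2"), ("P4", "P1")]

out_degree = Counter(u for u, _ in E1)  # d+(v): edges leaving v
in_degree = Counter(v for _, v in E1)   # d-(v): edges entering v


def degree(v):
    """Total degree d(v) = d-(v) + d+(v); a Counter returns 0 for absent keys."""
    return in_degree[v] + out_degree[v]
```

For G1 this gives in-degree 2 and out-degree 2 for P1, so d(P1) = 4, while P3 has in-degree 0.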

Terminology in Graphs: A path of length k from vertex u to vertex v is a sequence of k arcs/edges (u, v1), (v1, v2), ..., (vk-1, v). E.g., a path from vertex 1 to vertex 4 with length 2, and a path from vertex 1 to vertex 5 with length 3. (Note: a path can take either direction of an edge in an undirected graph.)

Terminology in Graphs: A cycle is a path with u = v. A path is simple if it contains no cycle, i.e., each vertex appears at most once on the path. (Figure: a cycle, a simple path, and a walk (non-simple path).)

Terminology in Graphs: An undirected graph is connected if there is a path from every vertex to every other vertex. (Figure: two graphs on vertices 1 to 5, one connected and one not connected (or disconnected).)

Terminology in Graphs: A directed graph is strongly connected if there is a directed path from every vertex to every other vertex. (Figure: two digraphs on vertices 1 to 6, one strongly connected and one not strongly connected, e.g., with no path from vertex 5 to vertex 6.)

Terminology in Graphs: A directed graph is weakly connected if the undirected graph obtained by treating all its edges as undirected is connected. (Figure: two digraphs on vertices 1 to 6, one weakly connected and one not weakly connected.)

Graph Algorithms, used to determine properties of a graph: Is it connected or not? What is the shortest path from one vertex to another? What is the average distance between vertices? How many edges, vertices, triangles are there? Are there any ‘clusters’? Which vertices are important?

Graph Traversal: Try to “visit” all vertices of a graph in a systematic way. Two standard algorithms: Breadth First Search (BFS) and Depth First Search (DFS). Either can be used to determine if an undirected graph is connected: it is connected if, starting from one vertex, we can visit every other vertex. This is the simple idea behind Web crawling.

Framework of Traversal: BFS and DFS share the same framework. Choose a starting vertex to start the traversal. Keep a list (or priority queue) of vertices reached so far. The traversal will mark all vertices reachable from the starting vertex. The two traversals differ in the way new vertices are added to the list, and hence result in a different traversal order.

Traversal Algorithm
Put the starting vertex in LIST
While LIST is not empty
    Remove first vertex y held in LIST
    Visit y & mark y as seen
    For all (y, z) in E
        If z is not marked and not in LIST
            Add z to LIST
DFS adds z to the beginning of LIST
BFS adds z to the end of LIST
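
The LIST framework above can be sketched in Python as a single function with a `mode` switch. This is an illustrative sketch under assumptions: the function name `traverse` and the `mode` parameter are invented here, and the example digraph is reconstructed from the BFS and DFS trees quoted in the following slides (the original figure may contain further edges).

```python
from collections import deque


def traverse(adj, start, mode="bfs"):
    """Visit vertices reachable from `start` using the slide's LIST framework."""
    lst = deque([start])      # LIST, front on the left
    seen = set()              # marked vertices
    order = []                # order in which vertices are visited
    while lst:
        y = lst.popleft()     # remove the first vertex held in LIST
        seen.add(y)
        order.append(y)
        for z in sorted(adj[y]):          # ascending order, as in the slides
            if z not in seen and z not in lst:
                if mode == "dfs":
                    lst.appendleft(z)     # DFS: add to the beginning of LIST
                else:
                    lst.append(z)         # BFS: add to the end of LIST
    return order


# Assumed example digraph (reconstructed from the traversal trees in the slides)
example = {1: [2, 6], 2: [3], 3: [5], 4: [2, 3], 5: [], 6: [4, 7], 7: [5]}

bfs_order = traverse(example, 1, mode="bfs")
dfs_order = traverse(example, 1, mode="dfs")
```

The membership test `z not in lst` scans the list, which keeps the sketch close to the pseudocode; a separate "in LIST" flag per vertex would avoid the scan on large graphs.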

BFS: Traverse the vertices in a first-in-first-out manner. LIST is a QUEUE. Suppose vertex 1 is the starting vertex. The front of the list is on the left, as in [a,b,c]. To prevent ambiguity, if a vertex has more than one outgoing edge, follow the edges in ascending order of the vertices they lead to. (Figure: example graph on vertices 1 to 7.)

BFS trace on the example graph (front of LIST on the left):
LIST = 1. Initial setting.
Mark vertex 1 as seen. LIST = 2 (insert vertex 2), then LIST = 2, 6 (insert vertex 6).
Mark vertex 2. LIST = 6, 3 (insert vertex 3).
Mark vertex 6. LIST = 3, 4 (insert vertex 4), then LIST = 3, 4, 7 (insert vertex 7).
Mark vertex 3. LIST = 4, 7, then LIST = 4, 7, 5 (insert vertex 5).
Mark vertex 4. LIST = 7, 5.
Mark vertex 7. LIST = 5.
Mark vertex 5. LIST is empty.

The BFS traversal order of the vertices: 1, 2, 6, 3, 4, 7, 5. BFS tree edges: 12, 16, 23, 64, 67, 35.

DFS: Traverse the vertices in a last-in-first-out manner. LIST is a STACK. Suppose vertex 1 is the starting vertex. To prevent ambiguity, if a vertex has more than one outgoing arc, follow the arcs in ascending order of the vertices they lead to. (Figure: example graph on vertices 1 to 7.)

DFS trace on the example graph (front of LIST on the left):
LIST = 1. Initial setting.
Mark vertex 1. LIST = 2 (insert vertex 2), then LIST = 6, 2 (insert vertex 6).
Traverse (1,6), mark vertex 6. LIST = 4, 2 (insert vertex 4), then LIST = 7, 4, 2 (insert vertex 7).
Traverse (6,7), mark vertex 7. LIST = 4, 2, then LIST = 5, 4, 2 (insert vertex 5).
Traverse (7,5), mark vertex 5. LIST = 4, 2. Backtrack over 7 to 6.
Traverse (6,4), mark vertex 4. LIST = 3, 2 (insert vertex 3).
Traverse (4,3), mark vertex 3. LIST = 2. Backtrack to 4.
Traverse (4,2), mark vertex 2. Backtrack over 4, 6 to 1. End: LIST is empty.

The DFS traversal order of the vertices: 1, 6, 7, 5, 4, 3, 2. DFS tree from vertex 1: 16, 67, 75, 64, 43, 42. The BFS traversal order of the vertices: 1, 2, 6, 3, 4, 7, 5. BFS tree from vertex 1: 12, 16, 23, 64, 67, 35.

Demo (BFS-DFS.r): Let's see what happens.

Shortest Path: A shortest path from a vertex u to a vertex v is a path, among all possible paths from u to v, which consists of the least number of arcs/edges. A shortest path is not necessarily unique; there can be more than one shortest path between two vertices. The length of a shortest path from u to v is called the (graph) distance.

Shortest Path: The shortest path from vertex 1 to vertex 5 can be either (1, 2), (2, 3), (3, 5) or (1, 6), (6, 7), (7, 5). The length of the shortest path is 3. (Figure: example graph on vertices 1 to 7.)

BFS finds shortest paths: see the BFS tree from vertex 1, with edges 12, 16, 23, 64, 67, 35. (Figure: BFS tree on the example graph.)

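Distance: The (graph) distance δ(u,v) from vertex u to vertex v is 0 if u = v; the length of a shortest path from u to v in the BFS tree; ∞ if there is no path from u to v.

Distance δ(u,v): example table of distances for an undirected graph on vertices 1 to 5 (figure and table not reproduced).

Distance δ(u,v): example table of distances from u to v for a directed graph on vertices 1 to 6 (figure and table not reproduced).

Diameter: The diameter D of a graph G = (V, E) is the maximum graph distance (longest shortest path) between any pair of vertices. For the undirected example on vertices 1 to 5, the diameter is 2.

Diameter: for the directed example on vertices 1 to 6, the diameter is ∞.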

BFS and Shortest Paths: BFS can find the shortest paths and distances from the starting vertex to all other vertices in a connected component. (Figure: BFS tree with the distance from the starting vertex in brackets: x0 (0, starting vertex), x1 (1), x4 (1), x2 (2), x3 (2), x5 (2), x6 (3).) The longest (from level to level) path (depth) of the BFS is x0, x1, x2, x6, i.e., the path from the starting vertex to the last visited vertex.

Shortest Path Algorithm: Modify the BFS algorithm to find the shortest path/distance from a vertex to every other vertex. Let x be the starting vertex. Dist[i] = δ(x, i) is the distance from vertex x to vertex i. The modified BFS computes Dist[i] for all vertices i.

BFS Shortest Path Algorithm
Put the starting vertex x in QUEUE
Dist[x] = 0
While QUEUE is not empty
    Extract the first vertex y in QUEUE
    Mark y as seen
    For all (y, z) in E
        If z is not marked and not in QUEUE
            Dist[z] = Dist[y] + 1
            Add z to the end of QUEUE
For all unmarked vertices i
    Dist[i] = ∞
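
A minimal Python sketch of the modified BFS follows. It is not the course's own code: the function name `bfs_distances` is an assumption, the test "not marked and not in QUEUE" is replaced by the equivalent check that Dist[z] is still infinite, and the example digraph is reconstructed from the traversal trees in the earlier slides (the original figure may contain further edges).

```python
import math
from collections import deque


def bfs_distances(adj, x):
    """Return Dist[i] for every vertex i; unreachable vertices get infinity."""
    dist = {v: math.inf for v in adj}   # every vertex must be a key of adj
    dist[x] = 0
    queue = deque([x])
    while queue:
        y = queue.popleft()             # extract the first vertex in QUEUE
        for z in adj[y]:
            if dist[z] == math.inf:     # z is neither marked nor queued yet
                dist[z] = dist[y] + 1
                queue.append(z)         # add z to the end of QUEUE
    return dist


# Assumed example digraph (reconstructed from the traversal trees in the slides)
example = {1: [2, 6], 2: [3], 3: [5], 4: [2, 3], 5: [], 6: [4, 7], 7: [5]}

dist_from_1 = bfs_distances(example, 1)
dist_from_5 = bfs_distances(example, 5)
```

From vertex 1 the sketch gives distance 3 to vertex 5, matching the shortest-path example; in the assumed edge set vertex 5 has no outgoing edges, so every other vertex is at infinite distance from it.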

Example with starting vertex 3 (figure not reproduced): Dist[3] = 0; Dist[5] = Dist[3] + 1 = 1; Dist[6] = Dist[3] + 1 = 1; Dist[4] = Dist[5] + 1 = 2; Dist[2] = Dist[4] + 1 = 3; Dist[1] = ∞.

All shortest paths? BFS from vertex 1. (Figure legend: red edges are BFS tree edges; some edges are not in any shortest path from 1; some edges are in a shortest path but not in the BFS tree.) Shortest (1,6)-paths: 1246, 1346, 1356.

Number of shortest paths from v to w of length k: N(v,w,k). N(v,v,0) = 1: there is one shortest path from v to v. For w at distance (k+1) from v, N(v,w,k+1) is the sum of N(v,u,k) over all neighbours u of w that are at distance k from v. E.g. N(1,1,0) = 1, N(1,2,1) = 1, N(1,3,1) = 1, N(1,4,2) = N(1,2,1) + N(1,3,1) = 2, etc.
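
The recurrence for N(v,w,k) can be evaluated during a single BFS. The sketch below is one possible way to do it, with assumptions stated plainly: the function name is invented, and the undirected example graph is reconstructed from the three shortest (1,6)-paths listed on the previous slide (1246, 1346, 1356), so edges of the original figure that lie on no shortest path from vertex 1 are missing.

```python
from collections import deque


def count_shortest_paths(adj, v):
    """Return (dist, count): distance from v and number of shortest paths from v."""
    dist = {v: 0}
    count = {v: 1}                      # N(v, v, 0) = 1
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:           # first time w is reached
                dist[w] = dist[u] + 1
                count[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u is a neighbour of w one level closer to v
                count[w] += count[u]    # add N(v, u, k) to N(v, w, k+1)
    return dist, count


# Assumed undirected graph: edges 12, 13, 24, 34, 35, 46, 56
graph = {1: [2, 3], 2: [1, 4], 3: [1, 4, 5], 4: [2, 3, 6], 5: [3, 6], 6: [4, 5]}

dist, count = count_shortest_paths(graph, 1)
```

Because BFS removes vertices from the queue in order of distance, count[u] is already final when u is processed, which is why one pass suffices.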

More Shortest Paths Algorithms: When edges have different weights, the length of a path is the sum of the weights of the edges in the path; this case is handled by Dijkstra’s Algorithm. All pairs shortest paths (find the shortest distances/paths for all pairs of vertices) is handled by the Floyd-Warshall Algorithm.
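
For the weighted case, a compact sketch of Dijkstra's algorithm using Python's standard `heapq` module is given below. The slides only name the algorithm, so everything here is illustrative: the weighted digraph and its weights are hypothetical, and the sketch assumes all edge weights are non-negative.

```python
import heapq
import math


def dijkstra(adj, source):
    """Shortest weighted distances from source; adj[u] is a list of (v, weight)."""
    dist = {v: math.inf for v in adj}
    dist[source] = 0
    heap = [(0, source)]                    # (tentative distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                        # stale entry, a shorter path was found
        for v, w in adj[u]:
            if d + w < dist[v]:             # relax edge (u, v)
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


# Hypothetical weighted digraph, for illustration only
weighted = {
    "a": [("b", 4), ("c", 1)],
    "b": [("d", 1)],
    "c": [("b", 2), ("d", 5)],
    "d": [],
}

weighted_dist = dijkstra(weighted, "a")
```

Here the direct edge a to b of weight 4 is beaten by the two-edge path through c with total weight 3, which is the situation plain BFS cannot handle.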

Diameter: The diameter measures how “well” a graph is connected; the smaller the “better”. A directed graph which is not strongly connected has a diameter of ∞. The WWW graph is neither strongly connected nor weakly connected. Why? Some web pages are not hyperlinked with the rest of the WWW.

Revised Diameter Measure: To measure the longest distance between any two “connected” pairs of vertices. Let G = (V, E) be the WWW graph. Let P be the set of all triples (u, v, k), where u, v in V and 0 ≤ k = δ(u, v) < ∞ (v is reachable from u). P is the set of finite distance vertex pairs. The revised diameter is D*(G) = max over (u, v, k) in P of δ(u, v).
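
The difference between the diameter and the revised diameter can be sketched on the small web-page digraph G1 from the earlier slide. This is an illustrative sketch: the function names are assumptions, and distances are computed by running BFS from every vertex.

```python
import math
from collections import deque


def bfs_distances(adj, x):
    """Unweighted distances from x; unreachable vertices get infinity."""
    dist = {v: math.inf for v in adj}
    dist[x] = 0
    queue = deque([x])
    while queue:
        y = queue.popleft()
        for z in adj[y]:
            if dist[z] == math.inf:
                dist[z] = dist[y] + 1
                queue.append(z)
    return dist


def diameters(adj):
    """Return (D, D*): the diameter and the revised diameter over finite pairs."""
    all_d = [d for u in adj for d in bfs_distances(adj, u).values()]
    diameter = max(all_d)                                # infinite if some pair is unreachable
    revised = max(d for d in all_d if d < math.inf)      # maximum over the set P
    return diameter, revised


# G1 from the slides, written as an adjacency list
G1 = {"P1": ["P2", "P4"], "P2": [], "P3": ["P1", "P2"], "P4": ["P1"]}

D, D_star = diameters(G1)
```

G1 is not strongly connected (P2 has no outgoing links), so D is infinite, while the longest finite distance, for example from P3 to P4 via P1, gives D* = 2.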

Average Distance An estimate of the average length of the shortest path from a vertex to another one Small world. Six degrees of separation is the theory that everyone and everything is six or fewer steps away. There are several possible ways to estimate average distance….

Average Distance, traditional definition (all pairs): Suppose a graph has n vertices. There are n(n-1) ordered pairs of vertices (u,v), giving n(n-1) shortest paths, and the average distance is the mean of δ(u,v) over these pairs. The definition is useful only for connected undirected graphs and strongly connected directed graphs, so it is not applicable to the WWW graph.

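Revised Average Distance: Only count the pairs of reachable vertices. Let P be the set of all triples (u, v, k) for which u, v in V and 0 < k = δ(u, v) < ∞, i.e. distinct ordered pairs at finite distance. |P| is the size of P, which is at most n(n-1). The revised average distance is the sum of k over all triples in P, divided by |P|.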

Example (table of distances δ(u,v) for a digraph on vertices 1 to 6, not reproduced): |P| = 13, and the revised average distance is (1 + 1 + 2 + 2 + 1 + 3 + 2 + 1 + 1 + 1 + 2 + 2 + 1) / 13 = 20/13 ≈ 1.538.

Revised Average Distance, using the average of the reciprocals of the distances: An infinite distance contributes zero to the sum. There is no need to distinguish between reachable and unreachable pairs. Don’t include the distance from a vertex to itself.

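Example (same table of distances δ(u,v) on vertices 1 to 6, not reproduced): the average reciprocal distance over the 30 ordered pairs is (1/1 + 1/∞ + 1/1 + 1/2 + 1/∞ + 1/∞ + 1/∞ + 1/2 + 1/1 + 1/∞ + ... + 1/∞ + 1/∞ + 1/∞ + 1/∞ + 1/∞)/30 = 9.833/30, and its reciprocal gives an average distance of 30/9.833 ≈ 3.05.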

Exercise: Digraph with edges (a,b), (a,c), (b,c). Find (1) the all pairs average distance in the underlying graph, (2) the average distance in the digraph for reachable pairs, (3) the average distance in the digraph using reciprocal distances. Comment? Answers: 1, 1, 2.
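
The three measures in the exercise can be checked with a short Python sketch. The function names are assumptions, and the reciprocal-based measure is taken here to be n(n-1) divided by the sum of 1/δ(u,v) over ordered pairs of distinct vertices; that reading is inferred from the worked numbers in these slides (9.833/30 and the answer 2), not from a formula stated in the text.

```python
import math
from collections import deque


def bfs_distances(adj, x):
    """Unweighted distances from x; unreachable vertices get infinity."""
    dist = {v: math.inf for v in adj}
    dist[x] = 0
    queue = deque([x])
    while queue:
        y = queue.popleft()
        for z in adj[y]:
            if dist[z] == math.inf:
                dist[z] = dist[y] + 1
                queue.append(z)
    return dist


def average_distances(adj):
    """Return (all-pairs, reachable-pairs, reciprocal-based) average distances."""
    n = len(adj)
    # distances over ordered pairs of distinct vertices
    dists = [d for u in adj for v, d in bfs_distances(adj, u).items() if v != u]
    finite = [d for d in dists if d < math.inf]
    all_pairs = sum(dists) / (n * (n - 1))      # infinite if any pair is unreachable
    reachable = sum(finite) / len(finite)       # assumes at least one reachable pair
    reciprocal = (n * (n - 1)) / sum(1 / d for d in dists)   # 1/inf counts as 0
    return all_pairs, reachable, reciprocal


def underlying(adj):
    """Ignore edge orientations: make every edge two-way."""
    und = {v: set() for v in adj}
    for u in adj:
        for v in adj[u]:
            und[u].add(v)
            und[v].add(u)
    return und


digraph = {"a": ["b", "c"], "b": ["c"], "c": []}

answer_1 = average_distances(underlying(digraph))[0]
answer_2 = average_distances(digraph)[1]
answer_3 = average_distances(digraph)[2]
```

On this digraph only three of the six ordered pairs are reachable, which is why the reachable-pairs average stays at 1 while the reciprocal-based value rises to 2.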

Connectivity in Digraphs: A graph is connected if there is a path between every pair of vertices. A digraph is strongly connected if there is a directed path between every pair of vertices. A digraph is weakly connected if the underlying graph is connected. Underlying graph: ignore edge orientations.
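
These definitions translate into a short Python sketch. It is only an illustration: the function names are assumptions, the graph is assumed to be non-empty, and the strong-connectivity test simply runs a traversal from every vertex, which is simple rather than efficient. The test graph is the web-page digraph G1 from the earlier slide.

```python
def reachable(adj, start):
    """Set of vertices reachable from start by following edges."""
    seen = {start}
    stack = [start]
    while stack:
        y = stack.pop()
        for z in adj[y]:
            if z not in seen:
                seen.add(z)
                stack.append(z)
    return seen


def underlying(adj):
    """Underlying graph: ignore edge orientations."""
    und = {v: set() for v in adj}
    for u in adj:
        for v in adj[u]:
            und[u].add(v)
            und[v].add(u)
    return und


def is_strongly_connected(adj):
    """Every vertex must reach every other vertex along directed edges."""
    return all(reachable(adj, v) == set(adj) for v in adj)


def is_weakly_connected(adj):
    """The underlying undirected graph must be connected (adj assumed non-empty)."""
    start = next(iter(adj))
    return reachable(underlying(adj), start) == set(adj)


# G1 from the slides, written as an adjacency list
G1 = {"P1": ["P2", "P4"], "P2": [], "P3": ["P1", "P2"], "P4": ["P1"]}
```

G1 is weakly connected but not strongly connected, since no directed path leaves P2.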

Find the strongly connected components (SCC) of a digraph. FAN-OUT(v): set of vertices obtained by BFS from v using edges directed out of vertices. FAN-IN(v): set of vertices obtained by BFS from v using edges directed into vertices. SCC(v) is the intersection of FAN-OUT(v) and FAN-IN(v).

Example (figure not reproduced): FO(1) = {1,2,3,4,6}, FI(1) = {1,2,3,4,5}, so SCC(1) = {1,2,3,4}. SCC(6) = {6}.

How many SCCs? There are 3 SCCs: {1,2,3,4}, {5}, {6}.
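
The FAN-OUT / FAN-IN method can be sketched in Python as two BFS runs, one on the digraph and one on its reverse. The function names are assumptions, and the example edges are hypothetical: they were chosen so that FAN-OUT(1) = {1,2,3,4,6} and the three components match the example above, since the original figure is not available.

```python
from collections import deque


def bfs_set(adj, v):
    """Set of vertices obtained by BFS from v."""
    seen = {v}
    queue = deque([v])
    while queue:
        y = queue.popleft()
        for z in adj[y]:
            if z not in seen:
                seen.add(z)
                queue.append(z)
    return seen


def reverse(adj):
    """Digraph with every edge reversed, so BFS follows edges directed into vertices."""
    rev = {v: [] for v in adj}
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)
    return rev


def scc(adj, v):
    """SCC(v) = FAN-OUT(v) intersected with FAN-IN(v)."""
    return bfs_set(adj, v) & bfs_set(reverse(adj), v)


def all_sccs(adj):
    """Repeat the intersection for a vertex not yet assigned to a component."""
    remaining = set(adj)
    components = []
    while remaining:
        comp = scc(adj, min(remaining))
        components.append(comp)
        remaining -= comp
    return components


# Hypothetical digraph: cycle 1->2->3->4->1, plus 5->1 and 4->6
example = {1: [2], 2: [3], 3: [4], 4: [1, 6], 5: [1], 6: []}

components = all_sccs(example)
```

Each call to `scc` costs two traversals, so the loop is convenient for small graphs; the slides do not discuss faster single-pass SCC algorithms, and none is implied here.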

Examples. Digraph G = (V,E) with edges (3,1) (6,1) (9,2) (10,2) (6,3) (8,3) (1,4) (8,4) (5,10) (6,5) (9,5) (5,7) (1,8) (4,9) (7,9) (8,9) (9,10). Draw G. Find the SCC. Exercise: Digraph with V = 1,2,…10 and edges 5->1 8->1 5->3 8->3 5->10 6->10 4->7 7->10 8->10 5->9 7->9.

Draw G (figure not reproduced).

SCC: the SCCs are shown in a figure (not reproduced).

Demo (connect-scc.r): Examples of G(n,p) and PA connectivity.