Published by Mitchell Willis. Modified over 6 years ago.
1
6CCS3WSN--7CCSMWAL Algorithms for WWW and Social Networks Algorithmic Issues in the WWW
Lecture 1
2
Lecturer’s Contact Lecturer: Colin Cooper
Office: S6.23 Office hours: Thursday 1:15-3:15pm
3
Course Information
No coursework, simply a final exam
Course Web Site: Announcements; Lecture slides (also on KEATS); Exercises; Past exams; Papers (references)
4
Full Info at To access these notes the user name and password are: username: lecturenotes passwd: PalPdaWal
5
Brief Syllabus
Structure of large networks such as the WWW
Algorithmic issues related to information retrieval
Main topics: Webpage Ranking; Text processing; Information retrieval; Clustering data; Recommender systems
6
Internet and WWW
Internet ≠ World Wide Web (WWW)
The Internet is a network of computers, linked by undirected (two-way) data connections
The WWW is a network of web pages, linked by directed hyperlinks
7
Characteristics of WWW
DYNAMIC (Cho & Garcia-Molina): 40% of all web pages changed weekly; 23% of the .com pages changed daily
SELF-ORGANISED: anyone can post a webpage; no standard structure or format
HUGE: Google indexes at least 4.5 billion pages
Looks unstructured/random from a global perspective
8
Graph Models of Networks
Definition of graph: G = (V, E), where V is a set of vertices and E is a set of pairs (v1, v2) for some vertices v1 and v2 in V
Directed graphs (or digraphs): pairs (v1, v2) and (v2, v1) are not equivalent
Undirected graphs: pairs (v1, v2) and (v2, v1) are equivalent; (v, w) is also written {v, w} or vw
9
Directed Graph
A set of hyperlinked web pages can be represented by a directed graph G1 = (V1, E1)
V1 = { P1, P2, P3, P4 }
E1 = { (P1, P2), (P1, P4), (P3, P1), (P3, P2), (P4, P1) }
[Figure: web pages P1, P2, P3, P4 joined by directed hyperlinks]
10
Undirected Graph
A computer network can be represented by an undirected graph G2 = (V2, E2)
V2 = { S1, S2, S3, S4 }
E2 = { (S1, S2), (S1, S3), (S1, S4), (S2, S3) }
(S1, S2) and (S2, S1) are equivalent, so only one is shown in E2
[Figure: computers S1, S2, S3, S4 joined by two-way connections]
11
The Internet graph
12
The graph of main subdomains of www.kcl.ac.uk
(Note: hyperlinks are shown as undirected edges)
13
Example: Wikipedia page links (wikimedia.org/wiki/User:Chris73)
14
Graph Representation
List of (vertices and) edges
Adjacency list
Adjacency matrix
Examples follow
15
Graph Representation Adjacency-list representation
For i = 1 to the number of vertices, Adj[i] = list of vertices adjacent to vertex i
[Figure: undirected graph example]
16
Graph Representation Adjacency-list representation
For i = 1 to the number of vertices, Adj[i] = list of vertices adjacent to vertex i
[Figure: directed graph example]
17
Graph Representation Adjacency-matrix representation
Let n be the number of vertices
A is an n × n matrix where A(i, j) = 1 if edge (i, j) exists, A(i, j) = 0 otherwise
[Figure: undirected graph example]
18
Graph Representation Adjacency-matrix representation
Let n be the number of vertices
A is an n × n matrix where A(i, j) = 1 if edge (i, j) exists, A(i, j) = 0 otherwise
[Figure: directed graph example]
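Both representations can be built from an edge list. The sketch below uses the directed graph G1 from the earlier slide, with pages P1..P4 renumbered 1..4. The function names and the `directed` flag are illustrative choices, not part of the lecture.

```python
# Directed graph G1 from the slides, with pages P1..P4 numbered 1..4.
n = 4
edges = [(1, 2), (1, 4), (3, 1), (3, 2), (4, 1)]


def adjacency_list(n, edges, directed=True):
    """Return Adj, where Adj[i] is the sorted list of vertices adjacent to i."""
    adj = {i: [] for i in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)  # an undirected edge is stored in both lists
    return {i: sorted(nbrs) for i, nbrs in adj.items()}


def adjacency_matrix(n, edges, directed=True):
    """Return A as a list of rows, with A[i-1][j-1] = 1 if edge (i, j) exists."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u - 1][v - 1] = 1
        if not directed:
            a[v - 1][u - 1] = 1  # undirected graphs give a symmetric matrix
    return a


adj = adjacency_list(n, edges)
mat = adjacency_matrix(n, edges)
```

The list uses space proportional to the number of vertices plus edges, while the matrix always uses n × n entries. This matters for sparse graphs such as the WWW.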
19
Terminology in Graphs
Undirected graphs
Degree of a vertex v, d(v): number of edges incident with v
E.g. d(1) = 2, d(5) = 3
20
Terminology in Graphs Directed graphs
In-degree d–(v) = number of edges (x, v) for x in the vertex set
E.g., d–(1) = 0, d–(4) = 2
Out-degree d+(v) = number of edges (v, x) for x in the vertex set
E.g., d+(1) = 2, d+(4) = 1
d(v) = d–(v) + d+(v)
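In-degree and out-degree can be counted in one pass over the edge list. This minimal sketch uses the digraph G1 from the earlier slide (pages P1..P4 as 1..4) rather than the figure on this slide, which did not survive extraction. The function name `degrees` is an illustrative choice.

```python
vertices = [1, 2, 3, 4]
edges = [(1, 2), (1, 4), (3, 1), (3, 2), (4, 1)]  # G1 from the slides


def degrees(vertices, edges):
    """Return (indeg, outdeg) dictionaries for a directed edge list."""
    indeg = {v: 0 for v in vertices}
    outdeg = {v: 0 for v in vertices}
    for u, v in edges:
        outdeg[u] += 1  # edge (u, v) leaves u
        indeg[v] += 1   # and enters v
    return indeg, outdeg


indeg, outdeg = degrees(vertices, edges)
total = {v: indeg[v] + outdeg[v] for v in vertices}  # d(v) = d-(v) + d+(v)
```

Every edge contributes one to some in-degree and one to some out-degree, so both sums equal the number of edges.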
21
Terminology in Graphs
A path of length k from vertex u to vertex v is a sequence of k arcs/edges $(u, v_1), (v_1, v_2), \ldots, (v_{k-1}, v)$
E.g., a path from vertex 1 to vertex 4 with length 2
(Note: a path can take either direction of an edge in an undirected graph)
E.g., a path from vertex 1 to vertex 5 with length 3
22
Terminology in Graphs
A cycle is a path with u = v
A simple path is a path that contains no cycle, i.e., each vertex appears at most once on the path
[Figure labels: Cycle; A simple path; Walk (non-simple path)]
23
Terminology in Graphs
An undirected graph is connected if there is a path from every vertex to every other vertex
[Figure: two graphs on vertices 1 to 5, one connected and one not connected (disconnected)]
24
Terminology in Graphs
A directed graph is strongly connected if there is a directed path from every vertex to every other vertex
[Figure: two digraphs on vertices 1 to 6, one strongly connected and one not strongly connected (e.g., no path from vertex 5 to vertex 6)]
25
Terminology in Graphs
A directed graph is weakly connected if the undirected graph obtained by treating every edge as undirected is connected
[Figure: two digraphs on vertices 1 to 6, one weakly connected and one not weakly connected]
26
Graph Algorithms
To determine properties of a graph:
Connected or not?
Shortest path from one vertex to another
The average distance between vertices
How many edges, vertices, triangles?
Are there any ‘clusters’?
Which vertices are important?
27
Graph Traversal
Try to “visit” all vertices of a graph in a systematic way
Two standard algorithms: Breadth First Search (BFS) and Depth First Search (DFS)
Can be used to determine if an undirected graph is connected: starting from any one vertex, can we visit every other vertex?
This is the simple idea behind Web crawling …
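The connectivity test described above can be sketched with a breadth-first traversal. The example uses the undirected graph G2 from the earlier slide (computers S1..S4 as 1..4). The function name `is_connected` is an illustrative choice.

```python
from collections import deque


def is_connected(vertices, edges):
    """True if every vertex of the undirected graph is reachable from the first one."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)  # undirected: the edge can be followed in both directions
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        y = queue.popleft()
        for z in adj[y]:
            if z not in seen:
                seen.add(z)
                queue.append(z)
    return len(seen) == len(adj)


g2_vertices = [1, 2, 3, 4]
g2_edges = [(1, 2), (1, 3), (1, 4), (2, 3)]
connected = is_connected(g2_vertices, g2_edges)
# Dropping edge (1, 4) isolates vertex 4.
disconnected = is_connected(g2_vertices, [(1, 2), (1, 3), (2, 3)])
```

For a directed graph, the same function applied to the edge list tests weak connectivity, because it ignores edge orientation.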
28
Framework of Traversal
BFS and DFS share the same framework
A starting vertex to start the traversal
Keep a list (or priority queue) of vertices reached so far
Will mark all vertices reachable from the starting vertex
The two traversals differ in the way they add new vertices to the list, and hence result in different traversal orders
29
Traversal Algorithm
Put the starting vertex in LIST
While LIST is not empty
    Remove first vertex y held in LIST
    Visit y & mark y as seen
    For all (y, z) in E
        If z is not marked and not in LIST
            Add z to LIST
DFS adds z to the beginning of LIST
BFS adds z to the end of LIST
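The LIST framework above can be sketched as one function with a switch between the two insertion rules. The example digraph is an assumption: its edge set is inferred from the BFS and DFS traces on the following slides, since the original figure is not available. The names `GRAPH` and `traverse` are illustrative.

```python
from collections import deque

# Assumed example digraph on vertices 1..7 (edges inferred from the slide traces).
GRAPH = {1: [2, 6], 2: [3], 3: [5], 4: [2, 3], 5: [], 6: [4, 7], 7: [5]}


def traverse(adj, start, mode="bfs"):
    """Generic LIST traversal: BFS appends to the end, DFS inserts at the front."""
    lst = deque([start])
    seen = set()
    order = []
    while lst:
        y = lst.popleft()          # remove the first vertex held in LIST
        seen.add(y)                # visit y and mark it as seen
        order.append(y)
        for z in sorted(adj[y]):   # ascending order, as on the slides
            if z not in seen and z not in lst:
                if mode == "dfs":
                    lst.appendleft(z)  # LIST behaves as a stack
                else:
                    lst.append(z)      # LIST behaves as a queue
    return order


bfs_order = traverse(GRAPH, 1, "bfs")
dfs_order = traverse(GRAPH, 1, "dfs")
```

On this graph the two calls give the traversal orders stated on the later slides. The membership test `z not in lst` is linear in the length of the deque. That is fine for a small illustration, but a separate set of queued vertices would be the usual choice for large graphs.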
30
BFS
Traverse the vertices in a first-in-first-out manner: LIST is a QUEUE
Suppose vertex 1 is the starting vertex
Front of list on left, e.g. [a, b, c]
[Figure: example graph on vertices 1 to 7]
To prevent ambiguity, if a vertex has more than one outgoing edge, follow the edges in ascending order of the vertices they lead to
31
[Figure: BFS trace on the graph with vertices 1 to 7; LIST shown after each step]
Initial setting: LIST = [1]
Mark vertex 1 as seen: LIST = []
Insert vertex 2 to LIST: LIST = [2]
Insert vertex 6 to LIST: LIST = [2, 6]
Mark vertex 2: LIST = [6]
Insert vertex 3 to LIST: LIST = [6, 3]
32
Mark vertex 6: LIST = [3]
Insert vertex 4 to LIST: LIST = [3, 4]
Insert vertex 7 to LIST: LIST = [3, 4, 7]
Mark vertex 3: LIST = [4, 7]
Insert vertex 5 to LIST: LIST = [4, 7, 5]
Mark vertex 4: LIST = [7, 5]
33
Mark vertex 7: LIST = [5]
Mark vertex 5: LIST = []
34
The BFS traversal order of the vertices: 1, 2, 6, 3, 4, 7, 5
BFS tree edges: 12, 16, 23, 64, 67, 35
35
DFS
Traverse the vertices in a last-in-first-out manner: LIST is a STACK
Suppose vertex 1 is the starting vertex
[Figure: example graph on vertices 1 to 7]
To prevent ambiguity, if a vertex has more than one outgoing arc, follow the arcs in ascending order of the vertices they lead to
36
[Figure: DFS trace on the graph with vertices 1 to 7; LIST shown after each step]
Initial setting: LIST = [1]
Mark vertex 1: LIST = []
Insert vertex 2 to LIST: LIST = [2]
Insert vertex 6 to LIST: LIST = [6, 2]
Traverse (1,6), mark vertex 6: LIST = [2]
Insert vertex 4 to LIST: LIST = [4, 2]
Insert vertex 7 to LIST: LIST = [7, 4, 2]
37
Traverse (6,7), mark 7: LIST = [4, 2]
Insert vertex 5 to LIST: LIST = [5, 4, 2]
Traverse (7,5), mark 5: LIST = [4, 2]; backtrack over 7 to 6
Traverse (6,4), mark 4: LIST = [2]
Insert vertex 3 to LIST: LIST = [3, 2]
38
Traverse (4,3), mark 3: LIST = [2]; backtrack to 4
Traverse (4,2), mark 2: LIST = []; backtrack over 4, 6 to 1
End. List empty
39
The DFS traversal order of the vertices: 1, 6, 7, 5, 4, 3, 2
DFS tree from vertex 1: 16, 67, 75, 64, 43, 42
The BFS traversal order of the vertices: 1, 2, 6, 3, 4, 7, 5
BFS tree from vertex 1: 12, 16, 23, 64, 67, 35
40
Demo (BFS-DFS.r) Let's see what happens…
41
Demo
42
Shortest Path
A shortest path from a vertex u to a vertex v is a path, among all possible paths from u to v, which consists of the least number of arcs/edges
A shortest path need not be unique; there can be more than one shortest path between two vertices
The length of a shortest path from u to v is called the (graph) distance
43
Shortest Path
The shortest path from vertex 1 to vertex 5 can be either
(1, 2), (2, 3), (3, 5) or
(1, 6), (6, 7), (7, 5)
The length of the shortest path is 3
[Figure: example graph on vertices 1 to 7]
44
BFS finds shortest paths
BFS tree from vertex 1: 12, 16, 23, 64, 67, 35
[Figure: the BFS tree from vertex 1 in the example graph on vertices 1 to 7]
45
Distance
The (graph) distance $\delta(u, v)$ from vertex u to vertex v is
0 if u = v
the length of a shortest path from u to v in the BFS tree, if such a path exists
$\infty$ if there is no path from u to v
46
Distance
[Table: $\delta(u, v)$ for the example graph on vertices 1 to 5; entries missing from this transcript]
47
Distance
[Table: $\delta(u, v)$ with rows u and columns v, vertices 1 to 6; entries missing from this transcript]
48
Diameter
The diameter D of a graph G = (V, E) is the maximum graph distance (longest shortest path) between any pair of vertices.
[Table: $\delta(u, v)$ for the example graph on vertices 1 to 5]
Diameter is 2
49
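Diameter
[Table: $\delta(u, v)$ with rows u and columns v, vertices 1 to 6; the entries and the resulting diameter are missing from this transcript]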
50
BFS and Shortest Paths
BFS can find the shortest paths and distances from the starting vertex to all other vertices in a connected component
[Figure: BFS tree with the distance from the starting vertex in brackets: x0 (0, starting vertex), x1 (1), x4 (1), x2 (2), x3 (2), x5 (2), x6 (3)]
The longest (from level to level) path (depth) of the BFS: x0, x1, x2, x6, i.e., the path from the starting vertex to the last visited vertex
51
Shortest Path Algorithm
Modify the BFS algorithm to find the shortest path/distance from a vertex to every other vertex
Let x be the starting vertex
Dist[i] = $\delta(x, i)$, the distance from vertex x to vertex i
The modified BFS computes Dist[i] for all vertices i
52
BFS Shortest Path Algorithm
Put the starting vertex x in QUEUE
Dist[x] = 0
While QUEUE is not empty
    Extract the first vertex y in QUEUE
    Mark y as seen
    For all (y, z) in E
        If z is not marked and not in QUEUE
            Dist[z] = Dist[y] + 1
            Add z to the end of QUEUE
For all unmarked vertices i
    Dist[i] = $\infty$
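The modified BFS can be sketched as follows. The digraph is again an assumption: its edges are inferred from the traversal traces earlier in the lecture. The sketch initialises every distance to infinity up front, so a finite entry doubles as the "marked or in QUEUE" test. The names `GRAPH` and `bfs_distances` are illustrative.

```python
import math
from collections import deque

# Assumed example digraph on vertices 1..7 (edges inferred from the slide traces).
GRAPH = {1: [2, 6], 2: [3], 3: [5], 4: [2, 3], 5: [], 6: [4, 7], 7: [5]}


def bfs_distances(adj, x):
    """Return Dist, with Dist[i] the number of edges on a shortest path from x to i."""
    dist = {v: math.inf for v in adj}  # unreachable vertices keep infinity
    dist[x] = 0
    queue = deque([x])
    while queue:
        y = queue.popleft()
        for z in adj[y]:
            if dist[z] == math.inf:    # z not yet marked and not in QUEUE
                dist[z] = dist[y] + 1
                queue.append(z)
    return dist


from_1 = bfs_distances(GRAPH, 1)
from_3 = bfs_distances(GRAPH, 3)
```

Starting from vertex 1, vertex 5 is at distance 3, in agreement with the shortest-path slide. Starting from vertex 3 in this assumed graph, only vertex 5 is reachable and every other distance stays infinite.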
53
Example with starting vertex 3
Dist[3] = 0
Dist[5] = Dist[3] + 1 = 1
Dist[6] = Dist[3] + 1 = 1
54
Dist[4] = Dist[5] + 1 = 2
Dist[2] = Dist[4] + 1 = 3
Dist[1] = $\infty$
55
All shortest paths?
BFS from 1 (red edges)
[Figure legend: edges not in any shortest path from 1; edges in a shortest path but not in the BFS tree]
Shortest (1,6)-paths: 1246, 1346, 1356
56
Number of shortest paths from v to w of length k: N(v,w,k)
N(v,v,0) = 1: one shortest path from v to v
For w at distance k+1 from v, N(v,w,k+1) = sum of N(v,u,k) over all neighbours u of w at distance k from v
E.g., N(1,1,0) = 1, N(1,2,1) = 1, N(1,3,1) = 1, N(1,4,2) = N(1,2,1) + N(1,3,1) = 2, etc.
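The recurrence for N can be evaluated during a single BFS. In the sketch below the undirected edge set is an assumption, chosen to be consistent with the three shortest (1,6)-paths listed on the previous slide. The original figure may contain further edges that lie on no shortest path from vertex 1. The names `EDGES` and `count_shortest_paths` are illustrative.

```python
from collections import deque

# Assumed undirected example graph, consistent with the paths 1246, 1346, 1356.
EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 6), (5, 6)]


def count_shortest_paths(edges, v):
    """Return (dist, count): count[w] is the number of shortest paths from v to w."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    dist = {v: 0}
    count = {v: 1}                     # N(v, v, 0) = 1
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, []):
            if w not in dist:          # first time w is reached
                dist[w] = dist[u] + 1
                count[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                count[w] += count[u]   # u is a neighbour of w one level closer to v
    return dist, count


dist, count = count_shortest_paths(EDGES, 1)
```

BFS removes vertices from the queue in order of distance, so `count[u]` is already final when `u` is processed. The result gives 2 shortest paths to vertex 4 and 3 to vertex 6, matching the slides.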
57
More Shortest Paths Algorithms
Weighted shortest paths: edges have different weights and the length of a path is the sum of the weights of the edges in the path (Dijkstra’s Algorithm)
All pairs shortest paths: find the shortest distances/paths for all pairs of vertices (Floyd-Warshall Algorithm)
58
Diameter
Diameter measures how “well” a graph is connected
The smaller the “better”
A directed graph which is not strongly connected has a diameter of $\infty$
The WWW graph is neither strongly connected nor weakly connected. Why? Some web pages are not hyperlinked with the rest of the WWW
59
Revised Diameter Measure
To measure the longest distance between any two “connected” pairs of vertices
Let G = (V, E) be the WWW graph
Let P be the set of all triples (u, v, k), where u, v in V and $0 \le k = \delta(u, v) < \infty$ (v is reachable from u)
P is the set of finite-distance vertex pairs
The revised diameter is $D^*(G) = \max_{(u, v, k) \in P} \delta(u, v)$
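The revised diameter can be computed by running a BFS from every vertex and keeping only the finite distances. The digraph is an assumption, inferred from the traversal traces earlier in the lecture. It is not strongly connected, so its ordinary diameter is infinite. The names `GRAPH`, `finite_distances` and `revised_diameter` are illustrative.

```python
from collections import deque

# Assumed example digraph on vertices 1..7 (edges inferred from the slide traces).
GRAPH = {1: [2, 6], 2: [3], 3: [5], 4: [2, 3], 5: [], 6: [4, 7], 7: [5]}


def finite_distances(adj):
    """Return P as a dict {(u, v): k} over all pairs with finite distance k >= 0."""
    p = {}
    for u in adj:
        dist = {u: 0}
        queue = deque([u])
        while queue:                   # BFS from u over outgoing edges
            y = queue.popleft()
            for z in adj[y]:
                if z not in dist:
                    dist[z] = dist[y] + 1
                    queue.append(z)
        for v, k in dist.items():      # only reachable vertices appear in dist
            p[(u, v)] = k
    return p


def revised_diameter(adj):
    """D*(G): the largest finite distance between any ordered pair of vertices."""
    return max(finite_distances(adj).values())


P = finite_distances(GRAPH)
d_star = revised_diameter(GRAPH)
```

Running one BFS per vertex costs time proportional to n(n + m) for n vertices and m edges. That is fine for teaching examples but too slow for a graph the size of the WWW.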
60
Average Distance An estimate of the average length of the shortest path from a vertex to another one Small world. Six degrees of separation is the theory that everyone and everything is six or fewer steps away. There are several possible ways to estimate average distance….
61
Average Distance Traditional definition (all pairs)
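Suppose a graph has n vertices. There are n(n-1) ordered pairs of distinct vertices (u, v), giving n(n-1) shortest paths
The average distance is the mean of $\delta(u, v)$ over these pairs: $\frac{1}{n(n-1)} \sum_{u \ne v} \delta(u, v)$
Definition useful only for connected undirected graphs and strongly connected directed graphs; not applicable to the WWW graph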
62
Revised Average Distance
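Only count the pairs of reachable vertices.
Let P be the set of all triples (u, v, k) for which u, v in V and $0 < k = \delta(u, v) < \infty$, i.e. distinct ordered pairs with v reachable from u
|P| is the size of P, which is at most n(n-1)
The revised average distance is $\frac{1}{|P|} \sum_{(u, v, k) \in P} k$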
63
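Example
[Table: $\delta(u, v)$ with rows u and columns v, vertices 1 to 6; entries missing from this transcript]
|P| = 13
Revised average distance = (sum of the 13 finite distances) / 13 = 20 / 13 = 1.538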
64
Revised Average Distance
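Average of the reciprocals of the distances: $\frac{1}{\text{average distance}} = \frac{1}{n(n-1)} \sum_{u \ne v} \frac{1}{\delta(u, v)}$
Infinite distance contributes zero to the sum
No need to distinguish between reachable and unreachable pairs
Don’t include the distance from a vertex to itself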
65
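Example
[Table: $\delta(u, v)$ with rows u and columns v, vertices 1 to 6; entries missing from this transcript]
The reciprocals $1/\delta(u, v)$ are summed over the ordered pairs and divided by n(n-1) = 30; the individual terms are missing from this transcript
Revised average distance = 3.06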
66
Exercise
Digraph with edges (a,b), (a,c), (b,c)
1. All pairs average distance in the underlying graph
2. Average distance in the digraph for reachable pairs
3. Average distance in the digraph using reciprocal distances
Comment?
Answers: 1, 1, 2
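The three averages in the exercise can be checked with a short sketch. It assumes that the reciprocal definition means n(n-1) divided by the sum of $1/\delta(u, v)$ over ordered pairs, with unreachable pairs contributing zero. That reading reproduces the stated answer of 2. All function names are illustrative.

```python
from collections import deque


def all_distances(vertices, edges, directed=True):
    """Return {(s, t): k} for every ordered pair s != t with finite distance k."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)
    dist = {}
    for s in vertices:
        d = {s: 0}
        queue = deque([s])
        while queue:                       # BFS from s
            y = queue.popleft()
            for z in adj[y]:
                if z not in d:
                    d[z] = d[y] + 1
                    queue.append(z)
        for t, k in d.items():
            if t != s:
                dist[(s, t)] = k
    return dist


def average_all_pairs(vertices, edges):
    """Traditional definition on the underlying graph (assumes it is connected)."""
    n = len(vertices)
    d = all_distances(vertices, edges, directed=False)
    return sum(d.values()) / (n * (n - 1))


def average_reachable(vertices, edges):
    """Mean distance over the ordered pairs (u, v) with v reachable from u."""
    d = all_distances(vertices, edges)
    return sum(d.values()) / len(d)


def average_reciprocal(vertices, edges):
    """n(n-1) divided by the sum of 1/distance; unreachable pairs contribute 0."""
    n = len(vertices)
    d = all_distances(vertices, edges)
    return n * (n - 1) / sum(1 / k for k in d.values())


V = ["a", "b", "c"]
E = [("a", "b"), ("a", "c"), ("b", "c")]
answers = (average_all_pairs(V, E), average_reachable(V, E), average_reciprocal(V, E))
```

The reachable-pairs average ignores the three unreachable ordered pairs. The reciprocal average penalises them, which is why it comes out larger.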
67
Connectivity in Digraphs
A graph is connected if there is a path between every pair of vertices
A digraph is strongly connected if there is a directed path between every pair of vertices
A digraph is weakly connected if the underlying graph is connected
Underlying graph: ignore edge orientations
68
Find the strongly connected components (SCC) of a digraph
FAN-OUT(v): set of vertices obtained by BFS from v using edges directed out of vertices
FAN-IN(v): set of vertices obtained by BFS from v using edges directed into vertices
SCC(v) is the intersection of FAN-OUT(v) and FAN-IN(v)
69
Example
FO(1) = {1,2,3,4,6}, FI(1) = {1,2,3,4,5}
SCC(1) = {1,2,3,4}
SCC(6) = {6}
70
How many SCC? There are 3 SCC, {1,2,3,4}, {5} ,{6}
71
Examples
Digraph G = (V, E) with edges
(3,1) (6,1) (9,2) (10,2) (6,3) (8,3) (1,4) (8,4) (5,10) (6,5) (9,5) (5,7) (1,8) (4,9) (7,9) (8,9) (9,10)
Draw G
Find the SCC
Exercise. Digraph V = {1, 2, …, 10}, Edges 5->1, 8->1, 5->3, 8->3, 5->10, 6->10, 4->7, 7->10, 8->10, 5->9, 7->9
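The FAN-OUT/FAN-IN method can be sketched in a few lines and applied to the first edge list above. The sketch assumes the vertex set is 1 to 10, as in the exercise. The function names are illustrative, and this is the simple intersection method from the lecture, not a linear-time SCC algorithm.

```python
from collections import deque

EDGES = [(3, 1), (6, 1), (9, 2), (10, 2), (6, 3), (8, 3), (1, 4), (8, 4),
         (5, 10), (6, 5), (9, 5), (5, 7), (1, 8), (4, 9), (7, 9), (8, 9), (9, 10)]


def reachable(adj, v):
    """Set of vertices reached by BFS from v, following the given adjacency lists."""
    seen = {v}
    queue = deque([v])
    while queue:
        y = queue.popleft()
        for z in adj.get(y, ()):
            if z not in seen:
                seen.add(z)
                queue.append(z)
    return seen


def strongly_connected_components(vertices, edges):
    """SCC(v) = FAN-OUT(v) & FAN-IN(v), computed for each vertex not yet assigned."""
    out_adj, in_adj = {}, {}
    for u, v in edges:
        out_adj.setdefault(u, []).append(v)  # edges directed out of u
        in_adj.setdefault(v, []).append(u)   # edges directed into v
    components = []
    assigned = set()
    for v in vertices:
        if v in assigned:
            continue
        scc = reachable(out_adj, v) & reachable(in_adj, v)
        components.append(sorted(scc))
        assigned |= scc
    return components


sccs = strongly_connected_components(range(1, 11), EDGES)
```

Each component costs two BFS runs, so the worst case is quadratic in the size of the graph. Single-pass algorithms such as Tarjan's avoid this, but they are not covered on these slides.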
72
Draw G
73
SCC SCC are
74
Demo (connect-scc.r) Examples of G(n,p) and PA connectivity