Download presentation
Presentation is loading. Please wait.
1
Link Analysis
2
2 HITS - Kleinberg’s Algorithm HITS – Hypertext Induced Topic Selection For each vertex v Є V in a subgraph of interest: A site is very authoritative if it receives many citations. Citation from important sites weight more than citations from less-important sites Hubness shows the importance of a site. A good hub is a site that links to many authoritative sites a(v) - the authority of v h(v) - the hubness of v
3
3 Authority and Hubness 2 3 4 1 1 5 6 7 a(1) = h(2) + h(3) + h(4) h(1) = a(5) + a(6) + a(7)
4
4 Authority and Hubness Convergence Recursive dependency : a(v) Σ h(w) h(v) Σ a(w) Using Linear Algebra, we can prove: w Є pa[v] w Є ch[v] a(v) and h(v) converge
5
5 Authority and Hub Pages Matrix representation of operations I and O. Let A be the adjacency matrix of SG: entry (p, q) is 1 if p has a link to q, else the entry is 0. Let A T be the transpose of A. Let h i be vector of hub scores after i iterations. Let a i be the vector of authority scores after i iterations. Operation I: a i = A T h i-1 Operation O: h i = A a i Normalize after every multiplication
6
6 Does the procedure converge? x x2x2 xkxk As we multiply repeatedly with M, the component of x in the direction of principal eigen vector gets stretched wrt to other directions.. So we converge finally to the direction of principal eigenvector Necessary condition: x must have a component in the direction of principal eigen vector (c 1 must be non-zero) The rate of convergence depends on the “eigen gap”
7
7 HITS Example {1, 2, 3, 4} - nodes relevant to the topic Expand the root set R to include all the children and a fixed number of parents of nodes in R A new set S (base subgraph) Start with a root set R {1, 2, 3, 4} Find a base subgraph:
8
8 HITS Example BaseSubgraph( R, d) 1. S r 2. for each v in R 3. do S S U ch[v] 4. P pa[v] 5. if |P| > d 6. then P arbitrary subset of P having size d 7. S S U P 8. return S
9
9 HITS Example HubsAuthorities(G) 1 1 [1,…,1] Є R 2 a h 1 3 t 1 4 repeat 5 for each v in V 6 do a (v) Σ h (w) 7 h (v) Σ a (w) 8 a a / || a || 9 h h / || h || 10 t t + 1 11 until || a – a || + || h – h || < ε 12 return (a, h ) Hubs and authorities: two n-dimensional a and h 00 t t t t t t t t t t tt t -1 w Є pa[v] |V|
10
10 HITS Example Results Authority Hubness 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Authority and hubness weights
11
11 PageRank Introduced by Page et al (1998) The weight is assigned by the rank of parents Difference with HITS HITS takes Hubness & Authority weights The page rank is proportional to its parents’ rank, but inversely proportional to its parents’ outdegree
12
12 Matrix Notation P is the adjacency matrix based on the link structure of the Internet.
13
13 Problem “Rank Sink” Problem In general, many Web pages have no inlinks/outlinks It results in dangling edges in the graph E is the matrix of all ones Interpretation: In each time step the surfer visiting a page will jump to a random page With a probability
14
14 Stability Whether the link analysis algorithms based on eigenvectors are stable in the sense that results don’t change significantly? The connectivity of a portion of the graph is changed arbitrary How will it affect the results of algorithms?
15
15 Stability of HITS δ : eigengap λ 1 – λ 2 d: maximum outdegree of G Ng et al (2001) A bound on the number of hyperlinks k that can added or deleted from one page without affecting the authority or hubness weights It is possible to perturb a symmetric matrix by a quantity that grows as δ that produces a constant perturbation of the dominant eigenvector
16
16 Stability of PageRank The parameter ε of the mixture model has a stabilization role If the set of pages affected by the perturbation have a small rank, the overall change will also be small tighter bound by Bianchini et al (2001) δ(j) >= 2 depends on the edges incident on j Ng et al (2001) V: the set of vertices touched by the perturbation
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.