Presentation is loading. Please wait.

Presentation is loading. Please wait.

Link Analysis. 2 HITS - Kleinberg’s Algorithm HITS – Hypertext Induced Topic Selection For each vertex v Є V in a subgraph of interest: A site is very.

Similar presentations


Presentation on theme: "Link Analysis. 2 HITS - Kleinberg’s Algorithm HITS – Hypertext Induced Topic Selection For each vertex v Є V in a subgraph of interest: A site is very."— Presentation transcript:

1 Link Analysis

2 2 HITS - Kleinberg’s Algorithm HITS – Hypertext Induced Topic Selection For each vertex v Є V in a subgraph of interest: A site is very authoritative if it receives many citations. Citation from important sites weight more than citations from less-important sites Hubness shows the importance of a site. A good hub is a site that links to many authoritative sites a(v) - the authority of v h(v) - the hubness of v

3 3 Authority and Hubness 2 3 4 1 1 5 6 7 a(1) = h(2) + h(3) + h(4) h(1) = a(5) + a(6) + a(7)

4 4 Authority and Hubness Convergence Recursive dependency : a(v)  Σ h(w) h(v)  Σ a(w) Using Linear Algebra, we can prove: w Є pa[v] w Є ch[v] a(v) and h(v) converge

5 5 Authority and Hub Pages Matrix representation of operations I and O. Let A be the adjacency matrix of SG: entry (p, q) is 1 if p has a link to q, else the entry is 0. Let A T be the transpose of A. Let h i be vector of hub scores after i iterations. Let a i be the vector of authority scores after i iterations. Operation I: a i = A T h i-1 Operation O: h i = A a i Normalize after every multiplication

6 6 Does the procedure converge? x x2x2 xkxk As we multiply repeatedly with M, the component of x in the direction of principal eigen vector gets stretched wrt to other directions.. So we converge finally to the direction of principal eigenvector Necessary condition: x must have a component in the direction of principal eigen vector (c 1 must be non-zero) The rate of convergence depends on the “eigen gap”

7 7 HITS Example {1, 2, 3, 4} - nodes relevant to the topic Expand the root set R to include all the children and a fixed number of parents of nodes in R  A new set S (base subgraph)  Start with a root set R {1, 2, 3, 4} Find a base subgraph:

8 8 HITS Example BaseSubgraph( R, d) 1. S  r 2. for each v in R 3. do S  S U ch[v] 4. P  pa[v] 5. if |P| > d 6. then P  arbitrary subset of P having size d 7. S  S U P 8. return S

9 9 HITS Example HubsAuthorities(G) 1 1  [1,…,1] Є R 2 a  h  1 3 t  1 4 repeat 5 for each v in V 6 do a (v)  Σ h (w) 7 h (v)  Σ a (w) 8 a  a / || a || 9 h  h / || h || 10 t  t + 1 11 until || a – a || + || h – h || < ε 12 return (a, h ) Hubs and authorities: two n-dimensional a and h 00 t t t t t t t t t t tt t -1 w Є pa[v] |V|

10 10 HITS Example Results Authority Hubness 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Authority and hubness weights

11 11 PageRank Introduced by Page et al (1998)  The weight is assigned by the rank of parents Difference with HITS  HITS takes Hubness & Authority weights  The page rank is proportional to its parents’ rank, but inversely proportional to its parents’ outdegree

12 12 Matrix Notation P is the adjacency matrix based on the link structure of the Internet.

13 13 Problem “Rank Sink” Problem  In general, many Web pages have no inlinks/outlinks  It results in dangling edges in the graph E is the matrix of all ones Interpretation: In each time step the surfer visiting a page will jump to a random page With a probability

14 14 Stability Whether the link analysis algorithms based on eigenvectors are stable in the sense that results don’t change significantly? The connectivity of a portion of the graph is changed arbitrary  How will it affect the results of algorithms?

15 15 Stability of HITS δ : eigengap λ 1 – λ 2 d: maximum outdegree of G Ng et al (2001) A bound on the number of hyperlinks k that can added or deleted from one page without affecting the authority or hubness weights It is possible to perturb a symmetric matrix by a quantity that grows as δ that produces a constant perturbation of the dominant eigenvector

16 16 Stability of PageRank The parameter ε of the mixture model has a stabilization role If the set of pages affected by the perturbation have a small rank, the overall change will also be small tighter bound by Bianchini et al (2001) δ(j) >= 2 depends on the edges incident on j Ng et al (2001) V: the set of vertices touched by the perturbation


Download ppt "Link Analysis. 2 HITS - Kleinberg’s Algorithm HITS – Hypertext Induced Topic Selection For each vertex v Є V in a subgraph of interest: A site is very."

Similar presentations


Ads by Google