Published by Stephen Stafford. Modified over 6 years ago.
1
The Efficacy of Collusions in Web Ranking and the Countermeasurements
Hui Zhang University of Southern California
2
Outline
Problem statement. PageRank algorithm: a brief introduction. Study of PageRank's robustness to collusion. Adaptive-resetting: making PageRank robust to collusion. Conclusions. 10/14/2018 USC CS599 P2Peco
3
Search Engine Optimization (SEO)
As in other research on P2P rating, our research goal …
4
Web spam [Gyöngyi et al. 2004]
Web spamming refers to actions intended to mislead search engines and give some pages a higher ranking than they deserve.
A spammer manipulates the two factors that decide the rank score of a page for a query:
Relevance – textual similarity between the query and a page.
Importance – the global popularity of a page, which is query-independent.
5
Collusion in Web ranking
A manipulation of the hyperlink structure by a group of users with the intention of improving the rating of one or more users in the group.
6
PageRank [Brin1998]
An eigenvector-based rating scheme to rank hypertext documents on the WWW.
An iterative algorithm to calculate the importance of a web page based on the importance of its parent pages.
Can be applied to systems other than the WWW.
7
PageRank: random walk model
With probability (1 − ε), the walker continues the walk to a random successor node.
With probability ε, the walker restarts the walk at a random node.
ε: resetting probability.
[Figure: the walker on a graph of nodes X, Y and Z joined by referential links, with transition probabilities 1/2 and 1/3 labelled.]
As time goes on, the expected percentage of steps the walker spends at each node v converges to the PageRank weight PR(v).
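The random-walk model can be sketched as a short power iteration. This is a minimal illustration, not the code behind the slides: the function name `pagerank`, the three-node graph, and the handling of nodes without out-links (they restart at a random node) are assumptions made for the example.

```python
def pagerank(out_links, eps=0.15, iters=100):
    """Power iteration for the random-walk model with reset probability eps.

    out_links maps each node to the list of nodes it links to.
    """
    nodes = list(out_links)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # With probability eps the walker restarts at a uniformly random node.
        nxt = {v: eps / n for v in nodes}
        for u in nodes:
            succ = out_links[u]
            if succ:
                # With probability 1 - eps it moves to a random successor.
                share = (1.0 - eps) * pr[u] / len(succ)
                for v in succ:
                    nxt[v] += share
            else:
                # Assumption: a node without out-links restarts at a random node.
                share = (1.0 - eps) * pr[u] / n
                for v in nodes:
                    nxt[v] += share
        pr = nxt
    return pr


# Hypothetical three-node graph, not the one drawn on the slide.
graph = {"X": ["Y", "Z"], "Y": ["X"], "Z": ["X", "Y"]}
weights = pagerank(graph)
for node, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(node, round(weight, 4))
```

The weights always sum to 1, so each value can be read as the long-run fraction of steps the walker spends at that node.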
8
PageRank: is it collusion-proof?
Can a node easily boost its rank by coordinating its outgoing links with those of other nodes?
[Figure callout: "I'm not colluding!"]
9
Amp(G): a metric on group collusion
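[Figure: a node group G with a subgroup G' = {i, j} and nodes x and y.]
ε: resetting probability.
Real group weight: $W_G(G') = PR(i) + PR(j)$.
"Actual" group weight: $W_{in}(G')$. On the slide this combines link terms in $(1-\varepsilon)\,PR(x)$ and $(1-\varepsilon)\,PR(y)$, with fractions built from 2, 3 and 4, and a reset term involving $\varepsilon$, $2/N$ and $(1 - W(G'))$; the exact arrangement of these terms is not recoverable from the transcript.
In the system of node group G, for a subgroup G', the amplification factor Amp(G') = [expression missing in the transcript].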
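To make the idea of an amplification factor concrete, here is a small sketch under one explicit assumption: Amp(G') is taken as the group's total PR weight divided by the weight that enters the group from outside, where the incoming weight is the link weight from non-members plus the reset weight ε·|G'|/N. That reading, the helper names, and the five-node graph are assumptions for illustration; the slide's own formula may differ in detail.

```python
def pagerank(out_links, eps=0.15, iters=200):
    """Power iteration with reset probability eps (every node needs an out-link)."""
    nodes = list(out_links)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: eps / n for v in nodes}
        for u in nodes:
            share = (1.0 - eps) * pr[u] / len(out_links[u])
            for v in out_links[u]:
                nxt[v] += share
        pr = nxt
    return pr


def amplification(out_links, group, eps=0.15):
    """Assumed reading: total group weight / weight flowing in from outside."""
    pr = pagerank(out_links, eps)
    group = set(group)
    w_group = sum(pr[v] for v in group)
    # Link weight that non-members send into the group.
    link_in = 0.0
    for u, succ in out_links.items():
        if u in group:
            continue
        inside = sum(1 for v in succ if v in group)
        link_in += (1.0 - eps) * pr[u] * inside / len(succ)
    # Reset weight landing on the group's members.
    reset_in = eps * len(group) / len(out_links)
    return w_group / (link_in + reset_in)


# Hypothetical graph: i and j link only to each other; a, b, c behave normally.
graph = {"a": ["b", "i"], "b": ["c", "j"], "c": ["a"], "i": ["j"], "j": ["i"]}
amp_colluding = amplification(graph, ["i", "j"])
amp_honest = amplification(graph, ["a", "b"])
print(amp_colluding, amp_honest)
```

Under this assumed definition, a group that keeps all of its out-links internal recycles every unit of incoming weight until the next reset, which is why its factor approaches 1/ε.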
10
Answer for (1+1 = ?) in PageRank
In the original PageRank system, [formula missing in the transcript], where ε is the resetting probability.
11
Two experimental topologies
W, a Web link topology: contains the link structure of upwards of 80 million URLs. Source: the Stanford WebBase.
B, a weblog blogrolling topology: contains the blogrolling structure of upwards of 72,000 blogs. Source: the XML-RPC weblog service.
12
Experiment 1: Collusion200
Model a small number of web pages colluding simultaneously.
Methodology:
100 colluding groups, 200 nodes in total;
each colluding group is a circle (ring) topology consisting of two nodes with adjacent ranks;
node pairs are chosen arbitrarily from those originally ranked around the 1000th, 2000th, … positions (100th, 200th, …, 10000th for B due to the smaller graph size);
ε = 0.15.
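A scaled-down sketch of this kind of experiment is shown here. It assumes that "colluding" means the two nodes drop their other out-links and point only at each other; the 30-node graph, its link rule, and the helper names are hypothetical and stand in for the real W and B topologies.

```python
def pagerank(out_links, eps=0.15, iters=100):
    """Power iteration with reset probability eps (every node needs an out-link)."""
    nodes = list(out_links)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: eps / n for v in nodes}
        for u in nodes:
            share = (1.0 - eps) * pr[u] / len(out_links[u])
            for v in out_links[u]:
                nxt[v] += share
        pr = nxt
    return pr


def rank_of(pr, node):
    """1-based position of node when nodes are sorted by descending PR weight."""
    return sorted(pr, key=pr.get, reverse=True).index(node) + 1


# Hypothetical 30-node graph; the link rule is arbitrary and only for illustration.
n = 30
graph = {k: sorted({(k + 1) % n, (k * k + 2) % n} - {k}) for k in range(n)}

before = pagerank(graph)
order = sorted(before, key=before.get, reverse=True)
a, b = order[9], order[10]  # two nodes with adjacent ranks (10th and 11th)

# Assumed collusion: both nodes keep only a link to each other (a 2-node ring).
colluding = dict(graph)
colluding[a], colluding[b] = [b], [a]
after = pagerank(colluding)

for node in (a, b):
    print(node, "rank", rank_of(before, node), "->", rank_of(after, node))
```

The pair's combined weight can only grow under this rewiring, because a walker that enters the ring now stays there until the next reset.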
13
Experiment result of Collusion200 (I)
Figure 1: W – amplification factors of the colluding groups in Collusion200.
14
Experiment result of Collusion200 (III)
Figure 2: W – new PR rank after Collusion200.
Annotated examples: old rank [missing] → new rank 5038th; old rank 10001st → new rank 450th; old rank 1005th → new rank 67th.
15
There is a long flat portion…
Figure 3: The PR weight distribution of 4 topologies.
16
Next step: how to detect collusions?
Identifying colluding groups is unlikely to be computationally tractable. Related hard problems:
The densest k-subgraph problem [Feige et al. 1997].
The classical CLIQUE problem.
The problem of finding large hidden cliques in random graphs [Juels 1998].
17
Hardness on Amp
Theorem on hardness: computing $\max_{G' \subseteq G} Amp(G')$ is an NP-hard problem.
18
How about using finer statistics of the random walk
The revisit intervals of the random walk on a colluding node are likely to have a large variance compared with their expectation.
Figure E: a counterexample, a star + dangling circle topology (nodes labelled 1, 2, N−2, N−1, N, N+1).
19
An observation on collusion behaviors
To increase their PR weight, i.e., the stationary weight in the random walk, the colluding nodes stall the random walk.
[Figure: group G with colluding subgroup G'.]
When the resetting probability ε increases, the colluding nodes must suffer a significant drop in PR weight.
Therefore, we expect the PR weight of colluding nodes to be highly correlated with 1/ε (the average walk length), while that of non-colluding nodes is relatively insensitive to changes in ε.
20
An intuitive example
[Figure legend: node, referential link]
21
An intuitive example
[Figure legend: node, referential link, a colluding group]
22
An intuitive example
[Figure legend: node, referential link, a colluding group; nodes x and y are labelled.]
A colluding node x: PR(x) = [expression missing], and co-co(PR(x), 1/ε) ≈ 1. (co-co: correlation coefficient)
A non-colluding node y: PR(y) = [expression missing], and co-co(PR(y), 1/ε) ≈ 0.
N: the system size; K: the colluding group size; K << N.
23
Adaptive-resetting scheme
Part I – collusion detection:
Given the topology, calculate the PR vector under different ε values: {ε} = {0.0375, 0.05, 0.075, 0.15, 0.3, 0.45, 0.6}, with default ε = 0.15.
For each node x, calculate the correlation coefficient between the curve of its PR weight and the curve of 1/ε. Label it co-co(x).
Part II – personalization:
Calculate each node x's out-link personalized ε = F(ε_default, co-co(x)).
Exponential function: F_Exp = [expression missing].
Linear function: F_Linear = ε_default + (0.5 − ε_default) · co-co(x).
The final PR weight vector is calculated with these personalized resetting values.
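The two parts of the scheme can be sketched end to end on a toy graph. This is an illustration under stated assumptions, not the authors' implementation: the correlation coefficient is taken to be Pearson's, a node's personalized ε is read as the reset probability applied when the walker is at that node, negative co-co values are clamped to 0 so that ε never falls below the default, and only the linear F is used. The graph and all function names are hypothetical.

```python
from math import sqrt

EPS_GRID = [0.0375, 0.05, 0.075, 0.15, 0.3, 0.45, 0.6]  # values listed on the slide
DEFAULT_EPS = 0.15


def pagerank(out_links, eps, iters=500):
    """Power iteration where eps[u] is the reset probability used at node u.

    Assumes every node has at least one out-link.
    """
    nodes = list(out_links)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Weight that restarts at a uniformly random node in this step.
        reset_mass = sum(pr[u] * eps[u] for u in nodes)
        nxt = {v: reset_mass / n for v in nodes}
        for u in nodes:
            share = (1.0 - eps[u]) * pr[u] / len(out_links[u])
            for v in out_links[u]:
                nxt[v] += share
        pr = nxt
    return pr


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def adaptive_resetting(out_links):
    nodes = list(out_links)
    # Part I: PR curve of every node across the eps grid, correlated with 1/eps.
    curves = {v: [] for v in nodes}
    for e in EPS_GRID:
        pr = pagerank(out_links, {v: e for v in nodes})
        for v in nodes:
            curves[v].append(pr[v])
    inv_eps = [1.0 / e for e in EPS_GRID]
    coco = {v: pearson(curves[v], inv_eps) for v in nodes}
    # Part II: linear personalization; clamping at 0 is an assumption.
    personal = {
        v: DEFAULT_EPS + (0.5 - DEFAULT_EPS) * max(0.0, coco[v]) for v in nodes
    }
    return coco, personal, pagerank(out_links, personal)


# Hypothetical toy graph: an honest ring h0..h7 plus a colluding pair i <-> j.
graph = {f"h{k}": [f"h{(k + 1) % 8}"] for k in range(8)}
graph["h0"] += ["i", "j"]
graph["i"], graph["j"] = ["j"], ["i"]

baseline = pagerank(graph, {v: DEFAULT_EPS for v in graph})
coco, personal, adjusted = adaptive_resetting(graph)
print("co-co(i) =", coco["i"], " co-co(h1) =", coco["h1"])
print("PR(i):", baseline["i"], "->", adjusted["i"])
```

In this toy graph the colluding pair's weight rises as ε falls, so its co-co is positive and it receives a larger personalized ε, which in turn lowers its final PR weight relative to the default run.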
24
Experiment result of Collusion200 (IV)
Figure 5: W – amplification factors of the colluding groups in Collusion200.
25
Experiment result of Collusion200 (VI)
Figure 6: W – new PR rank after Collusion200.
26
Experiment 2: Collusion22
Model various colluding subgraphs.
Methodology: 3 colluding groups (100th, 200th, …, 10000th for B due to the smaller graph size):
G1: 10-node ring
G2: 10-node star topology
G3: 2-node ring
[Figure legend: node, referential link]
27
Experiment result of Collusion22 (I)
Figure 7: Amplification factors of the 3 colluding groups in Collusion22.
28
Experiment result of Collusion22 (II)
Figure 8: W – new PR weight after Collusion22.
29
New top-25 URL list in W
[Table legend: Dropped out, Dropping, New]
30
Conclusions
Simple collusions lead to effective Web ranking improvement.
A simple scheme based on the PageRank algorithm effectively counteracts Web ranking collusions.
31
Backup slides
32
Reputation systems [Okita2003]
A means of describing social trust networks. The basic concept is a democratic meritocracy.
A rating system is used to evaluate individual members, and those results are then collated to produce a consensus about the merit of any given member.
Examples: Livejournal, Friendster, eBay, Advogato.
33
PageRank algorithm [Brin1998]
Assume N pages. Assign all pages the initial value 1/N.
Let $N_u$ be the out-degree of page u, Rank(v) the importance of page v, and $B_v$ the set of pages pointing to v.
Basic algorithm: for every page v, $Rank(v) = \sum_{u \in B_v} Rank(u) / N_u$.
Enhanced algorithm against rank sinks: for every page v, $Rank(v) = (1 - d)/N + d \sum_{u \in B_v} Rank(u) / N_u$, where d is the damping factor.
34
Co-co distribution in real-world graphs
Figure 4: the co-co PDF distribution in W and B. The [0, 0.1] range actually corresponds to the [-1, 0.1] range.
35
Experiment result of Collusion200 (II)
Figure A: W – new PR weight after Collusion200.
36
Experiment result of Collusion200 (VII)
Figure B: B – new PR rank after Collusion200.
37
Experiment result of Collusion200 (X)
Figure C: B – new PR weight after Collusion200.
38
Experiment result of Collusion200 (V)
Figure 6: W – new PR weight after Collusion200.
39
Correlation coefficient
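The body of this slide is not preserved in the transcript. Assuming the deck means the standard Pearson correlation coefficient, co-co(x) for a node x would be computed from its PR weights $p_k$ at the reset values $\varepsilon_k$ and the matching values $q_k = 1/\varepsilon_k$ as follows. The symbols $p_k$, $q_k$ and $m$ (the number of reset values tried) are introduced here for illustration.

```latex
\[
\text{co-co}(x) =
\frac{\sum_{k=1}^{m} \bigl(p_k - \bar{p}\bigr)\bigl(q_k - \bar{q}\bigr)}
     {\sqrt{\sum_{k=1}^{m} \bigl(p_k - \bar{p}\bigr)^2}\;
      \sqrt{\sum_{k=1}^{m} \bigl(q_k - \bar{q}\bigr)^2}},
\qquad
p_k = PR_{\varepsilon_k}(x), \quad q_k = \frac{1}{\varepsilon_k},
\]
```

Here $\bar{p}$ and $\bar{q}$ are the means of the two series. The value lies in [-1, 1], with values near 1 indicating that the node's weight grows almost linearly with the average walk length 1/ε.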
40
Experiment result of Collusion22 (III)
Figure D: W – new PR rank after Collusion22.