1 DiffusionRank: A Possible Penicillin for Web Spamming Haixuan Yang Group Meeting Jan. 16, 2006.



2 Outline
- Introduction
- DiffusionRank
  - Model establishment
  - Computation considerations
  - Discussion of γ
- Results
- Conclusions

3 Introduction: PageRank
- PageRank tries to determine the importance of a Web page from the link structure.
- The importance of a page i is defined recursively in terms of the pages that point to it: [equation not preserved in the transcript]
- It has proved effective for ranking Web pages.
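
As a concrete illustration of the recursive definition, here is a minimal power-iteration sketch. It is not the presenter's code: the damping value `alpha=0.85`, the uniform treatment of pages without out-links, and the three-node `graph` are all assumptions made for the example.

```python
def pagerank(out_links, alpha=0.85, iters=100):
    """Power iteration for PageRank on a dict {node: [targets]}.

    alpha is the probability of following a link (assumed conventional value);
    pages without out-links spread their rank uniformly over all pages.
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every page starts each round with the teleportation share.
        new = {v: (1.0 - alpha) / n for v in nodes}
        for j in nodes:
            targets = out_links[j]
            if targets:
                # Page j splits its vote equally among the pages it links to.
                share = alpha * rank[j] / len(targets)
                for i in targets:
                    new[i] += share
            else:
                share = alpha * rank[j] / n
                for i in nodes:
                    new[i] += share
        rank = new
    return rank


# Hypothetical toy graph: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1
graph = {1: [2, 3], 2: [3], 3: [1]}
ranks = pagerank(graph)
```

Page 3 receives votes from both other pages and therefore ends up ranked above page 2, which is voted for only by page 1.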

4 Introduction: PageRank
Two problems:
- Incomplete information about the Web structure. Solution: predict the Web structure as a random graph.
- Web pages manipulated by people for commercial interests.
  - About 70% of all pages in the .biz domain are spam.
  - About 35% of the pages in the .us domain belong to the spam category.
  - Two methods are used for manipulation: link stuffing and keyword stuffing.
  Solution: DiffusionRank.

5 An example of manipulation: the rank value of node 1 can be increased greatly!

6 Why?
Two reasons:
- Over-democratic: all pages are born equal. Every page has equal voting ability: the sum of each column of the transition matrix is equal to one.
- Input-independent: for any given non-zero initial input, the iteration converges to the same stable distribution.
Heat diffusion model: a natural way to avoid these two factors.
- Pages are not equal: some pages are born with high temperatures while others are born with low temperatures.
- Different initial temperature distributions give rise to different temperature distributions after a fixed time period.
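
Both properties can be checked on a small example. The sketch below builds a hypothetical three-page link matrix, confirms that each column sums to one, and runs a PageRank-style iteration from two different starting vectors. The helper names, the graph, and `alpha=0.85` are illustrative assumptions, not taken from the slides.

```python
def column_stochastic(n, edges):
    """Build A with A[i][j] = 1/outdeg(j) if page j links to page i, else 0."""
    outdeg = [0] * n
    for j, _ in edges:
        outdeg[j] += 1
    A = [[0.0] * n for _ in range(n)]
    for j, i in edges:
        A[i][j] = 1.0 / outdeg[j]
    return A


def iterate(A, x, alpha=0.85, steps=200):
    """Repeat x <- (1 - alpha)/n + alpha * A x, starting from a normalized x."""
    n = len(x)
    total = sum(x)
    x = [v / total for v in x]
    for _ in range(steps):
        x = [(1 - alpha) / n + alpha * sum(A[i][j] * x[j] for j in range(n))
             for i in range(n)]
    return x


# Hypothetical graph given as (source, target) pairs; every page has an out-link.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
A = column_stochastic(3, edges)

# Over-democratic: every page hands out exactly one unit of vote.
col_sums = [sum(A[i][j] for i in range(3)) for j in range(3)]

# Input-independent: two very different starting vectors reach the same result.
a = iterate(A, [1.0, 0.0, 0.0])
b = iterate(A, [0.1, 0.2, 0.7])
```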

7 DiffusionRank
On an undirected graph
- Assumption: the amount of heat flowing from j to i is proportional to the heat difference between i and j.
- Solution: [equation not preserved in the transcript]

8 DiffusionRank
On an undirected graph
- Assumption: the amount of heat flowing from j to i is proportional to the heat difference between i and j.
- Solution: [equation not preserved in the transcript]
On a directed graph
- Assumption: extra energy is imposed on the link (j, i) so that heat flows only from j to i if there is no link (i, j).
- Solution: [equation not preserved in the transcript]
On a random directed graph
- Assumption: the heat flow is proportional to the probability of the link (j, i).
- Solution: [equation not preserved in the transcript]
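
The solution formulas did not survive in this transcript. For the undirected case, the stated assumption leads to the standard heat-equation derivation sketched below. The symbols $f_i(t)$ (heat at node $i$ at time $t$), $d_i$ (degree of node $i$), and $H$ are notation assumed here and may differ from the presenter's. The directed and random-graph variants change $H$ in ways that are not reconstructed here.

```latex
% Heat balance at node i over a short interval \Delta t (undirected graph)
f_i(t+\Delta t) - f_i(t)
  = \gamma \sum_{j:(j,i)\in E} \bigl(f_j(t) - f_i(t)\bigr)\,\Delta t

% Matrix form as \Delta t \to 0
\frac{d}{dt} f(t) = \gamma H f(t), \qquad
H_{ij} =
\begin{cases}
  -d_i, & i = j \\
  1,    & (j,i) \in E \\
  0,    & \text{otherwise}
\end{cases}

% Closed-form solution: the heat kernel applied to the initial temperatures
f(t) = e^{\gamma t H} f(0)
```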

9 DiffusionRank
On a random directed graph
- Solution: [equation not preserved in the transcript]
- The initial value f(i,0) in f(0) is set to 1 if page i is trusted and 0 otherwise, with trusted pages chosen according to inverse PageRank.

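10 Computation considerations
- Approximation of the heat kernel: [equation not preserved in the transcript]
- Which N? When N ≥ 30, the real eigenvalues of [expression not preserved in the transcript] are less than 0.01; when N ≥ 100, they are less than 0.005. We use N = 100 in the paper.
- When N tends to infinity: [equation not preserved in the transcript]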
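
A minimal sketch of the kind of approximation this slide describes. Because the slide's own formula is missing, the sketch assumes the heat kernel has the form e^{γR} with R = A − I for a column-stochastic link matrix A. It applies (I + (γ/N)R) to the initial heat vector N times and compares the result with a truncated Taylor series of the exponential. The matrix `A`, the seed vector `f0`, and all function names are hypothetical.

```python
def matvec(M, v):
    """Plain matrix-vector product for lists of lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def diffuse(A, f0, gamma=1.0, N=100):
    """Apply (I + (gamma/N) * (A - I))^N to f0 using N small steps."""
    f = list(f0)
    for _ in range(N):
        Af = matvec(A, f)
        f = [fi + (gamma / N) * (afi - fi) for fi, afi in zip(f, Af)]
    return f


def diffuse_series(A, f0, gamma=1.0, terms=40):
    """Reference value: Taylor series of exp(gamma * (A - I)) applied to f0."""
    result = list(f0)
    term = list(f0)
    for k in range(1, terms):
        At = matvec(A, term)
        # term_k = (gamma / k) * (A - I) * term_{k-1}
        term = [gamma / k * (a - t) for a, t in zip(At, term)]
        result = [r + t for r, t in zip(result, term)]
    return result


# Hypothetical column-stochastic matrix for the links 0->1, 0->2, 1->2, 2->0
A = [[0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0]]
f0 = [1.0, 0.0, 0.0]  # page 0 is the only trusted (hot) page

approx = diffuse(A, f0, gamma=1.0, N=100)
reference = diffuse_series(A, f0, gamma=1.0)
max_error = max(abs(p - q) for p, q in zip(approx, reference))
```

Each of the N steps costs one sparse matrix-vector product on a real Web graph, which is why a moderate N such as 100 matters in practice.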

11 Discussion of γ
- γ can be understood as the thermal conductivity.
- When γ = 0, the ranking value is most robust to manipulation since no heat is diffused, but the Web structure is completely ignored.
- When γ = ∞, DiffusionRank becomes PageRank, which can be manipulated easily.
- When γ = 1, DiffusionRank works well in practice.
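
The two extremes can be illustrated on a toy example. The sketch below assumes the diffusion takes the form exp(γ(A − I)) f(0) for a column-stochastic matrix A, without the teleportation term a full PageRank matrix would include. The matrix, the seed vectors, and the step-size rule are assumptions made for illustration.

```python
def diffusion_rank(A, f0, gamma, steps_per_unit=100):
    """Approximate exp(gamma * (A - I)) applied to f0 with small explicit steps.

    The number of steps grows with gamma so that each step stays small
    (an implementation choice for this sketch).
    """
    n_steps = max(1, int(round(gamma * steps_per_unit)))
    h = gamma / n_steps
    f = list(f0)
    for _ in range(n_steps):
        Af = [sum(A[i][j] * f[j] for j in range(len(f))) for i in range(len(A))]
        f = [fi + h * (afi - fi) for fi, afi in zip(f, Af)]
    return f


# Hypothetical column-stochastic matrix for the links 0->1, 0->2, 1->2, 2->0
A = [[0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0]]
seed_a = [1.0, 0.0, 0.0]  # heat starts on page 0
seed_b = [0.0, 0.0, 1.0]  # heat starts on page 2

# gamma = 0: no heat is diffused, so the result is just the initial input.
frozen = diffusion_rank(A, seed_a, 0.0)

# gamma = 1: the result still depends on where the heat started.
mid_a = diffusion_rank(A, seed_a, 1.0)
mid_b = diffusion_rank(A, seed_b, 1.0)

# Large gamma: both inputs approach the same stationary distribution.
far_a = diffusion_rank(A, seed_a, 50.0)
far_b = diffusion_rank(A, seed_b, 50.0)
```

In this toy setting a moderate γ keeps the influence of the trusted seeds, while a very large γ erases it, which is the input-independence that makes PageRank easy to manipulate.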

12 DiffusionRank: advantages
- Can detect group-to-group relations
- Can cut graphs
- Anti-manipulation
(figure labels: +1; γ = 0.5 or 1)

13 DiffusionRank: experiments
Data:
- a toy graph (6 nodes)
- a middle-size real-world graph (18542 nodes)
- a large-size real-world graph crawled from CUHK (607170 nodes)
Comparison with TrustRank and PageRank.

14 Results: the tendency of DiffusionRank as γ becomes larger, on the toy graph

15 Anti-manipulation on the toy graph

16 Anti-manipulation on the middle-size graph and the large-size graph

17 Stability: the order difference between an algorithm's ranking results before manipulation and its results after manipulation
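
The slide does not spell out how the order difference is computed. One plausible reading, assumed here, is a pairwise count: the number of page pairs whose relative order flips between the two rankings. The function name and the score dictionaries below are hypothetical.

```python
from itertools import combinations


def pairwise_order_difference(before, after):
    """Count page pairs whose relative order differs between two score dicts."""
    nodes = sorted(before)
    flips = 0
    for u, v in combinations(nodes, 2):
        d_before = before[u] - before[v]
        d_after = after[u] - after[v]
        # A negative product means the pair is ordered oppositely in the two rankings.
        if d_before * d_after < 0:
            flips += 1
    return flips


# Hypothetical scores before and after manipulation
before = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
after = {"a": 0.3, "b": 0.4, "c": 0.1, "d": 0.2}

changed = pairwise_order_difference(before, after)   # pairs (a, b) and (c, d) flip
unchanged = pairwise_order_difference(before, before)
```

Under this reading, a smaller count means the algorithm's ranking is more stable under manipulation.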

18 Conclusions
- Its anti-manipulation feature makes DiffusionRank a candidate penicillin for Web spamming.
- DiffusionRank is a generalization of PageRank (it reduces to PageRank when γ = ∞).
- DiffusionRank can be employed to detect group-to-group relations.
- DiffusionRank can be used to cut graphs.

