Optimal Link Bombs are Uncoordinated
Sibel Adali, Tina Liu, Malik Magdon-Ismail
Rensselaer Polytechnic Institute



2 Pagerank
The pagerank algorithm models the behavior of a random surfer who, when at a specific web page v, will either
– jump to a random page with probability (1-α), or
– choose a link from page v uniformly at random and follow it with probability α.
The pagerank p_i of a page v_i then models the probability of being at that page. In the standard form, with N pages in total and d_j outgoing links on page v_j, it satisfies the following equation:
\[ p_i = \frac{1-\alpha}{N} + \alpha \sum_{j \,:\, v_j \to v_i} \frac{p_j}{d_j} \]
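The random-surfer model above can be sketched as a power iteration. This is a minimal illustration, not the authors' implementation: the function name `pagerank`, the default `alpha=0.85`, the toy graph, and the treatment of pages without outgoing links (they jump to a uniformly random page) are all assumptions made for the example.

```python
def pagerank(out_links, alpha=0.85, iters=200):
    """Power iteration for the random-surfer model.

    out_links maps each page to the list of pages it links to.
    alpha is the probability of following a link; 1 - alpha is the
    probability of jumping to a uniformly random page.
    Assumption (not from the slides): a page with no outgoing links
    jumps to a uniformly random page.
    """
    pages = sorted(out_links)
    n = len(pages)
    p = {v: 1.0 / n for v in pages}
    for _ in range(iters):
        # Mass held by pages without outgoing links is spread uniformly.
        dangling = sum(p[v] for v in pages if not out_links[v])
        nxt = {v: (1.0 - alpha) / n + alpha * dangling / n for v in pages}
        for v in pages:
            for w in out_links[v]:
                # Each outgoing link carries an equal share of p[v].
                nxt[w] += alpha * p[v] / len(out_links[v])
        p = nxt
    return p


# Hypothetical four-page graph used only for illustration.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
```

In this toy graph, page "c" collects links from every other page and ends up with the largest pagerank, while "d", which nobody links to, keeps only the random-jump share (1-α)/N.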

3 Link bombing
A set of pages A = {v_1,…,v_k} would like to boost the prominence of a page v_0 ∉ A.
– The score of page v_0 with respect to a keyword query Q is computed from a combination of the number of times keywords in Q appear in page v_0, the number of times keywords in Q appear in links pointing to v_0, and the pagerank of v_0.
– Pages are then sorted by score, and ranks are computed.
– The only things the attacking pages can control are their own content and links.
Problem: what is the best way to boost the rank of v_0?

4 Coordination of the attack
To improve the rank of page v_0 for query Q, add links with keyword Q to page v_0. What is the best link structure for the attack?
– Is there a benefit to adding additional links among attackers to improve their pagerank?
– How many links should be added to the attacked page?

5 Optimal attack
The attack that optimally boosts the rank of a page with respect to pagerank is uncoordinated!
– Attackers do not improve the effectiveness of their attack by adding links among themselves.
– The attack improves as more attacking pages link to the attacked page.
– The best attack by any page is to remove all other outgoing links and point only to the attacked page. The number of links per page is not important in this case.
– If there are other outgoing links, then the effectiveness of the attack improves as more links to the attacked page are added.
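The claims on this slide can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the helper `pagerank`, the value `alpha=0.85`, the page names, and the tiny background graph are all hypothetical, and pages without outgoing links are assumed to jump uniformly at random.

```python
def pagerank(out_links, alpha=0.85, iters=200):
    """Power iteration; dangling pages jump uniformly (an assumption)."""
    pages = sorted(out_links)
    n = len(pages)
    p = {v: 1.0 / n for v in pages}
    for _ in range(iters):
        dangling = sum(p[v] for v in pages if not out_links[v])
        nxt = {v: (1.0 - alpha) / n + alpha * dangling / n for v in pages}
        for v in pages:
            for w in out_links[v]:
                nxt[w] += alpha * p[v] / len(out_links[v])
        p = nxt
    return p


# Hypothetical background web: x and y link to each other, v0 links to x.
background = {"x": ["y"], "y": ["x"], "v0": ["x"]}

# Uncoordinated attack: each attacker's only outgoing link points to v0.
uncoordinated = dict(background, a1=["v0"], a2=["v0"], a3=["v0"])

# Coordinated attack: attackers also link to each other in a ring.
coordinated = dict(background, a1=["v0", "a2"], a2=["v0", "a3"], a3=["v0", "a1"])

# Fewer attackers: a3 points elsewhere instead of at v0.
two_attackers = dict(background, a1=["v0"], a2=["v0"], a3=["x"])

p_uncoordinated = pagerank(uncoordinated)["v0"]
p_coordinated = pagerank(coordinated)["v0"]
p_two_attackers = pagerank(two_attackers)["v0"]

print(p_uncoordinated > p_coordinated)    # links among attackers do not help
print(p_uncoordinated > p_two_attackers)  # more attackers linking to v0 helps
```

In this toy setting the ring among attackers splits each attacker's outgoing flow in half, and the pagerank the attackers gain from each other does not make up for the loss.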

6 Why?
Each new link (v_j, v_i) introduces a new flow.
– It directs the pagerank of v_j to v_i.
– Any other outgoing link from v_j diverts a portion of the flow away from v_i.
– The highest flow is achieved with the shortest path from the attacking page to the attacked page.
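The flow argument can be made concrete with a short worked expression. This is a sketch under the standard pagerank equation, not a formula taken from the slides: the symbols α (probability of following a link), d_j (number of outgoing links of v_j), and the path u_0 → u_1 → … → u_L are notation assumed here for illustration.

```latex
% Flow carried by a single link (v_j, v_i): v_j splits its pagerank
% evenly over its d_j outgoing links, damped by \alpha.
\[ \mathrm{flow}(v_j \to v_i) = \alpha \, \frac{p_j}{d_j} \]

% Share of p_{u_0} that reaches u_L along one path of L links,
% ignoring all other paths and cycles.
\[ p_{u_0} \prod_{t=0}^{L-1} \frac{\alpha}{d_{u_t}} \]
```

Every factor α/d is at most α < 1, so each extra hop and each extra outgoing link along the way shrinks the flow. The largest value, α p_{u_0}, occurs for a direct link (L = 1) from a page whose only outgoing link points to the attacked page.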

7 Cycles
Cycles improve the pagerank of a page due to the iterative nature of the algorithm.
– It is possible to visit the same page multiple times through cycles.
– The amplification of the pagerank at the attacked page through cycles is a monotonically increasing function of the increase in flow at the attacked page.
– Optimize the total flow to the attacked page with the shortest route and the fewest outgoing links.

8 Rank
The uncoordinated attack is also optimal in improving the rank of an attacked page with respect to its pagerank.
– The direct attack is best for page v_i independent of what other attackers do.
– Suppose, for contradiction, that some other attack type A ranks the attacked page v_0 above a page u that outranks v_0 under the uncoordinated attack. Then the pagerank of v_0 must exceed that of u under attack A but fall below it under the uncoordinated attack. This is not possible, since the uncoordinated attack maximizes the pagerank increase of v_0.

9 Optimal disguised attack
Suppose attackers want to hide by not pointing directly to the attacked page.
– Choose among the pages with the required anchor text to point to.
– If the objective is to be L hops away from the attacked page, choose a single page L-1 hops away to point to, namely the one that maximizes the flow of pagerank from the attacking page to the victim.
– The individual optimal attack is not necessarily the same as the optimal joint attack.

10 Effect of keywords
Bombing improves pagerank for all keyword queries.
– When keywords in links are considered in the ranking, link bombing is particularly effective.
– In general, the probability of the same text appearing in links pointing to the same page may be low, but it is much higher for attacks.
What if pagerank were computed only for the graph induced by the given query?
– The optimal attack is no longer the direct individual attack.

11 Experimental results
Different graph types:
– Random: Erdős–Rényi type random graph with edge probability 5/(N-1)
– BA: Barabási–Albert, preferential attachment
– MWDTA: “Winners don’t take all”, like BA but with a higher probability of nodes with significant indegree, and one outgoing link per node

12 Pagerank effectiveness of the uncoordinated attack
\[ \text{Normalized discrepancy} = \frac{\Delta p_{\text{Uncoordinated}}}{\sigma_p} - \frac{\Delta p_{\text{Cycle}}}{\sigma_p} \]
– Δp_AttackType: the pagerank change of the attacked page under the given attack type
– σ_p: standard deviation of the pagerank distribution
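The normalized discrepancy defined on this slide is simple to compute once the pagerank changes are known. The sketch below is a hedged illustration: the function name `normalized_discrepancy`, the sample numbers, and the use of the population standard deviation over a baseline pagerank vector are assumptions, since the slide does not say which distribution or estimator was used.

```python
from statistics import pstdev


def normalized_discrepancy(dp_uncoordinated, dp_cycle, pageranks):
    """Difference in pagerank gain between the uncoordinated and cycle
    attacks, measured in standard deviations of the pagerank distribution.

    Assumption: sigma_p is the population standard deviation of the
    supplied pagerank values.
    """
    sigma_p = pstdev(pageranks)
    return dp_uncoordinated / sigma_p - dp_cycle / sigma_p


# Made-up numbers, for illustration only.
baseline = [0.1, 0.2, 0.3, 0.4]
value = normalized_discrepancy(0.06, 0.05, baseline)
```

A positive value means the uncoordinated attack raised the attacked page's pagerank by more than the cycle attack did.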

13 Rank effectiveness of the uncoordinated attack

14 Conclusions
The uncoordinated attack is best for pagerank.
– Any additional coordination reduces the impact of the attack.
– Participants in an attack may have no relationship with each other, making the attack harder to detect and prove.
– A ranking algorithm that favors hierarchical attacks would require small groups to participate in a group structure for an effective attack.
Conditions resistant to attack:
– Dense, power-law graphs; victims with high rank; attackers with low rank.

15 Conclusions
Assumptions made by pagerank revisited:
– The surfer jumps randomly to any page, even though the user does not know about those pages, and pages with no outgoing links accumulate pagerank [Eiron, McCurley, Tomlin].
– The probability of navigating away from a page may be proportional to the page’s pagerank.
– The probability of using a link may be proportional to the pagerank of the destination page or to the page text properties [Chakrabarti et al.].
How is an attack different from the popular opinion of the web citizens?
– The size of the attacking group and its overall influence.
– How likely is it that a small number of unrelated pages use the same text in their links to the same page?
– The analysis of group structure in attacks provides a new way of discussing the resistance of an algorithm to attacks.