CS522: Algorithmic and Economic Aspects of the Internet
Instructors: Nicole Immorlica, Mohammad Mahdian

Previously in this class
Ranking using the hyperlink structure: HITS and PageRank

Today
Dealing with web spam
An axiomatic approach to PageRank
Next lecture: Kamal Jain

Recap
The PageRank of a page p is the probability of p in the stationary distribution of a random walk that, at each step, follows a random link from the current page with probability 1 – ε and restarts from a random page with probability ε. Typically, ε = 0.15.
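
The random walk in the recap can be sketched as a power iteration. This is a minimal illustration rather than a reference implementation: the function name `pagerank`, the three-page graph `web`, the fixed iteration count, and the treatment of pages with no out-links (they jump to a random page) are all assumptions made for the example, and "a random page" is taken to mean a uniformly random one.

```python
def pagerank(links, eps=0.15, iters=200):
    """links maps each page to the list of pages it links to."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(iters):
        # With probability eps the walk restarts from a uniformly random page.
        new = {v: eps / n for v in nodes}
        for v in nodes:
            out = links[v]
            if out:
                # With probability 1 - eps the walk follows a random out-link.
                share = (1 - eps) * rank[v] / len(out)
                for w in out:
                    new[w] += share
            else:
                # Assumption: a page with no out-links jumps to a random page.
                share = (1 - eps) * rank[v] / n
                for w in nodes:
                    new[w] += share
        rank = new
    return rank


# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph, page c collects links from both a and b and ends up with the highest PageRank, while b, which is linked to only by a, ends up with the lowest.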

The collusion problem
What if a group of nodes “collude” to increase the PageRank of one or more nodes in the group?
Zhang, Goel, Govindan, Mason, and Van Roy (WAW) define the “amplification” of a group of nodes and prove that it is always at most O(1/ε).

The collusion problem
Question: Is collusion really a problem?
Experiment (on a web subgraph, and blogstreet):
Take, say, the 1000th and the 1001st nodes in the PageRank order.
Each of these nodes removes all of its links to other pages and adds a link to the other colluding node.
Compute PageRanks in the new graph.
Results: Ranks of the colluding nodes increase significantly.
Exercise: Go to eBay and search for PageRank.
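
The experiment can be imitated on a toy graph. The sketch below is not the data or code from the experiment: the 10-node graph in which page i links to pages i+1 and i+2, the helper name `collude`, and the choice of pages 0 and 1 as colluders are all hypothetical. The compact `pagerank` here also assumes every page has at least one out-link.

```python
def pagerank(links, eps=0.15, iters=200):
    # Assumes every page has at least one out-link (true for the graphs below).
    n = len(links)
    rank = {v: 1.0 / n for v in links}
    for _ in range(iters):
        new = {v: eps / n for v in links}
        for v, out in links.items():
            for w in out:
                new[w] += (1 - eps) * rank[v] / len(out)
        rank = new
    return rank


def collude(links, u, v):
    """Return a copy of the graph in which u and v link only to each other."""
    new_links = {node: list(out) for node, out in links.items()}
    new_links[u] = [v]
    new_links[v] = [u]
    return new_links


# Hypothetical symmetric web: page i links to pages i+1 and i+2 (mod 10),
# so every page starts with the same PageRank of 1/10.
n = 10
web = {i: [(i + 1) % n, (i + 2) % n] for i in range(n)}

before = pagerank(web)
after = pagerank(collude(web, 0, 1))
for v in (0, 1):
    print(v, round(before[v], 4), "->", round(after[v], 4))
```

Once the walk enters the colluding pair it can only leave by a random restart, which is why both colluders gain PageRank at the expense of the other pages.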

Finding colluding groups
Approach 1: Find a set S with the largest amplification.
However, this problem can be shown to be NP-hard.

Finding colluding nodes
Approach 2: Identify colluding individuals.
Observation: If we increase ε, the PageRank of a colluding individual decreases (often in proportion to 1/ε).
Heuristic: Compute PageRanks for multiple values of ε, and compute the correlation of the PageRank of each node with 1/ε. Nodes with high correlation are probably colluding.
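
A minimal sketch of the heuristic, under stated assumptions: Pearson correlation is used as the correlation measure, and the grid of ε values, the 0.8 threshold, the toy graph, and the names `pearson` and `flagged` are all hypothetical choices for illustration. The graph is a 10-page web in which pages 0 and 1 have already colluded by linking only to each other.

```python
def pagerank(links, eps, iters=300):
    # Assumes every page has at least one out-link (true for the graph below).
    n = len(links)
    rank = {v: 1.0 / n for v in links}
    for _ in range(iters):
        new = {v: eps / n for v in links}
        for v, out in links.items():
            for w in out:
                new[w] += (1 - eps) * rank[v] / len(out)
        rank = new
    return rank


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical web: page i links to i+1 and i+2 (mod 10), except that
# pages 0 and 1 collude and link only to each other.
n = 10
web = {i: [(i + 1) % n, (i + 2) % n] for i in range(n)}
web[0], web[1] = [1], [0]

eps_values = [0.1, 0.2, 0.3, 0.4, 0.5]        # assumed grid of epsilon values
inv_eps = [1 / e for e in eps_values]
runs = [pagerank(web, e) for e in eps_values]

# Correlation of each page's PageRank with 1/eps across the runs.
corr = {v: pearson(inv_eps, [r[v] for r in runs]) for v in web}
flagged = sorted(v for v in web if corr[v] > 0.8)  # assumed threshold
print(flagged)
```

In this toy graph only the two colluding pages have PageRank that rises with 1/ε; the PageRank of every other page falls as 1/ε grows, so its correlation is negative.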

Dealing with collusion
We can “punish” colluding individuals by increasing their ε, so that they cannot pass their reputation on to others.
Experimental results
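
One way to read "increasing their ε" is to give every page its own restart probability. The sketch below follows that reading and is an assumption, not the scheme evaluated in the experiments: the function name `pagerank_per_node`, the punished value 0.9, and the toy graph are hypothetical.

```python
def pagerank_per_node(links, eps, iters=300):
    """eps maps each page to its own restart probability.

    Assumes every page has at least one out-link.
    """
    n = len(links)
    rank = {v: 1.0 / n for v in links}
    for _ in range(iters):
        # Total probability mass that restarts from a random page this step.
        restart = sum(eps[v] * rank[v] for v in links)
        new = {v: restart / n for v in links}
        for v, out in links.items():
            for w in out:
                new[w] += (1 - eps[v]) * rank[v] / len(out)
        rank = new
    return rank


# Hypothetical web in which pages 0 and 1 collude (link only to each other).
n = 10
web = {i: [(i + 1) % n, (i + 2) % n] for i in range(n)}
web[0], web[1] = [1], [0]

uniform_eps = {v: 0.15 for v in web}
punished_eps = dict(uniform_eps)
punished_eps[0] = punished_eps[1] = 0.9   # assumed penalty for the colluders

r_uniform = pagerank_per_node(web, uniform_eps)
r_punished = pagerank_per_node(web, punished_eps)
for v in (0, 1):
    print(v, round(r_uniform[v], 4), "->", round(r_punished[v], 4))
```

With a large ε, a colluder passes only a small fraction of its PageRank along its out-link, so the pair can no longer keep the walk trapped between them.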

Explaining PageRank
Axiomatic approach:
Define a set of “natural” axioms.
Prove that PageRank satisfies these axioms.
Prove that any page ranking algorithm satisfying these axioms outputs the same ranking as PageRank.

Axiomatic Approaches: Voting
Consider a democracy where people submit preference lists over candidates.
A voting rule (or social welfare function) outputs a global ordering of candidates for every set of preference lists.

Voting Axioms
Unanimity: If everyone prefers candidate x to y, then the global ordering also ranks x above y.
Independence of irrelevant alternatives (IIA): For any two candidates x and y, changes in people’s rankings of candidates other than x and y should not affect the relative position of x and y in the global ordering.
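
To see that IIA is a demanding axiom, here is a small sketch using the Borda count, a voting rule not mentioned in the slides and chosen here only as an illustration. The five-voter profiles and the alphabetical tie-breaking are assumptions. No voter changes their relative order of A and B between the two profiles, yet the global order of A and B flips.

```python
def borda(profile):
    """Global ordering by Borda score: with m candidates, a ballot gives
    m-1 points to its top choice, m-2 to the next, and so on.
    Ties are broken alphabetically (an assumption for this sketch)."""
    scores = {}
    for ballot in profile:
        m = len(ballot)
        for position, candidate in enumerate(ballot):
            scores[candidate] = scores.get(candidate, 0) + (m - 1 - position)
    return sorted(scores, key=lambda c: (-scores[c], c))


# Three voters rank A > B > C; two voters rank B > C > A.
before = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2
# The last two voters move only C: they now rank B > A > C.
# Every voter's relative order of A and B is unchanged.
after = [["A", "B", "C"]] * 3 + [["B", "A", "C"]] * 2

print(borda(before))  # B is ranked above A
print(borda(after))   # A is ranked above B
```

The Borda count satisfies unanimity, so the example shows a rule that meets one axiom while violating the other.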

Arrow’s (Im)possibility Theorem
Theorem [Arrow, 1951]: With at least three candidates, the only voting rule satisfying unanimity and IIA is dictatorship.
Extensions:
Similar results hold for social choice functions, where a single candidate (the winner) must be chosen [Muller-Satterthwaite, 1977].
Majority rule arises naturally when we relax IIA or restrict the preference domain of people (i.e., impose rules on how they can rank candidates).

Axiomatic Approach: PageRank
Agents are the nodes of a graph.
Each agent outputs a “vote” over other agents; the votes are represented by a directed graph G.
A ranking algorithm is a function mapping every directed graph to an ordering of its nodes.

PageRank Axioms
Isomorphism: The ranking procedure should be independent of the names of the nodes.
Self edge: Adding self loops should not harm a node and should not affect other nodes.
Vote by committee: The importance a gives to b and c by voting shouldn’t change if a votes via a committee.
Collapsing: If two nodes vote similarly, and are linked to by disjoint sets of nodes, the ranking does not change when they are collapsed to one node.
Proxy: There is an equal distribution of importance.
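
The isomorphism axiom is the easiest one to check numerically. The sketch below renames the nodes of a toy graph and confirms that the ranking is renamed in the same way. It is only a spot check on one example, not a proof, and it uses the ε = 0.15 random walk from the recap, which is an assumption about the variant of PageRank intended here. The graph `web`, the mapping `rename`, and the helper `ranking` are hypothetical.

```python
def pagerank(links, eps=0.15, iters=200):
    # Assumes every page has at least one out-link (true for the graph below).
    n = len(links)
    rank = {v: 1.0 / n for v in links}
    for _ in range(iters):
        new = {v: eps / n for v in links}
        for v, out in links.items():
            for w in out:
                new[w] += (1 - eps) * rank[v] / len(out)
        rank = new
    return rank


def ranking(links):
    """Nodes ordered from highest to lowest PageRank."""
    r = pagerank(links)
    return sorted(links, key=lambda v: -r[v])


# Hypothetical graph and an arbitrary renaming of its nodes.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
rename = {"a": "z", "b": "x", "c": "y"}
renamed_web = {rename[v]: [rename[w] for w in out] for v, out in web.items()}

original_order = ranking(web)
renamed_order = ranking(renamed_web)
print(original_order, renamed_order)
```

Renaming the original ranking node by node gives exactly the ranking computed on the renamed graph, as the axiom requires.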

PageRank: Altman and Tennenholtz
Theorem: PageRank satisfies these axioms.
Theorem: PageRank is the only ranking algorithm that satisfies these axioms (i.e., every other ranking algorithm that satisfies them outputs the same ranking as PageRank).