Ranking Systems: Manipulability and Efficiency Eric Friedman, ORIE Cornell University (Currently visiting: Dept of CS, U.C. Berkeley, 2005-6) Work supported.

Presentation transcript:

Ranking Systems: Manipulability and Efficiency Eric Friedman*, ORIE Cornell University (Currently visiting: Dept of CS, U.C. Berkeley, 2005-6) *Work supported by NSF. ITR

Ranking and Reputations
Reputations are important
– Webpage ranking: links are “recommendations”. High ranks lead to more “clicks”.
– P2P: choosing partners.
– eBay: reputations are crucial (and quite valuable). Higher reputations lead to higher prices.
– PGP: web of trust.
– Spam and DDoS protection.

Problems with Reputation Systems
Gaming reputation systems is becoming a serious problem.
– P2P: Kazaa-lite
– Webpage ranking: link spamming
Note: most (all?) current reputation systems are ad hoc.
– No formal requirements etc.

A research agenda: Understanding the tradeoffs between manipulability and efficiency
1) Quantify the manipulability of ranking systems.
2) Quantify the efficiency of ranking systems.
3) Find the ranking systems that are on the efficient frontier and maximize various objectives.

Today’s talk (some first steps)
A framework for manipulability (w/ Alice Cheng)
– Characterization of manipulability of ranking systems.
Empirical analysis of PageRank on the WWW (w/ Alice Cheng)
Evaluating the efficiency of ranking mechanisms (work in progress)

Part I: Goals and Approach
Our goal: create a formalism for analyzing and designing reputation systems that are robust to attacks.
– Here we focus on sybils. This is important in itself, but our goals are much broader.
Note: the definitions were harder than the proofs.
Approach: game theory, mechanism design (e.g., Arrow’s Theorem)

Trust Graphs
Most reputation systems use trust graphs:
– G=(V,E)
– If e=(i,j), then T(e) = i’s (direct) trust of j.
– Higher T(e) is better.
Reputation function: f(G)_i = reputation of i.
Rank: i outranks j if f(G)_i > f(G)_j.
– Note: we focus on rank.
Why use a trust graph?
– Many (most?) interactions are first-time interactions: (i,j) ∉ E.
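The definitions on this slide can be made concrete with a minimal sketch. The edge-dictionary representation and the toy reputation function (total incoming trust) are assumptions chosen for illustration only; the slide does not prescribe either.

```python
from typing import Dict, Tuple

# Trust graph: T[(i, j)] is i's direct trust of j (higher is better).
TrustGraph = Dict[Tuple[str, str], float]

def incoming_trust(T: TrustGraph) -> Dict[str, float]:
    """Toy reputation function f(G) (an assumption): sum of trust received."""
    nodes = {n for edge in T for n in edge}
    rep = {n: 0.0 for n in nodes}
    for (_, j), t in T.items():
        rep[j] += t
    return rep

def outranks(rep: Dict[str, float], i: str, j: str) -> bool:
    """i outranks j if f(G)_i > f(G)_j."""
    return rep[i] > rep[j]

T = {("a", "b"): 1.0, ("c", "b"): 0.5, ("b", "a"): 0.7}
rep = incoming_trust(T)
```

Only the ordering returned by `outranks` matters for rank-based questions; the numeric values matter for value-based ones.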

Some Representative Reputation Systems
PageRank and related systems (Brin and Page 98, Kleinberg 98, Guha et al. 04)
– Start at an arbitrary node and then take a random walk on the graph.
Flow methods (e.g., Flake et al. 02, Chuang and Stoica 02)
– Compute the max flow from i to j.
Shortest path method
– Let c(e)=1/T(e), then find the shortest path from i to j in terms of the c’s.

Pagerank = Random Walk on Graph
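As a rough illustration of the random-walk view, the sketch below runs power iteration on a small link graph. The restart probability `eps`, its value 0.15, and the uniform treatment of dangling nodes are assumptions of this sketch and do not come from the slides.

```python
def pagerank(out_links, eps=0.15, iters=200):
    """Stationary distribution of a random walk that, with probability eps,
    restarts at a uniformly random node (eps is an assumed parameter)."""
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: eps / n for v in nodes}
        for u in nodes:
            # A node without out-links is treated as linking to every node.
            targets = out_links.get(u) or nodes
            for v in targets:
                new[v] += (1 - eps) * rank[u] / len(targets)
        rank = new
    return rank

graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

In this toy graph, c collects links from both a and b and ends up ranked first, while b, which nobody links to, keeps only the restart mass.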

Maxflow = compute flow from a chosen source to a node [Figure: flow network from source s to node t]
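A flow-based reputation can be sketched with a standard max-flow routine. The version below uses the Edmonds-Karp algorithm and treats T(e) as edge capacity; the choice of algorithm and the example graph are assumptions made for illustration.

```python
from collections import defaultdict, deque

def max_flow(T, s, t):
    """Edmonds-Karp max flow from s to t, using T[(i, j)] as edge capacity."""
    if s == t:
        return float("inf")
    cap = defaultdict(float)
    adj = defaultdict(set)
    for (i, j), c in T.items():
        cap[(i, j)] += c
        adj[i].add(j)
        adj[j].add(i)  # residual (reverse) edge
    flow = 0.0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Find the bottleneck capacity along the path, then push it.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[(parent[v], v)])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
            v = u
        flow += bottleneck

T = {("s", "a"): 2, ("s", "b"): 1, ("a", "t"): 1, ("b", "t"): 2, ("a", "b"): 1}
flow = max_flow(T, "s", "t")
```

Here the reputation of t as seen from s is the total flow, which is bounded by the trust leaving s.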

Shortest Path [Figure: shortest path from source s to node t]
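The shortest path method with costs c(e)=1/T(e) can be sketched using Dijkstra's algorithm. Returning distances from a root s, where a smaller distance means a higher reputation, is an assumption about how the result is reported.

```python
import heapq

def shortest_path_distances(T, s):
    """Dijkstra from root s with edge cost c(e) = 1 / T(e).
    A smaller distance corresponds to a higher reputation."""
    adj = {}
    for (i, j), t in T.items():
        if t > 0:
            adj.setdefault(i, []).append((j, 1.0 / t))
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in adj.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

T = {("s", "a"): 1.0, ("a", "b"): 0.5, ("s", "b"): 0.25}
dist = shortest_path_distances(T, "s")
```

In the example, the direct edge from s to b has low trust (cost 4), so the path through a (cost 1 + 2) is shorter.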

Sybils A single “agent” can replicate itself under a variety of pseudonyms.

Sybil Attacks
Sybils are essentially unavoidable (Douceur 02).
– Using strong cryptography to prevent them is expensive and awkward.
Sybil clouds can forge trust among each other.

Sybils in Practice
Web ranking: create a large number of dummy websites that all link to each other.
P2P: create a large number of peers that give each other high ratings.
eBay: fake transactions with yourself.
Amazon shopping: post high evaluations of your own products.

Robustness Against Sybils
PageRank: not robust.
– Empirically, a few sybils can increase PageRank values dramatically. (More later.)
Max-flow: value robust but not rank robust.
Shortest path: robust.

Robustness: Pagerank Pagerank: not robust. –Create a “flower”
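A small experiment can illustrate why a flower helps. The construction below is an assumed reading of the slide: the attacker keeps its original links, adds links to k sybil "petals", and every petal links straight back. The restart probability `eps` and the five-node ring are also assumptions.

```python
def pagerank(out_links, eps=0.15, iters=200):
    """Power iteration; eps is an assumed uniform restart probability."""
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: eps / n for v in nodes}
        for u in nodes:
            targets = out_links.get(u) or nodes  # dangling node: jump anywhere
            for v in targets:
                new[v] += (1 - eps) * rank[u] / len(targets)
        rank = new
    return rank

def add_flower(out_links, target, k):
    """Return a copy of the graph in which `target` has k sybil petals."""
    attacked = {u: list(vs) for u, vs in out_links.items()}
    sybils = [f"{target}_sybil{m}" for m in range(k)]
    attacked.setdefault(target, []).extend(sybils)
    for s in sybils:
        attacked[s] = [target]  # each petal links back to the center
    return attacked

ring = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["a"]}
before = pagerank(ring)
after = pagerank(add_flower(ring, "a", 3))
```

In the ring every node starts with the same value; after the attack, node a holds more PageRank than any other original node even though the total is now shared among eight nodes.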

Robustness: Maxflow
Max-flow: designed for value robustness
– Flow into and out of the sybil cloud cannot be changed!
[Figure: source s, a sybil cloud, and the min cut separating them]

Robustness: Maxflow
Max-flow: not rank robust
– b is higher ranked than a
[Figure: flow network with nodes a and b, value labels [1.2] and [1], and the min cut]

Robustness: Maxflow
Max-flow: not rank robust
– a is higher ranked than b
[Figure: flow network with nodes a and b, value labels [0.5] and [1]]

Robustness: Shortest Path
Shortest path: robust
– a is higher ranked than b
[Figure: path graph with nodes a and b, edge costs c=1, c=3, c=1, and value labels [2] and [1]]

Robustness: Shortest Path
Shortest path: robust
– a is higher ranked than b
– a can harm b, but a is already higher ranked than b
– b cannot hurt a, since it is not on the shortest path to a
[Figure: path graph with nodes a and b, edge costs c=1 and c=3, and value labels [3], [3] and [1]]

Sybilproofness
Def: A sybil strategy for node i in G=(V,E) is a graph G’=(V’,E’) and a subset U’ ⊆ V’ such that collapsing U’ into a single node yields G. (T’s are added together.)
Def: f is k-sybilproof if there do not exist a pair of nodes i, j and a sybil strategy for i such that f(G)_i < f(G)_j and f(G’)_r > f(G’)_j for some r ∈ U’, with |U’| ≤ k+1.
Def: f is sybilproof if it is k-sybilproof for all k>0.
Key: sybils can only forge recommendations among each other.

Results: Symmetric Reputations
Def: A reputation function is symmetric if it is covariant under graph isomorphism.
Theorem: There is no nontrivial symmetric sybilproof mechanism.
– In fact, for any G, any node (except the top one) can improve its ranking via sybils.
Theorem: There is no nontrivial symmetric k-sybilproof mechanism, for any k ≥ 1.
– (How often this occurs for small k is open.)

Proof (via the butterfly)
[Figure: the butterfly graph, with nodes j, s and i, the original graph G and the sybil set U’]
Sybilproofness: by symmetry, f(G’)_j = f(G’)_s
k-sybilproofness: build G’ one sybil at a time

Results: Non-Symmetric
Theorem: There exist sybilproof reputation functions. (e.g., shortest path)
Def: Given a root node s ∈ V, let 𝒫 be the set of all collections of edge disjoint paths* from s to i. Let g be a function from paths to reals and ⊕ be an (addition-like) operator on the reals.

Results: Non-Symmetric
Let f(G)_i = max_{P ∈ 𝒫} ⊕_{p ∈ P} g(p)
Max flow: g(p) = min{T(e) | e ∈ p}, ⊕ = +
Shortest path: g(p) = min{Σ T(e) | e ∈ p}, ⊕ = min
Other generalizations
– Leaky pipes etc.

Results: Non-Symmetric
Theorem: f as defined above is value sybilproof assuming
– If p’ is an extension of p, then g(p’) < g(p).
– ⊕ is nondecreasing and g is nondecreasing with respect to T.
– If p = p’ + p’’ then g(p) = g(p’) ⊕ g(p’’).

Results: Non-Symmetric
Theorem: f as defined above is rank sybilproof iff ⊕ = max, assuming:
– For any p there exists an extension p’ such that g(p) = g(p’).
I.e., f depends on the maximal path.

Summary (Part I)
A framework for the analysis of the manipulability of ranking systems.
Key distinction: rank vs. value
Result 1: all symmetric ranking systems are manipulable.
Result 2: “flow based” ranking systems are not value manipulable but are rank manipulable.
Result 3: “path based” ranking systems are not manipulable.

Part II: Empirical Analysis of PageRank
(Joint with Alice Cheng) (Inspired by Zhang et al. on collusion)
Stanford web matrix -- ~280k pages.
Question: How often are a small number of sybils helpful?
Answer: Surprisingly often!

Value Magnification: 1 sybil

Value Magnification – by # of sybils

Rank as a function of old Rank -- 1-Sybil

Effect of ε on values

Effect of ε on ranks

Summary of Empirical
Analytic approximations for these.
PageRank is quite manipulable
– Especially for low ranked pages (but that’s where automated methods are supposed to work!)

Part III: Quantifying the Efficiency of Ranking Mechanisms
Work in progress: some preliminary results.
Is FlowRank or PageRank better than PathRank?

Model
Random graph model (descriptive, not constructive)
Follow the intuition behind PageRank
– Pages link more to “better pages”.
– Better pages are more selective.
– Pr(link) = f(q_i, q_j), increasing in q_j and FOSD (first-order stochastically dominant) in q_i.
– Average outdegree = k (n → ∞).
– (Many results have k → ∞ and miss important aspects of ranking.)

Finding “Baddies”
2 layer example:
– ½ of the nodes are H and ½ are L.
– L’s link uniformly at random.
– H’s link to H’s with (relative) probability (1+a) and to L’s with (1-a).
– a=0: random graph
– a=1: two-tiered graph
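The two-layer example can be simulated with a short generator. This is a sketch under stated assumptions: every node draws exactly k link targets (with replacement, excluding itself), the first half of the node ids are H, and the names `two_layer_graph` and `seed` are hypothetical.

```python
import random

def two_layer_graph(n, k, a, seed=0):
    """Sample a two-layer link graph: H nodes prefer H targets by (1+a):(1-a),
    L nodes link uniformly at random. Returns (quality labels, edge list)."""
    rng = random.Random(seed)
    quality = ["H" if v < n // 2 else "L" for v in range(n)]
    edges = []
    for u in range(n):
        others = [v for v in range(n) if v != u]
        if quality[u] == "H":
            weights = [1 + a if quality[v] == "H" else 1 - a for v in others]
        else:
            weights = [1.0] * len(others)
        # k independent draws, so repeated links are possible (assumption).
        for v in rng.choices(others, weights=weights, k=k):
            edges.append((u, v))
    return quality, edges

quality, edges = two_layer_graph(n=200, k=5, a=1.0)
```

With a=1 the H nodes never link to L nodes, giving the two-tiered graph; with a=0 all weights are equal and the graph is uniformly random.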

Statistical Inference
Now, ranking is a problem of statistical inference
– G is a random variable
– r is a statistical estimate of true qualities
– Note: unlike most inference problems we only have a single sample

3 methods
PageRank
InRank: rank by indegree
MLRank: compute a maximum likelihood estimate.

Results
Pr(error) = Pr(r_i > r_j | q_i < q_j)
InRank: difference of Poissons
PageRank: two-stage calculation
– First by quality, then statistical manipulations of the PageRank equations.
MLRank: find a subgraph with the maximal number of edges.
– NP-complete
– Implemented a greedy algorithm
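The error measure can also be estimated empirically on a sampled graph. The sketch below scores nodes by indegree (InRank) and counts the fraction of (L, H) pairs in which the low-quality node is scored strictly higher; the two-valued quality labels and the tiny example graph are assumptions for illustration.

```python
from collections import Counter
from itertools import product

def inrank(n, edges):
    """InRank score: the indegree of each node 0..n-1."""
    indeg = Counter(v for _, v in edges)
    return [indeg.get(v, 0) for v in range(n)]

def empirical_error(quality, score):
    """Fraction of pairs (i, j) with q_i < q_j (i is L, j is H) and r_i > r_j."""
    lows = [v for v, q in enumerate(quality) if q == "L"]
    highs = [v for v, q in enumerate(quality) if q == "H"]
    errors = sum(1 for i, j in product(lows, highs) if score[i] > score[j])
    return errors / (len(lows) * len(highs))

quality = ["H", "H", "L", "L"]
edges = [(0, 1), (2, 1), (3, 1), (1, 0), (2, 3), (0, 3), (1, 3)]
score = inrank(len(quality), edges)
error = empirical_error(quality, score)
```

Ties are not counted as errors here because the definition uses a strict inequality. Any other scoring rule, such as PageRank values, can be passed to `empirical_error` in place of the indegree scores.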

Results
[Figure: Pr(error) as a function of a for PageRank, InRank and MLRank]

Results
InRank is better than PageRank when the graph is close to random; PageRank is better otherwise. (General Theorem)
Differences can be significant!
MLRank is significantly better.

Some Intuition
Case a=0 (sketch -- ignoring special cases)
PageRank
– r_j’s are iid (in the limit)
InRank
Theorem: PageRank is more random. (But we also need to consider expected values.)

Concluding Comments
Reputation systems should be designed from requirements and subject to formal validation.
– Ex: What problem does PageRank solve? How well does it do it?
– Ex: Why is FlowRank better than PathRank? Is it? When and why?
Aside: fighting link spam
– Results show that most of the proposed methods can be defeated!
– Perhaps they work so well because they are not being used and spammers haven’t tried to defeat them.
Endogeneity is important!

Concluding Comments
Reputation systems are important and deserve formal, careful study!
– Axiomatic analyses.
– Econometric analyses.
Lots of challenging open problems!