Rank Aggregation Methods II Experiments CS728 Lecture 12.



Recall the Rank Aggregation Problem
m candidates (a.k.a. “alternatives”)
– $M = \{1,\dots,m\}$: set of candidates
n voters (a.k.a. “agents” or “judges”)
– $N = \{1,\dots,n\}$: set of voters
Each voter i has a ranking $\tau_i$ on M
– $\tau_i(a) < \tau_i(b)$ means the i-th voter prefers a to b
– A ranking may be a total or partial order
The rank aggregation problem: combine $\tau_1,\dots,\tau_n$ into a single ranking $\tau$ on M that represents the “social choice” of the voters.
– Rank aggregation function: $f(\tau_1,\dots,\tau_n) = \tau$
– $\tau$ may be a total or partial order

Experiments: Distance Measures
Goal: quantitatively compare different rank aggregation methods.
Performance measures (S is the set of ranked elements):
(1) The Spearman footrule distance is the sum of the pointwise rank differences. Dividing by the maximum value $\frac{1}{2}|S|^2$ normalizes it to a value between 0 and 1.
(2) The Kendall tau distance counts the number of pairwise disagreements. Dividing by the maximum possible value $\frac{1}{2}|S|(|S|-1)$ gives a normalized version with a value between 0 and 1.
(3) The induced footrule distance is obtained by projecting a full list s onto the elements of each partial list and taking the footrule distance between the projection and that partial list. The induced Kendall tau distance is defined in the same way.
(4) The scaled footrule distance weights the contribution of each element by the length of the lists it appears in. If s is a full list and t is a partial list, then $SF(s, t) = \sum_{i \in t} \left| \frac{s(i)}{|s|} - \frac{t(i)}{|t|} \right|$. Normalize SF by dividing by $|t|/2$.
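The four measures above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the experiments: lists are assumed to be Python sequences ordered from best to worst, ranks are 1-based, and the function names (`footrule`, `kendall_tau`, `project`, `induced_footrule`, `scaled_footrule`) are hypothetical.

```python
from itertools import combinations


def positions(lst):
    """Map each element to its 1-based rank in the list."""
    return {x: r for r, x in enumerate(lst, start=1)}


def footrule(s, t):
    """Normalized Spearman footrule between two lists over the same elements."""
    ps, pt = positions(s), positions(t)
    raw = sum(abs(ps[x] - pt[x]) for x in s)
    return raw / (len(s) ** 2 / 2)  # divide by (1/2)|S|^2


def kendall_tau(s, t):
    """Normalized Kendall tau distance (assumes at least two elements)."""
    ps, pt = positions(s), positions(t)
    n = len(s)
    disagreements = sum(
        1 for a, b in combinations(s, 2)
        if (ps[a] - ps[b]) * (pt[a] - pt[b]) < 0
    )
    return disagreements / (n * (n - 1) / 2)  # divide by (1/2)|S|(|S|-1)


def project(s, t):
    """Restrict full list s to the elements of partial list t, keeping s's order."""
    members = set(t)
    return [x for x in s if x in members]


def induced_footrule(s, t):
    """Footrule distance between the projection of s onto t and t itself."""
    return footrule(project(s, t), t)


def scaled_footrule(s, t):
    """Scaled footrule: compare relative positions s(i)/|s| and t(i)/|t|."""
    ps, pt = positions(s), positions(t)
    raw = sum(abs(ps[x] / len(s) - pt[x] / len(t)) for x in t)
    return raw / (len(t) / 2)  # normalize by |t|/2
```

For two full lists that are exact reversals of each other, both `footrule` and `kendall_tau` reach their maximum of 1 when the number of elements is even; for an odd number of elements the footrule maximum under this normalization is slightly below 1.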

Experiments: Distance Measures
So for each aggregation method and each distance measure we get a vector of values, each component representing the distance from the aggregation to one voter list.
The simplest summary is the average (or 1-norm).
Other norms are interesting:
– Mean square distance (2-norm)
– Max distance (∞-norm)
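As a small sketch of the three summaries, assuming the distances to the voter lists have already been computed and collected in a Python list (the name `summarize` is hypothetical, and the 2-norm is taken here as the root of the mean square):

```python
import math


def summarize(distances):
    """Collapse a vector of per-voter distances into three summary values."""
    n = len(distances)
    return {
        "mean": sum(distances) / n,                            # average (1-norm / n)
        "rms": math.sqrt(sum(d * d for d in distances) / n),   # root mean square
        "max": max(distances),                                 # max (infinity-norm)
    }
```

The mean rewards aggregations that are close to most lists, while the max penalizes an aggregation that is far from even a single voter.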

Experiments: Minimizing Average
Search engines: Altavista (AV), Alltheweb (AW), Excite (EX), Google (GG), Hotbot (HB), Lycos (LY), and Northernlight (NL)
Legend: K = Kendall distance; SF = scaled footrule distance; IF = induced footrule distance; LK = Local Kemenization
[Table columns: Altavista, Alltheweb, Excite, Google, Hotbot, Lycos, Northernlight]

Experiments in Spam Filtering
Define spam as web pages that are ranked low by majority opinion (machine and human; a simplifying assumption), although they may be ranked highly by some search engines.
Intuition: if a page spams most search engines for a particular query, then no combination of these search engines can filter the spam (garbage in, garbage out).
Spam pages are the Condorcet losers and will occupy the bottom of any ranking that satisfies the extended Condorcet criterion.
Similarly, good pages will be among the Condorcet winners and will rank above the losers.

Condorcet Criteria
Condorcet Criterion:
– A candidate of M that beats every other candidate in pairwise simple majority voting should be ranked first.
Extended Condorcet Criterion (XCC):
– Version 1: If most voters prefer candidate a to candidate b (i.e., the number of voters i with $\tau_i(a) < \tau_i(b)$ is more than n/2), then $\tau$ should also prefer a to b (i.e., $\tau(a) < \tau(b)$).
– Version 2: If there is a partition (W, L) of M such that for any x in W and y in L the majority prefers x to y, then every x in W must be ranked above every y in L. W is called the set of Condorcet winners and L the set of Condorcet losers.
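The pairwise-majority test behind these criteria can be sketched as follows. The sketch assumes every voter supplies a full list (best first) and that "majority" means strictly more than half of the voters; the function names are hypothetical.

```python
def majority_prefers(voters, a, b):
    """True if strictly more than half of the voters rank a above b."""
    count = sum(1 for v in voters if v.index(a) < v.index(b))
    return count > len(voters) / 2


def condorcet_winner(voters):
    """Return the candidate that beats every other one pairwise, or None."""
    candidates = voters[0]
    for a in candidates:
        if all(majority_prefers(voters, a, b) for b in candidates if b != a):
            return a
    return None


def satisfies_xcc1(ranking, voters):
    """Check Version 1: every strict pairwise majority is respected by ranking."""
    for i, a in enumerate(ranking):
        for b in ranking[:i]:
            # b is ranked above a; this is a violation if the majority prefers a to b
            if majority_prefers(voters, a, b):
                return False
    return True


# Three voters whose pairwise majorities form a cycle: a > b, b > c, c > a.
cycle = [list("abc"), list("bca"), list("cab")]
print(condorcet_winner(cycle))  # None: no candidate beats both others
```

With the cyclic profile above, no ordering of a, b, c passes `satisfies_xcc1`, which is one concrete way Version 1 can fail to be realizable.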

XCC(2) and SPAM Filtering
Note that XCC(1) => XCC(2), so Version 1 is stronger.
But XCC(1) is not always realizable (pairwise majorities can form a cycle).
As we will see, XCC(2) is always realizable via Local Kemenization.
Hence using rank aggregation with XCC(2) should assist in SPAM filtering, since Condorcet losers will be ranked lowest.
Let us look at where spam pages (human determined) are ranked with good aggregation methods.

Experiments: Filtering SPAM

Experiment: Word association
Different search engines and portals have different (default) semantics for handling a multi-word query. Some use OR semantics (documents contain at least one of the given query terms) while Google uses AND semantics (all the query words must appear). Both are inconvenient in many situations.
Consider searching for a software engineering job in an on-line job database. The user lists a number of skills and a number of potential keywords in the job description, for example, "Silicon Valley C++ Java CORBA TCP-IP algorithms start-up pre-IPO stock options". It is clear that the "AND" rule might produce no document or SPAM, and the "OR" rule is equally disastrous.
Experiment with rank aggregation using multiple queries based on small subsets of terms.

Results for query: madras madurai coimbatore vellore (cities in the state of Tamil Nadu, India)
[Result columns: Google; SFO with LK; MC4 with LK]

Locally Kemeny optimal aggregation and XCC(2)
Many existing aggregation methods do not satisfy XCC(1) or XCC(2).
It is possible to use your favorite aggregation method to obtain a full list, then apply local Kemenization to realize XCC(2), which filters Condorcet losers.

Locally Kemeny optimal
Recall that computing a Kemeny optimal aggregation is NP-hard.
Definition of locally optimal: a permutation p is a locally Kemeny optimal aggregation of partial lists t1, t2,..., tk if there is no permutation p' that can be obtained from p by a single transposition of an adjacent pair of elements and for which the Kendall distance satisfies K(p', t1, t2,..., tk) < K(p, t1, t2,..., tk).
In other words, it is impossible to reduce the total distance to the t's by flipping an adjacent pair.

Example of LKO but not KO
Example 1: t1 = (1,2), t2 = (2,3), t3 = t4 = t5 = (3,1), and p = (1,2,3).
p satisfies the definition of LKO with K(p, t1, t2,..., t5) = 3, but transposing the non-adjacent pair 1 and 3 gives (3,2,1) and decreases the sum to 2, so p is not Kemeny optimal.
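The example can be checked mechanically. The sketch below assumes K is the total number of pairwise disagreements between p and the partial lists, counted only over pairs that a partial list actually ranks; the helper names are hypothetical.

```python
def disagreements(p, t):
    """Number of pairs ranked by partial list t that p orders the other way."""
    pos = {x: i for i, x in enumerate(p)}
    return sum(
        1
        for i in range(len(t))
        for j in range(i + 1, len(t))
        if pos[t[i]] > pos[t[j]]
    )


def total_kendall(p, lists):
    """K(p, t1, ..., tk): sum of disagreements over all partial lists."""
    return sum(disagreements(p, t) for t in lists)


def is_locally_kemeny_optimal(p, lists):
    """True if no single adjacent transposition lowers the total distance."""
    base = total_kendall(p, lists)
    for i in range(len(p) - 1):
        q = list(p)
        q[i], q[i + 1] = q[i + 1], q[i]
        if total_kendall(q, lists) < base:
            return False
    return True


lists = [(1, 2), (2, 3), (3, 1), (3, 1), (3, 1)]
p = (1, 2, 3)
print(total_kendall(p, lists))              # 3
print(is_locally_kemeny_optimal(p, lists))  # True
print(total_kendall((3, 2, 1), lists))      # 2
```

Both adjacent swaps of p raise the total to 4, which is why p is locally optimal even though a permutation with a smaller total exists.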

LKO satisfies XCC(2)
Proof by contradiction. If the result is false, then there exist partial lists t1, t2,..., tk, an LKO aggregation p, and a partition (W, L) that violates XCC(2); that is, there is some pair c in W and d in L such that p(d) < p(c). Let (d, c) be the closest such pair in p. Consider the immediate successor of d in p, call it e.
If e = c, then c is adjacent to d in p, and since a majority prefers c to d, transposing this adjacent pair produces a p' such that K(p', t1, t2,..., tk) < K(p, t1, t2,..., tk), contradicting the assumption that p is LKO.
If e does not equal c, then either e is in W, in which case (d, e) is a closer pair in p than (d, c) and also violates XCC(2), or e is in L, in which case (e, c) is a closer pair than (d, c) that violates XCC(2). Both cases contradict the choice of (d, c).

Local Kemenization procedure
A local Kemenization of a full list μ with respect to the preference lists t1, t2,..., tk is a procedure that computes a locally Kemeny optimal aggregation of the t's that is maximally consistent with μ.
This approach:
(1) preserves the strengths of the initial aggregation μ
(2) ranks non-spam above spam
(3) gives a result that disagrees with μ on a pair (i, j) only if a majority endorses this disagreement
(4) for every d, 1 ≤ d ≤ |μ|, the restriction of the output to the top d elements of μ is a local Kemenization of the top d elements of μ

A simple inductive construction.
Assume inductively that we have constructed p, a local Kemenization of the projection of the t's onto the first l-1 elements of the initial list. Insert the next element x into the lowest-ranked "permissible" position in p: just below the lowest-ranked element y in p such that
– (a) no majority among the (original) t's prefers x to y, and
– (b) for all successors z of y in p there is a majority that prefers x to z.
In other words, we try to insert x at the end (bottom) of the list p; we bubble it up toward the top of the list as long as a majority of the t's insists that we do.

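Example: local Kemenization procedure
Voter lists: ABFECD, BCAEFD, ACFDEB, BFDCAE, CABFED
Initial aggregation: BADCEF
Insertion steps: B; BA → AB; ABD; ABDC → ABCD; and finally ABCFED once E and F are inserted
Majority counts where the steps are decided: A>B: 3, A<B: 2 (A moves above B, disagreeing with the initial list); B>D: 4, B<D: 1 (D stays below B)
Result: ABCFED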
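The insertion procedure can be sketched as a bubble-up loop. This is a minimal illustration under stated assumptions: lists are strings or sequences ordered best first, a "majority" means strictly more lists rank x above y than y above x among the lists that contain both, and the names `majority_wants_above` and `local_kemenization` are hypothetical. The data are the five voter lists and the initial list BADCEF from the example slide.

```python
def majority_wants_above(lists, x, y):
    """True if strictly more lists rank x above y than y above x.

    Lists that do not contain both elements are ignored.
    """
    for_x = against_x = 0
    for t in lists:
        if x in t and y in t:
            if t.index(x) < t.index(y):
                for_x += 1
            else:
                against_x += 1
    return for_x > against_x


def local_kemenization(mu, lists):
    """Insert the elements of mu one by one, bubbling each up while a majority insists."""
    result = []
    for x in mu:
        pos = len(result)  # start at the bottom of the current list
        while pos > 0 and majority_wants_above(lists, x, result[pos - 1]):
            pos -= 1       # a majority prefers x to the element just above it
        result.insert(pos, x)
    return result


voters = ["ABFECD", "BCAEFD", "ACFDEB", "BFDCAE", "CABFED"]
print("".join(local_kemenization("BADCEF", voters)))  # ABCFED
```

Each element stops at the first neighbour that no majority wants it to pass, so the output differs from the initial list only on pairs where a majority backs the change.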

RA and Searching Workplace Web
Axiom 1: Intranet documents are not spam
Axiom 2: Queries usually have unique answers (not broad topic based)
Axiom 3: Intranet docs are not search engine friendly (docs are accessed through portals and database queries)
Rank aggregation allows us to combine a number of heuristic alternatives: static and dynamic, query dependent and independent