Truth-Revealing Social Choice ADT-15 Tutorial Lirong Xia
Member of Parliament elections: plurality rule. Alternative vote? UK referendum: 68% No vs. 32% Yes.
Ordinal Preference Aggregation: Social Choice
[Figure: a profile in which Alice, Bob, and Carol each submit a ranking over alternatives A, B, C, and a social choice mechanism aggregates the profile.]
Ranking pictures [PGM+ AAAI-12]
[Figure: Turker 1, Turker 2, ..., Turker n each submit a ranking over pictures A, B, C.]
Social choice
[Figure: agents hold true rankings R_1*, ..., R_n* and report rankings R_1, ..., R_n; the reported profile is mapped by a social choice mechanism to an outcome.]
R_i, R_i*: full rankings over a set A of alternatives
Applications: real world
People/agents often have conflicting preferences, yet they have to make a joint decision
Applications: academic world
– Multi-agent systems [Ephrati and Rosenschein 91]
– Recommendation systems [Ghosh et al. 99]
– Meta-search engines [Dwork et al. 01]
– Belief merging [Everaere et al. 07]
– Human computation (crowdsourcing) [Mao et al. AAAI-13]
– etc.
How to design a good social choice mechanism? And what does it mean to be “good”?
Two goals for social choice mechanisms
– GOAL 1: democracy
– GOAL 2: truth (THIS TUTORIAL)
Outline
– Axiomatic social choice
– The Condorcet Jury Theorem (CJT)
– Break
– Four directions of extending CJT
– Beyond CJT: the objective decision-making perspective
Flavor of this tutorial
– Research questions + basic models: the tip of the iceberg
– More references:
  – Survey by Nitzan and Paroush (online): Collective Decision Making and Jury Theorem
  – Survey by Gerling et al. [2005]: Information acquisition and decision making in committees: A survey
  – My personal summary: send me an email
Computational social choice
– Joerg’s textbook
– Handbook of Computational Social Choice
Outline
– Axiomatic social choice
– The Condorcet Jury Theorem (CJT)
– Break
– Four directions of extending CJT
– Beyond CJT: the objective decision-making perspective
Common voting rules (what has been done in the past two centuries)
Mathematically, a social choice mechanism (voting rule) is a mapping from {all profiles} to {outcomes}
– an outcome is usually a winner, a set of winners, or a ranking
– m: number of alternatives (candidates)
– n: number of agents (voters)
– D = (P_1, ..., P_n): a profile
Positional scoring rules
A score vector s_1, ..., s_m
– For each vote V, the alternative ranked in the i-th position gets s_i points
– The alternative with the most total points is the winner
– Special cases: Borda, with score vector (m-1, m-2, ..., 0); Plurality, with score vector (1, 0, ..., 0) [used in the US]
An example
– Three alternatives {c_1, c_2, c_3}
– Score vector (2, 1, 0) (= Borda)
– 3 votes: c_1 ≻ c_2 ≻ c_3, c_2 ≻ c_1 ≻ c_3, c_3 ≻ c_1 ≻ c_2
– c_1 gets 2+1+1=4, c_2 gets 1+2+0=3, c_3 gets 0+0+2=2
– The winner is c_1
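To make the scoring rule concrete, here is a minimal Python sketch (mine, not from the slides; `positional_winner` is a made-up helper name) that scores a profile under an arbitrary score vector and reproduces the Borda example above. Plurality is the same code with score vector (1, 0, 0).

```python
def positional_winner(profile, score_vector):
    """Positional scoring rule: the alternative ranked in the i-th position of a
    vote gets score_vector[i] points; the alternative with the most points wins."""
    scores = {}
    for vote in profile:                       # a vote is a ranking, best first
        for pos, alt in enumerate(vote):
            scores[alt] = scores.get(alt, 0) + score_vector[pos]
    return max(scores, key=scores.get), scores

# The example above: score vector (2, 1, 0), i.e. Borda with 3 alternatives
profile = [("c1", "c2", "c3"), ("c2", "c1", "c3"), ("c3", "c1", "c2")]
winner, scores = positional_winner(profile, (2, 1, 0))
print(scores)   # {'c1': 4, 'c2': 3, 'c3': 2}
print(winner)   # c1
```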
The Kemeny rule
Kendall tau distance
– K(V, W) = #{pairwise comparisons on which V and W differ}
– Example: K(b ≻ c ≻ a, a ≻ b ≻ c) = 2 (the two rankings differ on {a, b} and {a, c})
Kemeny(D) = argmin_W K(D, W) = argmin_W Σ_{P ∈ D} K(P, W)
For a single winner, choose the top-ranked alternative in Kemeny(D) [has a statistical interpretation]
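A brute-force sketch (mine) of the Kendall tau distance and the Kemeny rule; it assumes a small m, since it searches over all m! rankings (Kemeny winner determination is hard in general).

```python
from itertools import combinations, permutations

def kendall_tau(v, w):
    """K(V, W): number of pairs of alternatives on which rankings V and W disagree."""
    pos_v = {a: i for i, a in enumerate(v)}
    pos_w = {a: i for i, a in enumerate(w)}
    return sum(1 for a, b in combinations(v, 2)
               if (pos_v[a] - pos_v[b]) * (pos_w[a] - pos_w[b]) < 0)

def kemeny(profile):
    """Kemeny(D): the ranking minimizing the total Kendall tau distance to the profile."""
    return min(permutations(profile[0]),
               key=lambda w: sum(kendall_tau(v, w) for v in profile))

print(kendall_tau(("b", "c", "a"), ("a", "b", "c")))            # 2
ranking = kemeny([("c1", "c2", "c3"), ("c2", "c1", "c3"), ("c3", "c1", "c2")])
print(ranking, "-> single winner:", ranking[0])                  # ('c1', 'c2', 'c3') -> c1
```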
…and many others: Approval, Baldwin, Black, Bucklin, Coombs, Copeland, Dodgson, maximin, Nanson, range voting, Schulze, Slater, ranked pairs, etc.
Q: How to evaluate rules in terms of achieving democracy?
A: The axiomatic approach
Axiomatic approach (what has been done in the past 50 years)
– Anonymity: names of the voters do not matter (fairness for the voters)
– Non-dictatorship: there is no dictator, whose top-ranked alternative is always the winner (fairness for the voters)
– Neutrality: names of the alternatives do not matter (fairness for the alternatives)
– Consistency: if r(D_1) ∩ r(D_2) ≠ ∅, then r(D_1 ∪ D_2) = r(D_1) ∩ r(D_2)
– Condorcet consistency: if there exists a Condorcet winner, then it must win; a Condorcet winner beats all other alternatives in pairwise elections (see the sketch below)
– Easy to compute: winner determination is in P (computational efficiency of preference aggregation)
– Hard to manipulate: computing a beneficial false vote is computationally hard
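As an illustration of Condorcet consistency, a small sketch (my own code) that looks for a Condorcet winner in a profile of rankings; it returns None on Condorcet's classic majority cycle.

```python
def condorcet_winner(profile):
    """Return the alternative that beats every other alternative in pairwise
    majority comparisons, or None if no Condorcet winner exists."""
    alts = profile[0]
    pos = [{a: i for i, a in enumerate(vote)} for vote in profile]
    def beats(a, b):
        return sum(1 for p in pos if p[a] < p[b]) > len(profile) / 2
    for a in alts:
        if all(beats(a, b) for b in alts if b != a):
            return a
    return None

# Condorcet's paradox: a majority cycle, hence no Condorcet winner
print(condorcet_winner([("a", "b", "c"), ("b", "c", "a"), ("c", "a", "b")]))  # None
print(condorcet_winner([("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]))  # a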
Which axiom is more important? Some of these axiomatic properties are not compatible with others.
                          Condorcet consistency | Consistency | Easy to compute
Positional scoring rules  N                     | Y           | Y
Kemeny                    Y                     | N           | N
Ranked pairs              Y                     | N           | Y
An easy fact
Theorem. For voting rules that select a single winner, anonymity is not compatible with neutrality.
– Proof sketch: w.l.o.g. Alice votes a ≻ b and Bob votes b ≻ a; anonymity requires that swapping Alice's and Bob's votes does not change the winner, while neutrality requires that swapping a and b (which yields the same profile) does change the winner, a contradiction.
Not-so-easy facts
– Arrow’s impossibility theorem (Google it!)
– Gibbard-Satterthwaite theorem (Google it!)
– Axiomatic characterization
  – Template: a voting rule satisfies axioms A1, A2, A3 if and only if it is rule X
  – If you believe A1, A2, A3 are the most desirable properties, then X is optimal
  – anonymity + neutrality + consistency + continuity ⟺ positional scoring rules [Young SIAMAM-75]
  – neutrality + consistency + Condorcet consistency ⟺ Kemeny [Young&Levenglick SIAMAM-78]
Outline
– Axiomatic social choice
– The Condorcet Jury Theorem (CJT)
– Break
– Four directions of extending CJT
– Beyond CJT: the objective decision-making perspective
The Condorcet Jury Theorem (CJT) [Condorcet 1785, Laplace 1812]
Given
– two alternatives {a, b}
– competence 0.5 < p < 1
Suppose
– agents’ signals are i.i.d. conditioned on the ground truth: w.p. p, the same as the ground truth; w.p. 1-p, different from the ground truth
– agents truthfully report their signals
Then the majority rule reveals the ground truth as n → ∞
Why is CJT important?
– It justifies democracy and the wisdom of the crowd
– It “lays, among other things, the foundations of the ideology of the democratic regime” [Paroush SCW-98]
Proof
Group competence: Pr(maj(P_n) = a | a)
– P_n: n i.i.d. votes given ground truth a
Random variable X_j: takes value 1 w.p. p, 0 otherwise
– encoding whether signal = ground truth
Σ_{j=1}^n X_j / n converges to p > 0.5 in probability (Law of Large Numbers), so the probability that a majority of signals matches the ground truth goes to 1
Three parts of CJT
The group competence
1. is higher than that of any single agent
2. increases in the group size n
3. goes to 1 as n → ∞
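A Monte Carlo sketch (mine, not part of the tutorial) that estimates the group competence Pr(maj(P_n) = ground truth) for i.i.d. agents with competence p; it illustrates parts 2 and 3: the estimate grows with n and approaches 1.

```python
import random

def group_competence(n, p, trials=20000):
    """Estimate Pr(majority vote = ground truth) for n i.i.d. agents who are each
    correct with probability p; ties (even n) are broken uniformly at random."""
    correct = 0
    for _ in range(trials):
        right = sum(random.random() < p for _ in range(n))
        if 2 * right > n or (2 * right == n and random.random() < 0.5):
            correct += 1
    return correct / trials

random.seed(0)
for n in (1, 5, 25, 125):
    print(n, round(group_competence(n, p=0.6), 3))
# competence grows with the group size and approaches 1, as the CJT predicts
```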
Proof of competence monotonicity
– From 2k to 2k+1: the extra vote breaks ties with higher probability in favor of the ground truth
– From 2k+1 to 2k+2: compare the probabilities of the near-tied events (binomial terms in p and 1-p) that the extra vote can flip
Limitations of CJT: revisiting each assumption
– two alternatives {a, b}: more than two?
– competence 0.5 < p < 1: heterogeneous agents?
– agents’ signals are i.i.d. conditioned on the ground truth: dependent agents?
– agents truthfully report their signals: strategic agents?
– the majority rule: other rules?
Outline
– Axiomatic social choice
– The Condorcet Jury Theorem (CJT)
– Break
– Four directions of extending CJT
– Beyond CJT: the objective decision-making perspective
Extensions: dependent agents, heterogeneous agents, strategic agents, more than two alternatives
An active area: Social Choice and Welfare, American Political Science Review, Games and Economic Behavior, Mathematical Social Sciences, Theory and Decision, Public Choice, Econometrica + JET (Myerson; Shapley&Grofman), MSS special issue on ADT-15
Extensions: dependent agents, heterogeneous agents, strategic agents, more than two alternatives
Does CJT hold for dependent agents?
The group competence
1. is higher than that of any single agent: not always (everyone mimicking one leader)
2. increases in the group size n: not always (mimicking one leader)
3. goes to 1 as n → ∞: yes for some dependency models [Berg 92; Ladha 92, 93; Peleg&Zamir 12]
Dependent agents
– Positive correlations: agents are likely to receive similar signals even conditioned on the ground truth
– Negative correlations: agents are likely to receive different signals
– Conjecture: positive correlations reduce group competence (positively correlated agents effectively reduce the number of agents)
Opinion leader model [Boland et al. 89]
One leader (Y), 2k followers (X_1, ..., X_2k), same competence p
– Pr(Y=1) = Pr(X_j=1) = p
– X_j’s are independent conditioned on Y
Correlation r
– Pr(X_j=1 | Y=1) = p + r(1-p)
– Pr(X_j=0 | Y=0) = (1-p) + rp
Theorem. In the opinion leader model
– when p > 0.5, the group competence decreases in r
– when p < 0.5, the group competence increases in r
– when p = 0.5, the group competence does not change in r
Common evidence model [Boland et al. 89]
One common evidence (E), 2k+1 agents (X_1, ..., X_{2k+1}), same competence p
– Pr(E=1) = Pr(X_j=1) = p
– X_j’s are independent conditioned on E
Correlation r
– Pr(X_j=1 | E=1) = p + r(1-p)
– Pr(X_j=0 | E=0) = (1-p) + rp
Theorem. In the common evidence model
– when p > 0.5, the group competence decreases in r
– when p < 0.5, the group competence increases in r
– when p = 0.5, the group competence does not change in r
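A simulation sketch of the opinion leader model above (my own code and parameter choices): with p > 0.5 the estimated group competence decreases as the correlation r grows, as the theorem states. The same loop with the common evidence E in place of the leader Y simulates the common evidence model.

```python
import random

def opinion_leader_competence(k, p, r, trials=20000):
    """Group competence of majority voting with one leader Y and 2k followers
    whose signals are correlated with Y through the parameter r."""
    correct = 0
    for _ in range(trials):
        # ground truth fixed to 1 w.l.o.g.; the leader is correct w.p. p
        y = 1 if random.random() < p else 0
        votes = [y]
        for _ in range(2 * k):
            if y == 1:
                votes.append(1 if random.random() < p + r * (1 - p) else 0)
            else:
                votes.append(0 if random.random() < (1 - p) + r * p else 1)
        if sum(votes) * 2 > len(votes):        # 2k+1 votes, so no ties
            correct += 1
    return correct / trials

random.seed(0)
for r in (0.0, 0.3, 0.6, 0.9):
    print(r, round(opinion_leader_competence(k=10, p=0.7, r=r), 3))
```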
Common evidence model [Dietrich and List 2004]
[Figure: Bayesian network G → E → X_1, ..., X_n]
– Ground truth G, common evidence E
– Given any ideal vote function f: E → G
– Competence p_e = Pr(X_j = f(e) | e)
Theorem. The majority rule converges to f(e) as n → ∞
Extensions: dependent agents, heterogeneous agents, strategic agents, more than two alternatives
Does CJT hold for heterogeneous agents?
The group competence
1. is higher than that of any single agent: not always (competences (1, …))
2. increases in the group size n: not always (competences (1, …))
3. goes to 1 as n → ∞: not always (p_j = 0.5 + 1/n); yes under some condition [Berend&Paroush, 1998]
Group competence for heterogeneous agents
Independent signals; agent j’s competence is p_j
Theorem [Berend&Paroush, 1998]. CJT holds if and only if
1. …, or
2. for every sufficiently large n, …
Competence monotonicity [Berend&Sapir 05]
Given the competences {p_1, …, p_n} of n agents, where p_j ≥ 0.5
– M_l: average competence of l randomly chosen agents
Theorem [Berend&Sapir 05]. For two alternatives and all l ≤ n-1
– M_l ≤ M_{l+1} if l is even
– M_l = M_{l+1} if l is odd
Optimal voting rule for two alternatives
Theorem [Shapley and Grofman 1984]. Given the competences {p_1, …, p_n} of n agents, the maximum likelihood estimator is weighted majority voting with weights w_j = log(p_j / (1 - p_j)).
Proof. Suppose the ground truth is a; the log-likelihood of the profile is
  Σ_{j: vote_j = a} log p_j + Σ_{j: vote_j = b} log(1 - p_j) = Σ_j log(1 - p_j) + Σ_{j: vote_j = a} log(p_j / (1 - p_j)),
so comparing the log-likelihoods of a and b reduces to the weighted majority comparison.
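A sketch of the resulting rule (my own code; the log-odds weights are as reconstructed above): weighted majority voting in which a very competent agent can outweigh several mildly competent ones.

```python
import math

def weighted_majority(votes, competences):
    """MLE for two alternatives with known heterogeneous competences: weight
    agent j's vote by log(p_j / (1 - p_j)) and pick the alternative with the
    larger total weight."""
    totals = {}
    for v, p in zip(votes, competences):
        totals[v] = totals.get(v, 0.0) + math.log(p / (1 - p))
    return max(totals, key=totals.get)

# one highly competent agent outweighs two mildly competent ones...
print(weighted_majority(["a", "b", "b"], [0.95, 0.60, 0.60]))  # a
# ...but not if her competence is only slightly higher
print(weighted_majority(["a", "b", "b"], [0.65, 0.60, 0.60]))  # b
```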
Extensions: dependent agents, heterogeneous agents, strategic agents, more than two alternatives
Does CJT hold for strategic agents?
The group competence
1. is higher than that of any single agent: not always (same-vote equilibrium)
2. increases in the group size n: not always (same-vote equilibrium)
3. goes to 1 as n → ∞: yes for some models and the informative equilibrium
Strategic voting
Common interest Bayesian voting game [Austen-Smith&Banks APSR-96]
– two alternatives {a, b}, two signals {A, B}, a prior, Pr(signal | truth): p_a = Pr(signal=A | truth=a), p_b = Pr(signal=B | truth=b)
– agents have the same utility function: U(outcome, ground truth) = 1 iff outcome = ground truth
– sincere voting: vote for the alternative with the highest posterior probability
– informative voting: vote for the signal
– strategic voting: vote for the alternative with the highest expected utility
Timeline of the game
1. Nature chooses a ground truth g
2. Every agent j receives a signal s_j ~ Pr(s_j | g)
3. Every agent computes the posterior distribution (belief) over the ground truth using Bayes’ rule
4. Every agent chooses a vote to maximize her expected utility according to her belief
5. The outcome is computed by the voting rule
High-level example
[Figure: two signals, two voters; Pr(signal matches truth) = p > 0.5; a truthful agent's posterior, the other agent's possible signal, and the utility of each vote, with ties broken half/half.]
Sincere voting = informative voting?
Setting
– Two alternatives {a, b}, two signals {A, B}
– Three agents
– p_a = 0.8, p_b = 0.6
– Prior: Pr(a) = 0.1, Pr(b) = 0.9
An agent receives A
– Informative voting: a
– Posterior probability: Pr(a | A) ∝ 0.1 × 0.8 = 0.08 vs. Pr(b | A) ∝ 0.9 × 0.4 = 0.36, so sincere voting: b
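A two-line numerical check of the example (my own script): with this prior, the posterior after signal A favors b, so the sincere vote differs from the informative one.

```python
p_a, p_b = 0.8, 0.6           # Pr(signal A | a), Pr(signal B | b)
prior_a, prior_b = 0.1, 0.9

post_a = prior_a * p_a        # proportional to Pr(a | A): 0.08
post_b = prior_b * (1 - p_b)  # proportional to Pr(b | A): 0.36

print("informative vote: a")                              # vote the signal
print("sincere vote:", "a" if post_a > post_b else "b")   # b
```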
Sincere voting = strategic voting?
Setting
– Two alternatives {a, b}, two signals {A, B}
– Three agents
– p_a = 0.8, p_b = 0.6
– Uniform prior: Pr(a) = Pr(b) = 0.5
An agent receives A; the other two agents are sincere/informative
– Informative voting: a
– Posterior probability: a > b, so sincere voting: a
– Probability of a tie (the other two agents’ votes are {a, b}): 0.32 given a, 0.48 given b
– Expected utility (pivotal term) for voting a: 0.32 × 2/3
– Expected utility (pivotal term) for voting b: 0.48 × 1/3
– Strategic voting: a
The “pivotal” approach
Setting
– Two alternatives {a, b}, two signals {A, B}
– Three agents
– p_a = 0.8, p_b = 0.6
– Uniform prior: Pr(a) = Pr(b) = 0.5
An agent receives A; the other two agents are sincere/informative
– Condition on the other two votes being {a, b}
– Then the signal profile is (A, A, B)
– Posterior probabilities:
  Pr(a | A,A,B) ∝ Pr(a) × Pr(A|a) × Pr(A|a) × Pr(B|a) = 0.5 p_a²(1-p_a)
  Pr(b | A,A,B) ∝ Pr(b) × Pr(A|b) × Pr(A|b) × Pr(B|b) = 0.5 (1-p_b)² p_b
– Strategic voting: a
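A numerical check of the last two slides (my own script): the pivotal components of expected utility and the posterior conditioned on the pivotal event both point to voting a.

```python
p_a, p_b = 0.8, 0.6
prior_a = prior_b = 0.5

# my posterior after observing signal A
post_a = prior_a * p_a / (prior_a * p_a + prior_b * (1 - p_b))   # 2/3
post_b = 1 - post_a                                              # 1/3

# probability the other two informative votes are {a, b}, i.e. that I am pivotal
tie_given_a = 2 * p_a * (1 - p_a)          # 0.32
tie_given_b = 2 * (1 - p_b) * p_b          # 0.48

# pivotal components of expected utility (non-pivotal events do not depend on my vote)
print(post_a * tie_given_a)    # about 0.213: voting a
print(post_b * tie_given_b)    # 0.160: voting b

# equivalently, condition on the pivotal event: the signal profile is (A, A, B)
print(prior_a * p_a ** 2 * (1 - p_a))      # 0.064, proportional to Pr(a | A, A, B)
print(prior_b * (1 - p_b) ** 2 * p_b)      # 0.048, proportional to Pr(b | A, A, B)
```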
Bayesian Nash Equilibrium
Given a Bayesian game, a Bayesian Nash Equilibrium is a strategy profile (s_1, …, s_n) such that
– s_j: signal → vote
– every agent j prefers s_j to any other strategy, conditioned on the other agents playing s_{-j}
Example strategies
– Informative voting: s(A) = a, s(B) = b
– You can also play: s(A) = b, s(B) = a
Equilibrium under the optimal voting rule
Theorem [McLennan 98]. Let r* denote the voting rule with maximum expected utility given informative voting. Informative voting is a Bayesian Nash Equilibrium under r*.
Subsequent work
Key questions:
– What are the equilibria of the game (hopefully informative voting)?
– Does CJT hold in equilibria?
Similar model for juries
– [Feddersen&Pesendorfer Econometrica-97, APSR-98, PNAS-99]
Number of voters is uncertain, following a Poisson distribution
– [Myerson GEB-98, JET-02]
Three alternatives
– [Nunez JTP-10; Goertz&Maniquet JET-11; Bouton&Castanheira Econometrica-12; Goertz SCW-14; Goertz&Maniquet EL-14]
Four extensions: dependent agents, heterogeneous agents, strategic agents, more than two alternatives
Condorcet’s MLE approach
Parametric ranking model M_r: given a “ground truth” parameter Θ
– each vote V is drawn i.i.d. conditioned on Θ, according to Pr(V | Θ)
– each V is a ranking
For any profile P = (V_1, …, V_n)
– the likelihood of Θ is L(Θ | P) = Pr(P | Θ) = ∏_{V ∈ P} Pr(V | Θ)
– the MLE mechanism: MLE(P) = argmax_Θ L(Θ | P)
– break ties randomly
What if the decision space ≠ the parameter space?
[Figure: “ground truth” Θ generating votes V_1, V_2, …, V_n]
Mallows’ model [Mallows-1957]
Fix the dispersion φ < 1
– Parameter space: all full rankings over alternatives
– Sample space: i.i.d. generated full rankings
– Probabilities: given a ground truth ranking W, generate a ranking V w.p. Pr_W(V) ∝ φ^{Kendall(V, W)}
The MLE is the Kemeny rule
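A brute-force sketch (mine) of the Mallows likelihood: since log φ < 0, maximizing the likelihood over ground-truth rankings is the same as minimizing the total Kendall tau distance, i.e. computing the Kemeny ranking.

```python
import math
from itertools import combinations, permutations

def kendall_tau(v, w):
    """Number of pairwise comparisons on which rankings v and w disagree."""
    pos_v = {a: i for i, a in enumerate(v)}
    pos_w = {a: i for i, a in enumerate(w)}
    return sum(1 for a, b in combinations(v, 2)
               if (pos_v[a] - pos_v[b]) * (pos_w[a] - pos_w[b]) < 0)

def mallows_log_likelihood(profile, w, phi):
    """log Pr(profile | ground truth W), up to a constant that does not depend on W."""
    return math.log(phi) * sum(kendall_tau(v, w) for v in profile)

profile = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
mle = max(permutations(profile[0]),
          key=lambda w: mallows_log_likelihood(profile, w, phi=0.5))
print(mle)   # ('a', 'b', 'c'): the Kemeny ranking of this profile
```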
Condorcet’s model [Condorcet-1785, Young-1988]
Fix the dispersion φ < 1
– Parameter space: all binary relations over alternatives
– Sample space: i.i.d. generated binary relations
– Probabilities: given a ground truth relation W, generate a relation V w.p. Pr_W(V) ∝ φ^{Kendall(V, W)}
Recent work in computer science
Understanding the truth-revealing properties of existing rules
– MLE: [Conitzer&Sandholm UAI-05; Conitzer, Rognlie&Xia IJCAI-09; Xia, Conitzer&Lang AAMAS-10; Xia&Conitzer AAAI-11]
– Consistent estimator: [Caragiannis, Procaccia&Shah EC-13]
– Most probable winner: [Procaccia, Reddi&Shah UAI-13; Elkind&Shah UAI-14; Azari Soufiani, Parkes&Xia NIPS-14]
Learning ranking models
– Mallows’ model: [Lu&Boutilier ICML-11; Hughes, Hwang&Xia UAI-15; Awasthi et al. NIPS-14; Chierichetti et al. ITCS-15]
– Random Utility Models [too many to show]
Outline
– Axiomatic social choice
– The Condorcet Jury Theorem (CJT)
– Break
– Four directions of extending CJT
– Beyond CJT: the objective decision-making perspective
Beyond CJT
Thinking about Arrow’s impossibility theorem
– axiomatic properties are used to evaluate and compare voting rules
New perspective
– an objective measurement for voting rules
– can be seen as another numerical “axiomatic” property
CJT: the optimal objective decision-making perspective
How to make objectively optimal decisions using voting?
Goal: new computationally tractable voting rules with desirable axiomatic + statistical properties
– 2 alternatives: the majority rule
– Kemeny’s rule (for ranking) is NP-hard to compute
Especially when the decision space ≠ the parameter space
– e.g. use Mallows’ model to choose a single winner
Why care?
– Social choice community: statistical models are compelling
– Statistics/Machine Learning community: some axioms are desirable (strategy-proofness, monotonicity: agents have less incentive to lie)
[Figure: overlap of Statistics/ML and Social Choice]
Statistical decision-theoretic framework for social choice [Azari Soufiani, Parkes&Xia NIPS-14]
Inputs
– statistical model: Θ (unknown ground truth), S, Pr_θ(s)
– decision space: D (decisions to make)
– loss function: L(θ, d) ∈ ℝ
The rule
– f: Profiles → D with minimum Bayesian expected loss: f(P) ∈ argmin_d E_{θ|P} L(θ, d)
Two examples
f_B1 (Mallows)
– Statistical model: Mallows’ model
– Decision space: single winners
– Loss function: the top loss; L(W, a) = 0 if a is top-ranked in W, otherwise 1
– Bayesian estimator with uniform prior
f_B2 (Condorcet)
– Statistical model: Condorcet’s model
– Decision space: single winners
– Loss function: the top loss; L(W, a) = 0 if a is top-ranked in W, otherwise 1
– Bayesian estimator with uniform prior
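A brute-force sketch of f_B1 (my own code; it assumes a uniform prior and a small m so that all m! rankings can be enumerated): with the top loss, minimizing the Bayesian expected loss amounts to picking the alternative with the largest posterior probability of being top-ranked in the ground truth.

```python
from itertools import combinations, permutations

def kendall_tau(v, w):
    """Number of pairwise comparisons on which rankings v and w disagree."""
    pos_v = {a: i for i, a in enumerate(v)}
    pos_w = {a: i for i, a in enumerate(w)}
    return sum(1 for a, b in combinations(v, 2)
               if (pos_v[a] - pos_v[b]) * (pos_w[a] - pos_w[b]) < 0)

def f_B1(profile, phi=0.5):
    """Bayesian estimator for Mallows' model, uniform prior, top loss."""
    posterior_top = {a: 0.0 for a in profile[0]}
    for w in permutations(profile[0]):                            # candidate ground truths
        weight = phi ** sum(kendall_tau(v, w) for v in profile)   # proportional to the posterior of w
        posterior_top[w[0]] += weight                             # credit w's top alternative
    return max(posterior_top, key=posterior_top.get)              # minimizes expected top loss

print(f_B1([("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]))  # a
```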
Comparisons
                    Anonymity, neutrality, monotonicity | Consistency | Majority, Condorcet | Complexity | Min Bayesian risk
Kemeny (Fishburn)   ✔ | ✗ | ✔ | ✗ | ✗
f_B1 (Mallows)      ✗ | ✗ |   |   | ✔ for Mallows
f_B2 (Condorcet)    ✔ |   |   |   | ✔ for Condorcet
Highlight: f_B2 does well in many aspects.
CJT: numerically evaluating the effect of strategic agents
How much do strategic agents hurt the truth-revealing power?
– Price of Anarchy (PoA) [Koutsoupias&Papadimitriou STACS-99] = (optimal truth-revealing power) / (WORST truth-revealing power in equilibrium)
– Price of Stability (PoS) [Anshelevich et al. FOCS-04] = (optimal truth-revealing power) / (BEST truth-revealing power in equilibrium)
Theorem [Xia-15]. Informative voting is a BNE under plurality for a wide range of statistical models with m > 2
Theorem [Xia-15]. The PoA of plurality is at least m; the PoS of plurality is 1 as n → ∞
Wrap up
– The Condorcet Jury Theorem
– Four extensions: dependent agents, heterogeneous agents, strategic agents, more than two alternatives
– The new perspective: design new mechanisms; PoA and PoS
Open questions
Numerical extensions of the CJT to
– dependent, heterogeneous, and strategic agents
– with m > 2
– for commonly studied voting rules
The new perspectives
– New frameworks and rules balancing axiomatic, computational, and truth-revealing properties
Thank you!
Mallows-like models
Given
– a similarity function d: symmetric, coincidence axiom, not necessarily the triangle inequality
– a dispersion 0 < φ < 1
Pr_b(a) ∝ φ^{d(a, b)}
Sincere voting is not always a BNE
[Figure: a similarity function d over alternatives a_1, a_2, a_3, a_4, with some pairwise distances equal to a small value t]
Suppose an agent receives a_1
– When t is sufficiently small, reporting a_2 has a higher expected utility, given that the other agents are sincere, under the plurality rule
– When the triangle inequality is satisfied, sincere voting is a BNE
Informative voting meets strategic voting
Lemma [Austen-Smith&Banks 96]. For any (p_a, p_b) ∈ (0,1)², the only threshold rule where informative voting is strategic is the strict majority.
Proof. W.l.o.g. suppose the threshold T > (n-1)/2. Suppose for the sake of contradiction that informative voting is strategic.
– The only situation where an agent’s vote matters is when a receives exactly T votes
– Regardless of her signal, b has higher probability conditioned on this “tied” situation
– Strategically the agent will vote for b
Encouraging more efforts