Sep 26, 2013 Lirong Xia Computational social choice Statistical approaches.

Slides:



Advertisements
Similar presentations
Sep 16, 2013 Lirong Xia Computational social choice The easy-to-compute axiom.
Advertisements

Sep 15, 2014 Lirong Xia Computational social choice The easy-to-compute axiom.
A Tutorial on Learning with Bayesian Networks
Voting and social choice Vincent Conitzer
Bayesian inference “Very much lies in the posterior distribution” Bayesian definition of sufficiency: A statistic T (x 1, …, x n ) is sufficient for 
Sep 25, 2014 Lirong Xia Computational social choice Statistical approaches.
Statistics review of basic probability and statistics.
.. . Parameter Estimation using likelihood functions Tutorial #1 This class has been cut and slightly edited from Nir Friedman’s full course of 12 lectures.
Lirong Xia Friday, May 2, 2014 Introduction to Social Choice.
CSC321: 2011 Introduction to Neural Networks and Machine Learning Lecture 10: The Bayesian way to fit models Geoffrey Hinton.
Parameter Estimation using likelihood functions Tutorial #1
Bayesian inference Gil McVean, Department of Statistics Monday 17 th November 2008.
Estimating parameters from data Gil McVean, Department of Statistics Tuesday 3 rd November 2009.
CPS Voting and social choice
Lecture 5: Learning models using EM
This presentation has been cut and slightly edited from Nir Friedman’s full course of 12 lectures which is available at Changes.
Basics of Statistical Estimation. Learning Probabilities: Classical Approach Simplest case: Flipping a thumbtack tails heads True probability  is unknown.
LARGE SAMPLE TESTS ON PROPORTIONS
Presenting: Assaf Tzabari
Parametric Inference.
. PGM: Tirgul 10 Parameter Learning and Priors. 2 Why learning? Knowledge acquisition bottleneck u Knowledge acquisition is an expensive process u Often.
. Maximum Likelihood (ML) Parameter Estimation with applications to reconstructing phylogenetic trees Comput. Genomics, lecture 6b Presentation taken from.
Using ranking and DCE data to value health states on the QALY scale using conventional and Bayesian methods Theresa Cain.
Thanks to Nir Friedman, HU
Learning Bayesian Networks (From David Heckerman’s tutorial)
Oct 6, 2014 Lirong Xia Judgment aggregation acknowledgment: Ulle Endriss, University of Amsterdam.
Review of Lecture Two Linear Regression Normal Equation
Social choice (voting) Vincent Conitzer > > > >
Chapter Two Probability Distributions: Discrete Variables
Binary Variables (1) Coin flipping: heads=1, tails=0 Bernoulli Distribution.
An efficient distributed protocol for collective decision- making in combinatorial domains CMSS Feb , 2012 Minyi Li Intelligent Agent Technology.
Correlated Topic Models By Blei and Lafferty (NIPS 2005) Presented by Chunping Wang ECE, Duke University August 4 th, 2006.
.. . Maximum Likelihood (ML) Parameter Estimation with applications to inferring phylogenetic trees Comput. Genomics, lecture 6a Presentation taken from.
Bayesian Inference Ekaterina Lomakina TNU seminar: Bayesian inference 1 March 2013.
Prof. Dr. S. K. Bhattacharjee Department of Statistics University of Rajshahi.
A statistical model Μ is a set of distributions (or regression functions), e.g., all uni-modal, smooth distributions. Μ is called a parametric model if.
Bayesian Sets Zoubin Ghahramani and Kathertine A. Heller NIPS 2005 Presented by Qi An Mar. 17 th, 2006.
Truth-Revealing Social Choice ADT-15 Tutorial Lirong Xia.
ECE 8443 – Pattern Recognition LECTURE 07: MAXIMUM LIKELIHOOD AND BAYESIAN ESTIMATION Objectives: Class-Conditional Density The Multivariate Case General.
Likelihood Methods in Ecology November 16 th – 20 th, 2009 Millbrook, NY Instructors: Charles Canham and María Uriarte Teaching Assistant Liza Comita.
Bayesian inference for Plackett-Luce ranking models
Statistical Decision Theory Bayes’ theorem: For discrete events For probability density functions.
Ch15: Decision Theory & Bayesian Inference 15.1: INTRO: We are back to some theoretical statistics: 1.Decision Theory –Make decisions in the presence of.
Maximum Likelihood Estimation
1 Optimizing Decisions over the Long-term in the Presence of Uncertain Response Edward Kambour.
Javad Azimi, Ali Jalali, Xiaoli Fern Oregon State University University of Texas at Austin In NIPS 2011, Workshop in Bayesian optimization, experimental.
1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.
Parameter Estimation. Statistics Probability specified inferred Steam engine pump “prediction” “estimation”
11/24/2008CS Common Voting Rules as Maximum Likelihood Estimators - Matthew Kay 1 Common Voting Rules as Maximum Likelihood Estimators Vincent Conitzer,
Outline Historical note about Bayes’ rule Bayesian updating for probability density functions –Salary offer estimate Coin trials example Reading material:
A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation Yee W. Teh, David Newman and Max Welling Published on NIPS 2006 Discussion.
CSC321: Lecture 8: The Bayesian way to fit models Geoffrey Hinton.
Model Comparison.
Bayesian Estimation and Confidence Intervals Lecture XXII.
Hypothesis testing and statistical decision theory
Introduction to Social Choice
Introduction to Social Choice
Parameter Estimation 主講人:虞台文.
Bayes Net Learning: Bayesian Approaches
Maximum Likelihood Estimation
Oliver Schulte Machine Learning 726
Introduction If we assume
Tutorial #3 by Ma’ayan Fishelson
Voting and social choice
EM for Inference in MV Data
Parametric Methods Berlin Chen, 2005 References:
Learning From Observed Data
EM for Inference in MV Data
Preference Modeling Lirong Xia. Preference Modeling Lirong Xia.
CPS Voting and social choice
Presentation transcript:

Sep 26, 2013 Lirong Xia Computational social choice Statistical approaches

Various “undesirable” behavior –manipulation –bribery –control 1 Last class: manipulation NP- Hard

2 ab a bc Turker 1 Turker 2 Turker n … >> Example: Crowdsourcing > ab > b c >

Outline: statistical approaches 3 Condorcet’s MLE model (history) A General framework Why MLE? Why Condorcet’s model? Random Utility Models Model selection

The Condorcet Jury theorem. Given –two alternatives { a, b }. –0.5< p <1, Suppose –each agent’s preferences is generated i.i.d., such that –w/p p, the same as the ground truth –w/p 1 - p, different from the ground truth Then, as n →∞, the majority of agents’ preferences converges in probability to the ground truth 4 The Condorcet Jury theorem [Condorcet 1785]

Condorcet’s MLE approach Parametric ranking model M r : given a “ground truth” parameter Θ –each vote P is drawn i.i.d. conditioned on Θ, according to Pr ( P| Θ ) –Each P is a ranking For any profile D=(P 1,…,P n ), –The likelihood of Θ is L( Θ |D)=Pr(D| Θ )=∏ P ∈ D Pr(P| Θ ) –The MLE mechanism MLE (D)= argmax Θ L( Θ |D) –Break ties randomly “Ground truth” Θ P1P1 P2P2 PnPn … 5

Parameterized by a ranking Given a “ground truth” ranking W and p > 1/2, generate each pairwise comparison in V independently as follows (suppose c ≻ d in W ) MLE ranking is the Kemeny rule [Young JEP-95] 6 Condorcet’s model [Condorcet 1785] Pr( b ≻ c ≻ a | a ≻ b ≻ c ) = (1-p) p (1-p)p (1-p) 2 c ≻ d in W c ≻ d in V p d ≻ c in V 1-p

Outline: statistical approaches 7 Condorcet’s MLE model (history) Why MLE? Why Condorcet’s model? A General framework

8 Statistical decision framework ground truth Θ P1P1 PnPn …… MrMr Decision (winner, ranking, etc) Information about the ground truth P1P1 P2P2 PnPn … Step 1: statistical inference Data D Given M r Step 2: decision making

M r = Condorcet’ model Step 1: MLE Step 2: top-alternative 9 Example: Kemeny Winner The most probable ranking P1P1 P2P2 PnPn … Step 1: MLE Data D Step 2: top-1 alternative

Frequentist –there is an unknown but fixed ground truth – p = 10/14=0.714 –Pr(2heads| p= ) =(0.714) 2 =0.51>0.5 –Yes! 10 Frequentist vs. Bayesian in general Bayesian –the ground truth is captured by a belief distribution –Compute Pr( p |Data) assuming uniform prior –Compute Pr(2heads|Data)=0.485 <0.5 –No! Credit: Panos Ipeirotis & Roy Radner You have a biased coin: head w/p p –You observe 10 heads, 4 tails –Do you think the next two tosses will be two heads in a row?

M r = Condorcet’ model 11 Kemeny = Frequentist approach Winner The most probable ranking P1P1 P2P2 PnPn … Step 1: MLE Data D Step 2: top-1 alternative This is the Kemeny rule (for single winner)!

12 Example: Bayesian M r = Condorcet’ model Winner Posterior over rankings P1P1 P2P2 PnPn … Step 1: Bayesian update Data D Step 2: mostly likely top-1 This is a new rule!

Anonymity, neutrality, monotonicity ConsistencyCondorcet Easy to compute Frequentist (Kemeny) YN YN Bayesian NY 13 Frequentist vs. Bayesian Lots of open questions! Writing up a paper for submission

Outline: statistical approaches 14 Condorcet’s MLE model (history) Why MLE? Why Condorcet’s model? A General framework

When the outcomes are winning alternatives –MLE rules must satisfy consistency: if r(D 1 )∩r(D 2 )≠ ϕ, then r(D 1 ∪ D 2 )=r(D 1 )∩r(D 2 ) –All classical voting rules except positional scoring rules are NOT MLEs Positional scoring rules are MLEs This is NOT a coincidence! –All MLE rules that outputs winners satisfy anonymity and consistency –Positional scoring rules are the only voting rules that satisfy anonymity, neutrality, and consistency! [Young SIAMAM-75] 15 Classical voting rules as MLEs [Conitzer&Sandholm UAI-05]

When the outcomes are winning rankings –MLE rules must satisfy reinforcement (the counterpart of consistency for rankings) –All classical voting rules except positional scoring rules and Kemeny are NOT MLEs This is not (completely) a coincidence! –Kemeny is the only preference function (that outputs rankings) that satisfies neutrality, reinforcement, and Condorcet consistency [Young&Levenglick SIAMAM-78] 16 Classical voting rules as MLEs [Conitzer&Sandholm UAI-05]

Condorcet’s model –not very natural –computationally hard Other classic voting rules –most are not MLEs –models are not very natural either –approximately compute the MLE 17 Are we happy?

New mechanisms via the statistical decision framework Model selection –How can we evaluate fitness? Frequentist or Bayesian? –Focus on frequentist Computation –How can we compute MLE efficiently? 18 Decision Information about the ground truth Data D decision making inference

Outline: statistical approaches 19 Condorcet’s MLE model (history) A General framework Why MLE? Why Condorcet’s model? Random Utility Models

Continuous parameters: Θ =( θ 1,…, θ m ) – m : number of alternatives –Each alternative is modeled by a utility distribution μ i – θ i : a vector that parameterizes μ i An agent’s perceived utility U i for alternative c i is generated independently according to μ i ( U i ) Agents rank alternatives according to their perceived utilities – Pr ( c 2 ≻ c 1 ≻ c 3 |θ 1, θ 2, θ 3 ) = Pr U i ∼ μ i ( U 2 >U 1 >U 3 ) 20 Random utility model (RUM) [Thurstone 27] U1U1 U2U2 U3U3 θ3θ3 θ2θ2 θ1θ1

Pr ( Data |θ 1, θ 2, θ 3 ) = ∏ R ∈ Data Pr(R |θ 1, θ 2, θ 3 ) 21 Generating a preference- profile Parameters P 1 = c 2 ≻ c 1 ≻ c 3 P n = c 1 ≻ c 2 ≻ c 3 … Agent 1 Agent n θ3θ3 θ2θ2 θ1θ1

μ i ’ s are Gumbel distributions –A.k.a. the Plackett-Luce (P-L) model [BM 60, Yellott 77] Equivalently, there exist positive numbers λ 1,…, λ m Pros: –Computationally tractable Analytical solution to the likelihood function –The only RUM that was known to be tractable Widely applied in Economics [McFadden 74], learning to rank [Liu 11], and analyzing elections [GM 06,07,08,09] Cons: does not seem to fit very well 22 RUMs with Gumbel distributions c 1 is the top choice in { c 1,…,c m } c 2 is the top choice in { c 2,…,c m } c m-1 is preferred to c m

μ i ’ s are normal distributions –Thurstone’s Case V [Thurstone 27] Pros: –Intuitive –Flexible Cons: believed to be computationally intractable –No analytical solution for the likelihood function Pr(P | Θ) is known 23 RUM with normal distributions U m : from - ∞ to ∞U m-1 : from U m to ∞ … U 1 : from U 2 to ∞

Utility distributions μ l ’s belong to the exponential family (EF) –Includes normal, Gamma, exponential, Binomial, Gumbel, etc. In each iteration t E-step, for any set of parameters Θ –Computes the expected log likelihood ( ELL ) ELL(Θ| Data, Θ t ) = f (Θ, g(Data, Θ t )) M-step –Choose Θ t+1 = argmax Θ ELL(Θ | Data, Θ t ) Until | Pr ( D | Θ t ) -Pr ( D | Θ t+ 1 )| < ε 24 Approximately computed by Gibbs sampling MC-EM algorithm for RUMs [APX NIPS-12]

Outline: statistical approaches 25 Condorcet’s MLE model (history) A General framework Why MLE? Why Condorcet’s model? Random Utility Models Model selection

26 Model selection Value(Normal) - Value(PL) LLPred. LLAICBIC 44.8(15.8)87.4(30.5)-79.6(31.6)-50.5(31.6) Compare RUMs with Normal distributions and PL for –log-likelihood: log Pr( D | Θ ) –predictive log-likelihood: E log Pr( D test | Θ ) –Akaike information criterion (AIC): 2k-2 log Pr( D | Θ ) –Bayesian information criterion (BIC): k log n-2 log Pr( D | Θ ) Tested on an election dataset –9 alternatives, randomly chosen 50 voters Red: statistically significant with 95% confidence Project: model fitness for election data

Generalized RUM [APX UAI-13] –Learn the relationship between agent features and alternative features Preference elicitation based on experimental design [APX UAI-13] –c.f. active learning Faster algorithms [ACPX NIPS-13] –Generalized Method of Moments (GMM) 27 Recent progress

Random sample elections –Richard Carback and David Chaum (remote) You need to –read the paper –prepare questions 28 Next class: Guest lecture