Preference Modeling Lirong Xia

Today’s Schedule
- Modeling random preferences: the Random Utility Model
- Modeling preferences over lotteries: Prospect theory

Parametric Preference Learning
Ground truth + statistical model → MLE, Bayesian, etc. → decisions

Parametric ranking models
A statistical model has three parts:
- A parameter space: Θ
- A sample space: S = Rankings(A)^n, where A is the set of alternatives and n = #voters, assuming votes are i.i.d.
- A set of probability distributions over S: {Pr_θ(s) for each s ∈ Rankings(A) and θ ∈ Θ}

Example: Condorcet’s model for two alternatives a and b
- Parameter space: Θ = {a ≻ b, b ≻ a}
- Sample space: S = {a ≻ b, b ≻ a}^n
- Probability distributions (i.i.d. votes): Pr(a ≻ b | a ≻ b) = Pr(b ≻ a | b ≻ a) = p > 0.5
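As a quick illustration of how this model generates data, here is a minimal Python sketch (not from the slides); the function name sample_condorcet_votes, the labels "a"/"b", and the choice p = 0.6 are all hypothetical.

```python
import random

def sample_condorcet_votes(p, n, truth="a", other="b", seed=0):
    """Condorcet's model for two alternatives: each of the n i.i.d. votes
    agrees with the ground truth with probability p > 0.5."""
    rng = random.Random(seed)
    return [truth if rng.random() < p else other for _ in range(n)]

votes = sample_condorcet_votes(p=0.6, n=20)
print(votes.count("a"), "of 20 votes agree with the ground truth a")
```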

Mallows’ model [Mallows-1957]
- Fixed dispersion 𝜑 < 1
- Parameter space: all full rankings W over the candidates
- Sample space: i.i.d.-generated full rankings
- Probabilities: Pr_W(V) ∝ 𝜑^Kendall(V, W)
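The following sketch (not part of the original slides) computes Mallows probabilities by brute force for a small number of candidates; kendall_tau and mallows_prob are illustrative names, and 𝜑 = 0.5 is an arbitrary choice.

```python
from itertools import permutations

def kendall_tau(v, w):
    """Kendall tau distance: number of candidate pairs that v and w order differently."""
    pos = {c: i for i, c in enumerate(w)}
    return sum(1 for i in range(len(v)) for j in range(i + 1, len(v))
               if pos[v[i]] > pos[v[j]])

def mallows_prob(v, w, phi):
    """Pr_W(V) = phi^Kendall(V, W) / Z, with Z obtained by summing over all rankings
    (feasible only for a handful of candidates)."""
    z = sum(phi ** kendall_tau(u, w) for u in permutations(w))
    return phi ** kendall_tau(v, w) / z

w = ("Stan", "Kyle", "Eric")                               # ground truth ranking
print(mallows_prob(("Stan", "Kyle", "Eric"), w, phi=0.5))  # 1/Z
print(mallows_prob(("Eric", "Kyle", "Stan"), w, phi=0.5))  # phi^3/Z
# For 3 candidates, Z = 1 + 2*phi + 2*phi**2 + phi**3, as in the example on the next slide.
```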

Example: Mallows for three candidates (Stan, Kyle, Eric)
With truth W = Stan ≻ Kyle ≻ Eric and 𝑍 = 1 + 2𝜑 + 2𝜑² + 𝜑³, the probabilities are:
- Stan ≻ Kyle ≻ Eric: 1/𝑍
- Stan ≻ Eric ≻ Kyle: 𝜑/𝑍
- Kyle ≻ Stan ≻ Eric: 𝜑/𝑍
- Kyle ≻ Eric ≻ Stan: 𝜑²/𝑍
- Eric ≻ Stan ≻ Kyle: 𝜑²/𝑍
- Eric ≻ Kyle ≻ Stan: 𝜑³/𝑍

Random utility model (RUM) [Thurstone 27]
- Continuous parameters: Θ = (θ1, …, θm), where m is the number of alternatives
- Each alternative ci is modeled by a utility distribution μi; θi is a vector that parameterizes μi
- An agent’s latent utility Ui for alternative ci is generated independently according to μi(Ui)
- Agents rank alternatives according to their perceived utilities
- Example: Pr(c2 ≻ c1 ≻ c3 | θ1, θ2, θ3) = Pr_{Ui ∼ μi}(U2 > U1 > U3)

Generating a preference profile
- Votes are generated i.i.d. given the parameters: Pr(Data | θ1, θ2, θ3) = ∏_{V ∈ Data} Pr(V | θ1, θ2, θ3)
- Example: agent 1 reports P1 = c2 ≻ c1 ≻ c3, …, agent n reports Pn = c1 ≻ c2 ≻ c3
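Below is a minimal sketch (an assumption, not code from the slides) of generating a profile from a RUM with normal utility distributions; the parameter values and the names sample_vote/sample_profile are hypothetical.

```python
import random

def sample_vote(means, sds, rng):
    """One vote: draw a latent utility U_i ~ Normal(mean_i, sd_i) for each alternative,
    then rank alternatives by decreasing utility."""
    u = {c: rng.gauss(means[c], sds[c]) for c in means}
    return tuple(sorted(means, key=lambda c: u[c], reverse=True))

def sample_profile(means, sds, n, seed=0):
    """n i.i.d. votes given the parameters theta_1, ..., theta_m."""
    rng = random.Random(seed)
    return [sample_vote(means, sds, rng) for _ in range(n)]

# Hypothetical parameters theta_i = (mean_i, sd_i) of each utility distribution.
means = {"c1": 1.0, "c2": 0.8, "c3": 0.2}
sds = {"c1": 1.0, "c2": 1.0, "c3": 1.0}
print(sample_profile(means, sds, n=5))
```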

Plackett-Luce model
- μi’s are Gumbel distributions; a.k.a. the Plackett-Luce (P-L) model [BM 60, Yellott 77]
- Alternative parameterization λ1, …, λm:
  Pr(c1 ≻ c2 ≻ … ≻ cm) = λ1/(λ1 + … + λm) × λ2/(λ2 + … + λm) × … × λm-1/(λm-1 + λm)
  (c1 is the top choice in {c1, …, cm}, c2 is the top choice in {c2, …, cm}, …, cm-1 is preferred to cm)
- Pros:
  - Computationally tractable: analytical solution to the likelihood function; the only RUM that was known to be tractable
  - Widely applied in economics [McFadden 74], learning to rank [Liu 11], and analyzing elections [GM 06,07,08,09]
- Cons: may not be the best model
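As a sketch of the sequential-choice form above (not from the slides), the function below multiplies, position by position, the probability that the next-ranked alternative is the top choice among those remaining; λ = (0.1, 0.4, 0.5) matches the worked example on the next slide, and the labels c1, c2, c3 are generic.

```python
def plackett_luce_prob(ranking, lam):
    """Pr(ranking | lambda): at each position, the next alternative is chosen
    from the remaining ones with probability lam[c] / (sum of remaining lam's)."""
    prob, remaining = 1.0, sum(lam[c] for c in ranking)
    for c in ranking[:-1]:
        prob *= lam[c] / remaining
        remaining -= lam[c]
    return prob

lam = {"c1": 0.1, "c2": 0.4, "c3": 0.5}
print(plackett_luce_prob(("c1", "c2", "c3"), lam))  # (1/10) * (4/9)
print(plackett_luce_prob(("c3", "c2", "c1"), lam))  # (5/10) * (4/5)
```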

Example: Plackett-Luce with ground-truth parameters λ = (1/10, 4/10, 5/10) for alternatives c1, c2, c3
- c1 ≻ c2 ≻ c3: 1/10 × 4/9
- c1 ≻ c3 ≻ c2: 1/10 × 5/9
- c2 ≻ c1 ≻ c3: 4/10 × 1/6
- c2 ≻ c3 ≻ c1: 4/10 × 5/6
- c3 ≻ c1 ≻ c2: 5/10 × 1/5
- c3 ≻ c2 ≻ c1: 5/10 × 4/5

RUM with normal distributions
- μi’s are normal distributions: Thurstone’s Case V [Thurstone 27]
- Pros: intuitive, flexible
- Cons: believed to be computationally intractable; no analytical solution for the likelihood function Pr(P | Θ) is known
  Pr(c1 ≻ … ≻ cm | Θ) = ∫ μm(Um) ∫ μm-1(Um-1) ⋯ ∫ μ1(U1) dU1 ⋯ dUm, where Um runs from -∞ to ∞, Um-1 from Um to ∞, …, U1 from U2 to ∞
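Since no closed form is known, one common workaround is simulation. The sketch below (an assumption, not the slides' method) estimates Pr(ranking | Θ) for normal utilities by Monte Carlo; parameter values are hypothetical.

```python
import random

def mc_ranking_prob(ranking, means, sds, samples=100_000, seed=0):
    """Estimate Pr(ranking | Theta) for a normal RUM by sampling utility vectors
    and counting how often they induce exactly the given ranking."""
    rng = random.Random(seed)
    target = tuple(ranking)
    hits = 0
    for _ in range(samples):
        u = {c: rng.gauss(means[c], sds[c]) for c in means}
        if tuple(sorted(means, key=lambda c: u[c], reverse=True)) == target:
            hits += 1
    return hits / samples

means = {"c1": 1.0, "c2": 0.8, "c3": 0.2}  # hypothetical parameters
sds = {"c1": 1.0, "c2": 1.0, "c3": 1.0}
print(mc_ranking_prob(("c2", "c1", "c3"), means, sds))
```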

Decision making

Maximum likelihood estimators (MLE)
- Model Mr: a “ground truth” θ generates the votes V1, …, Vn
- For any profile P = (V1, …, Vn), the likelihood of θ is L(θ, P) = Pr_θ(P) = ∏_{V ∈ P} Pr_θ(V)
- The MLE mechanism: MLE(P) = argmax_θ L(θ, P)
- Decision space = parameter space
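A minimal sketch of the MLE mechanism over a finite parameter space (the names likelihood/mle and the Condorcet instantiation with p = 0.6 are illustrative, not code from the lecture):

```python
from math import prod

def likelihood(theta, profile, prob_of_vote):
    """L(theta, P) = product over votes V in P of Pr_theta(V), assuming i.i.d. votes."""
    return prod(prob_of_vote(theta, v) for v in profile)

def mle(profile, parameter_space, prob_of_vote):
    """MLE(P) = argmax_theta L(theta, P), searching a finite parameter space."""
    return max(parameter_space, key=lambda theta: likelihood(theta, profile, prob_of_vote))

# Condorcet's model for two alternatives O and M with p = 0.6 (hypothetical numbers).
def condorcet_prob(theta, vote, p=0.6):
    return p if vote == theta else 1 - p

profile = ["O"] * 10 + ["M"] * 8
print(mle(profile, ["O", "M"], condorcet_prob))  # -> "O"
```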

Bayesian approach
Given a profile P = (V1, …, Vn) and a prior distribution 𝜋 over Θ:
- Step 1: calculate the posterior probability over Θ using Bayes’ rule: Pr(θ | P) ∝ 𝜋(θ) Pr_θ(P)
- Step 2: make a decision based on the posterior distribution
- Maximum a posteriori (MAP) estimation: MAP(P) = argmax_θ Pr(θ | P)
- Technically equivalent to MLE when 𝜋 is uniform
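Continuing the same illustrative sketch, the posterior over a finite Θ and the MAP decision can be written as follows; the prior and data are chosen to match the example on the next slide.

```python
from math import prod

def posterior(profile, parameter_space, prior, prob_of_vote):
    """Pr(theta | P) proportional to prior(theta) * Pr_theta(P), normalized over a finite parameter space."""
    unnorm = {th: prior[th] * prod(prob_of_vote(th, v) for v in profile)
              for th in parameter_space}
    z = sum(unnorm.values())
    return {th: w / z for th, w in unnorm.items()}

def map_estimate(profile, parameter_space, prior, prob_of_vote):
    """MAP(P) = argmax_theta Pr(theta | P)."""
    post = posterior(profile, parameter_space, prior, prob_of_vote)
    return max(post, key=post.get)

# Condorcet's model for two alternatives O and M with p = 0.6, a prior favoring M,
# and data of 10 votes for O and 8 for M.
def condorcet_prob(theta, vote, p=0.6):
    return p if vote == theta else 1 - p

profile = ["O"] * 10 + ["M"] * 8
print(map_estimate(profile, ["O", "M"], {"O": 0.2, "M": 0.8}, condorcet_prob))  # -> "M"
```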

Example
- Condorcet’s model for two alternatives O and M: Θ = {O, M}, S = {O, M}^n, Pr(O | O) = Pr(M | M) = 0.6
- Data: P = {10 @ O + 8 @ M}
- MLE:
  - L(O) = Pr_O(O)^10 Pr_O(M)^8 = 0.6^10 × 0.4^8
  - L(M) = Pr_M(O)^10 Pr_M(M)^8 = 0.4^10 × 0.6^8
  - L(O) > L(M), so O wins
- MAP with prior 𝜋(O) = 0.2, 𝜋(M) = 0.8:
  - Pr(O | P) ∝ 0.2 × L(O) = 0.2 × 0.6^10 × 0.4^8
  - Pr(M | P) ∝ 0.8 × L(M) = 0.8 × 0.4^10 × 0.6^8
  - Pr(M | P) > Pr(O | P), so M wins

Decision making under uncertainty
- You have a biased coin: heads w/p p
- You observe 10 heads, 4 tails
- Do you think the next two tosses will be two heads in a row? (Credit: Panos Ipeirotis & Roy Radner)
- MLE-based approach: there is an unknown but fixed ground truth; p = 10/14 = 0.714, and Pr(2 heads | p = 0.714) = 0.714² = 0.51 > 0.5, so yes!
- Bayesian approach: the ground truth is captured by a belief distribution; compute Pr(p | Data) assuming a uniform prior, then Pr(2 heads | Data) = 0.485 < 0.5, so no!
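The two answers on this slide can be checked with a few lines; this is a sketch under the slide's assumptions (a point estimate for the MLE route, a uniform Beta(1,1) prior for the Bayesian route), with illustrative function names.

```python
def mle_two_heads(heads, tails):
    """MLE route: plug in the point estimate p_hat = heads/(heads+tails), then p_hat**2."""
    p_hat = heads / (heads + tails)
    return p_hat ** 2

def bayes_two_heads(heads, tails):
    """Bayesian route with a uniform Beta(1,1) prior: the posterior is Beta(heads+1, tails+1),
    and Pr(2 heads | data) = E[p^2] = (a/(a+b)) * ((a+1)/(a+b+1)) with a = heads+1, b = tails+1."""
    a, b = heads + 1, tails + 1
    return (a / (a + b)) * ((a + 1) / (a + b + 1))

print(round(mle_two_heads(10, 4), 3))   # ~0.51 > 0.5  -> "Yes!"
print(round(bayes_two_heads(10, 4), 3)) # ~0.485 < 0.5 -> "No!"
```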

Prospect Theory: Motivating Example
Treat lung cancer with Radiation or Surgery.
- Q1: Radiation, 100% immediately survive, 22% 5-year survival; Surgery, 90% immediately survive, 34% 5-year survival. 18% choose Radiation.
- Q2: Radiation, 0% die immediately, 78% die within 5 years; Surgery, 10% die immediately, 66% die within 5 years. 49% choose Radiation.

More Thoughts
- Framing effect: the baseline/starting point matters
  - Q1: starting at “the patient dies”
  - Q2: starting at “the patient survives”
- Evaluation:
  - subjective value (utility)
  - perceived likelihood (e.g., people tend to overweight low-probability events)

Prospect Theory (Kahneman)
- Framing phase (modeling the options): choose a reference point to model the options O = {o1, …, ok}
- Evaluation phase (modeling the preferences):
  - a value function v: O → R
  - a probability weighting function π: [0,1] → R
  - for any lottery L = (p1, …, pk) ∈ Lot(O), V(L) = ∑ π(pi) v(oi)
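A small sketch of the evaluation formula V(L) = ∑ π(pi) v(oi); the particular value and weighting functions below are hypothetical stand-ins (the weighting form with γ = 0.7 is one common parametric choice that overweights small probabilities), not the functions used in the lecture.

```python
def prospect_value(lottery, v, pi):
    """V(L) = sum over outcomes of pi(p_i) * v(o_i), for a lottery given as (outcome, probability) pairs."""
    return sum(pi(p) * v(o) for o, p in lottery)

def v(outcome):
    """Subjective value relative to the reference point (here simply the outcome itself)."""
    return outcome

def pi(p, gamma=0.7):
    """A probability weighting function that overweights small probabilities."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

# A lottery over gains relative to the reference point: 10% chance of +100, else 0.
lottery = [(100, 0.10), (0, 0.90)]
print(prospect_value(lottery, v, pi))  # > 10 (the expected value), since pi(0.10) is about 0.17
```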