On the Limits of Dictatorial Classification Reshef Meir School of Computer Science and Engineering, Hebrew University Joint work with Shaull Almagor, Assaf Michaely and Jeffrey S. Rosenschein
Outline
Strategy-Proof Classification: An Example
Motivation
Our Model and Previous Results
Filling the Gap: Proving a Lower Bound
The Weighted Case
The Motivating Questions
Do "strategyproof" considerations apply to learning?
If agents have an incentive to lie, what can we do about it?
– Approximation
– Randomization
– And even clever use of dictators…
Strategic labeling: an example
The ERM classifier makes 5 errors on the reported data.
There is a better classifier! (for me…)
If I just change the labels… the new ERM makes 2+5 = 7 errors.
Classification
The supervised classification problem:
– Input: a set of labeled data points {(x_i, y_i)}_{i=1..m}
– Output: a classifier c from some predefined concept class C (e.g., functions of the form f : X → {−,+})
– We usually want c not just to classify the sample correctly, but to generalize well, i.e., to minimize the expected number of errors w.r.t. the distribution D (the 0/1 loss function): R(c) ≡ E_{(x,y)~D}[c(x) ≠ y]
Classification (cont.)
A common approach is to return the ERM (Empirical Risk Minimizer): the concept in C that best fits the given samples, i.e., makes the fewest errors on them.
The ERM generalizes well under some assumptions on the concept class C (e.g., linear classifiers tend to generalize well).
But with multiple self-interested experts, we can't trust our ERM!
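The ERM over a finite concept class can be sketched in a few lines (a minimal illustration, not from the slides; concepts are assumed to be functions from points to labels, and `all_pos`/`all_neg` echo the tiny class discussed later in the talk):

```python
# Minimal sketch: empirical 0/1 risk and ERM over a finite concept class.
# A concept maps a data point to a label in {-1, +1}.

def empirical_risk(c, samples):
    """0/1 loss: the fraction of samples (x, y) that concept c mislabels."""
    return sum(1 for x, y in samples if c(x) != y) / len(samples)

def erm(concepts, samples):
    """Return a concept with the lowest empirical risk on the samples."""
    return min(concepts, key=lambda c: empirical_risk(c, samples))

# The tiny |C| = 2 class from the talk: "all positive" vs. "all negative".
all_pos = lambda x: +1
all_neg = lambda x: -1
samples = [(0, +1), (1, +1), (2, -1)]
best = erm([all_pos, all_neg], samples)  # all_pos: 1 error vs. 2 for all_neg
```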
Where do we find "experts" with incentives?
Example 1: A firm learning purchase patterns
– Information is gathered from local retailers
– The resulting policy affects them
– "The best policy is the policy that fits my pattern"
Example 2: Internet polls / polls of experts
[Figure: users → reported dataset → classification algorithm → classifier]
Motivation from other domains
– Aggregating partitions
– Judgment aggregation
– Facility location (on the binary cube)

[Table: a judgment aggregation example]
Agent   A   B   A & B   A | ~B
1       T   F   F       T
2       F   T   F       F
3       F   F   F       T
A problem instance is defined by:
– A set of agents I = {1,...,n}
– A set of data points X = {x_1,...,x_m} ⊆ 𝒳
– For each x_k ∈ X, agent i has a label y_ik ∈ {−,+}
  – Each pair s_ik = (x_k, y_ik) is a sample
  – All samples of a single agent compose the labeled dataset S_i = {s_i1,...,s_i,m(i)}
– The joint dataset S = (S_1, S_2,…, S_n) is our input; m = |S|
– We denote the dataset with the reported labels by S′
Input: an example
[Figure: agents 1–3 label the same point set X; agent i's labels form a vector Y_i ∈ {−,+}^m]
S = (S_1, S_2,…, S_n) = ((X,Y_1),…,(X,Y_n))
Mechanisms
A mechanism M receives a labeled dataset S and outputs c = M(S) ∈ C.
– Private risk of agent i: R_i(c,S) = |{k : c(x_ik) ≠ y_ik}| / m_i (the fraction of errors on S_i)
– Global risk: R(c,S) = |{(i,k) : c(x_ik) ≠ y_ik}| / m (the fraction of errors on S)
We allow non-deterministic mechanisms, and then measure the expected risk.
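The two risk measures can be sketched directly from the definitions (the list-of-lists representation of S is my assumption, not the talk's):

```python
# Sketch: each agent's dataset S_i is a list of (x, y) pairs;
# the joint dataset S is a list of such per-agent lists.

def private_risk(c, S_i):
    """R_i(c, S): fraction of agent i's own samples that c mislabels."""
    return sum(1 for x, y in S_i if c(x) != y) / len(S_i)

def global_risk(c, S):
    """R(c, S): fraction of all samples, pooled over agents, that c mislabels."""
    pooled = [s for S_i in S for s in S_i]
    return sum(1 for x, y in pooled if c(x) != y) / len(pooled)

all_pos = lambda x: +1
S = [[(0, +1), (1, -1)],   # agent 1: one of its two points is mislabeled by all_pos
     [(2, +1), (3, +1)]]   # agent 2: fully consistent with all_pos
```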
ERM
We compare the outcome of M to the ERM:
c* = ERM(S) = argmin_{c ∈ C} R(c,S),  r* = R(c*,S)
Can our mechanism simply compute and return the ERM?
Requirements (most important slide)
1. Good approximation: for every S, R(M(S),S) ≤ α·r*
2. Strategy-proofness (SP): for every i, S, and misreport S_i′ (a lie),
   R_i(M(S_{-i}, S_i′), S) ≥ R_i(M(S), S) — lying never beats the truth.
ERM(S) is 1-approximating but not SP.
ERM(S_1) is SP but gives a bad approximation.
Are there mechanisms that guarantee both SP and good approximation?
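That plain ERM is not SP can be seen on a tiny hypothetical instance (my own construction, not from the talk), using the three-concept class that also appears in the lower-bound proof: by flipping its labels, an agent changes the ERM outcome and lowers its own true risk.

```python
# Hypothetical instance showing ERM is manipulable. Points are {'x','y','z'};
# C contains three concepts, each labeling exactly one point positive.

def make_concept(pos):
    return lambda p: +1 if p == pos else -1

C = {name: make_concept(name) for name in ('x', 'y', 'z')}

def errors(c, samples):
    return sum(1 for p, y in samples if c(p) != y)

def erm_name(samples):
    # deterministic tie-breaking by concept name
    return min(sorted(C), key=lambda name: errors(C[name], samples))

agent1_true = [('x', +1)] * 3 + [('y', +1)] * 2
agent2      = [('y', +1)] * 3 + [('x', -1)]

truthful = erm_name(agent1_true + agent2)    # ERM picks 'y' (3 errors)
# Agent 1 exaggerates, flipping its two y-labels to minus...
agent1_lie = [('x', +1)] * 3 + [('y', -1)] * 2
manipulated = erm_name(agent1_lie + agent2)  # ...and ERM now picks 'x' (4 errors)

# Measured on agent 1's TRUE labels, its risk drops from 3/5 to 2/5 by lying.
```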
Related work
– A study of SP mechanisms in regression learning: O. Dekel, F. Fischer and A. D. Procaccia, SODA (2008), JCSS (2009) [supervised learning]
– No SP mechanisms for clustering: J. Perote-Peña and J. Perote, Economics Bulletin (2003) [unsupervised learning]
Previous work (Meir, Procaccia and Rosenschein, AAAI 2008)
A simple case: a tiny concept class, |C| = 2 — either "all positive" or "all negative".
Theorem: There is an SP 2-approximation mechanism, and there is no SP α-approximation mechanism for any α < 2.
Previous work (Meir, Procaccia and Rosenschein, IJCAI 2009)
General concept classes.
Theorem: Selecting a dictator at random is SP and guarantees a (3 − 2/n)-approximation.
– True for any concept class C
– Generalizes well from sampled data when C has bounded VC dimension
Open question #1: are there better mechanisms?
Open question #2: what if agents are weighted?
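The random-dictator idea fits in a few lines (a minimal sketch of the idea only): pick one agent uniformly at random and return the ERM on that agent's data alone. The other agents' reports are ignored, and the dictator only hurts itself by lying — which is why the mechanism is SP.

```python
import random

# Sketch of the random-dictator mechanism over a finite concept class.

def erm(concepts, samples):
    """Concept with the fewest errors on the given samples."""
    return min(concepts, key=lambda c: sum(1 for x, y in samples if c(x) != y))

def random_dictator(concepts, agent_datasets, rng=random):
    """Pick an agent uniformly at random; return the ERM on its data alone."""
    dictator = rng.choice(agent_datasets)
    return erm(concepts, dictator)

all_pos = lambda x: +1
all_neg = lambda x: -1
# With a single agent, that agent is always the dictator:
chosen = random_dictator([all_pos, all_neg], [[(0, +1), (1, +1)]])
```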
Our main result: a lower bound
Theorem: There is a concept class C (with |C| = 3) for which any SP mechanism has an approximation ratio of at least 3 − 2/n.
– Matches the upper bound from IJCAI-09
– The proof is by a careful reduction to a voting scenario
– We will see the proof sketch
Proof sketch
Gibbard ['77] proved that every (randomized) SP voting rule over 3 candidates must be a lottery over dictators*. We define X = {x, y, z} and C = {c_x, c_y, c_z} as follows:

        x   y   z
c_x     +   −   −
c_y     −   +   −
c_z     −   −   +

We also restrict the agents so that each agent can have mixed labels on just one point.
Proof sketch (cont.)
Suppose that M is SP. Then:
1. M must be monotone on the mixed point
2. M must ignore the mixed point
3. M is therefore a (randomized) voting rule: each agent's labels induce a preference order over C (e.g., c_z > c_y > c_x, or c_x > c_z > c_y)
Proof sketch (cont.)
4. By Gibbard ['77], M is a random dictator
5. We construct an instance where random dictators perform poorly
Weighted agents
– We must still select a dictator randomly; however, the selection probability may be based on the agents' weights
– Naïve approach: only gives a 3-approximation
– An optimal SP algorithm: matches the lower bound
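A sketch of the naïve weighted variant only (it is my assumption that "naïve" here means selecting the dictator with probability proportional to its weight; the talk's optimal SP algorithm uses a different distribution and is not reproduced here):

```python
import random

# Naive weighted dictator selection: probability proportional to weight.
# The selected agent's data alone then determines the output classifier.

def weighted_random_dictator(agent_datasets, weights, rng=random):
    (dictator,) = rng.choices(agent_datasets, weights=weights, k=1)
    return dictator  # the mechanism then returns the ERM on this data alone

# With all the weight on agent 2, agent 2 is always the dictator:
picked = weighted_random_dictator([['a'], ['b']], [0, 1])
```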
Future work
– Other concept classes
– Other loss functions (linear loss, quadratic loss, …)
– Alternative assumptions on the structure of the data
– Other models of strategic behavior
– …