
On the Limits of Dictatorial Classification Reshef Meir School of Computer Science and Engineering, Hebrew University Joint work with Shaull Almagor, Assaf Michaely and Jeffrey S. Rosenschein

Outline: Strategy-Proof Classification: an example; Motivation; Our model and previous results; Filling the gap: proving a lower bound; The weighted case

The Motivating Questions: Do strategy-proofness considerations apply to learning? If agents have an incentive to lie, what can we do about it? – Approximation – Randomization – And even clever use of dictators…

Strategic labeling: an example. The ERM classifier makes 5 errors.

There is a better classifier! (for me…)

If I just change the labels… 2+5 = 7 errors

Classification. The supervised classification problem: – Input: a set of labeled data points {(x_i, y_i)}_{i=1..m} – Output: a classifier c from some predefined concept class C (e.g., functions of the form f : X → {−,+}) – We usually want c not only to classify the sample correctly, but to generalize well, i.e., to minimize the expected risk R(c) = E_{(x,y)~D}[c(x) ≠ y] w.r.t. the distribution D (the 0/1 loss function)

Classification (cont.). A common approach is to return the ERM (Empirical Risk Minimizer), i.e., the concept in C that is best w.r.t. the given samples (has the lowest number of errors). This generalizes well under some assumptions on the concept class C (e.g., linear classifiers tend to generalize well). With multiple experts, we can't trust our ERM!
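As a minimal sketch of the ERM idea over a finite concept class (the threshold concepts and samples below are made up for illustration, not taken from the talk):

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, int]  # (point x, label y in {-1, +1})

def empirical_risk(c: Callable[[float], int], samples: List[Sample]) -> float:
    """Fraction of samples the classifier c labels incorrectly (0/1 loss)."""
    return sum(1 for x, y in samples if c(x) != y) / len(samples)

def erm(concept_class: Dict[str, Callable[[float], int]],
        samples: List[Sample]) -> str:
    """Return the name of the concept with the lowest empirical risk."""
    return min(concept_class,
               key=lambda name: empirical_risk(concept_class[name], samples))

# Hypothetical finite concept class: threshold classifiers on the real line.
C = {f"threshold_{t}": (lambda t: (lambda x: 1 if x >= t else -1))(t)
     for t in (0.0, 0.5, 1.0)}
S = [(0.1, -1), (0.4, -1), (0.6, 1), (0.9, 1)]
print(erm(C, S))  # "threshold_0.5" classifies all four samples correctly
```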

Where do we find "experts" with incentives? Example 1: A firm learning purchase patterns – Information gathered from local retailers – The resulting policy affects them – "The best policy is the policy that fits my pattern"

Example 2: Internet polls / polls of experts. Users → Reported Dataset → Classification Algorithm → Classifier

Motivation from other domains: Aggregating partitions; Judgment aggregation; Facility location (on the binary cube). Judgment aggregation example (each agent judges the propositions A, B, A & B, A | ~B):
Agent 1: A=T, B=F, A & B=F, A | ~B=T
Agent 2: A=F, B=T, A & B=F, A | ~B=F
Agent 3: A=F, B=F, A & B=F, A | ~B=T

A problem instance is defined by: a set of agents I = {1,...,n}; a set of data points X = {x_1,...,x_m}, a subset of the instance space 𝒳; for each x_k ∈ X, agent i has a label y_ik ∈ {−,+} – Each pair s_ik = ⟨x_k, y_ik⟩ is a sample – All samples of a single agent compose the labeled dataset S_i = {s_i1,...,s_i,m(i)} – The joint dataset S = ⟨S_1, S_2,…, S_n⟩ is our input, with m = |S| – We denote the dataset with the reported labels by S'
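A minimal encoding of such an instance in code (a sketch only; the points and labels are made up for illustration):

```python
from typing import Dict, List

Point = str                    # a data point x_k (here just an identifier)
Labels = Dict[Point, int]      # one agent's labels, +1 or -1, for each point

# Shared data points X = {x_1, ..., x_m}
X: List[Point] = ["x1", "x2", "x3"]

# Joint dataset S = <S_1, ..., S_n>: each agent labels every shared point.
S: List[Labels] = [
    {"x1": +1, "x2": -1, "x3": -1},   # agent 1
    {"x1": -1, "x2": +1, "x3": -1},   # agent 2
    {"x1": -1, "x2": -1, "x3": +1},   # agent 3
]
```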

Input: Example. Agents 1, 2 and 3 label the same points: X ∈ 𝒳^m, Y_1 ∈ {−,+}^m, Y_2 ∈ {−,+}^m, Y_3 ∈ {−,+}^m, and S = ⟨S_1, S_2,…, S_n⟩ = ⟨(X,Y_1),…, (X,Y_n)⟩

Mechanisms. A mechanism M receives a labeled dataset S and outputs c = M(S) ∈ C. Private risk of agent i: R_i(c,S) = |{k : c(x_ik) ≠ y_ik}| / m_i (the fraction of errors on S_i). Global risk: R(c,S) = |{(i,k) : c(x_ik) ≠ y_ik}| / m (the fraction of errors on S). We allow non-deterministic mechanisms – measure the expected risk.
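A sketch of the two risk measures from this slide, using a simple encoding of the joint dataset (the toy data is illustrative):

```python
from typing import Callable, Dict, List

Classifier = Callable[[str], int]   # maps a point id to +1 or -1
AgentLabels = Dict[str, int]        # one agent's labels

def private_risk(c: Classifier, S_i: AgentLabels) -> float:
    """R_i(c,S): fraction of agent i's own samples that c misclassifies."""
    return sum(1 for x, y in S_i.items() if c(x) != y) / len(S_i)

def global_risk(c: Classifier, S: List[AgentLabels]) -> float:
    """R(c,S): fraction of all samples, over all agents, that c misclassifies."""
    errors = sum(1 for S_i in S for x, y in S_i.items() if c(x) != y)
    total = sum(len(S_i) for S_i in S)
    return errors / total

# Toy joint dataset and an "all positive" classifier.
S = [{"x1": +1, "x2": -1}, {"x1": +1, "x2": +1}]
all_positive: Classifier = lambda x: +1
print(private_risk(all_positive, S[0]))  # 0.5
print(global_risk(all_positive, S))      # 0.25
```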

ERM. We compare the outcome of M to the ERM: c* = ERM(S) = argmin_{c ∈ C} R(c,S), with r* = R(c*,S). Can our mechanism simply compute and return the ERM?

Requirements (the most important slide): 1. Good approximation: ∀S, R(M(S),S) ≤ α·r*. 2. Strategy-proofness (SP): ∀i, S, S_i', R_i(M(S_{-i}, S_i'), S) ≥ R_i(M(S), S), i.e., lying (left-hand side) never beats truth-telling (right-hand side). ERM(S) is 1-approximating but not SP; ERM(S_1) is SP but gives a bad approximation. Are there any mechanisms that guarantee both SP and good approximation?
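To make the SP requirement concrete, here is a brute-force check for a deterministic mechanism on a tiny instance (a sketch; the mechanism and data are toy examples, not the paper's constructions): it asks whether some agent can strictly reduce its true private risk by misreporting its labels.

```python
from itertools import product
from typing import Callable, Dict, List

AgentLabels = Dict[str, int]
Classifier = Callable[[str], int]
Mechanism = Callable[[List[AgentLabels]], Classifier]

def private_risk(c: Classifier, S_i: AgentLabels) -> float:
    """Fraction of agent i's true samples that c misclassifies."""
    return sum(1 for x, y in S_i.items() if c(x) != y) / len(S_i)

def has_beneficial_lie(mechanism: Mechanism, S: List[AgentLabels], i: int) -> bool:
    """True if agent i can strictly reduce its TRUE private risk by reporting
    some other labeling S_i' (brute force over all possible labelings)."""
    truthful_risk = private_risk(mechanism(S), S[i])
    points = list(S[i].keys())
    for labels in product((-1, +1), repeat=len(points)):
        S_i_prime = dict(zip(points, labels))
        S_prime = S[:i] + [S_i_prime] + S[i + 1:]
        if private_risk(mechanism(S_prime), S[i]) < truthful_risk:
            return True
    return False

# Toy mechanism: ERM over the two constant classifiers ("all +" / "all -"),
# breaking ties in favor of "all +".
def erm_constant(S: List[AgentLabels]) -> Classifier:
    pos = sum(1 for S_i in S for y in S_i.values() if y == +1)
    neg = sum(1 for S_i in S for y in S_i.values() if y == -1)
    return (lambda x: +1) if pos >= neg else (lambda x: -1)

S = [{"x1": +1, "x2": -1, "x3": -1}, {"x1": +1, "x2": +1, "x3": -1}]
print(has_beneficial_lie(erm_constant, S, 0))  # True: agent 0 gains by reporting all "-"
```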

Related work: A study of SP mechanisms in regression learning – O. Dekel, F. Fischer and A. D. Procaccia, SODA (2008), JCSS (2009) [supervised learning]. No SP mechanisms for clustering – J. Perote-Peña and J. Perote, Economics Bulletin (2003) [unsupervised learning].

Previous work: a simple case (Meir, Procaccia and Rosenschein, AAAI 2008). Tiny concept class: |C| = 2, either "all positive" or "all negative". Theorem: There is an SP 2-approximation mechanism, and there are no SP α-approximation mechanisms for any α < 2.

Previous work: general concept classes (Meir, Procaccia and Rosenschein, IJCAI 2009). Theorem: Selecting a dictator at random is SP and guarantees a 3 - 2/n approximation – true for any concept class C – generalizes well from sampled data when C has a bounded VC dimension. Open question #1: are there better mechanisms? Open question #2: what if agents are weighted?
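A sketch of the random-dictator mechanism described on this slide (the ERM subroutine and data encoding are simplified for illustration):

```python
import random
from typing import Callable, Dict, List

AgentLabels = Dict[str, int]
Classifier = Callable[[str], int]

def erm_for_agent(concept_class: List[Classifier], S_i: AgentLabels) -> Classifier:
    """Best concept in the class w.r.t. a single agent's labels (0/1 loss)."""
    return min(concept_class,
               key=lambda c: sum(1 for x, y in S_i.items() if c(x) != y))

def random_dictator(concept_class: List[Classifier], S: List[AgentLabels]) -> Classifier:
    """Pick one agent uniformly at random and return an ERM computed
    only on that agent's reported labels, ignoring everyone else."""
    dictator = random.choice(S)
    return erm_for_agent(concept_class, dictator)

# Toy run with the two constant classifiers.
C: List[Classifier] = [lambda x: +1, lambda x: -1]
S = [{"x1": +1, "x2": +1}, {"x1": -1, "x2": -1}]
c = random_dictator(C, S)
print([c(x) for x in ("x1", "x2")])
```

The mechanism is SP because the outcome depends only on the selected dictator's own report, and that agent minimizes its private risk by reporting truthfully; all other agents have no influence at all.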

A lower bound (our main result). Theorem: There is a concept class C (with |C| = 3) for which any SP mechanism has an approximation ratio of at least 3 - 2/n. o Matching the upper bound from IJCAI-09 o Proof is by a careful reduction to a voting scenario o We will see the proof sketch

Proof sketch. Gibbard ['77] proved that every (randomized) SP voting rule for 3 candidates must be a lottery over dictators*. We define X = {x, y, z} and C = {c_x, c_y, c_z} as follows:
       x  y  z
  c_x: +  -  -
  c_y: -  +  -
  c_z: -  -  +
We also restrict the agents, so that each agent can have mixed labels on just one point.

Proof sketch (cont.). Suppose that M is SP. 1. M must be monotone on the mixed point. 2. M must ignore the mixed point. 3. M is then a (randomized) voting rule: each agent's labels induce a preference order over the three concepts (e.g., c_z > c_y > c_x or c_x > c_z > c_y).
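To illustrate step 3, a small sketch (with made-up agent labels) of how an agent's labels over {x, y, z} induce a ranking of the three concepts by that agent's private risk, turning the classification instance into a voting profile:

```python
from typing import Dict, List

# The three concepts over X = {x, y, z}: c_x is positive only on x, etc.
CONCEPTS: Dict[str, Dict[str, int]] = {
    "c_x": {"x": +1, "y": -1, "z": -1},
    "c_y": {"x": -1, "y": +1, "z": -1},
    "c_z": {"x": -1, "y": -1, "z": +1},
}

def induced_ranking(agent_labels: Dict[str, int]) -> List[str]:
    """Rank the concepts by the number of errors they make on this agent's
    labels (fewest errors first): the agent's induced preference order."""
    errors = {name: sum(1 for p, y in agent_labels.items() if c[p] != y)
              for name, c in CONCEPTS.items()}
    return sorted(CONCEPTS, key=lambda name: errors[name])

# Hypothetical agents: the first mostly agrees with c_z, the second with c_x.
print(induced_ranking({"x": -1, "y": -1, "z": +1}))  # c_z is ranked first
print(induced_ranking({"x": +1, "y": -1, "z": -1}))  # c_x is ranked first
```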

Proof sketch (cont.). 4. By Gibbard ['77], M is a random dictator. 5. We construct an instance where random dictators perform poorly.

Weighted agents. We must select a dictator randomly; however, the selection probability may depend on the weights. A naïve approach only gives a 3-approximation, while an optimal SP algorithm matches the lower bound.
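The slide does not spell out the naïve approach; a natural guess (labeled here as an assumption, not the paper's construction) is to draw the dictator with probability proportional to its weight, sketched below:

```python
import random
from typing import Callable, Dict, List

AgentLabels = Dict[str, int]
Classifier = Callable[[str], int]

def weighted_random_dictator(concept_class: List[Classifier],
                             S: List[AgentLabels],
                             weights: List[float]) -> Classifier:
    """Assumed 'naive' mechanism for weighted agents: choose a dictator with
    probability proportional to its weight, then return an ERM on its labels only."""
    dictator = random.choices(S, weights=weights, k=1)[0]
    return min(concept_class,
               key=lambda c: sum(1 for x, y in dictator.items() if c(x) != y))

# Toy usage with the two constant classifiers and unequal weights.
C: List[Classifier] = [lambda x: +1, lambda x: -1]
S = [{"x1": +1, "x2": +1}, {"x1": -1, "x2": -1}]
c = weighted_random_dictator(C, S, weights=[0.8, 0.2])
print(c("x1"), c("x2"))
```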

Future work: Other concept classes; other loss functions (linear loss, quadratic loss, …); alternative assumptions on the structure of the data; other models of strategic behavior; …