Matchin: Eliciting User Preferences with an Online Game Severin Hacker, and Luis von Ahn Carnegie Mellon University SIGCHI 2009.

Matchin A game that asks two randomly chosen partners "which of these two images do you think your partner prefers?"

Some Findings
–It is possible to extract a global "beauty" ranking within a large collection of images.
–It is possible to extract a person's general image preferences.
–Their model can predict a player's gender correctly in roughly four out of five cases.

A Taxonomy of Methods
–Absolute Versus Relative Judgments
–Total Versus Partial Judgments
–Random Access Versus Predefined Access
–"I Like" Versus "Others Like"
–Direct Versus Indirect

Existing Methods
–Flickr Interestingness
–Voting
–Hot or Not

The Mechanism
Matchin is a two-player game played over the Internet. Every game lasts two minutes.
–One pair of images usually takes between two and five seconds.
Matchin uses a collection of 80,000 images from Flickr that were gathered in October 2007.

The Scoring Function
Matchin uses a sigmoid function for scoring games. Simpler alternatives had problems:
–Constant scoring function: players could get many points by quickly picking images at random.
–Exponential scoring function: the rewards sometimes became too high.
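The trade-off between the three scoring schemes can be sketched as follows. This is a minimal illustration, assuming (not stated in the transcript) that the points for an agreement depend on the current streak n of consecutive agreements; the constants `MAX_POINTS`, `MIDPOINT` and `STEEPNESS` and the base reward of 100 are hypothetical.

```python
import math

# Hypothetical constants: the transcript gives no actual values.
MAX_POINTS = 1000.0   # ceiling that the sigmoid reward approaches
MIDPOINT = 5.0        # streak length at which half the ceiling is awarded
STEEPNESS = 1.0       # how quickly the reward ramps up around MIDPOINT

def constant_score(streak: int) -> float:
    """Same reward for every agreement, so fast random clicking pays off."""
    return 100.0

def exponential_score(streak: int) -> float:
    """Reward doubles with each consecutive agreement, so long streaks explode."""
    return 100.0 * 2 ** (streak - 1)

def sigmoid_score(streak: int) -> float:
    """Reward grows with the streak (assumed input) but saturates at MAX_POINTS."""
    return MAX_POINTS / (1.0 + math.exp(-STEEPNESS * (streak - MIDPOINT)))

for n in (1, 5, 10, 20):
    print(n, constant_score(n), exponential_score(n), round(sigmoid_score(n), 1))
```

The sigmoid rewards sustained agreement, unlike the constant scheme, while staying bounded, unlike the exponential one.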

The Data
The game was launched on May 15. Within only four months, 86,686 games had been played by 14,993 players, yielding 3,562,856 individual decisions (clicks) on images. An individual decision (record) is stored in the form:

Ranking Functions
–Empirical Winning Rate (EWR)
–ELO Rating
–TrueSkill Rating

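Empirical Winning Rate (EWR)
Function: $\mathrm{EWR}(i) = \frac{w_i}{w_i + l_i}$, where $w_i$ and $l_i$ are the number of times image $i$ was chosen and not chosen, so $w_i + l_i$ is its degree.
Two problems:
–For images that have a low degree, the empirical winning rate might be artificially high or low.
–It does not take the quality of the competing image into account.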
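A minimal sketch of the empirical winning rate, assuming each decision is available as a (winner, loser) pair of image ids; the image names below are invented for illustration. It also makes both problems visible: the image "lucky" is ranked first after a single comparison.

```python
from collections import Counter

def empirical_winning_rate(decisions):
    """decisions: iterable of (winner, loser) image-id pairs.
    Returns {image: wins / degree}, where degree = number of comparisons."""
    wins = Counter()
    degree = Counter()
    for winner, loser in decisions:
        wins[winner] += 1
        degree[winner] += 1
        degree[loser] += 1
    return {img: wins[img] / degree[img] for img in degree}

# Invented toy decisions.
decisions = [("sunset", "screenshot"), ("sunset", "blurry"),
             ("blurry", "screenshot"), ("lucky", "sunset")]
rates = empirical_winning_rate(decisions)
print(rates)
```

"lucky" gets a perfect rate of 1.0 from a degree of one, and nothing in the formula credits it for having beaten the otherwise strongest image.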

ELO Rating (1/2) The ELO rating system was introduced for rating chess players. Each chess player’s performance in a game is modeled as a normally distributed random variable. The mean of that random variable should reflect the player’s true skill and is called the player’s ELO rating.

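ELO Rating (2/2)
Expected score: $E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$
ELO rating: $R_A' = R_A + K\,(S_A - E_A)$, where $S_A$ is 1 if A wins and 0 if it loses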
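A minimal sketch of one rating update per pairwise decision, treating the chosen image as the winner. It uses the common logistic form of the Elo expected score with the chess conventions of a 400-point scale and K = 32; those constants, and this exact form, are assumptions rather than values confirmed for Matchin.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B on the logistic Elo curve (scale 400)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_winner: float, r_loser: float, k: float = 32.0):
    """Return the new (winner, loser) ratings after one decision.
    k = 32 is an assumed chess-style constant."""
    e_w = expected_score(r_winner, r_loser)
    e_l = expected_score(r_loser, r_winner)
    return r_winner + k * (1.0 - e_w), r_loser + k * (0.0 - e_l)

ratings = {"sunset": 1500.0, "screenshot": 1500.0}
ratings["sunset"], ratings["screenshot"] = elo_update(ratings["sunset"], ratings["screenshot"])
print(ratings)  # {'sunset': 1516.0, 'screenshot': 1484.0}
```

Unlike the empirical winning rate, an upset win against a higher-rated image moves the rating further than a win between equals.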

TrueSkill Rating (1/2)
Every player’s skill s is modeled as a normally distributed random variable centered around a mean μ with per-player variance σ². A player’s particular performance in a game is then drawn from a normal distribution with mean s and per-game variance β².

TrueSkill Rating (2/2) Update: Conservative skill estimate:
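The slide's update and conservative-estimate equations did not survive extraction. As an assumption about what they showed, here is a sketch of the standard two-player TrueSkill update (Herbrich, Minka and Graepel) with no draws and no dynamics term, using the usual defaults μ₀ = 25, σ₀ = 25/3, β = σ₀/2 and the conservative estimate μ − 3σ; all of these constants are assumptions.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

# Assumed default constants; the transcript gives no values.
MU0 = 25.0
SIGMA0 = MU0 / 3.0
BETA = SIGMA0 / 2.0

def trueskill_update(winner, loser, beta=BETA):
    """winner, loser: (mu, sigma) tuples. Two-player update without draws
    or dynamics. Returns new (mu, sigma) tuples for winner and loser."""
    (mu_w, s_w), (mu_l, s_l) = winner, loser
    c = math.sqrt(2 * beta ** 2 + s_w ** 2 + s_l ** 2)
    t = (mu_w - mu_l) / c
    v = STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t)  # mean correction
    w = v * (v + t)                             # variance correction, in (0, 1)
    new_w = (mu_w + s_w ** 2 / c * v,
             math.sqrt(s_w ** 2 * (1 - s_w ** 2 / c ** 2 * w)))
    new_l = (mu_l - s_l ** 2 / c * v,
             math.sqrt(s_l ** 2 * (1 - s_l ** 2 / c ** 2 * w)))
    return new_w, new_l

def conservative_estimate(rating, k=3.0):
    """Skill the item exceeds with high probability: mu - k * sigma."""
    mu, sigma = rating
    return mu - k * sigma

sunset, screenshot = (MU0, SIGMA0), (MU0, SIGMA0)
sunset, screenshot = trueskill_update(sunset, screenshot)
print(sunset, screenshot, conservative_estimate(sunset))
```

Because each image also carries an uncertainty σ, ranking by the conservative estimate keeps rarely compared images from jumping to the top, which addresses the low-degree problem of the empirical winning rate.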

Collaborative Filtering (1/2)
In the collaborative filtering setting, they want to find out about each individual's preferences in order to:
–recommend images to each user based on his/her preferences
–compare users and images with each other
They have developed a new collaborative filtering algorithm they call “Relative SVD”.

Collaborative Filtering (2/2)
–The user feature vectors:
–The image feature vectors:
–The amount by which user i likes image j:
–Data: a set D of triplets (i,j,k), each recording that user i chose image j over image k
–The error for a particular decision:
–The total sum of squared errors (SSE):
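The slide's equations are lost, so the following is only a sketch in the spirit of Relative SVD, not the authors' code. It assumes, hypothetically, that user i has a vector u_i and image j a vector v_j, that u_i·v_j is the amount by which user i likes image j, and that the error for a triplet (i, j, k) is e = 1 − u_i·(v_j − v_k), that is, a target margin of 1. The SSE over all triplets is then minimized by stochastic gradient descent. The margin, the small L2 penalty `reg` and all hyperparameters are assumptions.

```python
import random

def relative_svd(triplets, n_users, n_images, dim=5, lr=0.05, reg=0.01,
                 epochs=200, seed=0):
    """triplets: list of (i, j, k) meaning user i chose image j over image k.
    SGD on the squared error e = 1 - U[i].(V[j] - V[k]) (assumed form)."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_images)]
    data = list(triplets)
    for _ in range(epochs):
        rng.shuffle(data)
        for i, j, k in data:
            u, vj, vk = U[i], V[j], V[k]
            diff = [a - b for a, b in zip(vj, vk)]
            err = 1.0 - sum(a * b for a, b in zip(u, diff))
            for d in range(dim):
                u_d = u[d]  # keep the old value so all three updates use it
                u[d] += lr * (err * diff[d] - reg * u_d)
                vj[d] += lr * (err * u_d - reg * vj[d])
                vk[d] -= lr * (err * u_d + reg * vk[d])
    return U, V

def sse(triplets, U, V):
    """Total sum of squared errors over all decisions."""
    total = 0.0
    for i, j, k in triplets:
        score = sum(a * (b - c) for a, b, c in zip(U[i], V[j], V[k]))
        total += (1.0 - score) ** 2
    return total

def prefers(U, V, i, j, k):
    """Predict whether user i picks image j over image k."""
    return sum(a * (b - c) for a, b, c in zip(U[i], V[j], V[k])) > 0

# Invented toy data: user 0 ranks images 0 > 1 > 2, user 1 has the opposite taste.
triplets = [(0, 0, 1), (0, 1, 2), (0, 0, 2), (1, 2, 1), (1, 1, 0), (1, 2, 0)]
U, V = relative_svd(triplets, n_users=2, n_images=3)
print(sse(triplets, U, V), prefers(U, V, 0, 0, 2), prefers(U, V, 1, 2, 0))
```

Because only differences v_j − v_k enter the error, the model learns from relative judgments alone, and a single shared set of image vectors can still serve users with opposite tastes.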

Comparison of the Models

Local Minimum
Do humans learn while playing the game? They compared the agreement rate of first-time players with that of other players:
–first-time players: 69.0%
–more experienced players: 71.8%
They also measured whether people learn within a game:
–first half of the game: 67%
–second half of the game: 64%

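Gender Prediction
The conditional entropy: $H(G \mid X) = -\sum_{x} \Pr(X=x) \sum_{g} \Pr(G=g \mid X=x) \log \Pr(G=g \mid X=x)$
The necessary conditional probabilities $\Pr(G=g \mid X=x)$ can be computed with Bayes' rule given the class conditionals $\Pr(X=x \mid G=g)$: $\Pr(G=g \mid X=x) = \frac{\Pr(X=x \mid G=g)\,\Pr(G=g)}{\sum_{g'} \Pr(X=x \mid G=g')\,\Pr(G=g')}$
The naïve Bayes classifier treats a player's choices $x_1, \dots, x_m$ as independent given the gender and picks the gender that maximizes the prior-weighted likelihood of the observed choices: $\hat{g} = \arg\max_{g} \Pr(G=g) \prod_{t=1}^{m} \Pr(X_t = x_t \mid G=g)$
The total accuracy is 78.3%.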
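A minimal sketch of the gender test, assuming each answer X is a binary choice (left or right image) for a given image pair. Class conditionals are estimated with add-one (Laplace) smoothing, which is an assumption. The conditional entropy is computed per pair, presumably so that the most gender-revealing pairs can be selected; that use is also an assumption. The pair ids and answers below are invented.

```python
import math
from collections import defaultdict

def train_naive_bayes(examples, alpha=1.0):
    """examples: list of (answers, gender); answers maps a pair id to the
    chosen side (0 or 1). Returns priors and smoothed Pr(X=x | G=g)."""
    gender_counts = defaultdict(int)
    choice_counts = defaultdict(lambda: [0, 0])  # (pair, gender) -> [n0, n1]
    for answers, g in examples:
        gender_counts[g] += 1
        for pair, choice in answers.items():
            choice_counts[(pair, g)][choice] += 1
    total = sum(gender_counts.values())
    priors = {g: n / total for g, n in gender_counts.items()}

    def cond(pair, choice, g):
        n0, n1 = choice_counts[(pair, g)]
        return ((n1 if choice else n0) + alpha) / (n0 + n1 + 2 * alpha)

    return priors, cond

def posterior(answers, priors, cond):
    """Pr(G=g | answers) by Bayes' rule, assuming independent answers."""
    log_joint = {g: math.log(p) + sum(math.log(cond(pair, c, g))
                                      for pair, c in answers.items())
                 for g, p in priors.items()}
    m = max(log_joint.values())  # subtract the max for numerical stability
    unnorm = {g: math.exp(v - m) for g, v in log_joint.items()}
    z = sum(unnorm.values())
    return {g: u / z for g, u in unnorm.items()}

def conditional_entropy(pair, priors, cond):
    """H(G | X) in bits for one pair; lower means the pair reveals more."""
    h = 0.0
    for x in (0, 1):
        joint = {g: priors[g] * cond(pair, x, g) for g in priors}
        px = sum(joint.values())
        for pgx in joint.values():
            p = pgx / px
            if p > 0:
                h -= px * p * math.log2(p)
    return h

# Invented toy answers: pair "p1" separates the groups, "p2" does not.
examples = [({"p1": 0, "p2": 1}, "f"), ({"p1": 0, "p2": 0}, "f"),
            ({"p1": 1, "p2": 1}, "m"), ({"p1": 1, "p2": 0}, "m")]
priors, cond = train_naive_bayes(examples)
post = posterior({"p1": 0}, priors, cond)
print(post, max(post, key=post.get))
print(conditional_entropy("p1", priors, cond), conditional_entropy("p2", priors, cond))
```

An uninformative pair such as "p2" leaves the full 1 bit of uncertainty about gender, so a test built this way would favor pairs like "p1".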

The Top Ranked Images

Discussion (1/2)
The highest ranked pictures
–sunsets, animals, flowers, churches, bridges, and famous tourist attractions
–neither provocative nor offensive
The worst pictures
–taken indoors and include a person
–blurry or too dark
–screenshots or pictures of documents or text

Discussion (2/2) There are substantial differences among players in judging images, and taking those differences into account can greatly help in predicting the users’ behavior on new images. More experienced players had about the same error rate as new players.

Conclusion
The main contribution of this paper is a new method to elicit user preferences. They compared several algorithms for combining these relative judgments into a total ordering and found that they can correctly predict a user’s behavior in 70% of the cases. They describe a new algorithm called Relative SVD to perform collaborative filtering on pairwise relative judgments. They present a gender test that asks users to make some relative judgments and can predict a random user’s gender in roughly 4 out of 5 cases.