Rethinking the ESP Game Stephen Robertson, Milan Vojnovic, Ingmar Weber* Microsoft Research & Yahoo! Research *This work was done while I was a visiting researcher at MSRC.
- 2 - The ESP Game – Live Demo Show it live (2 min). Alternative version.
- 3 - The ESP Game - Summary Two players try to agree on a label to be added to an image. No way to communicate. Entered labels only revealed at end. Known labels are “off-limits”. ESP refers to “extrasensory perception”: read the other person’s mind.
- 4 - The ESP Game - History Developed by Luis von Ahn and Laura Dabbish at CMU in 2004. Goal: improve image search. Licensed by Google in 2006. A prime example of harvesting human intelligence for difficult tasks. Many variants (music, shapes, …).
- 5 - The ESP Game – Strengths and Weaknesses Strengths –Creative approach to a hard problem –Fun to play –Vast majority of labels are appropriate –Difficult to spam –Powerful idea: Reaching consensus with little or no communication
- 6 - The ESP Game – Strengths and Weaknesses Weaknesses –The ultimate object is ill-defined –Finds mostly general labels –Already millions of images for these –“Lowest common denominator” problem –Human time is used sub-optimally
- 7 - A “Robot” Playing the ESP Game Video of recorded play.
- 8 - The ESP Game – Labels are Predictable Synonyms are redundant –“guy” => “man” for 81% of images Co-occurrence reduces “new” information –“clouds” => “sky” for 68% of images Colors are easy to agree on –“black” is 3.3% of all occurrences
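A statistic such as “guy” => “man” can be read as: of all images tagged “guy”, what fraction are also tagged “man”? The sketch below computes that rate for every ordered label pair. The function name `implication_rates` and the toy tag sets are hypothetical and not taken from the talk; the real figures above come from the authors' data.

```python
from collections import Counter
from itertools import permutations

def implication_rates(tag_sets):
    """Return {(a, b): fraction of images tagged a that are also tagged b}."""
    label_count = Counter()
    pair_count = Counter()
    for tags in tag_sets:
        unique = set(tags)
        label_count.update(unique)
        # Ordered pairs (a, b) with a != b that share an image.
        pair_count.update(permutations(unique, 2))
    return {(a, b): n / label_count[a] for (a, b), n in pair_count.items()}

# Hypothetical toy data, for illustration only.
toy_tag_sets = [
    {"guy", "man", "hat"},
    {"guy", "man"},
    {"guy", "beach"},
    {"clouds", "sky"},
    {"man", "sky"},
]
rates = implication_rates(toy_tag_sets)
print(rates[("guy", "man")])     # share of "guy" images that also have "man"
print(rates[("clouds", "sky")])  # share of "clouds" images that also have "sky"
```

Note that the rate is asymmetric: “clouds” => “sky” can be high while “sky” => “clouds” is low.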
- 9 - How to Predict the Next Label T = {“beach”, “water”}, next label t = ??
How to Predict the Next Label Want to know: P(“blue” next label | {“beach”, “water”}) P(“car” next label | {“beach”, “water”}) P(“sky” next label | {“beach”, “water”}) P(“bcn” next label | {“beach”, “water”}) Problem of data sparsity!
How to Predict the Next Label Want to know: $P(\text{“t” next label} \mid T) = P(T \mid \text{“t” next label}) \cdot P(\text{“t”}) / P(T)$ (Bayes’ Theorem: $P(A,B) = P(A \mid B) \cdot P(B) = P(B \mid A) \cdot P(A)$). Use conditional independence … Give a random topic to two people. Ask them to each think of 3 related terms.
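Conditional Independence Topic “Spain”: Madrid, sun, paella, beach, soccer, flamenco. Topic “blue”: sky, water, eyes, azul, blau, bleu. $P(A, B \mid C) = P(A \mid C) \cdot P(B \mid C)$. For two players p1 and p2: $P(\text{“p1: sky”}, \text{“p2: azul”} \mid \text{“blue”}) = P(\text{“p1: sky”} \mid \text{“blue”}) \cdot P(\text{“p2: azul”} \mid \text{“blue”})$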
How to Predict the Next Label
$P(\{s_1, s_2\} \mid \text{“t”}) \cdot P(\text{“t”}) / P(T) = P(s_1 \mid \text{“t”}) \cdot P(s_2 \mid \text{“t”}) \cdot P(\text{“t”}) / P(T)$
C.I. assumption violated in practice, but “close enough”.
$P(s \mid \text{“t”})$ will still be zero very often → smoothing with a non-zero background probability:
$P(s \mid \text{“t”}) \leftarrow (1 - \lambda)\, P(s \mid \text{“t”}) + \lambda\, P(s)$
How to Predict the Next Label
$P(\text{“t” next label} \mid T \text{ already present}) = \prod_{s \in T} P(s \mid \text{“t”}) \cdot P(\text{“t”}) / C$
where $C$ is a normalizing constant (this combines the conditional independence assumption with Bayes’ Theorem).
$\lambda$ chosen using a “validation set”. $\lambda = 0.85$ in the experiments.
Model trained on ~13,000 tag sets.
Also see: Naïve Bayes classifier
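The predictor can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the estimates of P(“t”) (label frequency) and P(s | “t”) (share of tag sets containing t that also contain s) are assumed here, the names `train` and `predict_next` are hypothetical, and the toy tag sets are invented. Only the smoothing formula and the value 0.85 come from the slides.

```python
from collections import Counter

LAMBDA = 0.85  # smoothing weight reported in the talk

def train(tag_sets):
    """Count label occurrences and ordered co-occurrences (assumed estimators)."""
    label_count, pair_count = Counter(), Counter()
    for tags in tag_sets:
        unique = set(tags)
        label_count.update(unique)
        pair_count.update((s, t) for s in unique for t in unique if s != t)
    return label_count, pair_count, sum(label_count.values())

def predict_next(present, model, lam=LAMBDA):
    """Return {t: P(t is the next label | present)} for known labels not yet present."""
    label_count, pair_count, total = model
    scores = {}
    for t, n_t in label_count.items():
        if t in present:
            continue
        score = n_t / total  # prior P(t)
        for s in present:
            p_s = label_count.get(s, 0) / total            # background P(s)
            p_s_given_t = pair_count[(s, t)] / n_t         # raw estimate of P(s | t)
            score *= (1 - lam) * p_s_given_t + lam * p_s   # smoothed P(s | t)
        scores[t] = score
    norm = sum(scores.values())  # the normalizing constant C
    return {t: v / norm for t, v in scores.items()} if norm > 0 else scores

# Hypothetical toy training data, for illustration only.
toy_tag_sets = [
    {"beach", "water", "sky"},
    {"beach", "water", "sand"},
    {"beach", "sky", "blue"},
    {"water", "blue", "sky"},
    {"car", "street"},
    {"car", "red"},
]
model = train(toy_tag_sets)
probs = predict_next({"beach", "water"}, model)
best = max(probs, key=probs.get)
print(best)
```

Because of the background term, a label such as “car” that never co-occurs with “beach” or “water” in the training data still receives a small non-zero probability.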
Experimental Results: Part 1
Number of
– games played: 205
– images encountered: 1,335
– images w/ OLT: 1,105
Percentage w/ match
– all images: 69%
– only images with OLTs: 81%
– all entered tags: 17%
Av. number of labels entered
– per image: 4.1
– per game: 26.7
Agreement index
– mean: 2.6
– median: 2.0
The “robot” plays reasonably well. The “robot” plays human-like.
Quantifying “Predictability” and “Information” So, labels are fairly predictable. But how can we quantify “predictability”?
Quantifying “Predictability” and “Information” “sunny” vs. “cloudy” tomorrow in BCN. The roll of a cubic die. The next single letter in “barcelo*”. The next single letter in “re*”. Clicked search result for “yahoo research”.
Entropy and Information
An event occurring with probability $p$ corresponds to an information of $-\log_2(p)$ bits, i.e. the number of bits required to encode it in an optimally compressed encoding.
Example: compressed weather forecast:
P(“sunny”) = 0.5, code “0” (1 bit)
P(“cloudy”) = 0.25 (2 bits)
P(“rain”) = 0.125 (3 bits)
P(“thunderstorm”) = 0.125 (3 bits)
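The information formula is easy to check numerically. In the sketch below, the function name `information_bits` is hypothetical, and the probabilities 0.25 and 0.125 are assumptions chosen to match the 2-bit and 3-bit figures in the weather example; only P(“sunny”) = 0.5 is stated outright.

```python
import math

def information_bits(p):
    """Information, in bits, of an event that occurs with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    # log2(1/p) equals -log2(p) and avoids printing -0.0 for p == 1.
    return math.log2(1 / p)

# Assumed forecast probabilities consistent with the bit counts in the example.
forecast = {"sunny": 0.5, "cloudy": 0.25, "rain": 0.125, "thunderstorm": 0.125}
for outcome, p in forecast.items():
    print(outcome, information_bits(p), "bits")
```

A certain event (p = 1) carries 0 bits, and the value grows without bound as p approaches 0.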
Entropy and Information
$p = 1 \Rightarrow$ 0 bits of information
–Cubic die showed a number in [1,6]
$p \approx 0 \Rightarrow$ many, many bits of information
–The numbers for the lottery
“information” = “amount of surprise”
Entropy and Information
Expected information for $p_1, p_2, \ldots, p_n$: $\sum_i -p_i \cdot \log(p_i)$ = (Shannon) entropy.
Might not know the true $p_1, p_2, \ldots, p_n$, but think they are $\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_n$. Then, w.r.t. $\hat{p}$, you observe $\sum_i -p_i \cdot \log(\hat{p}_i)$, which is minimized for $\hat{p} = p$.
$\hat{p}$ is given by the earlier model. $p$ is then observed.
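The difference between the entropy of the true distribution and the expected information measured under a model's guess can be sketched as follows. The function names, the use of base-2 logarithms (so that results are in bits), and the example distributions are assumptions made for illustration.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a distribution given as a list of probabilities."""
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Expected bits when outcomes follow p but information is measured with model q."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                return math.inf  # the model ruled out an outcome that does occur
            total += -pi * math.log2(qi)
    return total

# Hypothetical distributions: the true one and a uniform guess.
true_p = [0.5, 0.25, 0.125, 0.125]
uniform_guess = [0.25, 0.25, 0.25, 0.25]

print(entropy(true_p))                       # expected bits with the true distribution
print(cross_entropy(true_p, uniform_guess))  # expected bits with the uniform guess
```

The second value is never smaller than the first, which is why a lower average information per observed label indicates a better predictive model.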
Experimental Results: Part 2 Av. information per position of label in tag set Later labels are more predictable. Equidistribution = 12.3 bits. “Static” distribution = 9.3 bits. Av. information per position of human suggestions Human thinks harder and harder.
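The two baseline figures can be related with a short calculation. This is an inference, not something stated on the slide: it assumes “equidistribution” means a uniform distribution over a vocabulary of $N$ labels, and that the “static” distribution uses overall label frequencies without looking at the labels already present. $N$ itself is not given in the source.

```latex
\[
  -\log_2\!\left(\tfrac{1}{N}\right) = \log_2 N = 12.3
  \quad\Longrightarrow\quad
  N = 2^{12.3} \approx 5{,}000
\]
\[
  12.3 - 9.3 = 3 \text{ bits}
  \quad\Longrightarrow\quad
  2^{3} = 8
\]
```

Under these assumptions the vocabulary holds roughly 5,000 labels, and knowing the static label frequencies makes an observed label about 8 times more probable (in the geometric mean) than a uniform guess would.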
Improving the ESP Game
– Could score points according to $-\log_2(p)$: number of bits of information added to the system
– Have an activation time limit for “obvious” labels: remove the immediate satisfaction for simple matches
– Hide off-limits terms: have to be more careful to avoid “obvious” labels
– Try to match “experts”: use previous tags or meta information
– Educate players: use previously labeled images to unlearn behavior
– Automatically expand the off-limits list: easy, but 10+ terms not practical
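As one illustration, the last idea (automatically expanding the off-limits list) could be sketched as below. This is a hypothetical design, not the authors' proposal in detail: the function name `expand_off_limits`, the 0.5 threshold, and the single-pass expansion are assumptions, and the cap of 10 terms only reflects the remark that longer lists are impractical. The two rates are the “guy” => “man” and “clouds” => “sky” figures quoted earlier in the talk.

```python
def expand_off_limits(off_limits, rates, threshold=0.5, max_terms=10):
    """Add labels implied by current off-limits terms, strongest implication first.

    rates maps (a, b) to the fraction of images tagged a that are also tagged b.
    """
    expanded = list(off_limits)
    candidates = sorted(
        (
            (rate, b)
            for (a, b), rate in rates.items()
            if a in off_limits and b not in off_limits and rate >= threshold
        ),
        reverse=True,
    )
    for _, label in candidates:
        if len(expanded) >= max_terms:
            break  # keep the list short enough to display
        if label not in expanded:
            expanded.append(label)
    return expanded

# Implication rates quoted in the talk.
rates = {("guy", "man"): 0.81, ("clouds", "sky"): 0.68}
print(expand_off_limits(["guy", "clouds"], rates))
```

Raising `threshold` keeps the expansion conservative, so that only near-synonyms and very strong co-occurrences are ruled out.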
Questions Thank you!