Crowd Centrality
David Karger, Sewoong Oh, Devavrat Shah (MIT and UIUC)
Crowdsourcing
$30 million to land on the moon; $0.05 for image labeling, data entry, or transcription.
Micro-Task Crowdsourcing
Micro-Task Crowdsourcing
Example task: Which door is the women's restroom? (Left / Right)
Example: find cancerous tumor cells in images.
Undergrad intern: 200 images/hr at $15/hr, reliability 90%.
MTurk (single label): 4000 images/hr at $15/hr, reliability 65%.
MTurk (multiple labels): 500 images/hr at $15/hr, reliability 90%.
The Problem
Goal: reliably estimate the task answers at minimal cost.
Operational questions: how to assign tasks, and how to infer the answers.
Task Assignment
Assign batches of tasks to workers via random regular bipartite graphs. Such graphs are locally tree-like (enabling sharp analysis) and good expanders (giving a high signal-to-noise ratio).
Modeling the Crowd
Binary tasks t_i in {+1, -1}. Worker reliability: worker j answers correctly with probability p_j. We observe the answer matrix A = [A_ij], with A_ij in {+1, -1} on assigned task-worker pairs.
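The binary model above can be sketched as a small simulation. The Beta reliability prior and the parameter values here are illustrative assumptions, not specified on the slides:

```python
import numpy as np

def simulate_crowd(n_tasks=1000, n_workers=1000, l=15, seed=0):
    """One-coin model from the slides: task i has a true answer
    t_i in {+1, -1}; worker j answers correctly with probability p_j.
    Each task is assigned to l workers chosen at random."""
    rng = np.random.default_rng(seed)
    t = rng.choice([-1, 1], size=n_tasks)      # hidden true answers
    p = rng.beta(6, 2, size=n_workers)         # reliabilities (assumed Beta prior)
    A = np.zeros((n_tasks, n_workers))         # 0 = task not assigned to worker
    for i in range(n_tasks):
        workers = rng.choice(n_workers, size=l, replace=False)
        correct = rng.random(l) < p[workers]
        A[i, workers] = np.where(correct, t[i], -t[i])
    return t, p, A
```

With a mostly reliable crowd, even plain majority voting on the sampled answers recovers most tasks; the later slides quantify how much better one can do.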
Inference Problem
Baselines: majority voting; an oracle that knows the worker reliabilities p_1, ..., p_5 and weights each worker's answer accordingly.
Inference Problem
Our approach: estimate the reliabilities p_1, ..., p_5 from the answers themselves, then weight workers accordingly.
Preview of Results
Distribution of {p_j}: observed to follow a Beta distribution (Holmes '10; Ryker et al. '10).
EM algorithm: Dawid and Skene '79; Sheng, Provost, and Ipeirotis '10.
Preview of Results (comparison figure)
Iterative Inference
Iteratively learn the worker reliabilities via message passing: O(#edges) operations per iteration, approximating the MAP estimate.
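A minimal sketch of the message-passing iteration, following the Karger-Oh-Shah update rule (task messages exclude the receiving worker, and vice versa); the dense-matrix representation and the Gaussian initialization of worker messages are implementation choices:

```python
import numpy as np

def kos_infer(A, n_iter=10, seed=0):
    """Iterative message-passing inference.
    A: (tasks x workers) matrix with entries in {+1, -1, 0} (0 = unassigned).
    x[i, j] carries evidence about task i, excluding worker j;
    y[i, j] carries evidence about worker j, excluding task i."""
    rng = np.random.default_rng(seed)
    mask = (A != 0)
    # initialize worker messages with a random positive bias
    y = rng.normal(1.0, 1.0, size=A.shape) * mask
    for _ in range(n_iter):
        # task update: x_{i->j} = sum_{j' != j} A_{ij'} y_{j'->i}
        s = (A * y).sum(axis=1, keepdims=True)
        x = (s - A * y) * mask
        # worker update: y_{j->i} = sum_{i' != i} A_{i'j} x_{i'->j}
        r = (A * x).sum(axis=0, keepdims=True)
        y = (r - A * x) * mask
    # final estimate: weighted majority with the learned worker weights
    return np.sign((A * y).sum(axis=1))
```

Each iteration touches every assignment edge a constant number of times, matching the O(#edges) cost stated on the slide.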
Experiments: Amazon MTurk
Applications: learning similarities, recommendations, searching, ...
Experiments: Amazon MTurk (results figure)
Task Assignment: Why a Random Graph?
Key Metric: Quality of Crowd
Crowd quality parameter: q = E[(2p_j - 1)^2].
Theorem (Karger, Oh, Shah). Let n tasks be assigned to n workers by an (l, l)-random regular graph, with lq > sqrt(2). Then, for all n large enough (n = Omega(l^{O(log(1/q))} e^{lq})), after O(log(1/q)) iterations the algorithm estimates each task with error probability at most e^{-lq/16}.
If p_j = 1 for all j, then q = 1; if p_j = 0.5 for all j, then q = 0. Note that q differs from mu^2 = (E[2p - 1])^2; in general q <= mu <= sqrt(q).
How Good Is This?
To achieve a target P_error <= eps, a per-task budget of l = Theta((1/q) log(1/eps)) suffices, and this is minimax optimal. Under majority voting (with any choice of graph), the required per-task budget is l = Omega((1/q^2) log(1/eps)). There is no significant gain from knowing side information (golden questions, reputation, ...).
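To make the scaling concrete, a toy calculation with the hidden constants suppressed; the specific q and eps values are illustrative:

```python
from math import ceil, log

def budget_iterative(q, eps):
    # l = Theta((1/q) * log(1/eps)), constants suppressed
    return ceil(log(1.0 / eps) / q)

def budget_majority(q, eps):
    # l = Omega((1/q**2) * log(1/eps)), constants suppressed
    return ceil(log(1.0 / eps) / q ** 2)

# at quality q = 0.3 and target error 5%, majority needs roughly
# a factor 1/q more labels per task than the iterative algorithm
print(budget_iterative(0.3, 0.05), budget_majority(0.3, 0.05))  # 10 34
```

The gap widens as the crowd quality q shrinks, which is exactly the regime where cheap, unreliable workers are used.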
Adaptive Task Assignment: Does It Help?
Theorem (Karger, Oh, Shah). For any adaptive algorithm, let Delta be the average number of workers per task required to achieve P_error <= eps. Then there exist reliabilities {p_j} with quality q for which Delta must be of the same order as in the non-adaptive setting: the gain through adaptivity is limited.
Which Crowd to Employ?
Beyond Binary Tasks
Tasks now take one of K values. Assume p_j >= 0.5 for all j, and let q be the quality of {p_j}. The binary-task results extend to this setting: to achieve P_error <= eps, the number of workers per task scales as O((1/q) log(1/eps) + (1/q) log K).
Beyond Binary Tasks
Reduce to K - 1 binary problems, each with quality >= q. For each x with 1 < x <= K, set A_ij(x) = +1 if A_ij >= x and -1 otherwise, and t_i(x) = +1 if t_i >= x and -1 otherwise. The corresponding quality satisfies q(x) >= q, so the binary result gives P_error(x) <= exp(-lq/16). By a union bound, P_error <= P_error(2) + ... + P_error(K) <= K exp(-lq/16).
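The thresholding reduction can be sketched directly. The noise model and the majority-vote binary solver below are illustrative stand-ins, and every worker labels every task here for simplicity:

```python
import numpy as np

def reduce_and_solve(A, K, binary_solver):
    """Solve a K-ary labeling problem via K-1 thresholded binary problems.
    A: (tasks x workers) with entries in 1..K; binary_solver maps a
    {+1, -1} matrix to a {+1, -1} vector of per-task estimates."""
    t_hat = np.ones(A.shape[0], dtype=int)   # start every task at label 1
    for x in range(2, K + 1):
        Ax = np.where(A >= x, 1, -1)         # A_ij(x) = +1 iff A_ij >= x
        tx = binary_solver(Ax)               # estimate of t_i(x)
        t_hat += (tx == 1)                   # t_i = 1 + #{x : t_i(x) = +1}
    return t_hat

def majority(Ax):
    """Plain majority vote, as a stand-in binary solver."""
    return np.where(Ax.sum(axis=1) > 0, 1, -1)
```

Swapping in the iterative inference algorithm for `majority` gives the per-threshold error bound quoted above.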
Why the Algorithm Works
MAP estimation: place a prior with density f(p) over [0, 1] on the reliabilities {p_j}; given the answers A = [A_ij], the MAP estimate can be computed by the belief propagation (max-product) algorithm. Under the Haldane prior (p_j is 0 or 1 with equal probability), iteration k + 1 updates messages for all task-worker pairs (i, j), where X_i and Y_j represent the log-likelihood ratios for t_i = +1 vs -1 and p_j = 1 vs 0. This is exactly our algorithm. Moreover, our random task-assignment graph is locally tree-like, so the algorithm is effectively MAP inference under the Haldane prior.
Why the Algorithm Works
A minor variation of the algorithm drops the edge dependence of the messages: T_i = sum_{j'} W_{j'} A_{ij'} and W_j = sum_{i'} T_{i'} A_{i'j}, so that T_next = A A^T T. Subject to this modification, the algorithm is computing the left singular vector of A corresponding to the largest singular value, i.e., a rank-1 approximation of A. So why is a rank-1 approximation of A the right object?
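The fixed point of T <- A A^T T can be computed by power iteration. This is a generic sketch, with normalization added for numerical stability:

```python
import numpy as np

def leading_left_singular_vector(A, n_iter=100, seed=0):
    """Power iteration for the left singular vector of A with the
    largest singular value: repeatedly apply T <- A A^T T and normalize."""
    rng = np.random.default_rng(seed)
    T = rng.normal(size=A.shape[0])
    for _ in range(n_iter):
        T = A @ (A.T @ T)
        T /= np.linalg.norm(T)
    return T
```

On a crowdsourcing answer matrix, np.sign of the result then estimates the task answers up to a global sign flip.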
Random graph + probabilistic model: E[A_ij] = (t_i p_j - t_i (1 - p_j)) l/n = t_i (2p_j - 1) l/n, so E[A] = t (2p - 1)^T l/n. That is, E[A] is a rank-1 matrix, and t is its left singular vector. If A is close to E[A], then computing the left singular vector of A makes sense. Building on Friedman, Kahn, and Szemeredi '89, the singular vector of A provides a reasonable approximation: P_error = O(1/(lq)) (Ghosh, Kale, and McAfee '12). For the sharper result we use belief propagation.
Concluding Remarks
Budget-optimal micro-task crowdsourcing via a random regular task-allocation graph and belief propagation.
Key messages: all that matters is the quality of the crowd; worker reputation is not useful for non-adaptive tasks; adaptation does not help, owing to the fleeting nature of workers; reputation plus persistent worker identities would be needed for adaptation to be effective; the inference algorithm itself can be used to assign reputations; and the binary-task model is equivalent to K-ary tasks.
On That Note ...