1
Optimistic Knowledge Gradient Policy for Optimal Budget Allocation in Crowdsourcing
Xi Chen, Machine Learning Department, Carnegie Mellon University
Mentor: Denny Zhou
In collaboration with: Qihang Lin (also an intern at Microsoft)
2
Crowdsourcing Services
Building predictive models requires collecting reliable labels for training.
Crowdsourcing: outsourcing labeling tasks to a group of workers who are usually not experts on these tasks.
3
Challenge: budget allocation in crowdsourcing
Labels from the crowd are very noisy: use repeated labeling and aggregate the labels to infer the truth; more labels lead to higher confidence.
No free lunch: each label costs a certain amount of money.
Different workers have different reliability.
Goal: given a fixed budget, how to sequentially allocate it over item-worker pairs so that the overall labeling accuracy is maximized?
Key idea: estimate the items' difficulty and the workers' reliability, and incorporate these estimates into the sequential allocation process.
Simplest setting: binary labeling by homogeneous workers.
4
Binary Labeling by Homogeneous Workers
K items, indexed i ∈ {1, …, K}
Soft label θ_i = Pr(Z_i = 1) ∈ [0, 1]: unknown
Easy items: θ_i → 0 or θ_i → 1; difficult items: θ_i → 0.5
Positive set: H* = {i : θ_i ≥ 0.5}
Example: identify whether the individuals in a set of images are adults or not
5
Binary Labeling by Homogeneous Workers
Homogeneous workers: each label is drawn according to Bernoulli(θ_i)
Coin-tossing view: there are K biased coins; the positive/head set is H* = {i : θ_i ≥ 0.5}, and acquiring a label for item i amounts to tossing coin i once
Total budget T: the number of labels we can acquire
Challenge: how to dynamically allocate the budget over the K items so that the overall accuracy is maximized? [CrowdSynth: Kamar et al. '12, Galaxy Zoo]
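To make the coin-tossing view concrete, here is a minimal simulation sketch; the item count, budget, and round-robin allocation below are illustrative assumptions, not the settings used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5                          # number of items ("biased coins"); illustrative
theta = rng.uniform(size=K)    # unknown soft labels theta_i = Pr(Z_i = 1)

def acquire_label(i):
    """Ask a homogeneous worker to label item i: toss coin i once."""
    return 1 if rng.random() < theta[i] else -1

# Spend a toy budget uniformly and tally the answers.
T = 20
counts = np.zeros((K, 2), dtype=int)      # columns: [#(+1), #(-1)]
for t in range(T):
    i = t % K                              # uniform (round-robin) allocation
    y = acquire_label(i)
    counts[i, 0 if y == 1 else 1] += 1

print("true positive set H*:", {i for i in range(K) if theta[i] >= 0.5})
print("label counts per item:\n", counts)
```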
6
Binary Labeling by Homogeneous Workers
Dynamic budget allocation:
Step 1: dynamically acquire labels from the crowd for different items
Step 2: when the budget T is exhausted, infer the positive set H_T from the collected labels (homogeneous workers: majority vote; heterogeneous workers: the inference also involves the workers' reliability)
Goal: maximize the expected accuracy
Theory tools: Bayesian Markov decision processes, Bayesian statistical decision theory, Bayesian sequential optimization, Bayesian reinforcement learning
7
Roadmap
Bayesian Markov Decision Process & Optimal Allocation Policy via Dynamic Programming
Approximate Policy: Optimistic Knowledge Gradient
Modeling Workers' Reliability
Extension: Incorporating Feature Information
Extension: Multi-Class Labeling
8
Bayesian Markov Decision Process
Beta prior: θ_i ∼ Beta(a_i^0, b_i^0), the conjugate prior of the Bernoulli
At each stage:
Current state: the Beta parameters (a_i, b_i) of every item, where a_i counts the observed 1s and b_i the observed -1s (plus the prior pseudo-counts)
Decision rule: choose the next item to label as a function of the current state only, i.e., the decision rule is Markovian
Taking the observation: receive a label in {-1, 1} for the chosen item and update its (a_i, b_i)
Allocation policy: the sequence of decision rules over the T stages
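A minimal sketch of the conjugate bookkeeping described above, assuming the standard Beta-Bernoulli update (the helper name is mine):

```python
# State for item i: (a_i, b_i), the Beta parameters formed by the prior
# pseudo-counts plus the observed +1 / -1 labels.
def update_state(a, b, y):
    """Conjugate Beta-Bernoulli update for one observed label y in {+1, -1}."""
    return (a + 1, b) if y == 1 else (a, b + 1)

# Example: start from a uniform Beta(1, 1) prior and observe +1, +1, -1.
a, b = 1.0, 1.0
for y in (1, 1, -1):
    a, b = update_state(a, b, y)
print(a, b)          # 3.0, 2.0
print(a / (a + b))   # posterior mean of theta_i = 0.6
```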
9
Bayesian Markov Decision Process
State: the collection of Beta parameters {(a_i^t, b_i^t)} at stage t
Action: the item i_t chosen to be labeled next
State transition: a label of 1 moves (a_{i_t}, b_{i_t}) to (a_{i_t}+1, b_{i_t}); a label of -1 moves it to (a_{i_t}, b_{i_t}+1)
Transition probability (posterior predictive): Pr(label = 1) = a_{i_t}^t / (a_{i_t}^t + b_{i_t}^t)
Sample path and filtration: the sequence of chosen items and observed labels, with respect to which the policy must be adapted
10
Final Inference on Positive Set
When the budget T is exhausted, infer the positive set H_T from the collected labels.
Bayesian decision rule: H_T = {i : Pr(θ_i ≥ 0.5 | a_i^T, b_i^T) ≥ 0.5}
a_i^T: counts of observed 1s plus a_i^0; b_i^T: counts of observed -1s plus b_i^0
When a_i^0 = b_i^0, the rule reduces to majority vote (include item i iff a_i^T ≥ b_i^T)
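A hedged sketch of this final inference step; `beta.sf(0.5, a, b)` gives Pr(θ_i ≥ 0.5 | a_i^T, b_i^T), and with a symmetric prior the rule coincides with majority vote.

```python
from scipy.stats import beta

def positive_set(states):
    """Bayesian decision rule: keep item i if Pr(theta_i >= 0.5 | a, b) >= 0.5.

    `states` is a list of (a_i^T, b_i^T) pairs. For the Beta posterior this
    test is equivalent to a_i^T >= b_i^T, i.e. majority vote when a_i^0 = b_i^0.
    """
    return {i for i, (a, b) in enumerate(states)
            if beta.sf(0.5, a, b) >= 0.5}

print(positive_set([(3, 2), (1, 4), (2, 2)]))   # {0, 2}
```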
11
Expected Accuracy Maximization
Value function: the expected accuracy of the final inference, optimized over allocation policies
Optimization: a Markov decision process, solvable in principle by dynamic programming
Stage-wise reward vs. final accuracy: the objective only rewards the final inference, so there is no natural stage-wise reward
12
Stage-wise Reward via a Telescoping Expansion of the Value Function
Expanding the value function telescopically turns the final accuracy into a sum of expected stage-wise rewards [Ng et al. '99, Xie et al. '11]
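A hedged sketch of the telescoping idea in LaTeX, writing h(a, b) for the expected final accuracy of an item whose current posterior is Beta(a, b); this is a reconstruction, not the talk's exact notation.

```latex
% Expected accuracy of one item if we stopped now with posterior Beta(a, b):
h(a,b) = \max\bigl\{\Pr(\theta \ge 0.5 \mid a,b),\; \Pr(\theta < 0.5 \mid a,b)\bigr\}
% Telescoping the final accuracy over the T labeling stages:
\mathbb{E}\Bigl[\textstyle\sum_i h(a_i^T, b_i^T)\Bigr]
  = \sum_i h(a_i^0, b_i^0)
  + \sum_{t=0}^{T-1} \mathbb{E}\bigl[h(a_{i_t}^{t+1}, b_{i_t}^{t+1}) - h(a_{i_t}^{t}, b_{i_t}^{t})\bigr]
% so the increment E[h(next state)] - h(current state) plays the role of an
% expected stage-wise reward for labeling item i_t.
```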
13
Markov Decision Process
State, stage-wise reward, and value function together define a finite-horizon Markov decision process.
14
Optimal Policy via Dynamic Programming
Finite-horizon Markov decision process: solvable by dynamic programming (a.k.a. backward induction); see the toy sketch below
Curse of dimensionality: the joint state space grows exponentially with the number of items, so approximate policies are needed
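To see why backward induction blows up, here is a toy sketch that enumerates the joint Beta state exactly; it is feasible only for a handful of items and labels, which is exactly the curse of dimensionality above. State encoding and function names are my own assumptions.

```python
from functools import lru_cache
from scipy.stats import beta

def h(a, b):
    """Expected final accuracy of one item with posterior Beta(a, b)."""
    p = beta.sf(0.5, a, b)           # Pr(theta >= 0.5 | a, b)
    return max(p, 1.0 - p)

@lru_cache(maxsize=None)
def V(state, budget):
    """Optimal expected accuracy from `state` (tuple of (a_i, b_i) pairs) with
    `budget` labels left. Exponential in items and budget: only a toy sketch."""
    if budget == 0:
        return sum(h(a, b) for a, b in state)
    best = -float("inf")
    for i, (a, b) in enumerate(state):
        p1 = a / (a + b)                            # predictive Pr(label = +1)
        up = state[:i] + ((a + 1, b),) + state[i + 1:]
        dn = state[:i] + ((a, b + 1),) + state[i + 1:]
        best = max(best, p1 * V(up, budget - 1) + (1 - p1) * V(dn, budget - 1))
    return best

# Two items, uniform priors, a budget of 3 labels.
print(V(((1, 1), (1, 1)), 3))
```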
15
Roadmap
Bayesian Markov Decision Process & Optimal Allocation Policy via Dynamic Programming
Approximate Policy: Optimistic Knowledge Gradient
Modeling Workers' Reliability
Extension: Incorporating Feature Information
Extension: Multi-Class Labeling
16
Approximate Policies
Uniform sampling: spread the budget evenly across items
Finite-horizon Gittins index rule: decompose the joint state space into a state space for each single item (of size O(T^2)) [J. C. Gittins, '79]
Infinite-horizon, discounted reward: the Gittins index rule is optimal; finite-horizon, non-discounted reward: it is a suboptimal policy
Exact computation (time & space) vs. approximate computation (time & space) [Niño-Mora, '11]
17
Knowledge Gradient (KG)
Myopic, single-step look-ahead policy: optimal if only one label remains to be acquired [Powell, '07]
Deterministic KG: break ties by choosing the smallest index
Randomized KG: break ties at random
A sketch of the index computation is given below.
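A sketch of the knowledge-gradient index under the Beta model above, assuming the stage-wise reward is the improvement in the expected accuracy h(a, b) from one more label; this is a reconstruction under that assumption, not the talk's exact formulas.

```python
from scipy.stats import beta

def h(a, b):
    """Expected accuracy of one item if we stopped now with Beta(a, b)."""
    p = beta.sf(0.5, a, b)
    return max(p, 1.0 - p)

def kg_index(a, b):
    """Single-step look-ahead (knowledge gradient) index for one item:
    expected improvement in h from acquiring one more label."""
    r_plus  = h(a + 1, b) - h(a, b)   # reward if the next label is +1
    r_minus = h(a, b + 1) - h(a, b)   # reward if the next label is -1
    p1 = a / (a + b)                  # predictive probability of a +1 label
    return p1 * r_plus + (1 - p1) * r_minus

# Deterministic KG: pick the item with the largest index, breaking ties by
# the smallest item index.
states = [(2, 1), (5, 5), (1, 1)]
scores = [kg_index(a, b) for a, b in states]
print(max(range(len(states)), key=lambda i: (scores[i], -i)))
```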
18
Optimistic Knowledge Gradient
Optimistic KG: label the item with the largest optimistic reward R^+(a, b), the larger of the two one-step rewards; the pessimistic counterpart, which takes the smaller of the two, is an inconsistent policy
Consistency proof sketch: R^+(a, b) > 0 and lim_{a+b→∞} R^+(a, b) = 0, so as T → ∞ every item is labeled infinitely many times; by the strong law of large numbers, H_T = H* almost surely
19
Optimistic Knowledge Gradient
20
Conditional Value-at-Risk
Conditional Value-at-Risk (CVaR) [Rockafellar and Uryasev, '02]
Value-at-Risk: VaR_α(R), the α-upper quantile of the reward R
Conditional Value-at-Risk: CVaR_α(R), the expected reward among outcomes exceeding VaR_α(R)
The CVaR criterion spans the range from the expected reward (knowledge gradient) to the max reward (optimistic knowledge gradient), and the resulting policy is consistent for any α < 1
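A hedged LaTeX sketch of the CVaR criterion and how it interpolates between the two policies; here R denotes the (two-valued) stage-wise reward of labeling an item.

```latex
% Value-at-Risk (alpha-upper quantile) and Conditional Value-at-Risk:
\mathrm{VaR}_\alpha(R) = \inf\{r : \Pr(R \ge r) \le \alpha\},
\qquad
\mathrm{CVaR}_\alpha(R) = \mathbb{E}\bigl[R \mid R \ge \mathrm{VaR}_\alpha(R)\bigr].
% As alpha -> 0 the criterion approaches the maximum possible reward
% (optimistic knowledge gradient); as alpha -> 1 it approaches the expected
% reward (knowledge gradient).
```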
21
Experiments
Simulated data: K = 50, θ_i ∼ Beta(1,1), T = 2K, 3K, …, 10K
Recognizing Textual Entailment data (Snow et al., EMNLP'08): K = 800, θ_i ∼ Beta(1,1), T = 2K, 3K, …, 10K
22
Roadmap
Bayesian Markov Decision Process & Optimal Allocation Policy via Dynamic Programming
Approximate Policy: Optimistic Knowledge Gradient
Modeling Workers' Reliability
Extension: Incorporating Feature Information
Extension: Multi-Class Labeling
23
Heterogeneous Workers: modeling reliability
Labeling matrix: Z_ij ∈ {-1, 1}
Heterogeneous workers: model each worker's reliability to facilitate estimation of the true label and to assign more items to reliable workers
N items, 1 ≤ i ≤ N: θ_i = Pr(Z_i = 1), with prior θ_i ∼ Beta(a_i^0, b_i^0)
M workers, 1 ≤ j ≤ M: reliability ρ_j = Pr(Z_ij = Z_i | Z_i), with prior ρ_j ∼ Beta(c_j^0, d_j^0) [Dawid and Skene, '79]
Action space: (i, j) ∈ {1, …, N} × {1, …, M}
Likelihood (law of total probability): Pr(Z_ij = 1) = θ_i ρ_j + (1 − θ_i)(1 − ρ_j)
The homogeneous (perfect) worker model is the special case ρ_j ≡ 1
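A small sketch of this worker model and its marginal likelihood; the function names and numerical examples are mine.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_label(theta_i, rho_j):
    """One-coin worker model: draw the true label Z_i ~ Bernoulli(theta_i)
    (coded +1/-1), then the worker reports it correctly with prob rho_j."""
    z = 1 if rng.random() < theta_i else -1
    return z if rng.random() < rho_j else -z

def prob_positive_label(theta_i, rho_j):
    """Marginal likelihood Pr(Z_ij = +1) by the law of total probability."""
    return theta_i * rho_j + (1.0 - theta_i) * (1.0 - rho_j)

# A reliable worker on an ambiguous item vs. an unreliable worker on an easy item.
print(prob_positive_label(0.55, 0.95))   # 0.545
print(prob_positive_label(0.90, 0.60))   # 0.58
```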
24
Variational Approximation and Moment Matching
Prior: a product of Beta distributions over the θ_i and ρ_j
Likelihood: Pr(Z_ij = 1 | θ_i, ρ_j) = θ_i ρ_j + (1 − θ_i)(1 − ρ_j)
Posterior: no longer a product of Beta distributions (the same issue arises in the two-coin model)
Variational approximation: approximate the posterior by the product of its marginal distributions
Approximate each marginal by a Beta distribution using moment matching (see the sketch below)
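A minimal moment-matching sketch: given the mean and variance of an approximate marginal, recover the Beta parameters with the same first two moments (helper name and numbers are illustrative).

```python
def beta_from_moments(mean, var):
    """Moment matching: return (a, b) of the Beta distribution whose mean and
    variance equal the given posterior-marginal moments. Requires
    0 < var < mean * (1 - mean)."""
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common

# Example: approximate a marginal with mean 0.7 and variance 0.02 by a Beta.
a, b = beta_from_moments(0.7, 0.02)
print(a, b)      # 6.65, 2.85
```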
25
Optimistic Knowledge Gradient
For each item-worker pair, compute the reward of getting label 1 and the reward of getting label -1; the optimistic knowledge gradient selects the pair whose larger of the two rewards is greatest.
26
Experiments
Simulated data: K = 50, θ_i ∼ Beta(1,1), M = 10, ρ_j ∼ Beta(4,1), T = 2K, 3K, …, 10K
Recognizing Textual Entailment data (Snow et al., EMNLP'08): K = 800, θ_i ∼ Beta(1,1), M = 164, ρ_j ∼ Beta(4,1), T = 2K, 3K, …, 10K
Compared settings: homogeneous (perfect) workers vs. heterogeneous workers
Reaches the best accuracy (92.25%) with only 40% of the budget
27
Roadmap
Bayesian Markov Decision Process & Optimal Allocation Policy via Dynamic Programming
Approximate Policy: Optimistic Knowledge Gradient
Modeling Workers' Reliability
Extension: Incorporating Feature Information
Extension: Multi-Class Labeling
28
Incorporating Feature Information
Each item i has a feature vector x_i ∈ R^p
Prior and posterior over the weight vector of a Bayesian logistic regression model; the posterior is approximated by a Gaussian via a Laplace approximation, with an updated mean and updated covariance after each new label
Bottleneck: the covariance update; the variational Bayesian logistic regression update [Jaakkola & Jordan, '00] yields a rank-1 update, computed cheaply via the Sherman-Morrison formula (see the sketch below)
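A sketch of the rank-1 covariance update via Sherman-Morrison; the scalar s below stands in for whatever curvature weight the Laplace/variational logistic update attaches to the new observation, so it is a placeholder rather than the talk's exact quantity.

```python
import numpy as np

def sherman_morrison_update(Sigma, x, s):
    """Rank-1 update of a covariance matrix:
        Sigma_new = (Sigma^{-1} + s * x x^T)^{-1}
                  = Sigma - (s * Sigma x x^T Sigma) / (1 + s * x^T Sigma x),
    computed in O(p^2) instead of a fresh O(p^3) inversion."""
    Sx = Sigma @ x
    return Sigma - (s * np.outer(Sx, Sx)) / (1.0 + s * x @ Sx)

# Sanity check against direct inversion on a random positive-definite matrix.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)
x = rng.standard_normal(4)
s = 0.7
direct = np.linalg.inv(np.linalg.inv(Sigma) + s * np.outer(x, x))
print(np.allclose(direct, sherman_morrison_update(Sigma, x, s)))   # True
```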
29
Incorporating Feature Information
Simulated data: K = 50, feature dimension p = 10, w ∼ N(0, 0.1·I), x ∼ N(0, Σ) with Σ_{ij} = 0.3^{|i−j|}
30
Roadmap
Bayesian Markov Decision Process & Optimal Allocation Policy via Dynamic Programming
Approximate Policy: Optimistic Knowledge Gradient
Modeling Workers' Reliability
Extension: Incorporating Feature Information
Extension: Multi-Class Labeling
31
Multi-Class Labeling
Classes: c = 1, …, C
Binary: θ_i ∈ [0, 1] is the underlying probability of being positive; multi-class: θ_i = (θ_{i1}, …, θ_{iC}) with Σ_{c=1}^C θ_{ic} = 1, where θ_{ic} is the underlying probability of belonging to class c
Binary: Beta prior on θ_i; multi-class: Dirichlet prior on θ_i = (θ_{i1}, …, θ_{iC})
Binary: labels y_i^t ∈ {-1, 1} ∼ Bernoulli(θ_i); multi-class: labels y_i^t ∈ {1, 2, …, C} ∼ Categorical(θ_i)
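A minimal sketch of the Dirichlet-categorical conjugate update that replaces the Beta-Bernoulli bookkeeping in the multi-class setting (class indexing and numbers are illustrative).

```python
import numpy as np

def update_dirichlet(alpha, c):
    """Conjugate Dirichlet-categorical update: observing one label of class c
    (0-indexed) adds one pseudo-count to that class."""
    alpha = np.array(alpha, dtype=float)
    alpha[c] += 1.0
    return alpha

# Four classes, flat Dirichlet(1, ..., 1) prior, labels for classes 2, 2, 0.
alpha = np.ones(4)
for c in (2, 2, 0):
    alpha = update_dirichlet(alpha, c)
print(alpha)                  # [2. 1. 3. 1.]
print(alpha / alpha.sum())    # posterior mean of theta_i
```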
32
Real Experiments
Stanford image data, 4 classes of dogs (Zhou et al., NIPS'12; labeled via Amazon MTurk): K = 807, θ_i ∼ Dirichlet(1, …, 1), T = 2K, 3K, …, 10K
Bing search relevance data, 5 ratings: K = 2653, θ_i ∼ Dirichlet(1, …, 1), T = 2K, 3K, …, 6K
33
Conclusions
A general MDP framework for budget allocation in crowdsourcing
Optimistic knowledge gradient policy: an approximate dynamic programming approach
Future work:
Reducing the computational cost (e.g., in the feature-based and multi-class settings)
Budget allocation in other crowdsourcing settings (e.g., rating)
Making the framework more practical: batch assignment (assign a set of items to a worker at each stage)
Applying the algorithms to real platforms in Bing
34
Acknowledgements
A great summer at Redmond: May 1st – Oct 12th
CLUES Group Machine Learning Department Theory Group Interns