Efficiently Learning the Accuracy of Labeling Sources for Selective Sampling, by Pinar Donmez, Jaime Carbonell, Jeff Schneider, School of Computer Science, Carnegie Mellon University

1 Efficiently Learning the Accuracy of Labeling Sources for Selective Sampling
Pinar Donmez, Jaime Carbonell, Jeff Schneider
School of Computer Science, Carnegie Mellon University
KDD ’09, June 30th, 2009, Paris, France

2 Problem Illustration
[Figure: instances and oracles, annotated with the values 0.74, 0.55, 0.8, 0.9, 0.67, 0.83, 0.58, 0.69]

3 Interval Estimate Threshold (IEThresh)
- Goal: find the labeler(s) with the highest expected accuracy
- Our work builds upon Interval Estimation [L. P. Kaelbling]
1. Estimate the reward of each labeler (more on next slide)
2. Compute the upper confidence interval of each labeler's reward
3. Select the labelers whose upper interval is higher than a threshold
4. Observe the output of the chosen oracles to estimate their reward
5. Repeat from step 1
Effects: filters out unreliable labelers; reduces labeling cost
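Steps 2 and 3 above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the slide gives neither the interval formula nor the threshold, so the code assumes an upper interval of the form mean + z * s / sqrt(n), uses a normal quantile from the standard library as a stand-in for a Student-t critical value, and assumes the threshold is a fraction `eps` of the largest upper interval. The names `upper_interval`, `select_labelers`, `eps`, and `confidence` are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def upper_interval(rewards, confidence=0.95):
    """Upper end of a confidence interval on the mean of `rewards`.

    Assumption: mean + z * s / sqrt(n), with a normal quantile z standing in
    for the Student-t critical value a small-sample interval would use.
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two observed rewards per labeler")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return mean(rewards) + z * stdev(rewards) / math.sqrt(n)


def select_labelers(reward_history, eps=0.8, confidence=0.95):
    """Return the labelers whose upper interval is at least eps * the best one.

    `reward_history` maps a labeler id to its list of observed 0/1 rewards.
    `eps` is an assumed threshold parameter, not a value taken from the slides.
    """
    upper = {k: upper_interval(r, confidence) for k, r in reward_history.items()}
    cutoff = eps * max(upper.values())
    return sorted(k for k, u in upper.items() if u >= cutoff)


# Hypothetical reward histories: "a" looks reliable, "b" looks unreliable,
# "c" has only two observations and is therefore still uncertain.
history = {"a": [1, 1, 1, 0], "b": [0, 0, 0, 1], "c": [1, 0]}
selected = select_labelers(history)
```

With these made-up histories, the reliable labeler "a" and the barely observed labeler "c" pass the threshold while "b" is filtered out, which is the exploration vs. exploitation behaviour the slide describes.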

4 Reward of the labelers
- The reward of each labeler is unknown, so it needs to be estimated
- The reward of a labeler corresponds to eliciting the true label
- The true label is also unknown, so it is estimated by the majority vote
- We propose the following reward function:
  reward = 1 if the labeler agrees with the majority label
  reward = 0 otherwise
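The reward function above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `majority_label` and `agreement_rewards` are hypothetical, and the tie-breaking rule (the label seen first wins) is an assumption, since the slide does not say how ties are handled.

```python
from collections import Counter


def majority_label(labels):
    """Most common label among the queried labelers.

    Assumption: ties go to the label encountered first; the slide does not
    specify tie handling.
    """
    if not labels:
        raise ValueError("no labels to vote on")
    return Counter(labels.values()).most_common(1)[0][0]


def agreement_rewards(labels):
    """Map each labeler to reward 1 if it agrees with the majority, else 0."""
    majority = majority_label(labels)
    return {labeler: int(label == majority) for labeler, label in labels.items()}


# Hypothetical labels from three labelers for a single instance.
rewards = agreement_rewards({"a": "pos", "b": "pos", "c": "neg"})
```

Here labelers "a" and "b" agree with the majority label and receive reward 1, while "c" receives reward 0.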

5 IEThresh at the Beginning
[Figure: oracles plotted against expected reward]

6 IEThresh Oracle Selection
[Figure: oracles 1 to 5 plotted against expected reward, with the threshold marked]

7 IE Learning Snapshot II
[Figure: oracles 1 to 5 plotted against expected reward, with the threshold marked]

8 IEThresh Instance Selection
[Figure: elements labeled 1, 3, 4, 5, 2]

9 Uniform Expert Accuracy ∈ (0.5, 1]
Repeated Labeling [Sheng et al., 2008]: querying all experts for labeling
[Figure: classification error]

10 # Oracle Queries vs. Accuracy
[Figure legend: first 10 iterations; next 40 iterations; next 100 iterations]

11 # Oracle queries to reach a target accuracy
[Figure annotations: "skew increases"; "better"]

12 Results on AMT Data with Human Annotators
- IEThresh reaches the best performance with similar effort to Repeated labeling
- The Repeated labeling baseline needs 840 queries in total to reach 0.95 accuracy
Dataset at http://nlpannotations.googlepages.com/ made available by [Snow et al., 2008]
[Figure panels: 5 annotators; 6 annotators]

13 Conclusions and Future Work
- Conclusions
  - IEThresh is effective in balancing the exploration vs. exploitation tradeoff
  - Early filtering of unreliable labelers boosts performance
  - Utilizing labeler accuracy estimates is more effective than asking all labelers or asking labelers at random
- Future Work
  - from consistent to time-variant labeler quality
  - label noise conditioned on the data instance
  - correlated labeling errors

14 THANK YOU!

17 Problem Setup Summary
- multiple noisy oracles (labelers)
- unknown labeling accuracy
- Goal:
  - estimate labeler accuracy (quality)
  - select highest quality labeler(s)
  - balance exploration vs. exploitation tradeoff

18 Interval Estimation Learning (IE) [L. P. Kaelbling]
Goal: find the action a* with the highest expected reward
1. Estimate the reward of each action/oracle
2. Choose the action a* with the highest upper confidence interval
3. Record the observed reward of a*
4. Repeat from step 1
- a* has high expected reward (exploitation) and/or large uncertainty in the reward (exploration)
- IE automatically trades off these two
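To make the exploitation/exploration remark concrete, one common way to write the upper confidence interval in step 2 is given below. This is a sketch under assumptions: the slide does not state the formula, and the symbols m(a), s(a), n_a, and the Student-t critical value are introduced here for illustration.

```latex
\[
  UI(a) \;=\; m(a) \;+\; t^{(n_a - 1)}_{\alpha/2}\,\frac{s(a)}{\sqrt{n_a}},
  \qquad
  a^{*} \;=\; \arg\max_{a}\, UI(a)
\]
```

Here m(a) and s(a) would be the sample mean and sample standard deviation of the n_a rewards observed so far for action a. Under this form, the first term is large when the expected reward is high (exploitation), and the second term is large when the action has few or highly variable observations (exploration), so a single argmax covers both cases.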

19 IE Learning Snapshot I
[Figure: actions (oracles, experts, etc.) plotted against expected reward]

20 Outline of IEThresh

21 Classification Error vs. # Oracle Queries
[Figure annotation: "skew increases"]

