Beyond datasets: Learning in a fully-labeled real world
Thesis proposal
Alexander Sorokin
Research projects
Thesis
Motivation
Task: Amazon Mechanical Turk
A requester posts a task ("Task: Dog? Pay: $0.01") to the broker, www.mturk.com; workers see "Is this a dog? Yes / No", submit the answer ("Yes"), and receive the $0.01 reward.
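This broker round trip can be scripted. The talk predates boto3, so the client and sandbox endpoint below are a modern stand-in for whatever requester API was used at the time; a minimal sketch of posting the one-cent yes/no task:

```python
import boto3

# Sandbox endpoint, so experiments cost no real money. An assumption:
# the talk used the 2008-era MTurk API, not boto3.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

SCHEMA = "http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01"
QUESTION_XML = f"""<QuestionForm xmlns="{SCHEMA}/QuestionForm.xsd">
  <Question>
    <QuestionIdentifier>is_dog</QuestionIdentifier>
    <QuestionContent><Text>Is this a dog?</Text></QuestionContent>
    <AnswerSpecification>
      <SelectionAnswer><Selections>
        <Selection><SelectionIdentifier>yes</SelectionIdentifier><Text>Yes</Text></Selection>
        <Selection><SelectionIdentifier>no</SelectionIdentifier><Text>No</Text></Selection>
      </Selections></SelectionAnswer>
    </AnswerSpecification>
  </Question>
</QuestionForm>"""

hit = mturk.create_hit(
    Title="Is this a dog?",
    Description="Answer one yes/no question about an image.",
    Reward="0.01",                     # the $0.01 from the slide
    MaxAssignments=1,
    LifetimeInSeconds=24 * 3600,       # stay listed for a day
    AssignmentDurationInSeconds=300,   # 5 minutes per worker
    Question=QUESTION_XML,
)
print(hit["HIT"]["HITId"])
```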
Select examples
Joint work with Tamara and Alex Berg
http://vision.cs.uiuc.edu/annotation/data/simpleevaluation/html/horse.html
Click on landmarks ($0.01)
http://vision-app1.cs.uiuc.edu:8080/mt/results/people14-batch11/p7/
Outline something ($0.01)
http://vision.cs.uiuc.edu/annotation/results/production-3-2/results_page_013.html
Data from Ramanan, NIPS 2006
Mark object attributes ($0.03)
Teach a robot
How do we define the task?
Annotation specification
Annotation language
Ideal task properties
How good are the annotations?

Submission is   Volume   Action         Redo
Empty           6%       Reject         yes
Clearly bad     2%       Reject         yes
Almost good     4%       Accept (pay)   yes
Good            88%      Accept (pay)   no

Task: label people, box + 14 points. Volume: 3078 HITs.
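A quick implication of these rates, as a back-of-the-envelope sketch: if redone HITs are reposted until a good submission arrives, the posting count per finished annotation is geometric. The $0.01 reward is carried over from the earlier slides, as an assumption:

```python
# Rates from the table above.
rates = {"empty": 0.06, "clearly_bad": 0.02, "almost_good": 0.04, "good": 0.88}
reward = 0.01  # assumed: the $0.01 pay from the earlier task slides

paid = rates["almost_good"] + rates["good"]   # 92% of postings are paid
postings_per_annotation = 1 / rates["good"]   # expected postings (geometric)
cost_per_annotation = reward * paid * postings_per_annotation

print(f"{postings_per_annotation:.2f} postings and "
      f"${cost_per_annotation:.4f} per finished annotation")
# -> 1.14 postings and $0.0105 per finished annotation
```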
How do we make it better?
1. Average N annotations
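The slide does not say how the N annotations are combined; one concrete possibility, sketched below, is a per-landmark median, which resists a single careless worker better than the mean would:

```python
import numpy as np

def combine_annotations(worker_points):
    """Combine N workers' landmark annotations for one image.

    worker_points: list of (K, 2) arrays, one per worker, holding the
    (x, y) clicks for the same K landmarks in a fixed order.
    """
    stacked = np.stack(worker_points)   # shape (N, K, 2)
    return np.median(stacked, axis=0)   # robust per-landmark consensus

# Three workers, K = 2 landmarks; the third worker misclicks landmark 0.
a = np.array([[10.0, 20.0], [30.0, 40.0]])
b = np.array([[11.0, 19.0], [29.0, 41.0]])
c = np.array([[55.0, 90.0], [30.0, 40.0]])
print(combine_annotations([a, b, c]))   # stays near the majority clicks
```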
2. Require qualification
Please read the detailed instructions to learn how to perform the task. Please confirm that you understand the instructions by answering the following questions.
Which of the following checkboxes are correct for this annotation?
– No people (there are people in the image)
– > 20 people (there are fewer than 20 people of appropriate size)
– Small heads (there are unmarked small heads in the image)
Task: put a box around every head
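Such a quiz maps onto MTurk's qualification-test mechanism. A sketch with today's boto3 client (again a stand-in for the API of the time); the single quiz question condenses the three checkboxes above, and the 100-point passing score is an assumption:

```python
import boto3

mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

SCHEMA = "http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01"
TEST_XML = f"""<QuestionForm xmlns="{SCHEMA}/QuestionForm.xsd">
  <Question>
    <QuestionIdentifier>small_heads</QuestionIdentifier>
    <QuestionContent><Text>Should small heads also get a box?</Text></QuestionContent>
    <AnswerSpecification>
      <SelectionAnswer><Selections>
        <Selection><SelectionIdentifier>yes</SelectionIdentifier><Text>Yes</Text></Selection>
        <Selection><SelectionIdentifier>no</SelectionIdentifier><Text>No</Text></Selection>
      </Selections></SelectionAnswer>
    </AnswerSpecification>
  </Question>
</QuestionForm>"""
ANSWER_KEY_XML = f"""<AnswerKey xmlns="{SCHEMA}/AnswerKey.xsd">
  <Question>
    <QuestionIdentifier>small_heads</QuestionIdentifier>
    <AnswerOption>
      <SelectionIdentifier>yes</SelectionIdentifier><AnswerScore>100</AnswerScore>
    </AnswerOption>
  </Question>
</AnswerKey>"""

qual = mturk.create_qualification_type(
    Name="Head-boxing instructions quiz",
    Description="Confirms the worker has read the labeling instructions.",
    QualificationTypeStatus="Active",
    Test=TEST_XML,
    AnswerKey=ANSWER_KEY_XML,
    TestDurationInSeconds=600,
)

# Only workers who passed the quiz may accept the labeling HITs:
requirement = {
    "QualificationTypeId": qual["QualificationType"]["QualificationTypeId"],
    "Comparator": "EqualTo",
    "IntegerValues": [100],
}
# mturk.create_hit(..., QualificationRequirements=[requirement])
```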
3. Use task pipeline
4. Do grading
Grade conflicts
Total grades: 4410
5. Automatic grading
Learning to grade

Task       Bottles   People   Hands   Large objects
Accuracy   95.0%     83.8%    45.5%   29.5%
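The slides report per-task accuracy but not the grader itself; the sketch below frames automatic grading as binary classification over hand-designed submission features. Every feature, the synthetic data, and the thresholds are assumptions, not the thesis's method:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-submission features: box count, mean box area,
# agreement with other workers, time spent. y: 1 = accept, 0 = reject.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))          # stand-in for real features
y = (X[:, 2] > 0).astype(int)          # stand-in for human grades

grader = LogisticRegression(max_iter=1000).fit(X, y)

# Auto-decide confident cases; route borderline ones to a human grader.
scores = grader.predict_proba(X)[:, 1]
needs_human = (scores > 0.2) & (scores < 0.8)
print(f"{needs_human.mean():.0%} of submissions routed to a human")
```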
Quality control
Setting the pay
Annotation Method Comparison

Approach        Cost   Scale   Setup effort   Collaborative   Quality   Directed   Central   Elastic to $
MTurk           $      +++     *              no              +/+++     Yes        no        +++++
GWAP                   ++++    ***            no              +                    Yes       +
LabelMe                ++                     Yes             ++        no         Yes
Image Parsing   $$     ++      **             no              ++++                 Yes       +++
In house        $$$    +       *              no              +++       Yes        no        ++
Is it useful?
Publications
Thesis
Fully labeled world assumption
Goal: learn to detect every object
Why is it important?
Computer vision task
Challenges
Lighting conditions
Background clutter
Lighting and background are known
Within-class variability
Viewpoint changes
Internal deformations
100 000 categories
How many instances? 10s of billions total, 10 000 locally
1000 examples per category
1-10 labels per object
Single image
Rich sensor data
PR2 Sensing capabilities
Autonomous data collection
Data labeling
Learning
Preliminary learning results
UChicago-VOC2008-person
Expected outcome
Thesis
Detect-Sample-Label
Sampling-based estimation
Standard deviation table
Estimating recall
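The slides name the estimator but not its form; a minimal sketch, assuming recall is estimated by exhaustively labeling a random sample of scenes and counting how many of the true objects the detector fired on. The binomial standard deviation below is the quantity a "standard deviation table" would tabulate:

```python
import math

def estimate_recall(n_sampled, n_detected):
    """Estimate detector recall from a labeled random sample.

    n_sampled: true object instances found by human labeling of
    randomly chosen scenes; n_detected: how many the detector found.
    """
    p = n_detected / n_sampled
    # Binomial standard deviation of the estimate, assuming the
    # sampled instances are independent draws.
    sd = math.sqrt(p * (1 - p) / n_sampled)
    return p, sd

print(estimate_recall(100, 62))   # -> (0.62, 0.0485...): recall 0.62 +/- 0.05
```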
Experimental results
What are the errors?
Timeline
Acknowledgments
Special thanks to:
David Forsyth
Nicolas Loeff, Ali Farhadi, Du Tran, Ian Endres
Tamara Berg, Pushmeet Kohli
Dolores Labs (Lukas Biewald)
Willow Garage (Gary Bradski, Alex Teichman, Daniel Munos, …)
All workers at Amazon Mechanical Turk
This work was supported in part by the National Science Foundation under IIS-0534837 and in part by the Office of Naval Research under N00014-01-1-0890 as part of the MURI program. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the National Science Foundation or the Office of Naval Research.
Thank you
What is an annotation task?
PR2 Platform
2 laser scanners: fixed and tilting
7 cameras: 2 stereo pairs, 1 hi-res (5 Mpx), 2 in the arms
Structured light
16 cores, 48 GB RAM
2 arms
What are datasets good for?
Training – the data is fully labeled
Evaluation
Tweaking the parameters – performance is computed automatically
Comparing algorithms – “They run on exact same data”
Why are datasets bad?
Data sampling and labeling bias
Small changes in performance are insignificant
Parameter tweaking doesn’t generalize
Overfitting to the datasets
Datasets should be discarded after performance is measured