1
Programme
2pm Introduction
–Andrew Zisserman, Chris Williams
2.10pm Overview of the challenge and results
–Mark Everingham (Oxford)
2.40pm Session 1: The Classification Task
–Frederic Jurie presenting work by Jianguo Zhang (INRIA) 20 mins
–Frederic Jurie (INRIA) 20 mins
–Thomas Deselaers (Aachen) 20 mins
–Jason Farquhar (Southampton) 20 mins
4-4.30pm Coffee break
4.30pm Session 2: The Detection Task
–Stefan Duffner/Christophe Garcia (France Telecom) 30 mins
–Mario Fritz (Darmstadt) 30 mins
5.30pm Discussion
–Lessons learnt, and future challenges
2
The PASCAL Visual Object Classes Challenge
Mark Everingham, Luc Van Gool, Chris Williams, Andrew Zisserman
3
Challenge
Four object classes
–Motorbikes
–Bicycles
–People
–Cars
Classification
–Predict object present/absent
Detection
–Predict bounding boxes of objects
4
Competitions
Train on any (non-test) data
–How well do state-of-the-art methods perform on these problems?
–Which methods perform best?
Train on supplied data
–Which methods perform best given specified training data?
5
Data sets
train, val, test1
–Sampled from the same distribution of images
–Images taken from PASCAL image databases
–“Easier” challenge
test2
–Freshly collected for the challenge (mostly Google Images)
–“Harder” challenge
6
Training and first test set
             train+val           test1
Class        Images  Objects     Images  Objects
Motorbikes   214     217         216     220
Bicycles     114     123         114     123
People       84      152         84      149
Cars         272     320         275     341
Total        684                 689
7
Example images
11
Second test set (test2)
Class        Images  Objects
Motorbikes   202     227
Bicycles     279     399
People       526     1038
Cars         275     381
Total        1282
12
Example images
16
Annotation for training
Object class present/absent
Sub-class labels (partial)
–Car side, Car rear, etc.
Bounding boxes
Segmentation masks (partial)
17
Issues in ground truth
What objects should be considered detectable?
–Subjective judgement by size in image, level of occlusion, detection without ‘inference’
Disagreements will cause noise in evaluation, i.e. incorrectly-judged false positives
“Errors” in training data
–Un-annotated objects
  Requires machine learning algorithms robust to noise on class labels
–Inaccurate bounding boxes
  Hard to specify for some instances, e.g. bicycles
  Detection threshold was set “liberally”
18
Results: Classification
19
Participants
Table columns: test1 (Motorbikes, Bicycles, People, Cars), test2 (Motorbikes, Bicycles, People, Cars)
Table rows: Aachen, Darmstadt, Edinburgh, FranceTelecom, HUT, INRIA: Dalal, INRIA: Dorko, INRIA: Jurie, INRIA: Zhang, METU, MPITuebingen, Southampton
20
Methods
Interest points (LoG/Harris) + patches/SIFT
–Histogram of clustered descriptors
  SVM: INRIA: Dalal, INRIA: Zhang
  Log-linear model: Aachen
  Logistic regression: Edinburgh
  Other: METU
–No clustering step
  SVM with other kernels: MPITuebingen, Southampton
–Additional features
  Color: METU, moments: Southampton
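The "histogram of clustered descriptors" representation can be sketched as below. This is a minimal illustration, not any participant's implementation: the function names `nearest_cluster` and `bag_of_features` are hypothetical, the 2-D toy descriptors stand in for real patch or SIFT descriptors, and the codebook is assumed to have been obtained beforehand (for example by clustering training descriptors). The resulting histogram would then be passed to a classifier such as an SVM, which is not shown.

```python
def nearest_cluster(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def bag_of_features(descriptors, codebook):
    """Normalized histogram of cluster assignments for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_cluster(d, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy data (assumption): three cluster centres and four local descriptors.
codebook = [(0, 0), (10, 10), (0, 10)]
descriptors = [(1, 1), (9, 9), (0, 1), (1, 9)]
histogram = bag_of_features(descriptors, codebook)
print(histogram)
```

The histogram has a fixed length (the codebook size) regardless of how many interest points an image contains, which is what lets a standard classifier be trained on it.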
21
Methods
Image segmentation and region features: HUT
–MPEG-7 color, shape, etc.
–Self organizing map
Classification by detection: Darmstadt
–Generalized Hough transform/SVM verification
22
Evaluation
Receiver Operating Characteristic (ROC)
–Equal Error Rate (EER)
–Area Under Curve (AUC)
[Figure: ROC curve; labels: EER, AUC]
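As a rough sketch of how these two summary measures can be computed from per-image classifier confidences, the code below builds the ROC curve, integrates it with the trapezoidal rule, and reads off the equal-error point. The results slides report EER as a value where higher is better, so the sketch assumes EER is reported as the true positive rate at the point where the true positive rate equals one minus the false positive rate; that reading, the function names, and the toy scores are assumptions, and both classes are assumed to be present in the labels.

```python
def roc_curve(scores, labels):
    """ROC points (fpr, tpr), sweeping the threshold from high to low.

    scores: confidences, higher means more likely positive.
    labels: 1 for positive images, 0 for negative images.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Process tied scores together so a tie yields a single ROC point.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def eer_tpr(points):
    """True positive rate where the curve crosses the line tpr = 1 - fpr."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # g = tpr + fpr - 1 is non-decreasing along the curve; locate its zero.
        g0, g1 = y0 + x0 - 1.0, y1 + x1 - 1.0
        if g0 <= 0.0 <= g1:
            t = 0.0 if g1 == g0 else -g0 / (g1 - g0)
            return y0 + t * (y1 - y0)
    return None


# Toy example (assumption): six test images, three of which contain the class.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_curve(scores, labels)
print(auc(curve), eer_tpr(curve))
```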
23
Competition 1: train+val/test1 1.1: Motorbikes Max EER: 0.977 (INRIA: Jurie)
24
Competition 1: train+val/test1 1.2: Bicycles Max EER: 0.930 (INRIA: Jurie, INRIA: Zhang)
25
Competition 1: train+val/test1 1.3: People Max EER: 0.917 (INRIA: Jurie, INRIA: Zhang)
26
Competition 1: train+val/test1 1.4: Cars Max EER: 0.961 (INRIA: Jurie)
27
Competition 2: train+val/test2 2.1: Motorbikes Max EER: 0.798 (INRIA: Zhang)
28
Competition 2: train+val/test2 2.2: Bicycles Max EER: 0.728 (INRIA: Zhang)
29
Competition 2: train+val/test2 2.3: People Max EER: 0.719 (INRIA: Zhang)
30
Competition 2: train+val/test2 2.4: Cars Max EER: 0.720 (INRIA: Zhang)
31
Classes and test1 vs. test2 Mean EER of ‘best’ results across classes –test1 : 0.946, test2 : 0.741
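The two means can be reproduced from the per-class best EER values on the preceding results slides. The snippet below is only a worked check of that arithmetic; the dictionary layout and the helper name `mean_best` are illustrative.

```python
# Best EER per class, copied from the Competition 1 and Competition 2 slides.
best_eer = {
    "test1": {"Motorbikes": 0.977, "Bicycles": 0.930, "People": 0.917, "Cars": 0.961},
    "test2": {"Motorbikes": 0.798, "Bicycles": 0.728, "People": 0.719, "Cars": 0.720},
}


def mean_best(per_class):
    """Unweighted mean over the object classes."""
    return sum(per_class.values()) / len(per_class)


mean_eer = {name: round(mean_best(values), 3) for name, values in best_eer.items()}
print(mean_eer)
```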
32
Conclusions?
Interest points + SIFT + clustering (histogram) + SVM did ‘best’
–Log-linear model (Aachen) a close second
–Results with SVM (INRIA) significantly better than with logistic regression (Edinburgh)
Method using detection (Darmstadt) did not do so well
–Cannot exploit context (= unintended bias?) of image
–Used subset of training data and is able to localize
33
Competitions 3 & 4 Classification Any (non-test) training data to be used No entries submitted
34
Results: Detection
35
Participants
Table columns: test1 (Motorbikes, Bicycles, People, Cars), test2 (Motorbikes, Bicycles, People, Cars)
Table rows: Aachen, Darmstadt, Edinburgh, FranceTelecom, HUT, INRIA: Dalal, INRIA: Dorko, INRIA: Jurie, INRIA: Zhang, METU, MPITuebingen, Southampton
36
Methods
Generalized Hough Transform
–Interest points, clustered patches/descriptors, GHT
  Darmstadt: (SVM verification stage), side views with segmentation mask used for training
  INRIA: Dorko: SIFT features, semi-supervised clustering, single detection per image
“Sliding window” classifiers
–Exhaustive search over translation and scale
  FranceTelecom: Convolutional neural network
  INRIA: Dalal: SVM with SIFT-based input representation
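The "exhaustive search over translation and scale" behind sliding-window detectors can be sketched as a generator of candidate boxes. This is an illustrative sketch only: the function name `sliding_windows`, the stride, and the scale step are assumptions rather than settings used by any participant, and the per-window classifier (a neural network or SVM in the entries above) is left out.

```python
def sliding_windows(image_w, image_h, base_w, base_h, stride=8, scale_step=1.5):
    """Yield (x_min, y_min, x_max, y_max) windows over translation and scale.

    The window starts at base size and grows by scale_step until it no
    longer fits in the image; the stride grows with the window.
    """
    scale = 1.0
    while True:
        w = int(round(base_w * scale))
        h = int(round(base_h * scale))
        if w > image_w or h > image_h:
            break
        step = max(1, int(round(stride * scale)))
        for y in range(0, image_h - h + 1, step):
            for x in range(0, image_w - w + 1, step):
                yield (x, y, x + w, y + h)
        scale *= scale_step


# Toy example (assumption): a 64x64 image scanned with a 32x32 base window.
windows = list(sliding_windows(64, 64, 32, 32, stride=16, scale_step=2.0))
print(len(windows), windows[0], windows[-1])
```

A detector would score every yielded window with its classifier and report the windows whose confidence exceeds a threshold.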
37
Methods
Baselines: Edinburgh
–Detection confidence
  Class prior probability
  Whole-image classifier (SIFT + logistic regression)
–Bounding box
  Entire image
  Scale-normalized mean bounding box from training data
  Bounding box of all interest points
  Bounding box of interest points weighted by ‘class purity’
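Two of the baseline bounding-box predictions are simple enough to sketch directly. The function names and the toy interest-point coordinates below are hypothetical; the scale-normalized mean box and the 'class purity' weighting are not shown because the slide does not specify how they are computed.

```python
def entire_image_box(width, height):
    """Baseline: predict the whole image as the object's bounding box."""
    return (0, 0, width, height)


def interest_point_box(points):
    """Baseline: tightest box (x_min, y_min, x_max, y_max) around all interest points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Toy example (assumption): three detected interest points in a 100x80 image.
points = [(12, 40), (30, 22), (25, 35)]
whole = entire_image_box(100, 80)
box = interest_point_box(points)
print(whole, box)
```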
38
Evaluation
Correct detection: 50% overlap in bounding boxes
–Multiple detections considered as (one true +) false positives
Precision/Recall
–Average Precision (AP) as defined by TREC
  Mean precision interpolated at recall = 0,0.1,…,0.9,1
[Figure: precision/recall curve; labels: Measured, Interpolated]
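The detection measure can be sketched in three steps: score the overlap between boxes, mark each detection as a true or false positive in order of confidence, and average the interpolated precision at the eleven recall levels. Several details are assumptions here, since the slide does not spell them out: overlap is taken to be intersection area over union area, a detection must exceed the threshold strictly, and a single image is used where a real evaluation would pool detections over all test images. The function names are illustrative.

```python
def overlap(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def label_detections(detections, ground_truth, min_overlap=0.5):
    """True/false-positive flags for (score, box) detections, best score first.

    Each ground-truth box can be claimed once; later detections of the
    same object count as false positives.
    """
    matched = set()
    flags = []
    for _, box in sorted(detections, key=lambda d: -d[0]):
        best, best_ov = None, 0.0
        for i, gt in enumerate(ground_truth):
            ov = overlap(box, gt)
            if ov > best_ov:
                best, best_ov = i, ov
        is_tp = best is not None and best_ov > min_overlap and best not in matched
        if is_tp:
            matched.add(best)
        flags.append(is_tp)
    return flags


def average_precision(flags, n_objects):
    """Mean of interpolated precision at recall = 0, 0.1, ..., 1."""
    tp = 0
    curve = []
    for rank, is_tp in enumerate(flags, start=1):
        tp += is_tp
        curve.append((tp / n_objects, tp / rank))  # (recall, precision)
    total = 0.0
    for k in range(11):
        r = k / 10.0
        # Interpolated precision: best precision at any recall >= r.
        total += max((p for rec, p in curve if rec >= r - 1e-9), default=0.0)
    return total / 11.0


# Toy example (assumption): two annotated objects and four detections.
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)),
              (0.7, (50, 50, 60, 60)), (0.6, (20, 20, 30, 30))]
flags = label_detections(detections, ground_truth)
ap = average_precision(flags, len(ground_truth))
print(flags, ap)
```

In the toy example the second detection overlaps the first object well but is a duplicate, so it is counted as a false positive, matching the "(one true +) false positives" rule.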
39
Competition 5: train+val/test1 5.1: Motorbikes Max AP: 0.886 (Darmstadt)
40
Competition 5: train+val/test1 5.2: Bicycles Max AP: 0.119 (Edinburgh)
41
Competition 5: train+val/test1 5.3: People Max AP: 0.013 (INRIA: Dalal)
42
Competition 5: train+val/test1 5.4: Cars Max AP: 0.613 (INRIA: Dalal)
43
Competition 6: train+val/test2 6.1: Motorbikes Max AP: 0.341 (Darmstadt)
44
Competition 6: train+val/test2 6.2: Bicycles Max AP: 0.113 (Edinburgh)
45
Competition 6: train+val/test2 6.3: People Max AP: 0.021 (INRIA: Dalal)
46
Competition 6: train+val/test2 6.4: Cars Max AP: 0.304 (INRIA: Dalal)
47
Classes and test1 vs. test2 Mean AP of ‘best’ results across classes –test1 : 0.408, test2 : 0.195
48
Conclusions?
GHT (Darmstadt) method did ‘best’ on classes entered
–SVM verification stage effective
–Limited to lower recall (by use of only side views)
SVM (INRIA: Dalal) comparable for cars, better on test2
–Smaller objects?, higher recall
Performance on bicycles, people was ‘poor’
–“Non-solid” objects, articulation?
49
Competition 7: any train/test1
One entry: 7.3: People (INRIA: Dalal), AP: 0.416
Use of own training data improved results dramatically (AP with the supplied training data, competition 5.3: 0.013)
50
Competition 8: any train/test2
One entry: 8.3: People (INRIA: Dalal), AP: 0.438
Use of own training data improved results dramatically (AP with the supplied training data, competition 6.3: 0.021)
51
Conclusions
Classification
–Variety of methods and variations on SIFT+SVM
–Encouraging performance on all object classes
Detection
–Variety of methods and variations on GHT
–Encouraging performance on cars, motorbikes
  People and bicycles more challenging
Use of own training data
–Only one entry (people detection), much better results than using provided training data
–State-of-the-art performance for pre-built classification/detection remains to be assessed