Recognition Using Visual Phrases

1 Recognition Using Visual Phrases
CVPR 2011 Best Student Paper

2 Outline
Introduction; Related Work; Approach (Phrasal Recognition, Decoding Multiple Detections); Results; Discussion

3 Introduction

4 Introduction
Visual phrases. Traditional approach: detect objects (person, dog, horse…); model the relation between objects; apply NMS (non-maximum suppression). PASCAL, other. Disadvantage.

5 Introduction

6 Introduction
Contributions: introducing visual phrases as categories for recognition; introducing a novel dataset for phrasal recognition; showing a gain over state-of-the-art methods of modeling interactions; a decoding algorithm; performance results in multi-class object recognition.

7 Related Work
Object recognition: deformable templates [IEEE2001, CVPR1998]; part-based models [CVPR2005, CVPR2003]; detectors; deformable part-based model [IEEE2010].

8 Related Work
Object interactions: focus on relations [ECCV2008]; person with object [CVPR2010]; objects [ECCV2010]; relations of objects [ICCV2010]. Spatial relations: left, right, top, down. Label weight, confidence.

9 Related Work
Scene understanding: represent scenes with global features that take into account general information about images [Vision2001, CVPR2006]; cluster [ECCV2008].

10 Related Work
Machine translation: statistical translation methods [Press2010] combine a translation model, a language model, and a decoding algorithm. Output: a query sentence. They allow many-to-many translation.

11 Phrasal Recognition
Phrasal recognition dataset: select 8 object classes (Pascal VOC 2008): person, bike, car, dog, horse, bottle, sofa, chair. A list of 17 visual phrases plus a background class: dog jumping, horse jumping, person riding horse…

12 Phrasal Recognition

13 Phrasal Recognition
Dataset: 2769 images (822 negative images); an average of 120 examples per class; 5067 bounding boxes (1796 phrases, 3271 objects). As the complexity of visual phrases increases, the number of training examples decreases.

14 Phrasal Recognition
Appearance models: deformable part models. Models for the 17 phrases in our dataset are trained using the provided bounding boxes; models for the 8 categories from Pascal are used for objects.

15 Decoding Multiple Detections
NMS decoding: adequate for perfect detectors with excellent, tightly tuned models. A natural decoding strategy that does better than NMS on interactions: greedily search the space of labels, with well-designed features (nearby detections). Decoding: all detector responses → final outcome.
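For reference, the NMS baseline mentioned on this slide can be sketched as the usual greedy procedure. This is a minimal illustration, not the presenters' code: the `(x1, y1, x2, y2)` box format, the `iou` helper, and the 0.5 threshold are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

NMS looks only at overlap and score within one detector's output, which is why it cannot exploit interactions between categories.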

16 Decoding Multiple Detections
Decoding process: we compare our decoding algorithm with that of [2] on our phrase dataset.
Step 1: construct the features.
Step 2: run the algorithm to learn a set of weights that rescore the confidences of the bounding boxes based on interactions.
Step 3: rescore again until optimal.

17 Discriminative models for multi-class object layout

18 Decoding Multiple Detections
$x_i$: a bounding box in an image. An image is represented as a collection of overlapping bounding boxes $X = \{x_i : i = 1, \dots, M\}$, where $M$ is the total number of bounding boxes and $K$ is the number of different categories. Each box has a label $y_i \in \{0, 1\}$. Equation (1) is the score of image $X$ with labeling $Y$; $w_{c_i}$ is the set of weights that corresponds to the class of bounding box $i$.

19 Decoding Multiple Detections
Representation: image = bounding boxes. Features: confidence, overlap, size ratio. Relations: above, below, overlapping. Indexed by window, category, and spatial bin. The representation has K*3*3+1 dimensions.
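One way to read the K*3*3+1 layout is: for each of K categories and each of 3 spatial bins (above, below, overlapping), store 3 numbers (confidence, overlap, size ratio) for the strongest nearby detection, plus 1 slot for the box's own confidence. The sketch below follows that reading. The `Detection` type, the binning rule in `spatial_bin`, and the choice of the highest-confidence neighbor are assumptions made for illustration; the slide does not specify them.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), y grows downward


class Detection(NamedTuple):  # hypothetical container, not from the slides
    box: Box
    category: int      # index in [0, K)
    confidence: float


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def spatial_bin(ref: Box, other: Box) -> int:
    """0 = above, 1 = below, 2 = overlapping (assumed binning rule)."""
    if iou(ref, other) > 0:
        return 2
    ref_cy, other_cy = (ref[1] + ref[3]) / 2, (other[1] + other[3]) / 2
    return 0 if other_cy < ref_cy else 1


def box_feature(i: int, detections: List[Detection], num_categories: int) -> List[float]:
    """Feature of detection i with K*3*3+1 entries."""
    ref = detections[i]
    feat = [0.0] * (num_categories * 3 * 3 + 1)
    best = {}  # (category, bin) -> best confidence seen so far
    for j, det in enumerate(detections):
        if j == i:
            continue
        b = spatial_bin(ref.box, det.box)
        key = (det.category, b)
        if key not in best or det.confidence > best[key]:
            best[key] = det.confidence
            base = (det.category * 3 + b) * 3
            feat[base] = det.confidence
            feat[base + 1] = iou(ref.box, det.box)
            feat[base + 2] = area(det.box) / area(ref.box) if area(ref.box) > 0 else 0.0
    feat[-1] = ref.confidence  # the box's own detector confidence
    return feat
```

If K counted all 8 object detectors and 17 phrase detectors (K = 25), this layout would give 25*3*3+1 = 226 dimensions per box.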

20 Decoding Multiple Detections
Inference: assume the bounding boxes are independent given their features, so the score (1) can be maximized one box at a time.
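Under the independence assumption, decoding reduces to a per-box decision. The sketch below assumes a linear score of the form S(X, Y) = sum_i y_i * (w_{c_i} . x_i); the transcript lost the actual equation, so this form, and the names `decode` and `score`, are illustrative assumptions.

```python
from typing import List, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def score(features: List[List[float]], categories: List[int],
          weights: List[List[float]], labels: List[int]) -> float:
    """Assumed score: sum over boxes of y_i * (w_{c_i} . x_i)."""
    return sum(y * dot(weights[c], x)
               for x, c, y in zip(features, categories, labels))


def decode(features: List[List[float]], categories: List[int],
           weights: List[List[float]]) -> List[int]:
    """Each term depends on one box only, so the maximizing labeling keeps
    a box (y_i = 1) exactly when its class-specific score is positive."""
    return [1 if dot(weights[c], x) > 0 else 0
            for x, c in zip(features, categories)]
```

Because every term involves a single box, the maximization costs one dot product per detection and needs no search over joint labelings.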

21 Decoding Multiple Detections
Learning: a form of max-margin structured learning.

22 Decoding Multiple Detections
Our inner maximization is exact and very fast. We solve this optimization problem with the subgradient descent method.
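A subgradient step for this kind of max-margin problem can be sketched as follows. The slide's objective did not survive transcription, so this sketch assumes a standard structured hinge loss with a Hamming loss over box labels, an L2 regularizer, and the linear per-box score; `train`, `lam`, `lr`, and `epochs` are hypothetical names and settings.

```python
from typing import List, Sequence, Tuple

# One training image: per-box features, per-box category index, per-box 0/1 label.
Image = Tuple[List[List[float]], List[int], List[int]]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def train(images: List[Image], num_categories: int, dim: int,
          lam: float = 0.01, lr: float = 0.1, epochs: int = 50) -> List[List[float]]:
    """Subgradient descent on an assumed structured hinge objective."""
    weights = [[0.0] * dim for _ in range(num_categories)]
    for _ in range(epochs):
        for features, categories, labels in images:
            # Gradient of the regularizer (lam / 2) * ||w||^2.
            grads = [[lam * v for v in w_c] for w_c in weights]
            for x, c, y_true in zip(features, categories, labels):
                s = dot(weights[c], x)
                # Inner maximization (loss-augmented inference), exact per box:
                # y = 1 is worth s + [y_true == 0]; y = 0 is worth [y_true == 1].
                y_hat = 1 if s + (1 - y_true) > y_true else 0
                diff = y_hat - y_true
                if diff:
                    for k in range(dim):
                        grads[c][k] += diff * x[k]
            for c in range(num_categories):
                for k in range(dim):
                    weights[c][k] -= lr * grads[c][k]
    return weights
```

With a loss that decomposes over boxes, the inner maximization is a two-way comparison per box, which is consistent with the slide's remark that it is exact and very fast.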

23 Result
Single-category detection: deformable part models for the 17 visual phrases; existing trained models for objects. Use PASCAL dataset: 50 positive and 150 negative examples. Show precision-recall (PR) curves. These detectors were trained with at most 50 positive examples.
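Precision-recall curves of the kind shown on the following slides can be computed from ranked detections roughly as below. This is a generic sketch: it uses plain non-interpolated average precision, which may differ from the PASCAL protocol used in the paper, and the function names and the `num_ground_truth` parameter are assumptions.

```python
from typing import List, Tuple


def precision_recall(scores: List[float], is_true_positive: List[bool],
                     num_ground_truth: int) -> Tuple[List[float], List[float]]:
    """Precision and recall after each detection, in descending score order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    precisions, recalls = [], []
    tp = 0
    for rank, i in enumerate(order, start=1):
        if is_true_positive[i]:
            tp += 1
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)
    return precisions, recalls


def average_precision(scores: List[float], is_true_positive: List[bool],
                      num_ground_truth: int) -> float:
    """Non-interpolated AP: precision averaged over the ranks of true positives.
    Ground-truth instances that are never detected contribute zero."""
    if num_ground_truth == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    precisions, _ = precision_recall(scores, is_true_positive, num_ground_truth)
    hits = [precisions[r] for r, i in enumerate(order) if is_true_positive[i]]
    return sum(hits) / num_ground_truth
```

Whether a detection counts as a true positive is decided beforehand, typically by its overlap with a ground-truth box.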

24 Result

25 Result

26 Result

27 Result

28 Result
Decoding
                     Paper decoding   [2]     NMS
Overall AP           0.319            0.313   0.308
Mean per-class AP    0.495            0.493   0.491
[2] C. Desai, D. Ramanan, C. Fowlkes. Discriminative models for multi-class object layout. In ICCV, 2009.

29 Result

30 Result

31 Discussion
Summary: introduced visual phrases and a phrasal recognition dataset; a decoding algorithm. The dimensionality of our features grows with the number of categories.
Future work: the relations between attributes and objects, parts and objects, objects and visual phrases, and visual phrases and scenes mirror one another.

32 Discussion
Experience: low complexity; detection needs less data. The feature dimensionality grows with the number of categories (exponential, 2^n), but we do not need to consider all of the categories when we model the interactions. Building long enough phrase tables is still a challenge.
