Learning Shared Body Plans. Ian Endres, University of Illinois. Work with Derek Hoiem, Vivek Srikumar, and Ming-Wei Chang
How should we represent multiple related object categories?
Goal: detect, localize, and estimate the pose of a broad range of objects, including new ones
One option: independent detectors [figure labels: Basic-Level Categories (Cat Detector, Dog Detector), Broad Categories (4-Legged Animal Detector), Parts (Head Detector), …]
Our previous work: train separate detectors, joint spatial model (Farhadi, Endres, Hoiem 2010) [figure labels: Vehicle, Wheel, Animal, Leg, Head, Four-legged, Mammal, Can run, Can jump, Facing right, Moves on road]
Jointly trained multi-category models: Train part/category detectors to jointly predict object structure – Each detector only needs to perform well in the context defined by the others. Spatial model encodes likely part positions, number of parts, likely categories, etc. – Generalizes Felzenszwalb et al.: cross-category sharing, multiple parts with one model, variable size
Deformable Part Models (from Felzenszwalb et al.)
Detection with Deformable Part Models (from Felzenszwalb et al.)
Shared mixture of deformable parts: Body Plans. Include a body plan for background patches: no appearance models, just a bias
Body Plan Overview [figure labels: Object Center, Head Anchors, High Scoring Detections]
Anchor Point Score: $S_a = \text{bias} + \text{appearance score} - \text{deformation cost}$. Appearance score: HOG-based deformable part model (Felzenszwalb et al.). Deformation cost: quadratic penalty in position and scale. The overall score must be greater than 0 to be detected
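The anchor point score on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Anchor` fields, the per-axis deformation weights, and the use of log-scale are all assumptions made for the example, and the appearance score is taken as a given number rather than computed from HOG features.

```python
from dataclasses import dataclass


@dataclass
class Anchor:
    """One anchor point of a body plan (all field names are hypothetical)."""
    bias: float
    x: float              # expected x offset from the object center
    y: float              # expected y offset from the object center
    log_scale: float      # expected log-scale relative to the object
    wx: float = 1.0       # assumed quadratic deformation weights
    wy: float = 1.0
    ws: float = 1.0


def deformation_cost(anchor, x, y, log_scale):
    """Quadratic penalty in position and scale."""
    dx = x - anchor.x
    dy = y - anchor.y
    ds = log_scale - anchor.log_scale
    return anchor.wx * dx ** 2 + anchor.wy * dy ** 2 + anchor.ws * ds ** 2


def anchor_score(anchor, appearance_score, x, y, log_scale):
    """S_a = bias + appearance score - deformation cost."""
    return anchor.bias + appearance_score - deformation_cost(anchor, x, y, log_scale)


def is_detected(score):
    """A part is detected only if its overall score is greater than 0."""
    return score > 0
```

With a negative bias, a part placed far from its anchor needs a stronger appearance response to be detected, which is the trade-off the slide describes.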
Inference: Head [animation frames: ✓ marks on selected detections]
Inference: Leg. Search constraints: count, pairwise exclusion [animation frames: ✓ marks on selected detections]
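The two search constraints named on the inference slides, count and pairwise exclusion, can be illustrated with a greedy selection loop. The greedy strategy itself is an assumption (the slides do not state the search procedure), and `select_detections`, `max_count`, and the `excludes` predicate are hypothetical names introduced for this sketch.

```python
def select_detections(candidates, max_count, excludes):
    """Greedily accept part detections for one anchor type.

    candidates: list of (score, location) pairs.
    max_count:  count constraint, the most detections that may be accepted.
    excludes:   pairwise exclusion predicate; excludes(a, b) is True when
                locations a and b may not both be selected.
    """
    accepted = []
    # Visit candidates from highest to lowest score.
    for score, location in sorted(candidates, key=lambda c: c[0], reverse=True):
        # Stop once scores are no longer positive or the count limit is hit.
        if score <= 0 or len(accepted) >= max_count:
            break
        # Skip a candidate that conflicts with an already accepted detection.
        if any(excludes(location, kept) for _, kept in accepted):
            continue
        accepted.append((score, location))
    return accepted
```

For example, with at most four legs and an exclusion predicate that rejects near-coincident locations, each accepted leg corresponds to one more ✓ in the animation.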
Inference Score for each body plan: Overall score for an object hypothesis:
Benefits of Joint Learning Only consider structures with:
Benefits of Joint Learning No structures have
(Latent) Max Margin Structured Learning [equation annotations: Highest Scoring Valid Structure, Invalid Structure, Loss, Soft margin slack]
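The annotations on this slide match the usual form of a latent max-margin structured objective. The following is that standard form written out as a sketch, not a transcription of the slide's equation: the symbols $w$, $\Phi$, $\Delta_i$, $\xi_i$, $C$, and $\mathcal{Y}^{\text{valid}}_i$ are assumptions introduced here.

```latex
\min_{w,\; \xi \ge 0} \quad \tfrac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i
\quad \text{s.t.} \quad
\underbrace{\max_{y \in \mathcal{Y}^{\text{valid}}_i} w^\top \Phi(x_i, y)}_{\text{highest scoring valid structure}}
\;\ge\;
\underbrace{w^\top \Phi(x_i, \hat{y})}_{\text{invalid structure}}
+ \underbrace{\Delta_i(\hat{y})}_{\text{loss}}
- \underbrace{\xi_i}_{\text{soft margin slack}}
\qquad \forall i,\ \forall \hat{y} \notin \mathcal{Y}^{\text{valid}}_i
```

The maximization on the left-hand side is what makes the problem latent: the best valid structure for each example depends on the current weights.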
Valid Structures. Positive examples: object detectors need 50% overlap with ground truth; part detectors need 25% overlap with ground truth. Negative examples: must select the BG (background) body plan. [figure labels: Leg, Head, Four-legged, Elk]
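The overlap criteria for valid structures can be sketched as below. The slide does not say how overlap is measured; intersection-over-union (the PASCAL convention) is assumed here, as is treating the thresholds as inclusive. Function and constant names are hypothetical.

```python
OBJECT_MIN_OVERLAP = 0.50  # object detectors: 50% overlap with ground truth
PART_MIN_OVERLAP = 0.25    # part detectors: 25% overlap with ground truth


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes (assumed measure)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_valid_detection(box, ground_truth_boxes, is_part):
    """True if the box overlaps some ground-truth box by at least the threshold."""
    threshold = PART_MIN_OVERLAP if is_part else OBJECT_MIN_OVERLAP
    return any(iou(box, gt) >= threshold for gt in ground_truth_boxes)
```

On a negative example the ground-truth list is empty, so no detection is valid and the only valid choice is the background body plan.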
Loss. Positive examples: false positives +1, duplicate detections +1, missed detections +1. Negative examples: non-BG body plan +1, false positives +1. [figure labels: Leg, Head, Four-legged, Elk]
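The loss terms on this slide amount to simple counting. The sketch below assumes each predicted detection has already been matched to a ground-truth box index (or to `None` when it matches nothing); that matching step and both function names are hypothetical.

```python
def positive_example_loss(matches, num_ground_truth):
    """Loss on a positive example.

    matches: for each predicted detection, the index of the ground-truth box
             it validly overlaps, or None if it overlaps none.
    """
    seen = set()
    loss = 0
    for gt_index in matches:
        if gt_index is None:
            loss += 1                      # false positive: +1
        elif gt_index in seen:
            loss += 1                      # duplicate detection: +1
        else:
            seen.add(gt_index)
    loss += num_ground_truth - len(seen)   # missed detections: +1 each
    return loss


def negative_example_loss(num_detections, selected_background):
    """Loss on a negative example."""
    loss = num_detections                  # every detection is a false positive: +1
    if not selected_background:
        loss += 1                          # non-BG body plan: +1
    return loss
```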
Optimization: Latent Structured SVM – Non-convex, so optimized with CCCP. Stochastic-gradient-descent-based cutting-plane optimization
Optimization Challenges: 1) Expensive search for violated constraints – Mine many violated constraints at once – Speeds convergence. 2) Large feature vectors (100k+) – Can’t store every mined violated constraint – Requires careful caching
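One possible caching policy for the second challenge is a bounded cache that keeps only the currently most violated constraints. This is an illustrative assumption, not the caching scheme used in the talk: the `ConstraintCache` class, the sparse-dictionary feature representation, and the eviction rule are all hypothetical.

```python
import heapq


class ConstraintCache:
    """Bounded store of mined constraints (hypothetical policy).

    Each constraint is (feature_difference, loss), where feature_difference
    is a sparse dict holding phi(valid structure) - phi(invalid structure).
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    @staticmethod
    def violation(w, constraint):
        """Margin violation: loss - w . feature_difference."""
        diff, loss = constraint
        return loss - sum(w.get(k, 0.0) * v for k, v in diff.items())

    def add_many(self, w, mined):
        """Add a batch of mined constraints, keeping only violated ones."""
        self.items.extend(c for c in mined if self.violation(w, c) > 0)
        if len(self.items) > self.capacity:
            # Evict the least violated constraints under the current weights.
            self.items = heapq.nlargest(
                self.capacity, self.items, key=lambda c: self.violation(w, c)
            )
```

Adding constraints in batches mirrors the first point on the slide (mine many violated constraints at once), while the capacity bound addresses the memory cost of 100k+ dimensional feature vectors.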
Experimental Setup. CORE (train + test) – Familiar categories: Camel, Dog, Elephant, Elk – Parts: Head, Leg, Torso – Unfamiliar categories: Cat, Cow. Pascal 2008 (test) – Unfamiliar categories: Cat, Cow, Horse, Sheep
Familiar Objects Unfamiliar Objects
Mistakes
Object-Level Results (AP)
Familiar four-legged parts (AP)
Unfamiliar four-legged parts (AP)
Mixed Supervision [figure: training examples with differing sets of labeled boxes (Leg, Head, Four-legged, Dog) combined as input to Learning]
Mixed Supervision - Learning: Unlabeled boxes become latent variables – Compute most likely position – No loss for missed detections [equation annotations: Highest Scoring Valid Structure, Loss]
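The two rules on this slide can be sketched as follows. This is a minimal illustration under assumptions: "most likely position" is read as the highest-scoring candidate, and both function names and the `labeled` flag are hypothetical.

```python
def impute_latent_box(candidates):
    """Treat an unlabeled box as a latent variable.

    candidates: list of (score, box) pairs for the unlabeled part.
    Returns the highest-scoring box, or None if there are no candidates.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]


def missed_detection_loss(num_ground_truth, num_matched, labeled):
    """+1 per missed ground-truth box, but only for part types that were labeled."""
    if not labeled:
        return 0  # no loss for missed detections when boxes are unlabeled
    return num_ground_truth - num_matched
```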
Mixed Supervision … Mixed Results AP
Conclusions: Jointly representing related categories leads to better performance and better generalization to unfamiliar categories. Joint training is important to get the full benefit of the spatial model
Thanks