Learning Shared Body Plans Ian Endres University of Illinois work with Derek Hoiem, Vivek Srikumar and Ming-Wei Chang.

Presentation transcript:

Learning Shared Body Plans Ian Endres University of Illinois work with Derek Hoiem, Vivek Srikumar and Ming-Wei Chang

How should we represent multiple related object categories?

We want to detect, localize, and estimate the pose of a broad range of objects, including new ones

One option: independent detectors. Basic-Level Categories: Cat Detector, Dog Detector. Broad Categories: 4-Legged Animal Detector. Parts: … Head Detector

Our previous work: train separate detectors, joint spatial model. Farhadi, Endres, Hoiem (2010). Figure labels: Vehicle, Wheel, Animal, Leg, Head, Four-legged, Mammal, Can run, Can jump, Facing right, Moves on road

Jointly trained multi-category models. Train part/category detectors to jointly predict object structure – Only need to perform well in context defined by others. Spatial model encodes likely part positions, number of parts, likely categories, etc. – Generalizes Felzenszwalb et al.: cross-category sharing, multiple parts with one model, variable size

Deformable Part Models From Felzenszwalb et al.

Detection with Deformable Part Models From Felzenszwalb et al.

Shared mixture of deformable parts: Body Plans. Include a body plan for background patches: no appearance models, just a bias

Body Plan Overview (figure labels: Object Center, Head Anchors, High Scoring Detections)

Anchor Point Score: $S_a = \text{bias} + \text{appearance score} - \text{deformation cost}$. Appearance score: HOG-based deformable part model (Felzenszwalb et al.). Deformation cost: quadratic penalty in position and scale. Overall score must be greater than 0 to be detected
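The anchor score on this slide can be sketched in a few lines. This is only an illustration: the `Anchor` fields, the separate weights `w_pos` and `w_scale`, and the use of a plain scale difference are assumptions, not details from the talk. The appearance score is taken as a given number (in the talk it comes from a HOG-based part filter).

```python
from dataclasses import dataclass


@dataclass
class Anchor:
    """Hypothetical anchor parameters; names are illustrative only."""
    bias: float
    x: float        # expected x position
    y: float        # expected y position
    scale: float    # expected scale
    w_pos: float    # weight of the quadratic position penalty
    w_scale: float  # weight of the quadratic scale penalty


def anchor_score(anchor, appearance_score, x, y, scale):
    """S_a = bias + appearance score - deformation cost."""
    dx, dy, ds = x - anchor.x, y - anchor.y, scale - anchor.scale
    # Quadratic penalty in position and scale.
    deformation = anchor.w_pos * (dx * dx + dy * dy) + anchor.w_scale * ds * ds
    return anchor.bias + appearance_score - deformation


def is_detected(score):
    """A placement counts as a detection only if its score exceeds 0."""
    return score > 0.0
```

For example, with `Anchor(bias=-1.0, x=10, y=20, scale=1.0, w_pos=0.1, w_scale=0.5)`, an appearance score of 3.0 placed two pixels off in x still scores above 0, while the same part placed ten pixels off does not.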

Inference: Head

Inference: Leg. Search constraints: count, pairwise exclusion
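One way to picture a search under count and pairwise-exclusion constraints is the sketch below. The greedy order, the `max_count` parameter, and the `excludes` predicate are assumptions made for illustration; the transcript does not say how the search is actually carried out.

```python
def select_detections(candidates, max_count, excludes):
    """Greedily keep high-scoring candidates for one part type.

    candidates: list of (score, box) pairs.
    max_count:  count constraint, the most detections allowed.
    excludes:   hypothetical predicate, True if two boxes cannot coexist.
    """
    selected = []
    for score, box in sorted(candidates, key=lambda c: c[0], reverse=True):
        if score <= 0 or len(selected) >= max_count:
            break  # remaining candidates score lower, or the count is reached
        if any(excludes(box, kept) for _, kept in selected):
            continue  # pairwise exclusion with an already kept detection
        selected.append((score, box))
    return selected
```

Here a candidate is accepted only if its score is above 0, the count limit is not yet reached, and it does not conflict with any detection kept so far.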

Inference Score for each body plan: Overall score for an object hypothesis:

Benefits of Joint Learning Only consider structures with:

Benefits of Joint Learning No structures have

(Latent) Max Margin Structured Learning (equation labels: Highest Scoring Valid Structure, Invalid Structure, Loss, Soft margin slack)
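The labels on this slide match the usual margin-rescaled latent structural SVM constraint. The form below is a sketch of that standard formulation, not the slide's own equation, which did not survive in the transcript. All symbols are assumptions: $w$ is the weight vector, $\Phi$ the joint feature map, $\mathcal{V}_i$ the set of valid structures for example $i$, $\Delta_i$ the loss, $\xi_i$ the slack, and $C$ the regularization trade-off.

```latex
\min_{w,\ \xi \ge 0} \ \frac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i
\quad \text{s.t.} \quad
\underbrace{\max_{s \in \mathcal{V}_i} w^\top \Phi(x_i, s)}_{\text{highest scoring valid structure}}
\ \ge\
\underbrace{w^\top \Phi(x_i, \hat{s})}_{\text{invalid structure}}
+ \underbrace{\Delta_i(\hat{s})}_{\text{loss}}
- \underbrace{\xi_i}_{\text{soft margin slack}}
\qquad \forall i,\ \forall \hat{s} \notin \mathcal{V}_i
```

The max on the left-hand side is what makes the problem latent and non-convex: the best valid structure for each example depends on the current $w$.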

Valid Structures. Positive examples: object detectors need 50% overlap with ground truth; part detectors need 25% overlap with ground truth. Negative examples: must select BG body plan. (Figure labels: Leg, Head, Four-legged, Elk)
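The overlap thresholds can be illustrated with a small check. The sketch assumes that "overlap" means intersection over union of axis-aligned boxes, as in the PASCAL VOC criterion, and that meeting the threshold exactly counts as valid; the transcript states neither. The function and dictionary names are illustrative.

```python
# Thresholds from the slide: 50% for object detectors, 25% for part detectors.
MIN_OVERLAP = {"object": 0.50, "part": 0.25}


def overlap(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_valid_match(kind, detection, ground_truth):
    """True if the detection overlaps the ground truth enough for its kind."""
    return overlap(detection, ground_truth) >= MIN_OVERLAP[kind]
```

A box covering 30% of the union with its ground truth would therefore be a valid part match but not a valid object match.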

Loss. Positive examples: false positives: +1; duplicate detections: +1; missed detections: +1. Negative examples: non-BG body plan: +1; false positives: +1. (Figure labels: Leg, Head, Four-legged, Elk)
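The loss on this slide is a count of errors, which can be sketched as follows. The input encoding is an assumption made for illustration: each detection is represented by the index of the ground-truth box it matches, or `None` if it matches nothing. Treating every detection on a negative example as a false positive is also an assumption.

```python
from collections import Counter


def positive_loss(matches, num_ground_truth):
    """Loss on a positive example.

    matches[i] is the index of the ground-truth box matched by detection i,
    or None when the detection matches no ground truth.
    """
    false_positives = sum(m is None for m in matches)
    hits = Counter(m for m in matches if m is not None)
    duplicates = sum(count - 1 for count in hits.values())
    missed = num_ground_truth - len(hits)
    return false_positives + duplicates + missed


def negative_loss(selected_background_plan, num_detections):
    """Loss on a negative example: +1 for a non-BG body plan, +1 per detection."""
    return (0 if selected_background_plan else 1) + num_detections
```

With four annotated boxes and detections matched as `[0, 0, None, 2]`, the positive loss is 4: one false positive, one duplicate, and two missed boxes.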

Optimization. Latent Structured SVM – Non-convex: CCCP (concave-convex procedure). Stochastic gradient descent based cutting plane optimization

Optimization Challenges. 1) Expensive search for violated constraints – Mine many violated constraints at once – Speeds convergence. 2) Large feature vectors (100k+) – Can’t store every mined violated constraint – Requires careful caching

Experimental Setup. CORE: Train + Test – Familiar Categories: Camel, Dog, Elephant, Elk – Parts: Head, Leg, Torso – Unfamiliar Categories: Cat, Cow. Pascal 2008: Test – Unfamiliar Categories: Cat, Cow, Horse, Sheep

Familiar Objects Unfamiliar Objects

Mistakes

Object Level Results AP

Familiar four-legged parts AP

Unfamiliar four-legged parts AP

Mixed Supervision (figure labels: Leg, Head, Four-legged, Dog, Learning)

Mixed Supervision - Learning. Unlabeled boxes become latent variables – Compute most likely position – No loss for missed detections. (Equation labels: Highest Scoring Valid Structure, Loss)

Mixed Supervision … Mixed Results AP

Conclusions. Jointly representing related categories leads to better performance and generalization to unfamiliar categories. Joint training is important to get the full benefit of the spatial model

Thanks