Presenter: Duan Tran (Part of slides are from Pedro’s)

Slides:



Advertisements
Similar presentations
3.6 Support Vector Machines
Advertisements

Copyright © Cengage Learning. All rights reserved.
Multistage Sampling.
Constraint Satisfaction Problems
Advanced Piloting Cruise Plot.
Copyright © 2003 Pearson Education, Inc. Slide 1 Computer Systems Organization & Architecture Chapters 8-12 John D. Carpinelli.
Chapter 1 The Study of Body Function Image PowerPoint
1 Copyright © 2013 Elsevier Inc. All rights reserved. Appendix 01.
1 RA I Sub-Regional Training Seminar on CLIMAT&CLIMAT TEMP Reporting Casablanca, Morocco, 20 – 22 December 2005 Status of observing programmes in RA I.
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Exit a Customer Chapter 8. Exit a Customer 8-2 Objectives Perform exit summary process consisting of the following steps: Review service records Close.
Right vs. Left Brain. This theory of the structure and functions of the mind suggests that the two different sides of the brain control two different.
FACTORING ax2 + bx + c Think “unfoil” Work down, Show all steps.
Year 6 mental test 5 second questions
Year 6 mental test 10 second questions
Environmental Remote Sensing GEOG 2021
1 Hierarchical Part-Based Human Body Pose Estimation * Ramanan Navaratnam * Arasanathan Thayananthan Prof. Phil Torr * Prof. Roberto Cipolla * University.
REVIEW: Arthropod ID. 1. Name the subphylum. 2. Name the subphylum. 3. Name the order.
ABC Technology Project
Hash Tables.
© Paradigm Publishing, Inc Access 2010 Level 1 Unit 1Creating Tables and Queries Chapter 2Creating Relationships between Tables.
Chapter 15 Complex Numbers
Outline Minimum Spanning Tree Maximal Flow Algorithm LP formulation 1.
Green Eggs and Ham.
VOORBLAD.
1. Researcher: Jean-Remy Hochmann RA: Melody Wu 2.
Factor P 16 8(8-5ab) 4(d² + 4) 3rs(2r – s) 15cd(1 + 2cd) 8(4a² + 3b²)
1..
© 2012 National Heart Foundation of Australia. Slide 2.
Understanding Generalist Practice, 5e, Kirst-Ashman/Hull
Copyright © 2014 by Educational Testing Service. ETS, the ETS logo, LISTENING. LEARNING. LEADING. and GRE are registered trademarks of Educational Testing.
Model and Relationships 6 M 1 M M M M M M M M M M M M M M M M
25 seconds left…...
1 Using one or more of your senses to gather information.
Polynomial Functions of Higher Degree
Solving Systems of Linear Equations By Elimination
H to shape fully developed personality to shape fully developed personality for successful application in life for successful.
Januar MDMDFSSMDMDFSSS
Determining How Costs Behave
Chapter 10: The Traditional Approach to Design
Systems Analysis and Design in a Changing World, Fifth Edition
We will resume in: 25 Minutes.
18-Dec-14 Pruning. 2 Exponential growth How many leaves are there in a complete binary tree of depth N? This is easy to demonstrate: Count “going left”
©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.
Local Search Jim Little UBC CS 322 – CSP October 3, 2014 Textbook §4.8
PSSA Preparation.
Mani Srivastava UCLA - EE Department Room: 6731-H Boelter Hall Tel: WWW: Copyright 2003.
4/4/2015Slide 1 SOLVING THE PROBLEM A one-sample t-test of a population mean requires that the variable be quantitative. A one-sample test of a population.
Many slides based on P. FelzenszwalbP. Felzenszwalb General object detection with deformable part-based models.
More sliding window detection: Discriminative part-based models Many slides based on P. FelzenszwalbP. Felzenszwalb.
Lecture 17: Parts-based models and context CS6670: Computer Vision Noah Snavely.
Generic object detection with deformable part-based models
Marco Pedersoli, Jordi Gonzàlez, Xu Hu, and Xavier Roca
Object Detection with Discriminatively Trained Part Based Models
Lecture 31: Modern recognition CS4670 / 5670: Computer Vision Noah Snavely.
Deformable Part Model Presenter : Liu Changyu Advisor : Prof. Alex Hauptmann Interest : Multimedia Analysis April 11 st, 2013.
Deformable Part Models (DPM) Felzenswalb, Girshick, McAllester & Ramanan (2010) Slides drawn from a tutorial By R. Girshick AP 12% 27% 36% 45% 49% 2005.
Discussion of Pictorial Structures Pedro Felzenszwalb Daniel Huttenlocher Sicily Workshop September, 2006.
Pictorial Structures and Distance Transforms Computer Vision CS 543 / ECE 549 University of Illinois Ian Endres 03/31/11.
Object Recognizing. Object Classes Individual Recognition.
Object detection with deformable part-based models
Object Localization Goal: detect the location of an object within an image Fully supervised: Training data labeled with object category and ground truth.
“The Truth About Cats And Dogs”
Presentation transcript:

Presenter: Duan Tran (Part of slides are from Pedro’s) - Pictorial Structures for Object Recognition Pedro F. Felzenszwalb & Daniel P. Huttenlocher - A Discriminatively Trained, Multiscale, Deformable Part Model Pedro Felzenszwalb, David McAllester Deva Ramanan Presenter: Duan Tran (Part of slides are from Pedro’s)

Deformable objects Images from D. Ramanan’s dataset

Non-rigid objects Images from Caltech-256

Challenges High intra-class variations Deformable Therefore… Part-based model might be a better choice !

Part-based representation Objects are decomposed into parts and spatial relations among parts E.g. Face model by Fischler and Elschlager ‘73

Part-based representation K-fans model (D.Crandall, et.all, 2005) - Full distribution is factorized to small cliques

Part-based representation Tree model  Efficient inference by dynamic programming

Pictorial Structure Matching = Local part evidence + Global constraint mi(li): matching cost for part I dij(li,lj): deformable cost for connected pairs of parts (vi,vj): connection between part i and j

Matching on tree structure For each l1, find best l2: Remove v2, and repeat with smaller tree, until only a single part Complexity: O(nk2): n parts, k locations per part

Sample result on matching human

Sample result on matching human

A Discriminatively Trained, Multiscale, Deformable Part Model

Overview

Filters Filters are rectangular templates defining weights for features

Object hypothesis Coarser level for the root filter (whole object) and higher level for part filters

Deformable parts A model consists of a root filter F0 and part model (P1,…,Pn), Pi = (Fi, vi, si, ai, bi) Filter Fi; location and size of part (vi,si), and parameter to evaluate the placement of part (ai,bi) Score a placement  Using dynamic programming to find best placement

Learning Training data consists of images with labeled bounding boxes  Learn the model structure, filters and deformation costs

SVM-like model (Latent Variables)

Latent SVM Linear SVM (convex) when z is fixed Solving by coordinate descent Fixed w, find the latent variable z for the positive examples Fixed z, solve the Linear SVM to find w

Implementation details Select root filter window size Initialize root filter by training model without latent variables on unoccluded examples. Root filter update: get new positives (best score and significant overlap with ground truth), add to positives and retrain Part initialization: sequentially choosing area a having high positive score and 6a = 80% root area

Learned models

Sample results

Other results

Other results

Other results

Other results

Other results

Pascal VOC Challenge tasks

Pascal2006 Person

Discussion Mani: Couple of questions: How parts could be defined in various objects? When does breaking objects or parts into parts help in doing a better job? Gang: The successful object representation turns out to be a global object template + several parts templates. There are two questions: (1) How to deal with occlusion? Occlusion seems to be the biggest difficulty for PASCAL object detection. And such a global structure (though it has parts, but all the parts are constrained by a global spatial relationship) cannot deal with occlusion. (2) What makes a part? Is there a part? For the first question, extracting information from multiple levels might be helpful. Except the global spatial structure, we also extract such structure at different scales and train separate classifiers. The final detection output is the fusion of all these classifiers with learned weights.

Discussion Mert: Considering the success of the sliding window + classifier approach, there are couple of natural questions one would ask: 1) Will using different kinds of features help? 2) Can you do a better job on deformable objects by breaking them into respective parts? According to earlier papers in the literature the answer to both questions is yes. Felzenszwalb et al.'s results convincingly demonstrate the affirmative conclusion for the second question. The big question to be answered is however, how far we can push the sliding window approach and whether we can obtain the ultimate object detector through this paradigm. Sanketh: I concur with Mert's comments on how far we can push object detection with the sliding window approach. Ultimately, I believe there is just too much variability in part placement and part shapes for gradient histogram based techniques to be effective. It is interesting that most of the popular object recognition paradigms completely ignore segmentation as a possible source of information for object recognition. A combination of segmentation + orientation histograms may be something worth trying. I am unclear on a few details in the Latent-SVM training. Especially on how it goes from being non-convex/semi-convex to convex. It will be helpful if we could go into some details of the process there. It seemed that the model described in the initial part of the paper was not implemented in its entirety.

Discussion Ian: One very appealing extension of this machinery is to enforce that each "part" has some underlying semantics. It is not clear if such a constraint would decrease performance, since the existing parts are chosen for their high discriminative ability. However, one may argue that without this extra prior knowledge, it may be difficult to learn that a part like articulated arms occurs in many images of people, but this part is still a strong cue for recognition. Eamon: At what stage does searching for an object make more sense than searching for its parts? In some sense, even an entire scene could be considered a deformable object with its constituent objects acting as parts constrained to certain (contextually-dependent) locations.