Object Detection I. Ali Taalimi. 01/08/2013.


1 Object Detection I. Ali Taalimi. 01/08/2013

2 Outline
Object Detection (Sliding Window Based, Local Interest Points); Face Detection; Conclusion
5/17/2018 Slide 2/90

3 General Process of Object Recognition
Specify Object Model: what are the object parameters? Generate Hypotheses: propose an alignment of the model to the image. Score Hypotheses: mainly gradient-based features, usually based on a summary representation; many classifiers. Resolve Detections: rescore each proposed object based on the whole set.

4 Specifying an object model
Geometry vs. Appearance. Parts vs. The Whole. ...and the standard answer: probably both, or neither. (Torralba tutorial)

5 Specifying an object model
Statistical template in a bounding box: the object is some (x, y, w, h) in the image, and features are defined with respect to bounding-box coordinates. N. Dalal and B. Triggs, "Histograms of Oriented Gradients," Proc. IEEE CVPR, Jun. 2005.
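The idea of features defined with respect to bounding-box coordinates can be sketched with a toy orientation histogram, loosely in the spirit of HOG. This is an illustrative simplification, not the Dalal-Triggs descriptor: there are no cells, blocks, or block normalization, and the function name is an assumption.

```python
import math

def orientation_histogram(image, box, n_bins=9):
    """Toy HOG-like feature: a histogram of gradient orientations inside
    an (x, y, w, h) bounding box. `image` is a list of rows of floats.
    Simplified sketch: no cells, blocks, or contrast normalization."""
    x, y, w, h = box
    hist = [0.0] * n_bins
    for row in range(y + 1, y + h - 1):        # skip border pixels so
        for col in range(x + 1, x + w - 1):    # central differences fit
            gx = image[row][col + 1] - image[row][col - 1]
            gy = image[row + 1][col] - image[row - 1][col]
            mag = math.hypot(gx, gy)
            ang = math.atan2(gy, gx) % math.pi  # unsigned orientation
            b = min(int(ang / math.pi * n_bins), n_bins - 1)
            hist[b] += mag                      # magnitude-weighted vote
    total = sum(hist) or 1.0
    return [v / total for v in hist]            # L1-normalized
```

Because the loop indices are offsets from (x, y), the same code scores any candidate window, which is what makes the template "statistical" rather than tied to one image location.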

6 Specifying an object model
Articulated parts model: the object is a configuration of parts, and each part is detectable. Part = oriented rectangle; spatial model = relative size/orientation. (Fischler & Elschlager 1973; Felzenszwalb 2005)

7 Specifying an object model
Hybrid template/parts model: pixels → pixel groupings → parts → object detection. (Felzenszwalb et al. 2008)

8 Generating hypotheses
1. Sliding window: test a patch at each location and scale.

9 Sliding window-based
Search over space and scale. Detection as a subwindow classification problem: in the absence of a more intelligent strategy, any global image classification approach can be converted into a localization approach by using a sliding-window search. Object model = sum of scores of features at fixed positions. Building such a classifier is possible because pixels on a face are highly correlated, whereas those in a nonface subwindow present much less regularity. (Derek Hoiem, tutorial)
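The search over space and scale can be sketched as a window enumerator. Stride and scale step values are illustrative assumptions; real detectors typically resize the image rather than the window, but the set of candidate boxes is the same idea.

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride=8, scale_step=1.25):
    """Enumerate (x, y, w, h) candidate windows over positions and scales.
    Growing the window (instead of shrinking the image, as detectors
    usually do in practice) keeps the sketch short."""
    boxes = []
    w, h = float(win_w), float(win_h)
    while w <= img_w and h <= img_h:            # loop over scales
        y = 0
        while y + h <= img_h:                   # loop over rows
            x = 0
            while x + w <= img_w:               # loop over columns
                boxes.append((x, y, int(w), int(h)))
                x += stride
            y += stride
        w, h = w * scale_step, h * scale_step   # next coarser scale
    return boxes
```

Even this toy enumerator makes the cost concern on the later slides concrete: the number of windows grows with image area, inverse stride squared, and the number of scales.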

10 Sliding window-based: Concept of Online Learning
Collect a large set of face/target and nonface examples, and adopt a machine learning algorithm to learn a target model that performs classification. Given a set of windows corresponding to faces and nonfaces, extract features such as color, texture, and contours, and use statistical models such as SVM and AdaBoost to learn the patterns of pixels in those windows. Then perform detection across windows at different test-image scales. (Face Detection, Raghuraman Gopalan, chapter 5 of the book Visual Analysis of Humans)

11 Sliding window-based Notes:
1) Training the classifier: schemes like multiple instance learning boosting and multiple category boosting leverage unlabeled data to facilitate learning (unsupervised or semi-supervised learning). 2) Works with lower-resolution tiny images for training: Viola & Jones: 24×24 pixels; Torralba et al.: 32×32 pixels; Dalal & Triggs: 64×96 pixels (notable exception). But... limited information content is available at those resolutions, and there is not enough support to compensate for occlusions! 3) How to efficiently search for likely objects? Even simple models require searching hundreds of thousands of positions and scales. 4) Feature design and scoring: how should appearance be modeled? What features correspond to the object? 5) How to deal with different viewpoints? Often, different models are trained for a few different viewpoints.

12 Statistical Template Approach / Sliding Window Based
Strengths and Weaknesses. Strengths: • works very well for non-deformable objects (faces, cars, upright pedestrians) • fast detection. Weaknesses: • not so good for highly deformable objects • not robust to occlusion • requires lots of training data.

13 Generating hypotheses
2. Voting from patches/keypoints. (ISM model by Leibe et al., 2004)

14 Local interest points
Represent the image in terms of local interest-point detectors such as the Harris detector. Feature descriptors such as SIFT and shape contexts are built upon these detectors to form inputs for a classification engine. Instead of directly analyzing all pixels (regions), the classifier analyzes only those regions with prominent feature responses. (Face Detection, Raghuraman Gopalan, chapter 5 of the book Visual Analysis of Humans)

15 Local interest points: Bag of Words / Dictionary / Codewords approach
(ICCV 2009 tutorial on Recognizing and Learning Object Categories)
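The bag-of-words step can be sketched as nearest-codeword quantization followed by histogramming. The codebook here is a hand-written stand-in; in a real pipeline it would come from k-means clustering over training descriptors (e.g., SIFT).

```python
def bow_histogram(descriptors, codebook):
    """Quantize each local descriptor to its nearest codeword (squared
    Euclidean distance) and return a normalized codeword histogram.
    `codebook` is a list of codeword vectors; in practice it would be
    learned by k-means over a large set of training descriptors."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        dists = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in codebook]
        hist[dists.index(min(dists))] += 1.0    # vote for nearest word
    n = sum(hist) or 1.0
    return [v / n for v in hist]                # normalize by descriptor count
```

The histogram discards where each descriptor came from, which is exactly the loss of location information criticized on the next slide.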

16 Problem with bag-of-words
All such images have equal probability under bag-of-words methods, yet location information is important.

17 Deformable objects
Images from D. Ramanan's dataset

18 Parts-based Models
Define the object by a collection of parts, modeled by: 1. the appearance of each part; 2. the spatial configuration (relative locations between parts). Parts need to be distinctive to separate the object from other classes. (Rob Fergus, MIT)

19 How to model spatial relations?
One extreme: fixed template. Another extreme: bag of words. (Derek Hoiem, Illinois)

20 How to model spatial relations?
1. Star-shaped model. Example: ISM (Leibe et al. 2004, 2008). 2. Tree-shaped model. Example: pictorial structures (Felzenszwalb 2005, 2009). The parts are either semantically motivated (body parts such as head, torso, and legs) or based on codebook representations.
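Scoring a hypothesis under a star-shaped model can be sketched as the root appearance score plus, per part, the best placement trading appearance against deformation. The data layout and the unit-weight quadratic deformation cost are illustrative assumptions, loosely following pictorial structures.

```python
def score_star_model(root_score, parts):
    """Score one object hypothesis under a star-shaped model.
    `parts` maps a part name to (anchor, candidates): `anchor` is the
    part's ideal (dx, dy) offset from the root, and each candidate is
    ((dx, dy), appearance_score). Each part independently picks the
    placement maximizing appearance minus a quadratic deformation cost
    (unit spring weight, an assumption of this sketch)."""
    total = root_score
    for anchor, candidates in parts.values():
        ax, ay = anchor
        best = max(s - ((dx - ax) ** 2 + (dy - ay) ** 2)
                   for (dx, dy), s in candidates)
        total += best                  # parts are independent given the root
    return total
```

The independence of parts given the root is what makes the star model efficient: each part's best placement is found separately, with no joint search over part combinations.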

21 Things to remember
Rather than searching for the whole object, we can locate "parts" that vote for the object, giving a better encoding of spatial variation; these parts can vote for other things too. Models can be broken down into part appearance and spatial configuration, with a wide variety of models. Efficient optimization can be tricky but is usually possible. Applicability to lower-resolution images is limited, since each component detector requires a certain spatial support for robustness.

22 Resolving detection scores
Non-max suppression; context/reasoning. (Putting Objects in Perspective, Hoiem et al., CVPR 2006, Carnegie Mellon University)
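Non-max suppression can be sketched as greedy overlap-based pruning. The 0.5 IoU threshold is a conventional choice, not one mandated by the slides.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / float(aw * ah + bw * bh - inter)

def non_max_suppression(detections, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping
    a kept box by more than `iou_thresh`, repeat. `detections` is a
    list of ((x, y, w, h), score) pairs."""
    dets = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(iou(box, kb) <= iou_thresh for kb, _ in kept):
            kept.append((box, score))
    return kept
```

This resolves the sliding window's many overlapping firings on one true object into a single surviving detection.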

23 Face Detection: Categorization of Existing Approaches
1. Data representation perspective: sliding window-based; local interest-point-based. 2. Mode of classification: generative methods attempt to model how the data is generated; discriminative methods directly discriminate between the two classes. 3. Based on the range of acceptable head poses: single pose; rotation-invariant (in-plane rotations of the head); multi-view (out-of-plane rotations); pose-invariant (no restrictions on the orientation).

24 Viola-Jones Face Detector
The VJ system [2001] made face detection practically feasible in real-world applications. Key ideas: integral images for fast feature evaluation; boosting for feature selection; an attentional cascade for fast rejection of non-face windows. The AdaBoost algorithm is used to solve three fundamental problems: (1) selecting effective features from a large feature set; (2) constructing weak classifiers, each based on one of the selected features; and (3) boosting the weak classifiers into a strong classifier. Three major components contribute to the cascade face detector: an over-complete set of local features that can be evaluated quickly; an AdaBoost-based method to build strong nonlinear classifiers from the weak local features; and a cascade detector architecture that leads to real-time detection speed. (Viola and Jones, CVPR 2001, IJCV 2004)
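The integral-image trick behind fast feature evaluation can be sketched directly: a summed-area table makes any rectangle sum four lookups, so a Haar-like two-rectangle feature costs eight. The function names are illustrative.

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img over rows < y, cols < x,
    padded with a zero row/column so box sums need no bounds checks."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, x, y, w, h):
    """Sum over any rectangle in four lookups, independent of its size,
    which is what makes Haar-like features cheap at every scale."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect_vertical(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half
    (w is assumed even)."""
    half = w // 2
    return box_sum(ii, x, y, half, h) - box_sum(ii, x + half, y, half, h)
```

Because `box_sum` is constant-time, evaluating the same feature at a larger scale costs no more than at a small one, a key enabler of the multi-scale sliding-window search.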

25 Attentional cascade
Chain classifiers that are progressively more complex and have lower false positive rates. (Viola and Jones, CVPR 2001, IJCV 2004)

26 Cascade for Fast Detection
The ith filter of the cascade is designed to: reject the largest possible number of non-object windows; let pass the largest possible number of object windows; be evaluated as fast as possible.
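The cascade's control flow can be sketched in a few lines: each stage is a (scorer, threshold) pair, and a window exits at the first stage that rejects it. Returning the number of stages evaluated makes the speed argument visible; the stage representation is an illustrative assumption.

```python
def cascade_classify(window, stages):
    """Run a window through an attentional cascade. `stages` is a list
    of (scorer, threshold) pairs ordered cheap-to-expensive. Returns
    (accepted, stages_evaluated): most non-object windows exit after
    an early, cheap stage; only windows passing every stage are
    accepted as detections."""
    for i, (scorer, threshold) in enumerate(stages):
        if scorer(window) < threshold:
            return False, i + 1        # rejected: stop evaluating
    return True, len(stages)           # survived all stages
```

Since the vast majority of subwindows in an image are non-objects, average cost per window stays close to the cost of the first stage even when later stages are expensive.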

27 Limitation of Viola-Jones
[Result frames from MATLAB detectors: Small People Detector (frames 20, 44); Upper Body Detector using UprightPeople_96x48 (frames 35, 48); Front Face Detector (frames 48, 64).]

28 Detection by Tracking and Tracking by Detection
Tracking systems address motion and matching. Motion problem: identify a limited search region in which the element is expected to be found. Matching problem: identify the image element in the next frame within the designated search region. Why is association between detections and targets difficult? Detection results degrade in occluded scenes, and detector output is unreliable and sparse.

29 Some Experiments
[Video results: FPDW detector and MATLAB built-in tracking.]

30 Multiview Face Detection
Multiview face detection = face detection + pose estimation. Face detection: distinguish faces from nonfaces, using similarities between faces of different poses. Pose estimation: identify the probable pose of a pattern, whether it is a face or not. View-based method: several face models are built, each describing faces in a certain view range, with different detector structures. (High-Performance Rotation Invariant Multiview Face Detection, Huang et al., PAMI 2007)

31 Context
Context-based methods try to answer the following questions: What information does a face share with its surroundings? Given some characteristics of the global scene, how probable is the presence of a face there? The first question is a bottom-up way of learning the object and its surroundings; the second is a top-down model of what a scene conveys about the probability of the presence of an object.

32 Context
Objects do not occur in isolation. The surrounding scene does provide some clue about the presence of objects, and the influence of an object extends beyond its physical boundaries: face detection probability < person detection probability. (Torralba, A.: Contextual priming for object detection. Int. J. Comput. Vis. 53, 169–191, 2003)

33 Conclusions
Three main steps can be varied to gain performance: feature extraction, classification, and non-maxima suppression. The most common features are variants of HOG, histograms of gradients and optic flow, and different generalized Haar wavelets. Combining multiple complementary types of low-level features raises the question of which combination of cues should be used: channels of features should complement each other. Simple/complex features are faster/slower and less/more discriminant. Regional statistics include histograms of local edge orientation (HOG) and spatial histograms (LBP). As the feature pool becomes larger and larger, feature selection becomes a data/feature-mining challenge. One suggestion: HOG, HOF, Color Self-Similarity (CSS). Another option: color channels, gradient magnitude, HOG. Why HOG? It encodes high-frequency gradient information. Why Haar? It encodes lower-frequency changes in the color channels. Why HOF? It encodes motion cues.

34 Conclusions
Classification techniques aim at determining an optimal decision boundary between pattern classes in a feature space. There are a huge number of engineering details in training a classifier: asymmetric learning for rare-event detection; merging multiple firings on true pedestrians at nearby positions in scale and space; bootstrapping; and selecting the most discriminative feature subset (classifier design).

