Object Detection II, Ali Taalimi, 01/08/2013

Outline: Object Detection (Sliding Window Based; Local Interest Points); Face Detection; Conclusion. 12/29/2018 Slide 2/90

General Process of Object Recognition: Specify Object Model (what are the object parameters?); Generate Hypotheses (propose an alignment of the model to the image); Score Hypotheses (mainly gradient-based features, usually based on a summary representation; many classifiers); Resolve Detections (rescore each proposed object based on the whole set).

Specifying an object model: Geometry vs. appearance; parts vs. the whole. …and the standard answer: probably both or neither. (Torralba tutorial)

Specifying an object model: Statistical template in bounding box. The object is some (x, y, w, h) in the image; features are defined with respect to bounding-box coordinates. N. Dalal and B. Triggs, “Histograms of oriented gradients,” in Proc. IEEE CVPR, Jun. 2005.
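A minimal sketch of the statistical-template idea, assuming a grayscale image stored as a list of rows and a weight vector learned elsewhere (the function names are hypothetical). The box (x, y, w, h) is resampled to a fixed canonical grid, so each feature is indexed relative to the box rather than the image, and the score is a linear function of those features. Raw resampled intensities stand in here for real features such as HOG.

```python
from typing import List, Sequence

Image = List[List[float]]


def box_features(img: Image, x: int, y: int, w: int, h: int,
                 out_w: int = 4, out_h: int = 4) -> List[float]:
    """Resample the box (x, y, w, h) to an out_h x out_w grid (nearest neighbour).

    Features are indexed by position inside the box, not inside the image,
    so the same weights apply to a box at any location or size.
    """
    feats = []
    for j in range(out_h):
        # centre of the j-th canonical row, mapped back into image coordinates
        sy = y + int((j + 0.5) * h / out_h)
        for i in range(out_w):
            sx = x + int((i + 0.5) * w / out_w)
            feats.append(img[sy][sx])
    return feats


def template_score(weights: Sequence[float], feats: Sequence[float],
                   bias: float = 0.0) -> float:
    """Linear statistical template: score = w . phi(box) + b (weights are assumed given)."""
    return sum(wi * fi for wi, fi in zip(weights, feats)) + bias
```

Because the grid size is fixed, one weight vector can score boxes of any position and size.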

Specifying an object model: Articulated parts model. The object is a configuration of parts, and each part is detectable (Fischler & Elschlager, 1973; Felzenszwalb, 2005). Part = oriented rectangle; spatial model = relative size/orientation.

Specifying an object model: Hybrid template/parts model. Pixels → pixel groupings → parts → object detection (Felzenszwalb et al., 2008).

Generating hypotheses: 1. Sliding window: test a patch at each location and scale.

Sliding window-based: search over space and scale. Detection is posed as a subwindow classification problem: in the absence of a more intelligent strategy, any global image classification approach can be converted into a localization approach by using a sliding-window search. Object model = sum of scores of features at fixed positions. Building such a classifier is possible because pixels on a face are highly correlated, whereas those in a nonface subwindow show much less regularity. (Derek Hoiem, tutorial)
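A sketch of the search over space and scale, assuming an arbitrary window-scoring function such as a linear template. `sliding_windows` and `detect` are hypothetical names, and the 24×24 base window, the scale set, and the stride of a quarter window width are illustrative choices. Here the window grows across scales instead of the image shrinking.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)


def sliding_windows(img_w: int, img_h: int, base_w: int, base_h: int,
                    scales: Sequence[float] = (1.0, 1.5, 2.0),
                    stride_frac: float = 0.25) -> Iterator[Box]:
    """Yield every window at each scale; the stride grows with the window size."""
    for s in scales:
        w, h = int(round(base_w * s)), int(round(base_h * s))
        if w > img_w or h > img_h:
            continue  # window no longer fits in the image
        step = max(1, int(stride_frac * w))
        for y in range(0, img_h - h + 1, step):
            for x in range(0, img_w - w + 1, step):
                yield (x, y, w, h)


def detect(img_w: int, img_h: int, score_fn: Callable[[Box], float],
           threshold: float, base_w: int = 24, base_h: int = 24,
           scales: Sequence[float] = (1.0, 1.5, 2.0)) -> List[Tuple[Box, float]]:
    """Detection as subwindow classification: keep windows whose score passes threshold."""
    hits = []
    for box in sliding_windows(img_w, img_h, base_w, base_h, scales):
        s = score_fn(box)
        if s >= threshold:
            hits.append((box, s))
    return hits
```

Even this toy setting shows why search cost matters: the number of windows grows with image area and with the number of scales.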

Concept of Online Learning (sliding window-based): collect a large set of face/target and nonface examples, and apply a machine learning algorithm to learn a target model that performs classification. Given a set of windows corresponding to faces and nonfaces, extract features such as color, texture, and contours, and use statistical models such as SVM and AdaBoost to learn the patterns of pixels in those windows. Then perform detection across windows at different scales of the test image. (Face Detection, Raghuraman Gopalan, chapter 5 of the book Visual Analysis of Humans)

Sliding window-based: Notes. 1) Training the classifier: schemes such as multiple instance learning boosting and multiple category boosting leverage unlabeled data to facilitate learning (unsupervised or semi-supervised learning). 2) Works with tiny, low-resolution training images: Viola & Jones: 24×24 pixels; Torralba et al.: 32×32 pixels; Dalal & Triggs: 64×96 pixels (notable exception). But the information content available at those resolutions is limited, and there is not enough support to compensate for occlusions. 3) How to efficiently search for likely objects? Even simple models require searching hundreds of thousands of positions and scales. 4) Feature design and scoring: how should appearance be modeled, and what features correspond to the object? 5) How to deal with different viewpoints? Often, separate models are trained for a few different viewpoints.

Statistical Template Approach / Sliding Window Based: Strengths and Weaknesses. Strengths • Works very well for non-deformable objects: faces, cars, upright pedestrians • Fast detection. Weaknesses • Not as good for highly deformable objects • Not robust to occlusion • Requires lots of training data

Generating hypotheses: 2. Voting from patches/keypoints (ISM model by Leibe et al., 2004).
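A sketch of Hough-style voting in the spirit of the ISM, assuming each codeword stores the displacements to the object centre observed during training (the dictionary layout and function names are hypothetical). Each test keypoint spreads one unit of weight over its codeword's stored displacements, and the accumulator cells with the most support become object hypotheses. Scale and any refinement of the vote peaks are omitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def hough_votes(keypoints: List[Tuple[Point, str]],
                offsets: Dict[str, List[Point]],
                cell: float = 8.0) -> Dict[Cell, float]:
    """Accumulate votes for the object centre in a coarse grid.

    keypoints: (location, matched codeword id) pairs from the test image.
    offsets:   codeword id -> displacements (centre - keypoint) seen in training.
    Each keypoint spreads a total weight of 1 over its codeword's offsets;
    keypoints whose codeword has no stored offsets cast no votes.
    """
    acc: Dict[Cell, float] = defaultdict(float)
    for (px, py), word in keypoints:
        occ = offsets.get(word, [])
        for dx, dy in occ:
            cx, cy = px + dx, py + dy
            acc[(int(cx // cell), int(cy // cell))] += 1.0 / len(occ)
    return dict(acc)


def best_hypothesis(acc: Dict[Cell, float]) -> Tuple[Cell, float]:
    """Return the accumulator cell with the most support (the strongest hypothesis)."""
    return max(acc.items(), key=lambda kv: kv[1])
```

Two parts that agree on the same centre reinforce each other, which is how the voting encodes spatial layout.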

Local interest points: represent the image in terms of local interest-point detectors such as the Harris detector. Feature descriptors such as SIFT and shape contexts are built on these detectors to form the inputs to a classification engine. Instead of directly analyzing all pixels (regions), the classifier analyzes only those regions with prominent feature responses. (Face Detection, Raghuraman Gopalan, chapter 5 of the book Visual Analysis of Humans)
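A minimal Harris detector as a sketch of the first step described above (function names are hypothetical, and a 3×3 box window replaces the more common Gaussian weighting). Gradients come from central differences, the structure tensor M is summed over a small window, and corners are local maxima of R = det(M) - k * trace(M)^2.

```python
from typing import List, Tuple


def harris_response(img: List[List[float]], k: float = 0.05) -> List[List[float]]:
    """Harris corner response at each interior pixel (zero near the border)."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0  # central differences
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    r = [[0.0] * w for _ in range(h)]
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):  # structure tensor summed over a 3x3 window
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            r[y][x] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    return r


def corners(r: List[List[float]], thresh: float) -> List[Tuple[int, int]]:
    """(x, y) of pixels whose response exceeds thresh and is a 3x3 local maximum."""
    pts = []
    for y in range(1, len(r) - 1):
        for x in range(1, len(r[0]) - 1):
            v = r[y][x]
            if v > thresh and all(v >= r[y + dy][x + dx]
                                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                pts.append((x, y))
    return pts
```

Flat regions give R = 0 and straight edges give R < 0, so only corner-like points survive; descriptors such as SIFT would then be computed only around those points.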

Local interest points: Bag of Words / Dictionary / Codewords approach (ICCV 2009 tutorial on Recognizing and Learning Object Categories).

Problem with bag-of-words: all of these arrangements have equal probability under bag-of-words methods, yet location information is important.
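A sketch of the bag-of-words representation that also makes this problem concrete (function names and the tiny 2-D descriptors are hypothetical). Each descriptor is assigned to its nearest codeword, the counts are normalised, and the locations are discarded, so two images containing the same parts in different places get identical histograms.

```python
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def nearest_word(desc: Vec, codebook: List[Vec]) -> int:
    """Index of the codeword closest (squared Euclidean) to a descriptor."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(desc, codebook[k])))


def bow_histogram(features: List[Tuple[Tuple[int, int], Vec]],
                  codebook: List[Vec]) -> List[float]:
    """Normalised count of descriptors per codeword (Voronoi cell).

    features are (location, descriptor) pairs; the location is ignored,
    which is exactly why bag-of-words cannot tell rearranged parts apart.
    """
    counts = [0] * len(codebook)
    for _loc, desc in features:
        counts[nearest_word(desc, codebook)] += 1
    total = max(1, len(features))
    return [c / total for c in counts]
```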

Deformable objects (images from D. Ramanan’s dataset).

Parts-based Models: define an object by a collection of parts, modeled by 1. the appearance of each part and 2. the spatial configuration (relative locations between parts). Parts need to be distinctive to separate the object from other classes. (Rob Fergus, MIT)

How to model spatial relations? One extreme: fixed template (Derek Hoiem, Illinois). Another extreme: bag of words.

How to model spatial relations? 1. Star-shaped model. Example: ISM (Leibe et al., 2004, 2008). 2. Tree-shaped model. Example: pictorial structures (Felzenszwalb, 2005, 2009). These parts are either semantically motivated (body parts such as head, torso, and legs) or derived from codebook representations.
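A sketch of scoring a star-shaped model. The names and the quadratic deformation cost are assumptions, in the style of part-based detectors such as Felzenszwalb's. The root is scored at a location, and each part is placed independently near its anchor, trading appearance score against displacement. The brute-force search over a small radius is for clarity; efficient implementations use faster algorithms such as distance transforms.

```python
from typing import Callable, List, Tuple

Loc = Tuple[int, int]
Part = Tuple[Loc, Callable[[Loc], float], float]  # (anchor offset, appearance fn, deformation weight)


def star_score(root: Loc, root_score: Callable[[Loc], float],
               parts: List[Part], radius: int = 3) -> float:
    """Score of a star-shaped model with its root placed at `root`.

    Because every part connects only to the root, each part can be placed
    independently: best appearance score minus a quadratic penalty for
    moving away from its anchor.
    """
    total = root_score(root)
    for (ax, ay), part_score, d in parts:
        ex, ey = root[0] + ax, root[1] + ay  # expected part location
        best = float("-inf")
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                s = part_score((ex + dx, ey + dy)) - d * (dx * dx + dy * dy)
                best = max(best, s)
        total += best
    return total
```

A small deformation weight lets a part drift toward strong evidence; a large one pins it to its anchor, approaching the fixed-template extreme.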

Things to remember: Rather than searching for the whole object, we can locate “parts” that vote for the object, which gives a better encoding of spatial variation. These parts can vote for other things too. Models can be broken down into part appearance and spatial configuration, giving a wide variety of models. Efficient optimization can be tricky but is usually possible. Applicability to lower-resolution images is limited, since each component detector requires a certain spatial support for robustness.

Resolving detection scores: non-max suppression; context/reasoning (Putting Objects in Perspective, Hoiem et al., CVPR 2006, Carnegie Mellon University).
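A sketch of greedy non-max suppression, a common way to resolve overlapping detections (function names and the 0.5 IoU threshold are illustrative assumptions). Boxes are visited in decreasing score order, and a box is kept only if it does not overlap an already-kept box too much.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Tuple[Box, float]], overlap: float = 0.5) -> List[Tuple[Box, float]]:
    """Greedy NMS: keep the best-scoring box, drop boxes whose IoU with a kept box
    exceeds `overlap`, and continue with the remaining boxes."""
    kept: List[Tuple[Box, float]] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k) <= overlap for k, _ in kept):
            kept.append((box, score))
    return kept
```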

Face Detection: categorization of existing approaches. 1. Data representation perspective: sliding window-based; local interest-point-based. 2. Mode of classification: generative methods attempt to model how the data is generated; discriminative methods directly discriminate between the two classes. 3. Range of acceptable head poses: single pose; rotation-invariant (in-plane rotations of the head); multi-view (out-of-plane rotations); pose-invariant (no restrictions on the orientation).

Viola-Jones Face Detector: the VJ system [2001] made face detection practically feasible in real-world applications. Key ideas: integral images for fast feature evaluation; boosting for feature selection; an attentional cascade for fast rejection of non-face windows. The AdaBoost algorithm is used to solve three fundamental problems: (1) selecting effective features from a large feature set; (2) constructing weak classifiers, each based on one of the selected features; and (3) boosting the weak classifiers to construct a strong classifier. Three major components contribute to the cascade face detector: an over-complete set of local features that can be evaluated quickly; an AdaBoost-based method to build strong nonlinear classifiers from the weak local features; and a cascade detector architecture that leads to real-time detection speed. (Viola and Jones, CVPR 2001, IJCV 2004)
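A sketch of the integral-image idea (function names are hypothetical). After one pass over the image, the sum over any rectangle costs four lookups, so a two-rectangle Haar-like feature costs a constant number of lookups regardless of its size.

```python
from typing import List


def integral_image(img: List[List[float]]) -> List[List[float]]:
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded first row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[float]], x: int, y: int, w: int, h: int) -> float:
    """Sum over the rectangle [x, x+w) x [y, y+h) using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: List[List[float]], x: int, y: int, w: int, h: int) -> float:
    """Horizontal two-rectangle Haar-like feature: left half minus right half (w assumed even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```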

Attentional cascade: chain classifiers that are progressively more complex and have lower false positive rates. (Viola and Jones, CVPR 2001, IJCV 2004)
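Because a window is accepted only if it passes every stage, the stage rates multiply. Writing $d_i$ and $f_i$ for the detection and false positive rates of stage $i$ in a $K$-stage cascade, each measured on the windows that reach that stage (symbols introduced here for illustration), the overall rates and one set of illustrative numbers are:

```latex
\begin{align*}
D &= \prod_{i=1}^{K} d_i, & F &= \prod_{i=1}^{K} f_i \\
D &= 0.99^{10} \approx 0.904, & F &= 0.3^{10} \approx 5.9 \times 10^{-6}
\end{align*}
```

So each stage can tolerate a high false positive rate as long as it almost never rejects a true object.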

Cascade for Fast Detection: the i-th filter of the cascade is designed to reject the largest possible number of non-object windows, let through the largest possible number of object windows, and be evaluated as fast as possible.
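A sketch of how a cascade is evaluated at detection time, assuming each stage is a (score function, threshold) pair produced by training (this representation is hypothetical). Returning the number of stages used shows where the speed comes from: windows rejected early never reach the expensive later stages.

```python
from typing import Callable, Sequence, Tuple

Stage = Tuple[Callable[[object], float], float]  # (stage score function, stage threshold)


def cascade_classify(window: object, stages: Sequence[Stage]) -> Tuple[bool, int]:
    """Run one window through the attentional cascade.

    Returns (accepted, number of stages evaluated). The window is rejected as
    soon as any stage scores below its threshold.
    """
    for i, (score_fn, thresh) in enumerate(stages):
        if score_fn(window) < thresh:
            return False, i + 1  # early rejection
    return True, len(stages)
```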

Limitation of Viola-Jones [figure, using MATLAB: Small People Detector, frames 44 and 20; Upper Body Detector (for upper-body detection: UprightPeople_96x48), frames 48 and 35; Front Face Detector, frames 64 and 48]

Detection by Tracking and Tracking by Detection: tracking systems address two problems, motion and matching. Motion problem: identify a limited search region in which the element is expected to be found. Matching problem: identify the image element in the next frame within the designated search region. Why is association between detections and targets difficult? Detection results degrade in occluded scenes, and detector output is unreliable and sparse.
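A sketch of one tracking-by-detection step for a single target. The class and function names are hypothetical, and the naive constant-velocity estimate stands in for a proper filter such as the Kalman filter used in the experiments. The prediction plus a gate radius defines the search region (motion problem), and the nearest detection inside it is the match (matching problem). When the detector misses the target, for example under occlusion, the track coasts on its prediction.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class Track:
    """A single target with a naive constant-velocity motion model."""

    def __init__(self, pos: Point):
        self.pos = pos
        self.vel = (0.0, 0.0)

    def predict(self) -> Point:
        # Motion problem: where should we look in the next frame?
        return (self.pos[0] + self.vel[0], self.pos[1] + self.vel[1])

    def update(self, det: Point) -> None:
        self.vel = (det[0] - self.pos[0], det[1] - self.pos[1])
        self.pos = det


def associate(track: Track, detections: List[Point], gate: float) -> Optional[Point]:
    """Matching problem: nearest detection to the prediction, but only inside the
    search region (a circle of radius `gate`); None if nothing falls inside."""
    px, py = track.predict()
    best, best_d = None, gate
    for d in detections:
        dist = math.hypot(d[0] - px, d[1] - py)
        if dist <= best_d:
            best, best_d = d, dist
    return best


def step(track: Track, detections: List[Point], gate: float) -> Optional[Point]:
    """Advance the track by one frame."""
    match = associate(track, detections, gate)
    if match is None:
        track.pos = track.predict()  # coast through a missed detection
    else:
        track.update(match)
    return match
```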

Some Experiments [figure panels: FPDW; MATLAB built-in; Tracking; FPDW]

Multiview Face Detection: multiview face detection = face detection + pose estimation. Face detection: distinguish faces from nonfaces, using similarities between faces of different poses. Pose estimation: identify the probable pose of a pattern, whether or not it is a face. View-based method: several face models are built, each describing faces in a certain view range, with different detector structures. (High-Performance Rotation Invariant Multiview Face Detection, Huang et al., PAMI 2007)

Context: context-based methods try to answer the following questions: What information does a face share with its surroundings? Given some characteristics of the global scene, how probable is the presence of a face there? The first question is a bottom-up way of learning the object and its surroundings. The second question is a top-down model of what a scene conveys about the probability that an object is present.

Context: objects do not occur in isolation. The surrounding scene provides clues about the presence of objects, and the influence of an object extends beyond its physical boundaries. (Torralba, A.: Contextual priming for object detection. Int. J. Comput. Vis. 53, 169–191 (2003)) Face detection probability < person detection probability.
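One way to make this concrete, following the Bayes-rule decomposition used in Torralba's contextual priming work cited above, splits the image features $v$ into local features $v_L$ (measured inside the candidate window) and global scene features $v_C$:

```latex
P(O \mid v) = P(O \mid v_L, v_C)
            = \frac{P(v_L \mid O, v_C)\, P(O \mid v_C)}{P(v_L \mid v_C)}
```

The likelihood $P(v_L \mid O, v_C)$ is what a conventional local detector models, while the prior $P(O \mid v_C)$ is the top-down term: how probable the object is given the scene alone, before any local evidence is examined.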

Conclusions: there are three main steps that can be varied to gain performance: feature extraction, classification, and non-maxima suppression. The most common features are variants of HOG, histograms of gradients and optic flow, and different generalized Haar wavelets. Combining multiple complementary types of low-level features raises the question of which combination of cues to use; the feature channels should complement each other. Simple vs. complex features: faster vs. slower, less vs. more discriminative. Regional statistics such as histograms: local edge orientation (HOG) / spatial (LBP) histograms. As the feature pool grows larger and larger, feature selection becomes a challenge → data/feature mining. One suggestion: HOG, HOF, and color self-similarity (CSS). Another option: color channels, gradient magnitude, and HOG. Why HOG? It encodes high-frequency gradient information. Why Haar? It encodes lower-frequency changes in the color channels. Why HOF? Motion cues. Spatial histogram = LBP.
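A sketch of the basic HOG building block: a magnitude-weighted histogram of unsigned gradient orientations over one cell. The function name is hypothetical; the 8-pixel cell and 9 bins follow common HOG settings, and bin interpolation and block normalisation are omitted.

```python
import math
from typing import List


def cell_orientation_histogram(img: List[List[float]], x0: int, y0: int,
                               cell: int = 8, bins: int = 9) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 degrees)
    for the cell whose top-left corner is (x0, y0)."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            # central differences, clamped at the image border
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // (180.0 / bins)) % bins] += mag
    return hist
```

A vertical step edge puts all its weight in the 0-degree bin, and a horizontal one in the 90-degree bin, which is the high-frequency gradient information the slide refers to.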

Conclusions: classification techniques aim at determining an optimal decision boundary between pattern classes in a feature space. There are a huge number of engineering details in training a classifier: asymmetric learning; rare-event detection; merging multiple firings on true pedestrians at nearby positions in scale and space; the bootstrapping method; selecting the most discriminative feature subset = classifier design.
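A sketch of the bootstrapping method (hard-negative mining), assuming a simple perceptron in place of the SVM or AdaBoost classifiers mentioned earlier (all names are hypothetical). Train on the current negatives, run the classifier over a large pool of negative examples, add the ones it wrongly fires on, and retrain.

```python
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def train_linear(pos: List[Vec], neg: List[Vec], epochs: int = 50,
                 lr: float = 1.0) -> Tuple[List[float], float]:
    """Plain perceptron from zero initialisation; stops updating once the data is separated."""
    w, b = [0.0] * len(pos[0]), 0.0
    data = [(x, 1) for x in pos] + [(x, -1) for x in neg]
    for _ in range(epochs):
        for x, y in data:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def bootstrap(pos: List[Vec], neg_seed: List[Vec], neg_pool: List[Vec], rounds: int = 3):
    """Retrain after adding the pool negatives the current classifier fires on."""
    neg = list(neg_seed)
    w, b = train_linear(pos, neg)
    for _ in range(rounds):
        hard = [x for x in neg_pool
                if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 and x not in neg]
        if not hard:
            break  # no false positives left in the pool
        neg.extend(hard)
        w, b = train_linear(pos, neg)
    return w, b, neg
```

Only the negatives that the classifier gets wrong are added, which keeps the training set small even when the pool of non-object windows is huge.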

Detection: The Process [diagram: Features → Classifier Training → Tracking]

Detection Only (FPDW algorithm): video length = 29’, video size = 720×480, number of frames = 726, FPS = 25; processing time = 12’ 45”.

Detection + Tracking, Single Target Tracking (FPDW + Kalman filtering): video length = 29’, video size = 720×480, number of frames = 726, FPS = 25; processing time = 12’ 29”.

Detection Only, Multiple Target Tracking (FPDW from Caltech): video length = 29’, video size = 320×240, number of frames = 3216, FPS = 25; processing time = 31’ 16”.

Detection + Tracking: video length = 29’, video size = 320×240, number of frames = 3216, FPS = 25; processing time = 35’ 27”.

Detection + Tracking, Multiple Target Tracking: video length = 29’, video size = 320×240, number of frames = 3216, FPS = 25; processing time = 30’ 22”. This time the FPDW detections are shown as well.

Motion Tracking using a MATLAB built-in function: processing time = 32’.