Computer Vision Template matching and object recognition Marc Pollefeys COMP 256 Some slides and illustrations from D. Forsyth, T. Tuytelaars, …

Computer Vision Tentative class schedule
Aug 26/28: - / Introduction
Sep 2/4: Cameras / Radiometry
Sep 9/11: Sources & Shadows / Color
Sep 16/18: Linear filters & edges / (hurricane Isabel)
Sep 23/25: Pyramids & Texture / Multi-View Geometry
Sep 30/Oct 2: Stereo / Project proposals
Oct 7/9: Tracking (Welch) / Optical flow
Oct 14/16: - / -
Oct 21/23: Silhouettes/carving / (Fall break)
Oct 28/30: - / Structure from motion
Nov 4/6: Project update / Proj. SfM
Nov 11/13: Camera calibration / Segmentation
Nov 18/20: Fitting / Prob. segm. & fit.
Nov 25/27: Matching templates / (Thanksgiving)
Dec 2/4: Matching relations / Range data
Dec 9: Final project

Computer Vision Discussion of assignment 2: simple stereo using SSD over 11x11 windows and a disparity range of [0,150]; results shown without and with histogram equalization.
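A minimal sketch of that SSD block-matching scheme, assuming rectified grayscale images as NumPy arrays (the window size and disparity range come from the assignment; everything else is illustrative):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ssd_disparity(left, right, window=11, max_disp=150):
    """Winner-take-all disparity map by minimizing windowed SSD (left image as reference)."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape
    best_cost = np.full((h, w), np.inf)
    disparity = np.zeros((h, w), dtype=np.int32)
    for d in range(max_disp + 1):
        diff = np.full((h, w), 1e12)               # large cost where the shifted pixel is missing
        diff[:, d:] = (left[:, d:] - right[:, :w - d]) ** 2
        cost = uniform_filter(diff, size=window)   # average squared difference over the 11x11 window
        better = cost < best_cost
        best_cost[better] = cost[better]
        disparity[better] = d
    return disparity
```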

Computer Vision Assignment 3: use Hough, RANSAC and EM to estimate a line embedded in noisy data (details on the web by tonight).
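A minimal sketch of the RANSAC part, assuming the data arrive as an (N, 2) NumPy array of 2D points (the iteration count and inlier threshold are illustrative choices, not assignment parameters):

```python
import numpy as np

def ransac_line(points, n_iters=500, inlier_thresh=2.0, seed=0):
    """Fit a line a*x + b*y + c = 0 to 2D points, robust to outliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_model = None
    for _ in range(n_iters):
        p1, p2 = points[rng.choice(len(points), size=2, replace=False)]
        # Line through the two sampled points in implicit form (a, b, c), with (a, b) unit length.
        a, b = p2[1] - p1[1], p1[0] - p2[0]
        norm = np.hypot(a, b)
        if norm < 1e-12:
            continue
        a, b = a / norm, b / norm
        c = -(a * p1[0] + b * p1[1])
        dist = np.abs(points @ np.array([a, b]) + c)   # point-to-line distances
        inliers = dist < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (a, b, c)
    return best_model, best_inliers
```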

Computer Vision Reminder: EM (Expectation Maximization). Alternate between Expectation (determine feature membership, i.e. the soft assignment weights) and Maximization (determine the ML model parameters: weighted optimization for the continuous parameters, weighted counting for the mixture weights).
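In the usual mixture-model notation (a standard statement of the two steps, not copied from the slide), the E-step computes responsibilities and the M-step performs the weighted updates:

$$\gamma_{ij} = \frac{\pi_j\, p(x_i \mid \theta_j)}{\sum_k \pi_k\, p(x_i \mid \theta_k)} \qquad \text{(E-step)}$$

$$\theta_j \leftarrow \arg\max_{\theta}\ \sum_i \gamma_{ij} \log p(x_i \mid \theta), \qquad \pi_j \leftarrow \frac{1}{N} \sum_i \gamma_{ij} \qquad \text{(M-step)}$$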

Computer Vision Last class: Recognition by matching templates. Classifiers: decision boundaries, not probability densities. PCA: dimensionality reduction. LDA: maximize discrimination.
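As a refresher, a minimal PCA sketch on flattened image templates (NumPy SVD; the array shapes and example data are illustrative):

```python
import numpy as np

def pca_project(X, k):
    """Project rows of X (n_samples x n_features) onto the top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Right singular vectors of the centered data are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                      # (k, n_features)
    return Xc @ components.T, components, mean

# Example: 100 flattened 20x20 patches reduced to 10 dimensions.
patches = np.random.rand(100, 400)
coords, components, mean = pca_project(patches, k=10)
```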

Computer Vision Last class: Recognition by matching templates. Neural networks: universal approximation property. Support Vector Machines: optimal separating hyperplane (OSH), support vectors; a convex problem, also for non-linear boundaries.

Computer Vision Matching by relations. Idea: find bits, then say the object is present if the bits are OK. Advantage: objects with complex configuration spaces don't make good templates (internal degrees of freedom, aspect changes, possibly shading variations in texture, etc.).

Computer Vision Simplest approach: define a set of local feature templates (could find these with filters, a corner detector + filters, etc.). Think of objects as patterns. Each template votes for all patterns that contain it; the pattern with the most votes wins.
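A minimal sketch of this voting scheme (the inverted index and object names are illustrative, not from the slides):

```python
from collections import Counter

# Hypothetical inverted index: which patterns (object models) contain which template.
patterns_containing = {
    "corner_A": ["mug", "stapler"],
    "corner_B": ["mug"],
    "blob_C":   ["stapler", "phone"],
}

def recognize(detected_templates):
    """Each detected template votes for every pattern containing it; most votes wins."""
    votes = Counter()
    for t in detected_templates:
        for pattern in patterns_containing.get(t, []):
            votes[pattern] += 1
    return votes.most_common(1)[0] if votes else None

print(recognize(["corner_A", "corner_B"]))  # -> ('mug', 2)
```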

Computer Vision Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997 copyright 1997, IEEE

Computer Vision Probabilistic interpretation: write the posterior for a pattern given the image, make an independence assumption over the detected templates, and obtain the likelihood of the image given the pattern (the formulas on this slide were figures; a hedged reconstruction follows).
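A hedged reconstruction of the standard decomposition (the original formulas were slide figures): Bayes' rule plus independence of the individual template detections gives

$$P(\text{pattern} \mid \text{image}) \propto P(\text{image} \mid \text{pattern})\, P(\text{pattern}), \qquad P(\text{image} \mid \text{pattern}) \approx \prod_i P(\text{template}_i \mid \text{pattern}).$$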

Computer Vision Possible alternative strategies Notice: –different patterns may yield different templates with different probabilities –different templates may be found in noise with different probabilities

Computer Vision Employ spatial relations Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997 copyright 1997, IEEE

Computer Vision Figure from “Local grayvalue invariants for image retrieval,” by C. Schmid and R. Mohr, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997 copyright 1997, IEEE

Computer Vision Example Training examples Test image

Computer Vision Finding faces using relations Strategy: –Face is eyes, nose, mouth, etc. with appropriate relations between them –build a specialised detector for each of these (template matching) and look for groups with the right internal structure –Once we’ve found enough of a face, there is little uncertainty about where the other bits could be

Computer Vision Finding faces using relations. Strategy: compare the probability that the detected features come from a face against the probability that they come from background (the comparison formula on this slide was a figure). Notice that once some facial features have been found, the position of the rest is quite strongly constrained. Figure from "Finding faces in cluttered scenes using random labelled graph matching," by Leung, T., Burl, M. and Perona, P., Proc. Int. Conf. on Computer Vision, 1995, copyright 1995, IEEE

Computer Vision Detection: this means we compare the posterior probability of "face" against that of "not face" given the detected features; the formula on this slide was a figure (a hedged reconstruction follows).
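A standard way to write the comparison (a hedged reconstruction, since the slide showed it as a figure): declare a face when the posterior ratio exceeds one,

$$\frac{P(\text{face} \mid \text{features})}{P(\text{not face} \mid \text{features})} = \frac{P(\text{features} \mid \text{face})\, P(\text{face})}{P(\text{features} \mid \text{not face})\, P(\text{not face})} > 1 .$$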

Computer Vision Issues
– Plugging in values for the position of the nose, eyes, etc.: search for the next one given what we've found.
– When to stop searching: when nothing that could be added to the group would change the decision, i.e. it's not a face whatever features are added, or it's a face and anything you can't find is occluded.
– What to do next: look for another eye? or a nose? Probably look for whichever is easiest to find.
– What if there's no nose response: marginalize.

Computer Vision Figure from, “Finding faces in cluttered scenes using random labelled graph matching,” by Leung, T. ;Burl, M and Perona, P., Proc. Int. Conf. on Computer Vision, 1995 copyright 1995, IEEE

Computer Vision Pruning Prune using a classifier –crude criterion: if this small assembly doesn’t work, there is no need to build on it. Example: finding people without clothes on –find skin –find extended skin regions –construct groups that pass local classifiers (i.e. lower arm, upper arm) –give these to broader scale classifiers (e.g. girdle)

Computer Vision Pruning Prune using a classifier –better criterion: if there is nothing that can be added to this assembly to make it acceptable, stop –equivalent to projecting classifier boundaries.

Computer Vision Horses

Computer Vision Hidden Markov Models Elements of sign language understanding –the speaker makes a sequence of signs –Some signs are more common than others –the next sign depends (roughly, and probabilistically) only on the current sign –there are measurements, which may be inaccurate; different signs tend to generate different probability densities on measurement values Many problems share these properties –tracking is like this, for example

Computer Vision Hidden Markov Models Now in each state we could emit a measurement, with probability depending on the state and the measurement We observe these measurements

Computer Vision HMM’s - dynamics

Computer Vision HMM’s - the Joint and Inference
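The formulas on this slide were figures; for a hidden state sequence $X_{1:T}$ and measurements $Y_{1:T}$, the standard HMM joint (a hedged reconstruction) factors as

$$P(X_{1:T}, Y_{1:T}) = P(X_1)\, \prod_{t=2}^{T} P(X_t \mid X_{t-1})\, \prod_{t=1}^{T} P(Y_t \mid X_t),$$

and inference (the posterior over states, or the most probable state path) exploits this factorization with dynamic programming on the trellis described next.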

Computer Vision Trellises. Each column corresponds to a measurement in the sequence. The trellis makes the collection of legal paths obvious. Now we would like the path with the smallest negative log-posterior (equivalently, the largest posterior). The trellis makes this easy, as follows.
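A minimal sketch of that dynamic-programming pass (the Viterbi algorithm) in log probabilities; the array names and shapes are illustrative, not course code:

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, observations):
    """Most probable state path through an HMM trellis.

    log_pi: (S,) log initial-state probabilities
    log_A:  (S, S) log transition probabilities, log_A[i, j] = log P(j | i)
    log_B:  (S, O) log emission probabilities, log_B[s, o] = log P(o | s)
    observations: sequence of observation indices
    """
    S, T = len(log_pi), len(observations)
    score = np.empty((T, S))                 # best log-posterior of a path ending in state s at time t
    back = np.zeros((T, S), dtype=int)
    score[0] = log_pi + log_B[:, observations[0]]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_A         # cand[i, j]: come from i, move to j
        back[t] = np.argmax(cand, axis=0)            # best predecessor for each state j
        score[t] = cand[back[t], np.arange(S)] + log_B[:, observations[t]]
    # Trace the best path backwards from the last column of the trellis.
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```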

Computer Vision Fitting an HMM I have: –sequence of measurements –collection of states –topology I want –state transition probabilities –measurement emission probabilities Straightforward application of EM –discrete vars give state for each measurement –M step is just averaging, etc.
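Concretely, the M-step's "just averaging / counting" amounts to normalized expected counts (the standard Baum-Welch re-estimates, not copied from the slides):

$$\hat a_{ij} = \frac{\sum_t \mathbb{E}\!\left[X_t = i,\, X_{t+1} = j\right]}{\sum_t \mathbb{E}\!\left[X_t = i\right]}, \qquad \hat b_i(o) = \frac{\sum_t \mathbb{E}\!\left[X_t = i\right]\mathbf{1}\!\left[Y_t = o\right]}{\sum_t \mathbb{E}\!\left[X_t = i\right]} .$$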

Computer Vision HMM’s for sign language understanding-1 Build an HMM for each word

Computer Vision HMM’s for sign language understanding-2 Build an HMM for each word Then build a language model

Computer Vision Figure from "Real time American sign language recognition using desk and wearable computer based video," T. Starner, et al., Proc. Int. Symp. on Computer Vision, 1995, copyright 1995, IEEE. User gesturing. For both isolated word recognition tasks and for recognition using a language model with five-word sentences (words always appearing in the order pronoun verb noun adjective pronoun), Starner and Pentland's system displays a word accuracy of the order of 90%. Values are slightly larger or smaller, depending on the features, the task, etc.

Computer Vision HMM’s can be spatial rather than temporal; for example, we have a simple model where the position of the arm depends on the position of the torso, and the position of the leg depends on the position of the torso. We can build a trellis, where each node represents correspondence between an image token and a body part, and do DP on this trellis.
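In the pictorial-structures formulation of Felzenszwalb and Huttenlocher (the paper credited below), the best configuration of part locations $L = (l_1, \dots, l_n)$ minimizes an appearance-plus-deformation cost, which dynamic programming solves exactly on a tree:

$$L^{*} = \arg\min_{L} \left( \sum_{i} m_i(l_i) + \sum_{(i,j) \in E} d_{ij}(l_i, l_j) \right),$$

where $m_i(l_i)$ measures how badly part $i$ matches the image at location $l_i$ and $d_{ij}$ penalizes unlikely relative placements of connected parts.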

Computer Vision Figure from "Efficient Matching of Pictorial Structures," P. Felzenszwalb and D.P. Huttenlocher, Proc. Computer Vision and Pattern Recognition, 2000, copyright 2000, IEEE

Computer Vision Recognition using local affine and photometric invariant features (Tuytelaars and Van Gool, BMVC 2000). Hybrid approach that aims to deal with large variations in viewpoint, illumination, background, and occlusions ⇒ use local invariant features. Invariant features = features that are preserved under a specific group of transformations; robust to occlusions and changes in background; robust to changes in viewpoint and illumination.

Computer Vision Transformations for planar objects: affine geometric deformations and linear photometric changes (the formulas on this slide were figures; a standard form follows).
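A standard way to write these two models (a hedged reconstruction of the slide's figures): an affine map of the plane and an independent scale-plus-offset change per color band,

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = A \begin{pmatrix} x \\ y \end{pmatrix} + \mathbf{t}, \qquad I_c'(x', y') = s_c\, I_c(x, y) + o_c, \quad c \in \{R, G, B\}.$$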

Computer Vision Local invariant features ‘Affine invariant neighborhood’

Computer Vision Local invariant features

Computer Vision Local invariant features Geometry-based region extraction –Curved edges –Straight edges Intensity-based region extraction

Computer Vision Geometry-based method (curved edges)

Computer Vision Geometry-based method (curved edges) 1.Harris corner detection

Computer Vision Geometry-based method (curved edges) 2.Canny edge detection

Computer Vision Geometry-based method (curved edges) 3. Evaluate a relative affine invariant parameter along the two edges

Computer Vision Geometry-based method (curved edges) 4. Construct a 1-dimensional family of parallelogram-shaped regions

Computer Vision Geometry-based method (curved edges) 5. Select parallelograms based on local extrema of the invariant function f
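A minimal sketch of step 1 above, the Harris corner response, in its standard form rather than the course code (the smoothing scale and the constant k are conventional, illustrative choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(img, sigma=1.5, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2 at every pixel."""
    Ix = sobel(img, axis=1)
    Iy = sobel(img, axis=0)
    # Entries of the second-moment matrix M, smoothed over a Gaussian window.
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2

# Corners are local maxima of the response above a threshold
# (non-maximum suppression is omitted in this sketch).
```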

Computer Vision Geometry-based method (straight edges) Relative affine invariant parameters are identically zero!

Computer Vision Geometry-based method (straight edges) 1.Harris corner detection

Computer Vision Geometry-based method (straight edges) 2.Canny edge detection

Computer Vision Geometry-based method (straight edges) 3.Fit lines to edges

Computer Vision Geometry-based method (straight edges) 4. Select parallelograms based on local extrema of the invariant functions

Computer Vision Intensity-based method: 1. Search for intensity extrema 2. Observe the intensity profile along rays emanating from each extremum 3. Search for the maximum of an invariant function f(t) along each ray 4. Connect the local maxima 5. Fit an ellipse 6. Double the ellipse size

Computer Vision Intensity based method

Computer Vision Comparison. Intensity-based method: more robust. Geometry-based method: fewer computations, more environments.

Computer Vision Robustness: "correct" detection of a single environment cannot be guaranteed (non-planar region; noise, quantization errors; non-linear photometric distortion; perspective distortion; …). All regions of an object / image should therefore be considered simultaneously.

Computer Vision Search for corresponding regions: 1. Extract affine invariant regions 2. Describe each region with a feature vector of moment invariants (the example invariants on this slide were shown as a figure)

Computer Vision Search for corresponding regions: 1. Extract affine invariant regions 2. Describe each region with a feature vector of moment invariants 3. Search for corresponding regions based on the Mahalanobis distance between descriptors 4. Check cross-correlation (after normalization) 5. Check the consistency of the correspondences
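A minimal sketch of steps 3-4; the descriptor arrays, inverse covariance matrix, and distance threshold are illustrative placeholders, not the authors' values:

```python
import numpy as np

def mahalanobis(x, y, cov_inv):
    """Mahalanobis distance between two moment-invariant descriptor vectors."""
    d = x - y
    return float(np.sqrt(d @ cov_inv @ d))

def normalized_cross_correlation(a, b):
    """Zero-mean normalized cross-correlation of two (geometrically normalized) patches."""
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float((a * b).mean())

def match_regions(desc1, desc2, cov_inv, max_dist=3.0):
    """Return candidate correspondences whose descriptor distance is below max_dist."""
    matches = []
    for i, x in enumerate(desc1):
        dists = [mahalanobis(x, y, cov_inv) for y in desc2]
        j = int(np.argmin(dists))
        if dists[j] < max_dist:
            matches.append((i, j, dists[j]))
    return matches
```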

Computer Vision Semi-local constraints = check the consistency of correspondences. Geometric constraints: epipolar constraint (RANSAC), based on 7 points. Photometric constraints: based on a combination of only 2 regions.
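A minimal sketch of the epipolar-constraint check; OpenCV's RANSAC-based fundamental-matrix routine stands in for the authors' own 7-point implementation, and the point arrays and threshold are illustrative:

```python
import numpy as np
import cv2

def epipolar_filter(pts1, pts2, ransac_thresh=1.0):
    """Keep only correspondences consistent with a single epipolar geometry.

    pts1, pts2: (N, 2) float32 arrays of matched region centers in the two views.
    """
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, ransac_thresh, 0.99)
    if F is None or mask is None:            # estimation can fail for degenerate input
        return None, pts1[:0], pts2[:0]
    inliers = mask.ravel().astype(bool)
    return F, pts1[inliers], pts2[inliers]
```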

Computer Vision Experimental validation: number of matches (symmetric / correct) as a function of viewing angle in degrees (plot).

Computer Vision Experimental validation: number of matches (symmetric / correct) as a function of scale change (plot).

Computer Vision Experimental validation: number of matches (symmetric / correct) under changing illumination, relative to a reference image (plot).

Computer Vision Object recognition and localization. 'Appearance'-based approach = objects are modeled by a set of reference images. Voting principle based on the number of similar regions. More invariance = fewer reference images required.

Computer Vision Object recognition and localization (example results)

Computer Vision Wide-baseline stereo (example results)

Computer Vision Content-based image retrieval from a database = searching for 'similar' images in a database based on image content, using local features. Similarity = the images contain the same object or the same scene. Voting principle: based on the number of similar regions.

Computer Vision Content-based image retrieval from database: a search image is matched against a database of more than 450 images (figures).

Computer Vision Content-based image retrieval from database (example results)

Computer Vision Application: virtual museum guide

Computer Vision Next class: Range data Reading: Chapter 21