Computer Vision 776 Jan-Michael Frahm 12/05/2011 Many slides from Derek Hoiem, James Hays.



Computer Vision Builds On…

- Image Processing - to extract low-level information from images (e.g., the structure tensor in Harris corners; gradients, smoothing, …)
- Machine Learning - to make decisions based on data (e.g., SVMs, …)

Fundamentals of Computer Vision

- Geometry - how to relate world coordinates and image coordinates
- Matching - how to measure the similarity of two regions
- Alignment - how to align points/patches, and how to recover transformation parameters from matched points
- Grouping - which points/regions/lines belong together?
- Categorization / Recognition - which similarities are important?

Geometry

- x = K [R t] X
  - Maps a 3D point X to a 2D point x
  - The rotation R and translation t map X into 3D camera coordinates
  - The intrinsic matrix K projects from 3D to 2D
- Parallel lines in 3D converge at a vanishing point in the image
  - A 3D plane has a vanishing line in the image
- x'^T F x = 0
  - Points in two views that correspond to the same 3D point are related by the fundamental matrix F
  - If the calibration is known, F becomes the essential matrix E, with 5 instead of 7 degrees of freedom
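The projection equation x = K [R t] X can be sketched in a few lines. This is a minimal illustration, not library code; the intrinsics (f = 500 px, principal point (320, 240)) are made-up example values.

```python
# Pinhole projection x = K [R | t] X (minimal sketch; values are illustrative)

def project(K, R, t, X):
    # Camera coordinates: Xc = R X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Homogeneous image coordinates: x = K Xc, then divide by the third entry
    x = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    return (x[0] / x[2], x[1] / x[2])

K = [[500, 0, 320],    # focal length 500 px, principal point (320, 240)
     [0, 500, 240],
     [0, 0, 1]]
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # identity rotation
t = [0, 0, 0]                           # zero translation

# A point straight ahead of the camera projects to the principal point:
print(project(K, R, t, [0.0, 0.0, 2.0]))   # -> (320.0, 240.0)
```

Note the division by the third homogeneous coordinate: that perspective divide is what makes parallel 3D lines converge at a vanishing point in the image.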

Matching

Does this patch match that patch?
- In two simultaneous views? (stereo)
- In two successive frames? (tracking, e.g. KLT; optical flow; structure from motion)
- In two pictures of the same object? (recognition)

Matching

Representation: be invariant/robust to the expected deformations, but nothing else.
- Often assume that shape is constant
  - Key cue: local differences in shading (e.g., gradients, binary features, …)
- Change in viewpoint
  - Rotation invariance: rotate and/or affine-warp the patch according to dominant orientations
- Change in lighting or camera gain
  - Average intensity invariance: oriented gradient-based matching
  - Contrast invariance: normalize gradients by magnitude
- Small translations
  - Translation robustness: histograms over small regions

But can one representation do all of this?
- SIFT: local normalized histograms of oriented gradients provide robustness to in-plane rotation, lighting, contrast, and translation
- HOG: like SIFT, but does not rotate to the dominant orientation
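The core building block of SIFT/HOG-style descriptors mentioned above is a magnitude-weighted histogram of gradient orientations; normalizing the histogram provides the contrast invariance. A minimal sketch (the 8-bin layout is a conventional choice, and the toy gradient arrays in the test are illustrative):

```python
import math

def orientation_histogram(gx, gy, bins=8):
    """Histogram of gradient orientations, weighted by gradient magnitude.

    gx, gy: same-size 2D lists holding the x- and y-gradients of a patch.
    Returns a normalized `bins`-long histogram (sums to 1 for nonzero input).
    """
    hist = [0.0] * bins
    for row_x, row_y in zip(gx, gy):
        for dx, dy in zip(row_x, row_y):
            mag = math.hypot(dx, dy)                 # gradient magnitude
            ang = math.atan2(dy, dx) % (2 * math.pi)  # orientation in [0, 2*pi)
            hist[int(ang / (2 * math.pi) * bins) % bins] += mag
    # Normalize by total magnitude -> contrast invariance
    total = sum(hist) or 1.0
    return [h / total for h in hist]
```

A full SIFT descriptor concatenates several such histograms over a 4x4 grid of subregions, which is what gives the robustness to small translations.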

Alignment of points

Search: efficiently align matching patches.
- Interest points: find repeatable, distinctive points
  - Long-range matching: e.g., wide-baseline stereo, panoramas, object instance recognition
  - Harris: points with strong gradients in orthogonal directions (e.g., corners) are precisely repeatable in x-y
  - Difference of Gaussians: points with a peak response in the Laplacian image pyramid are somewhat repeatable in x-y-scale (e.g., SIFT)
- Local search
  - Short-range matching: e.g., tracking, optical flow
  - Gradient descent on patch SSD, often with an image pyramid
- Windowed search
  - Long-range matching: e.g., recognition, stereo with scanline search
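The Harris criterion above (strong gradients in orthogonal directions) amounts to accumulating the structure tensor M over a window and thresholding the response det(M) - k * trace(M)^2. A toy sketch, assuming the image gradients Ix, Iy over the window are already computed:

```python
def harris_response(Ix, Iy, k=0.04):
    """Harris corner response for one window of image gradients.

    Ix, Iy: same-size 2D lists of x- and y-gradients over the window.
    Positive response -> corner; negative -> edge; near zero -> flat.
    """
    # Accumulate the structure tensor M = [[Sxx, Sxy], [Sxy, Syy]]
    Sxx = Sxy = Syy = 0.0
    for row_x, row_y in zip(Ix, Iy):
        for ix, iy in zip(row_x, row_y):
            Sxx += ix * ix
            Sxy += ix * iy
            Syy += iy * iy
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace * trace
```

With gradients in two orthogonal directions both eigenvalues of M are large, so det(M) dominates and the response is positive; along an edge one eigenvalue is near zero and the response goes negative.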

Alignment of sets

Find a transformation to align matching sets of points.
- Geometric transformation (e.g., affine)
- Least-squares fit (SVD), if all matches can be trusted
- Hough transform: each potential match votes for a range of parameters
  - Works well if there are very few parameters (3-4)
- RANSAC: repeatedly sample potential matches, compute parameters, and check for inliers
  - Works well if the fraction of inliers is reasonable and there are few parameters (2-12)
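The RANSAC loop described above can be sketched for the simplest case, a 2D translation, where the minimal sample is a single match. The function name, threshold, and iteration count are illustrative choices, not a reference implementation:

```python
import random

def ransac_translation(matches, iters=200, thresh=2.0, seed=0):
    """Estimate a 2D translation from point matches [(pA, pB), ...],
    tolerating outliers via the RANSAC sample-and-verify loop."""
    rng = random.Random(seed)
    best_t, best_inliers = None, []
    for _ in range(iters):
        # Minimal sample for a translation: one match
        (ax, ay), (bx, by) = rng.choice(matches)
        t = (bx - ax, by - ay)
        # Count matches consistent with this hypothesis
        inliers = [m for m in matches
                   if abs(m[1][0] - m[0][0] - t[0]) < thresh
                   and abs(m[1][1] - m[0][1] - t[1]) < thresh]
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = t, inliers
    return best_t, best_inliers
```

In practice the model is refit to all inliers by least squares at the end; for higher-parameter models (homography, fundamental matrix) the minimal sample grows to 4 or 7-8 matches, which is why RANSAC degrades as the parameter count rises.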

Grouping

- Clustering: group items (patches, pixels, lines, etc.) that have similar appearance
  - Discretize continuous values; typically, represent the points within a cluster by its center
  - Improve efficiency: e.g., cluster interest points before recognition
  - Summarize data
- Segmentation: group pixels into regions of coherent color, texture, motion, and/or label
  - Mean-shift clustering
  - Graph-based segmentation: e.g., normalized cuts
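The clustering step used to discretize continuous values is often plain k-means (Lloyd's algorithm). A toy 2D sketch, with made-up points and initial centers; real descriptor clustering runs in 128-D (SIFT) with many more clusters:

```python
def kmeans(points, centers, iters=10):
    """Lloyd's k-means: alternate assigning points to the nearest center
    and moving each center to the mean of its assigned points."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            # Assign p to its nearest center (squared Euclidean distance)
            i = min(range(len(centers)),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            groups[i].append(p)
        # Recompute each center as the mean of its group (keep empty ones)
        centers = [(sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
                   if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers
```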

Categorization

Match objects, parts, or scenes that may vary in appearance.
- Categories are typically defined by humans and may be related by function, location, or other non-visual attributes
- Key problem: what are the important similarities?
  - These can be learned from training examples

[Training pipeline: Training Images + Training Labels -> Image Features -> Classifier Training -> Trained Classifier]

Categorization

Representation: ideally compact, comprehensive, and direct.
- Histograms of quantized local descriptors (SIFT, HOG), color, texture (bag-of-words approaches)
  - Typical for image or region categorization
  - The degree of spatial encoding is controllable by using spatial pyramids
- HOG features at a specified position
  - Often used for finding parts or objects
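A bag-of-words histogram simply assigns each local descriptor to its nearest vocabulary word (cluster center) and counts. A minimal sketch; the 2-D "descriptors" and two-word vocabulary are toy stand-ins for 128-D SIFT vectors and a clustered vocabulary of thousands of words:

```python
def bag_of_words(descriptors, vocabulary):
    """Histogram of quantized local descriptors over a fixed vocabulary."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        # Quantize: index of the nearest vocabulary word (squared distance)
        i = min(range(len(vocabulary)),
                key=lambda w: sum((a - b) ** 2
                                  for a, b in zip(d, vocabulary[w])))
        hist[i] += 1
    return hist
```

Because the histogram discards where each descriptor came from, all spatial layout is lost; spatial pyramids recover some of it by computing one such histogram per cell of a coarse-to-fine grid and concatenating them.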

Object Categorization

Search by sliding-window detector: at every window location, ask "object or background?"
- May work well for rigid objects
- Key idea: simple alignment for simple deformations
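The sliding-window search can be sketched as an exhaustive scan that keeps the best-scoring window; here `score_fn` is a stand-in for a trained object-vs-background classifier, and the window size and stride are illustrative:

```python
def sliding_window_detect(score_fn, width, height, win=8, step=4):
    """Scan a fixed-size window over a width x height image and return
    (best_score, (x, y)) for the highest-scoring window position."""
    best = (float("-inf"), None)
    for y in range(0, height - win + 1, step):
        for x in range(0, width - win + 1, step):
            best = max(best, (score_fn(x, y, win), (x, y)))
    return best
```

Real detectors also scan over scale (an image pyramid) and keep all windows above a threshold followed by non-maximum suppression, rather than just the single best window.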

Object Categorization

Search with a parts-based model.
- Key idea: more flexible alignment for articulated objects
- Defined by models of part appearance, geometry or spatial layout, and a search algorithm

Vision as part of an intelligent system

[Diagram: 3D Scene -> Feature Extraction (texture, color, optical flow, stereo disparity, grouping, motion patterns) -> Interpretation (surfaces, bits of objects, sense of depth, objects, agents and goals, shapes and properties, open paths, words) -> Action (walk, touch, contemplate, smile, evade, read on, pick up, …)]

Important open problems

Computer vision is potentially worth major $$$, but there are major challenges to overcome first.
- Driver assistance
  - Mobileye received >$100M in funding from Goldman Sachs
- Entertainment (Kinect, movies, etc.)
  - Intel is spending $100M on visual computing over the next five years
- Security
  - Potential for billions of deployed cameras
- Robot workers
- Many more


Important open problems

Object category recognition: where is the cat?

Important questions:
- How can we better align two object instances?
- How do we identify the important similarities of objects within a category?
- How do we tell if two patches depict similar shapes?


Important open problems

Spatial understanding: what is it doing? Or how do I do it?

Important questions:
- What are good representations of space for navigation and interaction?
- What kinds of details are important?
- How can we combine single-image cues with multi-view cues?


Important open problems

Object representation: what is it?

Important questions:
- How can we pose recognition so that it lets us deal with new objects?
- What do we want to predict or infer, and to what extent does that rely on categorization?
- How do we transfer knowledge of one type of object to another?

Open Problems

- Can we build a "core" vision system that can easily be extended to perform new tasks, or even learn on its own?
- What kinds of representations might allow this?
- What should be built in, and what should be learned?