Computer Vision: Summary and Anti-summary. CS 543 / ECE 549, University of Illinois. Derek Hoiem, 05/04/2010.



Today’s class Administrative stuff – HW 4 due – Posters next Tues + reports due – Feedback and Evaluation (end of class) Review of important concepts Anti-summary: important concepts that were not covered Wide open problems

Review Geometry Matching Grouping Categorization

Geometry (on board) Projection matrix P = K [R t] relates 2D point x and 3D point X in homogeneous coordinates: x = P X Parallel lines in 3D converge at the vanishing point in the image – A 3D plane has a vanishing line in the image In two views, points that correspond to the same 3D point are related by the fundamental matrix: x'^T F x = 0
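The projection equation x = P X can be sketched numerically. The intrinsics K, rotation R, and translation t below are made-up values chosen only for illustration, not anything from the lecture:

```python
import numpy as np

# Hypothetical camera parameters, chosen only for illustration.
K = np.array([[500.0,   0.0, 320.0],   # focal length 500 px,
              [  0.0, 500.0, 240.0],   # principal point (320, 240)
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # camera axes aligned with the world
t = np.array([[0.0], [0.0], [5.0]])    # world origin 5 units ahead

P = K @ np.hstack([R, t])              # 3x4 projection matrix P = K [R t]

def project(P, X):
    """Map a 3D point X to pixel coordinates via x = P X (homogeneous)."""
    xh = P @ np.append(X, 1.0)         # append 1 -> homogeneous 3D point
    return xh[:2] / xh[2]              # divide out the projective scale

# With this setup the world origin lands on the principal point.
x = project(P, np.array([0.0, 0.0, 0.0]))
```

The divide-by-scale step is where perspective effects (and vanishing points) come from: points farther along the optical axis get a larger homogeneous scale and shrink toward the principal point.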

Matching Does this patch match that patch? – In two simultaneous views? (stereo) – In two successive frames? (tracking, flow, SFM) – In two pictures of the same object? (recognition)

Matching Representation: be invariant/robust to expected deformations but nothing else Change in viewpoint – Rotation invariance: rotate and/or affine warp patch according to dominant orientations Change in lighting or camera gain – Average intensity invariance: oriented gradient-based matching – Contrast invariance: normalize gradients by magnitude Small translations – Translation robustness: histograms over small regions But can one representation do all of this? SIFT: local normalized histograms of oriented gradients provides robustness to in-plane orientation, lighting, contrast, translation HOG: like SIFT but does not rotate to dominant orientation
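A toy version of the normalized orientation-histogram idea can be written in a few lines. This is a sketch of the principle only, not SIFT itself (no spatial cells, no dominant-orientation alignment, no keypoint detection):

```python
import numpy as np

def orientation_histogram(patch, n_bins=8):
    """Toy SIFT/HOG-style descriptor: a histogram of gradient orientations
    over a patch, weighted by gradient magnitude and L2-normalized, so it
    ignores additive brightness and (roughly) multiplicative contrast."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)            # orientations in [0, 2pi)
    bins = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# A vertical step edge: all gradients point horizontally, so one bin dominates.
patch = np.tile(np.array([0.0, 0.0, 1.0, 1.0]), (4, 1))
h = orientation_histogram(patch)
```

Note the two invariances the slide lists fall out directly: gradients cancel any constant added to the patch, and the final normalization cancels any overall gain.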

Matching Search: efficiently localize matching patches Interest points: find repeatable, distinctive points – Long-range matching: e.g., wide baseline stereo, panoramas, object instance recognition – Harris: points with strong gradients in orthogonal directions (e.g., corners) are precisely repeatable in x-y – Difference of Gaussian: points with peak response in Laplacian image pyramid are somewhat repeatable in x-y-scale Local search – Short range matching: e.g., tracking, optical flow – Gradient descent on patch SSD, often with image pyramid Windowed search – Long-range matching: e.g., recognition, stereo w/ scanline
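The Harris criterion above can be sketched as follows. This is a bare-bones illustration (a 3x3 box window instead of the usual Gaussian weighting, an arbitrary k, no non-maximum suppression), not a production detector:

```python
import numpy as np

def box3(a):
    """Sum each value over its 3x3 neighborhood (zero-padded at borders)."""
    p = np.pad(a, 1)
    h, w = a.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def harris_response(img, k=0.05):
    """Harris response R = det(M) - k * trace(M)^2, where M is the
    second-moment matrix of image gradients summed over a 3x3 window.
    R > 0 at corners, R < 0 on edges, R ~ 0 on flat regions."""
    Iy, Ix = np.gradient(img.astype(float))
    Ixx, Iyy, Ixy = box3(Ix * Ix), box3(Iy * Iy), box3(Ix * Iy)
    return Ixx * Iyy - Ixy ** 2 - k * (Ixx + Iyy) ** 2

# A bright square: the response is positive near its four corners,
# negative along its edges, and zero on flat background.
img = np.zeros((10, 10))
img[3:7, 3:7] = 1.0
R = harris_response(img)
```

The "strong gradients in orthogonal directions" intuition is visible in det(M): it is large only when the windowed gradients span both directions, which is exactly what makes the point localizable in x-y.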

Matching Registration: match sets of points that satisfy deformation constraints Geometric transformation (e.g., affine) – Least squares fit (SVD), if all matches can be trusted – Hough transform: each potential match votes for a range of parameters Works well if there are very few parameters (3-4) – RANSAC: repeatedly sample potential matches, compute parameters, and check for inliers Works well if fraction of inliers is high and few parameters (4-8) Other cases – One-to-one correspondence (Hungarian algorithm) – Small local deformation of ordered points
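The RANSAC loop is easiest to demonstrate on the simplest registration model, a pure 2D translation, where a single match determines the model. The matches below are synthetic, made up for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def ransac_translation(src, dst, n_iters=100, thresh=1.0):
    """RANSAC for a 2D translation: repeatedly sample one putative match,
    hypothesize t = dst - src, keep the hypothesis with the most inliers
    (matches whose residual is below thresh), then refit on the inliers."""
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        i = rng.integers(len(src))
        t = dst[i] - src[i]                            # model from a minimal sample
        inliers = np.linalg.norm(src + t - dst, axis=1) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Least-squares refit on inliers (for a translation: the mean offset).
    t_hat = (dst[best_inliers] - src[best_inliers]).mean(axis=0)
    return t_hat, best_inliers

# Toy data: 20 matches shifted by (5, -3), plus 5 grossly wrong matches.
src = rng.uniform(0, 100, size=(25, 2))
dst = src + np.array([5.0, -3.0])
dst[20:] += rng.uniform(50, 100, size=(5, 2))          # corrupt the last 5
t_hat, inliers = ransac_translation(src, dst)
```

With an 80% inlier fraction and one-point samples, 100 iterations make it overwhelmingly likely that at least one sample is clean, which is the "works well if fraction of inliers is high and few parameters" condition from the slide.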

Grouping Clustering: group items (patches, pixels, lines, etc.) that have similar appearance – Discretize continuous values; typically, represent points within cluster by center – Improve efficiency: e.g., cluster interest points before recognition – Enable counting: histograms of interest points, color, texture Segmentation: group pixels into regions of coherent color, texture, motion, and/or label – Mean-shift clustering – Watershed – Graph-based segmentation: e.g., MRF and graph cuts EM, mixture models: probabilistically group items that are likely to be drawn from the same distribution, while estimating the distributions’ parameters
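The "represent points within a cluster by their center" idea can be illustrated with a minimal Lloyd's-algorithm k-means. This is a sketch only (fixed iteration count, one random initialization, only a trivial empty-cluster guard):

```python
import numpy as np

def kmeans(points, k, n_iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm): alternate between assigning each
    point to its nearest center and moving each center to the mean of the
    points assigned to it."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iters):
        d = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):                   # skip empty clusters
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

# Two well-separated blobs; k-means recovers one center per blob.
pts = np.vstack([np.zeros((10, 2)), 10 + np.zeros((10, 2))])
centers, labels = kmeans(pts, 2)
```

The same loop is what builds visual-word vocabularies: cluster interest-point descriptors once, then histogram cluster assignments at recognition time instead of matching raw descriptors.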

Categorization Match objects, parts, or scenes that may vary in appearance Categories are typically defined by humans and may be related by function, cost, or other non-visual attributes Naïve matching or clustering approach will usually fail – Elements within a category often do not have obvious visual similarities – Possible deformations are not easily defined Typically involves an example-based machine learning approach (Diagram: Training Images + Training Labels → Image Features → Classifier Training → Trained Classifier)
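The features-then-classifier training pipeline on this slide can be caricatured with the simplest example-based learner, a nearest-class-mean classifier. The "feature vectors" below are made up for illustration:

```python
import numpy as np

class NearestMeanClassifier:
    """Minimal stand-in for the features -> training -> trained-classifier
    pipeline: fit stores one mean feature vector per class label, and
    predict assigns each new feature vector to the nearest class mean."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None] - self.means_[None], axis=2)
        return self.classes_[d.argmin(axis=1)]

# Toy "image features": class 0 near the origin, class 1 near (5, 5).
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])
clf = NearestMeanClassifier().fit(X, y)
pred = clf.predict(np.array([[0.5, 0.5], [5.0, 5.5]]))
```

Real systems swap in stronger features (SIFT/HOG histograms) and stronger learners (SVMs, boosted classifiers), but the train-on-labeled-features, predict-on-new-features structure is the same.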

Categorization Representation: ideally should be compact, comprehensive, direct Histograms of quantized interest points (SIFT, HOG), color, texture – Typical for image or region categorization – Degree of spatial invariance is controllable by using spatial pyramids HOG features at specified position – Often used for finding parts or objects

Object Categorization Sliding Window Detector May work well for rigid objects Combines window-based matching search with a feature-based classifier that labels each window as object or background
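The window-based search plus per-window classifier combination reduces to a double loop over positions. Here a stand-in scorer (mean brightness) replaces a real trained classifier, and the image, window size, and threshold are all arbitrary:

```python
import numpy as np

def sliding_window_detect(img, score_fn, win=8, stride=4, thresh=0.5):
    """Sliding-window detection: score every win x win window with a
    classifier (score_fn, any callable on a patch) and return the
    (row, col, score) of windows whose score exceeds a threshold."""
    detections = []
    for r in range(0, img.shape[0] - win + 1, stride):
        for c in range(0, img.shape[1] - win + 1, stride):
            s = score_fn(img[r:r + win, c:c + win])
            if s > thresh:
                detections.append((r, c, s))
    return detections

# Stand-in "classifier": mean brightness; the bright patch is the "object".
img = np.zeros((32, 32))
img[8:16, 8:16] = 1.0
hits = sliding_window_detect(img, lambda p: p.mean())
```

A real detector would also scan over scale (an image pyramid) and apply non-maximum suppression to overlapping hits; both are omitted here to keep the search structure visible.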

Object Categorization Parts-based model Defined by models of part appearance, geometry or spatial layout, and search algorithm May work better for articulated objects

Vision as part of an intelligent system (Diagram: 3D Scene → Feature Extraction → Interpretation → Action. Features: color, texture, optical flow, stereo disparity, grouping, motion patterns. Interpretations: surfaces, bits of objects, sense of depth, objects, agents and goals, shapes and properties, open paths, words. Actions: walk, touch, contemplate, smile, evade, read on, pick up, …)

Anti-summary Summary of things not covered Term “anti-summary” from Murphy’s book “Big Book of Concepts”

Context Biederman’s Relations among Objects in a Well-Formed Scene (1981): – Support – Size – Position – Interposition – Likelihood of Appearance Hock, Romanski, Galie, & Williams 1978

Machine learning Probabilistic approaches – Logistic regression – Bayesian network (special case: tree-shaped graph) Support vector machines Energy-based approaches – Conditional random fields (special case: lattice) – Graph cuts and belief propagation (BP-TRW) Further reading: Learning from Data: Concepts, Theory, and Methods by Cherkassky and Mulier (2007): I have not read this, but reviews say it is good for SVM and statistical learning; Machine Learning by Tom Mitchell (1997): good but somewhat outdated introduction to learning; Heckerman's tutorial on learning with Bayesian Networks (1995)

3D Object Models Aspect graphs (see Forsyth and Ponce) Model 3D spatial configuration of parts – E.g., see recent work by Silvio Savarese and others

Action recognition Common method: compute SIFT + optical flow on image, classify Starting to move beyond this (Video examples: tennis serve, volleyball smash)

Geometry Beyond pinhole cameras – Special cameras, radial distortion Geometry from 3 or more views – Trifocal tensors Much of this covered in Forsyth and Ponce or Hartley and Zisserman

Other sensors Infrared, LIDAR, structured light

Physics-based vision Models of atmospheric effects, reflection, partial transparency – Depth from fog – Removing rain from photos – Rendering – See Shree Nayar’s projects:

Wide-open problems Computer vision is potentially worth billions per year, but there are major challenges to overcome first. Major $$$ Driver assistance (MobileEye received >$100M in funding from Goldman Sachs) Entertainment (MS Natal and others) Robot workers

Wide-open problems How do you represent actions and activities? – Probably involves motion, pose, objects, and goals – Compositional

Wide-open problems How to build vision systems that manage the visual world’s complexity? – Potentially thousands of categories but it depends how you think about it

Wide-open problems How should we adjust vision systems to solve particular tasks? – How do we know what is important?

Wide-open problems Can we build a “core” vision system that can easily be extended to perform new tasks or even learn on its own? – What kind of representations might allow this? – What should be built in and what should be learned?

See you next week! Projects on Tuesday – Pizza provided – Posters: 24” wide x 32” tall Online feedback forms

Wide-open problems How do you represent actions and activities? – Usually involve motion and objects – Compositional – Probably involves motion fields + pose estimation + object recognition + goal reasoning How to build vision systems that manage the visual world’s complexity? – Potentially thousands of categories but it depends how you think about it How should we adjust vision systems to solve particular tasks? – How do we know what is important? Can we build a “core” vision system that can easily be extended to perform new tasks or even learn on its own? – What should be built in and what should be learned? – What kind of representations might allow this?

Anti-summary outline Context – (well-formed scene) Physics-based vision Machine learning – Probabilistic approaches Logistic regression Bayesian network (special case: tree-shaped graph) – Support vector machines – Energy-based approaches Conditional random fields (special case: lattice) Graph cuts and belief propagation (BP-TRW) Action recognition – Current methods: bag of SIFT + optical flow Geometry – Beyond perspective projection (non-pinhole cameras, radial distortion) – Tri-focal tensors Object recognition – Aspect graphs (how does an object’s appearance change as you move around it?) – 3D object models Other sensors: laser range finders, infrared, structured light

The Vision Process Scene → two eyes → extract low-level representation: {color, edges/bars/blobs, motion, stereo} → interpretation: {3D surfaces, physical properties, distance, objects of interest, reading, reasoning about agents, adjusting actions}

Summary outline Geometry – Single-view (projection matrix) – Multiview (epipolar geometry) – Vanishing points + SFM + stereo Matching – Image filtering – Interest points – RANSAC and Hough Tracking – Repeated detection + simple motion model (e.g., doesn’t move far, constant velocity) Categorization – HOG + histograms – Matching and modeling