Summary and Discussion

Slides:



Advertisements
Similar presentations
Feature extraction: Corners
Advertisements

Summary of Friday A homography transforms one 3d plane to another 3d plane, under perspective projections. Those planes can be camera imaging planes or.
TP14 - Local features: detection and description Computer Vision, FCUP, 2014 Miguel Coimbra Slides by Prof. Kristen Grauman.
Object Recognition using Invariant Local Features Applications l Mobile robots, driver assistance l Cell phone location or object recognition l Panoramas,
Object Recognition & Model Based Tracking © Danica Kragic Tracking system.
Matching with Invariant Features
Object Recognition with Invariant Features n Definition: Identify objects or scenes and determine their pose and model parameters n Applications l Industrial.
Feature extraction: Corners 9300 Harris Corners Pkwy, Charlotte, NC.
Advanced Computer Vision Introduction Goal and objectives To introduce the fundamental problems of computer vision. To introduce the main concepts and.
1 Image Recognition - I. Global appearance patterns Slides by K. Grauman, B. Leibe.
A Study of Approaches for Object Recognition
Object Recognition with Invariant Features n Definition: Identify objects or scenes and determine their pose and model parameters n Applications l Industrial.
Feature matching and tracking Class 5 Read Section 4.1 of course notes Read Shi and Tomasi’s paper on.
Automatic Image Alignment (feature-based) : Computational Photography Alexei Efros, CMU, Fall 2005 with a lot of slides stolen from Steve Seitz and.
Feature extraction: Corners and blobs
1 Invariant Local Feature for Object Recognition Presented by Wyman 2/05/2006.
Automatic Image Alignment (feature-based) : Computational Photography Alexei Efros, CMU, Fall 2006 with a lot of slides stolen from Steve Seitz and.
Computer Vision: Summary and Discussion Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/30/2015.
Computer vision: models, learning and inference
CSE 185 Introduction to Computer Vision
Computer Vision: Summary and Discussion
Distinctive Image Features from Scale-Invariant Keypoints By David G. Lowe, University of British Columbia Presented by: Tim Havinga, Joël van Neerbos.
Computer vision.
Final Exam Review CS485/685 Computer Vision Prof. Bebis.
Computer Vision: Summary and Anti-summary Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/04/2010.
Computer Vision: Summary and Discussion Computer Vision ECE 5554 Virginia Tech Devi Parikh 11/21/
CSCE 643 Computer Vision: Structure from Motion
Computer Vision Why study Computer Vision? Images and movies are everywhere Fast-growing collection of useful applications –building representations.
Computer Science Department Pacific University Artificial Intelligence -- Computer Vision.
Computer Vision: Summary and Discussion Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/03/2011.
Computer Vision 776 Jan-Michael Frahm 12/05/2011 Many slides from Derek Hoiem, James Hays.
Feature extraction: Corners 9300 Harris Corners Pkwy, Charlotte, NC.
Lecture 7: Features Part 2 CS4670/5670: Computer Vision Noah Snavely.
Feature extraction: Corners and blobs. Why extract features? Motivation: panorama stitching We have two images – how do we combine them?
CSE 185 Introduction to Computer Vision Feature Matching.
Project 3 questions? Interest Points and Instance Recognition Computer Vision CS 143, Brown James Hays 10/21/11 Many slides from Kristen Grauman and.
Local features: detection and description
Lecture 9 Feature Extraction and Motion Estimation Slides by: Michael Black Clark F. Olson Jean Ponce.
CS654: Digital Image Analysis
776 Computer Vision Jan-Michael Frahm Spring 2012.
Robotics Chapter 6 – Machine Vision Dr. Amit Goradia.
Announcements Final is Thursday, March 18, 10:30-12:20 –MGH 287 Sample final out today.
1 Review and Summary We have covered a LOT of material, spending more time and more detail on 2D image segmentation and analysis, but hopefully giving.
Invariant Local Features Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging.
776 Computer Vision Jan-Michael Frahm Spring 2012.
Computer Vision: Summary and Discussion Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 05/01/2012.
Another Example: Circle Detection
Presented by David Lee 3/20/2006
Project 1: hybrid images
TP12 - Local features: detection and description
Announcements Final is Thursday, March 20, 10:30-12:20pm
Computer Vision: Summary and Discussion
Local features: detection and description May 11th, 2017
Paper Presentation: Shape and Matching
Feature description and matching
Common Classification Tasks
Features Readings All is Vanity, by C. Allan Gilbert,
CSE 455 – Guest Lectures 3 lectures Contact Interest points 1
Brief Review of Recognition + Context
The SIFT (Scale Invariant Feature Transform) Detector and Descriptor
Filtering Things to take away from this lecture An image as a function
KFC: Keypoints, Features and Correspondences
Announcements Final is Thursday, March 16, 10:30-12:20
SIFT keypoint detection
CSE 185 Introduction to Computer Vision
Lecture VI: Corner and Blob Detection
Computational Photography
Feature descriptors and matching
Filtering An image as a function Digital vs. continuous images
Presented by Xu Miao April 20, 2005
Presentation transcript:

Summary and Discussion Computer Vision Jia-Bin Huang, Virginia Tech Many slides from D. Hoiem

Administrative stuffs Final project Poster (40%) Presentation (70%) Summary (30%) Webpage report (50%) Final exam Module 1 – module 5 Simple concept questions (18 questions) One letter size cheat sheet allowed (2 side) SPOT evaluation due Dec 8th

Today’s class Review of important concepts Some important open problems Feedback and course evaluation Tie together concepts that were disjoint in class. Identify some major research problems.

Fundamentals of Computer Vision Light What an image records Geometry How to relate world coordinates and image coordinates Matching How to measure the similarity of two regions Alignment How to align points/patches How to recover transformation parameters based on matched points Grouping What points/regions/lines belong together? Categorization What similarities are important?

Light and Color Shading of diffuse materials depends on albedo and orientation wrt light Gradients are a major cue for changes in orientation (shape) Many materials have a specular component that directly reflects light Reflected color depends on albedo and light color RGB is default color space, but sometimes others (e.g., HSV, L*a*b) are more useful Image from Koenderink

Geometry x = K [R t] X Maps 3d point X to 2d point x Rotation R and translation t map into 3D camera coordinates Intrinsic matrix K projects from 3D to 2D Parallel lines in 3D converge at the vanishing point in the image A 3D plane has a vanishing line in the image x’T F x = 0 Points in two views that correspond to the same 3D point are related by the fundamental matrix F

Matching Does this patch match that patch? ? ? In two simultaneous views? (stereo) In two successive frames? (tracking, flow, SFM) In two pictures of the same object? (recognition) Add picture here ? ?

Matching Representation: be invariant/robust to expected deformations but nothing else Assume that shape does not change Key cue: local differences in shading (e.g., gradients) Change in viewpoint Rotation invariance: rotate and/or affine warp patch according to dominant orientations Change in lighting or camera gain Average intensity invariance: oriented gradient-based matching Contrast invariance: normalize gradients by magnitude Small translations Translation robustness: histograms over small regions But can one representation do all of this? SIFT: local normalized histograms of oriented gradients provides robustness to in-plane orientation, lighting, contrast, translation HOG: like SIFT but does not rotate to dominant orientation

Alignment of points Search: efficiently align matching patches Interest points: find repeatable, distinctive points Long-range matching: e.g., wide baseline stereo, panoramas, object instance recognition Harris: points with strong gradients in orthogonal directions (e.g., corners) are precisely repeatable in x-y Difference of Gaussian: points with peak response in Laplacian image pyramid are somewhat repeatable in x-y- scale Local search Short range matching: e.g., tracking, optical flow Gradient descent on patch SSD, often with image pyramid Windowed search Long-range matching: e.g., recognition, stereo w/ scanline

Alignment of sets Find transformation to align matching sets of points Geometric transformation (e.g., affine) Least squares fit (SVD), if all matches can be trusted Hough transform: each potential match votes for a range of parameters Works well if there are very few parameters (3-4) RANSAC: repeatedly sample potential matches, compute parameters, and check for inliers Works well if fraction of inliers is high and few parameters (4-8) Other cases Thin plate spline for more general distortions One-to-one correspondence (Hungarian algorithm) B1 B2 B3 A1 A2 A3

Grouping Clustering: group items (patches, pixels, lines, etc.) that have similar appearance Uses: discretize continuous values; improve efficiency; summarize data Algorithms: k-means, agglomerative Segmentation: group pixels into regions of coherent color, texture, motion, and/or label Mean-shift clustering Watershed Graph-based segmentation: e.g., MRF and graph cuts EM, mixture models: probabilistically group items that are likely to be drawn from the same distribution, while estimating the distributions’ parameters

Categorization Match objects, parts, or scenes that may vary in appearance Categories are typically defined by human and may be related by function, cost, or other non-visual attributes Key problem: what are important similarities? Can be learned from training examples Training Labels Training Images Classifier Training Image Features Trained Classifier

Categorization Representation: ideally should be compact, comprehensive, direct Histograms of quantized interest points (SIFT, HOG), color, texture Typical for image or region categorization Degree of spatial encoding is controllable by using spatial pyramids HOG features at specified position Often used for finding parts or objects

Object Categorization Search by Sliding Window Detector May work well for rigid objects Key idea: simple alignment for simple deformations Object or Background?

Object Categorization Search by Parts-based model Key idea: more flexible alignment for articulated objects Defined by models of part appearance, geometry or spatial layout, and search algorithm

Vision as part of an intelligent system 3D Scene Feature Extraction Texture Color Optical Flow Stereo Disparity Surfaces Bits of objects Sense of depth Motion patterns Grouping Objects Agents and goals Shapes and properties Open paths Words Interpretation Walk, touch, contemplate, smile, evade, read on, pick up, … Action

(interpretation/tasks) Well-Established (patch matching) Major Progress (pattern matching++) New Opportunities (interpretation/tasks) Face Detection/Recognition Category Detection Entailment/Prediction Key research questions are evolving from recognizing local patterns to broader scene interpretation. Some problems in vision are well-established – useful systems have existed for ten years or more, such as face detection, object tracking, and 3D reconstruction from multiple images. These involve feature extraction, local patch matching, and template matching. In the last ten year, the research focus has been on problems that require advanced signal processing but also models of individual components of the world, such as object category detection, human pose estimation, and recovering 3D scene layout from limited views. In these problems, we need to think about how to model an object category or the human body or physical surfaces in a scene. The progress in these problems, which I’ll describe, creates new opportunities to tackle problems that require broad and complex models of the world, going beyond individual objects and surfaces to reason about narratives and intention. Object Tracking / Flow Human Pose Life-long Learning Multi-view Geometry 3D Scene Layout Vision for Robots

Scene Understanding =. Objects + People + Layout + Scene Understanding = Objects + People + Layout + Interpretation within Task Context

What do I see.  Why is this happening. What is important What do I see?  Why is this happening? What is important? What will I see? How can we learn about the world through vision? How do we create/evaluate vision systems that adapt to useful tasks?

Important open problems How can we interpret vision given structured plans of a scene?

Important open problems Algorithms: works pretty well  perfect E.g., stereo: top of wish list from Pixar guy Micheal Kass Good directions: Incorporate higher level knowledge

Important open problems Spatial understanding: what is it doing? Or how do I do it? Important questions: What are good representations of space for navigation and interaction? What kind of details are important? How can we combine single-image cues with multi-view cues?

Important open problems Object representation: what is it? Important questions: How can we pose recognition so that it lets us deal with new objects? What do we want to predict or infer, and to what extent does that rely on categorization? How do we transfer knowledge of one type of object to another?

Important open problems Can we build a “core” vision system that can easily be extended to perform new tasks or even learn on its own? What kind of representations might allow this? What should be built in and what should be learned?

Important open problems Vision for the masses Analyzing social effects of green space How to make vision systems that can quickly adapt to these thousands of visual tasks? Counting cells

Important problems Learning features and intermediate representations that transfer to new tasks Figure from http://www.datarobot.com/blog/a-primer-on-deep-learning/

Important open problems Almost everything is still unsolved! Robust 3D shape from multiple images Recognize objects (only faces and maybe cars is really solved, thanks to tons of data) Caption images/video Predict intention Object segmentation Count objects in an image Estimate pose Recognize actions ….

If you want to learn more… Read lots of papers: IJCV, PAMI, CVPR, ICCV, ECCV, NIPS Helpful topics for classes Classes in machine learning, pattern recognition, deep learning Statistics, graphical models Seminar-style paper-reading classes Just implement stuff, try demos, see what works

Thank you!