Computer Vision Dan Witzner Hansen Course web page: www.itu.dk/courses/MCV Email: witzner@itu.dk
What is Vision?
Today Introduction to the course Crash course in 2D and 3D geometry (Brush-up from High school) (Solving linear equations and least squares solution to linear equations)
The Vision Problem: How to infer salient properties of the 3-D world from a time-varying 2-D image projection. What is salient? How do we deal with the loss of information going from 3-D to 2-D? The signal may be noisy, occluded, etc.
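To make the loss of information concrete, here is a minimal sketch assuming an ideal pinhole camera at the origin looking along the Z axis. The function name `project` and the focal length of 1.0 are illustrative choices, not part of the course material. Two different 3-D points on the same viewing ray land on the same image point, so depth cannot be recovered from a single projection.

```python
def project(point, focal_length=1.0):
    """Ideal pinhole projection of a 3-D point (X, Y, Z) to image coordinates (x, y)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal_length * X / Z, focal_length * Y / Z)


near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)  # same viewing ray as `near`, but twice as far away

print(project(near))  # image point of the near 3-D point
print(project(far))   # identical image point: the depth information is gone
```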
Computer Vision: Stages. Image formation. Low-level: single image processing, multiple views. Mid-level: estimation, segmentation (the main topic of Image Analysis and Foundations of Image Analysis; covered only briefly here). High-level: recognition, classification. The stages are roughly ordered from light being emitted to the brain understanding the scene.
Image Formation 3-D geometry Physics of light Camera properties Focal length Distortion Sampling issues Spatial Temporal
Low-level: Single Image Processing Filtering Edge Color Local pattern similarity Texture Appearance characterization from the statistics of applying multiple filters 3-D structure estimation from… Shading
Low-level: Multiple Views Stereo Structure from two views Structure from motion What can we learn in general from many views, whether they were taken simultaneously or sequentially?
Mid-Level: Estimation, Segmentation Estimation: Fitting parameters to data Static (e.g., shape) Dynamic (e.g., tracking) Segmentation/clustering Breaking an image or image sequence into a few meaningful pieces with internal similarity
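As an illustration of "fitting parameters to data", the following sketch fits a straight line y = a*x + b to 2-D points by ordinary least squares. The function name `fit_line` and the sample points are hypothetical; the closed-form solution used here is the standard one for a single line and is only one of many possible estimation methods.

```python
def fit_line(points):
    """Least-squares fit of y = slope * x + intercept to a list of (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sums of squared deviations and cross-deviations around the means.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all x values are equal; the line is vertical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements lying exactly on y = 2x + 1.
slope, intercept = fit_line([(0, 1), (1, 3), (2, 5), (3, 7)])
print(slope, intercept)
```

With noisy measurements the same function returns the line that minimizes the sum of squared vertical distances to the points.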
High-level: Recognition, Classification Recognition: Finding and parametrizing a known object Classification Assignment to known categories using statistics/probability to make best choice
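One simple way to "use statistics/probability to make the best choice" is sketched below, assuming each category is described by a 1-D Gaussian over a single measured feature plus a prior probability. The category names, means, standard deviations, and priors are made up for illustration; real systems use richer features and models.

```python
import math


def gaussian_log_likelihood(x, mean, std):
    """Log of the 1-D Gaussian density N(x; mean, std^2)."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def classify(x, categories):
    """Return the category with the highest log-posterior (up to a constant).

    `categories` maps a name to a (mean, std, prior) tuple.
    """
    def score(name):
        mean, std, prior = categories[name]
        return gaussian_log_likelihood(x, mean, std) + math.log(prior)

    return max(categories, key=score)


# Hypothetical categories described by mean brightness of an image region.
categories = {
    "bright": (200.0, 10.0, 0.5),
    "dark": (120.0, 10.0, 0.5),
}
print(classify(190.0, categories))
print(classify(130.0, categories))
```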
Course Overview Image formation and cameras Projective geometry Relating points between images Motion Motion analysis Object tracking Shape and recognition Shape analysis Object recognition Applications
Applications: Factory Inspection Cognex’s “CapInspect” system: Low-level image analysis: Identify edges, regions Mid-level: Distinguish “cap” from “no cap” Estimation: What are the orientation of the cap and the height of the liquid?
Applications: Face Detection courtesy of H. Rowley How is this like the bottle problem on the previous slide? From http://www-2.cs.cmu.edu/~har/faces.html
Applications: Text Detection & Recognition from J. Zhang et al. Similar to face finding: Where is the text and what does it say? Viewing at an angle complicates things... From http://www.is.cs.cmu.edu/papers/speech/icmi02/icmi02_xilin.pdf
Detection and Recognition: How? Build models of the appearance characteristics (color, texture, etc.) of all objects of interest Detection: Look for areas of image with sufficiently similar appearance to a particular object Recognition: Decide which of several objects is most similar to what we see Segmentation: “Recognize” every pixel
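The detection and recognition steps above can be sketched with a very simple appearance model: a normalized grayscale histogram compared by histogram intersection. This is an assumed, simplified stand-in for the color and texture models mentioned on the slide; the function names, the 4-bin histogram, and the 0.7 threshold are all hypothetical.

```python
def histogram(pixels, bins=4, max_value=256):
    """Normalized histogram of integer gray values in the range [0, max_value)."""
    if not pixels:
        raise ValueError("region must contain at least one pixel")
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // max_value, bins - 1)] += 1
    return [count / len(pixels) for count in counts]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def detect(region, model, threshold=0.7):
    """Detection: is the region sufficiently similar to one particular model?"""
    return intersection(histogram(region), model) >= threshold


def recognize(region, models):
    """Recognition: which of several models is most similar to the region?"""
    h = histogram(region)
    scores = {name: intersection(h, model) for name, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical appearance models built from example pixels of each object.
models = {
    "dark object": histogram([10, 20, 30, 40]),
    "bright object": histogram([200, 220, 240, 250]),
}
region = [15, 25, 35, 210]
print(recognize(region, models))
print(detect(region, models["dark object"]))
```

Segmentation in the sense of the slide would apply the same comparison at every pixel (or small neighborhood) instead of to one region.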
Applications: Virtual Advertising courtesy of Princeton Video Image
First-Down Line, Virtual Advertising: How? Where should the message go? Sensors that measure pan, tilt, zoom and focus are attached to calibrated cameras at surveyed positions. Knowledge of the 3-D position of the line, advertising rectangle, etc. can be directly translated into where in the image it should appear for a given camera. What pixels get painted? Occluding image objects like the ball, players, etc. where the graphic is to be put must be segmented out. These are recognized by being a sufficiently different color from the background at that point. This allows pixel-by-pixel compositing.
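The pixel-by-pixel compositing step can be sketched as follows, assuming we already have a background color for each pixel and the graphic rendered into image coordinates. The function name `composite`, the sum-of-absolute-differences color distance, and the threshold of 30 are illustrative assumptions, not the method used by any actual broadcast system.

```python
def composite(frame, background, graphic, threshold=30):
    """Paint `graphic` over `frame` except where a foreground object occludes it.

    All three inputs are equally sized lists of rows of (r, g, b) tuples.
    A `None` entry in `graphic` means no graphic is drawn at that pixel.
    """
    output = []
    for frame_row, background_row, graphic_row in zip(frame, background, graphic):
        row = []
        for pixel, expected, overlay in zip(frame_row, background_row, graphic_row):
            # A pixel that differs strongly from the expected background color
            # is treated as an occluding object (player, ball, ...).
            distance = sum(abs(a - b) for a, b in zip(pixel, expected))
            occluded = distance > threshold
            row.append(pixel if overlay is None or occluded else overlay)
        output.append(row)
    return output


grass = (30, 120, 40)
player = (200, 30, 30)
yellow = (255, 255, 0)

frame = [[grass, player, grass]]
background = [[grass, grass, grass]]
graphic = [[yellow, yellow, None]]  # the line covers the first two pixels
print(composite(frame, background, graphic))
```

The player pixel is kept on top of the line, which is what makes the graphic appear to lie on the field.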
Applications: Inserting Computer Graphics with a Moving Camera See this page: http://www.reservocation.com/04_10_02/art_panic_room_04_10_02.html
CG Insertion with a Moving Camera: How? This technique is often called matchmove Once again, we need camera calibration, but also information on how the camera is moving—its egomotion. This allows the CG object to correctly move with the real scene, even if we don’t know the 3-D parameters of that scene. Estimating camera motion: Much simpler if we know camera is moving sideways because then the problem is only 2-D For general motions: By identifying and following scene features over the entire length of the shot, we can solve retrospectively for what 3-D camera motion would be consistent with their 2-D image tracks. Must also make sure to ignore independently moving objects like cars and people.
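For the simple sideways case, a rough sketch of estimating camera-induced image motion from feature tracks is given below. It assumes the scene is distant enough that all static features shift by about the same 2-D amount between two frames, and it uses the median displacement as a simple robust estimate so that a few independently moving objects do not bias the result. The median is a swapped-in technique for illustration, and the function name and sample tracks are hypothetical; general matchmove solves a full 3-D problem instead.

```python
from statistics import median


def estimate_translation(tracks):
    """Estimate the dominant 2-D image shift from feature tracks.

    `tracks` is a list of ((x0, y0), (x1, y1)) pairs giving each feature's
    position in the first and second frame.
    """
    if not tracks:
        raise ValueError("need at least one feature track")
    dx = median(x1 - x0 for (x0, _), (x1, _) in tracks)
    dy = median(y1 - y0 for (_, y0), (_, y1) in tracks)
    return dx, dy


# Four static scene features shift by (5, 0); one feature on a moving car does not.
tracks = [
    ((10.0, 10.0), (15.0, 10.0)),
    ((40.0, 25.0), (45.0, 25.0)),
    ((70.0, 60.0), (75.0, 60.0)),
    ((20.0, 80.0), (25.0, 80.0)),
    ((50.0, 50.0), (70.0, 53.0)),  # independently moving object
]
print(estimate_translation(tracks))
```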
Applications: Motion Capture Vicon software: 12 cameras, 41 markers for body capture; 6 zoom cameras, 30 markers for face
Applications: Motion Capture without Markers courtesy of C. Bregler What’s the difference between these two problems?
Motion Capture: How? Similar to matchmove in that we follow features and estimate underlying motion that explains their tracks Difference is that the motion is not of the camera but rather of the subject (though camera could be moving, too) Face/arm/person has more degrees of freedom than camera flying through space, but still constrained Special markers make feature identification and tracking considerably easier Multiple cameras gather more information
Applications: Image-Based Modeling courtesy of P. Debevec Façade project: UC Berkeley Campanile
Image-Based Modeling: How? 3-D model constructed from manually-selected line correspondences in images from multiple calibrated cameras Novel views generated by texture-mapping selected images onto model
A Movie
Applications: Robotics Autonomous driving: Lane & vehicle tracking (with radar)
Human Computer Interaction
What is the relationship between many of these applications? Knowledge of Cameras Motion and Tracking Shapes and object recognition Mathematics and Statistics
Course Prerequisites Background in/comfort with: Linear algebra Multi-variable calculus Statistics, probability. Homework will use Matlab, but you are also welcome to use C/C++ (harder, though). An ability to program in C/C++, Java, or equivalent should be sufficient preparation, but knowing Matlab is better (no introduction given, but you can come see me if needed).
Grading 100 % on mandatory assignments Submission ON TIME
More specifically…..
Single View Examples
Mosaicing
Stereo
Stereo reconstruction
Tracking, Shape and HCI
After the course: Understand, choose between, and apply various computer vision algorithms. Understand the relations between objects in the 3D world and their images obtained from cameras. Understand the principles of how to make 3D models (reconstruction) from images. Write programs, in either Matlab or C++, that are able to follow objects in pre-recorded movies or live images obtained from cameras. Understand principles for making computer vision systems that aim to enable humans to interact with a computer through cameras.
Reading Material Textbooks: “Multiple View Geometry”, Hartley and Zisserman; “Introductory Techniques for 3D Computer Vision” (less important). Supplemental readings will be available online as PDF files and a few as photocopies from books. Complete the assigned reading before the corresponding lecture and re-read difficult parts after the lecture. This is NOT an easy course, so expect at least 15 hrs of WORK each week. Show up for ALL lectures.
Details Homework: Submission to me by the end of the exercises. Expect to have it ready before the exercises, though! NO lateness policy: add-ons will be expected if late. Exam: Submission of mandatory assignments by the end of the semester.
More Details Instructor E-mail: witzner@itu.dk Office hours (by appointment): Friday, 10:00-12:00 pm Remember that semester projects in connection with the course are possible.
Your First Assignment: Try to get Matlab running. Take a look at a Matlab primer. Unfortunately, most of the tools (mathematics) have to be developed at the beginning of the course, and it may therefore seem quite mathematical. DON’T LET THAT DISCOURAGE YOU
More questions? First try the web page: www.itu.dk/courses/MCV Feel free to e-mail me at any time
What is needed here?