Download presentation
Presentation is loading. Please wait.
Published byKelly Newton Modified over 9 years ago
1
IIIT-B Computer Vision, Fall 2006 Lecture 1 Introduction to Computer Vision Arvind Lakshmikumar Technology Manager, Sarnoff Corporation Adjunct Faculty, IIIT-B
2
Course Overview Introduction to vision Case Studies of Applied Vision –Automotive Safety –Autonomous Navigation –Industrial Inspection –Medical Imaging –Entertainment Image Formation About Cameras Image Processing Geometric Vision Camera Motion Paper readings
3
Computer Graphics Image Output Model Synthetic Camera (slides courtesy of Michael Cohen)
4
Real Scene Computer Vision Real Cameras Model Output (slides courtesy of Michael Cohen)
5
Combined Model Real Scene Real Cameras Image Output Synthetic Camera (slides courtesy of Michael Cohen)
6
The Vision Problem How to infer salient properties of 3-D world from time-varying 2-D image projection ¤ What is salient? ¤ How to deal with loss of information going from 3-D to 2-D?
7
Why study Computer Vision? Images and movies are everywhere Fast-growing collection of useful applications –building representations of the 3D world from pictures –automated surveillance (who’s doing what) –movie post-processing –face finding Various deep and attractive scientific mysteries –how does object recognition work? Greater understanding of human vision
8
Properties of Vision One can “see the future” –Cricketers avoid being hit in the head There’s a reflex --- when the right eye sees something going left, and the left eye sees something going right, move your head fast. –Gannets pull their wings back at the last moment Gannets are diving birds; they must steer with their wings, but wings break unless pulled back at the moment of contact. Area of target over rate of change of area gives time to contact.
9
Properties of Vision 3D representations are easily constructed –There are many different cues. –Useful to humans (avoid bumping into things; planning a grasp; etc.) in computer vision (build models for movies). –Cues include multiple views (motion, stereopsis) texture shading
10
Properties of Vision People draw distinctions between what is seen –“Object recognition” –This could mean “is this a fish or a bicycle?” –It could mean “is this George Washington?” –It could mean “is this someone I know?” –It could mean “is this poisonous or not?” –It could mean “is this slippery or not?” –It could mean “will this support my weight?” –Great mystery How to build programs that can draw useful distinctions based on image properties.
11
Part I: The Physics of Imaging How images are formed –Cameras What a camera does How to tell where the camera was –Light How to measure light What light does at surfaces How the brightness values we see in cameras are determined –Color The underlying mechanisms of color How to describe it and measure it
12
Part II: Early Vision in One Image Representing small patches of image –For three reasons We wish to establish correspondence between (say) points in different images, so we need to describe the neighborhood of the points Sharp changes are important in practice --- known as “edges” Representing texture by giving some statistics of the different kinds of small patch present in the texture. –Tigers have lots of bars, few spots –Leopards are the other way
13
Representing an image patch Filter outputs –essentially form a dot-product between a pattern and an image, while shifting the pattern across the image –strong response -> image locally looks like the pattern –e.g. derivatives measured by filtering with a kernel that looks like a big derivative (bright bar next to dark bar)
14
Convolve this image With this kernel To get this
15
Texture Many objects are distinguished by their texture –Tigers, cheetahs, grass, trees We represent texture with statistics of filter outputs –For tigers, bar filters at a coarse scale respond strongly –For cheetahs, spots at the same scale –For grass, long narrow bars –For the leaves of trees, extended spots Objects with different textures can be segmented The variation in textures is a cue to shape
16
Part III: Early Vision in Multiple Images The geometry of multiple views –Where could it appear in camera 2 (3, etc.) given it was here in 1 (1 and 2, etc.)? Stereopsis –What we know about the world from having 2 eyes Structure from motion –What we know about the world from having many eyes or, more commonly, our eyes moving.
17
Part IV: Mid-Level Vision Finding coherent structure so as to break the image or movie into big units –Segmentation: Breaking images and videos into useful pieces E.g. finding video sequences that correspond to one shot E.g. finding image components that are coherent in internal appearance –Tracking: Keeping track of a moving object through a long sequence of views
18
Part V: High Level Vision (Geometry) The relations between object geometry and image geometry –Model based vision find the position and orientation of known objects –Smooth surfaces and outlines how the outline of a curved object is formed, and what it looks like –Aspect graphs how the outline of a curved object moves around as you view it from different directions –Range data
19
Part VI: High Level Vision (Probabilistic) Using classifiers and probability to recognize objects –Templates and classifiers how to find objects that look the same from view to view with a classifier –Relations break up objects into big, simple parts, find the parts with a classifier, and then reason about the relationships between the parts to find the object. –Geometric templates from spatial relations extend this trick so that templates are formed from relations between much smaller parts
20
Applications: Factory Inspection Cognex’s “CapInspect” system: Low-level image analysis: Identify edges, regions Mid-level: Distinguish “cap” from “no cap” Estimation: What are orientation of cap, height of liquid?
21
Applications: Face Detection courtesy of H. Rowley How is this like the bottle problem on the previous slide?
22
Applications: Text Detection & Recognition from J. Zhang et al. Similar to face finding: Where is the text and what does it say? Viewing at an angle complicates things...
23
Applications: MRI Interpretation Coronal slice of brainSegmented white matter from W. Wells et al.
24
Detection and Recognition: How? Build models of the appearance characteristics (color, texture, etc.) of all objects of interest Detection: Look for areas of image with sufficiently similar appearance to a particular object Recognition: Decide which of several objects is most similar to what we see Segmentation: “Recognize” every pixel
25
Applications: Football First-Down Line courtesy of Sportvision
26
Applications: Virtual Advertising courtesy of Princeton Video Image
27
First-Down Line, Virtual Advertising: How? Where should message go? –Sensors that measure pan, tilt, zoom and focus are attached to calibrated cameras at surveyed positions –Knowledge of the 3-D position of the line, advertising rectangle, etc. can be directly translated into where in the image it should appear for a given camera What pixels get painted? –Occluding image objects like the ball, players, etc. where the graphic is to be put must be segmented out. These are recognized by being a sufficiently different color from the background at that point. This allows pixel-by-pixel compositing.
28
Applications: Inserting Computer Graphics with a Moving Camera How does motion complicate things? Opening titles from the movie “Panic Room”
29
Applications: Inserting Computer Graphics with a Moving Camera courtesy of 2d3
30
CG Insertion with a Moving Camera: How? This technique is often called matchmove Once again, we need camera calibration, but also information on how the camera is moving—its egomotion. This allows the CG object to correctly move with the real scene, even if we don’t know the 3-D parameters of that scene. Estimating camera motion: –Much simpler if we know camera is moving sideways (e.g., some of the “Panic Room” shots), because then the problem is only 2-D –For general motions: By identifying and following scene features over the entire length of the shot, we can solve retrospectively for what 3-D camera motion would be consistent with their 2-D image tracks. Must also make sure to ignore independently moving objects like cars and people.
31
Applications: Rotoscoping 2d3’s Pixeldust
32
Applications: Motion Capture Vicon software: 12 cameras, 41 markers for body capture; 6 zoom cameras, 30 markers for face
33
Applications: Motion Capture without Markers courtesy of C. Bregler What’s the difference between these two problems?
34
Motion Capture: How? Similar to matchmove in that we follow features and estimate underlying motion that explains their tracks Difference is that the motion is not of the camera but rather of the subject (though camera could be moving, too) –Face/arm/person has more degrees of freedom than camera flying through space, but still constrained Special markers make feature identification and tracking considerably easier Multiple cameras gather more information
35
Applications: Image-Based Modeling courtesy of P. Debevec Façade project: UC Berkeley Campanile
36
Image-Based Modeling: How? 3-D model constructed from manually- selected line correspondences in images from multiple calibrated cameras Novel views generated by texture- mapping selected images onto model
37
Applications: Robotics Autonomous driving: Lane & vehicle tracking (with radar)
38
Why is Vision Interesting? Psychology –~ 50% of cerebral cortex is for vision. –Vision is how we experience the world. Engineering –Want machines to interact with world. –Digital images are everywhere.
39
Vision is inferential: Light (http://www-bcs.mit.edu/people/adelson/checkershadow_illusion.html)
40
Vision is inferential: Light (http://www-bcs.mit.edu/people/adelson/checkershadow_illusion.html)
41
Vision is Inferential: Geometry
42
Computer Vision Inference Computation Building machines that see Modeling biological perception
43
Boundary Detection: Local cues
47
Boundary Detection http://www.robots.ox.ac.uk/~vdg/dynamics.html
48
Boundary Detection Finding the Corpus Callosum (G. Hamarneh, T. McInerney, D. Terzopoulos)
49
(Sharon, Balun, Brandt, Basri)
51
Texture Photo Pattern Repeated
52
Texture Computer Generated Photo
53
Tracking (Comaniciu and Meer)
54
Understanding Action
55
Tracking and Understanding (www.brickstream.com)
56
Tracking
60
Stereo http://www.magiceye.com/
61
Stereo http://www.magiceye.com/
62
Motion Courtesy Yiannis Aloimonos
63
Motion - Application (www.realviz.com)
64
Pose Determination Visually guided surgery
65
Recognition - Shading Lighting affects appearance
66
Classification (Funkhauser, Min, Kazhdan, Chen, Halderman, Dobkin, Jacobs)
67
Viola and Jones: Real time Face Detection
68
Vision depends on: Geometry Physics The nature of objects in the world (This is the hardest part).
69
Modeling + Algorithms Build a simple model of the world (eg., flat, uniform intensity). Find provably good algorithms. Experiment on real world. Update model. Problem: Too often models are simplistic or intractable.
70
Bayesian inference Bayes law: P(A|B) = P(B|A)*P(A)/P(B). P(world|image) = P(image|world)*P(world)/P(image) P(image|world) is computer graphics –Geometry of projection. –Physics of light and reflection. P(world) means modeling objects in world. Leads to statistical/learning approaches. Problem: Too often probabilities can’t be known and are invented.
71
Related Fields Graphics. “Vision is inverse graphics”. Visual perception. Neuroscience. AI Learning Math: eg., geometry, stochastic processes. Optimization.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.