1
Image Understanding Roxanne Canosa, Ph.D.
2
Introduction
Computer vision: give machines the ability to see. The goal is to duplicate the effect of human visual processing.
- We live in a 3-D world, but camera sensors can only capture 2-D information
- The flip side of computer graphics?
3
Introduction
Computer vision is composed of:
- Image processing
- Image analysis
- Image understanding
4
Introduction: Image Processing
The goal is to present the image to the system in a useful form:
- image capture and early processing
- remove noise
- detect luminance differences
- detect edges
- enhance image
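A minimal sketch of this early-processing stage, assuming NumPy and SciPy are available; Gaussian smoothing stands in for noise removal, and a Sobel operator for detecting luminance differences and edges.

```python
import numpy as np
from scipy import ndimage

def early_processing(image, blur_sigma=1.0):
    """Denoise with a Gaussian blur, then detect luminance edges with Sobel filters."""
    smoothed = ndimage.gaussian_filter(image.astype(float), sigma=blur_sigma)  # remove noise
    gx = ndimage.sobel(smoothed, axis=1)   # horizontal luminance differences
    gy = ndimage.sobel(smoothed, axis=0)   # vertical luminance differences
    edges = np.hypot(gx, gy)               # gradient magnitude as edge strength
    return smoothed, edges
```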
5
Introduction: Image Analysis
The goal is to extract useful information from the processed image:
- identify boundaries
- find connected components
- label regions
- segment parts of objects
- group parts together into whole objects
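A minimal sketch of this analysis stage under the same assumptions: threshold an edge or feature map (such as the one from the previous sketch), then find and label connected components as candidate regions.

```python
import numpy as np
from scipy import ndimage

def label_regions(edge_map, threshold):
    """Threshold a feature/edge map, then find and label connected components."""
    binary = edge_map > threshold
    labels, num_regions = ndimage.label(binary)   # each connected component gets an integer label
    # Pixel count per region, useful later when grouping parts into whole objects.
    sizes = ndimage.sum(binary, labels, index=range(1, num_regions + 1))
    return labels, num_regions, sizes
```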
6
Introduction: Image Understanding
The goal is to make sense of the information: draw qualitative, or semantic, conclusions from the quantitative information.
- make a decision about the quantitative information
- classify the parts
- recognize objects
- understand the objects’ usage and the meaning of the scene
7
Introduction
Computer vision uses techniques and methods from:
- electronics (sensor technology)
- mathematics (statistics and differential calculus)
- spatial pattern recognition
- artificial intelligence
- psychophysics
8
Low-level Representations
Low-level: little knowledge about the content of the image. The data that is manipulated usually resembles the input image. For example, if the image is captured using a 2-D CCD camera, the representation can be described by an image function whose value is brightness, depending on two parameters: the x-y coordinates of the location of the brightness value.
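A tiny illustration of this low-level representation, using a synthetic 8-bit array in place of an actual CCD capture; the array indices play the role of the x-y coordinates and the stored values are brightness.

```python
import numpy as np

# Synthesize a small "captured" grayscale frame instead of reading from a camera.
rng = np.random.default_rng(0)
I = rng.integers(0, 256, size=(4, 6), dtype=np.uint8)   # I(x, y): brightness at each location

x, y = 2, 3                 # column (x) and row (y) of interest
print(I[y, x])              # brightness value at that x-y coordinate
```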
9
Low-Level Mechanisms Low-level vision only takes us to the sophistication of a very expensive digital camera
10
High-level Representations
High-level: extract meaningful information from the low-level representation.
- The image may be mapped to a formalized model of the world (the model may change dynamically as new information becomes available)
- The data to be processed is dramatically reduced: instead of dealing with pixel values, deal with features such as shape, size, relationships, etc.
- Usually expressed in symbolic form
11
High-Level Mechanisms High-level vision and perception require brain functions that we do not yet fully understand
12
Bottom-up vs. Top-down
- Bottom-up: processing is content-driven
- Top-down: processing is context-driven
- Goal: combine knowledge about content as well as context (goals, plans, history, expectations)
- Imitate human cognition and the ability to make decisions based on extracted information
13
Bottom-up vs. Top-down (diagram: direction of information flow, top-down vs. bottom-up)
14
Visual Completion: Top-down Control (image sequence across slides 14-18)
19
Old Woman or Young Girl? http://dragon.uml.edu/psych/woman.html
20
Expectation and Learning From Palmer (1999)
21
Zöllner Illusion http://www.torinfo.com/illusion/illus-17.html Are the black and yellow lines parallel?
22
Visual Illusions Demos http://www.michaelbach.de/ot/index.html
23
The Human Visual System Optical information from the eyes is transmitted to the primary visual cortex in the occipital lobe at the back of the head.
24
The Human Visual System
- 20 mm focal length lens
- the iris controls the amount of light entering the eye by changing the size of the pupil
25
The Human Visual System Light enters the eye through the cornea, aqueous humor, lens, and vitreous humor before striking the light-sensitive receptors of the retina. After striking the retina, light is converted into electrochemical signals that are carried to the brain via the optic nerve.
26
The Human Visual System image from www.photo.net/photo/edscott/vis00010.htm
27
Multi-Resolution Vision (demo: fixate on the +)
28
+ If you can read this you must be cheating
29
Multi-Resolution Vision (from Palmer, 1999)
- The distribution of rods and cones across the retina is highly uneven
- The fovea contains the highest concentration of cones, for high visual acuity
30
Contrast Sensitivity (plot: contrast sensitivity vs. spatial frequency in cycles per degree, cpd)
31
Lateral Inhibition
33
A biological neural network in which neurons inhibit spatially neighboring neurons; this is the architecture of the first few layers of the retina. In the diagram, each layer n+1 neuron receives a weight of +1 from the layer n neuron directly beneath it and a weight of -0.2 from each of that neuron's neighbors. For an input light profile of 10 10 10 5 5 5, the output perceptions near the luminance edge are 10-2-2 = 6, 10-2-1 = 7, 5-2-1 = 2, and 5-1-1 = 3: the contrast at the edge is exaggerated.
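A minimal sketch of the arithmetic on this slide, assuming a center weight of +1 and a neighbor weight of -0.2, with edge values replicated so boundary neurons also have two neighbors.

```python
import numpy as np

def lateral_inhibition(light, center=1.0, neighbor=-0.2):
    """One layer of lateral inhibition: each neuron passes its own input and is
    inhibited by its two spatial neighbors."""
    padded = np.pad(light, 1, mode="edge")            # replicate boundary values
    return center * padded[1:-1] + neighbor * (padded[:-2] + padded[2:])

light = np.array([10, 10, 10, 5, 5, 5], dtype=float)   # input light levels
print(lateral_inhibition(light))   # [6. 6. 7. 2. 3. 3.] -- contrast exaggerated at the edge
```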
34
Simultaneous Contrast
Two regions that have identical spectra result in different color (lightness) perceptions due to the spectra of the surrounding regions. Background color can visibly affect the perceived color of the target.
35
Simultaneous Contrast
36
Task-Oriented Vision
Original painting, viewed under different instructions:
- Free viewing
- Estimate the economic level of the people
- Judge their ages
- Guess what they had been doing before the visitor’s arrival
- Remember the clothes worn by the people
37
Change Blindness
- Lack of attention to an object causes failure to perceive it
- People find it difficult to detect major changes in a scene if those changes occur in objects that are not the focus of attention
- Our impression that our visual capabilities give us a rich, complete, and detailed representation of the world around us is a grand illusion!
38
Change Blindness Demos http://www.usd.edu/psyc301/ChangeBlindness.htm http://viscog.beckman.uiuc.edu/djs_lab/demos.html
39
Modeling Attention How do we decide where to look next while performing a task? What factors influence our decision to look at something? Can we model visual behavior?
40
Modeling Attention: Saliency Maps
Koch & Ullman (1985); Itti & Koch (2000); Parkhurst, Law, & Niebur (2002); Turano, Geruschat, & Baker (2003)
(figure: an input image and its saliency map)
41
Computational Model of Saliency
(diagram: the input image is decomposed into color, intensity, and orientation channels; center-surround operations on each channel are combined into a saliency map)
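A minimal sketch of a center-surround saliency computation in this spirit, assuming SciPy; the opponent channels and Gaussian scales are illustrative choices rather than the exact model, and the orientation channel is omitted for brevity.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround(channel, center_sigma=1.0, surround_sigma=5.0):
    """Difference-of-Gaussians approximation of a center-surround operator."""
    return np.abs(gaussian_filter(channel, center_sigma) - gaussian_filter(channel, surround_sigma))

def saliency_map(rgb):
    """Combine center-surround responses of intensity and color-opponent channels."""
    rgb = rgb.astype(float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    intensity = (r + g + b) / 3.0
    rg = r - g                        # red-green opponency
    by = b - (r + g) / 2.0            # blue-yellow opponency
    s = sum(center_surround(c) for c in (intensity, rg, by))
    return (s - s.min()) / (s.max() - s.min() + 1e-8)   # normalize to [0, 1]
```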
42
Proto-object Map (model diagram, repeated across slides 42-45)
- Input image (RGB)
- Pre-processing module: XYZ transform; rod and LMS cone responses; A, C1, C2 channels
- Oriented edge module: G0, G45, G90, G135
- Intensity map, color map, and orientation map
- Conspicuity map
- Object module producing the proto-object map
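A minimal sketch of the oriented-edge stage, under the assumption that G0, G45, G90, and G135 denote oriented filters at 0, 45, 90, and 135 degrees; the Gabor kernel below is an illustrative stand-in, not the model's exact filters.

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_kernel(theta, size=9, sigma=2.0, wavelength=4.0, gamma=0.5):
    """Real-valued Gabor kernel oriented at angle theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)

def orientation_maps(intensity):
    """Oriented-edge responses at 0, 45, 90, and 135 degrees (G0, G45, G90, G135)."""
    intensity = np.asarray(intensity, dtype=float)
    return {f"G{deg}": np.abs(convolve(intensity, gabor_kernel(np.deg2rad(deg))))
            for deg in (0, 45, 90, 135)}
```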
46
Weight Output with Contrast Sensitivity Function
(plot: weight vs. spatial frequency in cycles per degree)
CSF(f) = 2.6 (0.0192 + 0.114 f) e^(-(0.114 f)^1.1)   (Mannos and Sakrison, 1974)
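A minimal sketch of this contrast sensitivity function in Python; the sample frequencies below are arbitrary.

```python
import numpy as np

def csf(f):
    """Mannos & Sakrison (1974) contrast sensitivity function.
    f: spatial frequency in cycles per degree (cpd)."""
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-((0.114 * f) ** 1.1))

freqs = np.array([1.0, 4.0, 8.0, 16.0, 32.0])
print(np.round(csf(freqs), 3))   # sensitivity peaks near ~8 cpd and falls off at higher frequencies
```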
47
(figure: input image, CIE map, and conspicuity map (C_Map))
48
Verification of Model
49
Task Differences (free-view; “Get supplies from the closet”; “Work at the computer”; “Make a photocopy”)
50
Head-Mounted Eye-Tracker (labeled diagram)
- external mirror: IR reflecting, visible passing
- scene camera
- laser
- optics module (includes IR source and eye camera)
- head-tracking receiver
51
Portable Eye-Tracker
53
The Benefits of Eye-Tracking
Newell’s temporal hierarchy of brain organization:
- Cognition: 10 seconds
- Working memory: 1 second
- Visual routines: 300 msec
- Neural operations: 80 msec
54
Verification of Model
55
Comparison of Models
56
Task Differences
57
Possible M.S. Projects
- Comparison of saliency map generation techniques: feature-based, graph-based, information theory-based
- Object detection from salient keypoints: SIFT features, multi-resolution images
58
Possible M.S. Projects
- Multi-modal tumor classification: PET, MRI, CT
- Solving visual CAPTCHAs: text-based, image-based (http://gs264.sp.cs.cmu.edu/cgi-bin/esp-pix)
59
Possible M.S. Projects
- Using reasoning or logic for learning about the world from visual observations: Bayes nets, reinforcement learning, inductive logic programming (ILP)