Stockman MSU/CSE Fall 2008: Computer Vision and Human Perception. A brief intro from an AI perspective.

Presentation transcript:

Computer Vision and Human Perception: a brief intro from an AI perspective (Stockman, MSU/CSE, Fall 2008)

Computer Vision and Human Perception: What are the goals of CV? What are the applications? How do humans perceive the 3D world via images? Some methods of processing images.

Goal of computer vision: make useful decisions about real physical objects and scenes based on sensed images. Alternative (Aloimonos and Rosenfeld): the goal is the construction of scene descriptions from images. How do you find the door to leave? How do you determine if a person is friendly or hostile?.. an elder?.. a possible mate?

Critical issues. Sensing: how do sensors obtain images of the world? Information: how do we obtain color, texture, shape, motion, etc.? Representations: what representations should/does a computer [or brain] use? Algorithms: what algorithms process image information and construct scene descriptions?

Images: 2D projections of 3D. The 3D world has color, texture, surfaces, volumes, light sources, objects, motion, betweenness, adjacency, connections, etc. A 2D image is a projection of a scene from a specific viewpoint; many 3D features are captured, some are not. Brightness or color = g(x,y) or f(row, column) for a certain instant of time. Images indicate familiar people, moving objects or animals, the health of people or machines.

Image receives reflections: light reaches surfaces in 3D; surfaces reflect; a sensor element receives light energy. Intensity matters, angles matter, material matters.

CCD camera has discrete elements: the lens collects light rays, and CCD elements replace the chemicals of film. The number of elements is less than with film (so far).

Intensities near the center of the eye

Camera + Programs = Display: the camera inputs to a frame buffer; a program can interpret the data, add graphics, and add imagery.

Some image format issues: spatial resolution; intensity resolution; image file format.

Resolution is “pixels per unit of length”. Resolution decreases by one half in the cases at left. Human faces can be recognized at 64 x 64 pixels per face.

Features detected depend on the resolution: we can tell hearts from diamonds, and can tell face value. Generally we need 2 pixels across a line or small region (such as an eye).

Human eye as a spherical camera: 100M sensing elements in the retina; rods sense intensity and cones sense color; the fovea has tightly packed elements, more cones; the periphery has more rods. The focal length is about 20 mm; the pupil/iris controls light entry; the eye scans, or saccades, to bring image details onto the fovea. 100M sensing cells funnel to 1M optic-nerve connections to the brain.

Image processing operations: thresholding; edge detection; motion field computation.

Find regions via thresholding: a region has brighter or darker or redder color, etc. If pixel > threshold then pixel = 1 else pixel = 0.
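
A minimal MATLAB sketch of this rule (the file names, the threshold of 128, and the use of the Image Processing Toolbox's rgb2gray are illustrative assumptions, not from the slides):

  gray = rgb2gray(imread('Images/cells.jpg'));    % load and convert to grayscale
  binary = gray > 128;                            % logical mask: 1 above threshold, else 0
  imwrite(uint8(binary) * 255, 'Images/cellsBinary.jpg');   % save mask as black/white

Choosing the threshold is the hard part, as the slides below point out.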

Example: red blood cell image. Many blood cells are separate objects, but many touch, which is bad! Thresholding leaves salt-and-pepper noise. How usable is this data?

Robot vehicle must see stop sign:

  img = imread('Images/stopSign.jpg', 'jpg');   % 'sign' would shadow MATLAB's sign()
  red = (img(:,:,1) > 120) & (img(:,:,2) < 100) & (img(:,:,3) < 80);
  out = uint8(red) * 200;                       % convert the logical mask before writing
  imwrite(out, 'Images/stopRed120.jpg', 'jpg');

Thresholding is usually not trivial.

Can cluster pixels by color similarity and by adjacency. Original RGB image; color clusters by k-means.

Some image processing ops: finding contrast in an image; using neighborhoods of pixels; detecting motion across 2 images.

Differentiate to find object edges: for each pixel, compute its contrast; one can use the max difference from its 8 neighbors. This detects intensity change across the boundary of adjacent regions. (LOG filter later on.)
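
A small sketch of the max-difference-of-8-neighbors contrast measure (the file name is illustrative; the loops skip the one-pixel image border):

  gray = double(rgb2gray(imread('Images/scene.jpg')));
  contrast = zeros(size(gray));
  for r = 2:size(gray,1)-1
      for c = 2:size(gray,2)-1
          nbhd = gray(r-1:r+1, c-1:c+1);                 % the pixel plus its 8 neighbors
          contrast(r,c) = max(abs(nbhd(:) - gray(r,c))); % largest neighbor difference
      end
  end

Large values of contrast mark pixels on or near region boundaries.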

4 and 8 neighbors of a pixel. The 4 neighbors are at multiples of 90 degrees:

  . N .
  W * E
  . S .

The 8 neighbors are at every multiple of 45 degrees:

  NW N NE
  W  *  E
  SW S SE

Detect motion via subtraction: with a constant background, a moving object produces pixel differences at its boundary, revealing the moving object and its shape. The differences are computed over time rather than over space.
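
A minimal frame-differencing sketch (the file names and the change threshold of 25 gray levels are assumptions):

  f1 = double(rgb2gray(imread('Images/frameN.jpg')));   % frame N
  f2 = double(rgb2gray(imread('Images/frameN1.jpg')));  % frame N+1
  moved = abs(f2 - f1) > 25;                            % 1 where intensity changed over time
  imwrite(uint8(moved) * 255, 'Images/motionMask.jpg');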

Two frames of aerial imagery: video frames N and N+1 show slight movement; most pixels are the same, just in different locations.

Best matching blocks between video frames N+1 and N (motion vectors). The bulk of the vectors show the true motion of the airplane taking the pictures. The long vectors are incorrect motion vectors, but they do work well for compression of image I2! Best matches from the 2nd to the 1st image are shown as vectors overlaid on the 2nd image. (Work by Dina Eldin.)
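
A compact sketch of block matching by sum of squared differences (SSD); the block size, search radius, and file names are illustrative choices, and real encoders use faster search strategies:

  f1 = double(rgb2gray(imread('Images/frameN.jpg')));
  f2 = double(rgb2gray(imread('Images/frameN1.jpg')));
  B = 16; R = 8;                                  % block size and search radius
  [rows, cols] = size(f1);
  for r = 1:B:rows-B+1
      for c = 1:B:cols-B+1
          block = f2(r:r+B-1, c:c+B-1);           % block in the later frame
          best = inf; dr = 0; dc = 0;
          for u = max(1,r-R):min(rows-B+1, r+R)
              for v = max(1,c-R):min(cols-B+1, c+R)
                  d = f1(u:u+B-1, v:v+B-1) - block;
                  ssd = sum(d(:).^2);             % dissimilarity of candidate block
                  if ssd < best, best = ssd; dr = u-r; dc = v-c; end
              end
          end
          fprintf('block (%d,%d): motion (%d,%d)\n', r, c, dr, dc);
      end
  end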

Gradient from a 3x3 neighborhood: estimate both the magnitude and the direction of the edge.

Prewitt versus Sobel masks: the Sobel mask uses weights of 1,2,1 and -1,-2,-1 in order to give more weight to the center estimate. The scaling factor is thus 1/8 and not 1/6.
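
The two horizontal-derivative masks side by side, applied with plain conv2 (a sketch; the image file is an assumption, and conv2's kernel flip only changes the sign of the derivatives):

  prewittX = [-1 0 1; -1 0 1; -1 0 1] / 6;    % uniform weights, scale 1/6
  sobelX   = [-1 0 1; -2 0 2; -1 0 1] / 8;    % center row weighted 2, scale 1/8
  sobelY   = sobelX';                         % vertical derivative by symmetry

  gray = double(rgb2gray(imread('Images/scene.jpg')));
  gx = conv2(gray, sobelX, 'same');
  gy = conv2(gray, sobelY, 'same');
  mag = sqrt(gx.^2 + gy.^2);                  % edge magnitude
  dir = atan2(gy, gx);                        % edge direction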

Computational shortcuts

Rows of intensity vs. difference

“Masks” show how to combine neighborhood values: multiply the mask by the image neighborhood to get the first derivatives of intensity versus x and versus y.

Curves of contrasting pixels

Boundaries are not always found well.

Canny boundary operator

LOG filter creates zero crossings at step edges (2nd derivative of Gaussian). A 3x3 mask is applied at each image position; it detects spots and detects step edges. (Marr-Hildreth theory of edge detection.)

G(x,y): the Mexican hat filter

Positive center, negative surround.

Properties of the LOG filter: it has zero response on a constant region; zero response on an intensity ramp; it exaggerates a step edge by making it larger; it responds to a step in any direction across the “receptive field”; and it responds to a spot about the size of the center.
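
A sketch using one common 5x5 integer approximation of the LOG mask (the specific mask and image are assumptions; note its entries sum to 0, giving the zero response on constant regions):

  log5 = [ 0  0 -1  0  0;
           0 -1 -2 -1  0;
          -1 -2 16 -2 -1;
           0 -1 -2 -1  0;
           0  0 -1  0  0];                    % positive center, negative surround
  gray = double(rgb2gray(imread('Images/scene.jpg')));
  resp = conv2(gray, log5, 'same');
  edges = resp .* circshift(resp, [0 1]) < 0; % horizontally adjacent sign change = zero crossing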

A human receptive field is analogous to a mask: the x[j] are the image intensities, and the w[j] are gains (weights) in the mask, so the response is the weighted sum of the x[j].

Human receptive fields amplify contrast.

2D neural network in the brain: level j connects to level j+1.

The Mach band effect shows human bias.

Human bias and illusions support the receptive-field theory of edge detection.

Human brain as a network: 100B neurons, or nodes; half are involved in vision; 10 trillion connections; a neuron can have a “fanout” of 10,000. The visual cortex is highly structured to process 2D signals in multiple ways.

Color and shading: used heavily in human vision. Color is a pixel property, making some recognition problems easy. The visible spectrum for humans is 400 nm (blue) to 700 nm (red). Machines can “see” much more, e.g. X-rays, infrared, radio waves.

Imaging process (review)

Factors that affect perception. Light: the spectrum of energy that illuminates the object surface. Reflectance: the ratio of reflected light to incoming light. Specularity: highly specular (shiny) vs. matte surface. Distance: the distance to the light source. Angle: the angle between the surface normal and the light source. Sensitivity: how sensitive the sensor is.

Some physics of color: white light is composed of all visible frequencies (400-700 nm). Ultraviolet and X-rays have much smaller wavelengths; infrared and radio waves have much longer wavelengths.

Models of reflectance. We need to look at models for the physics of illumination and reflection that will 1. help computer vision algorithms extract information about the 3D world, and 2. help computer graphics algorithms render realistic images of model scenes. Physics-based vision is the subarea of computer vision that uses physical models to understand image formation in order to better analyze real-world images.

The Lambertian model: diffuse surface reflection. A diffuse reflecting surface reflects light uniformly in all directions, so a planar surface has uniform brightness for all viewpoints.
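
A one-formula sketch of Lambertian shading, I = albedo * max(0, n . s); the vectors and albedo value are illustrative:

  n = [0 0 1];                       % unit surface normal
  s = [1 1 1] / norm([1 1 1]);       % unit direction toward the light source
  albedo = 0.8;                      % fraction of incoming light reflected
  I = albedo * max(0, dot(n, s));    % brightness is independent of the viewpoint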

Real matte objects: light from a ring around the camera lens.

Specular reflection is highly directional and mirrorlike. R is the ray of reflection; V is the direction from the surface toward the viewpoint; alpha is the shininess parameter.
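
A matching sketch of the mirrorlike specular term in the Phong style, I = (R . V)^alpha (the vectors and the shininess value are illustrative):

  R = [0 0 1];                       % unit reflection ray
  V = [0.1 0 1]; V = V / norm(V);    % unit direction toward the viewer
  alpha = 20;                        % larger alpha = tighter, shinier highlight
  I = max(0, dot(R, V))^alpha;       % falls off quickly away from the mirror direction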

CV: perceiving 3D from 2D. Many cues from 2D images enable interpretation of the structure of the 3D world producing them.

Many 3D cues. How can humans and other machines reconstruct the 3D nature of a scene from 2D images? What other world knowledge needs to be added in the process?

Labeling image contours interprets the 3D scene structure: an egg and a thin cup on a table top, lighted from the top right. The logo on the cup is a “mark” on the material; the “shadow” relates to illumination, not material.

An “intrinsic image” stores 3D info in its “pixels”, not intensity. For each point of the image, we want the depth to the 3D surface point, the surface normal at that point, the albedo of the surface material, and the illumination of that surface point.

3D scene versus 2D image: creases, corners, faces, and occlusions (for some viewpoint) in the scene; edges, junctions, and regions (blades, limbs, T's) in the image.

Labeling of simple polyhedra: labeling of a block floating in space. BJ and KI are convex creases. Blades AB, BC, CD, etc. model the occlusion of the background. Junction K is a convex trihedral corner. Junction D is a T-junction modeling the occlusion of blade CD by blade JE.

Trihedral blocks world image junctions: only 16 cases! Only 16 junction types are possible in 2D when 3D corners formed by 3 planes are viewed from a general viewpoint. From top to bottom: L-junctions, arrows, forks, and T-junctions.

How do we obtain the catalog? Think about solid/empty assignments to the 8 octants about the X-Y-Z origin; think about non-accidental viewpoints; account for all possible topologies of junctions and edges; then handle T-junction occlusions.

Blocks world labeling. Left: a block floating in space. Right: a block glued to a wall at the back.

Try labeling these: interpret the 3D structure, then label the parts. What does it mean if we can't label them? If we can label them?

Researchers were very excited: these are very strong constraints on interpretations; there are several hundred junctions in the catalogue when cracks and shadows are allowed (Waltz), and the algorithm works very well with them. But the world is not made of blocks! Later on, curved blocks world work was done, but it was not as interesting.

Backtracking or interpretation tree
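
A generic interpretation-tree backtracking sketch in MATLAB (the data layout is an assumption: catalog{j} holds the legal label tuples for junction j as rows, labels are encoded as integers, and edges(e,:) = [j1 k1 j2 k2] says the edge is slot k1 of junction j1 and slot k2 of junction j2, which must agree):

  function labels = labelJunctions(catalog, edges)
      labels = assign(1, cell(1, numel(catalog)), catalog, edges);
  end

  function labels = assign(j, partial, catalog, edges)
      if j > numel(catalog)
          labels = partial; return              % every junction labeled consistently
      end
      for c = 1:size(catalog{j}, 1)
          partial{j} = catalog{j}(c, :);        % try the c-th catalog entry for junction j
          if consistent(j, partial, edges)
              labels = assign(j + 1, partial, catalog, edges);
              if ~isempty(labels), return; end  % success propagates up
          end
      end
      labels = {};                              % dead end: backtrack
  end

  function ok = consistent(j, partial, edges)
      ok = true;
      for e = 1:size(edges, 1)
          j1 = edges(e,1); k1 = edges(e,2); j2 = edges(e,3); k2 = edges(e,4);
          if j1 <= j && j2 <= j && partial{j1}(k1) ~= partial{j2}(k2)
              ok = false; return                % the two junctions disagree on this edge
          end
      end
  end

Waltz's insight was that propagating edge constraints prunes this tree so strongly that labeling is fast in practice.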

The “Necker cube” has multiple interpretations. A human staring at one of these cubes typically experiences changing interpretations. The interpretation of the two forks (G and H) flip-flops between “front corner” and “back corner”. What is the explanation? Label the different interpretations.

Depth cues in 2D images

The “interposition” cue. Def: interposition occurs when one object occludes another object, thus indicating that the occluding object is closer to the viewer than the occluded object.

Interposition: T-junctions indicate occlusion; the top of the T lies on the occluding edge while the stem is the occluded edge. The bench occludes the lamp post; a leg occludes the bench; the lamp post occludes the fence; the railing occludes the trees; the trees occlude the steeple.

Perspective scaling: the railing looks smaller at the left; the bench looks smaller at the right; the 2 steeples are far away. Foreshortening: the bench is sharply angled relative to the viewpoint, and its image length is affected accordingly.

Texture gradient reveals surface orientation. Texture gradient: a change of image texture along some direction, often corresponding to a change in distance or orientation in the 3D world containing the objects creating the texture. (In East Lansing, we call it “corn”, not “maize”.) Note also that the rows appear to converge in 2D.

3D cues from perspective

3D cues from perspective

More 3D cues: virtual lines; falsely perceived interposition.

Irving Rock, The Logic of Perception (1982): summarized an entire career in visual psychology and concluded that the human visual system acts as a problem-solver. The triangle is unlikely to be accidental; it must be an object in front of the background, and it must be brighter since it's closer.

More 3D cues: 2D alignment usually means 3D alignment; 2D image curves create the perception of a 3D surface.

“Structured light” can enhance surfaces in industrial vision: a sculpted object; potatoes with light stripes.

Models of reflectance (review). We need to look at models for the physics of illumination and reflection that will 1. help computer vision algorithms extract information about the 3D world, and 2. help computer graphics algorithms render realistic images of model scenes. Physics-based vision is the subarea of computer vision that uses physical models to understand image formation in order to better analyze real-world images.

The Lambertian model: diffuse surface reflection (review). A diffuse reflecting surface reflects light uniformly in all directions, so a planar surface has uniform brightness for all viewpoints.

Shape (normals) from shading: a cylinder with white paper and pen stripes; intensities plotted as a surface. Clearly intensity encodes shape in this case.

Shape (normals) from shading: a plot of the intensity of one image row reveals the 3D shape of these diffusely reflecting objects.
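
A two-line way to look at such a profile (the file name and row index are arbitrary):

  gray = double(rgb2gray(imread('Images/spheres.jpg')));
  plot(gray(120, :));                % intensity along one image row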

Specular reflection is highly directional and mirrorlike (review). R is the ray of reflection; V is the direction from the surface toward the viewpoint; alpha is the shininess parameter.

What about models for recognition? “Recognition” = to know again. How does memory store models of faces, rooms, chairs, etc.?

Human capability is extensive: a child of age 6 might recognize 3,000 words and 30,000 objects. A junkyard robot must recognize nearly all objects: hundreds of styles of lamps, chairs, tools, …

Some methods to recognize: via geometric alignment to CAD models; via a trained neural net; via parts of objects and how they join; via the function/behavior of an object.

Side-view classes of the Ford Taurus (Chen and Stockman): these were made in the PRIP Lab from a scale model. Viewpoints in between can be generated from the x and y curvature stored on the boundary. Viewpoints are matched to real image boundaries via optimization.

Matching image edges to model limbs: could recognize the car model at a stoplight or gate.

Object as parts + relations: parts have size, color, and shape, and connect together at concavities. Relations are: connect, above, right of, inside of, …

Functional models, inspired by J. J. Gibson's theory of affordances: an object is what an object does.
* container: holds stuff
* club: hits stuff
* chair: supports humans

Louise Stark: chair model. Dozens of CAD models of chairs; the program analyzed each for:
* a stable pose
* a seat of the right size
* the right height off the ground
* no obstruction to the body on the seat
The program would accept a trash can (which could also pass as a container).

Minsky's theory of frames (Schank's theory of scripts): frames are learned expectations, e.g. a frame for a room, a car, a party, an argument, … A frame is evoked by the current situation; how? (hard). The human “fills in” the details of the current frame (easier).

Summary: images have many low-level features. We can detect uniform regions and contrast, and can organize regions and boundaries. Human vision uses several simultaneous channels: color, edge, motion. The use of models/knowledge is diverse and difficult. The last 2 issues are difficult in computer vision.