Instructor: Mircea Nicolescu Computer Vision Instructor: Mircea Nicolescu Good afternoon and thank you everyone for coming. My talk today will describe the research I performed at the IRIS at USC, the object of this work being to build a computational framework that addresses the problem of motion analysis and interpretation.
Nomenclature Somewhat interchangeable names, with somewhat different implications: Computer Vision Most general term Computational Vision Includes modeling of biological vision Image Understanding Automated scene analysis (e.g., satellite images, robot navigation) Machine Vision Industrial, factory-floor systems for inspection, measurements, part placement, etc.
Another View Interesting Computer vision Actually works Computational vision Machine vision Actually works
Related Fields Largely built upon Related areas Image Processing Statistical Pattern Recognition Artificial Intelligence Related areas Robotics Biological vision Medical imaging Computer graphics Human-computer interaction
Vision Processing
Low Level Vision
Low Level Vision Feature extraction
Low Level Vision Region segmentation
Mid Level Vision
Mid Level Vision 3D Reconstruction
Mid Level Vision Motion
Mid Level Vision A moving camera can be used instead of stereo
High Level Vision
Progress in Computer Vision First generation: Military/Early Research Few systems, each custom-built, cost $Ms “Users” have PhDs 1 hour per frame Second generation: Industrial/Medical Numerous systems, 1000s of each, cost $10Ks Users have college degree Special hardware Third generation: Consumer 100000(00) systems, cost $100s Users have little or no training Emphasis on software
Visual Inspection
Character Recognition
Biometrics
Indexing into Databases Shape content
Indexing into Databases Color, texture
Target Recognition Department of Defense (Army, Air Force, Navy)
Interpretation of Aerial Photography
Autonomous Vehicles Land, Underwater, Space
Traffic Monitoring
Inserting Artificial Objects into a Scene
Inserting Images onto Ground Plane
Face Detection
Face Recognition
Human Activity Recognition
Human Activity Recognition Overview: The course examines fundamental concepts in computer vision, while highlighting application areas where vision techniques are providing solutions. It is suited for students who are interested in doing research in computer vision, or who are involved in computer graphics, robotics, artificial intelligence or image processing. Topics: Color Vision Multiple View Geometry Camera Calibration 3D Reconstruction Object Recognition Optical Flow and Motion Analysis Perceptual Grouping Background Modeling Detection and Tracking Robotic Vision Gesture and Facial Expression Recognition Activity Recognition
Cameras First photograph due to Niepce (1816) First on record - shown below (1822)
Main Concepts Factors that define image formation: Optical Parameters Focal length Angular aperture Photometric Parameters Type, intensity and direction of illumination Reflectance properties of viewed surfaces Sensor structure Geometric Parameters Type of projection Position and orientation of the camera in space Distortions
Evolution Photographic camera: Animal eye: a (really) long time ago. Niepce, 1816. Animal eye: a (really) long time ago. Pinhole perspective projection: Brunelleschi, XVth Century. Camera obscura: XVIth Century.
Pinhole Cameras Abstract camera model - box with a small hole in it Pinhole cameras work in practice The point to make here is that each point on the image plane sees light from only one direction, the one that passes through the pinhole.
Distant Objects Are Smaller
Parallel Lines Meet Common to draw film plane in front of the focal point. Moving the film plane merely scales the image.
Vanishing Points Each set of parallel lines (direction) meets at a different point The vanishing point for this direction Sets of parallel lines on the same plane lead to collinear vanishing points. The line is called the horizon for that plane Good ways to spot fake images scale and perspective don’t work vanishing points behave badly supermarket tabloids are a great source
Is This Correct? Is this a perspective image of four identical buildings? why not?
The Perspective Projection Equation x’ = f’ x/z y’ = f’ y/z
Homogenous Coordinates Add an extra coordinate and use an equivalence relation for 2D equivalence relation k*(X,Y,Z) is the same as (X,Y,Z) for 3D equivalence relation k*(X,Y,Z,T) is the same as (X,Y,Z,T) Basic notion Possible to represent points “at infinity” Where parallel lines intersect Where parallel planes intersect Possible to write the action of a perspective camera as a matrix
The Camera Matrix Turn previous expression into HC’s HC’s for 3D point are (X,Y,Z,T) HC’s for point in image are (U,V,W) Image coordinates are x’ and y’: x’ = U / W = f’ X / Z y’ = V / W = f’ Y / Z
Other Projection Models Weak Perspective Assumes that the relative distance along the Z axis between any scene points is much smaller than the average distance z0 to the camera (objects far away) is the magnification.
Other Projection Models Orthographic Projection Quite unrealistic
Pinhole Cameras - Problems Why are pinhole cameras not really used? Pinhole too big - many directions are averaged, blurring the image Pinhole too small - diffraction effects blur the image Generally, pinhole cameras are dark, because a very small set of rays from a particular point hits the screen.