Robot Vision
Methods for Digital Image Processing
Every picture tells a story
Vision
The goal of computer vision is to write computer programs that can interpret images.
Human Vision
Can do amazing things like:
- Recognize people and objects
- Navigate through obstacles
- Understand the mood in a scene
- Imagine stories
But it is still not perfect:
- Suffers from illusions
- Ignores many details
- Gives an ambiguous description of the world
- Doesn't care about the accuracy of the world
Computer Vision What we see What a computer sees
Image: a two-dimensional array of pixels. The indices [i, j] of a pixel are integer values that specify its row and column in the array.
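As a minimal sketch (assuming NumPy), an image and its [i, j] indexing look like this:

```python
import numpy as np

# A grayscale image: a two-dimensional array of pixels.
# The indices [i, j] select row i and column j.
image = np.zeros((480, 640), dtype=np.uint8)  # 480 rows x 640 columns

image[100, 200] = 255      # set the pixel at row 100, column 200 to white
value = image[100, 200]    # read it back
print(image.shape, value)  # (480, 640) 255
```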
Gray level image vs. binary image
Components of a Computer Vision System: scene, lighting, camera, computer, scene interpretation.
Microsoft Kinect IR LED Emitter IR Camera RGB Camera
Face detection
Face detection
Many digital/mobile cameras detect faces (Canon, Sony, Fuji, …). Why would this be useful? The main reason is focus; it also enables "smart" cropping.
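As a sketch of how such in-camera detection can be reproduced in software (using OpenCV's bundled Haar cascade; the input file name is an assumption):

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("photo.jpg")  # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Each detection is an (x, y, w, h) box; a camera would focus on these
# regions, or crop around them.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces.jpg", img)
```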
Smile detection? Sony Cyber-shot® T70 Digital Still Camera
Face Recognition: Principal Component Analysis (PCA)
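A minimal eigenfaces-style sketch of PCA for face recognition (assuming NumPy; the array shapes and component count are illustrative assumptions):

```python
import numpy as np

def fit_pca(faces, n_components=20):
    # `faces` is an (n_images, n_pixels) array of flattened face images.
    mean = faces.mean(axis=0)
    centered = faces - mean
    # Rows of Vt are the principal components ("eigenfaces").
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean, Vt[:n_components]

def project(face, mean, components):
    # A face is represented by its coordinates in eigenface space;
    # recognition compares these coordinates (e.g. nearest neighbour).
    return components @ (face - mean)
```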
Vision-based biometrics: "How the Afghan Girl was Identified by Her Iris Patterns". Read the story on Wikipedia.
Definition of robot vision: Robot vision may be defined as the process of extracting, characterizing, and interpreting information from images of a three-dimensional world.
Common reasons for failure of vision systems: small changes in the environment can result in significant variations in image data, e.g. changes in contrast or unexpected occlusion of features.
What Skills Do Robots Need?
- Identification: What/who is that? Object detection, recognition
- Movement: How do I move safely? Obstacle avoidance, homing
- Manipulation: How do I change that? Interacting with objects/environment
- Navigation: Where am I? Mapping, localization
Visual Skills: Identification
- Recognizing face/body/structure: Who/what do I see? Use shape, color, pattern, and other static attributes to distinguish the target from the background and from other hypotheses.
- Gesture/activity: What is it doing? From low-level motion detection & tracking to categorizing high-level temporal patterns.
- There is feedback between the static and dynamic analyses.
Visual Skills: Movement
Steering, foot placement, or landing spot for the entire vehicle.
(Figures: MAKRO sewer shape pattern; Demeter region boundary detection)
Visual Skills: Manipulation
Moving other things: grasping (e.g. the KTH door opener), pushing, digging, cranes.
(Figures: KTH robot & typical handle; Clodbusters push a box cooperatively)
Visual Skills: Navigation
Building a map; localization/place recognition: Where are you in the map?
(Figures: laser-based wall map (CMU); Minerva's ceiling map)
Binary Image Creation: popularly used in industrial robotics.
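A minimal thresholding sketch (assuming NumPy) that turns a grey-level image into a binary one; the threshold value is an assumption:

```python
import numpy as np

def binarize(gray, threshold=128):
    # Pixels brighter than the threshold become 1, the rest 0.
    return (gray > threshold).astype(np.uint8)

gray = np.random.randint(0, 256, (480, 640), dtype=np.uint8)  # stand-in image
binary = binarize(gray)
print(binary.min(), binary.max())  # 0 1
```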
Bits per Pixel
Color models
Color models for images: RGB, CMY. Color models for video: YIQ, YUV (YCbCr). Relationship between color models: e.g. luminance Y = 0.299 R + 0.587 G + 0.114 B, and C = 1 - R, M = 1 - G, Y = 1 - B for CMY.
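A small sketch of two of these relationships (standard BT.601 luminance weights; values normalized to [0, 1]):

```python
import numpy as np

def rgb_to_y(rgb):
    # Luminance Y from RGB (ITU-R BT.601 weighting).
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def rgb_to_cmy(rgb):
    # CMY is the complement of RGB.
    return 1.0 - rgb

rgb = np.random.rand(4, 4, 3)   # stand-in image, values in [0, 1]
print(rgb_to_y(rgb).shape)      # (4, 4)
print(rgb_to_cmy(rgb).shape)    # (4, 4, 3)
```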
Simplified diagram of camera to CPU interface
Interfacing Digital Cameras to CPU
Digital camera sensors are very complex units; in many respects they are themselves similar to an embedded controller chip. Some sensors buffer camera data and allow slow reading via handshake (ideal for slow microprocessors). Most sensors send the full image as a stream after a start signal (the CPU must be fast enough to read it, or use a hardware buffer).
Idea: use a FIFO as the image data buffer. A FIFO is similar to dual-ported RAM; it is required since there is no synchronization between camera and CPU. An interrupt service routine then reads the FIFO until it is empty.
Vision Sensors Single Perspective Camera
Vision Sensors Multiple Perspective Cameras (e.g. Stereo Camera Pair)
There are several good approaches to detecting objects:
1) Model-based vision. We can store models of line drawings of objects (from many possible angles, and at many different possible scales!) and then compare those with all possible combinations of edges in the image. Notice that this is a very computationally intensive and expensive process.
2) Motion vision. We can take advantage of motion. If we look at an image at two consecutive time-steps, and we move the camera in between, each continuous solid object (which obeys physical laws) will move as one. This gives us a hint for finding objects: subtract the two images from each other. But notice that this depends on knowing how we moved the camera relative to the scene (direction, distance), and that nothing else was moving in the scene at the time.
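A minimal image-subtraction sketch (assuming NumPy; the change threshold is an assumption):

```python
import numpy as np

def motion_mask(frame_t0, frame_t1, threshold=30):
    # Subtract two consecutive grey-level frames: pixels that changed
    # significantly likely belong to a moving object. This assumes the
    # camera motion between frames is known/compensated and nothing
    # else moved in the scene.
    diff = np.abs(frame_t1.astype(np.int16) - frame_t0.astype(np.int16))
    return (diff > threshold).astype(np.uint8)
```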
Clever special tricks that work: to do object recognition, it is possible to simplify the vision problem in various ways:
1) Use color: look for specifically and uniquely colored objects, and recognize them that way (such as stop signs, for example); see the sketch below.
2) Use a small image plane: instead of a full 512 x 512 pixel array, we can reduce our view to much less. Of course there is much less information in the image, but if we are clever, and know what to expect, we can process what we see quickly and usefully.
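A sketch of trick 1 (using OpenCV; the HSV bounds for "red" and the input file name are illustrative assumptions):

```python
import cv2
import numpy as np

img = cv2.imread("scene.jpg")                 # hypothetical input image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Keep only pixels in an assumed "stop-sign red" hue range.
mask = cv2.inRange(hsv, np.array([0, 120, 70]), np.array([10, 255, 255]))

# Connected regions of the mask are candidate objects.
count, labels = cv2.connectedComponents(mask)
print(count - 1, "red blob(s) found")
```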
Smart tricks continued:
3) Use other, simpler and faster sensors, and combine them with vision. IR cameras isolate people by body temperature; grippers allow us to touch and move objects, after which we can be sure they exist.
4) Use information about the environment: if you know you will be driving on a road that has white lines, look specifically for those lines at the right places in the image. This is how the first, and still the fastest, road and highway robotic driving was done.
SLAM: Simultaneous Localization and Mapping
A robot is exploring an unknown, static environment.
Given: the robot's controls and observations of nearby features; both the controls and the observations are noisy.
Estimate: the location of the robot (localization: Where am I?) and a detailed map of the environment (mapping: What does the world look like?).
Objective: determination of the pose (= position + orientation) of a mobile robot in a known environment in order to successfully perform a given task.
The SLAM Problem SLAM is a chicken-or-egg problem: → A map is needed for localizing a robot → A pose estimate is needed to build a map Thus, SLAM is (regarded as) a hard problem in robotics
SLAM Applications Indoors Undersea Space Underground
SLAM consists of multiple parts: landmark extraction, data association, state estimation, state update, and landmark update. There are many ways to solve each of these smaller parts.
Hardware: a mobile robot and a range measurement device.
- Laser scanner: cannot be used underwater
- Sonar: not accurate
- Vision: cannot be used in a room with no light
Mobile Robot Mapping: What does the world look like? The robot is initially unaware of its environment; it must explore the world and determine its structure. Most often, this is combined with localization: the robot must update its own location with respect to the landmarks. This is known in the literature as Simultaneous Localization and Mapping (SLAM), or Concurrent Localization and Mapping (CLM). Example: AIBOs are placed in an unknown environment and must learn the locations of the landmarks. (An interesting project idea?)
Localization as an estimation problem: Notation
- Robot pose: x_t = (x, y, θ)
- Robot poses from time 0 to time t: x_{0:t}
- Robot exteroceptive measurements from time 1 to time t: z_{1:t}
- Motion commands (or proprioceptive measurements) from time 0 to time t: u_{0:t}
The robot motion model is the pdf of the robot pose at time t+1 given the robot pose and the motion action at time t; it takes into account the noise characterizing the proprioceptive sensors:
p(x_{t+1} | x_t, u_t)
The measurement model describes the probability of observing at time t a given measurement z_t when the robot pose is x_{r,t}; it takes into account the noise characterizing the exteroceptive sensors:
p(z_t | x_{r,t})
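A minimal sketch of both models for a planar robot with pose (x, y, θ); the Gaussian noise levels and the range-only landmark measurement are assumptions:

```python
import numpy as np

def sample_motion_model(pose, u, rng):
    # Sample from p(x_{t+1} | x_t, u_t): move by (distance, rotation)
    # plus proprioceptive (odometry) noise.
    x, y, theta = pose
    dist, rot = u
    theta_new = theta + rot + rng.normal(0.0, 0.02)
    noisy_dist = dist + rng.normal(0.0, 0.05)
    return np.array([x + noisy_dist * np.cos(theta_new),
                     y + noisy_dist * np.sin(theta_new),
                     theta_new])

def measurement_likelihood(z, pose, landmark, sigma=0.1):
    # p(z_t | x_{r,t}) for a measured range z to a known landmark,
    # with Gaussian exteroceptive sensor noise.
    expected = np.linalg.norm(landmark - pose[:2])
    return np.exp(-0.5 * ((z - expected) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
pose = sample_motion_model(np.zeros(3), (1.0, 0.1), rng)
print(measurement_likelihood(1.0, pose, np.array([2.0, 0.0])))
```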
SLAM: Simultaneous Localization and Mapping
Full SLAM estimates the entire path and the map:
p(x_{1:t}, m | z_{1:t}, u_{1:t})
Online SLAM estimates only the most recent pose and the map:
p(x_t, m | z_{1:t}, u_{1:t}) = ∫ … ∫ p(x_{1:t}, m | z_{1:t}, u_{1:t}) dx_1 dx_2 … dx_{t-1}
The integrations (marginalization) are typically done one at a time.
Localization Basics
- Several cameras pointing straight down, fitted with ultra wide angle lenses
- An instance of Mezzanine (USC) per camera "finds" the fiducial pairs atop each robot
- Removes barrel distortion ("dewarps")
- Reported positions are aggregated into tracks
- But the fiducials are identical, so robots are identified via their commanded motion pattern
Localization: Better Dewarping
- Mezzanine's supplied dewarp algorithm was unstable (10-20 cm error)
- Model barrel distortion using a cosine function: loc_world = loc_image / cos(α · w), where α is the angle between the optical axis and the fiducial
- Added interpolative error correction
- Result: ~1 cm maximum location error; no need to account for more complex distortion, even for very cheap lenses
An interesting problem is that, when dewarping, we are converting from d_image to d_world, but to get the angle α exactly we would need to already know the dewarped world coordinates. Our algorithm therefore iterates, using each d_world approximation (and the simple geometry of cameras pointing down) to calculate a new angle α and hence a new d_world, converging in fewer than eight iterations.
The approximation function is a poor fit for large amounts of distortion: a grid generated with it would tilt/bend globally if we added enough control points to fit our level of barrel distortion.
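A sketch of the iteration (the camera height and the flat-floor geometry used to compute α are assumptions):

```python
import math

def dewarp(loc_image, w, camera_height, iterations=8):
    # Solve loc_world = loc_image / cos(alpha * w) iteratively: alpha
    # depends on the world position we are solving for, so each d_world
    # estimate gives a new alpha and hence a new d_world.
    loc_world = loc_image                 # initial guess: no distortion
    for _ in range(iterations):
        alpha = math.atan2(loc_world, camera_height)  # downward-camera geometry
        loc_world = loc_image / math.cos(alpha * w)
    return loc_world

print(dewarp(1.0, w=0.5, camera_height=2.5))
```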
To set this room as the goal, we associate a reward value with each door (i.e., each link between nodes). Doors that lead immediately to the goal have an instant reward of 100; other doors, not directly connected to the target room, have zero reward.
The -1's in the table represent null values (i.e., where there isn't a link between nodes). For example, state 0 cannot go to state 1.
Q Matrix: Think of the matrix "Q" as the brain of our agent, representing the memory of what it has learned through experience. The rows of Q represent the current state of the agent, and the columns represent the possible actions leading to the next state (the links between the nodes). The agent starts out knowing nothing, so Q is initialized to zero. If we didn't know how many states were involved, Q could start out with only one element; it is a simple task to add more columns and rows to Q as new states are found.
Learning rule:
Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
where Gamma is the learning parameter and Max[Q(next state, all actions)] is the maximum Q value over all possible actions in the next state.
Algorithm:
1. Initialize matrix Q to zero.
2. Select a random initial state.
3. Do while the goal state hasn't been reached:
   a. Select one among all possible actions for the current state.
   b. Using this possible action, consider going to the next state.
   c. Get the maximum Q value for this next state, based on all possible actions.
   d. Compute Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)].
   e. Set the next state as the current state.
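A runnable sketch of this loop. The 6-state R matrix below is an assumption laid out to match the text (-1 = no link, 0 = door, 100 = door leading straight to the goal state 5), and episodes here run a fixed number of steps rather than stopping exactly at the goal, so that the goal row of Q is also updated and Q values grow to the maximum of 500 mentioned later:

```python
import numpy as np

# Assumed reward matrix consistent with the text: -1 = no link,
# 0 = door, 100 = door leading directly to the goal (state 5).
R = np.array([
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
])
GAMMA = 0.8                              # the learning parameter Gamma
Q = np.zeros_like(R, dtype=float)        # the agent starts out knowing nothing

rng = np.random.default_rng(0)
for _ in range(500):                     # episodes
    state = rng.integers(0, 6)           # random initial state
    for _ in range(20):                  # steps per episode (assumption)
        actions = np.flatnonzero(R[state] >= 0)   # possible actions
        action = rng.choice(actions)               # select one at random
        # Q(state, action) = R(state, action) + Gamma * Max[Q(next state, ·)]
        Q[state, action] = R[state, action] + GAMMA * Q[action].max()
        state = action                             # next state becomes current

print((Q / Q.max() * 100).round())       # normalized to percentages
```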
Look at the second row (state 1) of matrix R. There are two possible actions for the current state 1: go to state 3, or go to state 5. By random selection, we select going to state 5 as our action. Since matrix Q is still initialized to zero, Q(5, 1), Q(5, 4), and Q(5, 5) are all zero. The result of this computation for Q(1, 5) is therefore 100, because of the instant reward R(1, 5) = 100.
For the next episode, we start with a randomly chosen initial state; this time, we have state 3 as our initial state. We use the updated matrix Q from the last episode: Q(1, 3) = 0 and Q(1, 5) = 100. The result of the computation is Q(3, 1) = R(3, 1) + 0.8 * Max[Q(1, 3), Q(1, 5)] = 0 + 0.8 * 100 = 80, since the instant reward R(3, 1) is zero. The matrix Q becomes:
This matrix Q can then be normalized (i.e., converted to percentages) by dividing all non-zero entries by the highest number (500 in this case):
For example, from initial state 2, the agent can use the matrix Q as a guide: From state 2, the maximum Q value suggests the action of going to state 3. From state 3, the maximum Q values suggest two alternatives: go to state 1 or state 4. Suppose we arbitrarily choose to go to state 1. From state 1, the maximum Q value suggests the action of going to state 5. Thus the sequence is 2 - 3 - 1 - 5.
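Continuing the sketch above (Q is the matrix trained earlier), extracting this greedy path takes a few lines; note that on ties np.argmax picks the lowest index, which is why 2 - 3 - 1 - 5 is found rather than 2 - 3 - 4 - 5:

```python
import numpy as np

def best_path(Q, start, goal):
    # Follow the action with the highest Q value until the goal is reached.
    path = [start]
    state = start
    while state != goal:
        state = int(np.argmax(Q[state]))
        path.append(state)
    return path

print(best_path(Q, 2, 5))  # [2, 3, 1, 5]
```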