Vision January 10, 2007
Today's Agenda ● Some general notes on vision ● Colorspaces ● Numbers and Java ● Feature detection ● Rigid body motion
Some general notes on vision
Vision is difficult ● Animal vision is very complex ● Right: Circuitry of macaque visual cortex (Felleman and Van Essen, 1991)
Object recognition is difficult A theory of object recognition has to specify: ● The nature of stored visual representations in long- term memory ● The nature of the intermediate representations ● A computational account of how each intermediate representation can be derived from the previous one ● A determination of whether the answers to the above points are different for different kinds of objects
What makes it difficult? ● A single image can be cast by many different objects ● A single object can cast many different images ● Two main challenges: 1. Invariance/Tolerance: Generalizing across changes in size, orientation, lighting, etc. to realize that these images are all of the same thing 2. Specificity: Appreciating the distinction between different categories
Theories on solving this problem ● Inverse optics: ● Try to solve the problem of determining what shape may have cast the image ● This is an ill-posed problem ● Association: ● Store each possible version of an object ● Brute force ● Other intermediate descriptors (e.g. Parts, image fragments)
But... Luckily, in MASLab, your problem is greatly simplified!
Colorspaces
Representing Colors ● RGB used to represent colors of light ● CMYK used to represent colors of pigment ●...but both mix color, tint, brightness
HSV Colorspace ● Hue (color): 360 degrees mapped 0 to 255. Note that red is both 0 and 255. ● Saturation (amount of color) ● Value (amount of light and dark) ● We provide the code to convert to HSV ● Note: White is low saturation but can have any hue ● Note: Black is low value but can have any hue
Tips for differentiating colors ● Calibrate for different lighting conditions ( is darker than the lab) ● Globally define thresholds ● Use the gimp/bot client on real images ● Use a large sample set You don't HAVE to use HSV. Past teams have used other models such as RGB.
Numbers and Java
How values are stored Uses Hexadecimal (base 16) ● 0x12 = 18 A color is 4 bytes = 8 hex numbers For HSV, these bytes are: ● Alpha ● Hue ● Saturation ● Value
Manipulating HSV values Use masks to pick out parts. ● 0x & 0x0000FF00 = 0x Shift to move parts. ● 0x >> 8 = 0x Example: Hue = (X >> 16) & 0xFF
Notes on Java... All Java types are signed! ● A byte ranges from -128 to 127 ● Coded in two's complement: to change sign, flip every bit and add 1 Don't forget higher-order bits! ● (int) 0x0000FF00 = (int) 0xFF00 ● (int) ((byte) 0xFF) = (int) 0xFFFFFFFF Watch out for shifts! ● 0xFD >> 8 = 0xFFFD0000
Performance Getting an image performs a copy ● Int[] = bufferedImage.getRGB(…) Getting a pixel performs a multiplication ● int v = bufferedImage.RGB(x,y) ● offset = y*width + x Go across rows, down columns (memory in rows, not columns)
Feature Detection
MASLab Features ● Red balls ● Yellow goals ● Blue lines ● Blue ticks ● Green-and-black bar codes
Ideas for dealing with blue lines ● Search for N blue pixels in a column ● Make sure there's wall-white below ● Candidate voting: ● In each column, list places where you think line might be ● Find shortest left-to-right path through candidates
Ideas for dealing with bar codes ● Look for green and black ● Look for not-white under blue line ● Check along a column to determine colors ● RANdom SAmple Consensus (RANSAC): ● Pick random pixels within bar code ● Are they black or green?
Looking for an object ● Look for a red patch ● Set center to current coordinates ● Loop: – Find new center based on pixels within d of old – Enlarge d and recompute – Stop if increasing d doesn't add enough red pixels
Distance Remember: ● Closer objects are bigger ● Closer objects are lower
Other considerations Find ways to deal with: ● Noise ● Occlusion ● Different viewing angles ● Overly large thresholds
Rigid Body Motion
● Going from data association to motion ● Given – Starting x1, y1, θ1 – Set of objects visible in both images ● What are x2, y2, θ2?
More rigid body motion ● If we know angles but not distances, the problem is hard ● Assume distances are known
But if angles and distances are known......we can construct triangles
Rigid body motion math ● Apply the math for a rotation: x1i = cos(θ) * x2i + sin(θ) * y2i + x0 y1i = cos(θ) * y2i - sin(θ) * x2i + y0 ● Solve for x0, y0, θ with least squares: Σ (x1i - cos(θ) * x2i - sin(θ) * y2i – x0)^2 + (y1i - cos(θ) * y2i + sin(θ) * x2i – y0)^2 ● Need at least two objects to solve
Advantages and Disadvantages Advantages: ● Relies on the world, not odometry ● Can use many or few associations Disadvantages: ● Can take time to compute