2
Attention in Computer Vision Mica Arie-Nachimson and Michal Kiwkowitz May 22, 2005 Advanced Topics in Computer Vision Weizmann Institute of Science
3
Problem Definition – Search Order
Vision applications apply “expensive” algorithms (e.g. recognition) to image patches. The selection of patches is mostly naïve, and it determines the number of calls to the “expensive” algorithm.
4
Problem Definition – Search Order
A more sophisticated selection of patches would imply fewer calls to the “expensive” algorithm. Attention is used to focus efficiently on incoming data (better use of limited processing capacity).
5
Problem Definition – Search Order
[figure: image patches passed to object recognition in order 1–6]
6
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
8
Attention
Attention implies allocating resources, perceptual or cognitive, to some things at the expense of others.
9
What is Attention You are sitting in class listening to a lecture. Two people behind you are talking. –Can you hear the lecture? One of them mentions the name of a friend of yours. –How did you know?
10
Attention in Other Applications Face Detection (feature selection) Video Analysis (temporal block selection) Robot Navigation (select locations) …
11
Attention is Directed by: Bottom-up: From small to large units of meaning Rapid Task-independent
12
Attention is Directed by: Top-down: Use higher levels (context, expectation) to process incoming information (Guess) Slower Task dependent http://www.rybak-et-al.net/nisms.html
13
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
14
Attention – when is information selected (filtered)?
– Early selection (Broadbent, 1958): only some information is selected for complete processing
– Cocktail party phenomenon (Moray, 1959): unattended information can still get through
– Late selection (Treisman, 1960) – attenuation: all information is sent to perceptual systems for processing; some is more likely to be selected
WHICH?
15
Parallel Search Is there a green O ? + A. Treisman, G. Gelade, 1980
16
Conjunction Search Is there a green N ? + A. Treisman, G. Gelade, 1980
17
Results A. Treisman, G. Gelade, 1980
18
Conjunction Search + A. Treisman, G. Gelade, 1980
19
Color map Orientation map A. Treisman, G. Gelade, 1980
20
Color map / Orientation map A. Treisman, G. Gelade, 1980
21
Conjunction Search + A. Treisman, G. Gelade, 1980
22
Primitives (pop-out demonstrations): Intensity, Orientation, Color, Curvature, Line End, Movement
23
Feature Integration Theory
Attention – two stages:
Pre-attention: parallel processing, low-level features, fast (parallel search)
Attention: serial processing, localized focus, slower (conjunctive search)
How is the Focus found & shifted? A. Treisman, G. Gelade, 1980
24
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
25
Shifts in Attention
“Shifts in selective visual attention: towards the underlying neural circuitry”, Christof Koch and Shimon Ullman, 1985
Feature maps (Orientation, Color, Curvature, Line end, Movement) → Central Representation → Saliency → Attention
26
“A model of saliency-based visual attention for rapid scene analysis” Laurent Itti, Christof Koch, and Ernst Niebur, 1998 L. Itti, C. Koch, and E. Niebur, 1998 Salient - stands out Example – telephone & road sign have high saliency
27
from C. Koch L. Itti, C. Koch, and E. Niebur, 1998
28
Intensity L. Itti, C. Koch, and E. Niebur, 1998 Cells in the retina
29
Intensity
Create 8 spatial scales using Gaussian pyramids. L. Itti, C. Koch, and E. Niebur, 1998
30
Intensity
Center-Surround difference operator:
– Sensitive to local spatial discontinuities
– Principal computation in the retina & primary visual cortex
– Subtract coarse scale from fine scale (center +, surround –)
L. Itti, C. Koch, and E. Niebur, 1998
31
Toy Example
Fine level: [0 0 0; 0 255 0; 0 0 0] → Gauss pyramid → coarse level → interpolation → point-by-point subtraction of the interpolated coarse level from the fine level.
32
Toy Example
Fine level: a single bright pixel (255) in a uniform patch → Gauss pyramid → coarse level → interpolation → point-by-point subtraction.
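The toy examples above can be sketched numerically. This is a minimal numpy sketch, not the paper's implementation: 2x2 block averaging stands in for one Gaussian-pyramid level, and nearest-neighbor repetition stands in for the interpolation back to the fine grid.

```python
import numpy as np

def downsample(img):
    # One pyramid level: 2x2 block averaging (a crude stand-in for
    # Gaussian blur + decimation; a real pyramid would use a Gaussian kernel).
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    # Nearest-neighbor interpolation back to the finer grid.
    return img.repeat(2, axis=0).repeat(2, axis=1)

def center_surround(fine):
    # Fine scale minus the interpolated coarse scale
    # (the point-by-point subtraction of the toy example).
    return fine - upsample(downsample(fine))

# Toy example: a single bright pixel in a 4x4 patch.
fine = np.zeros((4, 4))
fine[1, 1] = 255.0
response = center_surround(fine)
```

The bright pixel yields a strong positive response, its blurred surround goes negative, and the whole response sums to zero, which is the center-surround behavior the slides describe.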
33
Intensity Compute: 6 Intensity maps Different ratios – multiscale feature extraction L. Itti, C. Koch, and E. Niebur, 1998
34
Color
Same c and s as with intensity; 12 color maps. Kandel et al. (2000). Principles of Neural Science. McGraw-Hill/Appleton & Lange. L. Itti, C. Koch, and E. Niebur, 1998
36
Orientation
Same c and s as with intensity; 24 orientation maps. From Visual system presentation by S. Ullman. L. Itti, C. Koch, and E. Niebur, 1998
37
Reprinted from “Shiftable MultiScale Transforms,” by Simoncelli et al., IEEE Transactions on Information Theory, 1992, copyright 1992, IEEE Orientation – Gabor pyramids
38
from C. Koch L. Itti, C. Koch, and E. Niebur, 1998
39
Normalization Operator N(.) L. Itti, C. Koch, and E. Niebur, 1998
40
Normalization N(.):
– Normalize map values to a fixed range [0..M]
– Find the global maximum M
– Compute the average m̄ over all the other local maxima
– Multiply the map by (M – m̄)²
L. Itti, C. Koch, and E. Niebur, 1998
41
Saliency Map L. Itti, C. Koch, and E. Niebur, 1998
42
Conspicuity Maps
43
Algorithm – up to now:
1. Extract feature maps
2. Compute center-surround maps (42): Intensity – I (6), Color – C (12), Orientation – O (24)
3. Combine each channel into a conspicuity map
4. Compute saliency by summing and normalizing maps
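Steps 3–4 above can be sketched as follows, assuming the per-channel center-surround maps have already been computed. The trivial range normalization used here is a stand-in for the full N(.) operator.

```python
import numpy as np

def combine(maps, N):
    # Across-scale addition of normalized maps into one conspicuity map.
    return sum(N(m) for m in maps)

def saliency_map(intensity_maps, color_maps, orientation_maps, N):
    # Steps 3-4: one conspicuity map per channel, then sum the normalized
    # conspicuity maps into the final saliency map.
    channels = (intensity_maps, color_maps, orientation_maps)
    conspicuity = [combine(maps, N) for maps in channels]
    return sum(N(c) for c in conspicuity) / 3.0

def N(m):
    # Stand-in normalization: scale to [0, 1]. (The real N(.) additionally
    # promotes maps that contain a single strong peak.)
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else m * 0.0

m = np.array([[0.0, 1.0], [0.0, 0.0]])
S = saliency_map([m], [m], [m], N)   # one map per channel, for illustration
```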
44
Laurent Itti, Christof Koch, and Ernst Niebur, 1998
45
Winner-Takes-All selection of the FOA (Focus Of Attention) – leaky integrate-and-fire neurons, “inhibition of return”. L. Itti, C. Koch, and E. Niebur, 1998
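The winner-takes-all loop with inhibition of return can be sketched as below. This is a discrete sketch, not the paper's integrate-and-fire network: the inhibition radius is an assumed parameter, and suppression is modeled by simply clamping the attended region.

```python
import numpy as np

def attend(saliency, n_fixations=3, radius=1):
    # Winner-Takes-All with inhibition of return: pick the global winner,
    # then suppress a neighbourhood around it so the next-most-salient
    # location wins the following attentional shift.
    s = saliency.astype(float).copy()
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(int(np.argmax(s)), s.shape)
        fixations.append((int(y), int(x)))
        s[max(0, y - radius):y + radius + 1,
          max(0, x - radius):x + radius + 1] = -np.inf   # inhibition of return
    return fixations

s = np.array([[0.0, 5.0, 0.0],
              [0.0, 0.0, 0.0],
              [3.0, 0.0, 0.0]])
foa = attend(s, n_fixations=2)
```

The strongest location wins first; once inhibited, attention shifts to the next-strongest location instead of revisiting the winner.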
46
Results FOA shifts: 30-70 ms Inhibition: 500-900 ms Inhibition of return ends L. Itti, C. Koch, and E. Niebur, 1998
47
Results
Spatial Frequency Content (SFC), Reinagel & Zador, 1997. Image / SFC / Saliency / Output. L. Itti, C. Koch, and E. Niebur, 1998
48
Results (a) (b) (c) (d)
Image / SFC / Saliency / Output. L. Itti, C. Koch, and E. Niebur, 1998. Spatial Frequency Content (SFC), Reinagel & Zador, 1997
49
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
50
Attention & Object Recognition
“Is bottom-up attention useful for object recognition?” – U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Computer recognition: segmented, labeled images. Human recognition: cluttered scenes, non-labeled objects. Attention bridges the gap.
51
Object Recognition
Saliency model → growing a region in the strongest map → to object recognition (Lowe). U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
52
Attention & Object Recognition
Added selection of an image region:
1. Find the strongest contributing map
2. Segment the “winning” map
3. Create a mask M that modulates contrast in the original image
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
53
Attention & Object Recognition
Learning inventories – the “grocery cart problem”. Real-world scenes; 1 image for training (15 fixations); 2–5 images for testing (20 fixations). U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
54
[figure: training and testing images passed to object recognition and matched]
55
“Grocery Cart” Problem – training, testing1, testing2. U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
56
“Grocery Cart” Problem
Downsides: bias of human photography; small image set. Solution: a robot as acquisition tool. U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
57
Robot - Landmark Learning Objective – how many objects are found and classified correctly? Navigation – simple obstacle avoiding algorithm using infrared sensors U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
58
Landmark Learning Algorithm:
1. Extract the most salient location
2. Does the patch have at least 3 key points? If not – back to 1
3. Test the patch against all known object models: match – increase the object count; no match – learn it as a new object
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
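The loop above can be sketched as follows. The `keypoints` and `match` functions are hypothetical stand-ins for the keypoint extractor and the (Lowe-style) recognizer; only the control flow mirrors the slide.

```python
def learn_landmarks(salient_patches, keypoints, match, models, min_keypoints=3):
    # Sketch of the landmark-learning loop from the slide.
    counts = {}
    for patch in salient_patches:          # step 1: patches in saliency order
        kps = keypoints(patch)
        if len(kps) < min_keypoints:       # step 2: too few key points
            continue                       #         -> back to step 1
        obj = match(kps, models)           # step 3: test against known models
        if obj is not None:
            counts[obj] = counts.get(obj, 0) + 1     # match: increase count
        else:
            obj = "object-%d" % len(models)          # no match: learn as new
            models[obj] = kps
            counts[obj] = 1
    return counts

# Toy run: characters stand in for key points; exact match plays recognizer.
models = {}
counts = learn_landmarks(
    ["aaa", "bb", "aaa"],
    keypoints=lambda p: list(p),
    match=lambda kps, ms: next((k for k, v in ms.items() if v == kps), None),
    models=models)
```

In the toy run, the first patch is learned as a new object, the second is discarded (fewer than 3 key points), and the third is recognized as the first object again.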
59
[figure: patches with < 3 key points are discarded before object recognition]
60
Landmark Learning With Attention U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
61
Landmark Learning With Random Selection U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
62
Landmark Learning - Results U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
63
Saliency Based Object Recognition Biologically motivated Uses bottom-up, allows combining top-down information Segmentation –Cluttered scenes –Unlabeled objects –Multiple objects in single image Static priority map U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
64
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
65
Comparison “Comparing attention operators for learning landmarks”, R. Sim, S. Polifroni, G. Dudek, June 2003 Other attention operators for low level features R. Sim, S. Polifroni, G. Dudek, June 2003
66
Comparison
Edge density, Radial symmetry, Smallest eigenvalue, Caltech saliency. R. Sim, S. Polifroni, G. Dudek, June 2003
67
Comparison Landmark learning Training – learn landmarks knowing camera pose Testing - determine pose of camera according to landmarks (pose estimation) R. Sim, S. Polifroni, G. Dudek, June 2003
68
Comparison - Results
All operators perform better than random; radial symmetry gives the worst results. The Caltech operator performs similarly to the edge and eigenvalue operators, BUT it is more complex to implement and needs more computing time – a less preferred candidate in practice. R. Sim, S. Polifroni, G. Dudek, June 2003
69
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
70
The Problem
[figure: image patches passed to object recognition in order 1–6]
71
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
72
Biological Motivation An alternative approach: continuous search difficulty Based on similarity: –Between Targets and Non-Targets in the scene –Between Non-Targets and Non-Targets in the scene Similar structural units do not need separate treatment Structural units similar to a possible target get high priority Duncan & Humphreys [89]
73
Biological Motivation
[figure: search difficulty grows with target–nontarget similarity and with nontarget–nontarget dissimilarity] Duncan & Humphreys [89]
74
Biological Motivation Explains pop-out vs. serial search phenomenon Non-targets: Target: Duncan & Humphreys [89]
75
Biological Motivation Explains pop-out vs. serial search phenomenon Non-targets: Target: Duncan & Humphreys [89]
76
Biological Motivation
Explains the pop-out vs. serial search phenomenon. [figure: the two example displays placed on the target–nontarget vs. nontarget–nontarget similarity plane] Duncan & Humphreys [89]
77
Using Inner-scene Similarities
Every candidate is characterized by a vector of n attributes – a point in an n-dimensional metric space with an associated distance function d. Avraham & Lindenbaum [04], Avraham & Lindenbaum [05]
78
Using Inner-scene Similarities Example One feature only: object area d: regular Euclidean distance Feature space
79
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
80
Difficulty of Search
The difficulty measure is the number of queries until the first target is found. Two main factors:
– Distance between Targets and Non-Targets
– Distance between Non-Targets and Non-Targets
Feature space
81
Difficulty of Search – Cover
c: the number of circles in the cover (feature space)
82
Difficulty of Search c will be our measure of the search difficulty We need some constraint on the circles’ size! c: the number of circles
83
Difficulty of Search
d_t: the max-min target distance
84
Difficulty of Search
d_t-cover: circles of diameter d_t
85
Difficulty of Search
Minimum d_t-cover – c: the number of circles in the minimal d_t-cover (circle diameter d_t)
86
Difficulty of Search
c: the number of circles; example: c = 7
87
Difficulty of Search
c: insects example (feature space): c = 3
88
Difficulty of Search
Example: easy search, c = 2
89
Difficulty of Search
Example: hard search, c = # of candidates
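The cover count c behind the examples above can be approximated greedily. This is a sketch under the stated assumptions (Euclidean distance, balls of diameter d_t); as noted later in the talk, computing the exact minimum cover is NP-complete, so the greedy count only upper-bounds c.

```python
import numpy as np

def greedy_cover_size(points, d_t):
    # Greedy approximation of c: repeatedly take an uncovered candidate and
    # cover every candidate within d_t / 2 of it (a ball of diameter d_t).
    points = np.asarray(points, dtype=float)
    uncovered = np.ones(len(points), dtype=bool)
    c = 0
    while uncovered.any():
        i = int(np.flatnonzero(uncovered)[0])
        dist = np.linalg.norm(points - points[i], axis=1)
        uncovered &= dist > d_t / 2.0      # everything nearby is now covered
        c += 1
    return c

# Two tight clusters in feature space -> two circles suffice (easy search);
# a tiny d_t forces one circle per candidate (hard search).
pts = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]]
```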
90
Define the Difficulty using c
Lower bound: every search algorithm needs c calls to the oracle before finding the first target, in the worst case.
Upper bound: there is an algorithm that needs at most c calls to the oracle to find the first target, for all search tasks.
91
Lower bound
Every search algorithm needs c calls to the oracle before finding the first target in the worst case. [figure: worst-case arrangement of candidates 1–5, pairwise distance d_t]
92
Upper bound
There is an algorithm that needs at most c calls to the oracle to find the first target, for all search tasks: FLNN – Farthest Labeled Nearest Neighbor
93
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
94
Efficient Algorithms
FLNN – Farthest Labeled Nearest Neighbor. c is a tight bound!
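The FLNN rule can be sketched as follows: always query the candidate that is farthest from its nearest already-labeled neighbor. Starting from index 0 is an assumption of this sketch (the first query is arbitrary), and the oracle is modeled as a boolean predicate.

```python
import numpy as np

def flnn_search(points, is_target):
    # FLNN sketch: repeatedly query the oracle on the unlabeled candidate
    # whose distance to its nearest already-labeled candidate is largest;
    # return the number of oracle calls needed to find the first target.
    points = np.asarray(points, dtype=float)
    n = len(points)
    labeled = set()
    calls = 0
    nxt = 0                                # first query: arbitrary candidate
    while True:
        calls += 1
        if is_target(nxt):
            return calls
        labeled.add(nxt)
        best, best_d = None, -1.0
        for i in range(n):                 # farthest-from-labeled candidate
            if i in labeled:
                continue
            d = min(np.linalg.norm(points[i] - points[j]) for j in labeled)
            if d > best_d:
                best, best_d = i, d
        nxt = best

# One-feature example: the isolated candidate is queried early.
pts = [[0.0], [1.0], [10.0]]
```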
95
How do we compute c?
– Need to know d_t
– Compute the minimal d_t-cover
– Count the number of circles (c = 7 in the example)
96
How do we compute c?
– Need to know d_t: to know the exact d_t we would need to know all the targets and non-targets, but that’s what we’re looking for…
– Compute the minimal d_t-cover: NP-complete!
– Count the number of circles = c: ok, that’s easy…
97
Upper & Lower Bounds on c
Upper bounds: the number of candidates; knowing that d_t is larger than some d_0, the cover size can be approximated.
Lower bounds: FLNN worst case; knowing that d_t is larger than some d_0, the cover size can be approximated.
98
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
99
Improving FLNN
What’s wrong with FLNN?
– Relates only to the nearest known neighbor
– Finds only the first target efficiently
– Cannot be easily extended to include top-down information
100
VSLE – Visual Search using Linear Estimation
Each candidate has a probability of being a target. Query the candidate with the highest probability, then update the other candidates’ probabilities according to the known results – every known target/non-target affects the other candidates with a weight that decreases with its distance. If we know the results for candidates 1,…,m, the remaining labels are estimated linearly (see the backup slides). Dynamic priority map.
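The query-and-update loop can be sketched as below. This is a simplified sketch, not the paper's algorithm: a Gaussian kernel stands in for the linear-estimation weights that VSLE derives, and `sigma` and the uniform prior are assumed parameters.

```python
import numpy as np

def vsle_search(points, is_target, sigma=1.0, prior=0.5):
    # Dynamic priority map sketch: each candidate keeps a probability of
    # being a target; after each oracle call, other candidates are pulled
    # toward the observed label with a weight that decays with distance.
    points = np.asarray(points, dtype=float)
    n = len(points)
    p = np.full(n, prior)
    queried = np.zeros(n, dtype=bool)
    calls = 0
    while True:
        # query the highest-priority candidate not yet queried
        i = int(np.argmax(np.where(queried, -np.inf, p)))
        calls += 1
        if is_target(i):
            return calls
        queried[i] = True
        w = np.exp(-np.linalg.norm(points - points[i], axis=1) ** 2
                   / (2 * sigma ** 2))
        p = (1 - w) * p          # similar candidates inherit the "non-target" label

# One-feature example: after the first non-target at 0.0, its close
# neighbour at 0.1 is demoted, so the distant candidate is queried next.
cands = [[0.0], [0.1], [10.0]]
```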
101
VSLE – Visual Search using Linear Estimation
[figure: candidate probabilities on the priority map before and after queries]
102
Efficient Algorithms
VSLE – Visual Search using Linear Estimation
[figure: candidate probabilities updated after further queries]
103
Combining Top-Down Information
Simply set the initial probabilities to match previously known data. Adding known target objects to the space alters the probabilities accordingly and speeds up the search.
104
Experiment 1: COIL-100 Efficient Algorithms Columbia Object Image Library [96]
105
Experiment 1: COIL-100
Features: 1st, 2nd and 3rd Gaussian derivatives (9 basis filters) at 5 scales – 9×5 = 45 features. Euclidean distance. Rao & Ballard [95]
106
Experiment 1: COIL-100
10 cars, 10 cups. [plot: number of queries]
107
Experiment 2: hand segmented
Every large segment is a candidate: 24 candidates, 4 targets. Berkeley hand-segmented DB, Martin, Fowlkes, Tal & Malik [01]
108
Experiment 2: hand segmented
Features: color histograms, separated into 8 bins each – 64 features. Euclidean distance.
109
Experiment 3: automatic color segmentation Automatic color segmented image for face detection Efficient Algorithms
110
Experiment 3: color segmentation
146 candidates; 4 features: segment size and the mean values of red, green and blue. Euclidean distance. [plot: number of queries]
111
Combining top-down information
Add known targets to the space. [plot: number of queries, with and without additional targets]
112
Summary: saliency model vs. similarity model
Saliency model: biologically motivated; uses bottom-up, allows combining top-down information; requires segmentation; static priority map.
Similarity model: biologically motivated; uses bottom-up, allows combining top-down information; no segmentation; dynamic priority map; measures the search difficulty.
113
Summary What is attention Aid object recognition tasks by choosing the area of interest Two approaches: saliency model and similarity model –Biological motivation –Algorithms
114
Thank You!
115
Linearly Estimating l(x_k)
A linear estimation for l(x_k): l̂(x_k) = Σ_{i=1..m} a_i·l(x_i), which, of course, minimizes the expected squared error E[(l(x_k) − l̂(x_k))²]. Solving a set of equations gives the estimation:
116
Linearly Estimating l(x_k)
Estimation: l̂(x_k) = r^T·R⁻¹·L, where L is the vector of known labels and R, r are computed from the pairwise distances (i, j = 1,…,m). R and r depend only on the distances, so they can be computed in advance once.
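The slides leave the equations to the figures; a standard linear estimate consistent with the description can be sketched as follows. The correlation function `rho` mapping distances to R and r entries is an assumption of this sketch (the slides only state that R and r depend on the distances alone).

```python
import numpy as np

def estimate_label(x_known, labels, x_query, rho=lambda d: np.exp(-d ** 2)):
    # Linear label estimate: l_hat(x_k) = a^T L with coefficients a solving
    # R a = r, where R_ij = rho(d(x_i, x_j)) over the m known candidates
    # and r_i = rho(d(x_i, x_k)).
    X = np.asarray(x_known, dtype=float)
    L = np.asarray(labels, dtype=float)
    q = np.asarray(x_query, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    R = rho(D)
    r = rho(np.linalg.norm(X - q, axis=-1))
    a = np.linalg.solve(R, r)
    return float(a @ L)

# Querying at a known candidate reproduces that candidate's label.
est_at_target = estimate_label([[0.0], [5.0]], [1.0, 0.0], [0.0])
est_at_nontarget = estimate_label([[0.0], [5.0]], [1.0, 0.0], [5.0])
```

Since R and r are functions of the distances only, they can indeed be precomputed once per scene, as the slide notes.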