2
Attention in Computer Vision Mica Arie-Nachimson and Michal Kiwkowitz May 22, 2005 Advanced Topics in Computer Vision Weizmann Institute of Science
3
Problem Definition – Search Order
Vision applications apply “expensive” algorithms (e.g. recognition) to image patches. The selection of patches is mostly naïve, and it determines the number of calls to the “expensive” algorithm.
4
Problem Definition – Search Order
A more sophisticated selection of patches would imply fewer calls to the “expensive” algorithm. Attention is used to focus efficiently on incoming data (better use of limited processing capacity).
5
Problem Definition – Search Order
[figure: image patches passed to object recognition in order 1–6]
6
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
8
Attention
Attention implies allocating resources, perceptual or cognitive, to some things at the expense of others.
9
What is Attention You are sitting in class listening to a lecture. Two people behind you are talking. –Can you hear the lecture? One of them mentions the name of a friend of yours. –How did you know?
10
Attention in Other Applications Face Detection (feature selection) Video Analysis (temporal block selection) Robot Navigation (select locations) …
11
Attention is Directed by: Bottom-up: From small to large units of meaning Rapid Task-independent
12
Attention is Directed by: Top-down: Use higher levels (context, expectation) to process incoming information (Guess) Slower Task dependent http://www.rybak-et-al.net/nisms.html
13
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
14
Attention – when is information selected (filtered)?
– Early selection (Broadbent, 1958): only some information is selected for complete processing
– Cocktail party phenomenon (Moray, 1959): unattended information can still get through
– Late selection (Treisman, 1960) – attenuation: all information is sent to perceptual systems for processing; some is more likely to be selected
WHICH?
15
Parallel Search Is there a green O ? + A. Treisman, G. Gelade, 1980
16
Conjunction Search Is there a green N ? + A. Treisman, G. Gelade, 1980
17
Results A. Treisman, G. Gelade, 1980
18
Conjunction Search + A. Treisman, G. Gelade, 1980
19
Color map Orientation map A. Treisman, G. Gelade, 1980
20
Color map / Orientation map A. Treisman, G. Gelade, 1980
21
Conjunction Search + A. Treisman, G. Gelade, 1980
22
Primitives (pop-out demonstrations): Intensity, Orientation, Color, Curvature, Line End, Movement
23
Feature Integration Theory
Attention – two stages:
Pre-attention: parallel processing, low-level features, fast (parallel search)
Attention: serial processing, localized focus, slower (conjunctive search)
How is the Focus found & shifted? A. Treisman, G. Gelade, 1980
24
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
25
Shifts in Attention
“Shifts in selective visual attention: towards the underlying neural circuitry”, Christof Koch and Shimon Ullman, 1985
Feature maps (Orientation, Color, Curvature, Line end, Movement) → Central Representation → Saliency → Attention
26
“A model of saliency-based visual attention for rapid scene analysis” Laurent Itti, Christof Koch, and Ernst Niebur, 1998 L. Itti, C. Koch, and E. Niebur, 1998 Salient - stands out Example – telephone & road sign have high saliency
27
from C. Koch L. Itti, C. Koch, and E. Niebur, 1998
28
Intensity L. Itti, C. Koch, and E. Niebur, 1998 Cells in the retina
29
Intensity
Create 8 spatial scales using Gaussian pyramids. L. Itti, C. Koch, and E. Niebur, 1998
30
Intensity
Center-Surround difference operator:
– Sensitive to local spatial discontinuities
– Principal computation in the retina & primary visual cortex
– Subtract coarse scale from fine scale (center +, surround –)
L. Itti, C. Koch, and E. Niebur, 1998
31
Toy Example
Fine level: [0 0 0; 0 255 0; 0 0 0] → Gauss pyramid → coarse level → interpolation → point-by-point subtraction of the interpolated coarse level from the fine level.
32
Toy Example
Fine level: a single bright pixel (255) in a uniform patch → Gauss pyramid → coarse level → interpolation → point-by-point subtraction.
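The toy examples above can be sketched numerically. This is a minimal numpy sketch, not the paper's implementation: 2x2 block averaging stands in for one Gaussian-pyramid level, and nearest-neighbor repetition stands in for the interpolation back to the fine grid.

```python
import numpy as np

def downsample(img):
    # One pyramid level: 2x2 block averaging (a crude stand-in for
    # Gaussian blur + decimation; a real pyramid would use a Gaussian kernel).
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    # Nearest-neighbor interpolation back to the finer grid.
    return img.repeat(2, axis=0).repeat(2, axis=1)

def center_surround(fine):
    # Fine scale minus the interpolated coarse scale
    # (the point-by-point subtraction of the toy example).
    return fine - upsample(downsample(fine))

# Toy example: a single bright pixel in a 4x4 patch.
fine = np.zeros((4, 4))
fine[1, 1] = 255.0
response = center_surround(fine)
```

The bright pixel yields a strong positive response, its blurred surround goes negative, and the whole response sums to zero, which is the center-surround behavior the slides describe.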
33
Intensity Compute: 6 Intensity maps Different ratios – multiscale feature extraction L. Itti, C. Koch, and E. Niebur, 1998
34
Color
Same c and s as with intensity; 12 color maps. Kandel et al. (2000). Principles of Neural Science. McGraw-Hill/Appleton & Lange. L. Itti, C. Koch, and E. Niebur, 1998
36
Orientation
Same c and s as with intensity; 24 orientation maps. From Visual system presentation by S. Ullman. L. Itti, C. Koch, and E. Niebur, 1998
37
Reprinted from “Shiftable MultiScale Transforms,” by Simoncelli et al., IEEE Transactions on Information Theory, 1992, copyright 1992, IEEE Orientation – Gabor pyramids
38
from C. Koch L. Itti, C. Koch, and E. Niebur, 1998
39
Normalization Operator N(.) L. Itti, C. Koch, and E. Niebur, 1998
40
Normalization N(.):
– Normalize map values to a fixed range [0..M]
– Find the global maximum M
– Compute the average m̄ over all the other local maxima
– Multiply the map by (M – m̄)²
L. Itti, C. Koch, and E. Niebur, 1998
41
Saliency Map L. Itti, C. Koch, and E. Niebur, 1998
42
Conspicuity Maps
43
Algorithm – up to now:
1. Extract feature maps
2. Compute center-surround maps (42): Intensity – I (6), Color – C (12), Orientation – O (24)
3. Combine each channel into a conspicuity map
4. Compute saliency by summing and normalizing maps
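Steps 3–4 above can be sketched as follows, assuming the per-channel center-surround maps have already been computed. The trivial range normalization used here is a stand-in for the full N(.) operator.

```python
import numpy as np

def combine(maps, N):
    # Across-scale addition of normalized maps into one conspicuity map.
    return sum(N(m) for m in maps)

def saliency_map(intensity_maps, color_maps, orientation_maps, N):
    # Steps 3-4: one conspicuity map per channel, then sum the normalized
    # conspicuity maps into the final saliency map.
    channels = (intensity_maps, color_maps, orientation_maps)
    conspicuity = [combine(maps, N) for maps in channels]
    return sum(N(c) for c in conspicuity) / 3.0

def N(m):
    # Stand-in normalization: scale to [0, 1]. (The real N(.) additionally
    # promotes maps that contain a single strong peak.)
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else m * 0.0

m = np.array([[0.0, 1.0], [0.0, 0.0]])
S = saliency_map([m], [m], [m], N)   # one map per channel, for illustration
```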
44
Laurent Itti, Christof Koch, and Ernst Niebur, 1998
45
Winner-Takes-All selection of the FOA (Focus Of Attention) – leaky integrate-and-fire neurons, “inhibition of return”. L. Itti, C. Koch, and E. Niebur, 1998
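The winner-takes-all loop with inhibition of return can be sketched as below. This is a discrete sketch, not the paper's integrate-and-fire network: the inhibition radius is an assumed parameter, and suppression is modeled by simply clamping the attended region.

```python
import numpy as np

def attend(saliency, n_fixations=3, radius=1):
    # Winner-Takes-All with inhibition of return: pick the global winner,
    # then suppress a neighbourhood around it so the next-most-salient
    # location wins the following attentional shift.
    s = saliency.astype(float).copy()
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(int(np.argmax(s)), s.shape)
        fixations.append((int(y), int(x)))
        s[max(0, y - radius):y + radius + 1,
          max(0, x - radius):x + radius + 1] = -np.inf   # inhibition of return
    return fixations

s = np.array([[0.0, 5.0, 0.0],
              [0.0, 0.0, 0.0],
              [3.0, 0.0, 0.0]])
foa = attend(s, n_fixations=2)
```

The strongest location wins first; once inhibited, attention shifts to the next-strongest location instead of revisiting the winner.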
46
Results FOA shifts: 30-70 ms Inhibition: 500-900 ms Inhibition of return ends L. Itti, C. Koch, and E. Niebur, 1998
47
Results
Spatial Frequency Content (SFC), Reinagel & Zador, 1997. Image / SFC / Saliency / Output. L. Itti, C. Koch, and E. Niebur, 1998
48
Results (a) (b) (c) (d)
Image / SFC / Saliency / Output. L. Itti, C. Koch, and E. Niebur, 1998. Spatial Frequency Content (SFC), Reinagel & Zador, 1997
49
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
50
Attention & Object Recognition
“Is bottom-up attention useful for object recognition?” – U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
Computer recognition: segmented, labeled images. Human recognition: cluttered scenes, non-labeled objects. Attention bridges the gap.
51
Object Recognition
Saliency model → growing a region in the strongest map → to object recognition (Lowe). U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
52
Attention & Object Recognition
Added selection of an image region:
1. Find the strongest contributing map
2. Segment the “winning” map
3. Create a mask M that modulates contrast in the original image
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
53
Attention & Object Recognition
Learning inventories – the “grocery cart problem”. Real-world scenes; 1 image for training (15 fixations); 2–5 images for testing (20 fixations). U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
54
[figure: training and testing images passed to object recognition and matched]
55
“Grocery Cart” Problem – training, testing1, testing2. U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
56
“Grocery Cart” Problem
Downsides: bias of human photography; small image set. Solution: a robot as acquisition tool. U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
57
Robot - Landmark Learning Objective – how many objects are found and classified correctly? Navigation – simple obstacle avoiding algorithm using infrared sensors U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
58
Landmark Learning Algorithm:
1. Extract the most salient location
2. Does the patch have at least 3 key points? If not – back to 1
3. Test the patch against all known object models: match – increase the object count; no match – learn it as a new object
U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
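The loop above can be sketched as follows. The `keypoints` and `match` functions are hypothetical stand-ins for the keypoint extractor and the (Lowe-style) recognizer; only the control flow mirrors the slide.

```python
def learn_landmarks(salient_patches, keypoints, match, models, min_keypoints=3):
    # Sketch of the landmark-learning loop from the slide.
    counts = {}
    for patch in salient_patches:          # step 1: patches in saliency order
        kps = keypoints(patch)
        if len(kps) < min_keypoints:       # step 2: too few key points
            continue                       #         -> back to step 1
        obj = match(kps, models)           # step 3: test against known models
        if obj is not None:
            counts[obj] = counts.get(obj, 0) + 1     # match: increase count
        else:
            obj = "object-%d" % len(models)          # no match: learn as new
            models[obj] = kps
            counts[obj] = 1
    return counts

# Toy run: characters stand in for key points; exact match plays recognizer.
models = {}
counts = learn_landmarks(
    ["aaa", "bb", "aaa"],
    keypoints=lambda p: list(p),
    match=lambda kps, ms: next((k for k, v in ms.items() if v == kps), None),
    models=models)
```

In the toy run, the first patch is learned as a new object, the second is discarded (fewer than 3 key points), and the third is recognized as the first object again.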
59
[figure: patches with < 3 key points are discarded before object recognition]
60
Landmark Learning With Attention U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
61
Landmark Learning With Random Selection U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
62
Landmark Learning - Results U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
63
Saliency Based Object Recognition Biologically motivated Uses bottom-up, allows combining top-down information Segmentation –Cluttered scenes –Unlabeled objects –Multiple objects in single image Static priority map U. Rutishauser, D. Walther, C. Koch and P. Perona, 2004
64
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
65
Comparison “Comparing attention operators for learning landmarks”, R. Sim, S. Polifroni, G. Dudek, June 2003 Other attention operators for low level features R. Sim, S. Polifroni, G. Dudek, June 2003
66
Comparison
Edge density, Radial symmetry, Smallest eigenvalue, Caltech saliency. R. Sim, S. Polifroni, G. Dudek, June 2003
67
Comparison Landmark learning Training – learn landmarks knowing camera pose Testing - determine pose of camera according to landmarks (pose estimation) R. Sim, S. Polifroni, G. Dudek, June 2003
68
Comparison - Results
All operators perform better than random; radial symmetry gives the worst results. The Caltech operator performs similarly to the edge and eigenvalue operators, BUT it is more complex to implement and needs more computing time – a less preferred candidate in practice. R. Sim, S. Polifroni, G. Dudek, June 2003
69
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
70
The Problem
[figure: image patches passed to object recognition in order 1–6]
71
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
72
Biological Motivation An alternative approach: continuous search difficulty Based on similarity: –Between Targets and Non-Targets in the scene –Between Non-Targets and Non-Targets in the scene Similar structural units do not need separate treatment Structural units similar to a possible target get high priority Duncan & Humphreys [89]
73
Biological Motivation
[figure: search difficulty grows with target–nontarget similarity and with nontarget–nontarget dissimilarity] Duncan & Humphreys [89]
74
Biological Motivation Explains pop-out vs. serial search phenomenon Non-targets: Target: Duncan & Humphreys [89]
75
Biological Motivation Explains pop-out vs. serial search phenomenon Non-targets: Target: Duncan & Humphreys [89]
76
Biological Motivation
Explains the pop-out vs. serial search phenomenon. [figure: the two example displays placed on the target–nontarget vs. nontarget–nontarget similarity plane] Duncan & Humphreys [89]
77
Using Inner-scene Similarities
Every candidate is characterized by a vector of n attributes – a point in an n-dimensional metric space with an associated distance function d. Avraham & Lindenbaum [04], Avraham & Lindenbaum [05]
78
Using Inner-scene Similarities Example One feature only: object area d: regular Euclidean distance Feature space
79
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
80
Difficulty of Search
The difficulty measure is the number of queries until the first target is found. Two main factors:
– Distance between Targets and Non-Targets
– Distance between Non-Targets and Non-Targets
Feature space
81
Difficulty of Search – Cover
c: the number of circles in the cover (feature space)
82
Difficulty of Search c will be our measure of the search difficulty We need some constraint on the circles’ size! c: the number of circles
83
Difficulty of Search
d_t: the max-min target distance
84
Difficulty of Search
d_t-cover: circles of diameter d_t
85
Difficulty of Search
Minimum d_t-cover – c: the number of circles in the minimal d_t-cover (circle diameter d_t)
86
Difficulty of Search
c: the number of circles; example: c = 7
87
Difficulty of Search
c: insects example (feature space): c = 3
88
Difficulty of Search
Example: easy search, c = 2
89
Difficulty of Search
Example: hard search, c = # of candidates
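The cover count c behind the examples above can be approximated greedily. This is a sketch under the stated assumptions (Euclidean distance, balls of diameter d_t); as noted later in the talk, computing the exact minimum cover is NP-complete, so the greedy count only upper-bounds c.

```python
import numpy as np

def greedy_cover_size(points, d_t):
    # Greedy approximation of c: repeatedly take an uncovered candidate and
    # cover every candidate within d_t / 2 of it (a ball of diameter d_t).
    points = np.asarray(points, dtype=float)
    uncovered = np.ones(len(points), dtype=bool)
    c = 0
    while uncovered.any():
        i = int(np.flatnonzero(uncovered)[0])
        dist = np.linalg.norm(points - points[i], axis=1)
        uncovered &= dist > d_t / 2.0      # everything nearby is now covered
        c += 1
    return c

# Two tight clusters in feature space -> two circles suffice (easy search);
# a tiny d_t forces one circle per candidate (hard search).
pts = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]]
```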
90
Define the Difficulty using c
Lower bound: every search algorithm needs c calls to the oracle before finding the first target, in the worst case.
Upper bound: there is an algorithm that needs at most c calls to the oracle to find the first target, for all search tasks.
91
Lower bound
Every search algorithm needs c calls to the oracle before finding the first target in the worst case. [figure: worst-case arrangement of candidates 1–5, pairwise distance d_t]
92
Upper bound
There is an algorithm that needs at most c calls to the oracle to find the first target, for all search tasks: FLNN – Farthest Labeled Nearest Neighbor
93
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
94
Efficient Algorithms
FLNN – Farthest Labeled Nearest Neighbor. c is a tight bound!
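The FLNN rule can be sketched as follows: always query the candidate that is farthest from its nearest already-labeled neighbor. Starting from index 0 is an assumption of this sketch (the first query is arbitrary), and the oracle is modeled as a boolean predicate.

```python
import numpy as np

def flnn_search(points, is_target):
    # FLNN sketch: repeatedly query the oracle on the unlabeled candidate
    # whose distance to its nearest already-labeled candidate is largest;
    # return the number of oracle calls needed to find the first target.
    points = np.asarray(points, dtype=float)
    n = len(points)
    labeled = set()
    calls = 0
    nxt = 0                                # first query: arbitrary candidate
    while True:
        calls += 1
        if is_target(nxt):
            return calls
        labeled.add(nxt)
        best, best_d = None, -1.0
        for i in range(n):                 # farthest-from-labeled candidate
            if i in labeled:
                continue
            d = min(np.linalg.norm(points[i] - points[j]) for j in labeled)
            if d > best_d:
                best, best_d = i, d
        nxt = best

# One-feature example: the isolated candidate is queried early.
pts = [[0.0], [1.0], [10.0]]
```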
95
How do we compute c?
– Need to know d_t
– Compute the minimal d_t-cover
– Count the number of circles (c = 7 in the example)
96
How do we compute c?
– Need to know d_t: to know the exact d_t we would need to know all the targets and non-targets, but that’s what we’re looking for…
– Compute the minimal d_t-cover: NP-complete!
– Count the number of circles = c: ok, that’s easy…
97
Upper & Lower Bounds on c
Upper bounds: the number of candidates; knowing that d_t is larger than some d_0, the cover size can be approximated.
Lower bounds: FLNN worst case; knowing that d_t is larger than some d_0, the cover size can be approximated.
98
Outline What is Attention Attention in Object Recognition Saliency Model Feature Integration Theory Saliency Algorithm Saliency & Object Recognition Comparison Inner Scene Similarity Model Biological motivation Difficulty of Search Tasks Algorithms FLNN VSLE
99
Improving FLNN
What’s wrong with FLNN?
– Relates only to the nearest known neighbor
– Finds only the first target efficiently
– Cannot be easily extended to include top-down information
100
VSLE – Visual Search using Linear Estimation
Each candidate has a probability of being a target. Query the candidate with the highest probability, then update the other candidates’ probabilities according to the known results – every known target/non-target affects the other candidates with a weight that decreases with its distance. If we know the results for candidates 1,…,m, the remaining labels are estimated linearly (see the backup slides). Dynamic priority map.
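The query-and-update loop can be sketched as below. This is a simplified sketch, not the paper's algorithm: a Gaussian kernel stands in for the linear-estimation weights that VSLE derives, and `sigma` and the uniform prior are assumed parameters.

```python
import numpy as np

def vsle_search(points, is_target, sigma=1.0, prior=0.5):
    # Dynamic priority map sketch: each candidate keeps a probability of
    # being a target; after each oracle call, other candidates are pulled
    # toward the observed label with a weight that decays with distance.
    points = np.asarray(points, dtype=float)
    n = len(points)
    p = np.full(n, prior)
    queried = np.zeros(n, dtype=bool)
    calls = 0
    while True:
        # query the highest-priority candidate not yet queried
        i = int(np.argmax(np.where(queried, -np.inf, p)))
        calls += 1
        if is_target(i):
            return calls
        queried[i] = True
        w = np.exp(-np.linalg.norm(points - points[i], axis=1) ** 2
                   / (2 * sigma ** 2))
        p = (1 - w) * p          # similar candidates inherit the "non-target" label

# One-feature example: after the first non-target at 0.0, its close
# neighbour at 0.1 is demoted, so the distant candidate is queried next.
cands = [[0.0], [0.1], [10.0]]
```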
101
VSLE – Visual Search using Linear Estimation
[figure: candidate probabilities on the priority map before and after queries]
102
Efficient Algorithms
VSLE – Visual Search using Linear Estimation
[figure: candidate probabilities updated after further queries]
103
Combining Top-Down Information
Simply set the initial probabilities to match previously known data. Adding known target objects to the space alters the probabilities accordingly and speeds up the search.
104
Experiment 1: COIL-100 Efficient Algorithms Columbia Object Image Library [96]
105
Experiment 1: COIL-100
Features: 1st, 2nd and 3rd Gaussian derivatives (9 basis filters) at 5 scales – 9×5 = 45 features. Euclidean distance. Rao & Ballard [95]
106
Experiment 1: COIL-100
10 cars, 10 cups. [plot: number of queries]
107
Experiment 2: hand segmented
Every large segment is a candidate: 24 candidates, 4 targets. Berkeley hand-segmented DB, Martin, Fowlkes, Tal & Malik [01]
108
Experiment 2: hand segmented
Features: color histograms, separated into 8 bins each – 64 features. Euclidean distance.
109
Experiment 3: automatic color segmentation Automatic color segmented image for face detection Efficient Algorithms
110
Experiment 3: color segmentation
146 candidates; 4 features: segment size and the mean values of red, green and blue. Euclidean distance. [plot: number of queries]
111
Combining top-down information
Add known targets to the space. [plot: number of queries, with and without additional targets]
112
Summary: saliency model vs. similarity model
Saliency model: biologically motivated; uses bottom-up, allows combining top-down information; requires segmentation; static priority map.
Similarity model: biologically motivated; uses bottom-up, allows combining top-down information; no segmentation; dynamic priority map; measures the search difficulty.
113
Summary What is attention Aid object recognition tasks by choosing the area of interest Two approaches: saliency model and similarity model –Biological motivation –Algorithms
114
Thank You!
115
Linearly Estimating l(x_k)
A linear estimation for l(x_k): l̂(x_k) = Σ_{i=1..m} a_i·l(x_i), which, of course, minimizes the expected squared error E[(l(x_k) − l̂(x_k))²]. Solving a set of equations gives the estimation:
116
Linearly Estimating l(x_k)
Estimation: l̂(x_k) = r^T·R⁻¹·L, where L is the vector of known labels and R, r are computed from the pairwise distances (i, j = 1,…,m). R and r depend only on the distances, so they can be computed in advance once.
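The slides leave the equations to the figures; a standard linear estimate consistent with the description can be sketched as follows. The correlation function `rho` mapping distances to R and r entries is an assumption of this sketch (the slides only state that R and r depend on the distances alone).

```python
import numpy as np

def estimate_label(x_known, labels, x_query, rho=lambda d: np.exp(-d ** 2)):
    # Linear label estimate: l_hat(x_k) = a^T L with coefficients a solving
    # R a = r, where R_ij = rho(d(x_i, x_j)) over the m known candidates
    # and r_i = rho(d(x_i, x_k)).
    X = np.asarray(x_known, dtype=float)
    L = np.asarray(labels, dtype=float)
    q = np.asarray(x_query, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    R = rho(D)
    r = rho(np.linalg.norm(X - q, axis=-1))
    a = np.linalg.solve(R, r)
    return float(a @ L)

# Querying at a known candidate reproduces that candidate's label.
est_at_target = estimate_label([[0.0], [5.0]], [1.0, 0.0], [0.0])
est_at_nontarget = estimate_label([[0.0], [5.0]], [1.0, 0.0], [5.0])
```

Since R and r are functions of the distances only, they can indeed be precomputed once per scene, as the slide notes.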