黃文中
Outline: Introduction, The Model, Results, Conclusion
Introduction
- Many visual processes are computationally expensive.
- Humans do not process the whole visual field.
- How do we decide what to process?
- How can insights into this make machine vision more efficient?
Salience ~ visual prominence
- Must be cheap to calculate
- Related to features collected at the very early stages of visual processing
- Colour, orientation, intensity change and motion are all important indicators of salience
The saliency map is a topographically arranged map that represents the visual saliency of a corresponding visual scene.
Two types of stimuli:
- Bottom-up: depends only on the instantaneous sensory input, without taking into account the internal state of the organism
- Top-down: takes into account the internal state, such as the goals the organism has at the time, its personal history and its experiences
The Model
- Extraction: extract feature vectors at locations over the image plane
- Activation: form an "activation map" (or maps) using the feature vectors
- Normalization / Combination: normalize the activation map (or maps), then combine the maps into a single map
1) Nine spatial scales are created using dyadic Gaussian pyramids.
2) Each feature is computed by a set of linear “center-surround” operations akin to visual receptive fields.
3) Normalization
4) Across-scale combination into three “conspicuity maps”
5) Linear combination to create the saliency map
6) Winner-take-all
The original image is decomposed into sets of lowpass and bandpass components via Gaussian and Laplacian pyramids.
- The Gaussian pyramid consists of lowpass-filtered (LPF) copies of the image.
- The Laplacian pyramid consists of bandpass-filtered (BPF) copies of the image.
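The dyadic Gaussian pyramid described above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the 5-tap binomial kernel, the border clamping and the function names are assumptions, and plain Python lists are used so the sketch has no dependencies.

```python
# Assumed 5-tap binomial lowpass filter; its weights sum to 1.
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def blur_1d(row):
    """Lowpass-filter one row, clamping indices at the borders."""
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(KERNEL):
            j = min(max(i + k - 2, 0), n - 1)
            acc += weight * row[j]
        out.append(acc)
    return out


def transpose(img):
    """Swap rows and columns of a 2-D list."""
    return [list(col) for col in zip(*img)]


def blur(img):
    """Separable blur: filter the rows, then the columns."""
    rows = [blur_1d(r) for r in img]
    cols = [blur_1d(c) for c in transpose(rows)]
    return transpose(cols)


def gaussian_pyramid(img, levels=9):
    """Return [scale 0, scale 1, ...]; each level is blurred, then subsampled by 2."""
    pyramid = [img]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        if len(prev) < 2 or len(prev[0]) < 2:
            break  # too small to halve again
        smooth = blur(prev)
        pyramid.append([row[::2] for row in smooth[::2]])
    return pyramid
```

Calling `gaussian_pyramid(image, levels=9)` on a sufficiently large image gives the nine scales mentioned in the model overview; smaller images simply yield fewer levels.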
- Intensity image
- Color channels
- Local orientation information: obtained using oriented Gabor pyramids
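A minimal sketch of how the intensity and color channels can be computed for one pixel, assuming the channel definitions from the Itti, Koch and Niebur paper listed in the references: intensity is the mean of r, g and b, and four broadly tuned color channels (red, green, blue, yellow) are clipped at zero. The function name and the dictionary layout are hypothetical, and the paper's additional step of normalizing r, g and b by intensity is left out for brevity.

```python
def feature_channels(r, g, b):
    """Intensity and broadly tuned color channels for one RGB pixel (floats)."""
    intensity = (r + g + b) / 3.0
    red = max(r - (g + b) / 2.0, 0.0)
    green = max(g - (r + b) / 2.0, 0.0)
    blue = max(b - (r + g) / 2.0, 0.0)
    yellow = max((r + g) / 2.0 - abs(r - g) / 2.0 - b, 0.0)
    # Negative responses are clipped to zero, so each channel only signals "its" color.
    return {"I": intensity, "R": red, "G": green, "B": blue, "Y": yellow}
```

With these definitions a gray pixel produces zero response in all four color channels, so the color maps react only to chromatic content.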
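The center-surround difference between a fine (center) scale and a coarser (surround) scale is obtained by interpolation to the finer scale and point-by-point subtraction. It yields three kinds of feature maps:
- Intensity contrast
- Color double-opponent
- Orientation feature maps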
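The across-scale difference can be sketched as below: the coarse (surround) map is brought to the size of the fine (center) map and the absolute point-by-point difference is taken. Nearest-neighbour interpolation and the function names are assumptions made to keep the example short; the slide does not say which interpolation is used.

```python
def upsample_nearest(coarse, height, width):
    """Interpolate a coarse map to (height, width) by nearest-neighbour lookup."""
    ch, cw = len(coarse), len(coarse[0])
    return [
        [coarse[min(y * ch // height, ch - 1)][min(x * cw // width, cw - 1)]
         for x in range(width)]
        for y in range(height)
    ]


def center_surround(center, surround):
    """Absolute difference between a fine map and an interpolated coarse map."""
    h, w = len(center), len(center[0])
    up = upsample_nearest(surround, h, w)
    return [[abs(center[y][x] - up[y][x]) for x in range(w)] for y in range(h)]
```

In the Itti, Koch and Niebur paper listed in the references, the center scales are c = 2, 3, 4 and the surround scales are s = c + 3 and s = c + 4, which gives six such maps per feature.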
Map normalization operator:
1) Normalizing the values in the map to a fixed range [0..M], in order to eliminate modality-dependent amplitude differences
2) Finding the location of the map’s global maximum M and computing the average m of all its other local maxima
3) Globally multiplying the map by (M - m)^2
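The three steps above can be sketched as follows. This is an illustration under stated assumptions: the map is taken to be non-negative, a local maximum is defined as a cell strictly greater than its 8-connected neighbours, and the parameter `top` plays the role of M. The function name is hypothetical.

```python
def normalize_map(fmap, top=1.0):
    """Rescale a non-negative map to [0, top], then weight it by (top - m)^2."""
    h, w = len(fmap), len(fmap[0])
    peak = max(max(row) for row in fmap)
    if peak <= 0:
        return [[0.0] * w for _ in range(h)]  # empty map: nothing stands out
    scaled = [[top * v / peak for v in row] for row in fmap]

    # Collect local maxima (assumed: strictly greater than all 8 neighbours).
    local_maxima = []
    for y in range(h):
        for x in range(w):
            v = scaled[y][x]
            neighbours = [
                scaled[ny][nx]
                for ny in range(max(y - 1, 0), min(y + 2, h))
                for nx in range(max(x - 1, 0), min(x + 2, w))
                if (ny, nx) != (y, x)
            ]
            if neighbours and all(v > n for n in neighbours):
                local_maxima.append(v)

    # m: average of the local maxima other than the global one (0 if none).
    others = [v for v in local_maxima if v < top]
    m_bar = sum(others) / len(others) if others else 0.0
    weight = (top - m_bar) ** 2
    return [[v * weight for v in row] for row in scaled]
```

A map with one dominant peak keeps its amplitude (m is close to 0, so the weight is close to M^2), while a map with several peaks of similar height is strongly suppressed.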
This method is called global non-linear normalization.
Pros:
1) Computationally very simple
2) Easily allows for real-time implementation because it is non-iterative
Cons:
1) Not very biologically plausible, since global computations are used
2) Not robust to noise when the noise can be stronger than the signal
Non-classical surround inhibition:
- Interactions occur within each individual feature map rather than between maps.
- Inhibition appears strongest at a particular distance from the center, and weakens at both shorter and longer distances.
- The structure of non-classical interactions can be coarsely modeled by a two-dimensional difference-of-Gaussians (DoG) connection pattern.
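One way to picture the DoG connection pattern is an iterative scheme in which each feature map is repeatedly convolved with a DoG kernel, added to itself, reduced by a small constant and clipped at zero. The sketch below assumes that update rule and uses illustrative parameter values (`c_ex`, `c_inh`, the sigmas and the constant are assumptions, as are the function names); it is not taken from the slides.

```python
import math


def dog_kernel(radius, sigma_ex, sigma_inh, c_ex=0.5, c_inh=1.5):
    """2-D difference of Gaussians: narrow excitation minus broad inhibition."""
    def gauss(d2, sigma, c):
        return c * c / (2 * math.pi * sigma * sigma) * math.exp(-d2 / (2 * sigma * sigma))

    return [
        [gauss(dx * dx + dy * dy, sigma_ex, c_ex) - gauss(dx * dx + dy * dy, sigma_inh, c_inh)
         for dx in range(-radius, radius + 1)]
        for dy in range(-radius, radius + 1)
    ]


def iterate_once(fmap, kernel, inhibition=0.02):
    """One step: M <- max(M + M * DoG - inhibition, 0), zero-padded at the borders."""
    h, w = len(fmap), len(fmap[0])
    r = len(kernel) // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + r][dx + r] * fmap[yy][xx]
            row.append(max(fmap[y][x] + acc - inhibition, 0.0))
        out.append(row)
    return out
```

Because the kernel is positive near its center and negative further out, a strong peak suppresses weaker peaks in its surround more than they suppress it, so repeated iterations sharpen the competition within a map.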
Across-scale addition, “⊕”, consists of reduction of each map to scale 4 and point-by-point addition.
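Across-scale addition can be sketched as resizing every feature map to one common size (the size of scale 4 in the model) and summing point by point. Nearest-neighbour resizing and the function names are assumptions chosen for brevity.

```python
def resize_nearest(fmap, height, width):
    """Resize a 2-D list to (height, width) by nearest-neighbour lookup."""
    fh, fw = len(fmap), len(fmap[0])
    return [
        [fmap[min(y * fh // height, fh - 1)][min(x * fw // width, fw - 1)]
         for x in range(width)]
        for y in range(height)
    ]


def across_scale_add(maps, height, width):
    """Bring every map to (height, width) and add them point by point."""
    total = [[0.0] * width for _ in range(height)]
    for fmap in maps:
        resized = resize_nearest(fmap, height, width)
        for y in range(height):
            for x in range(width):
                total[y][x] += resized[y][x]
    return total
```

Applied to the normalized feature maps of one feature type, the result is that feature's conspicuity map.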
The three conspicuity maps are normalized and summed into the final input S to the saliency map. The weight of each channel is tunable.
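A minimal sketch of this combination, assuming the three conspicuity maps have already been normalized and share the same size. The default of equal weights (1/3 each) follows the Itti, Koch and Niebur paper listed in the references; the function name and the `weights` parameter are hypothetical.

```python
def combine_conspicuity(intensity, color, orientation, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted point-by-point sum of three equally sized, normalized maps."""
    w_i, w_c, w_o = weights
    return [
        [w_i * i + w_c * c + w_o * o for i, c, o in zip(row_i, row_c, row_o)]
        for row_i, row_c, row_o in zip(intensity, color, orientation)
    ]
```

Passing, for example, `weights=(0.5, 0.25, 0.25)` would bias the saliency map towards intensity contrast.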
At any given time, only one location is selected from the early representation and copied into the central representation.
1) The focus of attention (FOA) is shifted to the location of the winner neuron.
2) The global inhibition of the winner-take-all (WTA) network is triggered and completely inhibits (resets) all WTA neurons.
3) Local inhibition is transiently activated in the saliency map (SM), in an area with the size and new location of the FOA.
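The attention cycle above can be approximated in a few lines. This sketch makes two simplifications that should be read as assumptions: the winner-take-all network is replaced by a plain argmax over the saliency map, and the local inhibition is permanent rather than transient. The function name and the `foa_radius` parameter are hypothetical.

```python
def attend(saliency, n_shifts=3, foa_radius=1):
    """Return successive FOA locations (row, col), most salient first."""
    sm = [row[:] for row in saliency]  # work on a copy of the saliency map
    h, w = len(sm), len(sm[0])
    visited = []
    for _ in range(n_shifts):
        # Winner-take-all, simplified to picking the most salient location.
        y0, x0 = max(
            ((y, x) for y in range(h) for x in range(w)),
            key=lambda p: sm[p[0]][p[1]],
        )
        if sm[y0][x0] <= 0.0:
            break  # nothing salient is left
        visited.append((y0, x0))
        # Local inhibition: suppress a disc the size of the FOA around the winner.
        for y in range(max(y0 - foa_radius, 0), min(y0 + foa_radius + 1, h)):
            for x in range(max(x0 - foa_radius, 0), min(x0 + foa_radius + 1, w)):
                if (y - y0) ** 2 + (x - x0) ** 2 <= foa_radius ** 2:
                    sm[y][x] = 0.0
    return visited
```

The inhibited disc is what lets the focus of attention move on to the next most salient location instead of returning to the same winner.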
Results
Conclusion
- A conceptually simple computational model for saliency-driven focal visual attention has been proposed.
- The framework can be easily tailored to arbitrary tasks through the implementation of dedicated feature maps.
L. Itti, C. Koch, E. Niebur, “A Model of Saliency-Based Visual Attention for Rapid Scene Analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 11, pp , Nov
L. Itti, C. Koch, “A saliency-based search mechanism for overt and covert shifts of visual attention,” Vision Research, Vol. 40, No , pp , May
H. Greenspan, S. Belongie, R. Goodman, P. Perona, S. Rakshit, and C.H. Anderson, “Overcomplete Steerable Pyramid Filters and Rotation Invariance,” Proc. IEEE Computer Vision and Pattern Recognition, pp , Seattle, Wash., June