Published by Νανα Ζέρβας. Modified over 6 years ago.
1
Attentional Neural Network: Feature Selection Using Cognitive Feedback
Qian Wang (1), Sen Song (1), Jiaxing Zhang (2), Zheng Zhang (3)
(1) Department of Biomedical Engineering, Tsinghua University, Beijing, China
(2) Microsoft Research Asia, Beijing, China
(3) Department of Computer Science, NYU Shanghai, Shanghai, China
Presented by Amir K. Afshar, Wayne State University, Department of Computer Science
5 February 2018
2
What is an Attentional Neural Network? Motivation and Inspiration
The human visual system achieves remarkably robust performance in identifying and classifying objects and figures. This work proposes a model (Attentional Neural Networks, or aNNs) that attempts to explain, or at least gain some insight into, the cognitive processes that make such results possible.
3
What is an Attentional Neural Network? An Introduction, Part I: The Building Blocks
The aNN is a novel architecture that combines two major tasks, namely segmentation and classification. It is composed of a collection of simple modules:
1. A segmentation module*
2. A denoising module
3. A classification module
* The segmentation module is influenced by a "cognitive bias" vector b, with b ∈ {0, 1}^N (detailed on the following slide).
4
What is an Attentional Neural Network? A Brief Introduction, Part II: The Segmentation Module, Continued
As mentioned, the aNN has two primary tasks. The first is to segment the input data, which is the objective of the segmentation module.
The ith element of the bias vector b holds a prior belief that the segmented object belongs to class i. For example, if N = 3, then b = (0 1 0) indicates a belief that an object y belongs to the second class of objects.
The input image x is mapped into a feature vector h, with h = σ(W · x), where W is the feature weight matrix and σ is the sigmoid function.
Simultaneously, b generates a gating vector g, with g = σ(U · b), where U holds the feedback weights and σ is again the sigmoid function.
g then selects or deselects features by producing hg = h .* g, the element-wise product of h and g.
From here, the reconstructed segment to be classified is computed as z = σ(W' · hg).
Note that b need not be a binary vector; it may instead be a probability distribution containing a mixture of guesses as to which class y may belong. For the sake of simplicity, only two simpler scenarios were considered:
1. b is a binary vector indicating whether there is a particular class of objects associated with its weights, U_G, or
2. A universal (group) bias b_G with equal weights for all classes, indicating the certain presence of an object (but of no particular class).
[Figure: The segmentation model (with cognitive bias vector b), denoted by M.]
Note that this diagram shows y, not z. This will be clarified on the next slide.
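The forward pass described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `segment`, the toy dimensions, and the random weights are hypothetical, and W' is assumed to be the transpose of W, which the slide does not state.

```python
import numpy as np

def sigmoid(a):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-a))

def segment(x, b, W, U):
    """Forward pass of the segmentation module M (hypothetical helper).

    x: input image flattened to shape (D,)
    b: cognitive bias vector, shape (N,)
    W: feature weights, shape (H, D)
    U: feedback weights, shape (H, N)
    """
    h = sigmoid(W @ x)           # feature vector h = sigma(W x)
    g = sigmoid(U @ b)           # gating vector g = sigma(U b)
    h_g = h * g                  # element-wise product selects/deselects features
    return sigmoid(W.T @ h_g)    # reconstruction; ASSUMES W' is the transpose of W

# Toy sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
D, H, N = 16, 8, 3
W = rng.normal(size=(H, D))
U = rng.normal(size=(H, N))
x = rng.random(D)
b = np.array([0.0, 1.0, 0.0])    # prior belief: the object is in the second class
z = segment(x, b, W, U)
```

Passing a uniform vector for `b` instead of a one-hot vector would correspond loosely to the group-bias scenario, although the slide associates separate weights with that case.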
5
What is an Attentional Neural Network? A Brief Introduction, Part III: Segmentation to Classification
The second primary task of an aNN is classification. Denoising is an intermediate step and not nearly as critical (refer to slide 10).
It would seem natural to feed y directly into a classifier C (depicted). The critical issue with this is that it is prone to misclassification, because details are lost during segmentation. For example, suppose M was given the wrong bias vector b during the reconstruction of y.
As a precaution against this, the reconstructed segment y is used to gate the raw image x with a threshold ε: only pixels where y exceeds ε are kept. This produces the gated image z (as in the previous slide), with z = (y > ε) .* x, which is then classified. ( .* indicates element-wise multiplication.)
[Figure 1] [Figure 2]
Figure 1 illustrates the aNN framework discussed so far. Figure 2 illustrates the same framework in principle, extended to an iterative design (reminiscent of an RNN) to handle more complex segmentation problems. The red circles in the figures indicate the denoising modules (Wang, Song, Zhang & Zhang, 2).
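The gating step z = (y > ε) .* x is simple enough to show directly. The sketch below uses made-up pixel values, and the threshold of 0.5 is an assumption for illustration; the slide does not give the value of ε.

```python
import numpy as np

def gate_image(x, y, eps=0.5):
    """Return z = (y > eps) .* x: keep raw pixels where the reconstruction exceeds eps.

    eps=0.5 is an illustrative default, not a value taken from the slides.
    """
    mask = (y > eps).astype(x.dtype)   # 1.0 where y exceeds the threshold, else 0.0
    return mask * x                    # element-wise product with the raw image

x = np.array([0.9, 0.2, 0.7, 0.4])     # raw image (toy values)
y = np.array([0.8, 0.1, 0.6, 0.3])     # reconstructed segment (toy values)
z = gate_image(x, y, eps=0.5)          # -> [0.9, 0.0, 0.7, 0.0]
```

Because the mask is binary, the surviving pixels keep their original intensities from x, so detail lost in the reconstruction y does not reach the classifier.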
6
aNN Classification: Some More Details
In the case of iterative classification (Figure 2 of the previous slide), the system may be given an initial cognitive bias. It then produces a series of guesses b and classification results C. If a guess b agrees with the result C, then b is considered a candidate for the final classification result. If b is chosen incorrectly, the raw image x may be transformed incorrectly; however, the segments produced under correct biases will often be better than the images transformed under a wrong bias.
7
Some Notes and Details on Training Attentional Neural Networks
A shallow Restricted Boltzmann Machine (RBM) was used for the generative model; autoencoders gave qualitatively similar results. The feature weights W and feedback weights U were difficult to learn simultaneously because they interact multiplicatively. To make training more feasible, a feedback-disabled RBM was first trained on noisy data to learn W. Next, with W fixed, U was trained on noisy data with clean target data using backpropagation. This process constrained U to learn to select relevant features and discard irrelevant ones.
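The second stage (W fixed, U trained by backpropagation from noisy input to clean target) can be sketched as plain gradient descent. Everything specific here is an assumption for illustration: the squared-error loss, the learning rate, the use of W transposed for decoding, and the random W that stands in for a pretrained RBM.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def loss_and_grad_U(U, W, x_noisy, b, target):
    """Squared reconstruction error and its gradient w.r.t. U, with W held fixed."""
    h = sigmoid(W @ x_noisy)               # features from the noisy input
    g = sigmoid(U @ b)                     # gate produced by the bias
    y = sigmoid(W.T @ (h * g))             # reconstruction (ASSUMES W' = W.T)
    loss = 0.5 * np.sum((y - target) ** 2) # ASSUMED loss; the slides name none
    d_o = (y - target) * y * (1.0 - y)     # back through the output sigmoid
    d_hg = W @ d_o                         # back through the decoder weights
    d_a = d_hg * h * g * (1.0 - g)         # back through the gate sigmoid
    return loss, np.outer(d_a, b)          # gradient has the shape of U

rng = np.random.default_rng(0)
D, H, N = 16, 8, 3
W = 0.1 * rng.normal(size=(H, D))          # placeholder for the pretrained W
U = np.zeros((H, N))
clean = rng.random(D)                      # clean target (toy data)
noisy = np.clip(clean + 0.1 * rng.normal(size=D), 0.0, 1.0)
b = np.array([0.0, 1.0, 0.0])

losses = []
for _ in range(100):
    loss, grad = loss_and_grad_U(U, W, noisy, b, clean)
    losses.append(loss)
    U -= 0.5 * grad                        # only U is updated; W never changes
```

Since b is one-hot, only the column of U for the active class receives a gradient, which matches the idea that each class learns its own feature gate.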
8
Results and Analysis: The Data and Initial Methods
The MNIST and MNIST-2 datasets were used to evaluate effectiveness. “MNIST-2” is a dataset (exclusive to this paper) composed by laying two randomly chosen MNIST digits on top of each other. A 3-layer perceptron with 256 hidden nodes was trained on clean MNIST data, yielding a 1.6% error rate.
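A single MNIST-2 style sample could be composed roughly as follows. The slides say only that two randomly chosen digits are laid on top of each other, so the pixel-wise maximum used here is an assumed blend rule, and the random arrays merely stand in for real MNIST images.

```python
import numpy as np

def overlay_digits(img_a, img_b):
    """Superimpose two equally sized grayscale images.

    ASSUMPTION: pixel-wise maximum; the slides do not specify the blend rule.
    """
    return np.maximum(img_a, img_b)

rng = np.random.default_rng(0)
digits = rng.random((10, 28, 28))   # placeholder images, NOT real MNIST data
# Draw two distinct indices (distinctness is also an assumption).
i, j = rng.choice(len(digits), size=2, replace=False)
mnist2_sample = overlay_digits(digits[i], digits[j])
```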
9
Results and Analysis Continued
If feature selection is sensitive to the choice of cognitive bias (and it is), then any given b should activate the corresponding relevant features. To check this, the hidden units were sorted by their associated weights in U for a given bias from the set {0, 1, 2, 8}, and their associated feature weights in W were inspected. The top features, when superimposed, do compose a rough version of the digit of interest.
The "Top features" image illustrates the most popular features selected by different cognitive biases, namely b = 0, b = 1, b = 2, b = 8, and their accumulations.
The "Reconstruction" image illustrates the effects of:
1. Denoising without bias
2. Denoising with group bias
3. Denoising with correct bias, and
4. Denoising with wrong bias
The "Feature selection" image illustrates how the cognitive bias selects and eliminates features.
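The inspection of top features can be sketched as a sort on one column of U. The helper name `top_features`, the tiny matrices, and the use of a plain sum to accumulate the selected features are all illustrative assumptions.

```python
import numpy as np

def top_features(U, W, cls, k=3):
    """Return the k hidden units with the largest feedback weight for class
    `cls`, together with their feature weights (rows of W). Hypothetical helper."""
    order = np.argsort(U[:, cls])[::-1]   # hidden units, largest weight first
    idx = order[:k]
    return idx, W[idx]

# Toy weights: 4 hidden units, 2 classes, 3 input pixels.
U = np.array([[ 0.1,  2.0],
              [ 1.5, -0.3],
              [ 0.7,  0.9],
              [-1.0,  0.2]])
W = np.arange(12, dtype=float).reshape(4, 3)

idx, feats = top_features(U, W, cls=0, k=2)   # units 1 and 2 rank highest for class 0
accumulated = feats.sum(axis=0)               # superimpose the selected features (assumed: sum)
```

Reshaping `accumulated` to the image dimensions would give the kind of rough digit outline the slide describes.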
10
Results and Analysis Continued 2
11
Questions or Comments?
12
Thank you!