Feature fusion and attention scheme

Average/Max A very simple and naive approach is to take the average of the input vectors: V_out = avg(v_1, v_2, ..., v_n). Alternatively, apply the max function on each vector dimension k: V_out,k = max(v_1,k, v_2,k, ..., v_n,k). This is usually a sub-optimal fusion scheme, since in most cases the input vectors should not all receive the same weight.
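
A minimal sketch of these two naive fusion rules, assuming the input vectors are stacked into one tensor (all names and sizes are illustrative):

import torch

def average_fusion(vectors):
    # vectors: [n, dim] stack of n input feature vectors
    return vectors.mean(dim=0)

def max_fusion(vectors):
    # element-wise maximum over the n vectors, per dimension k
    return vectors.max(dim=0).values

v = torch.randn(5, 128)                      # five 128-d feature vectors
v_avg, v_max = average_fusion(v), max_fusion(v)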

Feature concatenation Instead of the simple averaging/max scheme, a fusion layer can be applied to the concatenated features: Y = MLP(concat(x_1, x_2, ..., x_n)). Since the MLP expects an input of fixed dimension, this fusion scheme only accepts a fixed number of inputs. (Diagram: concatenated inputs fed to the MLP, producing the result vector.)
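
A possible implementation of this concatenation-based fusion layer, assuming a fixed number of input vectors (hyper-parameters are illustrative):

import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    def __init__(self, dim=128, n_inputs=5, out_dim=128):
        super().__init__()
        # the MLP input size is fixed to n_inputs * dim, so n_inputs cannot vary
        self.mlp = nn.Sequential(
            nn.Linear(dim * n_inputs, 256),
            nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, xs):                   # xs: list of n_inputs tensors, each [batch, dim]
        return self.mlp(torch.cat(xs, dim=-1))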

LSTM fusion Since an RNN module (or RNN-like module) can capture recurrent information across the inputs, an RNN-like structure can also be used for feature merging. Simple LSTM fusion: the 'Average' block can be replaced by another functional part such as a softmax or Norm-Average block.
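
One way to realize this LSTM fusion, averaging the per-step outputs at the end (the averaging could be swapped for softmax weighting or a norm-average, as noted; sizes are illustrative):

import torch
import torch.nn as nn

class LSTMFusion(nn.Module):
    def __init__(self, dim=128, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=dim, hidden_size=hidden, batch_first=True)

    def forward(self, x):                    # x: [batch, n_vectors, dim]
        out, _ = self.lstm(x)                # out: [batch, n_vectors, hidden]
        return out.mean(dim=1)               # 'Average' block over the sequence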

Scaled dot-product attention Dot-product attention is more straightforward: the weight is obtained from the inner product of the vector with itself. Note that x^T x equals the squared Euclidean length of x, so SDPA here acts as a weighting policy based on the embedded vector's length. This is a simple approach to feature merging.
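
A minimal sketch of the fusion rule implied here, where each vector's weight comes from its own scaled inner product (this is an interpretation of the slide, not the standard Transformer formulation):

import torch

def sdpa_fusion(x):
    # x: [n, dim]; each vector is weighted by its own scaled inner product x_i^T x_i
    d = x.shape[-1]
    scores = (x * x).sum(dim=-1) / d ** 0.5  # squared length of each vector, scaled
    weights = torch.softmax(scores, dim=0)   # [n]
    return (weights.unsqueeze(-1) * x).sum(dim=0)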

Attention mechanism Attention receives three vector inputs: Q (query), K (key) and V (value). Typically, the attention model produces an attention map A (the weights) from Q and K, and A is then applied to V to obtain the final result. The attention map A is usually normalized with a softmax function, although there are other options such as an L1 norm. An attention mask can optionally be applied before normalization, e.g. a top-k selection or another mask-generation function/network. (Diagram: QKV attention block.)
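
A generic QKV attention block along these lines, with softmax normalization and an optional additive mask applied before it (shapes are illustrative):

import torch

def attention(q, k, v, mask=None):
    # q: [nq, d], k: [nk, d], v: [nk, dv]
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5   # [nq, nk]
    if mask is not None:
        scores = scores + mask               # e.g. -inf outside a top-k selection
    a = torch.softmax(scores, dim=-1)        # attention map A
    return a @ v                             # [nq, dv]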

Self-attention (fully connected) Q, K and V are produced by separate MLPs: Q = MLP1(x), K = MLP2(x), V = MLP3(x), Y = Attention(Q, K, V). Furthermore, a shortcut between Y and x can be added, similar to the shortcut in ResNet blocks. (Diagram: optional residual shortcut from x to the output.)
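
A sketch of this fully connected self-attention, with single linear layers standing in for MLP1–MLP3 and an optional residual shortcut:

import torch
import torch.nn as nn

class SelfAttentionFC(nn.Module):
    def __init__(self, dim=128, residual=True):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.residual = residual

    def forward(self, x):                    # x: [n, dim]
        q, k, v = self.q(x), self.k(x), self.v(x)
        a = torch.softmax(q @ k.T / x.shape[-1] ** 0.5, dim=-1)
        y = a @ v
        return y + x if self.residual else y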

Self-attention (convolutional) Q = conv1(x), K = conv2(x), V = conv3(x). Q has shape [h*w, ch_att], K has shape [h*w, ch_att], V has shape [h*w, ch_x]. Weight = softmax(QK^T, axis=-1), so Weight has shape [h*w, h*w]. (Diagram: optional residual shortcut from x to the output.)
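
A sketch of the convolutional variant following the shapes listed above, using 1x1 convolutions for the projections; ch_x and ch_att are assumed channel sizes, and the residual shortcut is kept:

import torch
import torch.nn as nn

class SelfAttentionConv(nn.Module):
    def __init__(self, ch_x=64, ch_att=16):
        super().__init__()
        self.conv_q = nn.Conv2d(ch_x, ch_att, 1)
        self.conv_k = nn.Conv2d(ch_x, ch_att, 1)
        self.conv_v = nn.Conv2d(ch_x, ch_x, 1)

    def forward(self, x):                            # x: [b, ch_x, h, w]
        b, c, h, w = x.shape
        q = self.conv_q(x).flatten(2).transpose(1, 2)    # [b, h*w, ch_att]
        k = self.conv_k(x).flatten(2).transpose(1, 2)    # [b, h*w, ch_att]
        v = self.conv_v(x).flatten(2).transpose(1, 2)    # [b, h*w, ch_x]
        weight = torch.softmax(q @ k.transpose(1, 2), dim=-1)   # [b, h*w, h*w]
        y = (weight @ v).transpose(1, 2).reshape(b, c, h, w)
        return y + x                                 # optional residual shortcut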

Self-attention: discussion Self-attention can be used as a network component, since it takes a layer output and produces a new feature. This component is useful for single-image structures, since Q, K and V are all derived from the image itself. Moreover, convolutional self-attention uses global information to derive the attention map for convolutional layers, which otherwise contain only local information.

General-query attention The vectors extracted from images have different reliability, depending on image quality, feature variance, etc. Therefore, a general query vector Q can be trained independently, while K and V remain derived from X. The query vector effectively tests the reliability of each input feature vector x. Shapes: X: [dim, #features], Q: [1, dim].
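
One way to realize such a block, with the query as an independently trained parameter while K and V are derived from the input features (a sketch under these assumptions; the features are stored row-wise, i.e. transposed relative to the slide's [dim, #features]):

import torch
import torch.nn as nn

class GeneralQueryAttention(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, dim))   # trained independently of X
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):                    # x: [n_features, dim]
        k, v = self.k(x), self.v(x)
        scores = self.query @ k.T / x.shape[-1] ** 0.5   # [1, n_features]
        w = torch.softmax(scores, dim=-1)
        return (w @ v).squeeze(0)            # fused vector, [dim]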

General-query attention The general-query attention can serve as a vector fusion block. Furthermore, two such attention blocks can be stacked: the first extracts a general query vector from the input features, and the second acts as the merging block. (Diagram: input features passed through Block 1 and Block 2, with a tanh activation, producing the fused feature.)
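
A possible stacking of two such blocks, where the first produces a data-dependent query (passed through a tanh) and the second performs the merging; this is only one interpretation of the diagram, and block1/block2 are assumed to be GeneralQueryAttention-style modules as sketched above:

import torch

def stacked_fusion(block1, block2, features):            # features: [n, dim]
    q = torch.tanh(block1(features))                     # first block derives a query, [dim]
    k, v = block2.k(features), block2.v(features)
    w = torch.softmax(q.unsqueeze(0) @ k.T / q.shape[-1] ** 0.5, dim=-1)   # [1, n]
    return (w @ v).squeeze(0)                            # second block merges the features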

Iterative fusion Using the attention block alone may be biased, because a fixed query vector will not always predict the correct weights. If we assume the fused feature reaches a maximum response under an optimal weight set {w_k}, we can iteratively find the feature center:
Initialize: w_k = 1/K
for i = 1 : max_iter do
    result = sum_k(w_k * f_k)
    w_k = softmax(result · f_k)
end for
return {w_k}
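
A direct transcription of this loop, assuming the f_k are stacked row-wise into an [n, dim] matrix:

import torch

def iterative_fusion(f, max_iter=3):
    # f: [n, dim] feature vectors; returns the converged weights and the fused feature
    n = f.shape[0]
    w = torch.full((n,), 1.0 / n)                  # initialize w_k = 1/K (uniform)
    for _ in range(max_iter):
        result = (w.unsqueeze(-1) * f).sum(dim=0)  # sum_k w_k f_k
        w = torch.softmax(f @ result, dim=0)       # w_k = softmax(result · f_k)
    return w, (w.unsqueeze(-1) * f).sum(dim=0)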

Ensemble of the two fusion schemes We can further train an ensemble of the two fusion schemes using a variable α:
w1 = α·w_g + (1−α)·w_i
w2 = β·w_g + (1−β)·w_i
where α is the coefficient for the weight combination and β is a random number in the interval (0, 1). During training, w1 is used to train α and w2 to train the network; at test time, w1 is used.
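
A hedged sketch of this ensemble rule: w_g and w_i stand for the weights from the general-query block and the iterative scheme, α is a learnable scalar, and β is redrawn randomly at each training step; how gradients are routed between w1 and w2 (e.g. via detaching) is glossed over here.

import torch
import torch.nn as nn

class FusionEnsemble(nn.Module):
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.5))      # trained coefficient alpha

    def forward(self, w_g, w_i, training=True):
        w1 = self.alpha * w_g + (1 - self.alpha) * w_i    # used to train alpha and at test time
        if training:
            beta = torch.rand(())                         # random beta in (0, 1)
            w2 = beta * w_g + (1 - beta) * w_i            # used to train the rest of the network
            return w1, w2
        return w1, w1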