Attention is not Explanation. NAACL 2019. Sarthak Jain, Byron C. Wallace (Northeastern University). NAACL acceptance: 2018: 207/647 (31%); 2019: 424/1198 (22.6%). 5-5-5, 5-5-3
Background: Attention Mechanism
Background: Attention
Given encoder hidden states h (one per input token) and a query Q, compute the attention distribution $\hat{\alpha} = \mathrm{softmax}(\phi(h, Q))$, where the scoring function $\phi$ is either
additive: $\phi(h, Q) = v^{\top}\tanh(W_1 h + W_2 Q)$, or
scaled dot-product: $\phi(h, Q) = \frac{hQ}{\sqrt{m}}$.
The attention vector is the weighted sum of hidden states: $h_\alpha = \sum_t \hat{\alpha}_t\, h_t$.
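A minimal sketch of the two scoring functions and the resulting attention vector, in plain PyTorch. The tensor shapes, parameter names (W1, W2, v), and the toy dimensions are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' code): additive and scaled dot-product
# attention over encoder states h and a query Q, as defined on the slide.
import torch
import torch.nn.functional as F

def additive_scores(h, Q, W1, W2, v):
    # phi(h, Q) = v^T tanh(W1 h + W2 Q);  h: (T, m), Q: (m,)
    return torch.tanh(h @ W1.T + Q @ W2.T) @ v              # (T,)

def scaled_dot_scores(h, Q):
    # phi(h, Q) = h . Q / sqrt(m)
    return (h @ Q) / (h.shape[-1] ** 0.5)                    # (T,)

def attend(h, scores):
    alpha = F.softmax(scores, dim=-1)                        # attention distribution
    h_alpha = alpha @ h                                      # weighted sum of states
    return alpha, h_alpha

# Toy usage with made-up dimensions
T, m = 5, 8
h, Q = torch.randn(T, m), torch.randn(m)
W1, W2, v = torch.randn(m, m), torch.randn(m, m), torch.randn(m)
alpha, h_alpha = attend(h, additive_scores(h, Q, W1, W2, v))
```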
Question: Does the attention mechanism really capture semantically meaningful attention?
Does attention provide transparency? Do attention weights correlate with measures of feature importance? Would alternative attention weights necessarily yield different predictions?
Experiment Model: one-hot tokens -> embedding -> encoder (BiRNN) -> attention over hidden states h (query Q) -> dense layer -> prediction y
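A minimal sketch of this pipeline, assuming a binary classifier with additive attention and no external query; the class name AttnClassifier and all dimensions are made up for illustration.

```python
# Minimal sketch of the experimental model (assumed architecture, illustrative sizes):
# one-hot tokens -> embedding -> BiLSTM encoder -> additive attention -> dense -> y
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=128, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.enc = nn.LSTM(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hid_dim, 1)      # additive scoring, no external query
        self.out = nn.Linear(2 * hid_dim, n_classes)

    def forward(self, tokens):                      # tokens: (B, T) integer ids
        h, _ = self.enc(self.emb(tokens))           # (B, T, 2*hid_dim)
        scores = self.attn(torch.tanh(h)).squeeze(-1)
        alpha = F.softmax(scores, dim=-1)           # (B, T) attention weights
        h_alpha = torch.einsum('bt,bth->bh', alpha, h)
        return F.softmax(self.out(h_alpha), dim=-1), alpha
```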
Dataset
Correlation with Feature Importance: compute the Kendall-τ correlation between attention weights and two feature-importance measures: (i) a gradient-based measure, and (ii) leave-one-feature-out (the change in output when a token is removed). Sketch below.
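A hedged sketch of how the two importance measures and their correlation with attention could be computed, reusing the AttnClassifier sketch above. The gradient measure here is an L1 norm of the gradient at the token embeddings and leave-one-out uses total variation distance over the output distribution; the paper's exact formulations differ in details, and scipy's kendalltau stands in for the paper's Kendall-τ statistic.

```python
# Sketch of the two importance measures and their Kendall-tau correlation
# with attention, for a single example (batch size 1), using AttnClassifier.
import torch
from scipy.stats import kendalltau

def gradient_importance(model, tokens, target_class):
    # L1 norm of d y[target_class] / d embedding_t, one score per token
    emb = model.emb(tokens).detach().requires_grad_(True)
    h, _ = model.enc(emb)
    alpha = torch.softmax(model.attn(torch.tanh(h)).squeeze(-1), dim=-1)
    y = torch.softmax(model.out(torch.einsum('bt,bth->bh', alpha, h)), dim=-1)
    y[0, target_class].backward()
    return emb.grad.abs().sum(-1).squeeze(0)        # (T,)

def loo_importance(model, tokens):
    # total variation distance between the prediction and the prediction
    # obtained after leaving token t out of the input
    with torch.no_grad():
        base, _ = model(tokens)
        diffs = []
        for t in range(tokens.shape[1]):
            keep = [i for i in range(tokens.shape[1]) if i != t]
            y_t, _ = model(tokens[:, keep])
            diffs.append(0.5 * (base - y_t).abs().sum().item())
    return torch.tensor(diffs)

def attention_correlation(alpha, importance):
    # Kendall-tau rank correlation between attention weights and importance
    tau, _ = kendalltau(alpha.detach().flatten().numpy(), importance.flatten().numpy())
    return tau
```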
Result for Correlation: Gradients. [Figure legend: Orange => Positive, Purple => Negative; Orange/Purple/Green => Neutral/Contradiction/Entailment (SNLI).]
Result for Correlation: Leave-One-Out
Statistically Significant
Random Attention Weights: randomly permute the learned attention weights while keeping the rest of the model fixed, and measure the change in output (total variation distance). Sketch below.
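A sketch of the permutation experiment under the same AttnClassifier assumptions: hold the encoder states fixed, shuffle the learned attention weights, and measure how far the prediction moves in total variation distance (the paper reports the median over permutations). The permutation count is an arbitrary choice.

```python
# Sketch of the random-permutation experiment: shuffle the learned attention
# weights and measure the resulting shift in the output distribution.
import torch

def tvd(p, q):
    # total variation distance between two output distributions
    return 0.5 * (p - q).abs().sum(-1)

@torch.no_grad()
def permutation_effect(model, tokens, n_perm=100):
    h, _ = model.enc(model.emb(tokens))                      # encoder states stay fixed
    alpha = torch.softmax(model.attn(torch.tanh(h)).squeeze(-1), dim=-1)
    y = torch.softmax(model.out(torch.einsum('bt,bth->bh', alpha, h)), dim=-1)
    deltas = []
    for _ in range(n_perm):                                  # the paper reports the median
        perm = torch.randperm(alpha.shape[1])
        y_p = torch.softmax(model.out(torch.einsum('bt,bth->bh', alpha[:, perm], h)), dim=-1)
        deltas.append(tvd(y, y_p))
    return torch.stack(deltas).median()
```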
Result for Random Permutation. [Figure legend: Orange => Positive, Purple => Negative; Orange/Purple/Green => Neutral/Contradiction/Entailment (SNLI).]
Adversarial Attention: search for attention distributions that are maximally different (by JSD) from the learned ones while keeping the model output within ε (TVD) of the original; optimize a relaxed version of this objective with Adam (SGD). Sketch below.
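A sketch of the relaxed adversarial search under the same AttnClassifier assumptions: maximize the Jensen-Shannon divergence from the learned attention while penalizing output TVD above ε, optimized with Adam. The penalty weight lam, learning rate, and step count are assumptions, not the paper's hyperparameters.

```python
# Sketch of the relaxed adversarial-attention search: keep the prediction
# within epsilon TVD of the original while pushing the attention as far as
# possible (JSD) from the learned weights. Hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def jsd(p, q, eps=1e-12):
    # Jensen-Shannon divergence with natural log (bounded by ln 2 ~ 0.693)
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (a.add(eps).log() - b.add(eps).log())).sum(-1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def adversarial_attention(model, tokens, epsilon=0.01, lam=500.0, steps=500):
    for p in model.parameters():
        p.requires_grad_(False)                              # freeze the trained model
    with torch.no_grad():
        h, _ = model.enc(model.emb(tokens))
        scores = model.attn(torch.tanh(h)).squeeze(-1)
        alpha_hat = torch.softmax(scores, dim=-1)            # learned attention
        y_hat = torch.softmax(model.out(torch.einsum('bt,bth->bh', alpha_hat, h)), dim=-1)
    logits = scores.clone().requires_grad_(True)             # only these are optimized
    opt = torch.optim.Adam([logits], lr=0.05)
    for _ in range(steps):
        alpha = torch.softmax(logits, dim=-1)
        y = torch.softmax(model.out(torch.einsum('bt,bth->bh', alpha, h)), dim=-1)
        tvd_out = 0.5 * (y - y_hat).abs().sum(-1)
        loss = -jsd(alpha, alpha_hat) + lam * F.relu(tvd_out - epsilon)
        opt.zero_grad()
        loss.mean().backward()
        opt.step()
    return torch.softmax(logits, dim=-1).detach()
```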
Result for Adversarial Attention. (The 0.69 marked in the plot is the maximum attainable JSD between two distributions.)
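For reference (not from the slides), the 0.69 ceiling comes from the fact that the Jensen-Shannon divergence with natural log between any two distributions is bounded by ln 2:

```latex
% JSD is an average of two KL terms, each bounded by log 2,
% since p/(p+q) <= 1 implies log(2p/(p+q)) <= log 2.
\mathrm{JSD}(p \parallel q)
  = \tfrac{1}{2}\,\mathrm{KL}\!\left(p \,\middle\|\, \tfrac{p+q}{2}\right)
  + \tfrac{1}{2}\,\mathrm{KL}\!\left(q \,\middle\|\, \tfrac{p+q}{2}\right)
  \le \tfrac{1}{2}\log 2 + \tfrac{1}{2}\log 2
  = \log 2 \approx 0.693
```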
Conclusion
The correlation between feature importance measures and learned attention weights is weak.
Counterfactual attention distributions often have no effect on model output.
Limitations
Only a handful of attention variants are considered.
Only tasks with unstructured output spaces are evaluated (no seq2seq).
Adversarial Heatmap Examples (original vs. adversarially constructed attention over the same input; figures omitted)