Rationalizing Neural Predictions


Rationalizing Neural Predictions [Tao Lei, Regina Barzilay and Tommi Jaakkola, EMNLP 2016] Feb 9, 2017

Abstract 1. Prediction without justification has limited applicability. We learn to extract pieces of the input text as justifications (rationales). Rationales are tailored to be short and coherent, yet sufficient for making the same prediction. 2. Our approach combines two modular components, a generator and an encoder. The generator specifies a distribution over text fragments as candidate rationales, and these are passed through the encoder for prediction. 3. We evaluate the approach on multi-aspect sentiment analysis against manually annotated test cases; it outperforms an attention-based baseline by a significant margin. We also demonstrate its success on a question retrieval task.

Contents Motivation Related Work Extractive Rationale Generation Encoder and Generator Experiments Discussion

Motivation Many recent advances in NLP problems have come from neural networks (NNs). The gains in accuracy have come at the cost of interpretability, since complex NNs offer little transparency concerning their inner workings. Ideally, NNs would not only yield improved performance but would also offer interpretable justifications (rationales) for their predictions. In this paper, we propose a novel approach that incorporates rationale generation as an integral part of the overall learning problem.

Motivation Rationales are simply subsets of the words from the input text that satisfy two key properties: 1. The selected words represent short and coherent pieces of text (phrases); 2. The selected words alone must suffice for prediction as a substitute for the original text.

Motivation We evaluate our approach on two domains: 1. Multi-aspect sentiment analysis 2. Problem of retrieving similar questions

Related Work 1. Attention-based models have been successfully applied to many NLP problems. Xu et al. (2015) introduced a stochastic attention mechanism together with the more standard soft attention for an image captioning task. Our rationale extraction can be understood as a type of stochastic attention, although the architectures and objectives differ.

Related Work 2. Rationale-based classification (Zaidan et al., 2007; Marshall et al., 2015; Zhang et al., 2016) seeks to improve prediction by relying on richer annotations in the form of human-provided rationales. In our work, rationales are never given during training; the goal is to learn to generate them.

Extractive Rationale Generation We are provided with a sequence of words as input, namely x = {x1, ..., xl}, where each xt denotes the vector representation of the t-th word. The learning problem is to map the input sequence x to a target vector in R^m. We estimate a complex parameterized mapping enc(x) from input sequences to target vectors; this mapping is called the encoder. The selection of words is encapsulated in a rationale generator, another parameterized mapping gen(x) from input sequences to shorter sequences of words. Thus gen(x) must include only a few words, and enc(gen(x)) should result in nearly the same target vector as the original input passed through the encoder.
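
In compact form (notation lightly adapted from the paper), the two mappings and the property we want them to satisfy are:

```latex
\mathrm{enc}: x \mapsto \hat{y} \in \mathbb{R}^m
\qquad
\mathrm{gen}: x \mapsto \text{a short sub-sequence of } x
\qquad
\mathrm{enc}(\mathrm{gen}(x)) \approx y
```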

Extractive Rationale Generation The rationale generation task is entirely unsupervised. We assume no explicit annotations about which words should be included in the rationale. The rationale is introduced as a latent variable, a constraint that guides how to interpret the input sequence. The encoder and generator are trained jointly, in an end-to-end fashion so as to function well together.

Encoder and Generator Generator: the generator extracts a subset of the text from the original input x to function as an interpretable summary. Thus the rationale for a given sequence x can be equivalently defined in terms of binary variables {z1, ..., zl}, where each zt ∈ {0, 1} indicates whether word xt is selected or not. We use (z, x) to denote the actual rationale generated, and treat gen(x) as synonymous with a probability distribution over binary selections, i.e., z ~ gen(x) = p(z|x), where the length of z varies with the input x.
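
A sketch of the two generator parameterizations discussed in the paper (layer details simplified here): the independent version selects each word from a shared recurrent representation of x, while the dependent version also conditions on the previous selections.

```latex
\text{independent:}\quad p(z \mid x) = \prod_{t=1}^{l} p(z_t \mid x)
\qquad\qquad
\text{dependent:}\quad p(z \mid x) = \prod_{t=1}^{l} p(z_t \mid x,\, z_1 \dots z_{t-1})
```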

Encoder and Generator Joint Objective
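
As described in the paper, the joint objective combines the encoder's prediction loss with regularizers that encourage short and coherent selections (the notation below is a close paraphrase):

```latex
\mathrm{cost}(z, x, y) = \lVert \mathrm{enc}(z, x) - y \rVert_2^2
  + \lambda_1 \lVert z \rVert_1
  + \lambda_2 \sum_{t} \lvert z_t - z_{t+1} \rvert
\qquad
\min_{\theta_e,\, \theta_g} \; \sum_{(x, y) \in D} \mathbb{E}_{z \sim \mathrm{gen}(x)} \big[ \mathrm{cost}(z, x, y) \big]
```

The λ1 term penalizes the number of selected words; the λ2 term penalizes transitions between selected and unselected words, so contiguous phrases are preferred.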

Doubly Stochastic Gradient Generator
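
Because z is a discrete sample, the cost cannot be back-propagated through the selection directly; the generator gradient is instead obtained with a log-derivative (REINFORCE-style) estimator, schematically:

```latex
\frac{\partial}{\partial \theta_g}\, \mathbb{E}_{z \sim \mathrm{gen}(x)}\big[\mathrm{cost}(z, x, y)\big]
  = \mathbb{E}_{z \sim \mathrm{gen}(x)}\!\left[ \mathrm{cost}(z, x, y)\,
    \frac{\partial \log p(z \mid x)}{\partial \theta_g} \right]
```

In practice the expectation is approximated with a few sampled rationales per example; the stochastic minibatch plus the sampled z give the "doubly stochastic" gradient.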

Doubly Stochastic Gradient Encoder
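
For the encoder, the sampled rationale can be treated as a fixed input, so the gradient is simply the expected gradient of the cost:

```latex
\frac{\partial}{\partial \theta_e}\, \mathbb{E}_{z \sim \mathrm{gen}(x)}\big[\mathrm{cost}(z, x, y)\big]
  = \mathbb{E}_{z \sim \mathrm{gen}(x)}\!\left[ \frac{\partial\, \mathrm{cost}(z, x, y)}{\partial \theta_e} \right]
```

Putting the two gradients together, one training step can be sketched as follows. This is a minimal PyTorch-style illustration, not the authors' code; `generator` and `encoder` are assumed to return per-token selection probabilities and predictions on the masked input, respectively.

```python
import torch

def train_step(x, y, generator, encoder, opt, lam1=2e-4, lam2=4e-4, n_samples=1):
    """One doubly stochastic update: sample rationales, then combine pathwise
    encoder gradients with REINFORCE-style generator gradients."""
    probs = generator(x)                          # (batch, length) selection probabilities
    loss = 0.0
    for _ in range(n_samples):
        z = torch.bernoulli(probs).detach()       # sampled binary rationale
        log_pz = (z * torch.log(probs + 1e-8)
                  + (1 - z) * torch.log(1 - probs + 1e-8)).sum(dim=1)
        y_hat = encoder(x, z)                     # prediction from selected words only
        pred = ((y_hat - y) ** 2).sum(dim=1)      # squared prediction error
        sparsity = lam1 * z.sum(dim=1)            # lambda_1 * ||z||_1
        coherence = lam2 * (z[:, 1:] - z[:, :-1]).abs().sum(dim=1)
        cost = pred + sparsity + coherence        # cost(z, x, y)
        # Encoder gradient flows through `cost`; generator gradient flows
        # through log p(z|x), weighted by the (detached) cost value.
        loss = loss + (cost + cost.detach() * log_pz).mean()
    loss = loss / n_samples
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```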

Choice of recurrent unit We employ recurrent convolution (RCNN), a refinement of local n-gram based convolution. RCNN attempts to learn n-gram features that are not necessarily consecutive, and averages features in a dynamic (recurrent) fashion.
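
A rough sketch of the non-consecutive bigram feature map underlying the RCNN unit (in the spirit of Lei et al., 2015; the actual unit adds adaptive gating and other refinements, so this is illustrative only):

```python
import torch

def rcnn_bigram(x, W1, W2, decay=0.5):
    """x: (length, d_in); W1, W2: (d_in, d_hidden). Returns (length, d_hidden)."""
    d_hidden = W1.shape[1]
    c1 = torch.zeros(d_hidden)        # decayed unigram features
    c2 = torch.zeros(d_hidden)        # decayed (possibly non-consecutive) bigram features
    states = []
    for t in range(x.shape[0]):
        pair = c1 * (x[t] @ W2)       # pair word t with any earlier, decayed word
        c1 = decay * c1 + x[t] @ W1
        c2 = decay * c2 + pair
        states.append(torch.tanh(c2))
    return torch.stack(states)
```

The decay factor lets distant words still contribute (with smaller weight), which is what makes the learned n-grams "not necessarily consecutive".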

Experiment: Multi-aspect Sentiment Analysis Dataset: We use the BeerAdvocate review dataset used in prior work (McAuley et al., 2012). This dataset contains 1.5 million reviews written by the website's users. In addition to the written text, the reviewer provides ratings (on a scale of 0 to 5 stars) for each aspect (appearance, smell, palate, taste) as well as an overall rating. It also provides sentence-level annotations on around 1,000 reviews. Each sentence is annotated with one or more aspect labels, indicating which aspect(s) the sentence covers (used as the test set).

Experiment: Multi-aspect Sentiment Analysis The sentiment correlation between any pair of aspects (and the overall score) is quite high, 63.5% on average with a maximum of 79.1% (between taste and the overall score). If the model is trained directly on this set, it can be confused by such strong correlations. We therefore pick "less correlated" examples from the dataset. This gives us a de-correlated subset for each aspect, each containing about 80k to 90k reviews; we use 10k of these as the development set. We focus on three aspects, since the fourth aspect (taste) still has > 50% correlation with the overall sentiment.

Experiment: Multi-aspect Sentiment Analysis Sentiment Prediction: Based on the results, we choose the RCNN unit for both the encoder and the generator.

Experiment: Multi-aspect Sentiment Analysis Rationale Selection: The dependent generator has one additional recurrent layer. For this layer we use 30 states, so that the dependent version still has a number of parameters comparable to the independent version. The two versions of the generator have 358k and 323k parameters, respectively.

Experiment: Similar Text Retrieval on QA Forum Dataset: we use the real-world AskUbuntu dataset used in recent work (dos Santos et al., 2015; Lei et al., 2016). This set contains 167k unique questions (each consisting of a question title and a body) and 16k user-identified similar question pairs. The data is used to train the neural encoder that learns the vector representation of the input question, optimizing the cosine distance between similar questions against random non-similar ones. During development and testing, the model is used to score 20 candidate questions given each query question, and a total of 400×20 query-candidate question pairs are annotated for evaluation.
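
A minimal sketch of the cosine-based training criterion described above (function names and the margin value are illustrative, not the paper's exact setup):

```python
import torch
import torch.nn.functional as F

def retrieval_loss(encoder, query, positive, negatives, margin=0.2):
    """Max-margin loss: a similar question should score higher (by `margin`)
    than every randomly drawn non-similar question."""
    q = encoder(query)
    pos_sim = F.cosine_similarity(q, encoder(positive), dim=-1)
    neg_sims = torch.stack([F.cosine_similarity(q, encoder(n), dim=-1)
                            for n in negatives])
    return F.relu(margin - pos_sim + neg_sims.max(dim=0).values).mean()
```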

Experiment: Similar Text Retrieval on QA Forum 1. The rationales achieve a MAP of up to 56.5%, coming close to using the title and outperforming the noisy question body. 2. This is because the question body can contain the same or even complementary information that is useful for retrieval.

Discussion 1. The encoder-generator framework (trained with reinforcement-learning-style gradients); 2. Connections with GANs.