Sentiment analysis using deep learning methods

Presentation transcript:

Sentiment analysis using deep learning methods
Antti Keurulainen, 14.2.2017

Sentiment analysis using deep learning methods
Two main approaches:
- Convolutional neural networks (CNN)
- Recurrent neural networks (RNN), which can be enhanced with LSTM units

Deep Learning
A deep network has one or more hidden layers with trainable parameters in those layers. An artificial neural network organized in hierarchical layers can build hierarchical representations of the input data.

Convolutional neural network (CNN)
Simple example of a convolution operation: a small filter (kernel) with weights 0.2, 0.7, -0.5 and 0.7 slides over the input (e.g. an image), and each position produces one entry of the feature map. Bias terms are omitted, and f denotes a non-linear activation function:
c_1 = f(0.2*1 + 0.7*3 - 0.5*6 + 0.7*0) = f(-0.7)
c_2 = f(0.2*3 + 0.7*5 - 0.5*0 + 0.7*2) = f(5.5)
c_3 = ...
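A minimal numpy sketch of this sliding-window computation. The full input image from the slide figure is not recoverable, so the input values below are illustrative assumptions; the first two filter positions reproduce f(-0.7) and f(5.5) above.

```python
import numpy as np

def conv2d_valid(image, kernel, f=np.tanh):
    """Slide a 2-D kernel over the image ('valid' convolution, no padding),
    apply the nonlinearity f, and return the feature map. Bias omitted."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = f(np.sum(patch * kernel))
    return out

# Illustrative input; the first two positions give c_1 = f(-0.7), c_2 = f(5.5)
image = np.array([[1., 3., 5., 2., 4.],
                  [6., 0., 2., 4., 7.]])
kernel = np.array([[0.2, 0.7],
                   [-0.5, 0.7]])
print(conv2d_valid(image, kernel))
```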

Convolutional neural network (CNN)
For example, a 5x5 filter applied to a 3-channel input produces one feature map entry
y = Σ_{k=1}^{3} Σ_{i=1}^{5} Σ_{j=1}^{5} x_{kij} θ_{kij}
After convolution, further operations are applied, such as the activation function (non-linearity) and pooling. During training, the values used in the filters are updated and gradually learned. Parameter sharing gives the network a degree of translation invariance.
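A short numpy illustration of this triple sum and of max pooling; the shapes and values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5, 5))      # one 3-channel 5x5 input patch
theta = rng.normal(size=(3, 5, 5))  # one 5x5 filter spanning all 3 channels

# y = sum_k sum_i sum_j x_kij * theta_kij
y = np.einsum('kij,kij->', x, theta)

# Convolving over all positions yields a feature map; pooling then keeps,
# e.g., the maximum activation in each pooling window.
feature_map = rng.normal(size=(8, 8))
pooled = feature_map.reshape(4, 2, 4, 2).max(axis=(1, 3))  # 2x2 max pooling
print(y, pooled.shape)
```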

Recurrent Neural Network (RNN)
Shallow RNN, unrolled in time: each input x_t is mapped through the input weights U into the hidden state h_t, which also receives the previous hidden state h_{t-1} through the recurrent weights W; the output weights V produce the output o_t, which is compared with the target y_t through the loss L_t.
Source: Goodfellow, I., Bengio, Y., Courville, A., Deep Learning.

Recurrent Neural Network (RNN)
Deep RNN example: the recurrence is stacked into two hidden layers, h1_t and h2_t, with separate weight matrices (U, W1, V1 for the first layer and W2, V2 for the second); the outputs o_t, losses L_t and targets y_t are as in the shallow RNN.

Vanishing gradient problem and LSTM
Problem: gradients propagate over many time steps and involve repeated multiplications by the recurrent weight matrix, which leads to vanishing or exploding gradients.
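A tiny numpy illustration of this effect (not from the slides): repeatedly multiplying a gradient by the transpose of the recurrent weight matrix, as backpropagation through time does, makes its norm shrink or blow up depending on the scale of W.

```python
import numpy as np

rng = np.random.default_rng(0)
grad = rng.normal(size=50)

for scale in (0.9, 1.1):
    # Orthogonal matrix times a scale, so all singular values equal `scale`.
    W = scale * np.linalg.qr(rng.normal(size=(50, 50)))[0]
    g = grad.copy()
    for _ in range(100):          # backprop through 100 time steps
        g = W.T @ g               # repeated multiplication by W^T
    print(scale, np.linalg.norm(g))   # ~0 for 0.9, huge for 1.1
```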

Standard RNN cell (vanilla RNN):
h_t = tanh(U_c x_t + W_c h_{t-1} + b_c)
Visualization idea by Christopher Olah.
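A minimal numpy sketch of one vanilla RNN step; the dimensions below are illustrative assumptions.

```python
import numpy as np

def rnn_step(x_t, h_prev, U_c, W_c, b_c):
    """One vanilla RNN step: h_t = tanh(U_c x_t + W_c h_{t-1} + b_c)."""
    return np.tanh(U_c @ x_t + W_c @ h_prev + b_c)

# illustrative sizes: 25-dim input, 60-dim hidden state
rng = np.random.default_rng(0)
U_c = rng.normal(scale=0.1, size=(60, 25))
W_c = rng.normal(scale=0.1, size=(60, 60))
b_c = np.zeros(60)

h = np.zeros(60)
for x_t in rng.normal(size=(5, 25)):   # a sequence of 5 input vectors
    h = rnn_step(x_t, h, U_c, W_c, b_c)
```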

LSTM cell:
f_t = σ(U_f x_t + W_f h_{t-1} + b_f)      (forget gate)
i_t = σ(U_i x_t + W_i h_{t-1} + b_i)      (input gate)
o_t = σ(U_o x_t + W_o h_{t-1} + b_o)      (output gate)
s̃_t = tanh(U_c x_t + W_c h_{t-1} + b_c)   (candidate cell state)
s_t = f_t ∘ s_{t-1} + i_t ∘ s̃_t           (cell state)
h_t = o_t ∘ tanh(s_t)                      (hidden state)
Visualization idea by Christopher Olah.
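A minimal numpy sketch of one LSTM step following these equations; the weight shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, s_prev, p):
    """One LSTM step; p is a dict of weight matrices U_*, W_* and biases b_*."""
    f_t = sigmoid(p['U_f'] @ x_t + p['W_f'] @ h_prev + p['b_f'])      # forget gate
    i_t = sigmoid(p['U_i'] @ x_t + p['W_i'] @ h_prev + p['b_i'])      # input gate
    o_t = sigmoid(p['U_o'] @ x_t + p['W_o'] @ h_prev + p['b_o'])      # output gate
    s_tilde = np.tanh(p['U_c'] @ x_t + p['W_c'] @ h_prev + p['b_c'])  # candidate
    s_t = f_t * s_prev + i_t * s_tilde      # new cell state
    h_t = o_t * np.tanh(s_t)                # new hidden state
    return h_t, s_t

# illustrative sizes: 25-dim input, 60-dim hidden/cell state
rng = np.random.default_rng(0)
n_in, n_hid = 25, 60
p = {}
for g in 'fioc':
    p[f'U_{g}'] = rng.normal(scale=0.1, size=(n_hid, n_in))
    p[f'W_{g}'] = rng.normal(scale=0.1, size=(n_hid, n_hid))
    p[f'b_{g}'] = np.zeros(n_hid)

h = s = np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):
    h, s = lstm_step(x_t, h, s, p)
```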

Sentiment analysis
Sentiment analysis is a family of methods whose main goal is to identify the opinion or attitude, for example of a sentence, expressed in natural language.

Sentiment analysis using CNNs
Analysis based on Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), 1746–1751. http://aclweb.org/anthology/D/D14/D14-1181.pdf
A simple CNN with one layer of convolution on top of word vectors obtained from an unsupervised neural language model. Good results are obtained with pre-trained word vectors, and the results improve further when the word vectors are fine-tuned for the specific task.

Sentiment analysis using CNNs
Simple CNN model for sentiment analysis (architecture figure from Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification).

Sentiment analysis using CNNs
- Multiple filter sizes (3, 4, 5) produce several feature maps (100 per size), on the order of 0.3-0.4M parameters
- Max-over-time pooling is used to select the most important feature from each feature map
- Two input channels are used, one with static word vectors and the other with trainable vectors
- A fully connected softmax layer on top produces probabilities for each class
- Dropout is used in the fully connected layer for regularization, with an L2-norm constraint on the weight vectors; early stopping is used
- Stochastic gradient descent updates with the Adadelta update rule
- Pre-trained word2vec vectors are used (300 dimensions), trained on 100B words of Google News
Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification.
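A minimal tf.keras sketch of a single-channel variant of this architecture. This is an illustrative reconstruction, not the author's code; the vocabulary size, sequence length and the random embedding initialization (CNN-rand style, instead of pre-trained word2vec) are placeholder assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, seq_len, emb_dim, n_classes = 20000, 60, 300, 2  # placeholder sizes

inputs = layers.Input(shape=(seq_len,), dtype='int32')
# In the paper the embedding matrix would be initialized from pre-trained
# word2vec vectors; here it is randomly initialized for brevity.
emb = layers.Embedding(vocab_size, emb_dim)(inputs)

# One convolution per filter size (3, 4, 5 words), 100 feature maps each,
# followed by max-over-time pooling.
pooled = []
for size in (3, 4, 5):
    conv = layers.Conv1D(filters=100, kernel_size=size, activation='relu')(emb)
    pooled.append(layers.GlobalMaxPooling1D()(conv))

x = layers.Concatenate()(pooled)     # 300 pooled features
x = layers.Dropout(0.5)(x)           # dropout on the penultimate layer
outputs = layers.Dense(n_classes, activation='softmax')(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adadelta', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```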

Sentiment analysis using CNNs
Models in [Kim2014]:
- CNN-rand: all word vectors are initialized randomly and trained
- CNN-static: initialized with word2vec (unknown words initialized randomly) and kept static
- CNN-non-static: initialized with word2vec and trained further
- CNN-multichannel: initialized with word2vec; one channel stays static and the other channel is trained further
Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification.

Sentiment analysis using CNNs
Datasets in [Kim 2014]:
- "Movie review data": binary classification. Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan, EMNLP 2002.
- "Stanford Sentiment Treebank 1": extension of the above, with fine-grained labels added. Socher et al. 2013.
- "Stanford Sentiment Treebank 2": same as above but with neutral reviews removed and binary labels.
- "Subjectivity dataset": 5000 subjective and 5000 objective processed sentences. Pang/Lee, ACL 2004.
- TREC question dataset: classifying a question into 6 question types.
- Customer review dataset: reviews of various products such as cameras, MP3 players etc. Hu & Liu 2004.
- MPQA dataset: opinion polarity subtask from the MPQA dataset. Wiebe et al. 2005.

Sentiment analysis using CNNs: results [Kim 2014]

Sentiment analysis using RNNs

Sentiment analysis using RNNs
Analysis based on [Wang2015]: Wang, X., Liu, Y., Sun, C., Wang, B., & Wang, X. (2015). Predicting Polarities of Tweets by Composing Word Embeddings with Long Short-Term Memory. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 1343–1353, Beijing, China. Association for Computational Linguistics. http://www.aclweb.org/anthology/P15-1130
Twitter sentiment prediction using a simple RNN or an LSTM recurrent network.

Sentiment analysis using RNNs
- Word vectors created from co-occurrence statistics are not always suitable for sentiment analysis (e.g. the words "good" and "bad" are close to each other in word2vec representations)
- Sentiments are expressed by phrases rather than individual words: how can the representation of a whole sentence be captured?
- Additional challenge: a recurrent neural network (RNN) has difficulty maintaining long time dependencies, which motivates LSTM networks
- It has been shown that further task-specific training of the pre-trained word vectors helps capture the polarity information of sentences

Sentiment analysis using RNNs
Basic RNN architecture used in [Wang2015]: the sentence is represented by the hidden state of the last time step.

Sentiment analysis using RNNs
RNN-FLT (Recurrent Neural Network with Fixed Lookup-Table): a simple implementation of the recurrent sentiment classifier. In the forward pass and backpropagation equations of the paper, f is the sigmoid function, w are the weights, e are the word embeddings, v are the hidden-to-output weights, t is the time step and T is the last time step. The loss O is the cross-entropy loss, and training is done with stochastic gradient descent (SGD).
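A rough numpy sketch of such a forward pass. This is an assumption-based reconstruction from the symbol descriptions above, not the exact equations of [Wang2015].

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_flt_forward(word_ids, E, w_in, w_rec, v):
    """Forward pass of a simple recurrent sentiment classifier (assumed form).

    E:      fixed lookup table of word embeddings (vocab_size x emb_dim)
    w_in:   input weights (hidden x emb_dim)
    w_rec:  recurrent weights (hidden x hidden)
    v:      hidden-to-output weights (n_classes x hidden)
    The sentence is represented by the hidden state of the last time step.
    """
    h = np.zeros(w_rec.shape[0])
    for t in word_ids:                          # iterate over the words of the tweet
        e_t = E[t]                              # embedding lookup (kept fixed in RNN-FLT)
        h = sigmoid(w_in @ e_t + w_rec @ h)     # recurrent update
    logits = v @ h
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax over polarity classes
    return probs                                # cross-entropy loss O = -log(probs[label])
```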

Sentiment analysis using RNNs
RNN-TLT (Recurrent Neural Network with Trainable Lookup-Table) and LSTM-TLT: implementations that further train the pre-trained word vectors. In LSTM-TLT the classifier uses LSTM blocks instead of regular RNN blocks; each regular RNN block is replaced by an LSTM block, which has considerably more complex internal functionality and helps combat the vanishing gradient problem.

Sentiment analysis using RNNs
Experiments are run on the Stanford Twitter Sentiment corpus (STS): 800,000 positive and 800,000 negative tweets. The manually labeled test set contains 177 negative and 182 positive tweets; the training set sentiment labels are derived from emoticons. 25-dimensional word vectors were trained with word2vec on 1.56M tweets from the training set; the hidden layer size is 60. Compared models:
- Non-neural classifiers (Naive Bayes, Maximum Entropy, Support Vector Machine)
- Neural bag-of-words, with the summation of word vectors as input
- Dynamic Convolutional Neural Network
- Recursive Autoencoder
- The models presented in this paper

Sentiment analysis using RNNs
An additional experiment is run on the human-labeled SemEval 2013 dataset, with a training set of 4099, a development set of 735 and a test set of 1742 tweets. The word vectors are fixed and pre-trained with word2vec on the STS dataset, this time with 300 dimensions.

Sentiment analysis using RNNs
Which words change most when training the pre-trained word2vec vectors?

Sentiment analysis using RNNs
How do the sentiment words move in 2-D space during training? The 20 most negative and 20 most positive words were tracked during training (before tuning vs. after tuning).

Sentiment analysis experiments using Python libraries and TensorFlow

Sentiment analysis experiments using Python libraries and TensorFlow
Dataset: IMDb movie review dataset, 25,000 labeled reviews in the training set and 25,000 unlabeled reviews in the test set.
Models:
- Bag of words with random forest (pandas, numpy, scikit-learn)
- Word2vec with random forest (pandas, numpy, scikit-learn, gensim)
- Word2vec with a feed-forward network (pandas, numpy, gensim, tensorflow)

Sentiment analysis experiments using Python libraries and TensorFlow
Baseline with bag of words and random forest:
Step 1: Download the IMDb movie review dataset from Kaggle.com and clean it:
- import the data into a pandas DataFrame
- use BeautifulSoup to remove HTML tags
- use regular expressions to remove non-letters
- convert to lowercase
- remove stopwords
Step 2: Create bag-of-words representations of the individual reviews (sklearn CountVectorizer).
Step 3: Fit a random forest model on the training set, run predictions, and submit to Kaggle.com for test set accuracy.
Accuracy: 85.6%
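A condensed sketch of this pipeline. It is an illustrative reconstruction, not the author's exact script; the file and column names assume the Kaggle "Bag of Words Meets Bags of Popcorn" data files, and the hyperparameters are placeholders.

```python
import re
import pandas as pd
from bs4 import BeautifulSoup
from nltk.corpus import stopwords           # requires nltk.download('stopwords')
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier

def clean_review(raw):
    """Strip HTML tags, keep letters only, lowercase, and remove stopwords."""
    text = BeautifulSoup(raw, 'html.parser').get_text()
    words = re.sub('[^a-zA-Z]', ' ', text).lower().split()
    stops = set(stopwords.words('english'))
    return ' '.join(w for w in words if w not in stops)

# assumed file and column names from the Kaggle competition data
train = pd.read_csv('labeledTrainData.tsv', sep='\t', quoting=3)
cleaned = train['review'].apply(clean_review)

vectorizer = CountVectorizer(max_features=5000)   # bag-of-words features
X = vectorizer.fit_transform(cleaned)

forest = RandomForestClassifier(n_estimators=100)
forest.fit(X, train['sentiment'])
```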

Sentiment analysis experiments using Python libraries and TensorFlow
Word2vec with random forest:
Step 1: Download the IMDb movie review dataset from Kaggle.com and clean it:
- import the data into a pandas DataFrame
- use BeautifulSoup to remove HTML tags
- use regular expressions to remove non-letters
- convert to lowercase
Step 2: Create word2vec representations of the individual words (gensim word2vec).
Step 3: Average all word vectors in a review to form a single vector for the review.
Step 4: Fit a random forest model on the training set, run predictions, and submit to Kaggle.com for test set accuracy.
Accuracy: 83.3%
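A sketch of steps 2 and 3 with gensim. The parameter values mirror the hyperparameter list at the end of these slides and are otherwise assumptions; note that gensim 4+ renames the size argument to vector_size.

```python
import numpy as np
from gensim.models import Word2Vec

def reviews_to_vectors(sentences, dim=300):
    """Train word2vec on tokenized reviews and average the vectors per review.

    sentences: list of tokenized, cleaned reviews, e.g. [['great', 'movie'], ...]
    """
    model = Word2Vec(sentences, size=dim, window=10, min_count=40, sample=1e-3)

    def review_vector(words):
        vecs = [model.wv[w] for w in words if w in model.wv]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    return np.vstack([review_vector(words) for words in sentences])

# X = reviews_to_vectors(tokenized_reviews)
# X is then fed to a RandomForestClassifier, as in the bag-of-words baseline.
```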

Sentiment analysis experiments using Python libraries and TensorFlow
Word2vec with deep learning:
Step 1: Download the IMDb movie review dataset from Kaggle.com and clean it:
- import the data into a pandas DataFrame
- use BeautifulSoup to remove HTML tags
- use regular expressions to remove non-letters
- convert to lowercase
Step 2: Create word2vec representations of the individual words (gensim word2vec).
Step 3: Average all word vectors in a review to form a single vector for the review.
Step 4: Fit a feed-forward deep learning model on the training set, run predictions, and submit to Kaggle.com for test set accuracy.
Accuracy: 87.0%
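A sketch of the feed-forward classifier in step 4, written with tf.keras for brevity. The layer widths follow the hyperparameter list on the next slide; everything else, including the number of epochs and the batch size, is an assumption.

```python
import tensorflow as tf

def build_classifier(input_dim=300):
    """Feed-forward net on averaged 300-dim word2vec review vectors: 300-150-50-2."""
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(300, activation='relu', input_shape=(input_dim,)),
        tf.keras.layers.Dense(150, activation='relu'),
        tf.keras.layers.Dense(50, activation='relu'),
        tf.keras.layers.Dense(2, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# model = build_classifier()
# model.fit(X_train, y_train, epochs=10, batch_size=64)  # X_train from the averaging step
```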

A lot of hyperparameters and other decisions
- Remove stopwords? (yes for word2vec, no for sentiment analysis)
- Remove punctuation? (yes)
- Dimension of word vectors (300)
- Word2vec window size (10)
- Downsampling for frequent words (1e-3)
- Minimum word count for word2vec (40)
- Deep learning (DL) number of layers (3)
- DL width of the hidden layers (300-150-50-2)
- DL activation functions (ReLU)
- DL use dropout? (tried, did not help -> no)
- DL initialization (random uniform between 0 and 1)
- DL optimizer (Adam)
- DL Adam optimizer parameters: learning rate + 3 others
- DL number of training steps
- DL regularization method and its parameters (none)
- DL loss function (cross-entropy)

Homework
Consider the CNN-based single-channel version of the model architecture presented in [Kim2014]. Consider a scenario where the pre-trained static word vectors have 300 dimensions, and three different filter sizes are used that span 3, 4 and 5 words. Each filter size produces 100 feature maps. The feature maps are calculated using formula (2) of the paper: the weight matrix is multiplied with the input vectors, a bias is added, and the result is passed through a non-linearity such as tanh. The feature maps are then max-pooled, and these results are connected to the final two-dimensional output layer in a fully connected manner. Calculate the number of trainable parameters (weights and biases) in this model.