Convolutional Neural Networks for Sentence Classification

Abstract We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification.

Data The dataset we'll use here is the Movie Review data from Rotten Tomatoes, one of the datasets also used in the original paper. It contains 10,662 example review sentences, half positive and half negative, with a vocabulary of roughly 20k words. Note that since this dataset is quite small, a powerful model is likely to overfit. The dataset also doesn't come with an official train/test split, so we simply hold out 10% of the data as a dev set; the original paper reported results for 10-fold cross-validation.
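As a rough illustration of this setup (not part of the original slides), the sketch below loads the two review files and holds out 10% of the sentences as a dev set. The file names rt-polarity.pos / rt-polarity.neg and the latin-1 encoding match the standard Rotten Tomatoes download, but treat them as assumptions.

```python
# Minimal sketch: load the MR data and hold out 10% as a dev set.
# Assumes the two files from the Rotten Tomatoes download, each with one
# sentence per line (rt-polarity.pos = positive, rt-polarity.neg = negative).
import random

def load_mr(pos_path="rt-polarity.pos", neg_path="rt-polarity.neg", dev_frac=0.1, seed=0):
    with open(pos_path, encoding="latin-1") as f:
        pos = [(line.strip(), 1) for line in f if line.strip()]
    with open(neg_path, encoding="latin-1") as f:
        neg = [(line.strip(), 0) for line in f if line.strip()]
    data = pos + neg                      # 10,662 sentences in total
    random.Random(seed).shuffle(data)     # no official split, so shuffle first
    n_dev = int(len(data) * dev_frac)     # 10% becomes the dev set
    return data[n_dev:], data[:n_dev]     # (train, dev)

train, dev = load_mr()
print(len(train), len(dev))
```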

Data Sample

negative:
• simplistic , silly and tedious .
• it's so laddish and juvenile , only teenage boys could possibly find it funny .
• exploitative and largely devoid of the depth or sophistication that would make watching such a graphic treatment of the crimes bearable .
• [garbus] discards the potential for pathological study , exhuming instead , the skewed melodrama of the circumstantial situation .
• a visually flashy but narratively opaque and emotionally vapid exercise in style and mystification .
• the story is also as unoriginal as they come , already having been recycled more times than i'd care to count .

positive:
• the rock is destined to be the 21st century's new " conan " and that he's going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .
• the gorgeously elaborate continuation of " the lord of the rings " trilogy is so huge that a column of words cannot adequately describe co-writer/director peter jackson's expanded vision of j . r . r . tolkien's middle-earth .
• effective but too-tepid biopic
• if you sometimes like to go to the movies to have fun , wasabi is a good place to start .

Model
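A minimal PyTorch sketch of the model, assuming the single-convolutional-layer design described in the paper: word vectors, parallel convolutions over several filter widths, max-over-time pooling, dropout, and a softmax output. Filter widths of 3/4/5 with 100 feature maps each and dropout of 0.5 follow the settings reported in the paper; the class and variable names here are illustrative, not taken from the released code.

```python
# Illustrative sketch of the sentence-classification CNN (not the authors' code).
# Embedding -> parallel convolutions over word windows -> max-over-time pooling
# -> dropout -> linear layer producing class logits (softmax via cross-entropy loss).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, num_classes=2,
                 filter_sizes=(3, 4, 5), num_filters=100, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One 1-D convolution per filter width; each yields num_filters feature maps.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, kernel_size=k) for k in filter_sizes
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Max-over-time pooling: keep the strongest activation of each feature map.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.fc(features)                       # logits over classes

model = SentenceCNN(vocab_size=20000)
logits = model(torch.randint(0, 20000, (8, 50)))       # 8 sentences, 50 tokens each
```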

Model Variations We experiment with several variants of the model.
• CNN-rand: Our baseline model, where all words are randomly initialized and then modified during training.
• CNN-static: A model with pre-trained vectors from word2vec. All words—including the unknown ones that are randomly initialized—are kept static and only the other parameters of the model are learned.
• CNN-non-static: Same as above, but the pre-trained vectors are fine-tuned for each task.
• CNN-multichannel: A model with two sets of word vectors. Each set of vectors is treated as a 'channel' and each filter is applied to both channels, but gradients are backpropagated only through one of the channels. Hence the model is able to fine-tune one set of vectors while keeping the other static. Both channels are initialized with word2vec.
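To make the variants concrete, here is a hedged sketch of how the static, non-static, and multichannel settings can be configured by initializing, freezing, or duplicating the embedding table. It builds on the SentenceCNN sketch from the Model slide above; none of the names or flags below come from the authors' released code.

```python
# Illustrative configuration of the four variants (assumptions mine).
import torch
import torch.nn as nn

pretrained = torch.randn(20000, 300)   # stand-in for word2vec vectors loaded elsewhere
model = SentenceCNN(vocab_size=20000, embed_dim=300)

# CNN-rand: nothing to do -- the embedding stays randomly initialized and trainable.

# CNN-static: copy the word2vec vectors in and freeze them.
model.embedding.weight.data.copy_(pretrained)
model.embedding.weight.requires_grad = False

# CNN-non-static: same initialization, but leave the vectors trainable (fine-tuned).
model.embedding.weight.requires_grad = True

# CNN-multichannel (sketch): keep a second, frozen copy of the vectors as an extra
# 'channel'; each filter is applied to both channels and the resulting feature maps
# are combined before pooling, so only the trainable channel gets gradient updates.
static_channel = nn.Embedding.from_pretrained(pretrained, freeze=True)
```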

Results of our CNN models against other methods:
• RAE: Recursive Autoencoders with pre-trained word vectors from Wikipedia (Socher et al., 2011).
• MV-RNN: Matrix-Vector Recursive Neural Network with parse trees (Socher et al., 2012).
• RNTN: Recursive Neural Tensor Network with tensor-based feature function and parse trees (Socher et al., 2013).
• DCNN: Dynamic Convolutional Neural Network with k-max pooling (Kalchbrenner et al., 2014).
• Paragraph-Vec: Logistic regression on top of paragraph vectors (Le and Mikolov, 2014).
• CCAE: Combinatorial Category Autoencoders with combinatorial category grammar operators (Hermann and Blunsom, 2013).
• Sent-Parser: Sentiment analysis-specific parser (Dong et al., 2014).
• NBSVM, MNB: Naive Bayes SVM and Multinomial Naive Bayes with uni-bigrams from Wang and Manning (2012).
• G-Dropout, F-Dropout: Gaussian Dropout and Fast Dropout from Wang and Manning (2013).
• Tree-CRF: Dependency tree with Conditional Random Fields (Nakagawa et al., 2010).
• CRF-PR: Conditional Random Fields with Posterior Regularization (Yang and Cardie, 2014).
• SVMS: SVM with uni-bi-trigrams, wh word, head word, POS, parser, hypernyms, and 60 hand-coded rules as features from Silva et al. (2011).

Benchmarks

Conclusion In the present work we have described a series of experiments with convolutional neural networks built on top of word2vec. Despite little tuning of hyperparameters, a simple CNN with one layer of convolution performs remarkably well. Our results add to the well-established evidence that unsupervised pre-training of word vectors is an important ingredient in deep learning for NLP.

References
https://arxiv.org/abs/1408.5882
http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
http://www.cs.cornell.edu/people/pabo/movie-review-data/
https://github.com/dennybritz/cnn-text-classification-tf
https://github.com/yoonkim/CNN_sentence