Convolutional Neural Networks for Sentence Classification

Abstract We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification.

Data The dataset we'll use here is the Movie Review data from Rotten Tomatoes, one of the datasets also used in the original paper. It contains 10,662 example review sentences, half positive and half negative, with a vocabulary of roughly 20k words. Note that since this dataset is quite small, a powerful model is likely to overfit. The dataset also doesn't come with an official train/test split, so we simply hold out 10% of the data as a dev set; the original paper reported results for 10-fold cross-validation.
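As a rough illustration of this setup (not part of the original slides), the sketch below loads the two review files and holds out 10% of the sentences as a dev set. The file names rt-polarity.pos / rt-polarity.neg and the latin-1 encoding match the standard Rotten Tomatoes download, but treat them as assumptions.

```python
# Minimal sketch: load the MR data and hold out 10% as a dev set.
# Assumes the two files from the Rotten Tomatoes download, each with one
# sentence per line (rt-polarity.pos = positive, rt-polarity.neg = negative).
import random

def load_mr(pos_path="rt-polarity.pos", neg_path="rt-polarity.neg", dev_frac=0.1, seed=0):
    with open(pos_path, encoding="latin-1") as f:
        pos = [(line.strip(), 1) for line in f if line.strip()]
    with open(neg_path, encoding="latin-1") as f:
        neg = [(line.strip(), 0) for line in f if line.strip()]
    data = pos + neg                      # 10,662 sentences in total
    random.Random(seed).shuffle(data)     # no official split, so shuffle first
    n_dev = int(len(data) * dev_frac)     # 10% becomes the dev set
    return data[n_dev:], data[:n_dev]     # (train, dev)

train, dev = load_mr()
print(len(train), len(dev))
```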

Data Sample

negative:
• simplistic , silly and tedious .
• it's so laddish and juvenile , only teenage boys could possibly find it funny .
• exploitative and largely devoid of the depth or sophistication that would make watching such a graphic treatment of the crimes bearable .
• [garbus] discards the potential for pathological study , exhuming instead , the skewed melodrama of the circumstantial situation .
• a visually flashy but narratively opaque and emotionally vapid exercise in style and mystification .
• the story is also as unoriginal as they come , already having been recycled more times than i'd care to count .

positive:
• the rock is destined to be the 21st century's new " conan " and that he's going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .
• the gorgeously elaborate continuation of " the lord of the rings " trilogy is so huge that a column of words cannot adequately describe co-writer/director peter jackson's expanded vision of j . r . r . tolkien's middle-earth .
• effective but too-tepid biopic
• if you sometimes like to go to the movies to have fun , wasabi is a good place to start .

Model
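A minimal PyTorch sketch of the model, assuming the single-convolutional-layer design described in the paper: word vectors, parallel convolutions over several filter widths, max-over-time pooling, dropout, and a softmax output. Filter widths of 3/4/5 with 100 feature maps each and dropout of 0.5 follow the settings reported in the paper; the class and variable names here are illustrative, not taken from the released code.

```python
# Illustrative sketch of the sentence-classification CNN (not the authors' code).
# Embedding -> parallel convolutions over word windows -> max-over-time pooling
# -> dropout -> linear layer producing class logits (softmax via cross-entropy loss).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, num_classes=2,
                 filter_sizes=(3, 4, 5), num_filters=100, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One 1-D convolution per filter width; each yields num_filters feature maps.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, kernel_size=k) for k in filter_sizes
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Max-over-time pooling: keep the strongest activation of each feature map.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.fc(features)                       # logits over classes

model = SentenceCNN(vocab_size=20000)
logits = model(torch.randint(0, 20000, (8, 50)))       # 8 sentences, 50 tokens each
```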

Model Variations We experiment with several variants of the model.
• CNN-rand: Our baseline model, where all words are randomly initialized and then modified during training.
• CNN-static: A model with pre-trained vectors from word2vec. All words—including the unknown ones that are randomly initialized—are kept static and only the other parameters of the model are learned.
• CNN-non-static: Same as above, but the pre-trained vectors are fine-tuned for each task.
• CNN-multichannel: A model with two sets of word vectors. Each set of vectors is treated as a 'channel' and each filter is applied to both channels, but gradients are backpropagated only through one of the channels. Hence the model is able to fine-tune one set of vectors while keeping the other static. Both channels are initialized with word2vec.
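To make the variants concrete, here is a hedged sketch of how the static, non-static, and multichannel settings can be configured by initializing, freezing, or duplicating the embedding table. It builds on the SentenceCNN sketch from the Model slide above; none of the names or flags below come from the authors' released code.

```python
# Illustrative configuration of the four variants (assumptions mine).
import torch
import torch.nn as nn

pretrained = torch.randn(20000, 300)   # stand-in for word2vec vectors loaded elsewhere
model = SentenceCNN(vocab_size=20000, embed_dim=300)

# CNN-rand: nothing to do -- the embedding stays randomly initialized and trainable.

# CNN-static: copy the word2vec vectors in and freeze them.
model.embedding.weight.data.copy_(pretrained)
model.embedding.weight.requires_grad = False

# CNN-non-static: same initialization, but leave the vectors trainable (fine-tuned).
model.embedding.weight.requires_grad = True

# CNN-multichannel (sketch): keep a second, frozen copy of the vectors as an extra
# 'channel'; each filter is applied to both channels and the resulting feature maps
# are combined before pooling, so only the trainable channel gets gradient updates.
static_channel = nn.Embedding.from_pretrained(pretrained, freeze=True)
```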

Results of our CNN models against other methods:
• RAE: Recursive Autoencoders with pre-trained word vectors from Wikipedia (Socher et al., 2011).
• MV-RNN: Matrix-Vector Recursive Neural Network with parse trees (Socher et al., 2012).
• RNTN: Recursive Neural Tensor Network with tensor-based feature function and parse trees (Socher et al., 2013).
• DCNN: Dynamic Convolutional Neural Network with k-max pooling (Kalchbrenner et al., 2014).
• Paragraph-Vec: Logistic regression on top of paragraph vectors (Le and Mikolov, 2014).
• CCAE: Combinatorial Category Autoencoders with combinatorial category grammar operators (Hermann and Blunsom, 2013).
• Sent-Parser: Sentiment analysis-specific parser (Dong et al., 2014).
• NBSVM, MNB: Naive Bayes SVM and Multinomial Naive Bayes with uni-bigrams from Wang and Manning (2012).
• G-Dropout, F-Dropout: Gaussian Dropout and Fast Dropout from Wang and Manning (2013).
• Tree-CRF: Dependency tree with Conditional Random Fields (Nakagawa et al., 2010).
• CRF-PR: Conditional Random Fields with Posterior Regularization (Yang and Cardie, 2014).
• SVMS: SVM with uni-bi-trigrams, wh word, head word, POS, parser, hypernyms, and 60 hand-coded rules as features from Silva et al. (2011).

Benchmarks

Conclusion In the present work we have described a series of experiments with convolutional neural networks built on top of word2vec. Despite little tuning of hyperparameters, a simple CNN with one layer of convolution performs remarkably well. Our results add to the well-established evidence that unsupervised pre-training of word vectors is an important ingredient in deep learning for NLP.

References
https://arxiv.org/abs/1408.5882
http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
http://www.cs.cornell.edu/people/pabo/movie-review-data/
https://github.com/dennybritz/cnn-text-classification-tf
https://github.com/yoonkim/CNN_sentence