Convolutional Neural Networks for Sentence Classification

Convolutional Neural Networks for Sentence Classification Paper presentation by: Aradhya Chouhan

Author: Yoon Kim, New York University

Introduction Deep learning models have achieved remarkable results in computer vision (Krizhevsky et al., 2012) and speech recognition (Graves et al., 2013) in recent years. Originally invented for computer vision, CNN models have subsequently been shown to be effective for NLP and have achieved excellent results in semantic parsing (Yih et al., 2014), search query retrieval (Shen et al., 2014), sentence modeling (Kalchbrenner et al., 2014), and other traditional NLP tasks (Collobert et al., 2011).

Objective In the present work, we train a simple CNN with one layer of convolution on top of word vectors obtained from an unsupervised neural language model.

Model Architecture

Model: data representation Let xi ∈ R^k be the k-dimensional word vector corresponding to the i-th word in the sentence. A sentence of length n (padded where necessary) is represented as x1:n = x1 ⊕ x2 ⊕ . . . ⊕ xn, (1) where ⊕ is the concatenation operator. In general, let xi:i+j refer to the concatenation of words xi, xi+1, . . . , xi+j. A convolution operation involves a filter w ∈ R^(hk), which is applied to a window of h words to produce a new feature.
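
As an illustration of this representation, here is a minimal NumPy sketch; the vocabulary, the dimensionality k, and the toy sentence are made-up values for the example, not taken from the paper:

```python
import numpy as np

k = 4                                          # word vector dimensionality (toy value)
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3}
E = np.random.randn(len(vocab), k)             # toy embedding matrix: one k-dim vector per word

sentence = ["the", "movie", "was", "great"]
# x_{1:n}: concatenation of the word vectors, as in equation (1)
x = np.concatenate([E[vocab[w]] for w in sentence])   # shape (n * k,)
# equivalently, keep the sentence as an n x k matrix for the convolution step
X = np.stack([E[vocab[w]] for w in sentence])         # shape (n, k)
```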

Model: convolution A feature ci is generated from a window of words xi:i+h−1 by ci = f(w · xi:i+h−1 + b). (2) Here b ∈ R is a bias term and f is a nonlinear function such as the hyperbolic tangent. This filter is applied to each possible window of words in the sentence {x1:h, x2:h+1, . . . , xn−h+1:n} to produce a feature map c = [c1, c2, . . . , cn−h+1], (3) with c ∈ R^(n−h+1).
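
A rough NumPy sketch of equations (2) and (3), continuing the toy matrix X from the previous snippet; the filter weights are random placeholders and tanh stands in for f:

```python
h = 3                                   # filter window size (toy value)
n, k = X.shape
w = np.random.randn(h * k)              # filter w in R^(hk), random placeholder
b = 0.0                                 # bias term

# slide the filter over every window of h words (eq. 2) to build the feature map (eq. 3)
c = np.array([
    np.tanh(w @ X[i:i + h].reshape(-1) + b)
    for i in range(n - h + 1)
])                                      # c in R^(n-h+1)
```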

Model: max pooling We then apply a max-over-time pooling operation (Collobert et al., 2011) over the feature map and take the maximum value ĉ = max{c} as the feature corresponding to this particular filter. The idea is to capture the most important feature (the one with the highest value) for each feature map.
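
Continuing the same toy example, max-over-time pooling collapses the whole feature map into a single value per filter:

```python
c_hat = c.max()    # the single highest activation over all windows of this filter
```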

Regularization: dropout For regularization we employ dropout on the penultimate layer with a constraint on the l2-norms of the weight vectors (Hinton et al., 2012). Dropout prevents co-adaptation of hidden units by randomly dropping out (i.e., setting to zero) a proportion p of the hidden units during forward and back-propagation.

Regularization: dropout Given the penultimate layer z = [ĉ1, . . . , ĉm] (note that here we have m filters), instead of using y = w · z + b (4) for output unit y in forward propagation, dropout uses y = w · (z ◦ r) + b, (5) where ◦ is the element-wise multiplication operator and r ∈ R^m is a 'masking' vector of Bernoulli random variables with probability p of being 1.
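
A minimal sketch of the masked forward pass in equation (5), with made-up sizes for the number of filters m and the probability p:

```python
m, p = 300, 0.5                          # number of filters and probability of keeping a unit (toy values)
z = np.random.randn(m)                   # penultimate layer [c_hat_1, ..., c_hat_m] (placeholder values)
w_out, b_out = np.random.randn(m), 0.0   # output-unit weights and bias (placeholders)

r = np.random.binomial(1, p, size=m)     # 'masking' vector of Bernoulli variables, 1 with probability p
y_train = w_out @ (z * r) + b_out        # eq. (5): masked forward pass used during training
y_test = p * (w_out @ z) + b_out         # at test time the learned weights are scaled by p instead
```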

Hyperparameters Final values used in the convolutional network (chosen via a grid search on the SST-2 dev set): rectified linear units, filter windows (h) of 3, 4, 5 with 100 feature maps each, dropout rate (p) of 0.5, l2 constraint (s) of 3, and mini-batch size of 50.
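
To show how these values fit together, here is a hedged PyTorch sketch of a Kim-style sentence CNN; the framework choice, the class name KimCNN, and the random embedding initialization are illustrative assumptions (the paper initializes the embeddings from word2vec and enforces the l2 constraint as a max-norm rescaling of the output weights during training):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KimCNN(nn.Module):
    """Single-convolution-layer sentence classifier in the spirit of Kim (2014)."""
    def __init__(self, vocab_size, k=300, num_classes=2,
                 windows=(3, 4, 5), feature_maps=100, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, k)          # would be loaded from word2vec in practice
        self.convs = nn.ModuleList(
            nn.Conv1d(k, feature_maps, h) for h in windows
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(feature_maps * len(windows), num_classes)

    def forward(self, tokens):                            # tokens: (batch, n) word ids
        x = self.embed(tokens).transpose(1, 2)            # (batch, k, n)
        # ReLU convolution followed by max-over-time pooling, one branch per window size
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        z = torch.cat(pooled, dim=1)                      # penultimate layer of pooled features
        return self.fc(self.dropout(z))
```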

Datasets for testing the model
• MR: Movie reviews with one sentence per review; the task is to detect positive/negative sentiment.
• SST-1: Stanford Sentiment Treebank, with fine-grained sentiment labels.
• SST-2: Same as SST-1 but with neutral reviews removed and binary labels.
• Subj: Subjectivity dataset; the task is to classify a sentence as subjective or objective.
• TREC: TREC question dataset; the task is to classify a question into 6 question types.
• CR: Customer reviews of various products; the task is to predict positive/negative reviews.

Results These results suggest that the pre-trained vectors are good, 'universal' feature extractors and can be utilized across datasets. Fine-tuning the pre-trained vectors for each task gives still further improvements (CNN-non-static).

Conclusion The present work has described a series of experiments with convolutional neural networks built on top of word2vec. Despite little tuning of hyperparameters, a simple CNN with one layer of convolution performs remarkably well. Our results add to the well-established evidence that unsupervised pre-training of word vectors is an important ingredient in deep learning for NLP.