Recurrent Neural Networks and Deep Learning in Natural Language Processing
Natural Language and Text Processing Laboratory, University of Tehran
Recurrent Neural Networks
Heshaam Faili, Associate Professor, University of Tehran
Ali Vardasbi, PhD Candidate, University of Tehran
Behrooz Vedadian, PhD Candidate, Amirkabir University of Technology
Hakimeh Fadae, PhD Candidate, University of Tehran
Golshan Afzali, PhD Candidate, University of Tehran
Agenda
Session 1: RNN
  RNN: concept and applications (Faili)
  Training, LSTM, GRU (Vardasbi)
Session 2: Language models and RNNs
  RNNs for language modeling (Faili)
  Word2vec, GloVe (Vardasbi)
Session 3: Deep learning in NLP applications
  Machine translation (Fadae)
  Machine translation (Vedadian)
  Question answering systems (Afzali)
Feature Engineering: the Traditional ML Method
[Figure: traditional ML pipeline. Training phase: training data (supervised or unsupervised) -> feature extraction / feature engineering -> ML method -> model. Test phase: input -> feature extraction -> model -> output.]
Reasons for Exploring Deep Learning
Manually designed features are often over-specified, incomplete, and take a long time to design and validate.
Learned features are easy to adapt and fast to learn.
Deep learning provides a very flexible, (almost?) universal, learnable framework for representing world, visual, and linguistic information.
Deep learning can learn unsupervised (from raw data) and supervised (with specific labels such as positive/negative).
Reasons for Exploring Deep Learning
In 2006 deep learning techniques started outperforming other machine learning techniques. Why now?
DL techniques benefit more from large amounts of data.
Faster machines and multicore CPUs/GPUs help DL.
New methods for unsupervised pre-training have been developed: Restricted Boltzmann Machines (RBMs), autoencoders, contrastive estimation, etc.
More efficient parameter estimation methods.
Better understanding of model regularization.
Improved performance (first in speech and vision, then NLP).
Deep Learning + NLP = Deep NLP
Combine the ideas and goals of NLP with representation learning and deep learning methods to solve them.
Several big improvements in recent years across different NLP levels (speech, morphology, syntax, semantics) and applications (machine translation, sentiment analysis, question answering).
Representations at NLP Levels: Morphology
Representations at NLP Levels: Syntax
Representations at NLP Levels: Semantics
NLP Applications: Sentiment Analysis
Traditional approach: curated sentiment dictionaries combined with either bag-of-words representations (which ignore word order) or hand-designed negation features (which will not capture everything).
The same deep learning model that was used for morphology, syntax, and logical semantics can be used: the recursive NN.
Question Answering
Machine Translation
Many levels of translation have been tried in the past; traditional MT systems are very large, complex systems.
What do you think is the interlingua for the DL approach to translation?
Machine Translation
The source sentence is mapped to a vector, then the output sentence is generated from it ("Sequence to Sequence Learning with Neural Networks", Sutskever et al. 2014; Luong et al. 2016).
This approach is about to replace very complex hand-engineered architectures.
Recurrent Neural Network (RNN) or Recursive Neural Network (RNN)?
Recurrent Networks
Some problems require previous history/context in order to give the proper output (speech recognition, stock forecasting, target tracking, etc.).
One way to do that is to just provide all the necessary context in one "snapshot" and use standard learning.
But how big should the snapshot be? It varies for different instances of the problem.
Motivation
Not all problems can be converted into one with fixed-length inputs and outputs.
Problems such as speech recognition or time-series prediction require a system to store and use context information.
Simple case: output YES if the number of 1s is even, else NO.
1000010101 – YES, 100011 – NO, …
It is hard (or impossible) to choose a fixed context window: there can always be a new sample longer than anything seen before.
Recurrent Networks
Another option is a recurrent neural network, which lets the network dynamically learn how much context it needs in order to solve the problem (speech example: vowels vs. consonants, etc.).
It acts like a state machine that gives different outputs for the current input depending on the current state.
Recurrent nets must learn and use this state/context information in order to get high accuracy on the task: a temporal deep network.
Recurrent Neural Networks (RNNs)
Recurrent neural networks take the previous output or hidden state as an additional input.
The composite input at time t carries historical information about what happened at times τ < t.
RNNs are useful because their intermediate values (the state) can store information about past inputs for a duration that is not fixed a priori.
Recurrent vs. Recursive NN
Recurrent neural networks are a special case of recursive artificial neural networks:
recursive neural networks operate on any hierarchical structure,
recurrent neural networks operate on the linear progression of time.
Sample RNN
Recurrent Neural Networks
RNNs are very powerful because they combine two properties:
Distributed hidden state, which allows them to store a lot of information about the past efficiently.
Non-linear dynamics, which allows them to update their hidden state in complicated ways.
With enough neurons and time, RNNs can compute anything that can be computed by your computer.
RNN Examples
Some simple examples of RNNs. This one sums its inputs:
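A minimal sketch of this summing RNN, with a single linear unit whose recurrent weight and input weight are both 1, so the hidden state is the running sum. The specific weights in the original figure are not reproduced here, so this is one possible wiring:

```python
def summing_rnn(inputs):
    h = 0.0                      # hidden state = running sum
    outputs = []
    for x in inputs:
        h = 1.0 * h + 1.0 * x    # w_hh = 1, w_xh = 1, linear activation
        outputs.append(h)
    return outputs

print(summing_rnn([2, -1, 3, 0.5]))  # [2, 1, 4, 4.5]
```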
RNN Examples
This one determines whether the running total of the first or the second input stream is larger:
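A sketch of this second example: two accumulator units keep the running totals of the two input streams, and the output unit compares them (a logistic of the difference, so values above 0.5 mean the first stream's total is larger). The weights are illustrative, not the ones from the original figure:

```python
import math

def compare_rnn(pairs):
    s1 = s2 = 0.0
    outputs = []
    for x1, x2 in pairs:
        s1 += x1                                        # running total of input 1
        s2 += x2                                        # running total of input 2
        outputs.append(1 / (1 + math.exp(-(s1 - s2))))  # sigmoid(s1 - s2)
    return outputs

print(compare_rnn([(1, 0.5), (0.2, 2.0), (3.0, 0.1)]))
```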
Example: Parity
Assume we have a sequence of binary inputs. We'll consider how to determine the parity, i.e. whether the number of 1s is even or odd.
We can compute parity incrementally by keeping track of the parity of the input so far:
Input:       0 1 0 1 1 0 1 0 1 1
Parity bits: 0 1 1 0 1 1 …
Each parity bit is the XOR of the current input and the previous parity bit.
Parity is a classic example of a problem that is hard to solve with a shallow feed-forward net, but easy to solve with an RNN.
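The incremental computation described above, as a short sketch:

```python
# Incremental parity: each parity bit is the XOR of the current input
# and the previous parity bit.
def parity_bits(bits):
    parity, out = 0, []
    for b in bits:
        parity ^= b          # XOR with the running parity
        out.append(parity)
    return out

print(parity_bits([0, 1, 0, 1, 1, 0]))  # [0, 1, 1, 0, 1, 1]
```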
Example: Parity
Assume we have a sequence of binary inputs. We'll consider how to determine the parity, i.e. whether the number of 1s is even or odd.
Let's find weights and biases for the RNN on the right so that it computes the parity. All hidden and output units are binary threshold units.
Strategy: the output unit tracks the current parity, which is the XOR of the current input and the previous output. The hidden units help us compute the XOR.
Parity Check
Parity
Example: Parity
The output unit should compute the XOR of the current input and the previous output:
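A hand-wired sketch of this construction with binary threshold units: one hidden unit fires for OR, another for AND, and the output unit combines them into XOR of the current input and the previous output. The specific weights and biases below are one possible solution (an assumption, since the figure with the chosen values is not reproduced here):

```python
def step(z):
    return 1 if z > 0 else 0  # binary threshold unit

def parity_rnn(inputs):
    y = 0                                    # previous output (initial parity)
    outputs = []
    for x in inputs:
        h1 = step(1.0 * x + 1.0 * y - 0.5)   # fires if x OR y
        h2 = step(1.0 * x + 1.0 * y - 1.5)   # fires if x AND y
        y = step(1.0 * h1 - 1.0 * h2 - 0.5)  # OR and not AND = XOR(x, y)
        outputs.append(y)
    return outputs

print(parity_rnn([0, 1, 0, 1, 1, 0]))  # [0, 1, 1, 0, 1, 1]
```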
Recurrent Neural Networks (RNNs)
Note that the weights are shared over time.
Essentially, copies of the RNN cell are made over time (unrolling/unfolding), with different inputs at different time steps: the vanilla RNN forward pass.
Sentiment Classification
Classify a restaurant review from Yelp, a movie review from IMDB, etc. as positive or negative.
Inputs: multiple words, one or more sentences.
Outputs: positive/negative classification.
"The food was really good"
"The chicken crossed the road because it was uncooked"
Sentiment Classification
[Figure: an RNN unrolled over the words of the review ("The", "food", …, "good"), producing hidden states h1, h2, …, hn. One option feeds only the final hidden state hn to a linear classifier, ignoring h1 … hn-1; another sums the hidden states into h = Sum(…) and feeds h to the linear classifier.]
http://deeplearning.net/tutorial/lstm.html
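A minimal sketch of the "pool the hidden states, then apply a linear classifier" architecture described above. The weights are random (untrained) and the vocabulary and dimensions are made up for illustration; in practice everything would be learned from labeled reviews:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "food": 1, "was": 2, "really": 3, "good": 4}
d_emb, d_hid = 8, 16

E    = rng.normal(size=(len(vocab), d_emb))        # word embeddings
W_xh = rng.normal(size=(d_hid, d_emb)) * 0.1       # input -> hidden
W_hh = rng.normal(size=(d_hid, d_hid)) * 0.1       # hidden -> hidden
b_h  = np.zeros(d_hid)
w_out, b_out = rng.normal(size=d_hid) * 0.1, 0.0   # linear classifier

def classify(tokens):
    h = np.zeros(d_hid)
    states = []
    for tok in tokens:
        x = E[vocab[tok]]
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)     # vanilla RNN step
        states.append(h)
    pooled = np.mean(states, axis=0)               # h = Sum(...) / n
    score = w_out @ pooled + b_out
    return 1 / (1 + np.exp(-score))                # P(positive)

print(classify("the food was really good".split()))
```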
Image Captioning
Given an image, produce a sentence describing its contents.
Inputs: image features (from a CNN).
Outputs: multiple words (let's consider one sentence).
RNN Outputs: Image Captions
Language Models
A language model is a probability distribution over sequences of words. Given such a sequence, say of length m, it assigns a probability p(w1, …, wm) to the whole sentence.
Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition, information retrieval, and other applications.
Traditional method: n-gram models.
Neural language models.
Recurrent Neural Network Language Model
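A minimal RNN language-model sketch (untrained, random weights): the hidden state summarizes the prefix, a softmax over the vocabulary gives p(w_t | w_1 … w_{t-1}), and the sentence probability is the product of these factors. The vocabulary and dimensions are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = {"<s>": 0, "the": 1, "food": 2, "was": 3, "good": 4, "</s>": 5}
V, d = len(vocab), 16

E    = rng.normal(size=(V, d)) * 0.1
W_hh = rng.normal(size=(d, d)) * 0.1
W_xh = rng.normal(size=(d, d)) * 0.1
W_hy = rng.normal(size=(V, d)) * 0.1

def sentence_log_prob(words):
    tokens = ["<s>"] + words + ["</s>"]
    h, log_p = np.zeros(d), 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        h = np.tanh(W_xh @ E[vocab[prev]] + W_hh @ h)  # update state with prev word
        logits = W_hy @ h
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                           # softmax over the vocabulary
        log_p += np.log(probs[vocab[cur]])             # log p(cur | prefix)
    return log_p

print(sentence_log_prob("the food was good".split()))
```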
Neural Machine Translation
We'd like to translate, e.g., English sentences to French sentences, and we have pairs of translated sentences to train on.
What's wrong with the following setup?
The sentences might not be the same length, and the words might not align perfectly. You might also need to resolve ambiguities using information from later in the sentence.
Neural Machine Translation
Instead, the network first reads and memorizes the source sentence. When it sees the END token, it starts outputting the translation.
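A minimal encoder-decoder sketch of this idea: the encoder RNN reads the source sentence into a vector, and after the END token the decoder RNN emits target words one at a time. All weights and vocabularies are random/made up for illustration, so the "translation" is meaningless; a real system would train them on sentence pairs:

```python
import numpy as np

rng = np.random.default_rng(2)
src_vocab = {"the": 0, "food": 1, "was": 2, "good": 3, "<END>": 4}
tgt_vocab = ["<END>", "la", "nourriture", "etait", "bonne"]
d = 16

E_src = rng.normal(size=(len(src_vocab), d)) * 0.1
E_tgt = rng.normal(size=(len(tgt_vocab), d)) * 0.1
W_enc = rng.normal(size=(d, 2 * d)) * 0.1            # encoder step: [x; h] -> h
W_dec = rng.normal(size=(d, 2 * d)) * 0.1            # decoder step: [y_prev; h] -> h
W_out = rng.normal(size=(len(tgt_vocab), d)) * 0.1

def rnn_step(W, x, h):
    return np.tanh(W @ np.concatenate([x, h]))

def translate(src_words, max_len=10):
    # Encode: read and "memorize" the source sentence (plus END) into h.
    h = np.zeros(d)
    for w in src_words + ["<END>"]:
        h = rnn_step(W_enc, E_src[src_vocab[w]], h)
    # Decode: emit target words greedily until <END> (or max_len).
    out, prev = [], np.zeros(d)
    for _ in range(max_len):
        h = rnn_step(W_dec, prev, h)
        idx = int(np.argmax(W_out @ h))
        if tgt_vocab[idx] == "<END>":
            break
        out.append(tgt_vocab[idx])
        prev = E_tgt[idx]
    return out

print(translate("the food was good".split()))
```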
The Vanilla RNN Cell
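A common formulation of the vanilla RNN cell (the notation below is an assumption, since the slide's own diagram is not reproduced here):

h_t = \tanh\big(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\big), \qquad y_t = W_{hy}\, h_t + b_y

Here x_t is the input at time t, h_{t-1} the previous hidden state, and y_t the output; the same weight matrices W_{xh}, W_{hh}, W_{hy} are reused at every time step.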
The Vanilla RNN Forward
[Figure: the cell applied at successive time steps, with inputs x1, x2, x3, hidden states h0, h1, h2, h3, and outputs/costs y1/C1, y2/C2, y3/C3.]
"Unfold" the network through time by making copies of the cell at each time step.
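A minimal sketch of this forward pass, unfolding the same cell (shared weights) over every time step. Dimensions and random weights are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
d_in, d_hid, d_out, T = 4, 8, 3, 5

W_xh = rng.normal(size=(d_hid, d_in)) * 0.1
W_hh = rng.normal(size=(d_hid, d_hid)) * 0.1
W_hy = rng.normal(size=(d_out, d_hid)) * 0.1
b_h, b_y = np.zeros(d_hid), np.zeros(d_out)

def rnn_forward(xs, h0):
    h, hs, ys = h0, [], []
    for x in xs:                                  # the same weights at every step
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)    # new hidden state
        hs.append(h)
        ys.append(W_hy @ h + b_y)                 # output at this step
    return hs, ys

xs = [rng.normal(size=d_in) for _ in range(T)]
hs, ys = rnn_forward(xs, np.zeros(d_hid))
print(len(hs), ys[-1])
```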
Backpropagation Refresher
[Figure: a single layer y = f(x; W), with input x, parameters W, and output y.]
Multiple Layers
[Figure: stacked layers y1 = f1(x; W1), y2 = f2(y1; W2), feeding a cost C.]
Chain Rule for Gradient Computation
[Figure: the same stacked layers y1 = f1(x; W1), y2 = f2(y1; W2); gradients are computed by application of the chain rule.]
Chain Rule for Gradient Computation
For a single layer y = f(x; W):
Given: ∂C/∂y, the gradient of the cost with respect to the layer's output.
We are interested in computing: ∂C/∂x (to pass to the layer below) and ∂C/∂W (to update the weights).
Intrinsic to the layer are: ∂y/∂x and ∂y/∂W.
By the chain rule: ∂C/∂x = (∂C/∂y)(∂y/∂x) and ∂C/∂W = (∂C/∂y)(∂y/∂W).
Backpropagation Through Time (BPTT)
One of the methods used to train RNNs.
The unfolded network (used during the forward pass) is treated as one big feed-forward network.
This unfolded network accepts the whole time series as input.
The weight updates are computed for each copy in the unfolded network, then summed (or averaged) and applied to the RNN weights.
The Unfolded Vanilla RNN
Treat the unfolded network as one big feed-forward network.
This big network takes in the entire sequence as an input.
Compute gradients through the usual backpropagation.
Update the shared weights.
The Vanilla RNN Backward Recurrent Neural Networks