Presentation on theme: "Natural Language and Text Processing Laboratory"— Presentation transcript:

1 Natural Language and Text Processing Laboratory
Recurrent Neural Networks and Deep Learning in Natural Language Processing. Natural Language and Text Processing Laboratory, University of Tehran.

2 Recurrent Neural Networks
Heshaam Faili, Associate Professor, University of Tehran
Ali Vardasbi, PhD Candidate, University of Tehran
Behrooz Vedadian, PhD Candidate, Amirkabir University of Technology
Hakimeh Fadae, PhD Candidate, University of Tehran
Golshan Afzali, PhD Candidate, University of Tehran

3 Recurrent Neural Networks
Agenda
Session 1: RNN. RNN concepts and applications (Faili); training, LSTM, GRU (Vardasbi).
Session 2: Language models and RNNs. RNN for language modeling (Faili); word2vec, GloVe (Vardasbi).
Session 3: Deep learning in NLP applications. Machine translation (Fadae); machine translation (Vedadian); question answering systems (Afzali).

4 Feature Engineering Traditional ML Method
Diagram: the traditional ML pipeline. Training phase: training data (supervised or unsupervised) → feature extraction / feature engineering → ML method → model. Test phase: input → feature extraction → model → output.
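To make the pipeline concrete, here is a minimal sketch in Python with scikit-learn (not part of the original slides): the hand-designed features are simple bag-of-words counts and the ML method is logistic regression; the two-example dataset is a made-up placeholder.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

train_texts = ["the food was really good", "the service was terrible"]  # made-up placeholder data
train_labels = [1, 0]                     # 1 = positive, 0 = negative

pipeline = Pipeline([
    ("features", CountVectorizer()),      # feature extraction: bag-of-words counts
    ("model", LogisticRegression()),      # ML method
])
pipeline.fit(train_texts, train_labels)   # training phase
print(pipeline.predict(["the food was terrible"]))  # test phase: input -> features -> model -> output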

5 Reasons for Exploring Deep Learning
Manually designed features are often over-specified and incomplete, and they take a long time to design and validate.
Learned features are easy to adapt and fast to learn.
Deep learning provides a very flexible, (almost?) universal, learnable framework for representing world, visual, and linguistic information.
Deep learning can learn unsupervised (from raw data) and supervised (with specific labels such as positive/negative).

6 Reasons for Exploring Deep Learning
In 2006, deep learning techniques started outperforming other machine learning techniques. Why now?
DL techniques benefit more from large amounts of data.
Faster machines and multicore CPUs/GPUs help DL.
New methods for unsupervised pre-training have been developed: restricted Boltzmann machines (RBMs), autoencoders, contrastive estimation, etc.
More efficient parameter estimation methods.
Better understanding of model regularization.
Improved performance (first in speech and vision, then in NLP).

7 Deep Learning + NLP = Deep NLP
Combine the ideas and goals of NLP with representation learning and deep learning methods to solve them. There have been several big improvements in recent years across different NLP levels (speech, morphology, syntax, semantics) and applications (machine translation, sentiment analysis, question answering).

8 Representations at NLP Levels: Morphology
Deep Learning Recurrent Neural Networks

9 Representations at NLP Levels: Syntax
Recurrent Neural Networks

10 Representations at NLP Levels: Semantics
Recurrent Neural Networks

11 NLP Applications: Sentiment Analysis
Traditional approach: curated sentiment dictionaries combined with either bag-of-words representations (ignoring word order) or hand-designed negation features (which will not capture everything).
The same deep learning model that was used for morphology, syntax, and logical semantics can be used: the recursive NN.
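For concreteness, here is a minimal, illustrative sketch of that traditional approach (not from the slides): a tiny sentiment lexicon plus one hand-designed negation rule. The lexicon is a made-up placeholder; real systems use curated dictionaries.

# Illustrative sketch only: made-up lexicon plus one hand-designed negation rule.
LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0, "uncooked": -0.5}
NEGATORS = {"not", "never", "no"}

def lexicon_sentiment(sentence):
    words = sentence.lower().split()
    score = 0.0
    for i, w in enumerate(words):
        polarity = LEXICON.get(w, 0.0)
        if i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity          # hand-designed negation feature
        score += polarity
    return "positive" if score >= 0 else "negative"

print(lexicon_sentiment("the food was not good"))   # -> "negative"; word order is otherwise ignored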

12 Recurrent Neural Networks
Question Answering Recurrent Neural Networks

13 Recurrent Neural Networks
Machine Translation
Many levels of translation have been tried in the past. Traditional MT systems are very large, complex systems. What do you think is the interlingua for the DL approach to translation?

14 Recurrent Neural Networks
Machine Translation
The source sentence is mapped to a vector, and the output sentence is then generated from that vector (Sequence to Sequence Learning with Neural Networks, Sutskever et al. 2014; Luong et al. 2016). These models are about to replace very complex hand-engineered architectures.

15 Recurrent Neural Network (RNN) or Recursive Neural Network (RNN)?

16 Recurrent Neural Networks
Recurrent Networks
Some problems require previous history/context in order to give the proper output (speech recognition, stock forecasting, target tracking, etc.). One way to handle that is simply to provide all the necessary context in one "snapshot" and use standard learning. But how big should the snapshot be? It varies for different instances of the problem.

17 Recurrent Neural Networks
Motivation
Not all problems can be converted into one with fixed-length inputs and outputs. Problems such as speech recognition or time-series prediction require a system to store and use context information. Simple case: output YES if the number of 1s in the input is even, else NO. It is hard or impossible to choose a fixed context window, because there can always be a new sample longer than anything seen before.

18 Recurrent Neural Networks
Recurrent Networks
Another option is to use a recurrent neural network, which lets the network dynamically learn how much context it needs in order to solve the problem (speech example: vowels vs. consonants, etc.). Such a network acts like a state machine that gives different outputs for the current input depending on the current state. Recurrent nets must learn and use this state/context information in order to get high accuracy on the task; in effect, they form a temporal deep network.

19 Recurrent Neural Networks (RNNs)
Recurrent neural networks take the previous output or hidden state as part of their input. The composite input at time t therefore carries historical information about what happened at earlier times t' < t. RNNs are useful because their intermediate values (the state) can store information about past inputs for a duration that is not fixed a priori.
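A minimal NumPy sketch of that recurrence (not from the slides; the sizes, the tanh nonlinearity, and the weight names W_xh, W_hh, W_hy are illustrative assumptions):

import numpy as np

# Illustrative sizes and weight names (assumptions, not from the slides).
input_size, hidden_size, output_size = 4, 8, 3
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))    # current input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))   # previous hidden state -> hidden
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))   # hidden -> output

def rnn_step(x_t, h_prev):
    """One time step: the new state mixes the current input with the previous state."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev)
    y_t = W_hy @ h_t
    return h_t, y_t

h, _ = rnn_step(rng.normal(size=input_size), np.zeros(hidden_size))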

20 Recurrent Neural Networks
Recursive NN
Recurrent neural networks are a form of recursive artificial neural network: recursive neural networks operate on any hierarchical structure, while recurrent neural networks operate on the linear progression of time.

21 Recurrent Neural Networks

22 Recurrent Neural Networks
Sample RNN Recurrent Neural Networks

23 Recurrent neural networks
RNNs are very powerful because they combine two properties:
A distributed hidden state that allows them to store a lot of information about the past efficiently.
Non-linear dynamics that allow them to update their hidden state in complicated ways.
With enough neurons and time, RNNs can compute anything that can be computed by your computer.

24 Recurrent neural networks

25 Recurrent Neural Networks
RNN examples
Some simple examples of RNNs. This one sums its inputs:
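The figure itself is not in the transcript; as a hedged reconstruction, a linear RNN with input weight 1, recurrent weight 1, and an identity activation keeps a running sum of its inputs:

def summing_rnn(inputs):
    """Linear RNN: h_t = 1 * x_t + 1 * h_{t-1}, so the state is the running sum of the inputs."""
    h = 0.0
    outputs = []
    for x in inputs:
        h = 1.0 * x + 1.0 * h     # input weight 1, recurrent weight 1, identity activation
        outputs.append(h)
    return outputs

print(summing_rnn([2, -1, 3, 1]))  # [2.0, 1.0, 4.0, 5.0]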

26 Recurrent Neural Networks
RNN examples
This one determines whether the running total of the first input or of the second input is larger:
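Again, the figure is missing from the transcript; one hedged reconstruction keeps the difference of the two running totals in the state and thresholds it at the output:

def larger_total_rnn(pairs):
    """State = running sum of (x1 - x2); output 1 if the first input's total is larger so far."""
    h = 0.0
    outputs = []
    for x1, x2 in pairs:
        h = h + x1 - x2           # linear recurrence accumulating the difference of the totals
        outputs.append(1 if h > 0 else 0)
    return outputs

print(larger_total_rnn([(2, 0), (0, 3), (4, 1)]))  # [1, 0, 1]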

27–35 Recurrent Neural Networks
(No slide text was captured for slides 27 through 35.)

36 Recurrent Neural Networks
Example: Parity
Assume we have a sequence of binary inputs. We'll consider how to determine the parity, i.e. whether the number of 1's is even or odd. We can compute parity incrementally by keeping track of the parity of the input so far: each parity bit is the XOR of the current input bit and the previous parity bit. Parity is a classic example of a problem that's hard to solve with a shallow feed-forward net, but easy to solve with an RNN.

37 Recurrent Neural Networks
Example: Parity Assume we have a sequence of binary inputs. We’ll consider how to determine the parity, i.e. whether the number of 1’s is even or odd. Let’s find weights and biases for the RNN on the right so that it computes the parity. All hidden and output units are binary threshold units. Strategy: The output unit tracks the current parity, which is the XOR of the current input and previous output. The hidden units help us compute the XOR. Recurrent Neural Networks
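A minimal sketch of one such solution follows. The exact weights from the slide's figure are not in the transcript, so the thresholds below are an illustrative choice: two binary threshold hidden units compute the OR and the AND of the current input and the previous output, and the output unit combines them into their XOR.

def step_unit(z, threshold):
    """Binary threshold unit: fires (1) if the weighted input reaches the threshold, else 0."""
    return 1 if z >= threshold else 0

def parity_rnn(bits):
    """Hand-wired RNN: output_t = XOR(input_t, output_{t-1}) = parity of the bits seen so far."""
    y_prev = 0                          # the empty prefix has an even number of 1s
    outputs = []
    for x in bits:
        h1 = step_unit(x + y_prev, 1)   # OR of current input and previous output
        h2 = step_unit(x + y_prev, 2)   # AND of current input and previous output
        y = step_unit(h1 - 2 * h2, 1)   # OR and not AND, i.e. XOR
        outputs.append(y)
        y_prev = y
    return outputs

print(parity_rnn([0, 1, 0, 1, 1]))      # [0, 1, 1, 0, 1]: 1 whenever the count of 1s so far is odd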

38 Recurrent Neural Networks
Parity Check Recurrent Neural Networks

39 Recurrent Neural Networks

40 Recurrent Neural Networks
Parity Recurrent Neural Networks

41 Recurrent Neural Networks
Example: Parity The output unit should compute the XOR of the current input and previous output: Recurrent Neural Networks

42–46 Recurrent Neural Networks
(No slide text was captured for slides 42 through 46.)

47 Recurrent Neural Networks (RNNs)
The Vanilla RNN Forward
Note that the weights are shared over time. Essentially, copies of the RNN cell are made over time (unrolling/unfolding), with different inputs at different time steps.
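A hedged NumPy sketch of the unrolled forward pass (the sizes and weight names are the same illustrative assumptions used earlier, restated so the snippet runs on its own); the same W_xh, W_hh, W_hy are applied at every time step:

import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 8, 3
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))

def rnn_forward(xs):
    """Unrolled forward pass: the same weights are reused (shared) at every time step."""
    h = np.zeros(hidden_size)             # initial state h_0
    hs, ys = [], []
    for x_t in xs:                        # one "copy" of the cell per time step
        h = np.tanh(W_xh @ x_t + W_hh @ h)
        hs.append(h)
        ys.append(W_hy @ h)
    return hs, ys

hs, ys = rnn_forward([rng.normal(size=input_size) for _ in range(5)])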

48 Sentiment Classification
Classify a restaurant review from Yelp, a movie review from IMDB, or … as positive or negative.
Inputs: multiple words, one or more sentences.
Outputs: a positive/negative classification.
Examples: "The food was really good", "The chicken crossed the road because it was uncooked".

49 Sentiment Classification
RNN h1 The

50 Sentiment Classification
RNN RNN h1 h2 The food

51 Sentiment Classification
hn RNN RNN RNN h1 h2 hn-1 The food good

52 Sentiment Classification
Linear Classifier hn RNN RNN RNN h1 h2 hn-1 The food good

53 Sentiment Classification
Ignore Ignore Linear Classifier h1 h2 hn RNN RNN RNN h1 h2 hn-1 The food good

54 Sentiment Classification
h = Sum(…) h1 hn h2 RNN RNN RNN h1 h2 hn-1 The food good

55 Sentiment Classification
Linear Classifier h = Sum(…) h1 hn h2 RNN RNN RNN h1 h2 hn-1 The food good
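Slides 49 through 55 build this architecture up pictorially: each word is fed to the RNN in turn, producing hidden states h1 … hn, and a linear classifier is applied either to the last state hn alone or to the sum of all the states, h = Sum(…). A minimal NumPy sketch of the second variant (the embedding table, sizes, and random weights are illustrative assumptions, not the slides' parameters):

import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "food": 1, "was": 2, "really": 3, "good": 4}   # toy vocabulary (placeholder)
embed_size, hidden_size = 8, 16
E = rng.normal(scale=0.1, size=(len(vocab), embed_size))          # word embeddings
W_xh = rng.normal(scale=0.1, size=(hidden_size, embed_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
w_clf = rng.normal(scale=0.1, size=hidden_size)                   # linear classifier weights

def classify_sentiment(sentence):
    """Run the RNN over the words, sum the hidden states, then apply a linear classifier."""
    h = np.zeros(hidden_size)
    pooled = np.zeros(hidden_size)
    for word in sentence.lower().split():
        h = np.tanh(W_xh @ E[vocab[word]] + W_hh @ h)   # h1, h2, ..., hn
        pooled += h                                      # h = Sum(h1, ..., hn)
    return "positive" if w_clf @ pooled > 0 else "negative"

print(classify_sentiment("The food was really good"))    # untrained weights, so the label is arbitrary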

56 Recurrent Neural Networks
Image Captioning
Given an image, produce a sentence describing its contents.
Inputs: an image feature vector (from a CNN).
Outputs: multiple words (let's consider one sentence).

57 Recurrent Neural Networks
Image Captioning Recurrent Neural Networks

58 RNN Outputs: Image Captions
Recurrent Neural Networks

59 Recurrent Neural Networks
Language model
A language model is a probability distribution over sequences of words. Given such a sequence, say of length N, it assigns a probability p(w1, …, wN) to the whole sentence. Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition, information retrieval, and other applications.
Traditional method: n-grams. Newer method: neural language models.
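To make the traditional n-gram method concrete: the sentence probability is usually factored as p(w1, …, wN) = Π p(wi | w1, …, wi−1), and an n-gram model truncates each history to the last n−1 words. A minimal, illustrative bigram sketch (the toy corpus is a made-up placeholder, and a real model would need smoothing):

from collections import Counter

# Toy corpus (made-up placeholder); <s> and </s> mark sentence boundaries.
corpus = [["<s>", "the", "food", "was", "good", "</s>"],
          ["<s>", "the", "food", "was", "bad", "</s>"]]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    for w1, w2 in zip(sent, sent[1:]):
        unigrams[w1] += 1
        bigrams[(w1, w2)] += 1

def bigram_prob(sentence):
    """Approximate p(w1..wN) by a product of bigram probabilities p(wi | wi-1) (no smoothing)."""
    p = 1.0
    for w1, w2 in zip(sentence, sentence[1:]):
        p *= bigrams[(w1, w2)] / unigrams[w1]
    return p

print(bigram_prob(["<s>", "the", "food", "was", "good", "</s>"]))   # 0.5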

60 Recurrent Neural Network language model
Recurrent Neural Networks

61 Recurrent Neural Network language model
Recurrent Neural Networks

62 Neural Machine Translation
We’d like to translate, e.g., English to French sentences, and we have pairs of translated sentences to train on. What’s wrong with the following setup? The sentences might not be the same length, and the words might not align perfectly. You might need to resolve ambiguities using information from later in the sentence. Recurrent Neural Networks

63 Neural Machine Translation
Instead, the network first reads and memorizes the sentence. When it sees the END token, it starts outputting the translation. Recurrent Neural Networks
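A hedged NumPy sketch of that encoder-decoder idea (the names, sizes, greedy decoding loop, and use of END as the start symbol are illustrative simplifications; real systems such as Sutskever et al. 2014 use LSTMs, large vocabularies, and beam search):

import numpy as np

rng = np.random.default_rng(0)
src_vocab, tgt_vocab = 50, 60            # toy vocabulary sizes (placeholders)
embed_size, hidden_size = 16, 32
END = 0                                   # index of the END token in the target vocabulary

# Random placeholder parameters; a real model learns these from pairs of translated sentences.
E_src = rng.normal(scale=0.1, size=(src_vocab, embed_size))
E_tgt = rng.normal(scale=0.1, size=(tgt_vocab, embed_size))
W_ex = rng.normal(scale=0.1, size=(hidden_size, embed_size))
W_eh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_dx = rng.normal(scale=0.1, size=(hidden_size, embed_size))
W_dh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_out = rng.normal(scale=0.1, size=(tgt_vocab, hidden_size))

def translate(src_ids, max_len=20):
    # Encoder: read and "memorize" the source sentence in a single vector h.
    h = np.zeros(hidden_size)
    for i in src_ids:
        h = np.tanh(W_ex @ E_src[i] + W_eh @ h)
    # Decoder: starting from that vector, emit target words until END is produced.
    out, prev = [], END                   # END doubles as the start symbol (a simplification)
    for _ in range(max_len):
        h = np.tanh(W_dx @ E_tgt[prev] + W_dh @ h)
        prev = int(np.argmax(W_out @ h))  # greedy choice of the next word
        if prev == END:
            break
        out.append(prev)
    return out

print(translate([3, 7, 12]))              # untrained weights, so the output ids are arbitrary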

64 Recurrent Neural Networks
The Vanilla RNN Cell Recurrent Neural Networks

65 The Vanilla RNN Forward
"Unfold" the network through time by making copies at each time step. (Figure: the unfolded network, with inputs x1, x2, x3, hidden states h0 … h3, outputs y1, y2, y3, and per-step costs C1, C2, C3.)

66 BackPropagation Refresher
(Figure: a single layer, y = f(x; W).)

67 Multiple Layers
(Figure: stacked layers y1 = f1(x; W1) and y2 = f2(y1; W2), with the cost C computed from y2.)

68 Chain Rule for Gradient Computation
Application of the chain rule. (Figure: the same stacked layers, y1 = f1(x; W1) and y2 = f2(y1; W2).)

69 Chain Rule for Gradient Computation
Given: the gradient of the cost with respect to the layer's output, ∂C/∂y. We are interested in computing: ∂C/∂x and ∂C/∂W. Intrinsic to the layer are: ∂y/∂x and ∂y/∂W. (Figure: a single layer, y = f(x; W).)

70 Chain Rule for Gradient Computation
Given: ∂C/∂y. We are interested in computing: ∂C/∂x and ∂C/∂W. Intrinsic to the layer are: ∂y/∂x and ∂y/∂W. (Figure: the layer y = f(x; W).)
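The formulas themselves appear only in the slide figures; written out (a standard statement of the chain rule, consistent with the layer definitions above), they are:

\frac{\partial C}{\partial x} = \frac{\partial C}{\partial y}\,\frac{\partial y}{\partial x},
\qquad
\frac{\partial C}{\partial W} = \frac{\partial C}{\partial y}\,\frac{\partial y}{\partial W}

For the two-layer stack of slide 67, applying the rule twice gives

\frac{\partial C}{\partial W_1} = \frac{\partial C}{\partial y_2}\,\frac{\partial y_2}{\partial y_1}\,\frac{\partial y_1}{\partial W_1}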

71 BackPropagation Through Time (BPTT)
BPTT is one of the methods used to train RNNs. The unfolded network (used during the forward pass) is treated as one big feed-forward network that accepts the whole time series as input. The weight updates are computed for each copy in the unfolded network, then summed (or averaged), and then applied to the RNN weights.
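A minimal NumPy sketch of this procedure (illustrative sizes, and a toy squared-error cost on the final state only; the slides' own equations are not in the transcript):

import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 5
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))

def bptt(xs, target):
    """Forward through the unrolled net, then backward; the per-copy gradients for the shared
    weights are summed before being applied."""
    # Forward pass: unfold over time, storing every state for the backward pass.
    hs = [np.zeros(hidden_size)]
    for x_t in xs:
        hs.append(np.tanh(W_xh @ x_t + W_hh @ hs[-1]))
    cost = 0.5 * np.sum((hs[-1] - target) ** 2)     # toy cost on the final state only

    # Backward pass: walk back through the copies, summing the shared-weight gradients.
    dW_xh, dW_hh = np.zeros_like(W_xh), np.zeros_like(W_hh)
    dh = hs[-1] - target                            # dC/dh at the last time step
    for t in reversed(range(len(xs))):
        dz = dh * (1.0 - hs[t + 1] ** 2)            # back through tanh
        dW_xh += np.outer(dz, xs[t])                # this copy's contribution, added to the total
        dW_hh += np.outer(dz, hs[t])
        dh = W_hh.T @ dz                            # pass the gradient to the previous time step
    return cost, dW_xh, dW_hh

xs = [rng.normal(size=input_size) for _ in range(4)]
cost, dW_xh_grad, dW_hh_grad = bptt(xs, target=np.zeros(hidden_size))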

72 The Unfolded Vanilla RNN
Treat the unfolded network as one big feed-forward network. This big network takes the entire sequence as input. Compute gradients through the usual backpropagation, then update the shared weights.

73 The Vanilla RNN Backward
Recurrent Neural Networks

