1
Natural Language and Text Processing Laboratory
Recurrent Neural Network and Deep Learning in Natural Language Processing
Natural Language and Text Processing Laboratory, University of Tehran
2
Recurrent Neural Networks
Heshaam Faili, Associate Professor, University of Tehran; Ali Vardasbi, PhD Candidate, University of Tehran; Behrooz Vedadian, PhD Candidate, Amirkabir University of Technology; Hakimeh Fadae, PhD Candidate, University of Tehran; Golshan Afzali, PhD Candidate, University of Tehran
3
Recurrent Neural Networks
Agenda
Session 1: RNNs. RNN concept and applications (by Faili); training, LSTM, GRU (by Vardasbi)
Session 2: Language models and RNNs. RNN for language modeling (by Faili); word2vec, GloVe (by Vardasbi)
Session 3: Deep learning in NLP applications. Machine translation (by Fadae); machine translation (by Vedadian); question answering systems (by Afzali)
4
Feature Engineering: the Traditional ML Method
[Diagram: traditional ML pipeline. Training phase: training data (supervised or unsupervised) → feature extraction and feature engineering → ML method → model. Test phase: input → feature extraction → model → output.]
5
Reasons for Exploring Deep Learning
Manually designed features are often over-specified, incomplete, and take a long time to design and validate. Learned features are easy to adapt and fast to learn. Deep learning provides a very flexible, (almost?) universal, learnable framework for representing world, visual, and linguistic information. Deep learning can learn both unsupervised (from raw data) and supervised (with specific labels like positive/negative).
6
Reasons for Exploring Deep Learning
In 2006, deep learning techniques started outperforming other machine learning techniques. Why now? DL techniques benefit more from large amounts of data. Faster machines and multicore CPUs/GPUs help DL. New methods for unsupervised pre-training have been developed (Restricted Boltzmann Machines (RBMs), autoencoders, contrastive estimation, etc.). More efficient parameter estimation methods and a better understanding of model regularization have also emerged. The result is improved performance (first in speech and vision, then NLP).
7
Deep Learning + NLP = Deep NLP
Combine the ideas and goals of NLP with representation learning and deep learning methods to solve them. There have been several big improvements in recent years across different NLP levels (speech, morphology, syntax, semantics) and applications (machine translation, sentiment analysis, question answering).
8
Representations at NLP Levels: Morphology
[Figure: the deep learning representation at the morphology level.]
9
Representations at NLP Levels: Syntax
10
Representations at NLP Levels: Semantics
11
NLP Applications: Sentiment Analysis
Traditional approach: curated sentiment dictionaries combined with either bag-of-words representations (ignoring word order) or hand-designed negation features (not going to capture everything). The same deep learning model that was used for morphology, syntax, and logical semantics can be used: a recursive NN.
12
Recurrent Neural Networks
Question Answering
13
Recurrent Neural Networks
Machine Translation. Many levels of translation have been tried in the past. Traditional MT systems are very large, complex systems. What do you think is the interlingua for the DL approach to translation?
14
Recurrent Neural Networks
Machine Translation. The source sentence is mapped to a vector, then the output sentence is generated ("Sequence to Sequence Learning with Neural Networks", Sutskever et al. 2014; Luong et al. 2016). This approach is about to replace very complex hand-engineered architectures.
15
Recurrent Neural Network (RNN) or Recursive Neural Network (RNN)?
16
Recurrent Neural Networks
Recurrent Networks. Some problems require previous history/context in order to give the proper output (speech recognition, stock forecasting, target tracking, etc.). One way to do that is to just provide all the necessary context in one "snapshot" and use standard learning. But how big should the snapshot be? It varies for different instances of the problem.
17
Recurrent Neural Networks
Motivation. Not all problems can be converted into one with fixed-length inputs and outputs. Problems such as speech recognition or time-series prediction require a system to store and use context information. Simple case: output YES if the number of 1s is even, else NO. It is hard or impossible to choose a fixed context window, because there can always be a new sample longer than anything seen so far.
18
Recurrent Neural Networks
Recurrent Networks. Another option is to use a recurrent neural network, which lets the network dynamically learn how much context it needs in order to solve the problem (speech example: vowels vs. consonants, etc.). It acts like a state machine that gives different outputs for the current input depending on the current state. Recurrent nets must learn and use this state/context information in order to get high accuracy on the task. A temporal deep network.
19
Recurrent Neural Networks (RNNs)
Recurrent Neural Networks take the previous output or hidden state as input. The composite input at time t therefore carries some historical information about what happened at times T < t. RNNs are useful because their intermediate values (state) can store information about past inputs for a duration that is not fixed a priori.
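For reference, the recurrence can be written in the common vanilla form (a generic formulation, not copied from the slides; W, b, f, g are placeholder symbols):

h_t = f(W_{xh} x_t + W_{hh} h_{t-1} + b_h), \qquad y_t = g(W_{hy} h_t + b_y)

where f is typically tanh, g depends on the task (e.g., a softmax for classification), and the same weights are reused at every time step.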
20
Recurrent Neural Networks
Recursive NN. Recurrent neural networks are a special case of recursive artificial neural networks: recursive neural networks operate on any hierarchical structure, while recurrent neural networks operate on the linear progression of time.
21
Recurrent Neural Networks
22
Recurrent Neural Networks
Sample RNN
23
Recurrent neural networks
RNNs are very powerful because they combine two properties: a distributed hidden state that allows them to store a lot of information about the past efficiently, and non-linear dynamics that allow them to update their hidden state in complicated ways. With enough neurons and time, RNNs can compute anything that can be computed by your computer.
24
Recurrent neural networks
25
Recurrent Neural Networks
RNN examples. Some simple examples of RNNs. This one sums its inputs:
26
Recurrent Neural Networks
RNN examples. This one determines whether the running total of the first or the second input is larger:
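Since the example diagrams themselves did not survive, here is a minimal Python sketch of one possible wiring for these two toy RNNs (my own weight choices, consistent with the descriptions above, not necessarily the ones drawn on the slides):

```python
# Toy RNN 1: sums its inputs.
# A linear hidden unit with self-weight 1 and input weight 1: h_t = h_{t-1} + x_t.
def running_sum(xs):
    h = 0.0
    for x in xs:
        h = 1.0 * h + 1.0 * x   # the same weights are reused at every time step
    return h

# Toy RNN 2: decides which of two input streams has the larger running total.
# A linear hidden unit accumulates the difference; a threshold output unit reads it out.
def which_is_larger(x1s, x2s):
    h = 0.0
    for x1, x2 in zip(x1s, x2s):
        h = h + (x1 - x2)       # hidden state stores sum(x1) - sum(x2)
    return 1 if h > 0 else 0    # 1 means the first stream is larger

print(running_sum([2, 0, 3]))          # 5.0
print(which_is_larger([1, 2], [0, 4])) # 0 (the second stream is larger)
```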
27–35
Recurrent Neural Networks
[Figures only for slides 27–35.]
36
Recurrent Neural Networks
Example: Parity. Assume we have a sequence of binary inputs. We’ll consider how to determine the parity, i.e., whether the number of 1s is even or odd. We can compute parity incrementally by keeping track of the parity of the input so far: each parity bit is the XOR of the current input and the previous parity bit. Parity is a classic example of a problem that’s hard to solve with a shallow feed-forward net, but easy to solve with an RNN.
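As a concrete illustration of the incremental XOR computation described above (a small sketch, not part of the original slides):

```python
def parity_bits(bits):
    """Return the running parity: 1 if the number of 1s seen so far is odd."""
    parity = 0
    out = []
    for b in bits:
        parity ^= b          # each parity bit is the XOR of the input and the previous parity bit
        out.append(parity)
    return out

print(parity_bits([0, 1, 0, 1, 1, 0]))  # [0, 1, 1, 0, 1, 1]
```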
37
Recurrent Neural Networks
Example: Parity. Assume we have a sequence of binary inputs. We’ll consider how to determine the parity, i.e., whether the number of 1s is even or odd. Let’s find weights and biases for the RNN on the right so that it computes the parity. All hidden and output units are binary threshold units. Strategy: the output unit tracks the current parity, which is the XOR of the current input and the previous output; the hidden units help us compute the XOR.
38
Recurrent Neural Networks
Parity Check
39
Recurrent Neural Networks
40
Recurrent Neural Networks
Parity
41
Recurrent Neural Networks
Example: Parity. The output unit should compute the XOR of the current input and the previous output:
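The following slides show the actual weights only as figures. The sketch below uses one standard weight assignment that satisfies the strategy above (two hidden threshold units computing OR and AND, and an output unit computing XOR = OR and not AND); it is an assumption, not necessarily the exact numbers from the lecture:

```python
def step(x):
    """Binary threshold unit: 1 if the pre-activation is positive, else 0."""
    return 1 if x > 0 else 0

def parity_rnn(bits):
    """RNN with two hidden threshold units and one output unit that tracks parity."""
    y = 0                                    # previous output (parity so far), initialized to 0
    for x in bits:
        h1 = step(1.0 * x + 1.0 * y - 0.5)   # OR(x, y_prev)
        h2 = step(1.0 * x + 1.0 * y - 1.5)   # AND(x, y_prev)
        y = step(1.0 * h1 - 1.0 * h2 - 0.5)  # XOR(x, y_prev): OR but not AND
    return y                                 # 1 if the number of 1s is odd

print(parity_rnn([0, 1, 0, 1, 1, 0]))  # 1 (three 1s: odd)
```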
42–46
Recurrent Neural Networks
[Figures only for slides 42–46.]
47
Recurrent Neural Networks (RNNs)
Note that the weights are shared over time. Essentially, copies of the RNN cell are made over time (unrolling/unfolding), with different inputs at different time steps: the vanilla RNN forward pass.
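A minimal numpy sketch of the unrolled forward pass with shared weights, assuming the common tanh cell h_t = tanh(Wxh·x_t + Whh·h_{t-1} + b); the cell actually used in the lecture appears later only as a figure:

```python
import numpy as np

def rnn_forward(xs, h0, Wxh, Whh, Why, bh, by):
    """Vanilla RNN forward pass: the same weight matrices are reused at every
    time step, so unrolling just applies the cell once per input."""
    h = h0
    hs, ys = [], []
    for x in xs:                                   # one copy of the cell per time step
        h = np.tanh(Wxh @ x + Whh @ h + bh)        # new hidden state
        y = Why @ h + by                           # output (logits)
        hs.append(h)
        ys.append(y)
    return hs, ys

# Tiny example with random shared weights.
rng = np.random.default_rng(0)
D, H, V = 3, 4, 2                                  # input, hidden, output sizes
params = (rng.normal(size=(H, D)), rng.normal(size=(H, H)),
          rng.normal(size=(V, H)), np.zeros(H), np.zeros(V))
xs = [rng.normal(size=D) for _ in range(5)]        # a length-5 input sequence
hs, ys = rnn_forward(xs, np.zeros(H), *params)
print(len(hs), ys[-1].shape)                       # 5 (2,)
```

Unrolling does not create new parameters: the loop simply applies the same three weight matrices once per time step.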
48
Sentiment Classification
Classify a restaurant review from Yelp (or a movie review from IMDB, or …) as positive or negative. Inputs: multiple words, one or more sentences. Outputs: positive/negative classification. Examples: “The food was really good”, “The chicken crossed the road because it was uncooked”.
49–55
Sentiment Classification
[Diagram sequence: the words “The”, “food”, …, “good” are fed into the RNN one per time step, producing hidden states h1, h2, …, hn. One option is to pass only the final state hn to a linear classifier and ignore h1 … hn-1; another is to combine the states, e.g. h = Sum(h1, …, hn), and pass h to the linear classifier.]
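A minimal sketch of the second option above (sum the hidden states, then apply a linear classifier). All names and weights here (emb, Wxh, Whh, w_clf, …) are placeholders for illustration, not the lecture’s parameters:

```python
import numpy as np

def sentiment_score(words, emb, Wxh, Whh, bh, w_clf, b_clf):
    """Run an RNN over the word embeddings, sum the hidden states,
    and apply a linear classifier to the summed representation."""
    h = np.zeros(Whh.shape[0])
    h_sum = np.zeros_like(h)
    for w in words:
        x = emb[w]                                 # word embedding lookup
        h = np.tanh(Wxh @ x + Whh @ h + bh)        # RNN step
        h_sum += h                                 # h = Sum(h1, ..., hn)
    score = w_clf @ h_sum + b_clf                  # linear classifier
    return 1 if score > 0 else 0                   # 1 = positive, 0 = negative

rng = np.random.default_rng(1)
D, H = 8, 16
vocab = ["the", "food", "was", "really", "good"]
emb = {w: rng.normal(size=D) for w in vocab}       # toy random embeddings
params = (rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H),
          rng.normal(size=H), 0.0)
print(sentiment_score(["the", "food", "was", "really", "good"], emb, *params))
```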
56
Recurrent Neural Networks
Image Captioning. Given an image, produce a sentence describing its contents. Inputs: an image feature (from a CNN). Outputs: multiple words (let’s consider one sentence).
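One common way to wire this up is sketched below: the CNN feature initializes (conditions) the RNN hidden state, and the network then emits one word per step until an END token. The function and parameter names are hypothetical, and emb is assumed to contain a "<START>" embedding; the lecture shows the architecture only as a figure:

```python
import numpy as np

def generate_caption(img_feat, Wih, Wxh, Whh, Why, emb, vocab, max_len=20):
    """Greedy caption generation: the CNN feature initializes the hidden state,
    then the RNN feeds each predicted word back in as the next input."""
    h = np.tanh(Wih @ img_feat)                    # condition on the image feature
    word = "<START>"
    caption = []
    for _ in range(max_len):
        h = np.tanh(Wxh @ emb[word] + Whh @ h)     # RNN step on the previous word
        word = vocab[int(np.argmax(Why @ h))]      # greedy choice of the next word
        if word == "<END>":
            break
        caption.append(word)
    return caption
```

Greedy argmax decoding is used here for simplicity; beam search is the usual choice in practice.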
57
Recurrent Neural Networks
Image Captioning
58
RNN Outputs: Image Captions
59
Recurrent Neural Networks
Language model. A language model is a probability distribution over sequences of words. Given such a sequence, say of length N, it assigns a probability p(w1, …, wN) to the whole sentence. Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition, information retrieval, and other applications. Traditional method: n-grams. Here: neural language models.
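Both n-gram and neural language models rely on the standard chain-rule factorization (not spelled out on the slide, but standard):

p(w_1, \dots, w_N) = \prod_{t=1}^{N} p(w_t \mid w_1, \dots, w_{t-1})

An n-gram model truncates the conditioning context to the previous n-1 words; an RNN language model instead summarizes the full history w_1, …, w_{t-1} in its hidden state.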
60
Recurrent Neural Network language model
61
Recurrent Neural Network language model
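Because the RNN language model slides are figure-only, here is a minimal sketch of how an RNN parameterizes each conditional p(w_t | w_1, …, w_{t-1}) with a softmax over the vocabulary (a generic formulation; emb and the weight matrices are placeholders, and word id 0 is assumed to be a begin-of-sentence token):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sentence_log_prob(word_ids, emb, Wxh, Whh, Why, bh, by):
    """log p(w_1, ..., w_N) = sum_t log p(w_t | w_1, ..., w_{t-1}),
    where each conditional is a softmax over the vocabulary computed
    from the RNN hidden state that summarizes the history."""
    h = np.zeros(Whh.shape[0])
    prev = 0                              # assume id 0 is the <BOS> token
    total = 0.0
    for w in word_ids:
        h = np.tanh(Wxh @ emb[prev] + Whh @ h + bh)   # emb is a (V, D) embedding matrix
        probs = softmax(Why @ h + by)                  # distribution over the next word
        total += np.log(probs[w])
        prev = w
    return total
```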
62
Neural Machine Translation
We’d like to translate, e.g., English to French sentences, and we have pairs of translated sentences to train on. What’s wrong with the following setup? The sentences might not be the same length, and the words might not align perfectly; you might also need to resolve ambiguities using information from later in the sentence.
63
Neural Machine Translation
Instead, the network first reads and memorizes the source sentence. When it sees the END token, it starts outputting the translation.
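A minimal encoder-decoder sketch in the spirit of Sutskever et al. (2014): the encoder reads the whole source sentence into a vector, and decoding starts only after the END token. All weights and names here are placeholders, and the published systems use deep LSTMs rather than this vanilla cell:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def translate(src_ids, src_emb, tgt_emb, enc, dec, Why, tgt_vocab, end_id, max_len=30):
    """Encoder reads (and 'memorizes') the source into a vector; the decoder
    starts generating only after the whole source has been consumed."""
    Wxh_e, Whh_e = enc
    Wxh_d, Whh_d = dec
    h = np.zeros(Whh_e.shape[0])
    for w in src_ids:                           # read the source sentence
        h = np.tanh(Wxh_e @ src_emb[w] + Whh_e @ h)
    out, prev = [], end_id                      # reuse END as the start symbol
    for _ in range(max_len):                    # then generate the translation
        h = np.tanh(Wxh_d @ tgt_emb[prev] + Whh_d @ h)
        prev = int(np.argmax(softmax(Why @ h)))
        if prev == end_id:
            break
        out.append(tgt_vocab[prev])
    return out
```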
64
Recurrent Neural Networks
The Vanilla RNN Cell
65
The Vanilla RNN Forward
"Unfold" the network through time by making copies at each time step. [Diagram: inputs x1, x2, x3 produce hidden states h1, h2, h3 from h0, with outputs y1, y2, y3 and costs C1, C2, C3.]
66
BackPropagation Refresher
[Diagram: a single layer, x → f(x; W) → y.]
67
Multiple Layers
[Diagram: a two-layer stack, x → f1(x; W1) → y1 → f2(y1; W2) → y2 → cost C.]
68
Chain Rule for Gradient Computation
[Diagram: x → f1(x; W1) → y1 → f2(y1; W2) → y2.] Application of the chain rule.
69
Chain Rule for Gradient Computation
Given ∂C/∂y, we are interested in computing ∂C/∂x and ∂C/∂W. Intrinsic to the layer y = f(x; W) are ∂y/∂x and ∂y/∂W.
70
Chain Rule for Gradient Computation
Given ∂C/∂y, the chain rule gives ∂C/∂x = (∂C/∂y)(∂y/∂x) and ∂C/∂W = (∂C/∂y)(∂y/∂W), where ∂y/∂x and ∂y/∂W are intrinsic to the layer f(x; W).
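Putting the two layers together, the chain rule gives (standard result, stated here for completeness):

\frac{\partial C}{\partial W_2} = \frac{\partial C}{\partial y_2}\,\frac{\partial y_2}{\partial W_2}, \qquad \frac{\partial C}{\partial W_1} = \frac{\partial C}{\partial y_2}\,\frac{\partial y_2}{\partial y_1}\,\frac{\partial y_1}{\partial W_1}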
71
BackPropagation Through Time (BPTT)
One of the methods used to train RNNs. The unfolded network (used during the forward pass) is treated as one big feed-forward network that accepts the whole time series as input. The weight updates are computed for each copy in the unfolded network, then summed (or averaged), and then applied to the RNN weights.
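A minimal numpy sketch of BPTT for the vanilla tanh RNN, using a squared-error loss on the hidden states purely for illustration (the loss, cell, and variable names are my simplifications, not the lecture’s):

```python
import numpy as np

def bptt_gradients(xs, targets, h0, Wxh, Whh):
    """Backprop through time for h_t = tanh(Wxh x_t + Whh h_{t-1})
    with loss 0.5 * sum_t ||h_t - target_t||^2.
    Gradients from every unrolled copy are summed into the shared dWxh, dWhh."""
    # Forward pass: store the hidden state of every copy.
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(Wxh @ x + Whh @ hs[-1]))
    # Backward pass: walk the copies in reverse, accumulating shared gradients.
    dWxh, dWhh = np.zeros_like(Wxh), np.zeros_like(Whh)
    dh_next = np.zeros_like(h0)                    # gradient flowing in from the future
    for t in reversed(range(len(xs))):
        dh = (hs[t + 1] - targets[t]) + dh_next    # local loss gradient + future gradient
        da = dh * (1.0 - hs[t + 1] ** 2)           # back through tanh
        dWxh += np.outer(da, xs[t])                # summed over all copies
        dWhh += np.outer(da, hs[t])
        dh_next = Whh.T @ da
    return dWxh, dWhh
```

Note how dWxh and dWhh receive a contribution from every unrolled copy and are only then applied to the single set of shared weights.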
72
The Unfolded Vanilla RNN
Treat the unfolded network as one big feed-forward network. This big network takes in the entire sequence as input. Compute gradients through the usual backpropagation, then update the shared weights.
73
The Vanilla RNN Backward