Presentation on theme: "Natural Language and Text Processing Laboratory"— Presentation transcript:

1 Natural Language and Text Processing Laboratory
Recurrent Neural Networks and Deep Learning in Natural Language Processing. Natural Language and Text Processing Laboratory, University of Tehran.

2 Recurrent Neural Networks
Heshaam Faili, Associate Professor, University of Tehran
Ali Vardasbi, PhD Candidate, University of Tehran
Behrooz Vedadian, PhD Candidate, Amirkabir University of Technology
Hakimeh Fadae, PhD Candidate, University of Tehran
Golshan Afzali, PhD Candidate, University of Tehran

3 Recurrent Neural Networks
Agenda
Session 1: RNN. RNN concepts and applications (Faili); training, LSTM, GRU (Vardasbi).
Session 2: Language models and RNNs. RNN for language modeling (Faili); word2vec, GloVe (Vardasbi).
Session 3: Deep learning in NLP applications. Machine translation (Fadae); machine translation (Vedadian); question answering systems (Afzali).

4 Feature Engineering Traditional ML Method
Diagram: the traditional ML pipeline. Training phase: training data (supervised or unsupervised) → feature extraction / feature engineering → ML method → model. Test phase: input → feature extraction → model → output.
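To make the pipeline concrete, here is a minimal sketch in Python with scikit-learn (not part of the original slides): the hand-designed features are simple bag-of-words counts and the ML method is logistic regression; the two-example dataset is a made-up placeholder.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

train_texts = ["the food was really good", "the service was terrible"]  # made-up placeholder data
train_labels = [1, 0]                     # 1 = positive, 0 = negative

pipeline = Pipeline([
    ("features", CountVectorizer()),      # feature extraction: bag-of-words counts
    ("model", LogisticRegression()),      # ML method
])
pipeline.fit(train_texts, train_labels)   # training phase
print(pipeline.predict(["the food was terrible"]))  # test phase: input -> features -> model -> output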

5 Reasons for Exploring Deep Learning
Manually designed features are often over-specified and incomplete, and they take a long time to design and validate.
Learned features are easy to adapt and fast to learn.
Deep learning provides a very flexible, (almost?) universal, learnable framework for representing world, visual, and linguistic information.
Deep learning can learn unsupervised (from raw data) and supervised (with specific labels such as positive/negative).

6 Reasons for Exploring Deep Learning
In 2006, deep learning techniques started outperforming other machine learning techniques. Why now?
DL techniques benefit more from large amounts of data.
Faster machines and multicore CPUs/GPUs help DL.
New methods for unsupervised pre-training have been developed: restricted Boltzmann machines (RBMs), autoencoders, contrastive estimation, etc.
More efficient parameter estimation methods.
Better understanding of model regularization.
Improved performance (first in speech and vision, then in NLP).

7 Deep Learning + NLP = Deep NLP
Combine the ideas and goals of NLP with representation learning and deep learning methods to solve them. There have been several big improvements in recent years across different NLP levels (speech, morphology, syntax, semantics) and applications (machine translation, sentiment analysis, question answering).

8 Representations at NLP Levels: Morphology
Deep Learning Recurrent Neural Networks

9 Representations at NLP Levels: Syntax
Recurrent Neural Networks

10 Representations at NLP Levels: Semantics
Recurrent Neural Networks

11 NLP Applications: Sentiment Analysis
Traditional approach: curated sentiment dictionaries combined with either bag-of-words representations (ignoring word order) or hand-designed negation features (which will not capture everything).
The same deep learning model that was used for morphology, syntax, and logical semantics can be used: the recursive NN.
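For concreteness, here is a minimal, illustrative sketch of that traditional approach (not from the slides): a tiny sentiment lexicon plus one hand-designed negation rule. The lexicon is a made-up placeholder; real systems use curated dictionaries.

# Illustrative sketch only: made-up lexicon plus one hand-designed negation rule.
LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0, "uncooked": -0.5}
NEGATORS = {"not", "never", "no"}

def lexicon_sentiment(sentence):
    words = sentence.lower().split()
    score = 0.0
    for i, w in enumerate(words):
        polarity = LEXICON.get(w, 0.0)
        if i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity          # hand-designed negation feature
        score += polarity
    return "positive" if score >= 0 else "negative"

print(lexicon_sentiment("the food was not good"))   # -> "negative"; word order is otherwise ignored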

12 Recurrent Neural Networks
Question Answering Recurrent Neural Networks

13 Recurrent Neural Networks
Machine Translation
Many levels of translation have been tried in the past. Traditional MT systems are very large, complex systems. What do you think is the interlingua for the DL approach to translation?

14 Recurrent Neural Networks
Machine Translation
The source sentence is mapped to a vector, and the output sentence is then generated from that vector (Sequence to Sequence Learning with Neural Networks, Sutskever et al. 2014; Luong et al. 2016). These models are about to replace very complex hand-engineered architectures.

15 Recurrent Neural Network (RNN) or Recursive Neural Network (RNN)?

16 Recurrent Neural Networks
Recurrent Networks
Some problems require previous history/context in order to give the proper output (speech recognition, stock forecasting, target tracking, etc.). One way to handle that is simply to provide all the necessary context in one "snapshot" and use standard learning. But how big should the snapshot be? It varies for different instances of the problem.

17 Recurrent Neural Networks
Motivation
Not all problems can be converted into one with fixed-length inputs and outputs. Problems such as speech recognition or time-series prediction require a system to store and use context information. Simple case: output YES if the number of 1s in the input is even, else NO. It is hard or impossible to choose a fixed context window, because there can always be a new sample longer than anything seen before.

18 Recurrent Neural Networks
Recurrent Networks
Another option is to use a recurrent neural network, which lets the network dynamically learn how much context it needs in order to solve the problem (speech example: vowels vs. consonants, etc.). Such a network acts like a state machine that gives different outputs for the current input depending on the current state. Recurrent nets must learn and use this state/context information in order to get high accuracy on the task; in effect, they form a temporal deep network.

19 Recurrent Neural Networks (RNNs)
Recurrent neural networks take the previous output or hidden state as part of their input. The composite input at time t therefore carries historical information about what happened at earlier times t' < t. RNNs are useful because their intermediate values (the state) can store information about past inputs for a duration that is not fixed a priori.
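A minimal NumPy sketch of that recurrence (not from the slides; the sizes, the tanh nonlinearity, and the weight names W_xh, W_hh, W_hy are illustrative assumptions):

import numpy as np

# Illustrative sizes and weight names (assumptions, not from the slides).
input_size, hidden_size, output_size = 4, 8, 3
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))    # current input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))   # previous hidden state -> hidden
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))   # hidden -> output

def rnn_step(x_t, h_prev):
    """One time step: the new state mixes the current input with the previous state."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev)
    y_t = W_hy @ h_t
    return h_t, y_t

h, _ = rnn_step(rng.normal(size=input_size), np.zeros(hidden_size))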

20 Recurrent Neural Networks
Recursive NN
Recurrent neural networks are a form of recursive artificial neural network: recursive neural networks operate on any hierarchical structure, while recurrent neural networks operate on the linear progression of time.

21 Recurrent Neural Networks

22 Recurrent Neural Networks
Sample RNN Recurrent Neural Networks

23 Recurrent neural networks
RNNs are very powerful because they combine two properties:
A distributed hidden state that allows them to store a lot of information about the past efficiently.
Non-linear dynamics that allow them to update their hidden state in complicated ways.
With enough neurons and time, RNNs can compute anything that can be computed by your computer.

24 Recurrent neural networks

25 Recurrent Neural Networks
RNN examples
Some simple examples of RNNs. This one sums its inputs:
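The figure itself is not in the transcript; as a hedged reconstruction, a linear RNN with input weight 1, recurrent weight 1, and an identity activation keeps a running sum of its inputs:

def summing_rnn(inputs):
    """Linear RNN: h_t = 1 * x_t + 1 * h_{t-1}, so the state is the running sum of the inputs."""
    h = 0.0
    outputs = []
    for x in inputs:
        h = 1.0 * x + 1.0 * h     # input weight 1, recurrent weight 1, identity activation
        outputs.append(h)
    return outputs

print(summing_rnn([2, -1, 3, 1]))  # [2.0, 1.0, 4.0, 5.0]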

26 Recurrent Neural Networks
RNN examples
This one determines whether the running total of the first input or of the second input is larger:
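Again, the figure is missing from the transcript; one hedged reconstruction keeps the difference of the two running totals in the state and thresholds it at the output:

def larger_total_rnn(pairs):
    """State = running sum of (x1 - x2); output 1 if the first input's total is larger so far."""
    h = 0.0
    outputs = []
    for x1, x2 in pairs:
        h = h + x1 - x2           # linear recurrence accumulating the difference of the totals
        outputs.append(1 if h > 0 else 0)
    return outputs

print(larger_total_rnn([(2, 0), (0, 3), (4, 1)]))  # [1, 0, 1]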

27–35 Recurrent Neural Networks
(No slide text was captured for slides 27 through 35.)

36 Recurrent Neural Networks
Example: Parity
Assume we have a sequence of binary inputs. We'll consider how to determine the parity, i.e. whether the number of 1's is even or odd. We can compute parity incrementally by keeping track of the parity of the input so far: each parity bit is the XOR of the current input bit and the previous parity bit. Parity is a classic example of a problem that's hard to solve with a shallow feed-forward net, but easy to solve with an RNN.

37 Recurrent Neural Networks
Example: Parity Assume we have a sequence of binary inputs. We’ll consider how to determine the parity, i.e. whether the number of 1’s is even or odd. Let’s find weights and biases for the RNN on the right so that it computes the parity. All hidden and output units are binary threshold units. Strategy: The output unit tracks the current parity, which is the XOR of the current input and previous output. The hidden units help us compute the XOR. Recurrent Neural Networks
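A minimal sketch of one such solution follows. The exact weights from the slide's figure are not in the transcript, so the thresholds below are an illustrative choice: two binary threshold hidden units compute the OR and the AND of the current input and the previous output, and the output unit combines them into their XOR.

def step_unit(z, threshold):
    """Binary threshold unit: fires (1) if the weighted input reaches the threshold, else 0."""
    return 1 if z >= threshold else 0

def parity_rnn(bits):
    """Hand-wired RNN: output_t = XOR(input_t, output_{t-1}) = parity of the bits seen so far."""
    y_prev = 0                          # the empty prefix has an even number of 1s
    outputs = []
    for x in bits:
        h1 = step_unit(x + y_prev, 1)   # OR of current input and previous output
        h2 = step_unit(x + y_prev, 2)   # AND of current input and previous output
        y = step_unit(h1 - 2 * h2, 1)   # OR and not AND, i.e. XOR
        outputs.append(y)
        y_prev = y
    return outputs

print(parity_rnn([0, 1, 0, 1, 1]))      # [0, 1, 1, 0, 1]: 1 whenever the count of 1s so far is odd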

38 Recurrent Neural Networks
Parity Check Recurrent Neural Networks

39 Recurrent Neural Networks

40 Recurrent Neural Networks
Parity Recurrent Neural Networks

41 Recurrent Neural Networks
Example: Parity The output unit should compute the XOR of the current input and previous output: Recurrent Neural Networks

42–46 Recurrent Neural Networks
(No slide text was captured for slides 42 through 46.)

47 Recurrent Neural Networks (RNNs)
The Vanilla RNN Forward
Note that the weights are shared over time. Essentially, copies of the RNN cell are made over time (unrolling/unfolding), with different inputs at different time steps.
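A hedged NumPy sketch of the unrolled forward pass (the sizes and weight names are the same illustrative assumptions used earlier, restated so the snippet runs on its own); the same W_xh, W_hh, W_hy are applied at every time step:

import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 8, 3
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))

def rnn_forward(xs):
    """Unrolled forward pass: the same weights are reused (shared) at every time step."""
    h = np.zeros(hidden_size)             # initial state h_0
    hs, ys = [], []
    for x_t in xs:                        # one "copy" of the cell per time step
        h = np.tanh(W_xh @ x_t + W_hh @ h)
        hs.append(h)
        ys.append(W_hy @ h)
    return hs, ys

hs, ys = rnn_forward([rng.normal(size=input_size) for _ in range(5)])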

48 Sentiment Classification
Classify a restaurant review from Yelp, a movie review from IMDB, or … as positive or negative.
Inputs: multiple words, one or more sentences.
Outputs: a positive/negative classification.
Examples: "The food was really good", "The chicken crossed the road because it was uncooked".

49 Sentiment Classification
RNN h1 The

50 Sentiment Classification
RNN RNN h1 h2 The food

51 Sentiment Classification
hn RNN RNN RNN h1 h2 hn-1 The food good

52 Sentiment Classification
Linear Classifier hn RNN RNN RNN h1 h2 hn-1 The food good

53 Sentiment Classification
Ignore Ignore Linear Classifier h1 h2 hn RNN RNN RNN h1 h2 hn-1 The food good

54 Sentiment Classification
h = Sum(…) h1 hn h2 RNN RNN RNN h1 h2 hn-1 The food good

55 Sentiment Classification
Linear Classifier h = Sum(…) h1 hn h2 RNN RNN RNN h1 h2 hn-1 The food good
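Slides 49 through 55 build this architecture up pictorially: each word is fed to the RNN in turn, producing hidden states h1 … hn, and a linear classifier is applied either to the last state hn alone or to the sum of all the states, h = Sum(…). A minimal NumPy sketch of the second variant (the embedding table, sizes, and random weights are illustrative assumptions, not the slides' parameters):

import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "food": 1, "was": 2, "really": 3, "good": 4}   # toy vocabulary (placeholder)
embed_size, hidden_size = 8, 16
E = rng.normal(scale=0.1, size=(len(vocab), embed_size))          # word embeddings
W_xh = rng.normal(scale=0.1, size=(hidden_size, embed_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
w_clf = rng.normal(scale=0.1, size=hidden_size)                   # linear classifier weights

def classify_sentiment(sentence):
    """Run the RNN over the words, sum the hidden states, then apply a linear classifier."""
    h = np.zeros(hidden_size)
    pooled = np.zeros(hidden_size)
    for word in sentence.lower().split():
        h = np.tanh(W_xh @ E[vocab[word]] + W_hh @ h)   # h1, h2, ..., hn
        pooled += h                                      # h = Sum(h1, ..., hn)
    return "positive" if w_clf @ pooled > 0 else "negative"

print(classify_sentiment("The food was really good"))    # untrained weights, so the label is arbitrary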

56 Recurrent Neural Networks
Image Captioning
Given an image, produce a sentence describing its contents.
Inputs: an image feature vector (from a CNN).
Outputs: multiple words (let's consider one sentence).

57 Recurrent Neural Networks
Image Captioning Recurrent Neural Networks

58 RNN Outputs: Image Captions
Recurrent Neural Networks

59 Recurrent Neural Networks
Language model
A language model is a probability distribution over sequences of words. Given such a sequence, say of length N, it assigns a probability p(w1, …, wN) to the whole sentence. Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition, information retrieval, and other applications.
Traditional method: n-grams. Newer method: neural language models.
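To make the traditional n-gram method concrete: the sentence probability is usually factored as p(w1, …, wN) = Π p(wi | w1, …, wi−1), and an n-gram model truncates each history to the last n−1 words. A minimal, illustrative bigram sketch (the toy corpus is a made-up placeholder, and a real model would need smoothing):

from collections import Counter

# Toy corpus (made-up placeholder); <s> and </s> mark sentence boundaries.
corpus = [["<s>", "the", "food", "was", "good", "</s>"],
          ["<s>", "the", "food", "was", "bad", "</s>"]]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    for w1, w2 in zip(sent, sent[1:]):
        unigrams[w1] += 1
        bigrams[(w1, w2)] += 1

def bigram_prob(sentence):
    """Approximate p(w1..wN) by a product of bigram probabilities p(wi | wi-1) (no smoothing)."""
    p = 1.0
    for w1, w2 in zip(sentence, sentence[1:]):
        p *= bigrams[(w1, w2)] / unigrams[w1]
    return p

print(bigram_prob(["<s>", "the", "food", "was", "good", "</s>"]))   # 0.5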

60 Recurrent Neural Network language model
Recurrent Neural Networks

61 Recurrent Neural Network language model
Recurrent Neural Networks

62 Neural Machine Translation
We’d like to translate, e.g., English to French sentences, and we have pairs of translated sentences to train on. What’s wrong with the following setup? The sentences might not be the same length, and the words might not align perfectly. You might need to resolve ambiguities using information from later in the sentence. Recurrent Neural Networks

63 Neural Machine Translation
Instead, the network first reads and memorizes the sentence. When it sees the END token, it starts outputting the translation. Recurrent Neural Networks
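A hedged NumPy sketch of that encoder-decoder idea (the names, sizes, greedy decoding loop, and use of END as the start symbol are illustrative simplifications; real systems such as Sutskever et al. 2014 use LSTMs, large vocabularies, and beam search):

import numpy as np

rng = np.random.default_rng(0)
src_vocab, tgt_vocab = 50, 60            # toy vocabulary sizes (placeholders)
embed_size, hidden_size = 16, 32
END = 0                                   # index of the END token in the target vocabulary

# Random placeholder parameters; a real model learns these from pairs of translated sentences.
E_src = rng.normal(scale=0.1, size=(src_vocab, embed_size))
E_tgt = rng.normal(scale=0.1, size=(tgt_vocab, embed_size))
W_ex = rng.normal(scale=0.1, size=(hidden_size, embed_size))
W_eh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_dx = rng.normal(scale=0.1, size=(hidden_size, embed_size))
W_dh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_out = rng.normal(scale=0.1, size=(tgt_vocab, hidden_size))

def translate(src_ids, max_len=20):
    # Encoder: read and "memorize" the source sentence in a single vector h.
    h = np.zeros(hidden_size)
    for i in src_ids:
        h = np.tanh(W_ex @ E_src[i] + W_eh @ h)
    # Decoder: starting from that vector, emit target words until END is produced.
    out, prev = [], END                   # END doubles as the start symbol (a simplification)
    for _ in range(max_len):
        h = np.tanh(W_dx @ E_tgt[prev] + W_dh @ h)
        prev = int(np.argmax(W_out @ h))  # greedy choice of the next word
        if prev == END:
            break
        out.append(prev)
    return out

print(translate([3, 7, 12]))              # untrained weights, so the output ids are arbitrary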

64 Recurrent Neural Networks
The Vanilla RNN Cell Recurrent Neural Networks

65 The Vanilla RNN Forward
"Unfold" the network through time by making copies at each time step. (Figure: the unfolded network, with inputs x1, x2, x3, hidden states h0 … h3, outputs y1, y2, y3, and per-step costs C1, C2, C3.)

66 BackPropagation Refresher
(Figure: a single layer, y = f(x; W).)

67 Multiple Layers
(Figure: stacked layers y1 = f1(x; W1) and y2 = f2(y1; W2), with the cost C computed from y2.)

68 Chain Rule for Gradient Computation
Application of the chain rule. (Figure: the same stacked layers, y1 = f1(x; W1) and y2 = f2(y1; W2).)

69 Chain Rule for Gradient Computation
Given: the gradient of the cost with respect to the layer's output, ∂C/∂y. We are interested in computing: ∂C/∂x and ∂C/∂W. Intrinsic to the layer are: ∂y/∂x and ∂y/∂W. (Figure: a single layer, y = f(x; W).)

70 Chain Rule for Gradient Computation
Given: ∂C/∂y. We are interested in computing: ∂C/∂x and ∂C/∂W. Intrinsic to the layer are: ∂y/∂x and ∂y/∂W. (Figure: the layer y = f(x; W).)
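The formulas themselves appear only in the slide figures; written out (a standard statement of the chain rule, consistent with the layer definitions above), they are:

\frac{\partial C}{\partial x} = \frac{\partial C}{\partial y}\,\frac{\partial y}{\partial x},
\qquad
\frac{\partial C}{\partial W} = \frac{\partial C}{\partial y}\,\frac{\partial y}{\partial W}

For the two-layer stack of slide 67, applying the rule twice gives

\frac{\partial C}{\partial W_1} = \frac{\partial C}{\partial y_2}\,\frac{\partial y_2}{\partial y_1}\,\frac{\partial y_1}{\partial W_1}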

71 BackPropagation Through Time (BPTT)
BPTT is one of the methods used to train RNNs. The unfolded network (used during the forward pass) is treated as one big feed-forward network that accepts the whole time series as input. The weight updates are computed for each copy in the unfolded network, then summed (or averaged), and then applied to the RNN weights.
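A minimal NumPy sketch of this procedure (illustrative sizes, and a toy squared-error cost on the final state only; the slides' own equations are not in the transcript):

import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 5
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))

def bptt(xs, target):
    """Forward through the unrolled net, then backward; the per-copy gradients for the shared
    weights are summed before being applied."""
    # Forward pass: unfold over time, storing every state for the backward pass.
    hs = [np.zeros(hidden_size)]
    for x_t in xs:
        hs.append(np.tanh(W_xh @ x_t + W_hh @ hs[-1]))
    cost = 0.5 * np.sum((hs[-1] - target) ** 2)     # toy cost on the final state only

    # Backward pass: walk back through the copies, summing the shared-weight gradients.
    dW_xh, dW_hh = np.zeros_like(W_xh), np.zeros_like(W_hh)
    dh = hs[-1] - target                            # dC/dh at the last time step
    for t in reversed(range(len(xs))):
        dz = dh * (1.0 - hs[t + 1] ** 2)            # back through tanh
        dW_xh += np.outer(dz, xs[t])                # this copy's contribution, added to the total
        dW_hh += np.outer(dz, hs[t])
        dh = W_hh.T @ dz                            # pass the gradient to the previous time step
    return cost, dW_xh, dW_hh

xs = [rng.normal(size=input_size) for _ in range(4)]
cost, dW_xh_grad, dW_hh_grad = bptt(xs, target=np.zeros(hidden_size))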

72 The Unfolded Vanilla RNN
Treat the unfolded network as one big feed-forward network. This big network takes the entire sequence as input. Compute gradients through the usual backpropagation, then update the shared weights.

73 The Vanilla RNN Backward
Recurrent Neural Networks

