Recurrent Neural Networks and Deep Learning in Natural Language Processing
Natural Language and Text Processing Laboratory, University of Tehran
Recurrent Neural Networks
Heshaam Faili, Associate Professor, University of Tehran
Ali Vardasbi, PhD Candidate, University of Tehran
Behrooz Vedadian, PhD Candidate, Amirkabir University of Technology
Hakimeh Fadae, PhD Candidate, University of Tehran
Golshan Afzali, PhD Candidate, University of Tehran
Agenda
Session 1: RNN
  RNN: concept and applications (Faili)
  Training, LSTM, GRU (Vardasbi)
Session 2: Language models and RNNs
  RNNs for language modeling (Faili)
  Word2vec, GloVe (Vardasbi)
Session 3: Deep learning in NLP applications
  Machine translation (Fadae)
  Machine translation (Vedadian)
  Question answering systems (Afzali)
Feature Engineering: the Traditional ML Method
[Figure: traditional ML pipeline. Training phase: training data (supervised or unsupervised) -> feature extraction / feature engineering -> ML method -> model. Test phase: input -> feature extraction -> model -> output.]
Reasons for Exploring Deep Learning
Manually designed features are often over-specified, incomplete, and take a long time to design and validate.
Learned features are easy to adapt and fast to learn.
Deep learning provides a very flexible, (almost?) universal, learnable framework for representing world, visual, and linguistic information.
Deep learning can learn unsupervised (from raw data) and supervised (with specific labels such as positive/negative).
Reasons for Exploring Deep Learning
In 2006 deep learning techniques started outperforming other machine learning techniques. Why now?
DL techniques benefit more from large amounts of data.
Faster machines and multicore CPUs/GPUs help DL.
New methods for unsupervised pre-training have been developed: Restricted Boltzmann Machines (RBMs), autoencoders, contrastive estimation, etc.
More efficient parameter estimation methods.
Better understanding of model regularization.
Improved performance (first in speech and vision, then NLP).
Deep Learning + NLP = Deep NLP
Combine the ideas and goals of NLP with representation learning and deep learning methods to solve them.
Several big improvements in recent years across different NLP levels (speech, morphology, syntax, semantics) and applications (machine translation, sentiment analysis, question answering).
Representations at NLP Levels: Morphology
Representations at NLP Levels: Syntax
Representations at NLP Levels: Semantics
NLP Applications: Sentiment Analysis
Traditional approach: curated sentiment dictionaries combined with either bag-of-words representations (which ignore word order) or hand-designed negation features (which will not capture everything).
The same deep learning model that was used for morphology, syntax, and logical semantics can be used: the recursive NN.
Question Answering
Machine Translation
Many levels of translation have been tried in the past; traditional MT systems are very large, complex systems.
What do you think is the interlingua for the DL approach to translation?
Machine Translation
The source sentence is mapped to a vector, then the output sentence is generated from it ("Sequence to Sequence Learning with Neural Networks", Sutskever et al. 2014; Luong et al. 2016).
This approach is about to replace very complex hand-engineered architectures.
Recurrent Neural Network (RNN) or Recursive Neural Network (RNN)?
Recurrent Networks
Some problems require previous history/context in order to give the proper output (speech recognition, stock forecasting, target tracking, etc.).
One way to do that is to just provide all the necessary context in one "snapshot" and use standard learning.
But how big should the snapshot be? It varies for different instances of the problem.
Motivation
Not all problems can be converted into one with fixed-length inputs and outputs.
Problems such as speech recognition or time-series prediction require a system to store and use context information.
Simple case: output YES if the number of 1s is even, else NO.
1000010101 – YES, 100011 – NO, …
It is hard (or impossible) to choose a fixed context window: there can always be a new sample longer than anything seen before.
Recurrent Networks
Another option is a recurrent neural network, which lets the network dynamically learn how much context it needs in order to solve the problem (speech example: vowels vs. consonants, etc.).
It acts like a state machine that gives different outputs for the current input depending on the current state.
Recurrent nets must learn and use this state/context information in order to get high accuracy on the task: a temporal deep network.
Recurrent Neural Networks (RNNs)
Recurrent neural networks take the previous output or hidden state as an additional input.
The composite input at time t carries historical information about what happened at times τ < t.
RNNs are useful because their intermediate values (the state) can store information about past inputs for a duration that is not fixed a priori.
Recurrent vs. Recursive NN
Recurrent neural networks are a special case of recursive artificial neural networks:
recursive neural networks operate on any hierarchical structure,
recurrent neural networks operate on the linear progression of time.
Sample RNN
Recurrent Neural Networks
RNNs are very powerful because they combine two properties:
Distributed hidden state, which allows them to store a lot of information about the past efficiently.
Non-linear dynamics, which allows them to update their hidden state in complicated ways.
With enough neurons and time, RNNs can compute anything that can be computed by your computer.
RNN Examples
Some simple examples of RNNs. This one sums its inputs:
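A minimal sketch of this summing RNN, with a single linear unit whose recurrent weight and input weight are both 1, so the hidden state is the running sum. The specific weights in the original figure are not reproduced here, so this is one possible wiring:

```python
def summing_rnn(inputs):
    h = 0.0                      # hidden state = running sum
    outputs = []
    for x in inputs:
        h = 1.0 * h + 1.0 * x    # w_hh = 1, w_xh = 1, linear activation
        outputs.append(h)
    return outputs

print(summing_rnn([2, -1, 3, 0.5]))  # [2, 1, 4, 4.5]
```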
RNN Examples
This one determines whether the running total of the first or the second input stream is larger:
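A sketch of this second example: two accumulator units keep the running totals of the two input streams, and the output unit compares them (a logistic of the difference, so values above 0.5 mean the first stream's total is larger). The weights are illustrative, not the ones from the original figure:

```python
import math

def compare_rnn(pairs):
    s1 = s2 = 0.0
    outputs = []
    for x1, x2 in pairs:
        s1 += x1                                        # running total of input 1
        s2 += x2                                        # running total of input 2
        outputs.append(1 / (1 + math.exp(-(s1 - s2))))  # sigmoid(s1 - s2)
    return outputs

print(compare_rnn([(1, 0.5), (0.2, 2.0), (3.0, 0.1)]))
```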
Example: Parity
Assume we have a sequence of binary inputs. We'll consider how to determine the parity, i.e. whether the number of 1s is even or odd.
We can compute parity incrementally by keeping track of the parity of the input so far:
Input:       0 1 0 1 1 0 1 0 1 1
Parity bits: 0 1 1 0 1 1 …
Each parity bit is the XOR of the current input and the previous parity bit.
Parity is a classic example of a problem that is hard to solve with a shallow feed-forward net, but easy to solve with an RNN.
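The incremental computation described above, as a short sketch:

```python
# Incremental parity: each parity bit is the XOR of the current input
# and the previous parity bit.
def parity_bits(bits):
    parity, out = 0, []
    for b in bits:
        parity ^= b          # XOR with the running parity
        out.append(parity)
    return out

print(parity_bits([0, 1, 0, 1, 1, 0]))  # [0, 1, 1, 0, 1, 1]
```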
Example: Parity
Assume we have a sequence of binary inputs. We'll consider how to determine the parity, i.e. whether the number of 1s is even or odd.
Let's find weights and biases for the RNN on the right so that it computes the parity. All hidden and output units are binary threshold units.
Strategy: the output unit tracks the current parity, which is the XOR of the current input and the previous output. The hidden units help us compute the XOR.
Parity Check
Parity
Example: Parity
The output unit should compute the XOR of the current input and the previous output:
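A hand-wired sketch of this construction with binary threshold units: one hidden unit fires for OR, another for AND, and the output unit combines them into XOR of the current input and the previous output. The specific weights and biases below are one possible solution (an assumption, since the figure with the chosen values is not reproduced here):

```python
def step(z):
    return 1 if z > 0 else 0  # binary threshold unit

def parity_rnn(inputs):
    y = 0                                    # previous output (initial parity)
    outputs = []
    for x in inputs:
        h1 = step(1.0 * x + 1.0 * y - 0.5)   # fires if x OR y
        h2 = step(1.0 * x + 1.0 * y - 1.5)   # fires if x AND y
        y = step(1.0 * h1 - 1.0 * h2 - 0.5)  # OR and not AND = XOR(x, y)
        outputs.append(y)
    return outputs

print(parity_rnn([0, 1, 0, 1, 1, 0]))  # [0, 1, 1, 0, 1, 1]
```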
Recurrent Neural Networks (RNNs)
Note that the weights are shared over time.
Essentially, copies of the RNN cell are made over time (unrolling/unfolding), with different inputs at different time steps: the vanilla RNN forward pass.
Sentiment Classification
Classify a restaurant review from Yelp, a movie review from IMDB, etc. as positive or negative.
Inputs: multiple words, one or more sentences.
Outputs: positive/negative classification.
"The food was really good"
"The chicken crossed the road because it was uncooked"
Sentiment Classification
[Figure: an RNN unrolled over the words of the review ("The", "food", …, "good"), producing hidden states h1, h2, …, hn. One option feeds only the final hidden state hn to a linear classifier, ignoring h1 … hn-1; another sums the hidden states into h = Sum(…) and feeds h to the linear classifier.]
http://deeplearning.net/tutorial/lstm.html
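A minimal sketch of the "pool the hidden states, then apply a linear classifier" architecture described above. The weights are random (untrained) and the vocabulary and dimensions are made up for illustration; in practice everything would be learned from labeled reviews:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "food": 1, "was": 2, "really": 3, "good": 4}
d_emb, d_hid = 8, 16

E    = rng.normal(size=(len(vocab), d_emb))        # word embeddings
W_xh = rng.normal(size=(d_hid, d_emb)) * 0.1       # input -> hidden
W_hh = rng.normal(size=(d_hid, d_hid)) * 0.1       # hidden -> hidden
b_h  = np.zeros(d_hid)
w_out, b_out = rng.normal(size=d_hid) * 0.1, 0.0   # linear classifier

def classify(tokens):
    h = np.zeros(d_hid)
    states = []
    for tok in tokens:
        x = E[vocab[tok]]
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)     # vanilla RNN step
        states.append(h)
    pooled = np.mean(states, axis=0)               # h = Sum(...) / n
    score = w_out @ pooled + b_out
    return 1 / (1 + np.exp(-score))                # P(positive)

print(classify("the food was really good".split()))
```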
Image Captioning
Given an image, produce a sentence describing its contents.
Inputs: image features (from a CNN).
Outputs: multiple words (let's consider one sentence).
RNN Outputs: Image Captions
Language Models
A language model is a probability distribution over sequences of words. Given such a sequence, say of length m, it assigns a probability p(w1, …, wm) to the whole sentence.
Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition, information retrieval, and other applications.
Traditional method: n-gram models.
Neural language models.
Recurrent Neural Network Language Model
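A minimal RNN language-model sketch (untrained, random weights): the hidden state summarizes the prefix, a softmax over the vocabulary gives p(w_t | w_1 … w_{t-1}), and the sentence probability is the product of these factors. The vocabulary and dimensions are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = {"<s>": 0, "the": 1, "food": 2, "was": 3, "good": 4, "</s>": 5}
V, d = len(vocab), 16

E    = rng.normal(size=(V, d)) * 0.1
W_hh = rng.normal(size=(d, d)) * 0.1
W_xh = rng.normal(size=(d, d)) * 0.1
W_hy = rng.normal(size=(V, d)) * 0.1

def sentence_log_prob(words):
    tokens = ["<s>"] + words + ["</s>"]
    h, log_p = np.zeros(d), 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        h = np.tanh(W_xh @ E[vocab[prev]] + W_hh @ h)  # update state with prev word
        logits = W_hy @ h
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                           # softmax over the vocabulary
        log_p += np.log(probs[vocab[cur]])             # log p(cur | prefix)
    return log_p

print(sentence_log_prob("the food was good".split()))
```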
Neural Machine Translation
We'd like to translate, e.g., English sentences to French sentences, and we have pairs of translated sentences to train on.
What's wrong with the following setup?
The sentences might not be the same length, and the words might not align perfectly. You might also need to resolve ambiguities using information from later in the sentence.
Neural Machine Translation
Instead, the network first reads and memorizes the source sentence. When it sees the END token, it starts outputting the translation.
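A minimal encoder-decoder sketch of this idea: the encoder RNN reads the source sentence into a vector, and after the END token the decoder RNN emits target words one at a time. All weights and vocabularies are random/made up for illustration, so the "translation" is meaningless; a real system would train them on sentence pairs:

```python
import numpy as np

rng = np.random.default_rng(2)
src_vocab = {"the": 0, "food": 1, "was": 2, "good": 3, "<END>": 4}
tgt_vocab = ["<END>", "la", "nourriture", "etait", "bonne"]
d = 16

E_src = rng.normal(size=(len(src_vocab), d)) * 0.1
E_tgt = rng.normal(size=(len(tgt_vocab), d)) * 0.1
W_enc = rng.normal(size=(d, 2 * d)) * 0.1            # encoder step: [x; h] -> h
W_dec = rng.normal(size=(d, 2 * d)) * 0.1            # decoder step: [y_prev; h] -> h
W_out = rng.normal(size=(len(tgt_vocab), d)) * 0.1

def rnn_step(W, x, h):
    return np.tanh(W @ np.concatenate([x, h]))

def translate(src_words, max_len=10):
    # Encode: read and "memorize" the source sentence (plus END) into h.
    h = np.zeros(d)
    for w in src_words + ["<END>"]:
        h = rnn_step(W_enc, E_src[src_vocab[w]], h)
    # Decode: emit target words greedily until <END> (or max_len).
    out, prev = [], np.zeros(d)
    for _ in range(max_len):
        h = rnn_step(W_dec, prev, h)
        idx = int(np.argmax(W_out @ h))
        if tgt_vocab[idx] == "<END>":
            break
        out.append(tgt_vocab[idx])
        prev = E_tgt[idx]
    return out

print(translate("the food was good".split()))
```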
The Vanilla RNN Cell
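A common formulation of the vanilla RNN cell (the notation below is an assumption, since the slide's own diagram is not reproduced here):

h_t = \tanh\big(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\big), \qquad y_t = W_{hy}\, h_t + b_y

Here x_t is the input at time t, h_{t-1} the previous hidden state, and y_t the output; the same weight matrices W_{xh}, W_{hh}, W_{hy} are reused at every time step.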
The Vanilla RNN Forward
[Figure: the cell applied at successive time steps, with inputs x1, x2, x3, hidden states h0, h1, h2, h3, and outputs/costs y1/C1, y2/C2, y3/C3.]
"Unfold" the network through time by making copies of the cell at each time step.
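A minimal sketch of this forward pass, unfolding the same cell (shared weights) over every time step. Dimensions and random weights are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
d_in, d_hid, d_out, T = 4, 8, 3, 5

W_xh = rng.normal(size=(d_hid, d_in)) * 0.1
W_hh = rng.normal(size=(d_hid, d_hid)) * 0.1
W_hy = rng.normal(size=(d_out, d_hid)) * 0.1
b_h, b_y = np.zeros(d_hid), np.zeros(d_out)

def rnn_forward(xs, h0):
    h, hs, ys = h0, [], []
    for x in xs:                                  # the same weights at every step
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)    # new hidden state
        hs.append(h)
        ys.append(W_hy @ h + b_y)                 # output at this step
    return hs, ys

xs = [rng.normal(size=d_in) for _ in range(T)]
hs, ys = rnn_forward(xs, np.zeros(d_hid))
print(len(hs), ys[-1])
```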
Backpropagation Refresher
[Figure: a single layer y = f(x; W), with input x, parameters W, and output y.]
Multiple Layers
[Figure: stacked layers y1 = f1(x; W1), y2 = f2(y1; W2), feeding a cost C.]
Chain Rule for Gradient Computation
[Figure: the same stacked layers y1 = f1(x; W1), y2 = f2(y1; W2); gradients are computed by application of the chain rule.]
Chain Rule for Gradient Computation
For a single layer y = f(x; W):
Given: ∂C/∂y, the gradient of the cost with respect to the layer's output.
We are interested in computing: ∂C/∂x (to pass to the layer below) and ∂C/∂W (to update the weights).
Intrinsic to the layer are: ∂y/∂x and ∂y/∂W.
By the chain rule: ∂C/∂x = (∂C/∂y)(∂y/∂x) and ∂C/∂W = (∂C/∂y)(∂y/∂W).
Backpropagation Through Time (BPTT)
One of the methods used to train RNNs.
The unfolded network (used during the forward pass) is treated as one big feed-forward network.
This unfolded network accepts the whole time series as input.
The weight updates are computed for each copy in the unfolded network, then summed (or averaged) and applied to the RNN weights.
The Unfolded Vanilla RNN
Treat the unfolded network as one big feed-forward network.
This big network takes in the entire sequence as an input.
Compute gradients through the usual backpropagation.
Update the shared weights.
The Vanilla RNN Backward Recurrent Neural Networks