Different Units Ramakrishna Vedantam

Motivation Recurrent Neural Networks are an extremely powerful class of models. They are useful for a wide range of tasks and are Turing complete in the space of programs.

However, RNNs are difficult to train (as seen in previous classes).

Architectures to facilitate learning/representation: Long Short-Term Memory (Hochreiter and Schmidhuber, 1997), Bidirectional RNNs (Schuster and Paliwal, 1997), Gated Feedback Recurrent Neural Networks (Chung et al., 2015), Tree-Structured LSTM (Tai et al., 2015), Multi-Dimensional RNNs (Graves et al., 2007).

Long Short-Term Memory (LSTM) RNNs use the hidden state to store representations of recent inputs ("short-term memory"), as opposed to the long-term memory stored in the weights. How do we enhance the short-term memory of an RNN so that it remains useful under noisy inputs and long-range dependencies? Long Short-Term Memory!

Image credit: Chris Olah
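
The LSTM diagram itself is not reproduced in this transcript. As a reference (a sketch of the now-standard variant with a forget gate, which the original 1997 formulation did not include; biases omitted), the cell computes:

\[
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1}), \qquad
f_t = \sigma(W_f x_t + U_f h_{t-1}), \qquad
o_t = \sigma(W_o x_t + U_o h_{t-1}),\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1}),\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,\\
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
\]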

From Dhruv’s Lecture

LSTM Can bridge time intervals in excess of 1,000 steps. Handles noisy inputs without compromising its short-time-lag capabilities. The architecture and learning algorithm set up constant error carousels through which error can backpropagate unchanged.

Constant Error Carousel For a linear unit with a self-connection of weight 1.0, the activation is carried forward unchanged, and the error flowing back through the unit is likewise passed back unaffected, so it neither vanishes nor explodes.
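
A quick way to see this (a sketch, using the cell update written above and pinning the forget gate to 1, which recovers the original no-forget-gate LSTM): along the cell-state path the recurrence is purely additive, so its Jacobian is the identity and the backpropagated error is neither shrunk nor amplified:

\[
c_t = c_{t-1} + i_t \odot \tilde{c}_t
\quad\Rightarrow\quad
\frac{\partial c_t}{\partial c_{t-1}} = I
\quad\Rightarrow\quad
\frac{\partial \mathcal{L}}{\partial c_{t-1}} = \frac{\partial \mathcal{L}}{\partial c_t}.
\]

By contrast, a vanilla tanh RNN multiplies the error by $U^\top \mathrm{diag}(1 - h_t \odot h_t)$ at every step, which tends to vanish or explode over long horizons.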

Example: "c a t" vs. "c a r". The two sequences differ only in their final character, so a prediction made earlier in the sequence cannot exploit that future context.

This is a limitation for any sequential (unidirectional) RNN.

Naive Solution Delay the prediction: use the hidden state at a fixed offset (t + M) when making predictions for time t. Problem: M becomes a hyper-parameter to cross-validate*, and the best value differs across tasks. Although Dhruv would tell you that is not an issue at all. (Div-M-best) *Sorry, Dhruv! :)

Another Solution Use two RNNs, one running forward and one running backward, and average their predictions as an ensemble. Problem: this is not a true ensemble (the two networks see different inputs at test time), and it is not clear that averaging their predictions makes sense.

Bidirectional RNN Simple idea: split the hidden state into two halves, one processing the sequence forward and one processing it backward. Image credit: BRNN paper
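
A sketch of the computation (assuming a simple tanh cell; the weight names below are illustrative): one half of the hidden state is produced by a recurrence running left to right, the other half by a recurrence running right to left, and the output at each step reads both halves:

\[
\overrightarrow{h}_t = \tanh(W_f x_t + U_f \overrightarrow{h}_{t-1}),
\qquad
\overleftarrow{h}_t = \tanh(W_b x_t + U_b \overleftarrow{h}_{t+1}),
\qquad
y_t = g\big(V [\overrightarrow{h}_t ; \overleftarrow{h}_t]\big).
\]

Training is ordinary backpropagation through time run along each direction separately; the two halves only interact at the output layer.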

Next Question.

How would we do the forward pass? How would we do the backward pass?

Read and Write Gates! Output: read gate. Hidden state: write gate.

Fast forward 18 years

Different Units Today: GF-RNN, GRU, Tree RNN

Gated Recurrent Unit (GRU) The reset gate (r) helps the unit ignore the previous hidden state; the update gate (z) modulates how much of the previous hidden state and how much of the newly computed candidate state are mixed. Figure credit: Chris Olah
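
For reference (biases omitted), the GRU can be written as below; note that papers differ on whether $z_t$ or $1 - z_t$ multiplies the old state, so treat that choice as a convention:

\[
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}),\\
r_t &= \sigma(W_r x_t + U_r h_{t-1}),\\
\tilde{h}_t &= \tanh\big(W x_t + U (r_t \odot h_{t-1})\big),\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t.
\end{aligned}
\]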

Gated Feedback RNN (GF-RNN) People have been using "stacked" RNNs for a while. The idea is that temporal dependencies are resolved in a hierarchy, with higher layers capturing longer-range structure (Bengio et al.).

RNN Stack

Gated Feedback RNN People have been using "stacked" RNNs for a while; the idea is that temporal dependencies are resolved in a hierarchy. Recent work proposed the Clockwork RNN (CW-RNN), in which units are updated at intervals of 2^i (where i ranges from 1 to N).

Gated Feedback RNN (GF-RNN) Can we learn CW-RNN-like interactions? The Global Reset Gate.
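
As a sketch of the GF-RNN formulation for the tanh-RNN case (reproduced here from memory, so treat the exact notation as approximate), the global reset gate from layer $i$ at time $t-1$ to layer $j$ at time $t$ is a scalar

\[
g^{i \to j} = \sigma\big(\mathbf{w}_g^{i \to j \top} \mathbf{h}_t^{j-1} + \mathbf{u}_g^{i \to j \top} \mathbf{h}_{t-1}^{*}\big),
\]

where $\mathbf{h}_{t-1}^{*}$ is the concatenation of all layers' hidden states at time $t-1$ (and $\mathbf{h}_t^{0} = \mathbf{x}_t$). The layer-$j$ update then mixes gated feedback from every layer:

\[
\mathbf{h}_t^{j} = \tanh\Big(W^{j-1 \to j} \mathbf{h}_t^{j-1} + \sum_{i=1}^{L} g^{i \to j}\, U^{i \to j}\, \mathbf{h}_{t-1}^{i}\Big).
\]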

GF-RNN The gated feedback links between stacked layers can be added to various recurrent units; the paper explores vanilla RNN, LSTM, and GRU variants.

Experiments Character-level language modeling and Python program evaluation. Training objective: negative log-likelihood of the sequences. Evaluation metric: BPC (bits per character).
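
BPC is simply the average negative log-likelihood measured in bits: for a character sequence $x_{1:T}$,

\[
\mathrm{BPC} = -\frac{1}{T} \sum_{t=1}^{T} \log_2 p(x_t \mid x_{<t}),
\]

so lower is better, and it matches the training objective up to the change of logarithm base and per-character normalization.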

Figure credit: main paper (Chung et al., 2015)

Validation BPC

Effect of Global Reset Gates

Python Program Evaluation

An RNN that is not an RNN In practice we always unroll an RNN into a computation graph anyway, so why insist that the unrolled structure be a linear chain?

Meet Tree RNN

Tree RNN At or close to the state of the art on semantic relatedness and sentiment classification benchmarks.
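
For concreteness, the Child-Sum Tree-LSTM of Tai et al. replaces the single previous hidden state with a sum over the children $C(j)$ of node $j$ (a sketch with biases omitted):

\[
\begin{aligned}
\tilde{h}_j &= \textstyle\sum_{k \in C(j)} h_k,\\
i_j &= \sigma(W^{(i)} x_j + U^{(i)} \tilde{h}_j), \qquad
o_j = \sigma(W^{(o)} x_j + U^{(o)} \tilde{h}_j), \qquad
u_j = \tanh(W^{(u)} x_j + U^{(u)} \tilde{h}_j),\\
f_{jk} &= \sigma(W^{(f)} x_j + U^{(f)} h_k) \quad \text{for each child } k \in C(j),\\
c_j &= i_j \odot u_j + \textstyle\sum_{k \in C(j)} f_{jk} \odot c_k, \qquad
h_j = o_j \odot \tanh(c_j).
\end{aligned}
\]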

Many more! Check out this link for more awesome RNNs: https://github.com/kjw0612/awesome-rnn Thanks to Dhruv!

Thank You!