ECE 6504: Deep Learning for Perception
Dhruv Batra, Virginia Tech
Topics:
–Recurrent Neural Networks (RNNs)
–BackProp Through Time (BPTT)
–Vanishing / Exploding Gradients
–[Abhishek:] Lua / Torch Tutorial
Administrivia
HW3
–Out today
–Due in 2 weeks
–Please please please please please start early
Plan for Today
Model
–Recurrent Neural Networks (RNNs)
Learning
–BackProp Through Time (BPTT)
–Vanishing / Exploding Gradients
[Abhishek:] Lua / Torch Tutorial
New Topic: RNNs
Image Credit: Andrej Karpathy
Synonyms
Recurrent Neural Networks (RNNs)
Recursive Neural Networks
–General family; think graphs instead of chains
Types:
–Long Short-Term Memory (LSTMs)
–Gated Recurrent Units (GRUs)
–Hopfield networks
–Elman networks
–…
Algorithms:
–BackProp Through Time (BPTT)
–BackProp Through Structure (BPTS)
What’s wrong with MLPs?
Problem 1: Can’t model sequences
–Fixed-sized inputs & outputs
–No temporal structure
Problem 2: Pure feed-forward processing
–No “memory”, no feedback
Image Credit: Alex Graves, book
Sequences are everywhere…
Image Credit: Alex Graves and Kevin Gimpel
Even where you might not expect a sequence…
Image Credit: Vinyals et al.
Even where you might not expect a sequence…
Input ordering = sequence
Image Credit: Ba et al.; Gregor et al.
Image Credit: [Pinheiro and Collobert, ICML 2014]
Why model sequences?
Figure Credit: Carlos Guestrin
Why model sequences?
Image Credit: Alex Graves
Name that model
Hidden Markov Model (HMM)
[Figure: an HMM chain with hidden states Y1, …, Y5, each taking values in {a, …, z}, emitting observations X1, …, X5.]
Figure Credit: Carlos Guestrin
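For reference, the joint distribution this HMM defines (the standard HMM factorization; a sketch for the reader, not content from the slide):

```latex
P(X_{1:5}, Y_{1:5}) \;=\; P(Y_1)\,P(X_1 \mid Y_1)\,\prod_{t=2}^{5} P(Y_t \mid Y_{t-1})\,P(X_t \mid Y_t)
```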
How do we model sequences?
No input
Image Credit: Bengio, Goodfellow, Courville

How do we model sequences?
With inputs
Image Credit: Bengio, Goodfellow, Courville

How do we model sequences?
With inputs and outputs
Image Credit: Bengio, Goodfellow, Courville
How do we model sequences?
With Neural Nets
Image Credit: Alex Graves
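The figure on this slide shows the recurrence; as a minimal sketch (in Python/NumPy rather than the course's Lua/Torch, and with illustrative names that are not from the slides), a vanilla RNN cell computes h_t = tanh(W_xh x_t + W_hh h_{t-1} + b) and y_t = W_hy h_t, reusing one set of weights at every timestep:

```python
import numpy as np

# Minimal vanilla RNN cell (illustrative sketch, not the slides' code).
# h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b_h);  y_t = W_hy @ h_t
class VanillaRNN:
    def __init__(self, input_dim, hidden_dim, output_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_xh = rng.normal(0, 0.01, (hidden_dim, input_dim))
        self.W_hh = rng.normal(0, 0.01, (hidden_dim, hidden_dim))
        self.W_hy = rng.normal(0, 0.01, (output_dim, hidden_dim))
        self.b_h = np.zeros(hidden_dim)

    def step(self, x_t, h_prev):
        # The same weights are reused at every timestep (parameter sharing).
        h_t = np.tanh(self.W_xh @ x_t + self.W_hh @ h_prev + self.b_h)
        y_t = self.W_hy @ h_t
        return h_t, y_t
```

The later sketches below reuse this VanillaRNN class and the numpy import.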
How do we model sequences?
It’s a spectrum…
–Input: no sequence; Output: no sequence. Example: “standard” classification / regression problems
–Input: no sequence; Output: sequence. Example: Im2Caption
–Input: sequence; Output: no sequence. Example: sentence classification, multiple-choice question answering
–Input: sequence; Output: sequence. Example: machine translation, video captioning, open-ended question answering, video question answering
Image Credit: Andrej Karpathy
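To make the spectrum concrete, a hedged sketch of the two sequence-input cases (reusing the VanillaRNN class from the earlier sketch; the function names and the choice of pooling only the final output are illustrative, not from the slides):

```python
def run_many_to_one(rnn, xs, hidden_dim):
    # Sequence in, single prediction out (e.g., sentence classification):
    # consume the whole input, keep only the last output.
    # Assumes a non-empty input sequence xs.
    h = np.zeros(hidden_dim)
    for x_t in xs:
        h, y_t = rnn.step(x_t, h)
    return y_t  # prediction from the final hidden state

def run_many_to_many(rnn, xs, hidden_dim):
    # Sequence in, sequence out (e.g., per-frame video labeling):
    # emit one output per input timestep.
    h = np.zeros(hidden_dim)
    ys = []
    for x_t in xs:
        h, y_t = rnn.step(x_t, h)
        ys.append(y_t)
    return ys
```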
Things can get arbitrarily complex
Image Credit: Herbert Jaeger
Key Ideas
Parameter Sharing + Unrolling (see the sketch below)
–Keeps the number of parameters in check
–Allows arbitrary sequence lengths!
“Depth”
–Measured in the usual sense of layers
–Not unrolled timesteps
Learning
–Is tricky even for “shallow” models due to unrolling
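A sketch of unrolling with shared parameters (again reusing VanillaRNN from above; illustrative, not the slides' code). The unrolled graph has one copy of the cell per timestep, but every copy points at the same weights, so the parameter count does not grow with sequence length:

```python
def unroll(rnn, xs, hidden_dim):
    # Unrolling: apply the SAME cell (same W_xh, W_hh, W_hy) at every
    # timestep, so the parameter count is independent of len(xs).
    h = np.zeros(hidden_dim)
    hs, ys = [h], []
    for x_t in xs:                 # works for any sequence length
        h, y_t = rnn.step(x_t, h)  # shared parameters reused here
        hs.append(h)
        ys.append(y_t)
    return hs, ys                  # cached states are what BPTT needs
```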
Plan for Today
Model
–Recurrent Neural Networks (RNNs)
Learning
–BackProp Through Time (BPTT)
–Vanishing / Exploding Gradients
[Abhishek:] Lua / Torch Tutorial
BPTT (BackProp Through Time)
Image Credit: Richard Socher
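Since the derivation on this slide is an image, here is a hedged BPTT sketch for the vanilla cell above (the per-step squared-error loss is an assumption for illustration): run the forward pass caching hidden states, then walk backward through time, accumulating gradients into the shared weights at every step:

```python
def bptt(rnn, xs, targets, hidden_dim):
    # Forward pass: cache the hidden state at every timestep.
    h = np.zeros(hidden_dim)
    hs, ys = [h], []
    for x_t in xs:
        h, y_t = rnn.step(x_t, h)
        hs.append(h)   # hs[t + 1] is the state after consuming xs[t]
        ys.append(y_t)

    # Backward pass through time, for loss L = sum_t 0.5 * ||y_t - target_t||^2
    # (the loss choice is an assumption for this sketch).
    dW_xh = np.zeros_like(rnn.W_xh)
    dW_hh = np.zeros_like(rnn.W_hh)
    dW_hy = np.zeros_like(rnn.W_hy)
    db_h = np.zeros_like(rnn.b_h)
    dh_next = np.zeros(hidden_dim)
    for t in reversed(range(len(xs))):
        dy = ys[t] - targets[t]                  # dL/dy_t
        dW_hy += np.outer(dy, hs[t + 1])
        dh = rnn.W_hy.T @ dy + dh_next           # gradient flowing into h_t
        dz = dh * (1.0 - hs[t + 1] ** 2)         # through tanh: 1 - h_t^2
        dW_xh += np.outer(dz, xs[t])             # shared weights accumulate
        dW_hh += np.outer(dz, hs[t])             # gradients from EVERY step
        db_h += dz
        dh_next = rnn.W_hh.T @ dz                # pass gradient back in time
    return dW_xh, dW_hh, dW_hy, db_h
```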
Illustration of the Intuition [Pascanu et al.]
Error surface of a single-hidden-unit RNN, with high-curvature walls.
Solid lines: standard gradient descent trajectories.
Dashed lines: trajectories with the gradient rescaled to fix the problem.
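Why gradients vanish or explode: backpropagating through T timesteps repeatedly multiplies the gradient by W_hh (transposed, times the activation derivative), so its norm behaves roughly like the largest singular value of W_hh raised to the power T. A toy numeric sketch (the scale factors and dimensions are illustrative):

```python
import numpy as np

# Toy demonstration: backpropagating through 50 steps multiplies the
# gradient by W.T each time, so its norm scales like sigma_max(W)^50.
rng = np.random.default_rng(0)
for scale in (0.5, 1.0, 1.5):          # illustrative spectral scales
    W = scale * np.eye(50)             # toy hidden-to-hidden weights
    g = rng.normal(size=50)            # some gradient at the last step
    for _ in range(50):                # 50 timesteps of backprop
        g = W.T @ g
    # scale < 1: norm ~1e-15 (vanishes); scale > 1: norm ~1e+9 (explodes)
    print(scale, np.linalg.norm(g))
```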
Fix #1: Gradient Clipping (pseudocode sketched below)
Image Credit: Richard Socher
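The pseudocode on this slide is an image; assuming Fix #1 is the norm-clipping trick of Pascanu et al. (consistent with the rescaled trajectories on the previous slide), a minimal sketch with an illustrative threshold:

```python
import numpy as np

def clip_gradient(g, threshold=5.0):
    # Norm clipping: if the gradient norm exceeds the threshold, rescale
    # the gradient to have exactly that norm, keeping its direction.
    # The threshold value is illustrative, not from the slides.
    norm = np.linalg.norm(g)
    if norm > threshold:
        g = (threshold / norm) * g
    return g
```

This directly addresses exploding gradients; it does nothing for vanishing gradients, which motivates Fix #2.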
Fix #2: Smart Initialization and ReLUs
–[Socher et al. 2013]
–“A Simple Way to Initialize Recurrent Networks of Rectified Linear Units”, Le et al.
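A minimal sketch of the Le et al. recipe (the "IRNN"): initialize the recurrent matrix to the identity and replace tanh with ReLU, so at initialization the hidden state is copied forward unchanged and gradients do not immediately shrink. The initialization scales below are illustrative assumptions:

```python
import numpy as np

def init_irnn(input_dim, hidden_dim, seed=0):
    # Identity initialization for the recurrent weights (Le et al.);
    # small random input weights (the 0.001 scale is illustrative).
    rng = np.random.default_rng(seed)
    W_hh = np.eye(hidden_dim)
    W_xh = rng.normal(0, 0.001, (hidden_dim, input_dim))
    b_h = np.zeros(hidden_dim)
    return W_xh, W_hh, b_h

def irnn_step(W_xh, W_hh, b_h, x_t, h_prev):
    # ReLU activation instead of tanh.
    return np.maximum(0.0, W_xh @ x_t + W_hh @ h_prev + b_h)
```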