Deep Learning: A Tutorial for Dummies
Param Vir Singh, Shunyuan Zhang, Nikhil Malik
Dokyun Lee: The Deep Learner http://leedokyun.com/deep-learning-reading-list.html
“Deep Learning doesn’t do different things, it does things differently.”
Performance vs. Sample Size (plot: Performance on the y-axis against Size of Data on the x-axis, comparing deep learning algorithms with traditional ML algorithms).
Outline
- Supervised Learning: Convolutional Neural Networks; Sequence Modelling: RNN and its extensions
- Unsupervised Learning: Autoencoder; Stacked Denoising Autoencoder
- Unsupervised (+ Supervised) Learning: Generative Adversarial Networks
- Reinforcement Learning: Deep Reinforcement Learning
GAN example: producing poetry like Shakespeare. A generator network outputs generated Shakespeare poetry; a discriminator network takes generated (fake) and real Shakespeare poetry as input and classifies each as fake or real.
Supervised Learning
Traditional pattern recognition models work with hand-crafted features and relatively simple trainable classifiers: Input -> Extract Hand-Crafted Features -> Trainable Classifier (e.g., SVM, Random Forest) -> Output (e.g., Outdoor: Yes or No).
Limitations:
- Very tedious and costly to develop hand-crafted features.
- The hand-crafted features are usually highly dependent on one application.
Deep Learning
Deep learning has a built-in, automatic, multi-stage feature learning process that learns rich hierarchical representations (i.e., features): Input -> Low-Level Features -> Mid-Level Features -> High-Level Features -> Trainable Classifier -> Output (e.g., outdoor, indoor).
Deep Learning
Each module in deep learning transforms its input representation into a higher-level one, in a way similar to the human cortex.
- Image: Pixel -> Edge -> Texture -> Motif -> Part -> Object
- Text: Character -> Word -> Word-group -> Clause -> Sentence -> Story
Let us see how it all works!
A Simple Neural Network
An Artificial Neural Network is an information-processing paradigm inspired by biological nervous systems, such as the human brain’s information-processing mechanism.
(Diagram: inputs x1–x4 feed hidden units a1(1)–a4(1), which feed a1(2), which produces the output Y.)
A Simple Neural Network
Each hidden unit applies an activation function $f(\cdot)$ to a weighted sum of its inputs:
$a_1^{(1)} = f(w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4)$
Common activation functions are ReLU and the sigmoid:
- ReLU: $f(x) = \max(0, x)$, so $a_1^{(1)} = \max(0,\; w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4)$
- Sigmoid: $f(x) = \frac{1}{1 + e^{-x}}$, used for example at the output, $Y = \frac{1}{1 + e^{-w \cdot a_1^{(2)}}}$ (a softmax for multiple classes).
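As a concrete illustration, here is a minimal NumPy sketch of this forward pass; the input values and weights below are made up for illustration, and a real network would also include bias terms:

```python
import numpy as np

def relu(x):
    # ReLU activation: max(0, x), applied element-wise
    return np.maximum(0, x)

def sigmoid(x):
    # Sigmoid activation: 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

# Made-up example: 4 inputs, one hidden layer of 4 units, one output
x = np.array([0.5, -1.0, 2.0, 0.1])   # inputs x1..x4
W1 = np.random.randn(4, 4) * 0.1      # input-to-hidden weights
w2 = np.random.randn(4) * 0.1         # hidden-to-output weights

a1 = relu(W1 @ x)       # hidden activations a1(1)..a4(1)
y = sigmoid(w2 @ a1)    # output Y in (0, 1)
print(a1, y)
```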
Number of Parameters: for the network above, 4*4 + 4 + 1 = 21 (Input -> Hidden Layer -> Softmax -> Output Y).
What if the input is an image? A 400 x 400 x 3 image has 480,000 input values.
Number of parameters:
- With a hidden layer as wide as the input: 480,000 * 480,000 + 480,000 + 1 = approximately 230 billion!
- Even with only 1,000 hidden units: 480,000 * 1,000 + 1,000 + 1 = approximately 480 million!
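A quick back-of-the-envelope check of these counts, using the same formula as the slide (the 1,000-unit hidden layer is the slide's illustrative choice):

```python
# Parameter count for one fully connected hidden layer:
# (inputs * hidden units) + (hidden-to-output weights) + 1
inputs = 400 * 400 * 3                  # 480,000 input values
print(inputs * inputs + inputs + 1)     # hidden layer as wide as the input: ~230 billion
print(inputs * 1000 + 1000 + 1)         # only 1,000 hidden units: ~480 million
```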
Let us see how convolutional layers help.
Convolutional Layers
A small filter (e.g., a 3x3 edge-detection filter) is slid over the input image to produce the convolved image. Inspired by the neurophysiological experiments conducted by Hubel and Wiesel (1962).
Convolutional Layers: What is Convolution?
A 2x2 filter with weights $w_1, w_2, w_3, w_4$ slides over a 4x4 input image with pixels a–p to produce the convolved image (feature map):
$h_1 = f(a\,w_1 + b\,w_2 + e\,w_3 + f\,w_4)$
$h_2 = f(b\,w_1 + c\,w_2 + f\,w_3 + g\,w_4)$
Number of parameters for one feature map = 4; for 100 feature maps = 4 * 100 = 400.
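A minimal NumPy sketch of this sliding-window computation; the image and filter values below are made up, and in a real network the filter weights are learned:

```python
import numpy as np

def conv2d(image, kernel, activation=lambda x: np.maximum(0, x)):
    """Valid 2-D convolution (cross-correlation, as in most deep learning libraries):
    slide the filter over the image and apply the activation f()."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return activation(out)

# 4x4 input (the pixels a..p on the slide) and a 2x2 filter (w1..w4)
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])   # made-up weights for illustration
feature_map = conv2d(image, kernel)            # 3x3 feature map; only 4 parameters
print(feature_map)
```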
Lower-Level to More Complex Features
Filter 1 maps the input image to the Layer 1 feature map; Filter 2 maps the Layer 1 feature map to the Layer 2 feature map. In convolutional neural networks, hidden units are connected only to a local receptive field.
Pooling
- Max pooling: reports the maximum output within a rectangular neighborhood.
- Average pooling: reports the average output of a rectangular neighborhood.
Example: MaxPool with a 2x2 filter and stride of 2 applied to the input matrix [[1, 3, 5, 4], [2, 4, 5, 3]] gives the output matrix [[4, 5]].
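A small NumPy sketch of max pooling over the example above:

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling: report the maximum within each size x size neighborhood."""
    H, W = x.shape
    out = np.zeros((H // stride, W // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = patch.max()
    return out

x = np.array([[1., 3., 5., 4.],
              [2., 4., 5., 3.]])
print(max_pool(x))   # [[4., 5.]] with a 2x2 filter and stride 2
```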
Convolutional Neural Network
Feature extraction architecture: stacked convolution (filter) and max-pool layers with 64, 128, 256, 512, and 512 feature maps, followed by fully connected layers that produce the output vector over classes such as Living Room, Bedroom, Kitchen, Bathroom, and Outdoor.
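A sketch of such a stack in PyTorch, assuming 64x64 RGB inputs, 3x3 filters, and five scene classes; the depths, kernel sizes, and input size here are illustrative choices, not the slide's exact architecture:

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Conv -> ReLU -> MaxPool blocks for feature extraction, then fully connected layers."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 8 * 8, 512), nn.ReLU(),
            nn.Linear(512, num_classes),   # scores for Living Room, Bedroom, Kitchen, ...
        )

    def forward(self, x):
        return self.classifier(self.features(x))

scores = SmallCNN()(torch.randn(1, 3, 64, 64))   # one fake 64x64 RGB image
print(scores.shape)                              # torch.Size([1, 5])
```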
Convolutional Neural Networks: practical notes
- Output: binary, multinomial, continuous, or count.
- Input: fixed size; padding can be used to make all images the same size.
- Architecture: the choice is ad hoc and requires experimentation.
- Optimization: backward propagation; the parameters of a very deep model can be estimated properly only if you have billions of images, so reuse an architecture and trained parameters from other work (ImageNet models, Microsoft/Google APIs, etc.), as sketched below.
- Computing power: buy a GPU!!
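For the "reuse an architecture and trained parameters" point, a minimal transfer-learning sketch with PyTorch/torchvision; the choice of ResNet-18 and the five-class head are assumptions for illustration:

```python
import torch.nn as nn
import torchvision.models as models

# Load an ImageNet-pretrained backbone instead of training from scratch.
# (Newer torchvision versions use the weights= argument instead of pretrained=True.)
model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False                 # freeze the pretrained feature extractor
model.fc = nn.Linear(model.fc.in_features, 5)   # new head for our 5 scene classes
```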
Automatic Colorization of Black and White Images
Optimizing Images
Post-processing feature optimization: color curves and details, illumination, and color tone (warmness).
Recurrent Neural Networks
Why RNN? Limitations of convolutional neural networks:
- They take fixed-length vectors as input and produce fixed-length vectors as output.
- They allow a fixed number of computational steps.
We need to model data with temporal or sequential structure and varying-length inputs and outputs, e.g.:
- "This movie is ridiculously good."
- "This movie is very slow in the beginning but picks up pace later on and has some great action sequences and comedy scenes."
Modeling Sequences
- Sentiment analysis: "Awesome tutorial." -> Positive
- Image captioning: image -> "A person riding a motorbike on a dirt road"
- Machine translation: "Happy Diwali" -> "शुभ दीपावली"
What is an RNN?
Recurrent neural networks are connectionist models with the ability to selectively pass information across sequence steps while processing sequential data one element at a time. They allow a memory of the previous inputs to persist in the model’s internal state and influence the outcome.
(Diagram: the input x(t) and the delayed hidden state h(t-1) feed the hidden layer, which produces h(t) and the output.)
RNN (unrolled over time): "RNN is awesome"
At each step, the hidden state is updated from the previous hidden state and the current input:
$h_t = f_h(w_h \cdot h_{t-1} + w_x \cdot x_t)$
and an output is produced from the hidden state, $y_t = f_y(w_y \cdot h_t)$, e.g., predicting "is awesome" after reading "RNN".
(Diagram: h0 -> h1 -> h2 -> h3, with inputs x1, x2, x3 and shared weights $w_h$, $w_x$, $w_y$.)
RNN (unrolled over time): "RNN is so cool"
The same structure, with the same weights $w_h$, $w_x$, $w_y$ and functions $f_h$, $f_y$, is applied to this longer sequence: each input x_t and the previous hidden state produce h_t, from which the output is computed via $f_y(w_y \cdot h_t)$.
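A minimal NumPy sketch of this recurrence with scalar states; real RNNs use weight matrices and vector-valued hidden states, and the weights below are made up:

```python
import numpy as np

def rnn_forward(xs, w_h, w_x, w_y, h0=0.0):
    """Vanilla RNN mirroring h_t = f_h(w_h*h_{t-1} + w_x*x_t), with f_h = tanh."""
    h = h0
    outputs = []
    for x in xs:                        # process one element of the sequence at a time
        h = np.tanh(w_h * h + w_x * x)  # hidden state carries memory forward
        outputs.append(w_y * h)         # y_t = f_y(w_y * h_t); identity output here
    return outputs, h

# Made-up scalar weights and a toy input sequence of arbitrary length
ys, h_last = rnn_forward(xs=[0.5, -1.0, 2.0, 0.3], w_h=0.8, w_x=1.2, w_y=0.5)
print(ys)
```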
The Vanishing Gradient Problem
- RNNs use backpropagation, and backpropagation uses the chain rule.
- The chain rule multiplies derivatives: if these derivatives are between 0 and 1, the product vanishes as the chain gets longer; if they are greater than 1, the product explodes.
- The sigmoid activation function in an RNN leads to this problem. ReLU, in theory, avoids it, but not in practice.
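A two-line numeric illustration of why repeated multiplication over many time steps is a problem:

```python
# The chain rule over many time steps multiplies many derivatives together:
# values below 1 shrink toward 0 (vanishing), values above 1 blow up (exploding).
steps = 50
print(0.9 ** steps)   # ~0.005 -> the gradient effectively vanishes
print(1.1 ** steps)   # ~117   -> the gradient explodes
```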
Problem with Vanishing or Exploding Gradients
They don't allow us to learn long-term dependencies.
"Param is a hard worker." vs. "Param, student of Yong, is a hard worker." — with the longer gap, a plain RNN fails to connect "Param" to "is a hard worker": bad, misguided, unacceptable!
Long Short-Term Memory (LSTM)
LSTMs provide a solution to the vanishing/exploding gradient problem.
Solution: a memory cell, which is updated at each step in the sequence. Three gates control the flow of information to and from the memory cell:
- Input gate: protects the current step from irrelevant inputs.
- Output gate: prevents the current step from passing irrelevant information to later steps.
- Forget gate: limits the information passed from one cell to the next.
LSTM: the memory cell
A candidate update $u_t = f_h(w_h \cdot h_{t-1} + w_x \cdot x_t)$ is combined with the previous cell state using the forget gate $f_t$ and the input gate $i_t$:
$c_t = f_t \cdot c_{t-1} + i_t \cdot u_t$ (e.g., $c_1 = f_1 \cdot c_0 + i_1 \cdot u_1$).
LSTM: the forget gate
$f_t = f_f(W_{hf} \cdot h_{t-1} + W_{xf} \cdot x_t)$ (e.g., $f_1 = f_f(W_{hf} \cdot h_0 + W_{xf} \cdot x_1)$).
LSTM: the input gate
$i_t = f_i(W_{hi} \cdot h_{t-1} + W_{xi} \cdot x_t)$ (e.g., $i_1 = f_i(W_{hi} \cdot h_0 + W_{xi} \cdot x_1)$).
LSTM: the output gate and hidden state
$o_t = f_o(W_{ho} \cdot h_{t-1} + W_{xo} \cdot x_t)$, and the new hidden state is $h_t = o_t \cdot \tanh(c_t)$.
LSTM (unrolled over two steps)
The same cell is applied at every step: at step 1, the gates $f_1$, $i_1$, $o_1$ and candidate $u_1$ update $c_0$ to $c_1$ and produce $h_1$; at step 2, $f_2$, $i_2$, $o_2$ and $u_2$ update $c_1$ to $c_2$ and produce $h_2$.
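Putting the gate equations together, a minimal NumPy sketch of one LSTM step with scalar states; real LSTMs use weight matrices, vectors, and bias terms, and the weights below are made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W):
    """One LSTM step following the slide's equations; W holds the (W_h*, W_x*) weight
    pairs for each gate, and biases are omitted for brevity."""
    f_t = sigmoid(W['hf'] * h_prev + W['xf'] * x_t)   # forget gate
    i_t = sigmoid(W['hi'] * h_prev + W['xi'] * x_t)   # input gate
    o_t = sigmoid(W['ho'] * h_prev + W['xo'] * x_t)   # output gate
    u_t = np.tanh(W['hu'] * h_prev + W['xu'] * x_t)   # candidate update
    c_t = f_t * c_prev + i_t * u_t                    # new memory cell
    h_t = o_t * np.tanh(c_t)                          # new hidden state
    return h_t, c_t

# Made-up scalar weights for illustration
W = dict(hf=0.5, xf=0.8, hi=0.4, xi=0.9, ho=0.6, xo=0.7, hu=0.3, xu=1.0)
h, c = 0.0, 0.0
for x in [0.5, -1.0, 2.0]:        # roll the same cell over the sequence
    h, c = lstm_step(x, h, c, W)
print(h, c)
```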
Combining CNN and LSTM
Visual Question Answering