Deep Learning

About the Course
CS6501: Vision and Language
Instructor: Vicente Ordonez
Email: vicente@virginia.edu
Website: http://www.cs.virginia.edu/~vicente/vislang
Location: Thornton Hall E316
Times: Tuesday - Thursday 12:30PM - 1:45PM
Faculty Office hours: Tuesdays 3 - 4pm (Rice 310)
Discuss in Piazza: http://piazza.com/virginia/spring2017/cs6501004

Today: Quick review of Machine Learning. Linear Regression. Neural Networks. Backpropagation.

Linear Regression

Prediction, Inference, Testing:
$a_j = \sum_i w_{ji} x_i + b_j$, or in matrix form $a = W^T x + b$

Training, Learning, Parameter estimation (objective minimization):
$L(W, b) = \sum_{d=1}^{|D|} l(a^{(d)}, y^{(d)})$ over the dataset $D = \{(x^{(d)}, y^{(d)})\}$
$W^*, b^* = \operatorname{argmin}_{W, b} L(W, b)$
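
A minimal NumPy sketch of the prediction step, with variable names following the formulas above (the sizes and values are made-up placeholders):

import numpy as np

W = np.random.randn(5, 2)   # weight matrix, entry W[i, j] plays the role of w_ji
b = np.zeros(2)             # bias vector b_j
x = np.random.randn(5)      # one input example x

a = W.T @ x + b             # prediction: a = W^T x + b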

Linear Regression Example: Hollywood movie data. Each movie $d$ is a row of measurements $x_i^{(d)}$ over variables such as production costs, promotional costs, genre of the movie, box office first week, total book sales, total revenue USA, and total revenue international.

Linear Regression Example: Hollywood movie data. Split the columns into input variables $x$ (production costs, promotional costs, genre of the movie, box office first week, total book sales) and output variables $y$ (total revenue USA, total revenue international), giving one pair $(x^{(d)}, y^{(d)})$ per movie with $x^{(d)} \in \mathbb{R}^5$ and $y^{(d)} \in \mathbb{R}^2$.

Linear Regression Example: Hollywood movie data. Split the rows into training data (movies 1-4), used to fit the model, and test data (movie 5), used to evaluate it.
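
A small sketch of this data layout and split in NumPy (the array contents are random placeholders, not real movie numbers):

import numpy as np

data = np.random.randn(5, 7)       # rows = movies, columns = 5 input variables + 2 revenue outputs
X, Y = data[:, :5], data[:, 5:]    # input variables x, output variables y

X_train, Y_train = X[:4], Y[:4]    # movies 1-4: training data
X_test,  Y_test  = X[4:], Y[4:]    # movie 5: test data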

Linear Regression – Least Squares

Training, Learning, Parameter estimation (objective minimization):
$a_j = \sum_i w_{ji} x_i + b_j$, or $a = W^T x + b$
$L(W, b) = \sum_{d=1}^{|D|} l(a^{(d)}, y^{(d)})$ over $D = \{(x^{(d)}, y^{(d)})\}$
$W^*, b^* = \operatorname{argmin}_{W, b} L(W, b)$

Linear Regression – Least Squares

Choosing the squared error as the loss:
$a_j = \sum_i w_{ji} x_i + b_j$, or $a = W^T x + b$
$L(W, b) = \sum_{d=1}^{|D|} \lVert a^{(d)} - y^{(d)} \rVert^2$ over $D = \{(x^{(d)}, y^{(d)})\}$
$W^*, b^* = \operatorname{argmin}_{W, b} L(W, b)$

Linear Regression – Least Squares
$L(W, b) = \sum_{d=1}^{|D|} \lVert a^{(d)} - y^{(d)} \rVert^2$ with $a_j^{(d)} = \sum_i w_{ji} x_i^{(d)} + b_j$
For a single output dimension $j$:
$L_j(W, b) = \sum_{d=1}^{|D|} \left( \sum_i w_{ji} x_i^{(d)} + b_j - y_j^{(d)} \right)^2$

Linear Regression – Least Squares
$L_j(W, b) = \sum_{d=1}^{|D|} \left( \sum_i w_{ji} x_i^{(d)} + b_j - y_j^{(d)} \right)^2$, and $W^*, b^* = \operatorname{argmin}_{W, b} L(W, b)$
Differentiating with respect to a weight $w_{uv}$:
$\frac{d L_j}{d w_{uv}}(W, b) = \frac{d}{d w_{uv}} \left( \sum_{d=1}^{|D|} \left( \sum_i w_{ji} x_i^{(d)} + b_j - y_j^{(d)} \right)^2 \right)$

Linear Regression – Least Squares
$\frac{d L_j}{d w_{uv}}(W, b) = \frac{d}{d w_{uv}} \left( \sum_{d=1}^{|D|} \left( \sum_i w_{ji} x_i^{(d)} + b_j - y_j^{(d)} \right)^2 \right) = \sum_{d=1}^{|D|} \frac{d}{d w_{uv}} \left( \sum_i w_{ji} x_i^{(d)} + b_j - y_j^{(d)} \right)^2$
Setting this derivative to 0 and solving gives the closed-form least-squares solution:
$W = (X^T X)^{-1} X^T Y$
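
A small NumPy sketch of this closed-form solution (the data are random placeholders; folding the bias $b$ into $W$ via an extra constant-1 column is an assumption, since the slide keeps $b$ separate):

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))          # 100 examples, 5 input variables (placeholder data)
Y = rng.standard_normal((100, 2))          # 2 output variables (placeholder data)

Xb = np.hstack([X, np.ones((100, 1))])     # append a 1s column so the bias is learned inside W
W = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ Y    # W = (X^T X)^{-1} X^T Y
# np.linalg.lstsq(Xb, Y, rcond=None) solves the same problem more robustly.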

Neural Network with One Layer
$W = [w_{ji}]$
(Figure: inputs $x_1, \dots, x_5$ fully connected to outputs $a_1, a_2$.)
$a_j = \mathrm{sigmoid}\left( \sum_i w_{ji} x_i + b_j \right)$
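
A minimal NumPy sketch of this one-layer network's forward computation, matching the figure's sizes of 5 inputs and 2 outputs (the weights and input are placeholders):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.random.randn(2, 5)    # row j holds the weights w_ji of output unit j
b = np.zeros(2)
x = np.random.randn(5)

a = sigmoid(W @ x + b)       # a_j = sigmoid(sum_i w_ji x_i + b_j)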

Neural Network with One Layer
$L(W, b) = \sum_{d=1}^{|D|} \lVert a^{(d)} - y^{(d)} \rVert^2$ with $a_j^{(d)} = \mathrm{sigmoid}\left( \sum_i w_{ji} x_i^{(d)} + b_j \right)$
$L_j(W, b) = \sum_{d=1}^{|D|} \left( \mathrm{sigmoid}\left( \sum_i w_{ji} x_i^{(d)} + b_j \right) - y_j^{(d)} \right)^2$

Neural Network with One Layer
$L_j(W, b) = \sum_{d=1}^{|D|} \left( \mathrm{sigmoid}\left( \sum_i w_{ji} x_i^{(d)} + b_j \right) - y_j^{(d)} \right)^2$
$\frac{d L_j}{d w_{uv}} = \frac{d}{d w_{uv}} \sum_{d=1}^{|D|} \left( \mathrm{sigmoid}\left( \sum_i w_{ji} x_i^{(d)} + b_j \right) - y_j^{(d)} \right)^2 = 0$
We can compute this derivative, but there is no closed-form solution for $W$ when $dL/dw = 0$.

Gradient Descent
(Illustrated on a plot of $L(w)$ versus $w$.)
1. Start with a random value of w (e.g. w = 12).
2. Compute the gradient (derivative) of L(w) at the point w = 12 (e.g. dL/dw = 6).
3. Recompute w as: w = w - lambda * (dL/dw).

Gradient Descent
λ = 0.01
Initialize w and b randomly
for e = 0, num_epochs do
    Compute: dL(w,b)/dw and dL(w,b)/db
    Update w: w = w - λ dL(w,b)/dw
    Update b: b = b - λ dL(w,b)/db
    Print: L(w,b) // Useful to see if this is becoming smaller or not.
end
Here $L(w, b) = \sum_{i=1}^{n} l(w, b)$ is summed over the entire training set, which makes every gradient computation expensive.
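
A runnable sketch of this loop for the one-layer sigmoid network with the squared loss (data, learning rate, and epoch count are placeholders; the gradient expressions come from applying the chain rule by hand, which the slide leaves implicit):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))        # placeholder inputs
Y = rng.uniform(size=(100, 2))           # placeholder targets in (0, 1)

W = rng.standard_normal((5, 2)) * 0.1    # initialize w and b randomly
b = np.zeros(2)
lam, num_epochs = 0.01, 500              # lambda and number of epochs

for e in range(num_epochs):
    A = sigmoid(X @ W + b)               # forward pass over the whole training set
    dZ = 2.0 * (A - Y) * A * (1.0 - A)   # dL/dz through the squared loss and the sigmoid
    dW = X.T @ dZ                        # dL/dW
    db = dZ.sum(axis=0)                  # dL/db
    W -= lam * dW                        # update w
    b -= lam * db                        # update b
    if e % 100 == 0:
        print(np.sum((A - Y) ** 2))      # print L(w, b) to check that it keeps shrinking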

Stochastic Gradient Descent
λ = 0.01
Initialize w and b randomly
for e = 0, num_epochs do
    Sample a mini-batch B from the training data
    Compute: dL_B(w,b)/dw and dL_B(w,b)/db
    Update w: w = w - λ dL_B(w,b)/dw
    Update b: b = b - λ dL_B(w,b)/db
    Print: L_B(w,b) // Useful to see if this is becoming smaller or not.
end
Here $L_B(w, b) = \sum_{i=1}^{|B|} l(w, b)$ is computed on the mini-batch B only, so each update is much cheaper than in full-batch gradient descent.
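
Continuing the full-batch sketch above (same model, data, and learning rate), the stochastic version only changes the inner update to use mini-batches; the batch size and per-epoch shuffle are illustrative choices:

batch_size = 16
n = X.shape[0]
for e in range(num_epochs):
    perm = rng.permutation(n)                # shuffle, then walk through mini-batches B
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]
        Xb, Yb = X[idx], Y[idx]
        A = sigmoid(Xb @ W + b)
        dZ = 2.0 * (A - Yb) * A * (1.0 - A)
        W -= lam * (Xb.T @ dZ)               # update w with dL_B/dw
        b -= lam * dZ.sum(axis=0)            # update b with dL_B/db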

Deep Learning Lab
$a_j = \mathrm{sigmoid}\left( \sum_i w_{ji} x_i + b_j \right)$

Two Layer Neural Network
(Figure: inputs $x_1, \dots, x_4$ feed a hidden layer $a_1, \dots, a_4$, which feeds a single output $\hat{y}_1$ compared against the target $y_1$.)

Forward Pass
For the two-layer network above (inputs $x_i$, hidden units $a_j$, output $\hat{y}_1$):
$z_j = \sum_{i=0}^{n} w_{1ij} x_i + b_1$, $a_j = \mathrm{Sigmoid}(z_j)$
$p_1 = \sum_{j=0}^{n} w_{2j} a_j + b_2$, $\hat{y}_1 = \mathrm{Sigmoid}(p_1)$
$\mathrm{Loss} = L(\hat{y}_1, y_1)$
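
A sketch of this forward pass in NumPy, using the figure's sizes of 4 inputs and 4 hidden units (weights, input, and target are placeholders):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x, y1 = rng.standard_normal(4), 1.0                 # one input example and its target

W1, b1 = rng.standard_normal((4, 4)), np.zeros(4)   # first layer: W1[i, j] = w_1ij
W2, b2 = rng.standard_normal(4), 0.0                # second layer: W2[j] = w_2j

z = W1.T @ x + b1             # z_j = sum_i w_1ij x_i + b_1
a = sigmoid(z)                # a_j = Sigmoid(z_j)
p1 = W2 @ a + b2              # p_1 = sum_j w_2j a_j + b_2
y1_hat = sigmoid(p1)          # predicted output
loss = (y1_hat - y1) ** 2     # Loss = L(y1_hat, y1), here the squared error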

Backward Pass - Backpropagation
Apply the chain rule from the loss back through every intermediate quantity and parameter:
$\frac{\partial L}{\partial \hat{y}_1} = \frac{\partial}{\partial \hat{y}_1} L(\hat{y}_1, y_1)$
$\frac{\partial L}{\partial p_1} = \left( \frac{\partial}{\partial p_1} \mathrm{Sigmoid}(p_1) \right) \frac{\partial L}{\partial \hat{y}_1}$
$\frac{\partial L}{\partial a_j} = \left( \frac{\partial}{\partial a_j} \left( \sum_{k=0}^{n} w_{2k} a_k + b_2 \right) \right) \frac{\partial L}{\partial p_1}$,  $\frac{\partial L}{\partial w_{2j}} = \frac{\partial p_1}{\partial w_{2j}} \frac{\partial L}{\partial p_1}$ (GradParams)
$\frac{\partial L}{\partial z_j} = \left( \frac{\partial}{\partial z_j} \mathrm{Sigmoid}(z_j) \right) \frac{\partial L}{\partial a_j}$
$\frac{\partial L}{\partial x_i} = \left( \frac{\partial}{\partial x_i} \left( \sum_{k=0}^{n} w_{1kj} x_k + b_1 \right) \right) \frac{\partial L}{\partial z_j}$ (GradInputs),  $\frac{\partial L}{\partial w_{1ij}} = \frac{\partial z_j}{\partial w_{1ij}} \frac{\partial L}{\partial z_j}$ (GradParams)
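
Continuing the forward-pass sketch above (reusing x, y1, W1, W2, a, and y1_hat from it), the corresponding backward pass uses Sigmoid'(s) = s (1 - s) and the squared-error loss:

dL_dy1 = 2.0 * (y1_hat - y1)                # dL/d y1_hat for the squared error
dL_dp1 = y1_hat * (1.0 - y1_hat) * dL_dy1   # through the output Sigmoid
dL_dW2 = a * dL_dp1                         # GradParams: dL/d w_2j = a_j * dL/dp_1
dL_db2 = dL_dp1
dL_da = W2 * dL_dp1                         # dL/d a_j = w_2j * dL/dp_1
dL_dz = a * (1.0 - a) * dL_da               # through the hidden Sigmoid
dL_dW1 = np.outer(x, dL_dz)                 # GradParams: dL/d w_1ij = x_i * dL/dz_j
dL_db1 = dL_dz
dL_dx = W1 @ dL_dz                          # GradInputs: dL/d x_i = sum_j w_1ij dL/dz_j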

Layer-wise implementation
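
The code on this slide did not survive the transcript; as a sketch of what a layer-wise implementation usually looks like (the class and method names here are illustrative assumptions, not the course's exact code), each layer exposes a forward method and a backward method that returns the gradient with respect to its input while storing the gradients of its parameters:

import numpy as np

class Linear:
    def __init__(self, n_in, n_out):
        self.W = np.random.randn(n_in, n_out) * 0.1
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                               # cache the input for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out):
        self.dW = np.outer(self.x, grad_out)     # GradParams
        self.db = grad_out
        return self.W @ grad_out                 # GradInputs, passed to the previous layer

class Sigmoid:
    def forward(self, x):
        self.out = 1.0 / (1.0 + np.exp(-x))
        return self.out

    def backward(self, grad_out):
        return self.out * (1.0 - self.out) * grad_out

Layers are then chained: forward calls run in order, and backward calls run in reverse, each consuming the gradient produced by the layer after it.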

Automatic Differentiation
You only need to write code for the forward pass; the backward pass is computed automatically.
https://github.com/nlintz/TensorFlow-Tutorials/blob/master/03_net.ipynb
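
For example, in TensorFlow 2 (a newer API than the notebook linked above, so treat this as an illustrative sketch rather than the course's exact code), a GradientTape records the forward pass and produces the gradients:

import tensorflow as tf

x = tf.random.normal((8, 4))                 # placeholder batch of inputs
y = tf.random.uniform((8, 1))                # placeholder targets
W = tf.Variable(tf.random.normal((4, 1)))
b = tf.Variable(tf.zeros((1,)))

with tf.GradientTape() as tape:
    y_hat = tf.sigmoid(tf.matmul(x, W) + b)  # forward pass only
    loss = tf.reduce_sum((y_hat - y) ** 2)

dW, db = tape.gradient(loss, [W, b])         # backward pass derived automatically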

Questions?