Neural Networks: Learning

Neural Networks: Learning - Cost function

Neural Network (Classification)

[Figure: a four-layer feedforward network, Layer 1 (input) through Layer 4 (output).]

$L$ = total no. of layers in network
$s_l$ = no. of units (not counting bias unit) in layer $l$

Binary classification: $y \in \{0, 1\}$, 1 output unit.

Multi-class classification ($K$ classes): $y \in \mathbb{R}^K$, $K$ output units. E.g. pedestrian, car, motorcycle, truck, encoded as $[1\,0\,0\,0]^T$, $[0\,1\,0\,0]^T$, $[0\,0\,1\,0]^T$, $[0\,0\,0\,1]^T$.
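In the multi-class case, integer labels $y^{(i)} \in \{1, \ldots, K\}$ are expanded into one-hot rows before computing the cost. A minimal Octave sketch (the variable names are illustrative, not from the slides):

% Expand integer labels y (m x 1, values in 1..K) into a one-hot matrix Y (m x K).
I = eye(K);
Y = I(y, :);   % row i of Y is the one-hot encoding of label y(i)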

Cost function

Logistic regression:
$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$

Neural network ($h_\Theta(x) \in \mathbb{R}^K$, $(h_\Theta(x))_k$ = $k$-th output):
$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log (h_\Theta(x^{(i)}))_k + (1 - y_k^{(i)}) \log(1 - (h_\Theta(x^{(i)}))_k) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left( \Theta_{ji}^{(l)} \right)^2$
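As a concrete reference, here is a minimal vectorized Octave sketch of this cost for a 3-layer network (one hidden layer); X, the one-hot matrix Y, Theta1, Theta2, and lambda are assumed to exist, and the 3-layer restriction is a simplification of the slide's general formula:

sigmoid = @(z) 1 ./ (1 + exp(-z));    % logistic activation
a1 = [ones(m,1) X];                   % add bias column to inputs
a2 = [ones(m,1) sigmoid(a1 * Theta1')];
h  = sigmoid(a2 * Theta2');           % m x K matrix of hypotheses
J  = -(1/m) * sum(sum(Y .* log(h) + (1 - Y) .* log(1 - h))) ...
     + (lambda/(2*m)) * (sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));
% The bias columns Theta1(:,1) and Theta2(:,1) are excluded from regularization.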

Neural Networks: Learning - Backpropagation algorithm

Gradient computation

Need code to compute:
$J(\Theta)$
$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$

Gradient computation

Given one training example $(x, y)$:

Forward propagation:
$a^{(1)} = x$
$z^{(2)} = \Theta^{(1)} a^{(1)}$
$a^{(2)} = g(z^{(2)})$ (add $a_0^{(2)}$)
$z^{(3)} = \Theta^{(2)} a^{(2)}$
$a^{(3)} = g(z^{(3)})$ (add $a_0^{(3)}$)
$z^{(4)} = \Theta^{(3)} a^{(3)}$
$a^{(4)} = h_\Theta(x) = g(z^{(4)})$

[Figure: the same four-layer network, Layer 1 through Layer 4.]
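These steps translate almost line for line into Octave for a single example x (a column vector); Theta1, Theta2, Theta3 for the four-layer network of the figure, plus the sigmoid helper from the earlier sketch, are assumed:

a1 = [1; x];                          % add bias unit a0 = 1
z2 = Theta1 * a1;  a2 = [1; sigmoid(z2)];
z3 = Theta2 * a2;  a3 = [1; sigmoid(z3)];
z4 = Theta3 * a3;  a4 = sigmoid(z4);  % a4 = h_Theta(x)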

Gradient computation: Backpropagation algorithm

Intuition: $\delta_j^{(l)}$ = "error" of node $j$ in layer $l$.

For each output unit (layer $L = 4$):
$\delta_j^{(4)} = a_j^{(4)} - y_j$

For the hidden layers:
$\delta^{(3)} = (\Theta^{(3)})^T \delta^{(4)} .* g'(z^{(3)})$
$\delta^{(2)} = (\Theta^{(2)})^T \delta^{(3)} .* g'(z^{(2)})$

where $g'(z^{(l)}) = a^{(l)} .* (1 - a^{(l)})$, and there is no $\delta^{(1)}$ (the inputs have no error term).

[Figure: the four-layer network, with the $\delta$ terms computed from right to left.]

Backpropagation algorithm

Training set $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$

Set $\Delta_{ij}^{(l)} = 0$ (for all $l, i, j$).
For $i = 1$ to $m$:
    Set $a^{(1)} = x^{(i)}$
    Perform forward propagation to compute $a^{(l)}$ for $l = 2, 3, \ldots, L$
    Using $y^{(i)}$, compute $\delta^{(L)} = a^{(L)} - y^{(i)}$
    Compute $\delta^{(L-1)}, \delta^{(L-2)}, \ldots, \delta^{(2)}$
    $\Delta_{ij}^{(l)} := \Delta_{ij}^{(l)} + a_j^{(l)} \delta_i^{(l+1)}$

$D_{ij}^{(l)} := \frac{1}{m} \Delta_{ij}^{(l)} + \lambda \Theta_{ij}^{(l)}$ if $j \neq 0$
$D_{ij}^{(l)} := \frac{1}{m} \Delta_{ij}^{(l)}$ if $j = 0$

It can then be shown that $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)}$.
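A minimal Octave sketch of this loop for a 3-layer network (a simplification of the general algorithm above; X, the one-hot matrix Y, Theta1, Theta2, lambda, and sigmoid are assumed from the earlier sketches):

Delta1 = zeros(size(Theta1));
Delta2 = zeros(size(Theta2));
for i = 1:m
  a1 = [1; X(i,:)'];                        % forward propagation
  z2 = Theta1 * a1;  a2 = [1; sigmoid(z2)];
  z3 = Theta2 * a2;  a3 = sigmoid(z3);
  d3 = a3 - Y(i,:)';                        % output-layer error
  d2 = Theta2' * d3;                        % back-propagate the error
  d2 = d2(2:end) .* a2(2:end) .* (1 - a2(2:end));  % drop bias term, apply g'
  Delta2 = Delta2 + d3 * a2';               % accumulate gradient contributions
  Delta1 = Delta1 + d2 * a1';
end
D1 = Delta1 / m;  D1(:,2:end) = D1(:,2:end) + (lambda/m) * Theta1(:,2:end);
D2 = Delta2 / m;  D2(:,2:end) = D2(:,2:end) + (lambda/m) * Theta2(:,2:end);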

Neural Networks: Learning - Backpropagation intuition

Forward Propagation

[Figure: worked forward-propagation example, showing each unit's activation computed as the sigmoid of a weighted sum of the previous layer's activations.]

What is backpropagation doing?

$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\Theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\Theta(x^{(i)})) \right] + \text{(regularization term)}$

Focusing on a single example $x^{(i)}, y^{(i)}$, the case of 1 output unit, and ignoring regularization ($\lambda = 0$):

$\text{cost}(i) = y^{(i)} \log h_\Theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\Theta(x^{(i)}))$

(Think of $\text{cost}(i) \approx (h_\Theta(x^{(i)}) - y^{(i)})^2$.) I.e. how well is the network doing on example $i$?

Forward Propagation

$\delta_j^{(l)}$ = "error" of cost for $a_j^{(l)}$ (unit $j$ in layer $l$).

Formally, $\delta_j^{(l)} = \frac{\partial}{\partial z_j^{(l)}} \text{cost}(i)$ (for $j \geq 0$), where
$\text{cost}(i) = y^{(i)} \log h_\Theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\Theta(x^{(i)}))$

Neural Networks: Learning - Implementation note: Unrolling parameters

Advanced optimization

function [jVal, gradient] = costFunction(theta)
...
optTheta = fminunc(@costFunction, initialTheta, options)

Here theta and gradient are vectors in $\mathbb{R}^{n+1}$.

Neural Network ($L = 4$):
$\Theta^{(1)}, \Theta^{(2)}, \Theta^{(3)}$ - matrices (Theta1, Theta2, Theta3)
$D^{(1)}, D^{(2)}, D^{(3)}$ - matrices (D1, D2, D3)

"Unroll" into vectors.

Example

Suppose $s_1 = 10$, $s_2 = 10$, $s_3 = 1$, so Theta1 is $10 \times 11$, Theta2 is $10 \times 11$, and Theta3 is $1 \times 11$.

thetaVec = [Theta1(:); Theta2(:); Theta3(:)];
DVec = [D1(:); D2(:); D3(:)];

Theta1 = reshape(thetaVec(1:110), 10, 11);
Theta2 = reshape(thetaVec(111:220), 10, 11);
Theta3 = reshape(thetaVec(221:231), 1, 11);

Learning Algorithm

Have initial parameters $\Theta^{(1)}, \Theta^{(2)}, \Theta^{(3)}$. Unroll to get initialTheta to pass to
fminunc(@costFunction, initialTheta, options)

function [jval, gradientVec] = costFunction(thetaVec)
From thetaVec, get $\Theta^{(1)}, \Theta^{(2)}, \Theta^{(3)}$ (reshape).
Use forward prop/back prop to compute $D^{(1)}, D^{(2)}, D^{(3)}$ and $J(\Theta)$.
Unroll $D^{(1)}, D^{(2)}, D^{(3)}$ to get gradientVec.
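Filling in the dimensions from the example above, a skeleton of such a costFunction might look as follows (a sketch; the elided body is the forward/backprop code from the earlier slides):

function [jval, gradientVec] = costFunction(thetaVec)
  % Reshape the unrolled parameter vector back into weight matrices.
  Theta1 = reshape(thetaVec(1:110),   10, 11);
  Theta2 = reshape(thetaVec(111:220), 10, 11);
  Theta3 = reshape(thetaVec(221:231),  1, 11);
  % ... forward prop to compute jval; backprop to compute D1, D2, D3 ...
  gradientVec = [D1(:); D2(:); D3(:)];   % unroll the gradient matrices
end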

Neural Networks: Learning - Gradient checking

Numerical estimation of gradients

$\frac{d}{d\theta} J(\theta) \approx \frac{J(\theta + \epsilon) - J(\theta - \epsilon)}{2\epsilon}$

(two-sided difference; use a small value such as $\epsilon = 10^{-4}$)

Implement:
gradApprox = (J(theta + EPSILON) - J(theta - EPSILON)) / (2*EPSILON)
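A quick way to convince yourself of the formula (a toy example, not from the slides): for $J(\theta) = \theta^3$ the true derivative at $\theta = 1$ is $3$.

J = @(theta) theta.^3;   % toy cost function with known derivative 3*theta^2
EPSILON = 1e-4;
gradApprox = (J(1 + EPSILON) - J(1 - EPSILON)) / (2*EPSILON)
% prints approximately 3.0000, matching the analytic derivative at theta = 1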

Parameter vector $\theta$

$\theta \in \mathbb{R}^n$ (E.g. $\theta$ is the "unrolled" version of $\Theta^{(1)}, \Theta^{(2)}, \Theta^{(3)}$)

$\frac{\partial}{\partial \theta_1} J(\theta) \approx \frac{J(\theta_1 + \epsilon, \theta_2, \ldots, \theta_n) - J(\theta_1 - \epsilon, \theta_2, \ldots, \theta_n)}{2\epsilon}$
$\frac{\partial}{\partial \theta_2} J(\theta) \approx \frac{J(\theta_1, \theta_2 + \epsilon, \ldots, \theta_n) - J(\theta_1, \theta_2 - \epsilon, \ldots, \theta_n)}{2\epsilon}$
...
$\frac{\partial}{\partial \theta_n} J(\theta) \approx \frac{J(\theta_1, \ldots, \theta_n + \epsilon) - J(\theta_1, \ldots, \theta_n - \epsilon)}{2\epsilon}$

for i = 1:n,
  thetaPlus = theta;
  thetaPlus(i) = thetaPlus(i) + EPSILON;
  thetaMinus = theta;
  thetaMinus(i) = thetaMinus(i) - EPSILON;
  gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2*EPSILON);
end;

Check that gradApprox ≈ DVec.
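The slides leave "≈" informal; one common way to make the comparison concrete (an assumption here, not stated on the slide) is a relative difference:

% Relative difference between numerical and backprop gradients;
% values around 1e-9 or below suggest the backprop implementation is correct.
relDiff = norm(gradApprox - DVec) / norm(gradApprox + DVec);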

Implementation Note:
1. Implement backprop to compute DVec (unrolled $D^{(1)}, D^{(2)}, D^{(3)}$).
2. Implement numerical gradient check to compute gradApprox.
3. Make sure they give similar values.
4. Turn off gradient checking. Use backprop code for learning.

Important: Be sure to disable your gradient checking code before training your classifier. If you run numerical gradient computation on every iteration of gradient descent (or in the inner loop of costFunction(...)), your code will be very slow.

Neural Networks: Learning - Random initialization

Initial value of $\Theta$

For gradient descent and advanced optimization methods, we need an initial value for $\Theta$:
optTheta = fminunc(@costFunction, initialTheta, options)

Consider gradient descent. Set initialTheta = zeros(n,1)?

Zero initialization

After each update, parameters corresponding to inputs going into each of two hidden units are identical. Concretely, $a_1^{(2)} = a_2^{(2)}$ and $\delta_1^{(2)} = \delta_2^{(2)}$ on every iteration, so all hidden units in a layer compute the same function of the input and the network remains highly redundant.

Random initialization: Symmetry breaking

Initialize each $\Theta_{ij}^{(l)}$ to a random value in $[-\epsilon, \epsilon]$ (i.e. $-\epsilon \leq \Theta_{ij}^{(l)} \leq \epsilon$).

E.g.
Theta1 = rand(10,11)*(2*INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(1,11)*(2*INIT_EPSILON) - INIT_EPSILON;
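The slide leaves INIT_EPSILON as a small constant. One common heuristic (used in the course's programming exercises, not stated on this slide) scales it with the sizes of the adjacent layers:

% Choose epsilon based on the number of units a Theta matrix connects
% (L_in incoming, L_out outgoing); the layer sizes here are illustrative.
L_in = 10;  L_out = 10;
INIT_EPSILON = sqrt(6) / sqrt(L_in + L_out);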

Neural Networks: Learning - Putting it together

Training a neural network

Pick a network architecture (connectivity pattern between neurons).
No. of input units: dimension of features $x^{(i)}$.
No. of output units: number of classes.
Reasonable default: 1 hidden layer, or if >1 hidden layer, have the same no. of hidden units in every layer (usually the more the better).

Training a neural network

1. Randomly initialize weights.
2. Implement forward propagation to get $h_\Theta(x^{(i)})$ for any $x^{(i)}$.
3. Implement code to compute cost function $J(\Theta)$.
4. Implement backprop to compute partial derivatives $\frac{\partial}{\partial \Theta_{jk}^{(l)}} J(\Theta)$:

for i = 1:m
  Perform forward propagation and backpropagation using example $(x^{(i)}, y^{(i)})$
  (get activations $a^{(l)}$ and delta terms $\delta^{(l)}$ for $l = 2, \ldots, L$).
end

Training a neural network

5. Use gradient checking to compare $\frac{\partial}{\partial \Theta_{jk}^{(l)}} J(\Theta)$ computed using backpropagation vs. using the numerical estimate of the gradient of $J(\Theta)$. Then disable the gradient checking code.
6. Use gradient descent or an advanced optimization method with backpropagation to try to minimize $J(\Theta)$ as a function of parameters $\Theta$. ($J(\Theta)$ is non-convex, so these methods are only guaranteed to find a local minimum, which in practice is usually good enough.)
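Putting the pieces together, a minimal Octave driver might look like this (a sketch; costFunction is the routine sketched in the unrolling section, and the option values are illustrative):

% Unroll the randomly initialized weights and minimize J with fminunc.
initialTheta = [Theta1(:); Theta2(:); Theta3(:)];
options = optimset('GradObj', 'on', 'MaxIter', 100);
optTheta = fminunc(@costFunction, initialTheta, options);
% Reshape optTheta back into Theta1..Theta3 to make predictions.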

Neural Networks: Learning - Backpropagation example: Autonomous driving (optional)

[Courtesy of Dean Pomerleau]