Neural Network Introduction Hung-yi Lee

Review: Supervised Learning. Training: given training data, pick the “best” function f* from the model (the hypothesis function set). Testing: apply the “best” function f* to new inputs. x: function input (e.g. an image), y: function output (e.g. the label “2”).

Neural Network – Realize it. Three questions: What does the function hypothesis set (model) look like? What is the “best” function? How to pick the “best” function?

Neural Network: Fully Connected Feedforward Network. The input vector x passes through Layer 1, Layer 2, …, Layer L to produce the output vector y. You can always connect the neurons in your own way.

Neural Network. The layer that receives vector x is the input layer, Layer L producing vector y is the output layer, and the layers in between are the hidden layers.
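
To make the structure concrete, here is a minimal NumPy sketch of such a fully connected feedforward network; the layer sizes, the sigmoid activation, and the random initialization are illustrative assumptions, not part of the original slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative architecture: 784-dim input, two hidden layers, 10-dim output.
layer_sizes = [784, 100, 100, 10]

# One weight matrix W^l and one bias vector b^l per layer l = 1..L.
weights = [rng.standard_normal((n_out, n_in)) * 0.01
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))
```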

Notation. Layer l-1 has N_{l-1} nodes and layer l has N_l nodes. $a_i^l$: the output of neuron i at layer l. $a^l$: the output of one layer, a vector collecting all $a_i^l$.

Notation. $w_{ij}^l$: the weight from neuron j at layer l-1 to neuron i at layer l. $W^l$: the weight matrix connecting layer l-1 to layer l.

Notation. $b_i^l$: the bias for neuron i at layer l. $b^l$: the bias vector for all neurons in layer l.

Notation. $z_i^l$: the input of the activation function for neuron i at layer l. $z^l$: the vector collecting the activation-function inputs of all neurons in layer l.

Notation – Summary. $a_i^l$: output of a neuron; $a^l$: output of a layer. $z_i^l$: input of the activation function; $z^l$: inputs of the activation function for a layer. $w_{ij}^l$: a weight; $W^l$: a weight matrix. $b_i^l$: a bias; $b^l$: a bias vector.

Relations between Layer Outputs. For neuron i at layer l: $z_i^l = \sum_j w_{ij}^l a_j^{l-1} + b_i^l$ and $a_i^l = \sigma(z_i^l)$, where $\sigma$ is the activation function. Collecting all neurons of the layer, $z^l = W^l a^{l-1} + b^l$ and $a^l = \sigma(z^l)$, so the output of layer l is $a^l = \sigma(W^l a^{l-1} + b^l)$.

Function of Neural Network. Chaining the layer relations, the whole network is a function from vector x to vector y: $y = f(x) = \sigma\left(W^L \sigma\left(\cdots \sigma\left(W^1 x + b^1\right) \cdots\right) + b^L\right)$.
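
A minimal sketch of this composition, reusing the weights, biases, and sigmoid defined in the snippet above:

```python
def forward(x, weights, biases):
    """Compute y = f(x) by propagating x through every layer."""
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b      # z^l = W^l a^{l-1} + b^l
        a = sigmoid(z)     # a^l = sigma(z^l)
    return a

x = rng.random(784)                     # a dummy 784-dimensional input
y = forward(x, weights, biases)
print(y.shape)                          # (10,)
```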

Neural Network – Realize it. Three questions: What does the function hypothesis set (model) look like? What is the “best” function? How to pick the “best” function?

Format of Training Data. The input and output of the neural network model are vectors, so the object x and the label y must also be represented as vectors. Example: handwriting digit recognition. Each 28 x 28 image becomes a 784-dimensional vector x, with one element per pixel (1 for ink, 0 otherwise). The label (e.g. “1”, “2”, “3”) becomes a 10-dimensional vector y, one dimension per digit.
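
A small sketch of this encoding; the fake image below is made up purely for illustration:

```python
import numpy as np

def encode_image(image):
    """Flatten a 28x28 binary image (1 for ink, 0 otherwise) into a 784-dim vector."""
    return np.asarray(image, dtype=float).reshape(784)

def encode_label(digit):
    """Represent a digit 0-9 as a 10-dimensional one-hot vector."""
    y = np.zeros(10)
    y[digit] = 1.0
    return y

image = np.zeros((28, 28))
image[5:20, 13:15] = 1                  # a crude fake "1"
x = encode_image(image)                 # shape (784,)
y_hat = encode_label(1)                 # [0, 1, 0, ..., 0]
```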

What is the “Best” Function? Given training data $\{(x^r, \hat{y}^r)\}$, the “best” function f* is the one that makes $f(x^r)$ closest to the target $\hat{y}^r$ for every training example $x^r$. Define C(f) to evaluate the badness of a function f; C(f) is a “function of function” (error function, cost function, objective function, …). The best function f* is the one that minimizes C(f).

What is the “Best” Function? The best function f* is the one that minimizes C(f). Do you like this definition of “best”? Question: is the distance a good measure to evaluate the closeness between the network output and the target? Reference: Golik, Pavel, Patrick Doetsch, and Hermann Ney. “Cross-entropy vs. squared error training: a theoretical and experimental comparison.” INTERSPEECH.

What is the “Best” Function? Every function in the hypothesis function set is determined by a parameter set θ (the weights and biases), so picking the “best” function f* is the same as picking the “best” parameter set θ*, and the “function of function” C(f) becomes an error function C(θ) of the parameters, for example the summed distance between the network outputs and the targets over the training data. The question becomes: how to find the parameter set θ* that minimizes C(θ)?
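
As one concrete (assumed) choice of error function, here is the sum of squared errors over a training set of (input vector, target vector) pairs, reusing the forward function sketched earlier:

```python
def cost(weights, biases, training_data):
    """C(theta): summed squared distance between network outputs and targets."""
    total = 0.0
    for x, y_hat in training_data:
        y = forward(x, weights, biases)
        total += np.sum((y - y_hat) ** 2)
    return total
```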

Neural Network – Realize it. Three questions: What does the function hypothesis set (model) look like? What is the “best” function? How to pick the “best” function?

Possible Solution. Statement of the problem: there is a function C(θ), where θ is a set of parameters θ = {θ_1, θ_2, θ_3, …}; find θ* that minimizes C(θ). Brute force? Enumerate all possible θ. Calculus? Find θ* such that $\nabla_\theta C(\theta^*) = 0$.

Gradient descent. Starting from some initial parameters θ^0, repeatedly update them to obtain θ^1, θ^2, … Hopefully, with sufficient iterations, we can finally find θ* such that C(θ*) is minimized.

Gradient descent – one variable. For simplicity, first consider that θ has only one variable. Randomly start at a point θ^0. Compute C(θ^0 - ε) and C(θ^0 + ε). If C(θ^0 + ε) < C(θ^0 - ε), set θ^1 = θ^0 + ε; otherwise set θ^1 = θ^0 - ε. Repeat to obtain θ^2, θ^3, …
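
A tiny sketch of this one-variable probing procedure; the quadratic toy cost, the value of ε, and the step count are all illustrative:

```python
def probe_descent(C, theta0, eps=0.01, steps=1000):
    """Move theta by +/- eps toward whichever side has the lower cost."""
    theta = theta0
    for _ in range(steps):
        if C(theta + eps) < C(theta - eps):
            theta = theta + eps
        else:
            theta = theta - eps
    return theta

C = lambda t: (t - 3.0) ** 2             # toy cost with its minimum at theta = 3
print(probe_descent(C, theta0=-5.0))     # ends up near 3 (within eps)
```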

Gradient descent – two variables. Suppose that θ has two variables {θ_1, θ_2}. Starting from some point on the contour plot of C(θ), look at a small red circle around that point: how do we find the point on the red circle where C(θ) is smallest?

Taylor series. Let h(x) be infinitely differentiable around x = x_0: $h(x) = \sum_{k=0}^{\infty} \frac{h^{(k)}(x_0)}{k!}(x - x_0)^k = h(x_0) + h'(x_0)(x - x_0) + \frac{h''(x_0)}{2!}(x - x_0)^2 + \cdots$

Taylor series for h(x) = sin(x) around x_0 = π/4: $\sin(x) = \frac{\sqrt{2}}{2} + \frac{\sqrt{2}}{2}\left(x - \frac{\pi}{4}\right) - \frac{\sqrt{2}}{2 \cdot 2!}\left(x - \frac{\pi}{4}\right)^2 - \frac{\sqrt{2}}{2 \cdot 3!}\left(x - \frac{\pi}{4}\right)^3 + \cdots$ Truncating the series gives an approximation that is good around π/4 and degrades far from it.
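
A quick numerical check of this claim; the truncation order and the sample points are arbitrary choices for illustration:

```python
import math
import numpy as np

def sin_taylor(x, x0=np.pi / 4, order=3):
    """Truncated Taylor series of sin(x) around x0."""
    derivs = [np.sin, np.cos, lambda t: -np.sin(t), lambda t: -np.cos(t)]
    return sum(derivs[k % 4](x0) * (x - x0) ** k / math.factorial(k)
               for k in range(order + 1))

for x in (np.pi / 4 + 0.1, np.pi / 4 + 1.0, np.pi / 4 + 3.0):
    print(x, np.sin(x), sin_taylor(x))   # close near pi/4, poor far away
```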

Taylor series. One variable: when x is close to x_0, $h(x) \approx h(x_0) + h'(x_0)(x - x_0)$. Multivariable: when x and y are close to x_0 and y_0, $h(x, y) \approx h(x_0, y_0) + \frac{\partial h(x_0, y_0)}{\partial x}(x - x_0) + \frac{\partial h(x_0, y_0)}{\partial y}(y - y_0)$.

Gradient descent – two variables. If the radius of the red circle is small, the first-order Taylor approximation holds inside it: with center (a, b), $C(\theta) \approx C(a, b) + \frac{\partial C(a,b)}{\partial \theta_1}(\theta_1 - a) + \frac{\partial C(a,b)}{\partial \theta_2}(\theta_2 - b) \equiv C'(\theta)$. Finding the θ_1, θ_2 on the circle that minimize C'(θ) is simple: since the constant term is fixed, we only have to choose the step $(\Delta\theta_1, \Delta\theta_2) = (\theta_1 - a, \theta_2 - b)$ of fixed length whose inner product with the gradient $\left(\frac{\partial C}{\partial \theta_1}, \frac{\partial C}{\partial \theta_2}\right)$ is as negative as possible, i.e. the step pointing opposite to the gradient: $(\Delta\theta_1, \Delta\theta_2) = -\eta \left(\frac{\partial C}{\partial \theta_1}, \frac{\partial C}{\partial \theta_2}\right)$ for some positive η. The result is intuitive, isn't it?

Gradient descent – high dimension. The same argument applies in the space of the full parameter set θ = {θ_1, θ_2, θ_3, …}: within a small ball around the current point, the point with minimum C(θ) is reached by stepping in the direction opposite to the gradient ∇C(θ).

Gradient descent. Starting from the initial parameters θ^0, repeatedly update $\theta^{i+1} = \theta^i - \eta \nabla C(\theta^i)$. η should be small enough (so that the Taylor approximation remains valid), but should not be too small (or training becomes very slow). η is called the “learning rate”.
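
A minimal sketch of this update rule for a cost whose gradient we can evaluate; the two-variable toy cost and the learning rate are illustrative:

```python
import numpy as np

def gradient_descent(grad_C, theta0, eta=0.1, iterations=100):
    """theta^{i+1} = theta^i - eta * grad C(theta^i)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iterations):
        theta = theta - eta * grad_C(theta)
    return theta

# Toy cost C(theta) = (theta_1 - 1)^2 + (theta_2 + 2)^2 with a known gradient.
grad_C = lambda t: np.array([2 * (t[0] - 1), 2 * (t[1] + 2)])
print(gradient_descent(grad_C, [5.0, 5.0]))   # approaches [1, -2]
```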

Gradient descent - Problem. Different initializations lead to different local minima. (Reference: “Who is Afraid of Non-Convex Loss Functions?”)

Gradient descent - Problem. Toy example: on a simple non-convex cost surface, different initializations lead to different local minima.
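
Using the gradient_descent sketch from the previous snippet on a made-up one-variable non-convex cost, two starting points end in two different local minima:

```python
# C(theta) = theta^4 - 4*theta^2 + theta has two local minima.
grad = lambda t: 4 * t ** 3 - 8 * t + 1
print(gradient_descent(grad, [-2.0], eta=0.01, iterations=2000))  # near -1.47
print(gradient_descent(grad, [+2.0], eta=0.01, iterations=2000))  # near +1.35
```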

Neural Network – Realize it. Three questions: What does the function hypothesis set (model) look like? What is the “best” function? How to pick the “best” function?

Gradient descent for Neural Network

Chain Rule. Case 1: if y = g(x) and z = h(y), then $\frac{dz}{dx} = \frac{dz}{dy}\frac{dy}{dx}$. Case 2: if x = g(s), y = h(s) and z = k(x, y), then $\frac{dz}{ds} = \frac{\partial z}{\partial x}\frac{dx}{ds} + \frac{\partial z}{\partial y}\frac{dy}{ds}$.

Gradient descent for Neural Network – the output layer. Example: consider a weight $w_{ij}^L$ connecting neuron j of layer L-1 to neuron i of layer L. By the chain rule, $\frac{\partial C}{\partial w_{ij}^L} = \frac{\partial z_i^L}{\partial w_{ij}^L}\,\frac{\partial C}{\partial z_i^L}$. Since $z_i^L = \sum_j w_{ij}^L a_j^{L-1} + b_i^L$, the first factor is simply $a_j^{L-1}$ (a constant with respect to the weight). For the second factor, applying the chain rule once more through the network output $y_i = \sigma(z_i^L)$ gives $\frac{\partial C}{\partial z_i^L} = \sigma'(z_i^L)\,\frac{\partial C}{\partial y_i}$. The bias behaves like a weight whose input is always “1”, so $\frac{\partial C}{\partial b_i^L} = \frac{\partial C}{\partial z_i^L}$.
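
A small numeric sanity check of the output-layer formula on a one-layer toy network with squared-error cost; all sizes and values here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

a_prev = rng.random(4)                               # a^{L-1}
W, b = rng.standard_normal((3, 4)), rng.standard_normal(3)
y_hat = np.array([1.0, 0.0, 0.0])                    # target

def C(W):
    y = sigmoid(W @ a_prev + b)
    return np.sum((y - y_hat) ** 2)

# Analytic gradient: dC/dw_ij = a_j^{L-1} * sigma'(z_i) * dC/dy_i
z = W @ a_prev + b
y = sigmoid(z)
delta = y * (1 - y) * 2 * (y - y_hat)                # sigma'(z_i) * dC/dy_i
analytic = np.outer(delta, a_prev)

# Finite-difference gradient for comparison
eps, numeric = 1e-6, np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        numeric[i, j] = (C(Wp) - C(Wm)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))            # tiny, e.g. ~1e-9
```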

Moving one layer back. For a weight $w_{ij}^{L-1}$ between layer L-2 and layer L-1, the chain rule now requires a sum over layer L, because $z_i^{L-1}$ influences every neuron of layer L: $\frac{\partial C}{\partial z_i^{L-1}} = \sigma'(z_i^{L-1}) \sum_k w_{ki}^{L}\,\frac{\partial C}{\partial z_k^{L}}$, and $\frac{\partial C}{\partial w_{ij}^{L-1}} = a_j^{L-2}\,\frac{\partial C}{\partial z_i^{L-1}}$. For a weight between layer L-3 and layer L-2, the same argument gives a sum over layer L-1 and then a sum over layer L, and so on toward the input layer.

Summarizing what we have done: for parameters between layer L-1 and L, the gradient needs no sum; for parameters between layer L-2 and L-1, it needs a sum over layer L; for parameters between layer L-3 and L-2, it needs sums over layers L-1 and L; and so on. There is an efficient way to compute the gradient – backpropagation.
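
The reuse of the per-neuron terms ∂C/∂z^l layer by layer is exactly what backpropagation implements. A minimal NumPy sketch for a fully connected sigmoid network with squared-error cost; the structure follows the derivation above, but the helper names are my own:

```python
import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))

def backprop(x, y_hat, weights, biases):
    """Return dC/dW^l and dC/db^l for every layer of a sigmoid network."""
    # Forward pass, remembering every a^l and z^l.
    activations, zs, a = [x], [], x
    for W, b in zip(weights, biases):
        z = W @ a + b
        a = sigmoid(z)
        zs.append(z)
        activations.append(a)

    grads_W = [np.zeros_like(W) for W in weights]
    grads_b = [np.zeros_like(b) for b in biases]

    # Output layer: dC/dz^L = sigma'(z^L) * dC/dy for the squared-error cost.
    y = activations[-1]
    delta = y * (1 - y) * 2 * (y - y_hat)
    grads_W[-1] = np.outer(delta, activations[-2])
    grads_b[-1] = delta

    # Hidden layers: propagate delta backwards, summing over the next layer.
    for l in range(len(weights) - 2, -1, -1):
        a_l = activations[l + 1]
        delta = a_l * (1 - a_l) * (weights[l + 1].T @ delta)
        grads_W[l] = np.outer(delta, activations[l])
        grads_b[l] = delta

    return grads_W, grads_b
```

A finite-difference comparison like the one after the output-layer derivation is an easy way to verify such an implementation.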

References for Neural Networks. Chapter 2 of Neural Networks and Deep Learning, ap2.html. LeCun, Yann A., et al. “Efficient backprop.” 98b.pdf. Bengio, Yoshua. “Practical recommendations for gradient-based training of deep architectures.” YB-tricks.pdf.

Thank you for listening!

Appendix

Layer-by-layer

What is the “Best” Function? (Hypothesis Function Set) Different θ give different functions f and hence different values of C: the objective function C is a function of θ, written C(θ). The best function f* is the one that minimizes C, so the best parameter set θ* is the one that minimizes C(θ). How to find θ*?

Notation