Neural Networks I CMPUT 466/551 Nilanjan Ray

Outline
Projection Pursuit Regression
Neural Network
 – Background
 – Vanilla Neural Networks
 – Back-propagation
 – Examples

Projection Pursuit Regression
An additive model built from non-linear ridge functions g_m:
  f(X) = Σ_{m=1..M} g_m(w_m^T X)
The feature vector X is projected onto the direction vectors w_m, which have to be estimated from the training data.
PPR is a precursor to neural networks.
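
As a minimal sketch of the model (not from the slides; ppr_predict, directions, and ridge_functions are illustrative names), the prediction could be computed as:

import numpy as np

def ppr_predict(X, directions, ridge_functions):
    """Projection pursuit regression: f(X) = sum_m g_m(w_m^T X).

    X               : (N, p) data matrix
    directions      : list of M direction vectors w_m, each of shape (p,)
    ridge_functions : list of M callables g_m, applied elementwise
    """
    f = np.zeros(X.shape[0])
    for w_m, g_m in zip(directions, ridge_functions):
        f += g_m(X @ w_m)    # project onto w_m, then apply the ridge function
    return f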

Fitting a PPR Model
Minimize the squared-error loss:
  Σ_i [ y_i − Σ_m g_m(w_m^T x_i) ]^2
Proceed in forward stages M = 1, 2, ...:
 – At each stage, estimate g given w (say, by fitting a spline function)
 – Estimate w given g (details on the next slide)
The value of M is decided by cross-validation.

Fitting a PPR Model (continued)
At stage m, given g, compute w by a Gauss-Newton search. Linearizing g around the current estimate w_old turns the problem into weighted least squares:
  minimize over w:  Σ_i g'(w_old^T x_i)^2 [ w_old^T x_i + (r_i − g(w_old^T x_i)) / g'(w_old^T x_i) − w^T x_i ]^2
Here r_i is the residual after stage m−1, the terms g'(w_old^T x_i)^2 act as weights, and the bracketed quantities are the adjusted responses. So the update is a weighted least-squares regression of the adjusted responses on the inputs x_i.
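
A sketch of one such step, assuming the current ridge function g is smooth and its derivative g_prime is available (function and variable names are illustrative, not from the slides):

import numpy as np

def ppr_direction_step(X, r, w_old, g, g_prime):
    """One Gauss-Newton update of the direction w, holding g fixed.

    r is the residual being fitted at this stage; the update solves a
    weighted least-squares problem with weights g'(w_old^T x_i)^2 and
    the adjusted responses as targets.
    """
    v = X @ w_old                          # current projections w_old^T x_i
    gp = g_prime(v)
    weights = gp ** 2                      # least-squares weights
    adjusted = v + (r - g(v)) / gp         # adjusted responses
    XtW = X.T * weights                    # X^T W without forming W explicitly
    w_new = np.linalg.solve(XtW @ X, XtW @ adjusted)
    return w_new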

Vanilla Neural Network
[Figure: a feed-forward network with an input layer, one hidden layer, and an output layer.]
In the notation used on the following slides: Z_m = σ(α_{0m} + α_m^T X) are the hidden units, T_k = β_{0k} + β_k^T Z the output activations, and f_k(X) = g_k(T) the network outputs, where g_k is the output function (identity for regression, softmax for classification).

The Sigmoid Function
σ(sv) = 1/(1 + exp(-sv)), a smooth (regularized) threshold function.
The parameter s controls the activation rate:
 – large s: hard activation (close to a step function)
 – small s: nearly linear, close to the identity function (up to scale)
[Figure: plots of σ(0.5v) and σ(10v).]
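
A quick illustration of the activation-rate parameter s (a sketch; the values 0.5 and 10 mirror the two curves in the slide's figure):

import numpy as np

def sigmoid(v, s=1.0):
    """sigma(s*v) = 1 / (1 + exp(-s*v)); larger s gives a harder threshold."""
    return 1.0 / (1.0 + np.exp(-s * v))

v = np.linspace(-5.0, 5.0, 5)
print(sigmoid(v, s=0.5))    # gentle slope, nearly linear around v = 0
print(sigmoid(v, s=10.0))   # close to a step function at v = 0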

Multilayer Feed-Forward NN
[Figure: example multilayer feed-forward architectures.]

NN: Universal Approximator
A NN with one hidden layer can approximate arbitrarily well any continuous functional mapping from one finite-dimensional space to another, provided the number of hidden units is sufficiently large. The proof is based on the Fourier expansion of a function (see Bishop).

NN: Kolmogorov's Theorem
Any continuous mapping f(x) of d input variables can be expressed by a neural network with two hidden layers of nodes: the first layer contains d(2d+1) nodes and the second layer contains (2d+1) nodes. So why bother about topology at all? Because this 'universal' architecture is impractical: the functions represented by the hidden units are non-smooth and unsuitable for learning (see Bishop for more).

The XOR Problem and NN
[Figure: a two-layer network computing XOR; the activation functions are hard thresholds at 0.]
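
The exact construction in the slide's figure is not recoverable here, but one standard two-layer solution with hard-threshold units is sketched below (weights and thresholds chosen by hand, not learned):

import numpy as np

def step(v):
    """Hard threshold at 0."""
    return (np.asarray(v) > 0).astype(float)

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)        # fires if at least one input is 1 (OR)
    h2 = step(x1 + x2 - 1.5)        # fires only if both inputs are 1 (AND)
    return step(h1 - h2 - 0.5)      # OR but not AND: exactly one input is 1

for a in (0, 1):
    for b in (0, 1):
        print(a, b, int(xor_net(a, b)))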

Fitting Neural Networks
Parameters to learn from the training data: the complete set of weights θ, i.e. {α_{0m}, α_m : m = 1, ..., M} and {β_{0k}, β_k : k = 1, ..., K}.
Cost functions (both sketched in code below):
 – Sum-of-squared errors for regression
 – Cross-entropy error for classification
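
Minimal sketches of the two cost functions, assuming one-hot targets y and network outputs f (or softmax probabilities p); the names are illustrative:

import numpy as np

def sum_of_squares(y, f):
    """Sum-of-squared errors: sum_i sum_k (y_ik - f_k(x_i))^2."""
    return np.sum((y - f) ** 2)

def cross_entropy(y, p):
    """Cross-entropy for K-class classification with softmax outputs p."""
    return -np.sum(y * np.log(p + 1e-12))    # small constant guards log(0)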

Gradient descent: Back-propagation
For the squared-error cost R(θ) = Σ_i Σ_k (y_ik − f_k(x_i))^2, the gradients factor as
  ∂R_i/∂β_{km} = δ_{ki} z_{mi}   and   ∂R_i/∂α_{ml} = s_{mi} x_{il},
where δ_{ki} = −2 (y_ik − f_k(x_i)) g_k'(β_k^T z_i) are the output-layer errors and the hidden-layer errors satisfy the back-propagation equations
  s_{mi} = σ'(α_m^T x_i) Σ_k β_{km} δ_{ki}.
The weights are then updated by gradient descent with learning rate γ_r:
  β_{km} ← β_{km} − γ_r Σ_i ∂R_i/∂β_{km},   α_{ml} ← α_{ml} − γ_r Σ_i ∂R_i/∂α_{ml}.

Back-propagation: Implementation
Step 1: Initialize the parameters (weights) of the NN.
Iterate:
 – Forward pass: compute f_k(X) for the current parameter values, starting at the input layer and moving all the way up to the output layer.
 – Backward pass: start at the output layer and compute the δ_{ki}; then go down one layer at a time, computing the s_{mi}, all the way down to the input layer.
 – Update the weights by the gradient-descent rule.
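
A minimal sketch of one such iteration for a single-hidden-layer network with squared-error cost and identity output units (the names alpha, beta and the shapes assumed here are illustrative, not code from the slides):

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_iteration(X, Y, alpha, beta, lr=0.01):
    """One batch gradient-descent step for a single-hidden-layer network.

    X     : (N, p+1) inputs with a leading column of ones (bias)
    Y     : (N, K) targets
    alpha : (p+1, M) input-to-hidden weights
    beta  : (M+1, K) hidden-to-output weights (row 0 is the bias)
    """
    N = X.shape[0]
    # Forward pass: input layer -> hidden layer -> output layer
    Z = sigmoid(X @ alpha)                    # hidden activations z_mi
    Z1 = np.hstack([np.ones((N, 1)), Z])      # prepend the hidden bias unit
    F = Z1 @ beta                             # outputs f_k(x_i)
    # Backward pass: output-layer errors, then hidden-layer errors
    delta = -2.0 * (Y - F)                    # delta_ki for squared error
    S = (delta @ beta[1:].T) * Z * (1.0 - Z)  # s_mi = sigma'(.) * sum_k beta_km * delta_ki
    # Gradient-descent update
    beta = beta - lr * (Z1.T @ delta) / N
    alpha = alpha - lr * (X.T @ S) / N
    return alpha, beta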

Issues in Training Neural Networks
Initial values of parameters
 – Back-propagation only finds a local minimum
Overfitting
 – Neural networks have very many parameters
 – Use early stopping and regularization
Scaling of the inputs
 – Inputs are typically scaled to have zero mean and unit standard deviation (a sketch follows this list)
Number of hidden units and layers
 – Better to have too many than too few
 – With 'traditional' back-propagation, a deep NN gets stuck in local minima and does not learn well
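
For the input-scaling point above, a typical standardization step (a sketch; statistics are taken from the training set only):

import numpy as np

def standardize(X_train, X_test):
    """Scale inputs to zero mean and unit standard deviation."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0) + 1e-12     # guard against constant features
    return (X_train - mu) / sd, (X_test - mu) / sd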

Avoiding Overfitting
Weight decay: add a penalty λJ(θ) to the error function, with
  J(θ) = Σ_{km} β_{km}^2 + Σ_{ml} α_{ml}^2
Weight elimination penalty:
  J(θ) = Σ_{km} β_{km}^2 / (1 + β_{km}^2) + Σ_{ml} α_{ml}^2 / (1 + α_{ml}^2),
which shrinks the smaller weights more strongly. Both penalties are sketched in code below.
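
Sketches of the two penalties (illustrative names; either penalty, scaled by a tuning parameter lam, is added to the training error, so its gradient simply joins the back-propagation updates):

import numpy as np

def weight_decay(alpha, beta, lam):
    """lam * J(theta) with J = sum beta^2 + sum alpha^2 (ridge-style shrinkage)."""
    return lam * (np.sum(beta ** 2) + np.sum(alpha ** 2))

def weight_elimination(alpha, beta, lam):
    """lam * sum w^2 / (1 + w^2): shrinks smaller weights more strongly."""
    return lam * (np.sum(beta ** 2 / (1.0 + beta ** 2)) +
                  np.sum(alpha ** 2 / (1.0 + alpha ** 2)))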

Example

Example: ZIP Code Recognition
Classification of handwritten numerals from 16×16 grayscale images.

Some Architectures for ZIP Code Recognition

Architectures and Parameters
Net-1: no hidden layer; equivalent to multinomial logistic regression
Net-2: one hidden layer, 12 hidden units, fully connected
Net-3: two hidden layers, locally connected
Net-4: two hidden layers, locally connected, with weight sharing
Net-5: two hidden layers, locally connected, two levels of weight sharing
Networks built on this kind of weight sharing are now known as convolutional neural networks.

More on Architectures and Results

  Architecture                 Links   Weights   %Correct
  Net-1: Single layer           2570      2570      80.0
  Net-2: Two layer              3214      3214      87.0
  Net-3: Locally connected      1226      1226      88.5
  Net-4: Constrained            2266      1132      94.0
  Net-5: Constrained            5194      1060      98.4
  (%Correct values are the test accuracies reported for this example in Hastie, Tibshirani & Friedman, Ch. 11.)

How the link/weight counts arise:
  Net-1: links = weights = 16*16*10 + 10 = 2570
  Net-2: links = weights = 16*16*12 + 12 + 12*10 + 10 = 3214
  Net-3: links = weights = 8*8*3*3 + 8*8 + 4*4*5*5 + 4*4 + 10*4*4 + 10 = 1226
  Net-4: links   = 2*8*8*3*3 + 2*8*8 + 4*4*5*5*2 + 4*4 + 10*4*4 + 10 = 2266
         weights = 2*3*3     + 2*8*8 + 4*4*5*5*2 + 4*4 + 10*4*4 + 10 = 1132
  Net-5: links   = 2*8*8*3*3 + 2*8*8 + 4*4*4*5*5*2 + 4*4*4 + 10*4*4*4 + 10 = 5194
         weights = 2*3*3     + 2*8*8 + 4*5*5*2     + 4*4*4 + 10*4*4*4 + 10 = 1060
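
The link/weight counts can be checked with a few lines of arithmetic; the sketch below mirrors the sums above for Net-4 (two 8×8 feature maps with shared 3×3 kernels, a 4×4 second hidden layer with 5×5 local receptive fields into both maps, and 10 output units):

# Net-4 link and weight counts (biases included, first-layer kernels shared)
links_net4 = 2*8*8*3*3 + 2*8*8 + 4*4*5*5*2 + 4*4 + 10*4*4 + 10
weights_net4 = 2*3*3 + 2*8*8 + 4*4*5*5*2 + 4*4 + 10*4*4 + 10
print(links_net4, weights_net4)    # 2266 1132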

Performance vs. Training Time

Some References
C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press. (For a good understanding.)
S. Haykin, Neural Networks and Learning Machines, Prentice Hall. (For very basic reading, lots of examples, etc.)
Prominent researchers:
 – Yann LeCun
 – G.E. Hinton
 – Yoshua Bengio