CSC 4510 – Machine Learning Dr. Mary-Angela Papalaskari Department of Computing Sciences Villanova University Course website: www.csc.villanova.edu/~map/4510/


8: Multilayer Neural Networks

Some of the slides in this presentation are adapted from: Prof. Frank Klassner's ML class at Villanova, Prof. Ray Mooney's UTexas course, the University of Manchester ML course, and the Stanford online ML course.

Reminder: Motivation for Neural Networks
Learning a non-linear function
Take inspiration from the brain

Neural Network Learning
Learning approach based on modeling adaptation in biological neural systems.
Perceptron: initial algorithm for learning simple neural networks (single layer), developed in the 1950s.
Backpropagation: more complex algorithm for learning multi-layer neural networks, developed in the 1980s.

Learning Algorithms
Many (>100) algorithms for multilayer neural net learning.
Most popular: backpropagation.

Backpropagation Training Algorithm
1. Initialization: Set weights to small random real values.
2. Activation (forward propagation): Calculate the output for input x_1, …, x_n.
3. Weight training (backward propagation): Update the weights by propagating the errors backwards from the output layer. Calculate the error and error gradient for each neuron; calculate weight corrections and update the weights.
4. Iteration: Go back to step 2 and repeat until all training examples produce the correct value (within ε), or the mean squared error ceases to decrease, or another termination criterion is met.
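Below is a minimal NumPy sketch of these four steps for a single-hidden-layer sigmoid network trained on the XOR problem. The layer sizes, learning rate, tolerance, and toy data are illustrative assumptions, not part of the original slides.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Toy data: XOR, a classic non-linearly-separable problem (illustrative assumption)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

n_in, n_hidden, n_out = 2, 4, 1
lr, epsilon, max_epochs = 0.5, 0.05, 20000

# 1. Initialization: small random weights (biases kept separately)
W1 = rng.normal(scale=0.5, size=(n_in, n_hidden));  b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, n_out)); b2 = np.zeros(n_out)

for epoch in range(max_epochs):
    # 2. Activation (forward propagation)
    h = sigmoid(X @ W1 + b1)   # hidden-layer outputs
    o = sigmoid(h @ W2 + b2)   # network outputs

    # 3. Weight training (backward propagation of errors)
    delta_o = (o - y) * o * (1 - o)            # output-layer error gradients
    delta_h = (delta_o @ W2.T) * h * (1 - h)   # hidden-layer error gradients
    W2 -= lr * h.T @ delta_o;  b2 -= lr * delta_o.sum(axis=0)
    W1 -= lr * X.T @ delta_h;  b1 -= lr * delta_h.sum(axis=0)

    # 4. Iteration: stop once every example is within epsilon of its target
    if np.all(np.abs(o - y) < epsilon):
        break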

Learning in Multi-Layer Nets
In order to apply gradient descent, we need the output of a unit to be a differentiable function of its input and weights.
The standard linear threshold function is not differentiable at the threshold.
[Figure: unit output o_i as a function of net_j, stepping from 0 to 1 at the threshold T_j]

Differentiable Output Function
Need a non-linear output function to move beyond linear functions.
– A multi-layer linear network is still linear.
The standard solution is to use the non-linear, differentiable sigmoidal "logistic" function:
o_j = 1 / (1 + e^(−(net_j − T_j)))
which rises smoothly from 0 to 1 around the threshold T_j.
Can also use a tanh or Gaussian output function.
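As a small illustration, the snippet below defines the logistic output function with threshold T_j and its derivative; the function names and the use of NumPy are assumptions made for the example. The closed-form derivative o(1 − o) is what makes the gradient computations in backpropagation cheap.

import numpy as np

def logistic(net, T=0.0):
    # Smooth, differentiable replacement for the hard threshold at T
    return 1.0 / (1.0 + np.exp(-(net - T)))

def logistic_derivative(net, T=0.0):
    o = logistic(net, T)
    return o * (1.0 - o)   # convenient closed form in terms of the output

net = np.linspace(-6, 6, 7)
print(logistic(net))
print(logistic_derivative(net))
# np.tanh(net) is a common alternative with outputs in (-1, 1)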

Comments on Training Algorithm
Not guaranteed to converge to zero training error; may converge to local optima or oscillate indefinitely.
However, in practice it does converge to low error for many large networks on real data.
Many epochs (thousands) may be required; hours or days of training for large networks.
To avoid local-minima problems, run several trials starting with different random weights (random restarts), as in the sketch below.
– Take the results of the trial with the lowest training-set error.
– Build a committee of results from multiple trials (possibly weighting votes by training-set accuracy).
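As a concrete illustration of random restarts and committees, the sketch below trains several networks from different random initializations and then either keeps the one with the lowest training error or takes a majority vote. The use of scikit-learn's MLPClassifier, the two-moons toy dataset, and the five-restart count are assumptions for illustration only.

import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

trials = []
for seed in range(5):
    net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=seed)
    net.fit(X, y)
    trials.append((1.0 - net.score(X, y), net))   # (training-set error, model)

# Option 1: keep the trial with the lowest training-set error
best_error, best_net = min(trials, key=lambda t: t[0])

# Option 2: a simple unweighted committee -- majority vote over all trials
votes = np.stack([net.predict(X) for _, net in trials])
committee_prediction = (votes.mean(axis=0) >= 0.5).astype(int)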

Representational Power
Boolean functions: Any Boolean function can be represented by a two-layer network with sufficient hidden units.
Continuous functions: Any bounded continuous function can be approximated with arbitrarily small error by a two-layer network.
– Sigmoid functions can act as a set of basis functions for composing more complex functions, like sine waves in Fourier analysis.
Arbitrary functions: Any function can be approximated to arbitrary accuracy by a three-layer network.

Hidden Unit Representations
Trained hidden units can be seen as newly constructed features that make the target concept linearly separable in the transformed space.
On many real domains, hidden units can be interpreted as representing meaningful features such as vowel detectors or edge detectors.
However, the hidden layer can also become a distributed representation of the input in which each individual unit is not easily interpretable as a meaningful feature.

Over-Training Prevention
Running too many epochs can result in over-fitting.
Keep a hold-out validation set and test accuracy on it after every epoch. Stop training when additional epochs actually increase validation error.
To avoid losing training data to validation:
– Use internal 10-fold CV on the training set to compute the average number of epochs that maximizes generalization accuracy.
– Train the final network on the complete training set for this many epochs.
[Figure: error on training data vs. error on test data as a function of # training epochs]
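One way to realize this hold-out scheme is the built-in early stopping in scikit-learn's MLPClassifier, sketched below; the library choice, toy dataset, and patience settings are illustrative assumptions rather than the procedure prescribed in the slides.

from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.25, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(16,),
                    early_stopping=True,       # hold out part of the training data
                    validation_fraction=0.1,   # 10% used as the validation set
                    n_iter_no_change=10,       # stop after 10 epochs without improvement
                    max_iter=1000,
                    random_state=0)
net.fit(X, y)
print("stopped after", net.n_iter_, "epochs")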

Determining the Best Number of Hidden Units
Too few hidden units prevents the network from adequately fitting the data.
Too many hidden units can result in over-fitting.
Use internal cross-validation to empirically determine an optimal number of hidden units.
[Figure: error on training data vs. error on test data as a function of # hidden units]
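A minimal sketch of choosing the number of hidden units by internal cross-validation follows; scikit-learn, the toy dataset, and the candidate sizes are assumptions for illustration.

from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.25, random_state=0)

best_size, best_score = None, -1.0
for n_hidden in (2, 4, 8, 16, 32):
    net = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=2000, random_state=0)
    score = cross_val_score(net, X, y, cv=10).mean()   # internal 10-fold CV accuracy
    if score > best_score:
        best_size, best_score = n_hidden, score
print("best number of hidden units:", best_size)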

Successful Applications
Text to Speech (NetTalk)
Fraud detection
Financial applications – HNC (eventually bought by Fair Isaac)
Chemical plant control – Pavillion Technologies
Automated vehicles
Game playing – Neurogammon
Handwriting recognition