Artificial Neural Networks ML 4.6-4.9 Paul Scheible

Backpropagation Algorithm

Convergence to Local Minima
- Backpropagation performs well on many practical problems despite the risk of local minima.
- Local minima are less troubling than one might think:
  - A high-dimensional weight space is unlikely to have a local minimum in every dimension at once.
  - Any dimension without a local minimum provides an escape route.
- Starting with small initial weights tends to avoid local minima.

Convergence to Local Minima (cont.)
- There is still no known method for predicting when local minima will present a problem. Common heuristics (the first two are sketched below):
  - Use momentum.
  - Use stochastic gradient descent.
  - Train several networks on the same data from different random starting weights.
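A minimal sketch of the first two heuristics, stochastic weight updates with a momentum term, on a toy single-sigmoid-unit problem. All data, names, and hyperparameter values here are illustrative, not from the slides:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                 # toy inputs
    t = (X[:, 0] + X[:, 1] > 0).astype(float)     # toy binary targets

    w = rng.normal(scale=0.05, size=2)            # small initial weights
    eta, alpha = 0.3, 0.3                         # learning rate, momentum
    dw_prev = np.zeros_like(w)

    for epoch in range(50):
        for i in rng.permutation(len(X)):         # stochastic: one example at a time
            o = sigmoid(X[i] @ w)
            grad = -(t[i] - o) * o * (1 - o) * X[i]   # gradient of squared error
            dw = -eta * grad + alpha * dw_prev        # momentum carries the previous step
            w += dw
            dw_prev = dw

The momentum term alpha * dw_prev keeps the update moving through small dips and flat regions that could otherwise trap plain gradient descent.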

Representational Power
- Boolean functions: every boolean function can be represented by a network with a single hidden layer, though possibly requiring exponentially many hidden units.
- Continuous functions: every bounded continuous function can be approximated to arbitrary accuracy by a network with two layers of units.
- Arbitrary functions: any function can be approximated to arbitrary accuracy by a network with three layers of units.

Other Aspects
- The hypothesis space is continuous: every assignment of real-valued weights is a hypothesis.
- The inductive bias favors smooth interpolation between data points.
- Hidden units let the network determine its own internal representation.
- Avoiding overfitting (one approach is sketched below):
  - Weight decay.
  - Cross validation.
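One common way to apply cross validation here is early stopping: hold out a validation set and keep the weights that minimize validation error rather than training to convergence. A minimal sketch on a toy linear model (model and data are illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    t = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
    X_tr, t_tr, X_val, t_val = X[:150], t[:150], X[150:], t[150:]

    w = np.zeros(3)
    best_w, best_err = w.copy(), np.inf
    for epoch in range(100):
        grad = -2 * X_tr.T @ (t_tr - X_tr @ w) / len(X_tr)   # training-set gradient
        w -= 0.05 * grad
        val_err = np.mean((t_val - X_val @ w) ** 2)
        if val_err < best_err:                   # snapshot the best-generalizing weights
            best_w, best_err = w.copy(), val_err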

Face Recognition Example

Problem
- Identify face orientation from an image.
- Training set: 640 images at 120x128 pixels, with grey-scale values 0-255.
- The images vary in background, clothing, expression, and eyewear (sunglasses).

Design Choices
- Input encoding:
  - Each image reduced to 30x32 pixels.
  - Pixels averaged over blocks to obtain the reduced values.
  - Pixel values rescaled from the range 0-255 to 0-1.
- Output encoding:
  - Four outputs, one per face orientation (both encodings are sketched below).
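A sketch of both encodings in numpy. The 4x4 block-averaging scheme is an assumption about how the reduction is done, and image is a random stand-in for one training image; Mitchell's text uses target values 0.1/0.9 rather than 0/1 so the sigmoid outputs need not saturate:

    import numpy as np

    image = np.random.randint(0, 256, size=(120, 128)).astype(float)

    # Input encoding: average 4x4 blocks down to 30x32, rescale 0-255 to 0-1.
    x = (image.reshape(30, 4, 32, 4).mean(axis=(1, 3)) / 255.0).ravel()

    # Output encoding: four outputs, one per face orientation.
    orientations = ["left", "right", "up", "straight"]
    target = np.full(4, 0.1)
    target[orientations.index("left")] = 0.9      # target for a left-facing image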

Design Choices (cont.)
- Network graph structure:
  - Acyclic, two-layer feed-forward network.
  - Hidden units: 3 (chosen; about five minutes to train) versus 30 (about one hour to train).
- Learning rate η: 0.3
- Momentum α: 0.3

Design Choices (cont.)
- Full (batch) gradient descent.
- Hidden-to-output weights initialized to small random values.
- Input-to-hidden weights initialized to zero (the full configuration is sketched below).
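Putting the design choices together, a sketch of the initial configuration. The shapes assume the 960-value input encoding above, and the explicit bias column is an illustrative detail:

    import numpy as np

    rng = np.random.default_rng(42)
    n_in, n_hidden, n_out = 30 * 32, 3, 4

    W_hidden = np.zeros((n_hidden, n_in + 1))     # zero weights on the input layer
    W_out = rng.uniform(-0.05, 0.05,              # small random weights on the output layer
                        size=(n_out, n_hidden + 1))
    eta, alpha = 0.3, 0.3                         # learning rate and momentum from above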

Advanced Topics

Alternate Error Functions
- Add a penalty term for weight magnitude:
  - Biases learning toward small-magnitude weight vectors.
  - Equivalent to weight decay (the penalized error is written out below).
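In Mitchell's notation, the penalized error adds a term proportional to the sum of squared weights; gradient descent on this error decays every weight toward zero on each update:

    E(\vec{w}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2
                 + \gamma \sum_{i,j} w_{ji}^2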

Alternate Error Functions (cont.)
- Add a term for the error in the slope (derivative) of the output:
  - Requires knowledge of the target function's derivatives at the training points (the form is given below).
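Following the form given in Mitchell, where the constant \mu weighs the slope error against the value error and x_d^j is the j-th input of training example d:

    E(\vec{w}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs}
                 \left[ (t_{kd} - o_{kd})^2
                 + \mu \sum_{j \in inputs}
                   \left( \frac{\partial t_{kd}}{\partial x_d^j}
                        - \frac{\partial o_{kd}}{\partial x_d^j} \right)^2 \right]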

Alternate Error Functions (cont.)
- Minimize cross entropy, appropriate when the output represents a probability (the formula appears below).
- Relate weights to each other by some design constraint (e.g. weight sharing).
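For a single probabilistic output o_d with boolean target t_d, the cross entropy to be minimized is:

    - \sum_{d \in D} t_d \log o_d + (1 - t_d) \log (1 - o_d)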

Alternative Error Minimization Procedures
- Line search: choose the step size that minimizes the error along the chosen search direction.
- Conjugate gradient: a sequence of line searches along mutually conjugate directions (sketched below).
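A minimal sketch using scipy's conjugate-gradient routine on a stand-in error function; in a real network the argument would be the flattened weight vector and the function the network's error:

    import numpy as np
    from scipy.optimize import minimize

    def error(w):
        # Toy quadratic standing in for a network error surface.
        return 0.5 * np.sum((w - np.array([1.0, -2.0, 0.5])) ** 2)

    result = minimize(error, x0=np.zeros(3), method="CG")   # conjugate-gradient line searches
    print(result.x)                                         # approaches [1.0, -2.0, 0.5]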

Recurrent Networks
- Directed cyclic graphs: unit outputs feed back as inputs at later steps.
- Used to learn recursive functions, e.g. over time series (a minimal unit is sketched below).
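A minimal sketch of an Elman-style recurrent unit, with random weights purely for illustration: because the hidden state feeds back into itself, each output is a recursive function of the entire input history.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(7)
    W_in = rng.normal(size=(4, 2))                # input-to-hidden weights
    W_rec = rng.normal(size=(4, 4))               # hidden-to-hidden (cyclic) weights

    h = np.zeros(4)
    for x_t in rng.normal(size=(10, 2)):          # a sequence of 10 inputs
        h = sigmoid(W_in @ x_t + W_rec @ h)       # state depends on the whole history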

Dynamic Modification of Network Structure
- Cascade-Correlation:
  - Start with a single layer (no hidden nodes).
  - If the error is too great, add a hidden node; train its incoming weights, then hold them constant while retraining the output weights.
  - Add further hidden nodes until the error is acceptable.
  - Can easily result in overfitting (a toy sketch follows).
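A toy sketch of the Cascade-Correlation skeleton on a 1-D regression task. Candidate training is simplified to random search over candidate weights (the real algorithm trains candidates by gradient ascent on their covariance with the residual error), and output weights are refit by least squares:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 1))
    t = np.sin(3 * X[:, 0])                       # toy target function

    def fit_outputs(F, t):
        # Retrain output weights by linear least squares on current features.
        return np.linalg.lstsq(F, t, rcond=None)[0]

    F = np.column_stack([np.ones(len(X)), X])     # bias + raw inputs
    w = fit_outputs(F, t)
    for _ in range(10):                           # add hidden nodes until error is acceptable
        err = t - F @ w
        if np.mean(err ** 2) < 1e-3:
            break
        # Each new unit sees all earlier features (the cascade); keep the
        # candidate whose output covaries most strongly with the residual error.
        cands = [np.tanh(F @ rng.normal(size=F.shape[1])) for _ in range(50)]
        h = max(cands, key=lambda c: abs(np.cov(c, err)[0, 1]))
        F = np.column_stack([F, h])               # freeze the new unit's input weights
        w = fit_outputs(F, t)                     # retrain only the output weights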

Pruning
- Start with a complex network and prune unneeded nodes and weights:
  - Remove weights close to zero.
  - Better: remove nodes with little effect on the output (both criteria are sketched below).
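A sketch of both criteria, with random weights standing in for a trained network:

    import numpy as np

    rng = np.random.default_rng(3)
    W_h = rng.normal(size=(5, 3))                 # 5 hidden units, 3 inputs
    W_o = rng.normal(size=(1, 5))
    X = rng.normal(size=(100, 3))

    def forward(W_h, W_o, X):
        return np.tanh(X @ W_h.T) @ W_o.T

    # Criterion 1: zero out weights whose magnitude is close to zero.
    W_h_pruned = np.where(np.abs(W_h) < 0.1, 0.0, W_h)

    # Criterion 2 (better): find the hidden node whose removal
    # changes the network's outputs the least.
    base = forward(W_h, W_o, X)
    effects = []
    for j in range(5):
        W_o_j = W_o.copy()
        W_o_j[:, j] = 0.0                         # silence hidden node j
        effects.append(np.mean((forward(W_h, W_o_j, X) - base) ** 2))
    least_useful = int(np.argmin(effects))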