Robert J. Marks II CIA Lab Baylor University School of Engineering CiaLab.org Artificial Neural Networks: Supervised Models

Robert J. Marks II Supervised Learning
- Given: input (stimulus) / output (response) data
- Objective: train a machine to simulate the input/output relationship
- Types:
  - Classification (discrete outputs)
  - Regression (continuous outputs)

Robert J. Marks II Training a Classifier
[Figure: a sequence of face images is presented to the classifier, each paired with its label: Marks, not Marks, not Marks, not Marks, Marks, not Marks.]

Robert J. Marks II Recall from a Trained Classifier
[Figure: a previously unseen face image is presented to the trained classifier, which outputs "Marks".]
Note: The test image does not appear in the training data. Learning ≠ memorization.

Robert J. Marks II Classifier in Feature Space, After Training
[Figure: feature-space plot showing the learned representation versus the concept (truth), with training data marked as Marks / not Marks and a test point (Marks).]

Robert J. Marks II Supervised Regression (Interpolation)
- Output data is continuous rather than discrete
- Example: load forecasting
  - Training (from historical data):
    - Input: temperatures, current load, day of week, holiday(?), etc.
    - Output: next day's load
  - Test:
    - Input: forecasted temperatures, current load, day of week, holiday(?), etc.
    - Output: tomorrow's load forecast

Robert J. Marks II Properties of Good Classifiers and Regression Machines
- Good accuracy outside of the training set
- Explanation facility (the ability to generate rules after training)
- Fast training
- Fast testing

Robert J. Marks II Some Classifiers and Regression Machines
- Classification and Regression Trees (CART)
- Nearest neighbor look-up
- Neural networks
  - Layered perceptrons (MLPs)
  - Recurrent perceptrons
  - Cascade correlation neural networks
  - Radial basis function neural networks

Robert J. Marks II A Model of an Artificial Neuron
[Figure: inputs $s_1, \dots, s_5$ feed into the neuron through weights $w_1, \dots, w_5$.]
The neuron forms $\mathrm{sum} = \sum_n w_n s_n$ and its state is $s = \sigma(\mathrm{sum})$, where $\sigma(\cdot)$ is the squashing function.

Robert J. Marks II Squashing Functions
[Figure: plot of $\sigma(\mathrm{sum})$ versus $\mathrm{sum}$, rising from 0 and saturating at 1.]
Sigmoid: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
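A minimal sketch of the artificial neuron above: a weighted sum of the input states followed by the sigmoid squashing function. The weights and inputs are illustrative values, not taken from the slides.

```python
import numpy as np

def sigmoid(x):
    """Squashing function: sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def neuron_state(inputs, weights):
    """State of the neuron: sigma(sum_n w_n * s_n)."""
    return sigmoid(np.dot(weights, inputs))

s = np.array([0.2, 0.9, 0.5, 0.1, 0.7])    # input states s_1 .. s_5
w = np.array([0.4, -0.6, 0.1, 0.8, -0.3])  # weights w_1 .. w_5
print(neuron_state(s, w))
```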

Robert J. Marks II A Layered Perceptron
[Figure: an input layer, a hidden layer of neurons, and an output layer, joined by weighted interconnects.]

Robert J. Marks II Training
- Given training data:
  - input vector set: $\{\, \mathbf{i}_n \mid 1 \le n \le N \,\}$
  - corresponding output (target) vector set: $\{\, \mathbf{t}_n \mid 1 \le n \le N \,\}$
- Find the weights of the interconnects, using the training data, so as to minimize the error on the test data

Robert J. Marks II Error
- Input, target & response:
  - input vector set: $\{\, \mathbf{i}_n \mid 1 \le n \le N \,\}$
  - target vector set: $\{\, \mathbf{t}_n \mid 1 \le n \le N \,\}$
  - $\mathbf{o}_n$ = the neural network output when the input is $\mathbf{i}_n$ (note: in general $\mathbf{o}_n \ne \mathbf{t}_n$)
- Error: $E = \tfrac{1}{2} \sum_n \| \mathbf{o}_n - \mathbf{t}_n \|^2$
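A small sketch of the error measure above, $E = \tfrac{1}{2}\sum_n \|\mathbf{o}_n - \mathbf{t}_n\|^2$, computed over a batch of (output, target) pairs. The arrays are illustrative.

```python
import numpy as np

def sum_squared_error(outputs, targets):
    """E = 1/2 * sum over all examples and output components of (o - t)^2."""
    return 0.5 * np.sum((outputs - targets) ** 2)

o = np.array([[0.9, 0.1], [0.2, 0.7]])  # network outputs o_n
t = np.array([[1.0, 0.0], [0.0, 1.0]])  # targets t_n
print(sum_squared_error(o, t))
```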

Robert J. Marks II Error Minimization Techniques
- The error is a function of
  - the fixed training and test data
  - the neural network weights
- Find weights that minimize the error (standard optimization):
  - conjugate gradient descent
  - random search
  - genetic algorithms
  - steepest descent (error backpropagation)

Robert J. Marks II Minimizing Error Using Steepest Descent
The main idea: find the way downhill and take a step.
[Figure: the error curve $E(x)$ with its minimum marked; at the point $x$, the downhill direction is $-\,dE/dx$.]
Update: $x \leftarrow x - \mu \dfrac{dE}{dx}$, where $\mu$ is the step size.

Robert J. Marks II Example of Steepest Descent
$E(x) = \tfrac{1}{2} x^2$; minimum at $x = 0$.
$-\dfrac{dE}{dx} = -x$, so the update is $x \leftarrow x - \mu x = (1 - \mu)\,x$.
The solution to the difference equation $x_p = (1 - \mu)\, x_{p-1}$ is $x_p = (1 - \mu)^p\, x_0$, which converges to the minimum for $|1 - \mu| < 1$.
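A quick sketch of the steepest-descent example: minimizing $E(x) = x^2/2$ with the update $x \leftarrow x - \mu\,dE/dx = (1-\mu)x$. The starting point and step size are illustrative.

```python
def steepest_descent(x0, mu, steps):
    """Iterate x <- (1 - mu) * x; converges to 0 when |1 - mu| < 1."""
    x = x0
    for p in range(steps):
        grad = x          # dE/dx for E(x) = x^2 / 2
        x = x - mu * grad
    return x

print(steepest_descent(x0=5.0, mu=0.3, steps=20))  # approaches the minimum at 0
```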

Robert J. Marks II Training the Perceptron
[Figure: a single-layer perceptron with inputs $i_1, \dots, i_4$, outputs $o_1, o_2$, and weights $w_{11}$ through $w_{24}$.]
$E = \tfrac{1}{2} \sum_{n=1}^{2} (o_n - t_n)^2 = \tfrac{1}{2} \sum_{n=1}^{2} \Bigl( \sum_{k=1}^{4} w_{nk}\, i_k - t_n \Bigr)^2$
$\dfrac{\partial E}{\partial w_{mj}} = i_j \Bigl( \sum_{k=1}^{4} w_{mk}\, i_k - t_m \Bigr) = i_j\, (o_m - t_m)$

Robert J. Marks II Weight Update
$\dfrac{\partial E}{\partial w_{mj}} = i_j\, (o_m - t_m)$; for $m = 2$ and $j = 4$:
$w_{24} \leftarrow w_{24} - \mu\, i_4\, (o_2 - t_2)$
[Figure: the same single-layer perceptron, highlighting the weight $w_{24}$ between input $i_4$ and output $o_2$.]
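A minimal sketch of the single-layer weight update above, applied to every weight at once: $w_{mj} \leftarrow w_{mj} - \mu\, i_j (o_m - t_m)$ with linear outputs $o = W\mathbf{i}$ as on the slide. The network size, data, and step size are illustrative.

```python
import numpy as np

def update_weights(W, i, t, mu=0.1):
    """One steepest-descent step on E = 1/2 * sum_m (o_m - t_m)^2."""
    o = W @ i                      # outputs o_m = sum_k w_mk * i_k
    grad = np.outer(o - t, i)      # dE/dw_mj = i_j * (o_m - t_m)
    return W - mu * grad

W = np.zeros((2, 4))               # 2 outputs, 4 inputs (w_11 .. w_24)
i = np.array([1.0, 0.5, -0.2, 0.8])
t = np.array([1.0, 0.0])
for _ in range(50):
    W = update_weights(W, i, t)
print(W @ i)                       # outputs approach the targets
```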

Robert J. Marks II No Hidden Layers = Linear Separation
$o = \sigma\Bigl( \sum_n w_n i_n \Bigr)$
For a classifier, threshold the output: if $o > \tfrac{1}{2}$, announce class #1; if $o < \tfrac{1}{2}$, announce class #2.
Classification boundary: $o = \tfrac{1}{2}$, or $\sum_n w_n i_n = 0$. This is the equation of a plane!
[Figure: a single neuron with inputs $i_1, i_2, i_3$, weights $w_1, w_2, w_3$, and output $o$.]
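A short sketch of the point above: a single sigmoid neuron classifies by thresholding its output at 1/2, which is the same as checking the sign of the weighted sum. The weights and test points are illustrative.

```python
import numpy as np

def classify(i, w):
    """Class #1 if sigma(w . i) > 1/2, i.e. if w . i > 0; class #2 otherwise."""
    return 1 if np.dot(w, i) > 0 else 2

w = np.array([1.0, -2.0, 0.5])
print(classify(np.array([3.0, 1.0, 0.0]), w))  # class 1 (w . i > 0)
print(classify(np.array([0.0, 2.0, 1.0]), w))  # class 2 (w . i < 0)
```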

Robert J. Marks II Classification Boundary
[Figure: in the $(i_1, i_2)$ plane, $\sum_n w_n i_n = 0$ is a line through the origin.]

Robert J. Marks II Adding a Bias Term
[Figure: the neuron with inputs $i_1, i_2, i_3$ plus a constant input of 1, weighted by $w_1, w_2, w_3, w_4$.]
The classification boundary is still a line, but it need not pass through the origin.

Robert J. Marks II The Minsky-Papert Objection
[Figure: the four XOR points in the $(i_1, i_2)$ plane; no single line separates the two classes.]
The simple exclusive-or (XOR) operation cannot be resolved by a linear perceptron with bias. More important problems, then, can probably not be resolved by a linear perceptron with bias either.
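A small numerical illustration of the Minsky-Papert point: no linear perceptron with bias (sign of $w_1 i_1 + w_2 i_2 + b$) classifies all four XOR points correctly. The grid search below is only illustrative, not a proof.

```python
import itertools
import numpy as np

points = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
labels = np.array([0, 1, 1, 0])                      # XOR truth table

best = 0
grid = np.linspace(-2, 2, 21)
for w1, w2, b in itertools.product(grid, grid, grid):
    pred = (points @ np.array([w1, w2]) + b > 0).astype(int)
    best = max(best, int((pred == labels).sum()))
print(best)   # at most 3 of the 4 points are ever classified correctly
```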

Robert J. Marks II The Layered Perceptron
- Interconnects: weights $w_{jk}(\ell)$
- Neurons: states $s_j(\ell)$
- Layers: input at $\ell = 0$, hidden layers at intermediate $\ell$, output at $\ell = L$

Robert J. Marks II Error Backpropagation
Problem: for an arbitrary weight $w_{jk}(\ell)$, perform the update
$w_{jk}(\ell) \leftarrow w_{jk}(\ell) - \mu\, \dfrac{\partial E}{\partial w_{jk}(\ell)}$
A solution: error backpropagation, i.e., the chain rule for partial derivatives:
$\dfrac{\partial E}{\partial w_{jk}(\ell)} = \dfrac{\partial E}{\partial s_j(\ell)}\, \dfrac{\partial s_j(\ell)}{\partial \mathrm{sum}_j(\ell)}\, \dfrac{\partial \mathrm{sum}_j(\ell)}{\partial w_{jk}(\ell)}$

Robert J. Marks II Each Partial is Evaluated (Beautiful Math!!!)
$\dfrac{\partial s_j(\ell)}{\partial \mathrm{sum}_j(\ell)} = \dfrac{d}{d\,\mathrm{sum}_j(\ell)}\, \dfrac{1}{1 + \exp[-\mathrm{sum}_j(\ell)]} = s_j(\ell)\, [\, 1 - s_j(\ell) \,]$
$\dfrac{\partial \mathrm{sum}_j(\ell)}{\partial w_{jk}(\ell)} = s_k(\ell - 1)$
$\dfrac{\partial E}{\partial s_j(\ell)} = \delta_j(\ell) = \sum_n \delta_n(\ell + 1)\, s_n(\ell + 1)\, [\, 1 - s_n(\ell + 1) \,]\, w_{nj}(\ell + 1)$

Robert J. Marks II Weight Update
$\dfrac{\partial E}{\partial w_{jk}(\ell)} = \dfrac{\partial E}{\partial s_j(\ell)}\, \dfrac{\partial s_j(\ell)}{\partial \mathrm{sum}_j(\ell)}\, \dfrac{\partial \mathrm{sum}_j(\ell)}{\partial w_{jk}(\ell)} = \delta_j(\ell)\, s_j(\ell)\, [\, 1 - s_j(\ell) \,]\, s_k(\ell - 1)$
$w_{jk}(\ell) \leftarrow w_{jk}(\ell) - \mu\, \delta_j(\ell)\, s_j(\ell)\, [\, 1 - s_j(\ell) \,]\, s_k(\ell - 1)$

Robert J. Marks II Step #1: Input Data & Feedforward
[Figure: a 2-3-2 network with inputs $i_1 = s_1(0)$, $i_2 = s_2(0)$, hidden states $s_1(1), s_2(1), s_3(1)$, and outputs $s_1(2) = o_1$, $s_2(2) = o_2$.]
The state of every neuron is determined by the states of the neurons below it and the interconnect weights.

Robert J. Marks II Step #2: Evaluate the Output Error, Backpropagate to Find the $\delta$'s for Each Neuron
[Figure: the same 2-3-2 network, annotated with $(o_1, t_1)$, $(o_2, t_2)$ at the outputs and a $\delta_j(\ell)$ at every neuron, from $\delta_1(2), \delta_2(2)$ down to $\delta_1(0), \delta_2(0)$.]
Each neuron now keeps track of two numbers: its state and its $\delta$. The $\delta$'s are determined by back-propagating the output error toward the input.

Robert J. Marks II Step #3: Update the Weights
[Figure: the same 2-3-2 network with a state and a $\delta$ at every neuron.]
Example: $w_{32}(1) \leftarrow w_{32}(1) - \mu\, \delta_3(1)\, s_3(1)\, [\, 1 - s_3(1) \,]\, s_2(0)$
Weight updates are performed locally, within the neural network architecture.
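A compact sketch of the three steps above for a small 2-3-2 sigmoid network (2 inputs, 3 hidden neurons, 2 outputs), using the delta and weight-update formulas from the preceding slides. The sizes, data, and step size are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(W1, W2, i, t, mu=0.5):
    # Step 1: input data & feedforward
    s1 = sigmoid(W1 @ i)                       # hidden states s_j(1)
    o = sigmoid(W2 @ s1)                       # output states s_j(2) = o_j
    # Step 2: backpropagate the output error to find the deltas
    delta2 = o - t                             # delta_j(2) = dE/do_j
    delta1 = W2.T @ (delta2 * o * (1 - o))     # delta_j(1) = sum_n delta_n(2) o_n (1-o_n) w_nj(2)
    # Step 3: weight updates  w_jk(l) <- w_jk(l) - mu * delta_j(l) s_j(l)[1-s_j(l)] s_k(l-1)
    W2 -= mu * np.outer(delta2 * o * (1 - o), s1)
    W1 -= mu * np.outer(delta1 * s1 * (1 - s1), i)
    return W1, W2

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(2, 3))
i, t = np.array([0.0, 1.0]), np.array([1.0, 0.0])
for _ in range(200):
    W1, W2 = backprop_step(W1, W2, i, t)
print(sigmoid(W2 @ sigmoid(W1 @ i)))           # output approaches the target
```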

Robert J. Marks II Neural Smithing
- Bias
- Momentum
- Batch training
- Learning versus memorization
- Cross validation
- The curse of dimensionality
- Variations

Robert J. Marks II Bias
- Bias is used with the MLP
  - at the input
  - at the hidden layers (sometimes)

Robert J. Marks II Momentum
- Steepest descent: $w_{jk}(\ell) \leftarrow w_{jk}(\ell) + \Delta w^{m}_{jk}(\ell)$, with $\Delta w^{m}_{jk}(\ell) = -\mu\, \partial E / \partial w_{jk}(\ell)$
- With momentum: $\Delta w^{m+1}_{jk}(\ell) = -\mu\, \partial E / \partial w_{jk}(\ell) + \alpha\, \Delta w^{m}_{jk}(\ell)$
- The new step is affected by the previous step ($m$ is the iteration number)
- Convergence is improved
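A brief sketch of the momentum update described above, $\Delta w^{m+1} = -\mu\,\mathrm{grad} + \alpha\,\Delta w^{m}$, applied to the same toy objective $E(x) = x^2/2$. The values of $\mu$ and $\alpha$ are illustrative.

```python
def descent_with_momentum(x0, mu=0.1, alpha=0.9, steps=100):
    x, delta = x0, 0.0
    for _ in range(steps):
        grad = x                              # dE/dx for E(x) = x^2 / 2
        delta = -mu * grad + alpha * delta    # new step affected by the previous step
        x = x + delta
    return x

print(descent_with_momentum(5.0))             # approaches the minimum at 0
```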

Robert J. Marks II Backpropagation: Batch Training
- Batch training: accumulate the error from all training data prior to the weight update
  - True steepest descent
  - Update the weights once per epoch
- Training the layered perceptron one data pair at a time:
  - Randomize the data order to avoid structure
  - The Widrow-Hoff algorithm
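A minimal sketch contrasting the two modes above: batch training (accumulate the gradient over all training pairs, update once per epoch) versus per-pattern training (update after each pair, in randomized order). It reuses the linear delta rule from earlier; the data and step size are illustrative.

```python
import numpy as np

def batch_epoch(W, inputs, targets, mu=0.05):
    grad = np.zeros_like(W)
    for i, t in zip(inputs, targets):         # accumulate error over all data
        grad += np.outer(W @ i - t, i)
    return W - mu * grad                      # one update per epoch

def per_pattern_epoch(W, inputs, targets, mu=0.05, rng=np.random.default_rng(0)):
    order = rng.permutation(len(inputs))      # randomize to avoid structure
    for n in order:                           # update after every data pair
        W = W - mu * np.outer(W @ inputs[n] - targets[n], inputs[n])
    return W

inputs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
targets = np.array([[1.0], [0.0], [1.0]])
W = np.zeros((1, 2))
for _ in range(100):
    W = batch_epoch(W, inputs, targets)
print(W)
```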

Robert J. Marks II Learning versus Memorization: Both Have Zero Training Error
[Figure: two classifiers fit to the same training data; one follows the concept (truth) and generalizes well (learning), the other contorts around the training points and generalizes badly (memorization). Training data and test data are marked.]

Robert J. Marks II Alternate View
[Figure: an alternate view of the same idea, showing the concept, a learned fit, and a memorized (overfitted) fit.]

Robert J. Marks II Learning versus Memorization (cont.)
- Successful learning: recognizing data outside the training set, e.g., data in the test set; i.e., the neural network must successfully classify (interpolate) inputs it has not seen before.
- How can we assure learning?
  - Cross validation
  - Choosing the neural network structure
    - Pruning
    - Genetic algorithms

Robert J. Marks II Cross Validation
[Figure: training error and test error plotted against training iterations (m); the training error keeps decreasing while the test error reaches a minimum and then rises.]
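A small sketch of cross validation as pictured above, used as early stopping: track the error on a held-out test set each iteration and keep the weights from the iteration where that error is smallest. The `train_step` and `error` arguments are placeholders standing in for any of the training rules shown earlier.

```python
import copy

def train_with_early_stopping(W, train_step, error, train_data, test_data, iterations=1000):
    best_W, best_test_error = copy.deepcopy(W), float("inf")
    for m in range(iterations):
        W = train_step(W, train_data)        # one training iteration
        e_test = error(W, test_data)         # monitor error outside the training set
        if e_test < best_test_error:         # remember the test-error minimum
            best_test_error, best_W = e_test, copy.deepcopy(W)
    return best_W, best_test_error
```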

Robert J. Marks II The Curse of Dimensionality
For many problems, the required number of training data pairs grows exponentially with the dimension of the input.
Example: for $N = 2$ inputs, suppose that $100 = 10^2$ training data pairs suffice. For $N = 3$ inputs, $10^3 = 1000$ training data pairs are then needed. In general, $10^N$ training data pairs are needed for many important problems.

Robert J. Marks II Example: Classifying a Circle in a Square ($N = 2$)
[Figure: a neural net with inputs $i_1, i_2$ and output $o$, and a square containing a circle, sampled on a $10 \times 10$ grid; $10^2$ points are shown.]

Robert J. Marks II Example: Classifying a Sphere in a Cube ($N = 3$)
[Figure: a neural net with inputs $i_1, i_2, i_3$ and output $o$, and a cube sampled as 10 layers, each with $10^2$ points: $10^3 = 10^N$ points in all.]

Robert J. Marks II Variations
- Architecture variations on the MLP:
  - Recurrent neural networks
  - Radial basis functions
  - Cascade correlation
  - Fuzzy MLPs
- Training algorithms

Robert J. Marks II Applications
- Power engineering
- Finance
- Bioengineering
- Control
- Industrial applications
- Politics

Robert J. Marks II Political Applications Robert Novak syndicated column Washington, February 18, 1996 UNDECIDED BOWLERS “President Clinton’s pollsters have identified the voters who will determine whether he will be elected to a second term: two-parent families whose members bowl for recreation.” “Using a technique they call the ‘neural network,’ Clinton advisors contend that these family bowlers are the quintessential undecided voters. Therefore, these are the people who must be targeted by the president.”

Robert J. Marks II “A footnote: Two decades ago, Illinois Democratic Gov. Dan Walker campaigned heavily in bowling alleys in the belief he would find swing voters there. Walker had national political ambitions but ended up in federal prison.” Robert Novak syndicated column Washington, February 18, 1996 (continued)

Robert J. Marks II Finis