CS623: Introduction to Computing with Neural Nets (lecture-4)

CS623: Introduction to Computing with Neural Nets (lecture-4) Pushpak Bhattacharyya Computer Science and Engineering Department IIT Bombay

Weights in a ff NN. wmn is the weight of the connection from the nth neuron to the mth neuron. The E vs. weights surface is a complex surface in the space defined by the weights wij. -∂E/∂wmn gives the direction in which a movement of the operating point in the wmn co-ordinate space will result in the maximum decrease in error. (Figure: neuron n feeding neuron m through the weight wmn.)

Sigmoid neurons. Gradient descent needs a derivative computation, which is not possible with a perceptron because of the discontinuous step function it uses. So sigmoid neurons, with easy-to-compute derivatives, are used instead. The computing power comes from the non-linearity of the sigmoid function.

Derivative of Sigmoid function
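
The equation on this slide did not survive the transcript; a standard reconstruction, assuming the sigmoid o = 1/(1 + e^(-net)) used on the later slides, is:

    o = \frac{1}{1 + e^{-net}}
    \frac{do}{d\,net} = \frac{e^{-net}}{(1 + e^{-net})^{2}} = o\,(1 - o)

So the derivative can be computed from the output alone, which is what makes sigmoid neurons convenient for gradient descent.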

Training algorithm. Initialize weights to random values. For input x = <xn, xn-1, …, x0>, modify weights as follows (target output = t, observed output = o): wi ← wi + ∆wi. Iterate until E < ε (a chosen error threshold).
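
A minimal sketch of this procedure in Python, assuming the update rule ∆wi = η(t - o)·o·(1 - o)·xi derived on the next slide and the squared-error criterion E = ½ Σ (t - o)²; the learning rate eta, the random-weight range, the epoch cap, and the threshold are illustrative choices, not values from the lecture.

    import math, random

    def sigmoid(net):
        return 1.0 / (1.0 + math.exp(-net))

    def train(samples, n_inputs, eta=0.5, threshold=0.01, max_epochs=10000):
        # samples: list of (x, t) pairs; x includes the bias input x0 = -1 as its last element
        w = [random.uniform(-1, 1) for _ in range(n_inputs)]
        for _ in range(max_epochs):
            E = 0.0
            for x, t in samples:
                o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
                E += 0.5 * (t - o) ** 2
                # delta rule for a sigmoid neuron: larger |xi| gives a larger weight change
                for i in range(n_inputs):
                    w[i] += eta * (t - o) * o * (1 - o) * x[i]
            if E < threshold:   # iterate until the total error falls below the threshold
                break
        return w

    # e.g. a two-input AND gate, inputs given as <x1, x2, x0 = -1>:
    weights = train([([0, 0, -1], 0), ([0, 1, -1], 0), ([1, 0, -1], 0), ([1, 1, -1], 1)], 3)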

Calculation of ∆wi
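
The algebra on this slide is not in the transcript; a reconstruction from the squared-error criterion that the surrounding slides assume:

    E = \frac{1}{2}(t - o)^{2}, \quad o = \frac{1}{1 + e^{-net}}, \quad net = \sum_i w_i x_i
    \frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial o}\cdot\frac{\partial o}{\partial net}\cdot\frac{\partial net}{\partial w_i} = -(t - o)\,o\,(1 - o)\,x_i
    \Delta w_i = -\eta\,\frac{\partial E}{\partial w_i} = \eta\,(t - o)\,o\,(1 - o)\,x_i

with η the learning rate. This is the rule the following "Observations" slides analyse.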

Observations. Does the training technique support our intuition? The larger the xi, the larger is ∆wi: the error burden is borne by the weight values corresponding to large input values.

Observations contd. ∆wi is proportional to the departure from the target. Saturation behaviour occurs when o is close to 0 or 1. If o < t then ∆wi > 0, and if o > t then ∆wi < 0, which is consistent with Hebb’s law.

Hebb’s law. If nj and ni are both in the excitatory state (+1), then the change in the weight wji of the connection from ni to nj must be such that it enhances the excitation, and the change is proportional to both levels of excitation: ∆wji ∝ e(nj) e(ni). If ni and nj are in a mutual state of inhibition (one is +1 and the other is -1), then the change in weight is such that the inhibition is enhanced (the change in weight is negative). (Figure: neuron ni connected to neuron nj through the weight wji.)
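
A trivial sketch of the sign behaviour of this rule, with an illustrative proportionality constant eta that is not a value from the slides:

    eta = 0.1
    for e_nj in (+1, -1):
        for e_ni in (+1, -1):
            # Hebbian change: positive when the two states agree, negative when they disagree
            print(e_nj, e_ni, eta * e_nj * e_ni)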

Saturation behavior. The algorithm is iterative and incremental. If the weight values or the input values are very large, the net input will be large and the output will lie in the saturation region of the sigmoid. The weight values hardly change in the saturation region, since the factor o(1 - o) in ∆wi is then close to zero.
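
A quick numerical illustration of why updates vanish in the saturation region; the net values below are arbitrary examples:

    import math

    def sigmoid(net):
        return 1.0 / (1.0 + math.exp(-net))

    for net in (0.0, 2.0, 5.0, 10.0):
        o = sigmoid(net)
        # o * (1 - o) multiplies every delta_wi; it collapses towards 0 as o saturates
        print(f"net={net:5.1f}  o={o:.5f}  o*(1-o)={o * (1 - o):.5f}")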

If Sigmoid Neurons Are Used, Do We Need MLP? Does the sigmoid have the power to separate non-linearly separable data? Can a single sigmoid neuron solve the X-OR problem (X-OR is not linearly separable)?

The sigmoid output O = 1 / (1 + e^(-net)) is interpreted with two thresholds: O is read as 1 if O > yu and as 0 if O < yl, where typically yl << 0.5 and yu >> 0.5. (Figure: the sigmoid curve with yl and yu marked on the output axis.)

Inequalities. O = 1 / (1 + e^(-net)) for a neuron with inputs x1, x2 (weights w1, w2) and bias input x0 = -1 (weight w0), so net = w1 x1 + w2 x2 - w0.

<0, 0>: O = 0, i.e. O < yl: 1 / (1 + e^(-w1·0 - w2·0 + w0)) < yl, i.e. 1 / (1 + e^(w0)) < yl … (1)

<0, 1>: O = 1, i.e. O > yu: 1 / (1 + e^(-w1·0 - w2·1 + w0)) > yu, i.e. 1 / (1 + e^(-w2 + w0)) > yu … (2)

<1, 0>: O = 1, i.e. 1 / (1 + e^(-w1 + w0)) > yu … (3)
<1, 1>: O = 0, i.e. 1 / (1 + e^(-w1 - w2 + w0)) < yl … (4)

Rearranging (1): 1 / (1 + e^(w0)) < yl, i.e. 1 + e^(w0) > 1/yl, i.e. w0 > ln((1 - yl) / yl) … (5)

Rearranging (2): 1 / (1 + e^(-w2 + w0)) > yu, i.e. 1 + e^(-w2 + w0) < 1/yu, i.e. e^(-w2 + w0) < (1 - yu)/yu, i.e. -w2 + w0 < ln((1 - yu)/yu), i.e. w2 - w0 > ln(yu / (1 - yu)) … (6)

Rearranging (3): w1 - w0 > ln(yu / (1 - yu)) … (7). Rearranging (4): -w1 - w2 + w0 > ln((1 - yl) / yl) … (8)

Adding (5), (6), (7) and (8), the weight terms cancel: 0 > 2 ln((1 - yl)/yl) + 2 ln(yu/(1 - yu)), i.e. 0 > ln[((1 - yl)/yl) · (yu/(1 - yu))], i.e. ((1 - yl)/yl) · (yu/(1 - yu)) < 1

Equivalently, [(1 - yl) / (1 - yu)] · [yu / yl] < 1. But since yl << 0.5 << yu, both factors are greater than 1, so their product must exceed 1. Contradiction; hence a single sigmoid neuron cannot compute X-OR.
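
A purely numerical illustration of the same conclusion (not a proof), assuming concrete thresholds yl = 0.2 and yu = 0.8, which are illustrative values rather than ones from the lecture; a random search over weights finds no (w0, w1, w2) satisfying inequalities (1)-(4):

    import math, random

    def sigmoid(net):
        return 1.0 / (1.0 + math.exp(-net))

    yl, yu = 0.2, 0.8      # illustrative thresholds, yl << 0.5 << yu
    cases = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]   # X-OR truth table

    def satisfies_all(w1, w2, w0):
        for (x1, x2), t in cases:
            o = sigmoid(w1 * x1 + w2 * x2 - w0)   # net = w1 x1 + w2 x2 - w0
            if t == 1 and not o > yu:
                return False
            if t == 0 and not o < yl:
                return False
        return True

    random.seed(0)
    found = any(satisfies_all(random.uniform(-20, 20),
                              random.uniform(-20, 20),
                              random.uniform(-20, 20)) for _ in range(100000))
    print("weights satisfying all four inequalities found:", found)   # expected: False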

Exercise. Use the fact that any non-linearly separable function has a positive linear combination to study whether the sigmoid can compute any non-linearly separable function.

Non-linearity is the source of power. Consider a network of linear neurons with hidden units h1, h2 (figure: inputs x1, x2 feed h1, h2 through weights w3, w4, w5, w6, and h1, h2 feed the output y through weights w1, w2):
y = m1(h1·w1 + h2·w2) + c1
h1 = m2(w3·x1 + w4·x2) + c2
h2 = m3(w5·x1 + w6·x2) + c3
Substituting h1 and h2 gives y = k1·x1 + k2·x2 + c', i.e. the multilayer network collapses into an equivalent 2-layer network without the hidden layer.
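
A small numerical check of this collapse, using arbitrary example numbers for the weights, slopes, and offsets (none of these values come from the lecture):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=2)                      # inputs x1, x2
    W_hidden = rng.normal(size=(2, 2))          # w3..w6
    W_out = rng.normal(size=2)                  # w1, w2
    m1, m2, m3 = 1.5, 0.7, -2.0                 # linear "activation" slopes
    c1, c2, c3 = 0.3, -0.1, 0.4                 # offsets

    # Two-step computation through the linear hidden layer
    h = np.array([m2, m3]) * (W_hidden @ x) + np.array([c2, c3])
    y_two_layer = m1 * (W_out @ h) + c1

    # Collapsed single linear map y = k1*x1 + k2*x2 + c'
    k = (m1 * W_out * np.array([m2, m3])) @ W_hidden
    c_prime = m1 * (W_out @ np.array([c2, c3])) + c1
    y_collapsed = k @ x + c_prime

    print(np.isclose(y_two_layer, y_collapsed))   # True: the linear hidden layer adds no power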

Can a linear neuron compute X-OR? A linear neuron computes y = m·x + c. The output is regarded as y = 1 when y > yU and as y = 0 when y < yL, with yU > yL. (Figure: the line y = mx + c with the thresholds yU and yL marked.)

Linear Neuron & X-OR. We want y = w1·x1 + w2·x2 + c. (Figure: inputs x1, x2 feed the output y through weights w1, w2.)

Linear Neuron 1/4. For (1,1) and (0,0) we need y < yL; for (0,1) and (1,0) we need y > yU; and yU > yL. Can such (w1, w2, c) be found?

Linear Neuron 2/4. (0,0): y = w1·0 + w2·0 + c = c; we need y < yL, so c < yL … (1). (0,1): y = w1·0 + w2·1 + c = w2 + c; we need y > yU, so w2 + c > yU … (2).

Linear Neuron 3/4. (1,0): y = w1·1 + w2·0 + c = w1 + c; we need y > yU, so w1 + c > yU … (3). (1,1): y = w1 + w2 + c; we need y < yL, so w1 + w2 + c < yL … (4). Also yU > yL … (5).

Linear Neuron 4/4. Collecting the constraints: c < yL … (1); w2 + c > yU … (2); w1 + c > yU … (3); w1 + w2 + c < yL … (4); yU > yL … (5). Adding (2) and (3) gives w1 + w2 + 2c > 2yU, while adding (1) and (4) gives w1 + w2 + 2c < 2yL; together these imply yU < yL, contradicting (5). The system is inconsistent, so no such linear neuron exists.
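
A quick sketch that checks the same inconsistency numerically with scipy's linear-programming routine, using example thresholds yL = 0.4 and yU = 0.6 and a small margin eps to stand in for the strict inequalities (all values are illustrative, not from the lecture):

    import numpy as np
    from scipy.optimize import linprog

    yL, yU, eps = 0.4, 0.6, 1e-6
    # Variables: [w1, w2, c]; every constraint is written as A_ub @ vars <= b_ub
    A_ub = np.array([
        [0, 0, 1],      # (1)  c < yL
        [0, -1, -1],    # (2)  w2 + c > yU
        [-1, 0, -1],    # (3)  w1 + c > yU
        [1, 1, 1],      # (4)  w1 + w2 + c < yL
    ])
    b_ub = np.array([yL - eps, -(yU + eps), -(yU + eps), yL - eps])

    # Feasibility problem: minimise the zero objective subject to the constraints
    res = linprog(c=[0, 0, 0], A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 3)
    print("feasible:", res.success)   # expected: False, the constraints are inconsistent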

Observations. A linear neuron cannot compute X-OR. A multilayer network of neurons with linear characteristics is collapsible to a single linear neuron; therefore the addition of layers does not contribute to computing power. Neurons in a feedforward network must be non-linear. Threshold elements will do iff we can linearize a non-linearly separable function.