CS621: Artificial Intelligence
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lecture 24: Sigmoid neuron and backpropagation

Feedforward network
A multilayer feedforward neural network has:
- an input layer
- an output layer
- hidden layers (which assist the computation)
The output units and hidden units are called computation units.

Architecture of the network
[Figure: a fully connected feedforward network -- an input layer of n input neurons, one or more hidden layers, and an output layer of m output neurons; w_ji labels the weight on the connection from neuron i to neuron j. The network is pure feedforward: no connection jumps over a layer.]
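To make the picture concrete, here is a minimal sketch of a forward pass through such a network with one hidden layer, assuming a NumPy implementation and the sigmoid activation introduced a few slides later (the function and variable names are illustrative, not from the slides):

    import numpy as np

    def sigmoid(net):
        # Logistic activation used by the computation units
        return 1.0 / (1.0 + np.exp(-net))

    def forward(x, W_hidden, W_out):
        # x: input vector of length n (input layer)
        # W_hidden: (h, n) weights from input layer to hidden layer
        # W_out:    (m, h) weights from hidden layer to output layer
        hidden = sigmoid(W_hidden @ x)    # hidden (computation) units
        output = sigmoid(W_out @ hidden)  # output (computation) units
        return hidden, output

Because the network is pure feedforward, each layer reads only the layer directly below it; no connection skips a layer.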

Training of the MLP
Multilayer Perceptron (MLP). Question: how do we find weights for the hidden layers, for which no target output is available? This is the credit assignment problem, to be solved by gradient descent.

Gradient Descent Technique
Let E be the error at the output layer:
E = (1/2) Σ_j Σ_i (t_i - o_i)_j^2
where t_i = target output, o_i = observed output, i is the index going over the n neurons in the outermost layer, and j is the index going over the p patterns (1 to p). Example: for XOR, p = 4 and n = 1.
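The same error can be computed numerically; a minimal sketch (NumPy assumed, names illustrative):

    import numpy as np

    def total_error(targets, outputs):
        # targets, outputs: shape (p, n) -- p patterns, n outermost-layer neurons
        # E = (1/2) * sum over patterns j and neurons i of (t_i - o_i)^2
        return 0.5 * np.sum((targets - outputs) ** 2)

    # XOR example from the slide: p = 4 patterns, n = 1 output neuron
    t = np.array([[0.], [1.], [1.], [0.]])
    o = np.array([[0.1], [0.9], [0.8], [0.2]])
    print(total_error(t, o))  # 0.5 * (0.01 + 0.01 + 0.04 + 0.04) = 0.05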

Weights in a feedforward NN
w_mn is the weight of the connection from the nth neuron to the mth neuron. E versus the weights is a complex surface in the space defined by the weights w_ij. The negative gradient -∂E/∂w_mn gives the direction in which a movement of the operating point in the w_mn coordinate space will result in the maximum decrease in error.
[Figure: neuron n feeding neuron m through the weight w_mn.]

Sigmoid neurons
Gradient descent needs a derivative computation, which is not possible in the perceptron because of the discontinuous step function it uses. Sigmoid neurons, whose derivatives are easy to compute, are used instead. The computing power comes from the non-linearity of the sigmoid function.

Derivative of the sigmoid function
o = 1 / (1 + e^(-net)), where net = Σ_i w_i x_i
do/dnet = o (1 - o)
The derivative is available almost for free: it needs only the output o itself.
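The identity can be checked numerically; a small sketch (NumPy assumed):

    import numpy as np

    def sigmoid(net):
        return 1.0 / (1.0 + np.exp(-net))

    def sigmoid_derivative(net):
        o = sigmoid(net)
        return o * (1.0 - o)  # do/dnet = o(1 - o), needs only the output o

    # Central-difference check at net = 0.5
    eps = 1e-6
    numeric = (sigmoid(0.5 + eps) - sigmoid(0.5 - eps)) / (2 * eps)
    print(numeric, sigmoid_derivative(0.5))  # both ~0.2350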

Training algorithm
Initialize the weights to random values. For input x = <x_n, x_(n-1), …, x_0>, modify the weights as follows (target output = t, observed output = o):
Δw_i = η (t - o) o (1 - o) x_i
Iterate until E < ε (a threshold).

Calculation of Δw_i
Δw_i = -η ∂E/∂w_i, with E = (1/2)(t - o)^2
∂E/∂w_i = (∂E/∂o)(∂o/∂net)(∂net/∂w_i) = -(t - o) · o(1 - o) · x_i
Hence Δw_i = η (t - o) o (1 - o) x_i.
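Putting the rule to work, a minimal sketch of this training loop for a single sigmoid neuron (NumPy assumed; the learning rate, threshold and seed are illustrative choices, not from the slides):

    import numpy as np

    def sigmoid(net):
        return 1.0 / (1.0 + np.exp(-net))

    def train_sigmoid_neuron(X, t, eta=0.5, eps=1e-3, max_iters=100_000):
        # X: (p, n) input patterns; t: (p,) targets in (0, 1)
        rng = np.random.default_rng(0)
        w = rng.uniform(-0.5, 0.5, X.shape[1])   # initialize weights randomly
        for _ in range(max_iters):
            o = sigmoid(X @ w)                   # observed output per pattern
            E = 0.5 * np.sum((t - o) ** 2)
            if E < eps:                          # iterate until E < threshold
                break
            # dw_i = eta * (t - o) * o * (1 - o) * x_i, accumulated over patterns
            w += eta * ((t - o) * o * (1 - o)) @ X
        return w

A single neuron like this can learn only linearly separable targets (e.g. AND, with a constant x_0 = 1 input acting as a bias); XOR needs the hidden layer and the backpropagation rule developed below.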

Observations
Does the training technique support our intuition? The larger the x_i, the larger is Δw_i: the error burden is borne by the weight values corresponding to large input values.

Observations contd.
Δw_i is proportional to the departure from the target (t - o). There is saturation behaviour when o is close to 0 or 1. If o < t then Δw_i > 0, and if o > t then Δw_i < 0, which is consistent with Hebb's law.

Hebb's law
If n_j and n_i are both in the excitatory state (+1), then the change in the weight w_ji must be such that it enhances the excitation, and the change is proportional to both levels of excitation:
Δw_ji ∝ e(n_j) e(n_i)
If n_i and n_j are in a mutual state of inhibition (one is +1 and the other is -1), then the change in weight is such that the inhibition is enhanced (the change in weight is negative).

Saturation behavior
The algorithm is iterative and incremental. If the weight values or the number of inputs is very large, the net input becomes large and the output lies in the saturation region of the sigmoid, where o(1 - o) ≈ 0; there the weight values hardly change.

Backpropagation algorithm
[Figure: the same fully connected feedforward network as before -- an input layer of n input neurons, hidden layers, and an output layer of m output neurons, with weight w_ji on the connection from neuron i to neuron j; pure feedforward, no connection jumps over a layer.]

Gradient Descent Equations
Δw_ji = -η ∂E/∂w_ji  (η = learning rate)
∂E/∂w_ji = (∂E/∂net_j)(∂net_j/∂w_ji), where net_j = Σ_i w_ji o_i
Writing δ_j = -∂E/∂net_j, this gives
Δw_ji = η δ_j o_i

Example: Character Recognition
Output layer: 26 neurons, one per capital letter. The first output neuron has the responsibility of detecting all forms of 'A'. This is a centralized representation of the outputs (one neuron per class); in distributed representations, all output neurons participate in every output.

Backpropagation for the outermost layer
For an output neuron j, ∂E/∂o_j = -(t_j - o_j), so
δ_j = (t_j - o_j) o_j (1 - o_j)
Δw_ji = η (t_j - o_j) o_j (1 - o_j) o_i

Backpropagation for hidden layers
[Figure: output layer (m output neurons, index k), hidden layers (index j), input layer (n input neurons, index i).]
δ_k is propagated backwards to find the value of δ_j.

Backpropagation for hidden layers contd.
A hidden neuron j influences the error only through the neurons k of the layer above it, so its δ is assembled from theirs:
δ_j = o_j (1 - o_j) Σ_k δ_k w_kj
Δw_ji = η δ_j o_i

General Backpropagation Rule
General weight updating rule:
Δw_ji = η δ_j o_i
where
δ_j = (t_j - o_j) o_j (1 - o_j) for the outermost layer,
δ_j = o_j (1 - o_j) Σ_k δ_k w_kj for hidden layers.
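The general rule fits in a few lines of code. A minimal sketch for one hidden layer, applied to XOR (NumPy assumed; the layer sizes, learning rate, seed and the added bias terms are illustrative choices beyond the slide's equations):

    import numpy as np

    def sigmoid(net):
        return 1.0 / (1.0 + np.exp(-net))

    rng = np.random.default_rng(0)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # XOR inputs
    t = np.array([[0.], [1.], [1.], [0.]])                  # XOR targets
    W1 = rng.uniform(-1, 1, (2, 2)); b1 = np.zeros(2)       # input -> hidden
    W2 = rng.uniform(-1, 1, (2, 1)); b2 = np.zeros(1)       # hidden -> output
    eta = 0.5

    for _ in range(20_000):
        h = sigmoid(X @ W1 + b1)                  # forward pass: hidden o_j
        o = sigmoid(h @ W2 + b2)                  # forward pass: output o_k
        delta_o = (t - o) * o * (1 - o)           # delta for the outermost layer
        delta_h = h * (1 - h) * (delta_o @ W2.T)  # delta propagated to hidden layer
        W2 += eta * h.T @ delta_o; b2 += eta * delta_o.sum(axis=0)
        W1 += eta * X.T @ delta_h; b1 += eta * delta_h.sum(axis=0)

    print(o.round(2))  # should approach [[0], [1], [1], [0]]

If a run stalls with E barely decreasing, the network has likely landed in a local minimum; re-initializing the weights is the first remedy, exactly as the diagnostics below prescribe.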

Issues in the training algorithm
The algorithm is greedy: it always changes the weights such that E reduces, so it may get stuck in a local minimum. If we observe that E is not getting reduced any more, the following may be the reasons:

Issues in the training algorithm contd.
1. Stuck in a local minimum.
2. Network paralysis (a high negative or positive input makes the neurons saturate).
3. η (the learning rate) is too small.

Diagnostics in action (1)
1) If stuck in a local minimum, try the following:
- Re-initialize the weight vector.
- Increase the learning rate.
- Introduce more neurons in the hidden layer.

Diagnostics in action (1) contd.
2) If it is network paralysis, then increase the number of neurons in the hidden layer.
Problem: how to configure the hidden layer?
Known: one hidden layer seems to be sufficient [Kolmogorov (1960s)].

Diagnostics in action (2)
Kolmogorov's statement: a feedforward network with three layers (input, hidden and output), with an appropriate I/O relation that can vary from neuron to neuron, is sufficient to compute any function. More hidden layers reduce the size of the individual layers.

Diagnostics in action (3)
3) Observe the outputs: if they are close to 0 or 1, try the following:
- Scale the inputs, or divide by a normalizing factor.
- Change the shape and size of the sigmoid.

Diagnostics in action (3) contd.
Introduce a momentum factor α (0 < α < 1): it accelerates the movement out of a trough and dampens oscillation inside a trough:
Δw_ji(n) = η δ_j o_i + α Δw_ji(n - 1)
Choosing η: if η is large, we may jump over the minimum.
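In code, momentum is a one-line change to each weight update; a minimal sketch (α = 0.9 is an illustrative value, and momentum_step is a hypothetical helper):

    import numpy as np

    def momentum_step(w, grad_step, prev_update, alpha=0.9):
        # grad_step:   the plain gradient-descent step, eta * delta_j * o_i
        # prev_update: the update applied on the previous iteration, dw(n-1)
        update = grad_step + alpha * prev_update  # add the momentum term
        return w + update, update

    # In the XOR sketch above, replace  W2 += eta * h.T @ delta_o  with:
    #   W2, vW2 = momentum_step(W2, eta * h.T @ delta_o, vW2)
    # where vW2 starts as np.zeros_like(W2).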

An application in the medical domain

Expert System for Skin Disease Diagnosis
Deals with bumpiness and scaliness of the skin. It is meant mostly for symptom gathering and for developing diagnosis skills, not for replacing the doctor's diagnosis.

Architecture of the FF NN
96-20-10: 96 input neurons, 20 hidden-layer neurons, 10 output neurons.
Inputs: skin disease symptoms and their parameters -- location, distribution, shape, arrangement, pattern, number of lesions, presence of an active border, amount of scale, elevation of papules, color, altered pigmentation, itching, pustules, lymphadenopathy, palmar thickening, results of microscopic examination, presence of a herald patch, and the result of the dermatology test called KOH.
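Under the same assumptions as the XOR sketch earlier, only the layer sizes change for this network (96, 20 and 10 come from the slide; the initialization range is illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out = 96, 20, 10               # the 96-20-10 architecture
    W1 = rng.uniform(-0.5, 0.5, (n_in, n_hidden))    # symptom inputs -> hidden
    W2 = rng.uniform(-0.5, 0.5, (n_hidden, n_out))   # hidden -> 10 disease nodes
    # Unknown symptom values are encoded as 0.5 (see the training-data slide below).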

Output
10 neurons, indicative of the diseases: psoriasis, pityriasis rubra pilaris, lichen planus, pityriasis rosea, tinea versicolor, dermatophytosis, cutaneous T-cell lymphoma, secondary syphilis, chronic contact dermatitis, seborrheic dermatitis.

Training data
Input: specifications of the 10 model diseases from 250 patients. A value of 0.5 is used when some specific symptom value is not known. Trained using the standard error backpropagation algorithm.

Testing
Previously unused symptom and disease data of 99 patients.
Result: correct diagnosis was achieved for 70% of the papulosquamous group skin diseases. The success rate was above 80% for the remaining diseases, except for psoriasis, which was diagnosed correctly in only 30% of the cases. Psoriasis resembles other diseases within the papulosquamous group and is somewhat difficult even for specialists to recognise.

Explanation capability
Rule-based systems reveal the explicit path of reasoning through textual statements. Connectionist expert systems reach conclusions through complex, non-linear and simultaneous interactions of many units, so analysing the effect of a single input, or a single group of inputs, would be difficult and would yield incorrect results.

Explanation contd.
The hidden layer re-represents the data; the outputs of hidden neurons are neither symptoms nor decisions.

[Figure: explanation of a dermatophytosis diagnosis using the DESKNET expert system. Symptom-and-parameter input nodes (duration of lesions in weeks, minimal itching, positive KOH test, lesions located on feet, minimal increase in pigmentation, positive test for pseudohyphae and spores, plus a bias node) feed a 20-node internal-representation (hidden) layer, which feeds disease-diagnosis output nodes for dermatophytosis, psoriasis and seborrheic dermatitis; positive and negative connection weights (e.g. 1.58, -2.71) label the links.]

Discussion
The symptoms and parameters contributing to the diagnosis can be found from the network. Standard deviation, mean and other tests of significance are used to arrive at the importance of the contributing parameters. The network acts as an apprentice to the expert.