Data Mining with Neural Networks (HK: Chapter 7.5)


Biological Neural Systems
- Neuron switching time: > 10^-3 secs
- Number of neurons in the human brain: ~10^10
- Connections (synapses) per neuron: ~10^4 – 10^5
- Face recognition: ~0.1 secs
- High degree of distributed and parallel computation
- Highly fault tolerant
- Highly efficient
- Learning is key

Excerpt from Russell and Norvig

Modeling a Neuron on a Computer
Computation: input signals → input function (linear) → activation function (nonlinear) → output signal.
Input links carry the inputs x_i with weights W_i; the output link carries y = output(a), where a is the weighted sum of the inputs.

Part 1. Perceptrons: Simple NN
Inputs x_1, x_2, …, x_n (each x_i in [0, 1]) with weights w_1, w_2, …, w_n.
Activation: a = Σ_{i=1..n} w_i x_i
Output: y = 1 if a ≥ θ, y = 0 if a < θ
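As a concrete illustration, here is a minimal Python sketch of this threshold unit (the function name and example values are illustrative, not from the slides):

```python
def perceptron_output(x, w, theta):
    """Fire (output 1) if the weighted sum of the inputs reaches the threshold theta."""
    a = sum(wi * xi for wi, xi in zip(w, x))   # a = sum_i w_i * x_i
    return 1 if a >= theta else 0

print(perceptron_output([1, 0, 1], [0.3, 0.2, 0.4], 0.5))   # 0.7 >= 0.5 -> 1
```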

To be learned: the weights w_i and the threshold θ.
Decision line in the (x_1, x_2) plane: w_1 x_1 + w_2 x_2 = θ (the figure shows the weight vector w perpendicular to this line).

Converting θ to w_0:
(1) y = 1 iff a ≥ θ
(2) iff Σ_{i=1..n} w_i x_i ≥ θ
(3) iff Σ_{i=1..n} w_i x_i − θ ≥ 0
(4) iff Σ_{i=1..n} w_i x_i + w_0 x_0 ≥ 0, with w_0 = θ and x_0 = −1
(5) iff Σ_{i=0..n} w_i x_i ≥ 0

Threshold as Weight: w_0
Treat the threshold θ as an extra weight w_0 = θ on a constant input x_0 = −1. Then
a = Σ_{i=0..n} w_i x_i
y = 1 if a ≥ 0, y = 0 if a < 0
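The same trick in a short Python sketch (assuming the augmented-input convention just described; names are illustrative):

```python
def perceptron_output_aug(x, w):
    """w = [w_0, w_1, ..., w_n] with w_0 = theta; x is augmented with x_0 = -1."""
    xa = [-1.0] + list(x)                      # prepend the constant input x_0 = -1
    a = sum(wi * xi for wi, xi in zip(w, xa))
    return 1 if a >= 0 else 0                  # threshold is now fixed at 0
```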

Training examples for the running example (logical AND), written as (x_1, x_2, t), with the constant input x_0 = −1 added to each:
x_1  x_2  t
0    0    0
0    1    0
1    0    0
1    1    1

Linear Separability
Logical AND is linearly separable. With w_1 = 1, w_2 = 1, θ = 1.5, a = w_1 x_1 + w_2 x_2 and y = 1 if a ≥ θ (else 0), the perceptron reproduces the AND truth table:
x_1  x_2  a   y = t
0    0    0   0
0    1    1   0
1    0    1   0
1    1    2   1
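A quick Python check that these weights and threshold reproduce logical AND (a sketch; the function name is illustrative):

```python
def and_perceptron(x1, x2, w1=1.0, w2=1.0, theta=1.5):
    a = w1 * x1 + w2 * x2
    return 1 if a >= theta else 0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, and_perceptron(x1, x2))   # last column: 0, 0, 0, 1
```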

XOR cannot be separated!
Logical XOR: t = 1 for (0, 1) and (1, 0), t = 0 for (0, 0) and (1, 1). No choice of w_1, w_2, θ can separate the two classes with a single straight line.
Thus, a one-level neural network can only learn linearly separable functions (decision boundaries that are straight lines).

Training the Perceptron
- Training set S of examples {x, t}: x is an input vector and t the desired target output (the "teacher"). Example: logical AND.
- Iterative process: present a training example x, compute the network output y, compare y with the target t, adjust the weights and threshold.
- Learning rule: specifies how to change the weights w of the network as a function of the inputs x, output y, and target t.

Perceptron Learning Rule
w_i := w_i + Δw_i = w_i + η (t − y) x_i   (i = 0..n, including the bias weight w_0)
The parameter η is called the learning rate (in Han's book it is written as a lowercase l). It determines the magnitude of the weight updates Δw_i.
- If the output is correct (t = y), the weights are not changed (Δw_i = 0).
- If the output is incorrect (t ≠ y), the weights w_i are changed so that the output of the perceptron with the new weights w'_i is closer to the target t.

Perceptron Training Algorithm
Repeat
  for each training vector pair (x, t)
    evaluate the output y when x is the input
    if y ≠ t then
      form a new weight vector w' according to w' = w + η (t − y) x
    else
      do nothing
    end if
  end for
Until a fixed number of iterations is reached, or the error is less than a predefined value ε set by the user (typically 0.01).
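A runnable Python sketch of this training procedure, using the threshold-as-weight convention (x_0 = −1); the function name, initial-weight choice, and stopping rule are illustrative:

```python
def train_perceptron(examples, eta=0.1, init=0.5, epochs=100):
    """Train a perceptron on (x, t) pairs using w := w + eta*(t - y)*x.

    examples: list of (input_vector, target) pairs; inputs get augmented with x_0 = -1.
    Returns the learned weight vector [w_0, w_1, ..., w_n].
    """
    n = len(examples[0][0])
    w = [init] * (n + 1)                    # w_0 (threshold) plus one weight per input
    for _ in range(epochs):
        errors = 0
        for x, t in examples:
            xa = [-1.0] + list(x)           # augmented input, x_0 = -1
            a = sum(wi * xi for wi, xi in zip(w, xa))
            y = 1 if a >= 0 else 0
            if y != t:                      # update only on mistakes
                w = [wi + eta * (t - y) * xi for wi, xi in zip(w, xa)]
                errors += 1
        if errors == 0:                     # stop once every example is classified correctly
            break
    return w

# Logical AND, as in the worked example below (eta = 0.1, all weights initialized to 0.5)
and_examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
print([round(wi, 3) for wi in train_perceptron(and_examples)])   # [0.6, 0.5, 0.4]
```

Run on the logical AND examples with η = 0.1 and all weights initialized to 0.5, it returns [0.6, 0.5, 0.4], matching the final solution reached step by step below.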

Example: Learning the AND Function, Step 1.
Training example: x_1 = 0, x_2 = 0, t = 0. Initial weights W_0 = W_1 = W_2 = 0.5, learning rate η = 0.1.
a = (-1)*0.5 + 0*0.5 + 0*0.5 = -0.5, thus y = 0. Correct; no need to change W.

Example: Learning the AND Function, Step 2.
Training example: x_1 = 0, x_2 = 1, t = 0. Weights W_0 = W_1 = W_2 = 0.5, η = 0.1.
a = (-1)*0.5 + 0*0.5 + 1*0.5 = 0, thus y = 1. t = 0: wrong.
ΔW_0 = 0.1*(0-1)*(-1) = 0.1, ΔW_1 = 0.1*(0-1)*(0) = 0, ΔW_2 = 0.1*(0-1)*(1) = -0.1
New weights: W_0 = 0.5 + 0.1 = 0.6, W_1 = 0.5 + 0 = 0.5, W_2 = 0.5 - 0.1 = 0.4

Example: Learning the AND Function, Step 3.
Training example: x_1 = 1, x_2 = 0, t = 0. Weights W_0 = 0.6, W_1 = 0.5, W_2 = 0.4, η = 0.1.
a = (-1)*0.6 + 1*0.5 + 0*0.4 = -0.1, thus y = 0. t = 0, correct!

Example: Learning the AND Function, Step 4.
Training example: x_1 = 1, x_2 = 1, t = 1. Weights W_0 = 0.6, W_1 = 0.5, W_2 = 0.4, η = 0.1.
a = (-1)*0.6 + 1*0.5 + 1*0.4 = 0.3, thus y = 1. t = 1, correct!

Final Solution:
w_0 = 0.6, w_1 = 0.5, w_2 = 0.4
a = 0.5 x_1 + 0.4 x_2 - 0.6
y = 1 if a ≥ 0, y = 0 if a < 0
This reproduces logical AND: for (0,0), (0,1), (1,0) the activation is negative, and only for (1,1) is it positive (0.3), so only (1,1) outputs 1.
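A quick Python check of these final weights (sketch):

```python
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    a = 0.5 * x1 + 0.4 * x2 - 0.6
    print(x1, x2, 1 if a >= 0 else 0)   # prints 0, 0, 0, 1: logical AND
```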

Perceptron Convergence Theorem
The algorithm converges to a correct classification if the training data is linearly separable and the learning rate is sufficiently small (Rosenblatt, 1962).
The final weight vector w is not unique: there are many possible lines that separate the two classes.

Experiments

Handwritten Recognition Example

Each letter → one output unit y
Diagram: input pattern → association units (fixed) → weights (trained) → summation → threshold → output y

Multiple Output Perceptrons
Handwritten alphabetic character recognition: 26 classes, A, B, C, …, Z.
The first output unit distinguishes between "A"s and "non-A"s, the second output unit between "B"s and "non-B"s, etc.
Learning rule per weight: w'_ji = w_ji + η (t_j − y_j) x_i, where w_ji connects input x_i with output y_j.
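A vectorized sketch of this multi-output update in NumPy (assuming one-hot target vectors t, thresholded outputs y, and the threshold folded into the inputs as x_0 = −1; all names are illustrative):

```python
import numpy as np

def multi_output_update(W, x, t, eta=0.1):
    """One learning step for a multi-output perceptron.

    W: (26, n) weight matrix, W[j, i] connects input x_i to output unit y_j.
    x: input vector of length n (bias folded in as x_0 = -1).
    t: one-hot target vector of length 26 (one unit per letter).
    """
    y = (W @ x >= 0).astype(float)          # each output unit thresholds its own activation
    return W + eta * np.outer(t - y, x)     # w_ji := w_ji + eta * (t_j - y_j) * x_i
```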

Part 2. Multi-Layer Networks
Diagram: input vector → input nodes → hidden nodes → output nodes → output vector

Sigmoid Function for Continuous Output
Inputs x_1, …, x_n with weights w_1, …, w_n; activation a = Σ_{i=0..n} w_i x_i; output O = 1/(1 + e^(-a)).
The output lies between 0 and 1: as a → −∞, O → 0; as a → +∞, O → 1.
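A minimal Python sketch of this sigmoid unit (names are illustrative; the bias is folded in as w_0, x_0 = −1 as before):

```python
import math

def sigmoid(a):
    """Maps any activation into (0, 1); approaches 0 as a -> -inf and 1 as a -> +inf."""
    return 1.0 / (1.0 + math.exp(-a))

def sigmoid_unit(x, w):
    """x and w include the bias term (x_0 = -1, w_0 = theta)."""
    a = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(a)
```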

Gradient Descent Learning Rule
For each training example x, let O be the continuous output (between 0 and 1) and let T be the correct target value.
a = w_1 x_1 + … + w_n x_n + w_0 x_0,  O = 1/(1 + e^(-a))
Train the w_i so that they minimize the squared error
E[w_1, …, w_n] = ½ Σ_{k∈D} (T_k − O_k)²
where D is the set of training examples.

Explanation: Gradient Descent Learning Rule
Δw_i = η · O_k (1 − O_k) · (T_k − O_k) · x_i^k
where η is the learning rate, x_i^k is the activation of the pre-synaptic neuron (input i for example k), (T_k − O_k) is the error δ_k of the post-synaptic neuron, and O_k (1 − O_k) is the derivative of the sigmoid activation function.
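This rule is gradient descent on the squared error E from the previous slide; a short derivation sketch in LaTeX (using σ for the sigmoid):

```latex
\begin{align*}
\frac{\partial E}{\partial w_i}
  &= \frac{\partial}{\partial w_i}\,\tfrac{1}{2}\sum_{k \in D} (T_k - O_k)^2
   = -\sum_{k \in D} (T_k - O_k)\,\frac{\partial O_k}{\partial w_i} \\
  &= -\sum_{k \in D} (T_k - O_k)\, O_k (1 - O_k)\, x_i^k
   \qquad\text{since } O_k = \sigma(a_k),\ \sigma'(a)=\sigma(a)(1-\sigma(a)),\ \tfrac{\partial a_k}{\partial w_i}=x_i^k \\
\Delta w_i &= -\eta\,\frac{\partial E}{\partial w_i}
            = \eta \sum_{k \in D} O_k (1 - O_k)(T_k - O_k)\, x_i^k
\end{align*}
```

The slide's rule is the per-example (incremental/stochastic) version, which drops the sum over k.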

Backpropagation Algorithm (Han, Figure 6.16)
Initialize each w_ij to some small random value.
Until the termination condition is met, do
  For each training example, do
    Input the instance (x_1, …, x_n) to the network and compute the network outputs O_k
    For each output unit k:   Err_k = O_k (1 − O_k)(t_k − O_k)
    For each hidden unit h:   Err_h = O_h (1 − O_h) Σ_k w_h,k Err_k
    For each network weight w_i,j:   w_i,j = w_i,j + Δw_i,j, where Δw_i,j = η Err_j O_i
η is set by the user.
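A compact Python sketch of this algorithm for a network with one hidden layer, following the error and update formulas above; biases θ_j are kept as separate parameters updated by η·Err_j, as in HK (all names and the calling convention are illustrative):

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def backprop_epoch(data, W_ih, W_ho, b_h, b_o, eta=0.9):
    """One epoch of backpropagation for a network with one hidden layer.

    data:  list of (inputs, targets) pairs
    W_ih:  W_ih[i][h] = weight from input i to hidden unit h
    W_ho:  W_ho[h][k] = weight from hidden unit h to output unit k
    b_h, b_o: bias of each hidden / output unit (added to its net input)
    """
    for x, t in data:
        # Forward pass
        O_h = [sigmoid(sum(x[i] * W_ih[i][h] for i in range(len(x))) + b_h[h])
               for h in range(len(b_h))]
        O_k = [sigmoid(sum(O_h[h] * W_ho[h][k] for h in range(len(O_h))) + b_o[k])
               for k in range(len(b_o))]
        # Backpropagate errors
        Err_k = [O_k[k] * (1 - O_k[k]) * (t[k] - O_k[k]) for k in range(len(O_k))]
        Err_h = [O_h[h] * (1 - O_h[h]) * sum(W_ho[h][k] * Err_k[k] for k in range(len(O_k)))
                 for h in range(len(O_h))]
        # Update weights and biases: delta_w = eta * Err_j * O_i, delta_theta = eta * Err_j
        for h in range(len(O_h)):
            for k in range(len(O_k)):
                W_ho[h][k] += eta * Err_k[k] * O_h[h]
        for i in range(len(x)):
            for h in range(len(O_h)):
                W_ih[i][h] += eta * Err_h[h] * x[i]
        for k in range(len(O_k)):
            b_o[k] += eta * Err_k[k]
        for h in range(len(O_h)):
            b_h[h] += eta * Err_h[h]
    return W_ih, W_ho, b_h, b_o
```

The worked HK example that follows is one pass of this inner loop for a 3–2–1 network (inputs x_1, x_2, x_3; hidden units 4 and 5; output unit 6).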

Example 6.9 (HK book, page 333)
Network: inputs x_1, x_2, x_3; hidden units 4 and 5; output unit 6 (output y).
Eight weights to be learned: w_14, w_15, w_24, w_25, w_34, w_35, w_46, w_56, plus the biases θ_4, θ_5, θ_6.
Training example: x_1 = 1, x_2 = 0, x_3 = 1, t = 1. Learning rate: η = 0.9.

Initial Weights: randomly assigned (HK: Tables 7.3, 7.4)
x_1 = 1, x_2 = 0, x_3 = 1
w_14 = 0.2, w_15 = -0.3, w_24 = 0.4, w_25 = 0.1, w_34 = -0.5, w_35 = 0.2, w_46 = -0.3, w_56 = -0.2
θ_4 = -0.4, θ_5 = 0.2, θ_6 = 0.1
Net input at unit 4: w_14 x_1 + w_24 x_2 + w_34 x_3 + θ_4 = 0.2 + 0 - 0.5 - 0.4 = -0.7
Output at unit 4: O_4 = 1/(1 + e^(0.7)) = 0.332

Feed Forward: (Table 7.4)
Continuing for units 5 and 6:
Unit 5: net input = -0.3 + 0 + 0.2 + 0.2 = 0.1, O_5 = 1/(1 + e^(-0.1)) = 0.525
Unit 6: net input = (-0.3)(0.332) + (-0.2)(0.525) + 0.1 = -0.105
Output at unit 6 = 1/(1 + e^(0.105)) = 0.474

Calculating the error (Tables 7.5)
Error at unit 6: Err_6 = O_6 (1 - O_6)(t - O_6) = 0.474 (1 - 0.474)(1 - 0.474) = 0.1311
Error backpropagated from unit 6:
Err_5 = O_5 (1 - O_5) w_56 Err_6 = 0.525 (1 - 0.525)(-0.2)(0.1311) = -0.0065
Err_4 = O_4 (1 - O_4) w_46 Err_6 = 0.332 (1 - 0.332)(-0.3)(0.1311) = -0.0087
Weight update: w_ij = w_ij + η Err_j O_i, and θ_j = θ_j + η Err_j.

Weight update (Table 7.6)
New weights and biases after training with {(1, 0, 1), t = 1}:
w_14 = 0.192, w_15 = -0.306, w_24 = 0.4, w_25 = 0.1, w_34 = -0.508, w_35 = 0.194, w_46 = -0.261, w_56 = -0.138
θ_4 = -0.408, θ_5 = 0.194, θ_6 = 0.218
If there are more training examples, the same procedure is repeated for each of them in turn.
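A small Python sketch that reproduces this single backpropagation step, assuming the initial weights and biases listed above (variable names are illustrative):

```python
import math

sigmoid = lambda a: 1.0 / (1.0 + math.exp(-a))

# Inputs and target (HK Example 6.9), learning rate eta = 0.9
x1, x2, x3, t, eta = 1, 0, 1, 1, 0.9
# Initial weights and biases as listed above
w14, w15, w24, w25, w34, w35, w46, w56 = 0.2, -0.3, 0.4, 0.1, -0.5, 0.2, -0.3, -0.2
th4, th5, th6 = -0.4, 0.2, 0.1

# Forward pass
O4 = sigmoid(w14*x1 + w24*x2 + w34*x3 + th4)     # ~0.332
O5 = sigmoid(w15*x1 + w25*x2 + w35*x3 + th5)     # ~0.525
O6 = sigmoid(w46*O4 + w56*O5 + th6)              # ~0.474

# Backpropagate errors
err6 = O6 * (1 - O6) * (t - O6)                  # ~0.1311
err5 = O5 * (1 - O5) * w56 * err6                # ~-0.0065
err4 = O4 * (1 - O4) * w46 * err6                # ~-0.0087

# Update weights and biases: w_ij += eta * err_j * O_i, theta_j += eta * err_j
w46 += eta * err6 * O4
w56 += eta * err6 * O5
w14 += eta * err4 * x1;  w24 += eta * err4 * x2;  w34 += eta * err4 * x3
w15 += eta * err5 * x1;  w25 += eta * err5 * x2;  w35 += eta * err5 * x3
th6 += eta * err6;  th5 += eta * err5;  th4 += eta * err4

print(round(w46, 3), round(w56, 3), round(th6, 3))   # -0.261 -0.138 0.218
```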