Data Mining with Neural Networks (HK: Chapter 7.5)

Slides from: Doug Gray, David Poole

Biological Neural Systems
Neuron switching time: > 10^-3 secs
Number of neurons in the human brain: ~10^10
Connections (synapses) per neuron: ~10^4 – 10^5
Face recognition: ~0.1 secs
High degree of distributed and parallel computation
Highly fault tolerant
Highly efficient
Learning is key

Figure: the biological neuron (excerpt from Russell and Norvig; http://faculty.washington.edu/chudler/cells.html)

Modeling a Neuron on a Computer
Each input link x_i carries weight w_i; a summation unit computes the activation a, and the output link carries y = output(a).
Computation: input signals → input function (linear) → activation function (nonlinear) → output signal

Part 1. Perceptrons: a Simple NN
Inputs x_1, …, x_n (each x_i in [0, 1]) with weights w_1, …, w_n give the activation
  a = Σ_{i=1}^{n} w_i x_i
and the output
  y = 1 if a ≥ θ, 0 if a < θ
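As a concrete illustration of the threshold unit above, here is a minimal sketch in Python (the function name perceptron_output and the example weights are my own, not from the slides):

```python
def perceptron_output(x, w, theta):
    """Step-threshold perceptron: y = 1 if sum_i w_i*x_i >= theta else 0."""
    a = sum(wi * xi for wi, xi in zip(w, x))  # activation a = sum of w_i * x_i
    return 1 if a >= theta else 0

# Example: two inputs with equal weights and threshold 1.5
print(perceptron_output([1, 1], [1.0, 1.0], theta=1.5))  # -> 1
print(perceptron_output([1, 0], [1.0, 1.0], theta=1.5))  # -> 0
```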

To be learned: the weights w_i and the threshold θ.
The decision line in the x_1–x_2 plane is w_1 x_1 + w_2 x_2 = θ: inputs on one side of this line give output 1, inputs on the other side give output 0.

Converting the threshold into a weight:
(1) y = 1 iff w_1 x_1 + … + w_n x_n ≥ θ
(2) iff w_1 x_1 + … + w_n x_n − θ ≥ 0
(3) introduce a constant input x_0 = −1 with weight w_0 = θ
(4) then y = 1 iff Σ_{i=0}^{n} w_i x_i ≥ 0
(5) so the threshold can be learned like any other weight

Threshold as Weight
With x_0 = −1 and w_0 = θ:
  a = Σ_{i=0}^{n} w_i x_i
  y = 1 if a ≥ 0, 0 if a < 0

Training data with the extra input x_0 = −1 (logical AND):
x0  x1  x2  t
-1   0   0  0
-1   0   1  0
-1   1   0  0
-1   1   1  1

Linear Separability
a = Σ_{i=0}^{n} w_i x_i,  y = 1 if a ≥ 0, 0 if a < 0
Logical AND is linearly separable: with w_1 = w_2 = 1 and θ = 1.5, the line x_1 + x_2 = 1.5 puts the single positive example (1, 1) on one side and the other three examples on the other.
x1  x2  a = x1 + x2  y = t
 0   0       0         0
 0   1       1         0
 1   0       1         0
 1   1       2         1
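A quick sanity check of this AND example in Python (a sketch; the helper name step_unit is mine):

```python
def step_unit(x, w, theta):
    """Threshold unit: 1 if w.x >= theta else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

# Logical AND with w1 = w2 = 1 and theta = 1.5
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", step_unit([x1, x2], [1.0, 1.0], theta=1.5))
# Only (1, 1) fires, matching the AND truth table.
```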

XOR cannot be separated!
Logical XOR:
x1  x2  t
 0   0  0
 0   1  1
 1   0  1
 1   1  0
No choice of weights w1, w2 and threshold θ gives a single line that separates the positive examples (0, 1) and (1, 0) from the negative examples (0, 0) and (1, 1). Thus a one-level neural network can only learn linearly separable functions (decision boundaries that are straight lines).
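To make the claim concrete, the sketch below brute-forces a coarse grid of candidate weights and thresholds and finds none that classifies all four XOR cases correctly (the grid and names are my own illustration, not from the slides):

```python
import itertools

def step_unit(x, w, theta):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

grid = [i / 4 for i in range(-8, 9)]  # candidate values -2.0 ... 2.0 in steps of 0.25
solutions = [
    (w1, w2, th)
    for w1, w2, th in itertools.product(grid, repeat=3)
    if all(step_unit(x, (w1, w2), th) == t for x, t in XOR)
]
print("separating weight settings found:", len(solutions))  # -> 0
```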

Training the Perceptron
Training set S of examples {(x, t)}, where x is an input vector and t the desired target output (the "teacher"). Example: logical AND.
x1  x2  t
 0   0  0
 0   1  0
 1   0  0
 1   1  1
Iterative process: present a training example x, compute the network output y, compare the output y with the target t, and adjust the weights and threshold.
Learning rule: specifies how to change the weights w of the network as a function of the inputs x, output y and target t.

Perceptron Learning Rule
  w_i := w_i + Δw_i = w_i + α (t − y) x_i   (i = 1..n)
The parameter α is called the learning rate (in Han's book it is written as a lowercase l). It determines the magnitude of the weight updates Δw_i.
If the output is correct (t = y) the weights are not changed (Δw_i = 0). If the output is incorrect (t ≠ y) the weights w_i are changed such that the output of the perceptron with the new weights w'_i moves closer to the target t.

Perceptron Training Algorithm
Repeat
  for each training vector pair (x, t)
    evaluate the output y when x is the input
    if y ≠ t then
      form a new weight vector w' according to w' = w + α (t − y) x
    else
      do nothing
    end if
  end for
Until a fixed number of iterations is reached, or the error is less than a predefined value.
α is set by the user; typically α = 0.01.
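A minimal runnable sketch of this training loop in Python (the function name train_perceptron and its arguments are my own; the threshold is folded in as w_0 on a constant input x_0 = −1, as on the earlier slide):

```python
def train_perceptron(examples, n_inputs, alpha=0.01, max_iters=100):
    """Perceptron training: w' = w + alpha * (t - y) * x, with x0 = -1 as bias input."""
    w = [0.0] * (n_inputs + 1)                 # w[0] plays the role of the threshold
    for _ in range(max_iters):
        errors = 0
        for x, t in examples:
            xb = [-1] + list(x)                # prepend the constant input x0 = -1
            a = sum(wi * xi for wi, xi in zip(w, xb))
            y = 1 if a >= 0 else 0
            if y != t:
                w = [wi + alpha * (t - y) * xi for wi, xi in zip(w, xb)]
                errors += 1
        if errors == 0:                        # stop early once every example is correct
            break
    return w

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(AND, n_inputs=2, alpha=0.1))
```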

Example: Learning the AND Function, Step 1.
Initial weights W0 = W1 = W2 = 0.5; learning rate α = 0.1.
Training example (x1, x2) = (0, 0), t = 0:
a = (−1)·0.5 + 0·0.5 + 0·0.5 = −0.5, thus y = 0. Correct, no need to change W.

Example: Learning the AND Function, Step 2.
Weights W0 = W1 = W2 = 0.5; α = 0.1.
Training example (x1, x2) = (0, 1), t = 0:
a = (−1)·0.5 + 0·0.5 + 1·0.5 = 0, thus y = 1. t = 0, wrong.
ΔW0 = 0.1·(0 − 1)·(−1) = 0.1, ΔW1 = 0.1·(0 − 1)·0 = 0, ΔW2 = 0.1·(0 − 1)·1 = −0.1
New weights: W0 = 0.5 + 0.1 = 0.6, W1 = 0.5 + 0 = 0.5, W2 = 0.5 − 0.1 = 0.4

Example: Learning the AND Function, Step 3.
Weights W0 = 0.6, W1 = 0.5, W2 = 0.4; α = 0.1.
Training example (x1, x2) = (1, 0), t = 0:
a = (−1)·0.6 + 1·0.5 + 0·0.4 = −0.1, thus y = 0. t = 0, correct!

Example: Learning the AND Function, Step 4.
Weights W0 = 0.6, W1 = 0.5, W2 = 0.4; α = 0.1.
Training example (x1, x2) = (1, 1), t = 1:
a = (−1)·0.6 + 1·0.5 + 1·0.4 = 0.3, thus y = 1. t = 1, correct.
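The four steps above can be replayed with a few lines of Python (a sketch; the variable names are mine, the numbers come from the slides):

```python
# Replay steps 1-4: start from W0 = W1 = W2 = 0.5, alpha = 0.1, one pass over AND.
alpha = 0.1
w = [0.5, 0.5, 0.5]                      # [W0, W1, W2], used with the constant input x0 = -1
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

for step, ((x1, x2), t) in enumerate(AND, start=1):
    xb = [-1, x1, x2]
    a = sum(wi * xi for wi, xi in zip(w, xb))
    y = 1 if a >= 0 else 0
    if y != t:                           # perceptron rule: w_i += alpha * (t - y) * x_i
        w = [wi + alpha * (t - y) * xi for wi, xi in zip(w, xb)]
    print(f"step {step}: a={a:+.1f} y={y} t={t} w={w}")
# Final weights match the slides: W0 = 0.6, W1 = 0.5, W2 = 0.4.
```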

Final Solution
a = 0.5·x1 + 0.4·x2 − 0.6
y = 1 if a ≥ 0, 0 if a < 0
with w1 = 0.5, w2 = 0.4, w0 = 0.6. This implements logical AND: only the input (1, 1) gives a ≥ 0 (a = 0.3), so only it produces y = 1.

Perceptron Convergence Theorem The algorithm converges to the correct classification if the training data is linearly separable and the learning rate is sufficiently small (Rosenblatt 1962). The final weight vector w is not unique: there are many possible lines that separate the two classes.

Experiments

Handwritten Recognition Example

Each letter → one output unit y. The input pattern feeds a layer of association units with fixed weights; their outputs x_1, …, x_n are combined through trained weights w_1, …, w_n by a summation unit, followed by a threshold.

Part 2. Multi-Layer Networks
Input vector → input nodes → hidden nodes → output nodes → output vector

Sigmoid Function for Continuous Output
Inputs x_i with weights w_i give the activation a = Σ_{i=0}^{n} w_i x_i, and the output is
  O = 1 / (1 + e^−a)
The output lies between 0 and 1 (as a → −∞, O → 0; as a → +∞, O → 1).
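A one-function sketch of this sigmoid unit in Python (the function name is mine):

```python
import math

def sigmoid_output(x, w):
    """Sigmoid unit: O = 1 / (1 + exp(-a)) with a = sum of w_i * x_i (x[0] = -1 carries the bias)."""
    a = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-a))

print(sigmoid_output([-1, 1, 0], [0.1, 0.6, -0.2]))  # a value strictly between 0 and 1
```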

Gradient Descent Learning Rule
For each training example k, let O_k be the output (between 0 and 1) and T_k the correct target value.
Continuous output: a = w_1 x_1 + … + w_n x_n + w_0 x_0 (with x_0 = −1 as before) and O = 1 / (1 + e^−a).
Train the w_i such that they minimize the squared error
  E[w_1, …, w_n] = ½ Σ_{k∈D} (T_k − O_k)²
where D is the set of training examples.

Explanation: Gradient Descent Learning Rule
  Δw_i = α · O_k (1 − O_k) · (T_k − O_k) · x_i^k
where α is the learning rate, O_k (1 − O_k) is the derivative of the activation function, (T_k − O_k) is the error δ_k of the post-synaptic neuron, and x_i^k is the activation of the pre-synaptic neuron.
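A single update step of this rule, sketched in Python (the names are mine; the arithmetic follows the Δw_i formula above):

```python
import math

def delta_rule_step(x, w, target, alpha):
    """One gradient-descent update for a sigmoid unit: w_i += alpha*O*(1-O)*(T-O)*x_i."""
    a = sum(wi * xi for wi, xi in zip(w, x))
    o = 1.0 / (1.0 + math.exp(-a))                       # sigmoid output O
    grad = o * (1.0 - o) * (target - o)                  # derivative of activation * error
    return [wi + alpha * grad * xi for wi, xi in zip(w, x)]

print(delta_rule_step([-1, 1, 1], [0.5, 0.5, 0.5], target=1, alpha=0.1))
```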

Backpropagation Algorithm (Han, Figure 6.16)
Initialize each weight w_{i,j} to some small random value.
Until the termination condition is met, do
  for each training example <(x_1, …, x_n), t> do
    input the instance (x_1, …, x_n) to the network and compute the network outputs O_k
    for each output unit k:  Err_k = O_k (1 − O_k) (t_k − O_k)
    for each hidden unit h:  Err_h = O_h (1 − O_h) Σ_k w_{h,k} Err_k
    for each network weight w_{i,j}:  w_{i,j} := w_{i,j} + Δw_{i,j}, where Δw_{i,j} = α · Err_j · O_i
α is set by the user.
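Below is a compact Python sketch of this procedure for a network with one hidden layer (the function and variable names are my own; biases θ are added to each unit's net input, as in Han's book):

```python
import math, random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def backprop_epoch(examples, w_ih, w_ho, th_h, th_o, alpha):
    """One pass of backpropagation over a 1-hidden-layer network.
    w_ih[i][h]: input->hidden weights, w_ho[h][k]: hidden->output weights,
    th_h / th_o: biases added to each hidden / output unit's net input."""
    for x, t in examples:
        # Feed forward
        o_h = [sigmoid(sum(x[i] * w_ih[i][h] for i in range(len(x))) + th_h[h])
               for h in range(len(th_h))]
        o_k = [sigmoid(sum(o_h[h] * w_ho[h][k] for h in range(len(o_h))) + th_o[k])
               for k in range(len(th_o))]
        # Backpropagate the errors
        err_k = [o_k[k] * (1 - o_k[k]) * (t[k] - o_k[k]) for k in range(len(o_k))]
        err_h = [o_h[h] * (1 - o_h[h]) * sum(w_ho[h][k] * err_k[k] for k in range(len(err_k)))
                 for h in range(len(o_h))]
        # Update weights and biases: delta_w = alpha * Err_j * O_i, delta_theta = alpha * Err_j
        for h in range(len(o_h)):
            for k in range(len(o_k)):
                w_ho[h][k] += alpha * err_k[k] * o_h[h]
        for i in range(len(x)):
            for h in range(len(o_h)):
                w_ih[i][h] += alpha * err_h[h] * x[i]
        for k in range(len(o_k)):
            th_o[k] += alpha * err_k[k]
        for h in range(len(o_h)):
            th_h[h] += alpha * err_h[h]
    return w_ih, w_ho, th_h, th_o

# Usage sketch: random small weights for a 3-2-1 network, one training pair.
random.seed(0)
w_ih = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
w_ho = [[random.uniform(-0.5, 0.5)] for _ in range(2)]
print(backprop_epoch([((1, 0, 1), (1,))], w_ih, w_ho, th_h=[0.0, 0.0], th_o=[0.0], alpha=0.9))
```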

Example 6.9 (HK book, page 333)
Network: inputs x1, x2, x3 enter units 1, 2, 3; units 4 and 5 form the hidden layer; unit 6 produces the output y.
Eight weights to be learned: w14, w15, w24, w25, w34, w35, w46, w56, plus the biases θ4, θ5, θ6.
Training example: (x1, x2, x3) = (1, 0, 1), t = 1. Learning rate: 0.9.

Initial weights, randomly assigned (HK: Tables 7.3, 7.4):
w14 = 0.2, w15 = −0.3, w24 = 0.4, w25 = 0.1, w34 = −0.5, w35 = 0.2, w46 = −0.3, w56 = −0.2, θ4 = −0.4, θ5 = 0.2, θ6 = 0.1
Input (x1, x2, x3) = (1, 0, 1).
Net input at unit 4: I4 = 1·0.2 + 0·0.4 + 1·(−0.5) + (−0.4) = −0.7
Output at unit 4: O4 = 1 / (1 + e^0.7) = 0.332

Feed Forward (Table 7.4): continuing for units 5 and 6 we get:
Net input at unit 5: I5 = 1·(−0.3) + 0·0.1 + 1·0.2 + 0.2 = 0.1, output at unit 5 = 1 / (1 + e^−0.1) = 0.525
Net input at unit 6: I6 = 0.332·(−0.3) + 0.525·(−0.2) + 0.1 = −0.105, output at unit 6 = 1 / (1 + e^0.105) = 0.474

Calculating the error (Table 7.5)
Error at unit 6: Err6 = O6 (1 − O6)(t − O6) = 0.474 · (1 − 0.474) · (1 − 0.474) = 0.1311
Error backpropagated from unit 6 to the hidden units:
  Err5 = O5 (1 − O5) · w56 · Err6 = 0.525 · (1 − 0.525) · (−0.2) · 0.1311 = −0.0065
  Err4 = O4 (1 − O4) · w46 · Err6 = 0.332 · (1 − 0.332) · (−0.3) · 0.1311 = −0.0087
Weight update (example): w46 := w46 + 0.9 · Err6 · O4 = −0.3 + 0.9 · 0.1311 · 0.332 = −0.261

Weight update (Table 7.6). Thus, the new weights and biases after training with {(1, 0, 1), t = 1} are:
w14 = 0.192, w15 = −0.306, w24 = 0.4, w25 = 0.1, w34 = −0.508, w35 = 0.194, w46 = −0.261, w56 = −0.138, θ4 = −0.408, θ5 = 0.194, θ6 = 0.218
If there are more training examples, the same procedure is repeated for each of them.
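This worked example can be checked with a short self-contained script (a sketch; it hard-codes the initial weights above, and the printed values should match 0.474 and the updated weights up to rounding):

```python
import math

sig = lambda a: 1.0 / (1.0 + math.exp(-a))

# Initial weights/biases and the training example (1, 0, 1) with target t = 1, rate 0.9.
x1, x2, x3, t, l = 1, 0, 1, 1, 0.9
w14, w15, w24, w25, w34, w35 = 0.2, -0.3, 0.4, 0.1, -0.5, 0.2
w46, w56, th4, th5, th6 = -0.3, -0.2, -0.4, 0.2, 0.1

# Feed forward
o4 = sig(x1 * w14 + x2 * w24 + x3 * w34 + th4)            # 0.332
o5 = sig(x1 * w15 + x2 * w25 + x3 * w35 + th5)            # 0.525
o6 = sig(o4 * w46 + o5 * w56 + th6)                       # 0.474

# Backpropagate the errors
err6 = o6 * (1 - o6) * (t - o6)                           # 0.1311
err5 = o5 * (1 - o5) * w56 * err6                         # -0.0065
err4 = o4 * (1 - o4) * w46 * err6                         # -0.0087

# Weight and bias updates: w_ij += l * err_j * o_i, theta_j += l * err_j
w46, w56 = w46 + l * err6 * o4, w56 + l * err6 * o5       # -0.261, -0.138
w14, w24, w34 = w14 + l * err4 * x1, w24 + l * err4 * x2, w34 + l * err4 * x3
w15, w25, w35 = w15 + l * err5 * x1, w25 + l * err5 * x2, w35 + l * err5 * x3
th4, th5, th6 = th4 + l * err4, th5 + l * err5, th6 + l * err6

print(round(o6, 3))
print([round(v, 3) for v in (w14, w15, w24, w25, w34, w35, w46, w56, th4, th5, th6)])
```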

Questions
Let w be the normal vector of the hyperplane w · x + b = 0, and |w| its length (norm).
1. Prove that for a point x0, the distance from x0 to the hyperplane is |w · x0 + b| / |w|.
2. Prove that the perceptron loss over the set M of misclassified examples is L = − Σ_{i∈M} y_i (w · x_i + b).
3. Derive the learning rule from the gradients over the M mistakes: ∇_w L = − Σ_{i∈M} y_i x_i and ∇_b L = − Σ_{i∈M} y_i.
Hint: see Li Hang's book (Statistical Learning Methods).
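A hedged sketch of the first derivation in LaTeX (my own working, not from the slides; x* denotes the orthogonal projection of x_0 onto the hyperplane), which also makes the loss and gradients of questions 2 and 3 immediate:

```latex
\[
  x_0 = x^{*} + d\,\frac{w}{\lVert w\rVert}, \quad w\cdot x^{*}+b=0
  \;\Longrightarrow\;
  w\cdot x_0 + b = d\,\lVert w\rVert
  \;\Longrightarrow\;
  |d| = \frac{\lvert w\cdot x_0 + b\rvert}{\lVert w\rVert}.
\]
For a misclassified example $(x_i, y_i)$ with $y_i \in \{-1,+1\}$ we have $-y_i(w\cdot x_i+b) > 0$,
so summing these (unnormalized) distances over the mistake set $M$ gives
\[
  L(w,b) = -\sum_{i\in M} y_i\,(w\cdot x_i + b), \qquad
  \nabla_w L = -\sum_{i\in M} y_i x_i, \qquad
  \nabla_b L = -\sum_{i\in M} y_i .
\]
```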