Ch. 2: Linear Discriminants Longin Jan Latecki, Temple University, latecki@temple.edu Based on Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC 2009, with slides from Stephen Marsland, Romain Thibaux (regression slides), and Moshe Sipper

McCulloch and Pitts Neurons (Diagram: inputs x_1, …, x_m with weights w_1, …, w_m feed a summation node h and an output o.) Greatly simplified biological neurons. Sum the weighted inputs; if the total exceeds some threshold, the neuron fires, otherwise it does not.

McCulloch and Pitts Neurons o = 1 if h = Σ_j w_j x_j ≥ θ, and o = 0 otherwise, for some threshold θ. The weight w_j can be positive or negative (excitatory or inhibitory). Use only a linear sum of inputs. Use a simple binary output instead of a pulse (spike train).
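A minimal MATLAB sketch of this model (the function name and the example values are illustrative, not from the slides):

```matlab
% McCulloch-Pitts neuron: fires (output 1) when the weighted input sum
% reaches the threshold theta, otherwise outputs 0.  (Save as mcp_neuron.m.)
function o = mcp_neuron(x, w, theta)
    h = sum(w .* x);        % linear sum of the inputs
    o = double(h >= theta); % simple binary output, no spike train
end

% Example: two excitatory inputs with threshold 1 behave like logical OR:
%   mcp_neuron([1 0], [1 1], 1)   returns 1
%   mcp_neuron([0 0], [1 1], 1)   returns 0
```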

Neural Networks We can put lots of McCulloch & Pitts neurons together and connect them up in any way we like. In fact, assemblies of such neurons are capable of universal computation: they can perform any computation that a normal computer can. We just have to solve for all the weights w_ij.

Training Neurons Adapting the weights is learning. How does the network know it is right? How do we adapt the weights to make the network right more often? We need a training set with target outputs and a learning rule.

2.2 The Perceptron The perceptron is the simplest kind of feed-forward neural network. Definition (adapted from Wikipedia): the perceptron is a binary classifier which maps its input x (a real-valued vector) to a single binary output value f(x), where f(x) = 1 if w·x + b > 0 and f(x) = 0 otherwise; here w is the weight vector and b is the bias. In order not to write b explicitly, we extend the input vector x by one more dimension that is always set to -1, e.g., x = (x_0, x_1, …, x_7) with x_0 = -1, and extend the weight vector to w = (w_0, w_1, …, w_7). Then adjusting w_0 corresponds to adjusting b.

Bias Replaces Threshold (Diagram: a constant -1 input joins the other inputs, replacing the explicit threshold; inputs on the left, outputs on the right.)

Perceptron Decision = Recall Outputs are y_j = sign(Σ_i w_ij x_i), one binary value per output node j. For example, y = (y_1, …, y_5) = (1, 0, 0, 1, 1) is a possible output. We may have a different function g in the place of sign, as in (2.4) in the book.

Perceptron Learning = Updating the Weights We want to change the values of the weights. Aim: minimise the error at the output. If E = t − y, we want E to be 0. Use the update rule w_ij ← w_ij + η (t_j − y_j) x_i, where x_i is the input, η is the learning rate, and (t_j − y_j) is the error.

Example 1: The Logical OR Training table (bias input -1, inputs x_1, x_2, target t): (0,0)→0, (0,1)→1, (1,0)→1, (1,1)→1. Initial values: w_0(0) = -0.05, w_1(0) = -0.02, w_2(0) = 0.02, and η = 0.25. Take the first row of our training table: y_1 = sign(-0.05×(-1) + (-0.02)×0 + 0.02×0) = 1, but t = 0, so we update: w_0(1) = -0.05 + 0.25×(0-1)×(-1) = 0.2 w_1(1) = -0.02 + 0.25×(0-1)×0 = -0.02 w_2(1) = 0.02 + 0.25×(0-1)×0 = 0.02 We continue with the new weights and the second row, and so on. We make several passes over the training data.
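A compact MATLAB sketch of this training loop (the variable names and the fixed number of passes are my own choices):

```matlab
% Perceptron trained on logical OR with the bias trick (first input fixed to -1).
X = [-1 0 0; -1 0 1; -1 1 0; -1 1 1];   % rows: [bias x1 x2]
t = [0; 1; 1; 1];                       % OR targets
w = [-0.05; -0.02; 0.02];               % initial weights w0, w1, w2
eta = 0.25;                             % learning rate

for pass = 1:10                         % several passes over the training data
    for i = 1:size(X, 1)
        y = X(i,:) * w > 0;                      % recall: fire if weighted sum is positive
        w = w + eta * (t(i) - y) * X(i,:)';      % update: w <- w + eta*(t - y)*x
    end
end
disp(w')                                % learned weights
```

After the first example the weights become (0.2, -0.02, 0.02), exactly as in the hand calculation above.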

Decision boundary for the OR perceptron (figure)

Perceptron Learning Applet http://lcn.epfl.ch/tutorial/english/perceptron/html/index.html

Example 2: Obstacle Avoidance with the Perceptron (Diagram: left and right sensors LS, RS connected by weights w1, w2, w3, w4 to left and right motors LM, RM.) Learning rate η = 0.3; the other constant shown on the slide is -0.01 (the bias/threshold weight).

Obstacle Avoidance with the Perceptron (Diagram: the network with the first training case highlighted in the table of sensor inputs and motor targets.)

Obstacle Avoidance with the Perceptron w1 = 0 + 0.3 * (1-1) * 0 = 0

Obstacle Avoidance with the Perceptron w2 = 0 + 0.3 * (1-1) * 0 = 0 And the same for w3, w4

Obstacle Avoidance with the Perceptron (Diagram: the network with the second training case highlighted.)

Obstacle Avoidance with the Perceptron w1 = 0 + 0.3 * (-1-1) * 0 = 0

Obstacle Avoidance with the Perceptron w1 = 0 + 0.3 * (-1-1) * 0 = 0 w2 = 0 + 0.3 * (1-1) * 0 = 0

Obstacle Avoidance with the Perceptron w1 = 0 + 0.3 * (-1-1) * 0 = 0 w2 = 0 + 0.3 * (1-1) * 0 = 0 w3 = 0 + 0.3 * (-1-1) * 1 = -0.6

Obstacle Avoidance with the Perceptron w1 = 0 + 0.3 * (-1-1) * 0 = 0 w2 = 0 + 0.3 * (1-1) * 0 = 0 w3 = 0 + 0.3 * (-1-1) * 1 = -0.6 w4 = 0 + 0.3 * (1-1) * 1 = 0

Obstacle Avoidance with the Perceptron (Diagram: the network with the third training case highlighted.)

Obstacle Avoidance with the Perceptron w1 = 0 + 0.3 * (1-1) * 1 = 0 w2 = 0 + 0.3 * (-1-1) * 1 = -0.6 w3 = -0.6 + 0.3 * (1-1) * 0 = -0.6 w4 = 0 + 0.3 * (-1-1) * 0 = 0

Obstacle Avoidance with the Perceptron (Diagram: the trained network; the connection weights shown are -0.6, -0.6 and -0.01, -0.01.)

2.3 Linear Separability Outputs are y = sign(w · x), where w · x = ||w|| ||x|| cos θ and θ is the angle between the vectors x and w.

Geometry of Linear Separability The equation of a line is w_0 + w_1*x + w_2*y = 0; a point (x, y) satisfying it lies on the line. This equation is equivalent to w · x = (w_0, w_1, w_2) · (1, x, y) = 0. If w · x > 0, then the angle between w and x is less than 90 degrees, which means that x lies on the side of the line that w points towards. Each output node of the perceptron tries to separate the training data into two classes (fire or no-fire) with a linear decision boundary, i.e., a straight line in 2D, a plane in 3D, and a hyperplane in higher dimensions.

Linear Separability The Binary AND Function (figure)

Gradient Descent Learning Rule Consider a linear unit without threshold and with continuous output y (not just -1, 1): y = w_0 + w_1 x_1 + … + w_n x_n. Train the w_i's so that they minimize the squared error E[w_0, …, w_n] = ½ Σ_{d∈D} (t_d − y_d)², where D is the set of training examples.

Supervised Learning Training and test data sets. Training set: input & target.

Gradient Descent D = {<(1,1),1>, <(-1,-1),1>, <(1,-1),-1>, <(-1,1),-1>} (Figure: one gradient step in weight space, from (w_1, w_2) to (w_1+Δw_1, w_2+Δw_2).) Gradient: ∇E[w] = [∂E/∂w_0, …, ∂E/∂w_n] Δw = -η ∇E[w], i.e., Δw_i = -η ∂E/∂w_i ∂E/∂w_i = ∂/∂w_i ½ Σ_d (t_d − y_d)² = Σ_d ∂/∂w_i ½ (t_d − Σ_i w_i x_{id})² = Σ_d (t_d − y_d)(−x_{id})

Gradient Descent (Figure: the error as a function of the weights; gradient descent follows the slope downhill.) Δw_i = -η ∂E/∂w_i

Incremental (Stochastic) Gradient Descent Batch mode: gradient descent w = w − η ∇E_D[w] over the entire data D, where E_D[w] = ½ Σ_d (t_d − y_d)². Incremental mode: gradient descent w = w − η ∇E_d[w] over individual training examples d, where E_d[w] = ½ (t_d − y_d)². Incremental gradient descent can approximate batch gradient descent arbitrarily closely if η is small enough.

Gradient Descent Perceptron Learning Gradient-Descent(training_examples, η) Each training example is a pair of the form <(x_1, …, x_n), t>, where (x_1, …, x_n) is the vector of input values and t is the target output value; η is the learning rate (e.g. 0.1). Initialize each w_i to some small random value. Until the termination condition is met, Do: For each <(x_1, …, x_n), t> in training_examples Do: Input the instance (x_1, …, x_n) to the linear unit and compute the output y. For each linear unit weight w_i Do: Δw_i = η (t − y) x_i and w_i = w_i + Δw_i.
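A MATLAB sketch of this procedure for a linear unit (the data layout and the fixed-epoch stopping rule are my own placeholders):

```matlab
% Incremental gradient descent (delta rule) for a linear unit y = x*w.
% X: N-by-n matrix of inputs (one example per row), t: N-by-1 targets,
% eta: learning rate, nEpochs: number of passes.  (Save as delta_rule.m.)
function w = delta_rule(X, t, eta, nEpochs)
    [N, n] = size(X);
    w = 0.01 * randn(n, 1);                      % small random initial weights
    for epoch = 1:nEpochs                        % termination: fixed number of epochs
        for d = 1:N
            y = X(d,:) * w;                      % linear unit output
            w = w + eta * (t(d) - y) * X(d,:)';  % w_i <- w_i + eta*(t - y)*x_i
        end
    end
end
```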

Limitations of the Perceptron The Exclusive Or (XOR) function is not linearly separable: A=0, B=0 → Out=0; A=0, B=1 → Out=1; A=1, B=0 → Out=1; A=1, B=1 → Out=0.

Limitations of the Perceptron For XOR we would need w_1 > 0, w_2 > 0, and yet w_1 + w_2 < 0 — these conditions cannot all hold, so a single perceptron cannot represent XOR.

Limitations of the Perceptron? (Figure: the XOR points plotted with a third input In3 added; the table lists inputs A, B, C and the output.)

2.4 Linear Regression Given examples (x_i, y_i), i = 1, …, n, predict y_{n+1} given a new point x_{n+1}. (Figure: scatter plot of temperature data, generated in Matlab with scatter(1:20,10+(1:20)+2*randn(1,20),'k','filled'); a=axis; a(3)=0; axis(a);)

Linear Regression (Figure: the same temperature data with a fitted line; the prediction at a new input point is read off the line.)

Ordinary Least Squares (OLS) (Figure: for each data point, the error or "residual" is the vertical distance between the observation and the prediction on the fitted line.) Sum squared error: Σ_i (y_i − ŷ_i)².

Minimize the sum squared error: setting the gradient to zero gives a linear equation in w; collecting all examples gives the linear system (normal equations) XᵀX w = Xᵀy.

Alternative derivation Solve the system (it’s better not to invert the matrix)
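In MATLAB the system can be solved directly with the backslash operator, which avoids forming the inverse explicitly (an illustrative sketch; the data mirrors the figure command above and the variable names are mine):

```matlab
% Ordinary least squares: fit y ~ w0 + w1*x by solving the least-squares system.
x = (1:20)';
y = 10 + x + 2*randn(20, 1);       % noisy data, as in the slide's figure
X = [ones(20, 1) x];               % design matrix with a column of ones for the bias
w = X \ y;                         % least-squares solution, no explicit matrix inverse
yhat = X * w;                      % predictions on the training inputs
```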

Beyond Lines and Planes (Figure: a curve fitted to the data.) The model is still linear in the weights w; everything is the same with x replaced by a vector of basis functions, e.g., z = (1, x, x²).
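For example, a quadratic fit uses exactly the same least-squares machinery, only the design matrix changes (a sketch under the assumption of a polynomial basis):

```matlab
% Polynomial regression: still linear in w, only the basis functions change.
x = (1:20)';
y = 10 + x + 2*randn(20, 1);
Z = [ones(20, 1) x x.^2];          % basis functions 1, x, x^2
w = Z \ y;                         % same least-squares solve as before
yhat = Z * w;
```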

Geometric interpretation (3D figure) [Matlab demo]

Ordinary Least Squares [summary] Given examples (x_i, y_i), i = 1, …, n, collect the inputs into an n × d matrix X (one example per row) and the targets into a vector y. Minimize ||Xw − y||² by solving the linear system XᵀX w = Xᵀy. Predict ŷ = wᵀx for a new input x.

Probabilistic Interpretation (Figure: Gaussian noise around the regression line.) Under this noise model, maximizing the likelihood of the data is equivalent to minimizing the sum of squared errors.

Summary Perceptron and regression optimize the same target function. In both cases we compute the gradient (the vector of partial derivatives). In the case of regression, we set the gradient to zero and solve for the vector w; the solution is a closed formula for w at which the target function attains its global minimum. In the case of the perceptron, we iteratively move toward the minimum by stepping in the direction of minus the gradient, incrementally, making a small step for each data point.

Homework 1 (Ch. 2.3.3) Implement the perceptron in Matlab and test it on the Pima Indians Diabetes dataset from the UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/ (Ch. 2.4.1) Implement linear regression in Matlab and apply it to the auto-mpg dataset.

From Ch. 3: Testing How do we evaluate our trained network? Can't just compute the error on the training data - unfair, can't see overfitting Keep a separate testing set After training, evaluate on this test set How do we check for overfitting? Can't use training or testing sets

Validation Keep a third set of data for this Train the network on the training data Periodically, stop and evaluate on the validation set After training has finished, test on the test set This is getting expensive in terms of data!

Hold Out Cross Validation (Diagram: the inputs and targets are partitioned into training and validation portions.)

Hold Out Cross Validation Partition the training data into K subsets Train on K-1 of the subsets, validate on the Kth Repeat for a new network, leaving out a different subset Choose the network that has the best validation error We have traded data for computation Extreme version: leave-one-out
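A MATLAB sketch of this procedure (the fold assignment is mine, and the training call reuses the delta_rule sketch from earlier as a stand-in for whatever network is being validated):

```matlab
% K-fold cross-validation skeleton: train on K-1 folds, validate on the held-out fold.
K = 5;
N = size(X, 1);                          % X, t: the training inputs and targets
fold = mod(randperm(N), K) + 1;          % assign each example to one of K folds
valErr = zeros(K, 1);
W = zeros(size(X, 2), K);                % one trained weight vector per fold
for k = 1:K
    trainIdx = (fold ~= k);
    valIdx   = (fold == k);
    W(:, k) = delta_rule(X(trainIdx,:), t(trainIdx), 0.1, 50);  % train on K-1 folds
    yhat = X(valIdx,:) * W(:, k);
    valErr(k) = mean((t(valIdx) - yhat).^2);                    % validation error on fold k
end
[~, best] = min(valErr);                 % choose the network with the best validation error
wBest = W(:, best);
```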

Early Stopping When should we stop training? Could set a minimum training error (danger of overfitting) Could set a number of epochs (danger of underfitting or overfitting) Can use the validation set: measure the error on the validation set during training
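A minimal sketch of using the validation error to decide when to stop; train_one_epoch and validation_error are assumed helper functions standing in for whatever training loop and error measure are used:

```matlab
% Early stopping: keep training while the validation error keeps improving.
bestErr = inf;
patience = 0;
while patience < 3                                   % stop after 3 epochs without improvement
    w = train_one_epoch(w, Xtrain, tTrain, eta);     % assumed helper: one pass over the data
    err = validation_error(w, Xval, tVal);           % assumed helper: error on the validation set
    if err < bestErr
        bestErr = err; wBest = w; patience = 0;      % validation error is still falling
    else
        patience = patience + 1;                     % validation error has started to rise
    end
end
```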

Early Stopping (Figure: error vs. number of epochs; the training error keeps falling while the validation error starts to rise, marking the time to stop training.)