Artificial Intelligence 9. Perceptron Japan Advanced Institute of Science and Technology (JAIST) Yoshimasa Tsuruoka
Outline Feature space Perceptrons The averaged perceptron Lecture slides http://www.jaist.ac.jp/~tsuruoka/lectures/
Feature space Instances are represented by vectors in a feature space. Positive example: <Outlook = sunny, Temperature = cool, Humidity = normal>. Negative example: <Outlook = rain, Temperature = high, Humidity = high>.
Separating instances with a hyperplane Find a hyperplane that separates the positive and negative examples
Perceptron learning Can always find such a hyperplane if the given examples are linearly separable
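Linear classification Binary classification with a linear model. $x$: instance. $\phi(x)$: feature vector. $w$: weight vector. $b$: bias (it can be treated as the weight of a constant feature that is always 1, so that it becomes part of $w$). If the inner product of the feature vector with the weight vector is greater than or equal to zero, $w \cdot \phi(x) \ge 0$, the instance is classified as a positive example; otherwise it is classified as a negative example.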
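As a small illustration of this decision rule, the sketch below classifies a feature vector by the sign of its inner product with a weight vector. The function names `dot` and `classify` and the weight values are hypothetical, chosen only for this example; the bias is assumed to be folded into the weight vector through a constant first feature of 1.

```python
def dot(w, phi):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def classify(w, phi):
    """Return +1 (positive) if w . phi >= 0, otherwise -1 (negative)."""
    return 1 if dot(w, phi) >= 0 else -1


# Hypothetical weights for illustration; w[0] plays the role of the bias
# because every feature vector starts with a constant 1.
w = [-1.0, 1.0, 1.0]
print(classify(w, [1, 1, 0]))  # inner product is 0, so positive
print(classify(w, [1, 0, 0]))  # inner product is -1, so negative
```

Note that an inner product of exactly zero counts as positive under this rule.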
The Perceptron learning algorithm
1. Initialize the weight vector: $w \leftarrow 0$.
2. Choose an example (randomly) from the training data.
3. If it is not classified correctly: if it is a positive example, $w \leftarrow w + \phi(x)$; if it is a negative example, $w \leftarrow w - \phi(x)$.
Steps 2 and 3 are repeated until all examples are correctly classified.
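The steps above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: it assumes the weights start at zero, that a misclassified positive example adds its feature vector to the weights and a misclassified negative example subtracts it, and that labels are encoded as +1 and -1. The names `train_perceptron`, `max_steps` and `seed`, and the toy AND data set, are hypothetical.

```python
import random


def dot(w, phi):
    return sum(wi * xi for wi, xi in zip(w, phi))


def classify(w, phi):
    return 1 if dot(w, phi) >= 0 else -1


def train_perceptron(examples, max_steps=10000, seed=0):
    """examples: list of (feature_vector, label) pairs, label is +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])          # step 1: initialize the weights
    for step in range(max_steps):
        # Stop as soon as every training example is classified correctly.
        if all(classify(w, phi) == y for phi, y in examples):
            return w, step
        phi, y = rng.choice(examples)        # step 2: pick a random example
        if classify(w, phi) != y:            # step 3: update on a mistake
            w = [wi + y * xi for wi, xi in zip(w, phi)]
    return w, max_steps                      # safety cap reached


# Toy data (the AND concept); the first feature is a constant 1 for the bias.
examples = [([1, 0, 0], -1), ([1, 0, 1], -1), ([1, 1, 0], -1), ([1, 1, 1], 1)]
w, steps = train_perceptron(examples)
print(w, steps)
```

The `max_steps` cap is a practical safeguard added here, since the loop would otherwise never end on data that cannot be separated.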
Learning the concept OR Training data: x1: Negative, x2: Positive, x3: Positive, x4: Positive
Iteration 1 x1 Wrong!
Iteration 2 x4 Wrong!
Iteration 3 x2 OK!
Iteration 4 x3 OK!
Iteration 5 x1 Wrong!
Separating hyperplane Final weight vector: $w = (-1, 1, 1)$. Separating hyperplane: $s + t = 1$, which crosses the $s$ axis at 1 and the $t$ axis at 1. $s$ and $t$ are the inputs (the second and the third elements of the feature vector).
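The five iterations of the OR example can be replayed with a short script. This is a sketch under stated assumptions: the weights start at zero, each feature vector is (1, s, t) with a constant first element for the bias, x1 is the input (0, 0) and x4 is (1, 1), and the assignment of (0, 1) and (1, 0) to x2 and x3 is a guess that does not affect the outcome.

```python
def dot(w, phi):
    return sum(wi * xi for wi, xi in zip(w, phi))


# Assumed encoding of the OR training data: (1, s, t) and a +1/-1 label.
data = {
    "x1": ((1, 0, 0), -1),
    "x2": ((1, 0, 1), +1),
    "x3": ((1, 1, 0), +1),
    "x4": ((1, 1, 1), +1),
}
order = ["x1", "x4", "x2", "x3", "x1"]  # order in which examples are chosen

w = [0, 0, 0]
trace = []
for name in order:
    phi, y = data[name]
    predicted = 1 if dot(w, phi) >= 0 else -1
    if predicted != y:
        # Add the feature vector for a positive example, subtract it otherwise.
        w = [wi + y * xi for wi, xi in zip(w, phi)]
        trace.append((name, "Wrong!", tuple(w)))
    else:
        trace.append((name, "OK!", tuple(w)))

for row in trace:
    print(row)
print("final weights:", w)
```

Under these assumptions the final weights define the line s + t = 1, and all four examples fall on the correct side of it.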
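Why the update rule works When a positive example has not been correctly classified, $w \cdot \phi(x) < 0$: this value was too small. After the update $w \leftarrow w + \phi(x)$, the value becomes $(w + \phi(x)) \cdot \phi(x) = w \cdot \phi(x) + \phi(x) \cdot \phi(x)$. The first term is the original value and the second term is always positive. The update rule therefore makes it less likely for the perceptron to make the same mistake.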
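The same reasoning can be written out for the other case, a negative example that was wrongly classified as positive. The derivation below is a sketch that assumes the update subtracts the feature vector from the weights; $w$ denotes the weight vector, $w'$ the updated weight vector, and $\phi(x)$ the (nonzero) feature vector, a notation chosen here for illustration.

```latex
\begin{align*}
w' &= w - \phi(x) \\
w' \cdot \phi(x) &= w \cdot \phi(x) - \phi(x) \cdot \phi(x) \\
                 &= w \cdot \phi(x) - \lVert \phi(x) \rVert^2 \;<\; w \cdot \phi(x)
\end{align*}
```

The value that was too large is pushed down by the squared norm of the feature vector, which moves it toward the negative side.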
Convergence The Perceptron training algorithm converges after a finite number of iterations to a hyperplane that perfectly classifies the training data, provided the training examples are linearly separable. The number of iterations can be very large. The algorithm does not converge if the training data are not linearly separable.
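The contrast between separable and non-separable data can be observed on two toy concepts. The sketch below is illustrative only: it cycles through the examples in a fixed order instead of choosing them randomly, caps the number of passes with a hypothetical `max_epochs` parameter, and uses OR (separable) and XOR (not separable) as assumed data sets.

```python
def dot(w, phi):
    return sum(wi * xi for wi, xi in zip(w, phi))


def run_epochs(examples, max_epochs):
    """Return (epoch, w) for the first mistake-free pass, or (None, w)."""
    w = [0] * len(examples[0][0])
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for phi, y in examples:
            predicted = 1 if dot(w, phi) >= 0 else -1
            if predicted != y:
                w = [wi + y * xi for wi, xi in zip(w, phi)]
                mistakes += 1
        if mistakes == 0:
            return epoch, w
    return None, w


# Feature vectors are (1, s, t); the leading 1 carries the bias.
or_data = [((1, 0, 0), -1), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]
xor_data = [((1, 0, 0), -1), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), -1)]

print("OR :", run_epochs(or_data, max_epochs=1000))
print("XOR:", run_epochs(xor_data, max_epochs=1000))
```

For XOR no weight vector classifies all four points correctly, so every pass contains at least one mistake and the function returns `None` no matter how large the cap is.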
Learning the PlayTennis concept
Feature space: 11 binary features. Perceptron learning: converged in 239 steps.
Final weight vector:
Feature                 Weight
Bias
Outlook = Sunny         -3
Outlook = Overcast       5
Outlook = Rain          -2
Temperature = Hot
Temperature = Mild       3
Temperature = Cool
Humidity = High         -4
Humidity = Normal        4
Wind = Strong
Wind = Weak
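One way to obtain 11 binary features for PlayTennis is to use a constant bias feature plus one indicator per attribute value. The sketch below shows such an encoding; the function name `encode`, the feature order, and the example instance are assumptions made for illustration and are not taken from the slides.

```python
# One bias feature plus one indicator feature per attribute value.
FEATURES = [
    "Bias",
    "Outlook = Sunny", "Outlook = Overcast", "Outlook = Rain",
    "Temperature = Hot", "Temperature = Mild", "Temperature = Cool",
    "Humidity = High", "Humidity = Normal",
    "Wind = Strong", "Wind = Weak",
]


def encode(instance):
    """Map an attribute dictionary to a list of 11 binary features."""
    active = {"Bias"} | {f"{attr} = {value}" for attr, value in instance.items()}
    return [1 if name in active else 0 for name in FEATURES]


# Hypothetical instance, used only to show the encoding.
x = encode({"Outlook": "Sunny", "Temperature": "Cool",
            "Humidity": "Normal", "Wind": "Weak"})
print(len(FEATURES), x)
```

With this encoding, the score of an instance is simply the bias weight plus the weights of its four active features.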
Averaged Perceptron A variant of the Perceptron learning algorithm. Output the weight vector averaged over iterations rather than the final weight vector. Do not wait until convergence; determine when to stop by observing the performance on the validation set. Practical and widely used.
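A minimal sketch of the averaged perceptron is given below. It makes several assumptions that the slides do not specify: the running sum includes the weight vector after every example, examples are visited in a fixed order, and the stopping point is chosen by keeping the averaged weights with the best validation accuracy over a hypothetical `max_epochs` passes. The toy OR data doubles as the validation set only to keep the example small.

```python
def dot(w, phi):
    return sum(wi * xi for wi, xi in zip(w, phi))


def classify(w, phi):
    return 1 if dot(w, phi) >= 0 else -1


def accuracy(w, examples):
    return sum(classify(w, phi) == y for phi, y in examples) / len(examples)


def train_averaged(train, validation, max_epochs=20):
    n = len(train[0][0])
    w = [0.0] * n        # current weights
    total = [0.0] * n    # running sum of the weights after each example
    steps = 0
    best_w, best_acc = None, -1.0
    for _ in range(max_epochs):
        for phi, y in train:
            if classify(w, phi) != y:
                w = [wi + y * xi for wi, xi in zip(w, phi)]
            total = [t + wi for t, wi in zip(total, w)]
            steps += 1
        averaged = [t / steps for t in total]
        acc = accuracy(averaged, validation)
        if acc > best_acc:   # keep the best averaged weights seen so far
            best_w, best_acc = averaged, acc
    return best_w, best_acc


# Toy OR data with a constant first feature for the bias.
or_data = [((1, 0, 0), -1), ((1, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 0), 1)]
best_w, best_acc = train_averaged(or_data, or_data)
print(best_w, best_acc)
```

In practice the validation set would be held-out data, and training could stop early once the validation accuracy no longer improves.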
Naive Bayes vs Perceptrons The naive Bayes model assumes conditional independence between features. Adding informative features does not necessarily improve the performance. Perceptrons allow one to incorporate diverse types of features. The training takes longer.