Artificial Intelligence


Artificial Intelligence: Perceptrons
Instructors: David Suter and Qince Li
Course delivered @ Harbin Institute of Technology
[Many slides adapted from those created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. Some others from colleagues at Adelaide University.]

Error-Driven Classification

What to Do About Errors
Problem: there's still spam in your inbox.
Need more features – words aren't enough!
- Have you emailed the sender before?
- Have 1M other people just gotten the same email?
- Is the sending information consistent?
- Is the email in ALL CAPS?
- Do inline URLs point where they say they point?
- Does the email address you by (your) name?
Naïve Bayes models can incorporate a variety of features, but tend to do best in homogeneous cases (e.g., all features are word occurrences).

Linear Classifiers

Feature Vectors
An email such as "Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just ..." is mapped to a feature vector, e.g.
  # free : 2
  YOUR_NAME : 0
  MISSPELLED : 2
  FROM_FRIEND : 0
  ...
and the classifier decides SPAM or +.
A handwritten digit image is mapped to pixel and shape features, e.g.
  PIXEL-7,12 : 1
  PIXEL-7,13 : 0
  ...
  NUM_LOOPS : 1
and the classifier decides which digit it is, e.g. "2".
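As a concrete illustration, here is a minimal sketch of how an email might be turned into such a feature vector. The helper name extract_features and the toy misspelling list are hypothetical; the feature names mirror the ones on the slide.

```python
import re

def extract_features(email_text, sender_is_friend=False):
    """Map a raw email to a dict of feature values (a sparse feature vector)."""
    words = re.findall(r"[a-z']+", email_text.lower())
    known_misspellings = {"printr", "cartriges"}    # toy stand-in for a real spell check
    return {
        "# free":      words.count("free"),
        "YOUR_NAME":   int("your_name" in email_text.lower()),
        "MISSPELLED":  sum(w in known_misspellings for w in words),
        "FROM_FRIEND": int(sender_is_friend),
        "BIAS":        1,                           # constant feature for the bias weight
    }

email = "Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE!"
print(extract_features(email))
# {'# free': 2, 'YOUR_NAME': 0, 'MISSPELLED': 2, 'FROM_FRIEND': 0, 'BIAS': 1}
```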

Some (Simplified) Biology
Very loose inspiration: human neurons

Linear Classifiers
- Inputs are feature values
- Each feature has a weight
- The sum is the activation: activation_w(x) = Σ_i w_i · f_i(x) = w · f(x)
- If the activation is positive, output +1; if negative, output -1
(The diagram shows the inputs f1, f2, f3 multiplied by weights w1, w2, w3, summed, and passed through a ">0?" threshold.)
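A minimal sketch of this decision rule in Python (the function names are ours, not from the slides): the activation is the weighted sum of the feature values, and its sign is the prediction.

```python
def activation(weights, features):
    """Weighted sum of feature values: w . f(x)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def classify(weights, features):
    """Output +1 if the activation is positive, otherwise -1."""
    return +1 if activation(weights, features) > 0 else -1
```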

Weights
Binary case: compare features to a weight vector.
Learning: figure out the weight vector from examples.
Example weight vector:
  # free : 4
  YOUR_NAME : -1
  MISSPELLED : 1
  FROM_FRIEND : -3
  ...
Example feature vectors to score against it:
  # free : 2, YOUR_NAME : 0, MISSPELLED : 2, FROM_FRIEND : 0, ...
  # free : 0, YOUR_NAME : 1, MISSPELLED : 1, FROM_FRIEND : 1, ...
A positive dot product means the positive class.
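Worked example with the numbers above: for the first feature vector the dot product is 4·2 + (−1)·0 + 1·2 + (−3)·0 = 10 > 0, so the classifier predicts the positive (spam) class; for the second it is 4·0 + (−1)·1 + 1·1 + (−3)·1 = −3 < 0, so it predicts the negative (ham) class.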

Decision Rules

Binary Decision Rule
In the space of feature vectors:
- Examples are points
- Any weight vector is a hyperplane
- One side corresponds to Y = +1, the other to Y = -1
Example weights: BIAS : -3, free : 4, money : 2, ...
(The figure plots the "free" and "money" features as axes; points on one side of the resulting line are classified +1 = SPAM, points on the other side -1 = HAM.)
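As a small self-contained sketch of this rule with the weights above (the helper name is ours): the decision boundary is the line 4·free + 2·money − 3 = 0 in the (free, money) plane.

```python
weights = {"BIAS": -3.0, "free": 4.0, "money": 2.0}

def classify(features):
    """Sign of the dot product with the weight vector above."""
    score = sum(weights[name] * features.get(name, 0) for name in weights)
    return "+1 = SPAM" if score > 0 else "-1 = HAM"

# An email mentioning "free" once and "money" once (plus the constant BIAS feature):
print(classify({"BIAS": 1, "free": 1, "money": 1}))   # 4 + 2 - 3 = 3 > 0  -> SPAM
# An email mentioning neither word:
print(classify({"BIAS": 1, "free": 0, "money": 0}))   # -3 < 0             -> HAM
```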

Weight Updates

Learning: Binary Perceptron
- Start with weights w = 0
- For each training instance:
  - Classify with current weights: y = +1 if w · f(x) ≥ 0, otherwise y = -1
  - If correct (i.e., y = y*), no change!
  - If wrong: adjust the weight vector by adding or subtracting the feature vector, w ← w + y*·f(x) (i.e., subtract f(x) if y* is -1)
Demo: http://isl.ira.uka.de/neuralNetCourse/2004/VL_11_5/Perceptron.html
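A minimal sketch of this training loop, assuming each example is a (features, label) pair with features stored as a dict and labels in {+1, -1} (function and variable names are ours):

```python
from collections import defaultdict

def train_binary_perceptron(examples, passes=10):
    """examples: list of (features, label) pairs; features is a dict, label is +1 or -1."""
    w = defaultdict(float)                        # start with weights = 0
    for _ in range(passes):
        for features, y_star in examples:
            score = sum(w[name] * value for name, value in features.items())
            y = +1 if score >= 0 else -1          # classify with current weights
            if y != y_star:                       # if wrong: w <- w + y* f(x)
                for name, value in features.items():
                    w[name] += y_star * value
    return dict(w)

data = [({"BIAS": 1, "free": 2, "money": 1}, +1),
        ({"BIAS": 1, "free": 0, "money": 0}, -1)]
print(train_binary_perceptron(data))
```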

Examples: Perceptron Separable Case

Real Data

Multiclass Decision Rule
If we have multiple classes:
- A weight vector w_y for each class y
- Score (activation) of a class y: w_y · f(x)
- Prediction: the class with the highest score wins, y = argmax_y w_y · f(x)
Binary classification is the special case of multiclass where the negative class has weight zero.
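A sketch of this rule with one weight vector per class (names are ours; weights is assumed to map each class to its weight-vector dict):

```python
def score(weights_y, features):
    """Activation of one class: w_y . f(x)."""
    return sum(weights_y.get(name, 0.0) * value for name, value in features.items())

def predict(weights, features):
    """weights: {class: weight vector}; the highest-scoring class wins."""
    return max(weights, key=lambda y: score(weights[y], features))
```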

Learning: Multiclass Perceptron
- Start with all weights = 0
- Pick up training examples one by one
- Predict with current weights
- If correct, no change!
- If wrong: lower the score of the wrong answer (w_y ← w_y − f(x)) and raise the score of the right answer (w_{y*} ← w_{y*} + f(x))
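A minimal sketch of that update rule (a hypothetical helper, consistent with the rule above):

```python
from collections import defaultdict

def train_multiclass_perceptron(examples, classes, passes=10):
    """examples: list of (features, true_class) pairs; features is a dict of feature values."""
    w = {y: defaultdict(float) for y in classes}              # all weights start at 0
    for _ in range(passes):
        for features, y_star in examples:
            # Predict with current weights: highest activation wins.
            y = max(classes, key=lambda c: sum(w[c][f] * v for f, v in features.items()))
            if y != y_star:
                for f, v in features.items():
                    w[y][f] -= v                              # lower the wrong answer's score
                    w[y_star][f] += v                         # raise the right answer's score
    return {y: dict(wy) for y, wy in w.items()}
```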

Having a separate set of weights (one for each class) can be thought of as having separate "neurons", a layer of neurons with one neuron per class: a single-layer network. Rather than taking the "max" over the class scores, one can instead train the network to learn a coding vector…

Example: Learning the Digits 0, 1, …, 9

Properties of Perceptrons
- Separability: true if some parameters classify the training set perfectly
- Convergence: if the training data are separable, the perceptron will eventually converge (binary case)
(The figures contrast a separable case, where a line splits the two classes perfectly, with a non-separable case, where no line can.)
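The convergence claim can be made quantitative; the standard perceptron mistake bound (not stated on the slide, but consistent with it) is:

```latex
% If some weight vector separates the training data with margin \delta, and every
% feature vector satisfies \|f(x)\|^{2} \le k, then the binary perceptron makes at most
\text{mistakes} \;\le\; \frac{k}{\delta^{2}}
% updates before it converges.
```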

Improving the Perceptron

Problems with the Perceptron
- Noise: if the data aren't separable, the weights might thrash. Averaging the weight vectors over time can help (averaged perceptron); see the sketch below.
- Mediocre generalization: the perceptron finds a "barely" separating solution.
- Overtraining: test / held-out accuracy usually rises, then falls. Overtraining is a kind of overfitting.
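A minimal sketch of the averaged perceptron mentioned above (the simple "keep a running sum of the weight vector and return its average" variant; names are ours):

```python
from collections import defaultdict

def train_averaged_perceptron(examples, passes=10):
    """Binary averaged perceptron: examples are (features, label) pairs with labels +1/-1."""
    w = defaultdict(float)        # current weight vector
    w_sum = defaultdict(float)    # running sum of the weight vector, one snapshot per step
    steps = 0
    for _ in range(passes):
        for features, y_star in examples:
            score = sum(w[f] * v for f, v in features.items())
            if (+1 if score >= 0 else -1) != y_star:
                for f, v in features.items():    # ordinary perceptron update
                    w[f] += y_star * v
            for f, value in w.items():           # accumulate the (possibly updated) weights
                w_sum[f] += value
            steps += 1
    return {f: value / steps for f, value in w_sum.items()}
```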

Fixing the Perceptron
There is a large literature on changing the step size, averaging weight updates, etc.

Linear Separators Which of these linear separators is optimal?

Support Vector Machines
- Maximizing the margin: good according to intuition, theory, and practice
- Only the support vectors matter; other training examples are ignorable
- Support vector machines (SVMs) find the separator with the maximum margin
- Basically, SVMs are like MIRA where you optimize over all examples at once
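For reference, the hard-margin SVM training problem can be written as a quadratic program (a standard formulation, consistent with but not shown on the slide): find the smallest-norm weight vector that classifies every training example with margin at least 1.

```latex
\min_{w}\ \tfrac{1}{2}\,\|w\|^{2}
\quad \text{subject to} \quad
y_i \,\big(w \cdot f(x_i)\big) \,\ge\, 1 \quad \text{for every training example } (x_i, y_i)
```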

Classification: Comparison
Naïve Bayes:
- Builds a model of the training data
- Gives prediction probabilities
- Strong assumptions about feature independence
- One pass through the data (counting)
Perceptrons / SVMs:
- Make fewer assumptions about the data (though linear separability is still a big assumption; kernel SVMs etc. weaken it)
- Mistake-driven learning
- Multiple passes through the data (prediction)
- Often more accurate