CS 188: Artificial Intelligence Perceptrons and Logistic Regression Pieter Abbeel & Dan Klein University of California, Berkeley

Linear Classifiers

Feature Vectors. Each input is represented as a vector of feature values, and the classifier maps it to a label. Example (spam filtering): the email "Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just ..." becomes the features # free : 2, YOUR_NAME : 0, MISSPELLED : 2, FROM_FRIEND : 0, ..., with label SPAM or +. Example (digit recognition): an image of a handwritten digit becomes pixel features such as PIXEL-7,12 : 1, PIXEL-7,13 : 0, ..., NUM_LOOPS : 1, with label "2".
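A minimal sketch of how an email might be turned into the feature dictionary shown above. The feature names (# free, YOUR_NAME, MISSPELLED, FROM_FRIEND) come from the slide; the extraction rules themselves are illustrative assumptions, not the course's actual feature extractor.

```python
def extract_features(email_text, sender_is_friend=False, recipient_name="yourname"):
    # Lowercase tokens with trailing punctuation stripped (illustrative tokenizer).
    words = [w.strip("?!.,").lower() for w in email_text.split()]
    known = {"hello", "do", "you", "want", "free", "why", "pay", "more", "when",
             "can", "get", "them", "absolutely", "just"}
    return {
        "# free":      words.count("free"),                            # occurrences of "free"
        "YOUR_NAME":   int(recipient_name.lower() in words),           # is the recipient named?
        "MISSPELLED":  sum(1 for w in words if w and w not in known),  # words outside a tiny dictionary
        "FROM_FRIEND": int(sender_is_friend),
    }

email = ("Hello, Do you want free printr cartriges? "
         "Why pay more when you can get them ABSOLUTELY FREE! Just ...")
print(extract_features(email))
# {'# free': 2, 'YOUR_NAME': 0, 'MISSPELLED': 2, 'FROM_FRIEND': 0}
```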

Some (Simplified) Biology Very loose inspiration: human neurons

Linear Classifiers. Inputs are feature values f_i(x), and each feature has a weight w_i. The weighted sum is the activation: activation_w(x) = Σ_i w_i · f_i(x) = w · f(x). If the activation is positive, output +1; if it is negative, output -1. (Diagram: inputs f1, f2, f3 are multiplied by weights w1, w2, w3, summed, and passed through a >0? threshold.)

Weights. Binary case: compare features to a weight vector. Learning: figure out the weight vector from examples. If the dot product of the weight vector with the feature vector is positive, predict the positive class. Example weight vector: # free : 4, YOUR_NAME : -1, MISSPELLED : 1, FROM_FRIEND : -3, ... Example feature vectors: # free : 2, YOUR_NAME : 0, MISSPELLED : 2, FROM_FRIEND : 0, ... and # free : 0, YOUR_NAME : 1, MISSPELLED : 1, FROM_FRIEND : 1, ...
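Putting these two slides together, here is a minimal sketch of the dot-product decision rule; the weight and feature values are the ones from the slide above, while the code itself is an illustrative assumption rather than the course's implementation.

```python
# Sketch of the linear decision rule: activation_w(x) = sum_i w_i * f_i(x),
# predict +1 if the activation is positive, -1 otherwise.
def activation(weights, features):
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def classify(weights, features):
    return +1 if activation(weights, features) > 0 else -1

weights  = {"# free": 4.0, "YOUR_NAME": -1.0, "MISSPELLED": 1.0, "FROM_FRIEND": -3.0}
features = {"# free": 2, "YOUR_NAME": 0, "MISSPELLED": 2, "FROM_FRIEND": 0}
print(activation(weights, features), classify(weights, features))  # 10.0 +1
```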

Decision Rules

Binary Decision Rule. In the space of feature vectors, examples are points and any weight vector defines a hyperplane; one side corresponds to Y = +1, the other to Y = -1. Example: with weights BIAS : -3, free : 4, money : 2, the boundary in the (free, money) feature space is 4·free + 2·money - 3 = 0, with the +1 = SPAM region on one side and the -1 = HAM region on the other.

Weight Updates

Learning: Binary Perceptron. Start with weights w = 0. For each training instance: classify with the current weights; if correct (i.e., y = y*), no change; if wrong, adjust the weight vector by adding or subtracting the feature vector: add f(x) if y* = +1, subtract f(x) if y* = -1 (equivalently, w ← w + y*·f(x)). http://isl.ira.uka.de/neuralNetCourse/2004/VL_11_5/Perceptron.html
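A minimal sketch of this training loop, assuming weights and features are stored as dictionaries from feature names to values (an illustrative representation, not the course's code):

```python
# Binary perceptron: classify with current weights; on a mistake,
# add y* * f(x) to the weight vector.
def perceptron_train(data, num_passes=10):
    """data: list of (features, y_star) pairs with y_star in {+1, -1}."""
    weights = {}
    for _ in range(num_passes):
        for features, y_star in data:
            score = sum(weights.get(f, 0.0) * v for f, v in features.items())
            y = +1 if score > 0 else -1
            if y != y_star:                    # wrong: adjust the weight vector
                for f, v in features.items():  # add f(x) if y* = +1, subtract if y* = -1
                    weights[f] = weights.get(f, 0.0) + y_star * v
    return weights

train = [({"BIAS": 1, "free": 2}, +1), ({"BIAS": 1, "free": 0}, -1)]
print(perceptron_train(train))  # e.g. {'BIAS': 0.0, 'free': 2.0}
```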

Examples: Perceptron Separable Case

Multiclass Decision Rule. If we have multiple classes: keep a weight vector w_y for each class y. The score (activation) of class y is w_y · f(x). Prediction: the class with the highest score wins, y = argmax_y w_y · f(x). Binary classification is the multiclass case where the negative class has weight zero.

Learning: Multiclass Perceptron. Start with all weights = 0. Pick up training examples one by one and predict with the current weights, y = argmax_y w_y · f(x). If correct, no change. If wrong: lower the score of the wrong answer, w_y ← w_y - f(x), and raise the score of the right answer, w_y* ← w_y* + f(x).
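A minimal sketch of the multiclass update, again with dictionary-based weight vectors; the class names and example data are illustrative, not taken from the course:

```python
# Multiclass perceptron: one weight vector per class, predict the argmax class,
# and on a mistake lower the wrong class's weights and raise the right class's.
def multiclass_perceptron(data, classes, num_passes=10):
    weights = {y: {} for y in classes}
    for _ in range(num_passes):
        for features, y_star in data:
            scores = {y: sum(w.get(f, 0.0) * v for f, v in features.items())
                      for y, w in weights.items()}          # w_y . f(x) for each class
            y_pred = max(scores, key=scores.get)
            if y_pred != y_star:
                for f, v in features.items():
                    weights[y_pred][f] = weights[y_pred].get(f, 0.0) - v  # lower wrong answer
                    weights[y_star][f] = weights[y_star].get(f, 0.0) + v  # raise right answer
    return weights

data = [({"BIAS": 1, "win": 1, "game": 1}, "SPORTS"),
        ({"BIAS": 1, "win": 1, "vote": 1}, "POLITICS")]
print(multiclass_perceptron(data, ["SPORTS", "POLITICS", "TECH"]))
```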

Example: Multiclass Perceptron. Training sentences: "win the vote", "win the election", "win the game". One weight vector per class, starting at BIAS : 1, win : 0, game : 0, vote : 0, the : 0, ... for the first class and BIAS : 0, win : 0, game : 0, vote : 0, the : 0, ... for the other two; the vectors are updated as each sentence is classified.

Properties of Perceptrons. Separability: the training set is separable if some setting of the parameters gets it perfectly correct. Convergence: if the training data is separable, the perceptron will eventually converge (binary case). Mistake bound: the maximum number of mistakes (binary case) is related to the margin, or degree of separability. (Figure: a separable dataset with a separating boundary vs. a non-separable one.)

Problems with the Perceptron. Noise: if the data isn't separable, the weights might thrash; averaging weight vectors over time can help (averaged perceptron). Mediocre generalization: it finds a "barely" separating solution. Overtraining: test / held-out accuracy usually rises, then falls; overtraining is a kind of overfitting.
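A rough sketch of the averaged-perceptron idea mentioned above: run the usual perceptron updates but keep a running sum of the weight vector after every example and return the average. Details vary in practice (e.g., weighting each vector by how long it survives); this version is an illustrative assumption, not the course's reference implementation.

```python
# Averaged perceptron: same updates as the binary perceptron, but the final
# classifier uses the average of all intermediate weight vectors.
def averaged_perceptron(data, num_passes=10):
    weights, totals, count = {}, {}, 0
    for _ in range(num_passes):
        for features, y_star in data:
            score = sum(weights.get(f, 0.0) * v for f, v in features.items())
            if (+1 if score > 0 else -1) != y_star:
                for f, v in features.items():
                    weights[f] = weights.get(f, 0.0) + y_star * v
            for f, w in weights.items():            # accumulate for the average
                totals[f] = totals.get(f, 0.0) + w
            count += 1
    return {f: t / count for f, t in totals.items()}
```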

Improving the Perceptron

Non-Separable Case: Deterministic Decision Even the best linear boundary makes at least one mistake

Non-Separable Case: Probabilistic Decision. (Figure: instead of a hard boundary, the classifier's output shades gradually across the boundary, from 0.9 | 0.1 through 0.5 | 0.5 to 0.1 | 0.9.)

How to get probabilistic decisions? Perceptron scoring: z = w · f(x). If z is very positive, we want the probability to go to 1; if z is very negative, we want it to go to 0. The sigmoid function does exactly this: φ(z) = 1 / (1 + e^(-z)).
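A small sketch of the sigmoid squashing step (the weight and feature values are illustrative):

```python
import math

# phi(z) = 1 / (1 + e^{-z}): maps the perceptron score z = w . f(x) into (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def prob_positive(weights, features):
    z = sum(weights.get(f, 0.0) * v for f, v in features.items())
    return sigmoid(z)   # large positive z -> near 1, large negative z -> near 0

print(sigmoid(5.0), sigmoid(0.0), sigmoid(-5.0))  # ~0.993, 0.5, ~0.007
```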

Best w? Maximum likelihood estimation: max_w ll(w) = max_w Σ_i log P(y(i) | x(i); w), with P(y(i) = +1 | x(i); w) = 1 / (1 + e^(-w · f(x(i)))) and P(y(i) = -1 | x(i); w) = 1 - 1 / (1 + e^(-w · f(x(i)))). This is logistic regression.
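A minimal sketch of evaluating this objective for a candidate w; the data and weights are illustrative, and the maximization itself (covered later) is not shown:

```python
import math

# Log-likelihood of binary data under the sigmoid model:
# ll(w) = sum_i log P(y_i | x_i; w), which logistic regression maximizes over w.
def log_likelihood(weights, data):
    ll = 0.0
    for features, y in data:                  # y in {+1, -1}
        z = sum(weights.get(f, 0.0) * v for f, v in features.items())
        p_pos = 1.0 / (1.0 + math.exp(-z))    # P(y = +1 | x; w)
        ll += math.log(p_pos if y == +1 else 1.0 - p_pos)
    return ll

data = [({"BIAS": 1, "free": 2}, +1), ({"BIAS": 1, "free": 0}, -1)]
print(log_likelihood({"BIAS": -1.0, "free": 1.0}, data))  # ~ -0.63
```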

Separable Case: Deterministic Decision – Many Options

Separable Case: Probabilistic Decision – Clear Preference. (Figure: two different separating boundaries shown with their probability contours, e.g. 0.7 | 0.3, 0.5 | 0.5, 0.3 | 0.7; under the probabilistic decision there is a clear preference between them.)

Multiclass Logistic Regression. Recall the multiclass perceptron: a weight vector w_y for each class, the score (activation) of class y is w_y · f(x), and the prediction is the class with the highest score. How do we make the scores into probabilities? Replace the original activations z_1, ..., z_k with softmax activations: P(y | x; w) = e^(w_y · f(x)) / Σ_y' e^(w_y' · f(x)).
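A small sketch of the softmax step (class names and activation values are illustrative):

```python
import math

# Softmax: turn per-class activations w_y . f(x) into probabilities
# proportional to e^{w_y . f(x)}.
def softmax(activations):
    m = max(activations.values())                    # subtract max for numerical stability
    exps = {y: math.exp(a - m) for y, a in activations.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}

print(softmax({"SPORTS": 2.0, "POLITICS": 1.0, "TECH": -1.0}))
```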

Best w? Maximum likelihood estimation: max_w ll(w) = max_w Σ_i log P(y(i) | x(i); w), with P(y(i) | x(i); w) = e^(w_y(i) · f(x(i))) / Σ_y e^(w_y · f(x(i))). This is multi-class logistic regression.
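A minimal sketch of evaluating the multiclass objective for a candidate set of weight vectors; weights, features, and class names are illustrative:

```python
import math

# Multiclass log-likelihood: ll(w) = sum_i log P(y_i | x_i; w), where
# P(y | x; w) is the softmax of the class activations w_y . f(x).
def multiclass_log_likelihood(weights, data):
    ll = 0.0
    for features, y_star in data:
        scores = {y: sum(w.get(f, 0.0) * v for f, v in features.items())
                  for y, w in weights.items()}
        m = max(scores.values())
        log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
        ll += scores[y_star] - log_z          # log softmax probability of the true class
    return ll

weights = {"SPORTS": {"win": 1.0, "game": 1.0}, "POLITICS": {"win": 1.0, "vote": 1.0}}
data = [({"win": 1, "game": 1}, "SPORTS")]
print(multiclass_log_likelihood(weights, data))  # ~ -0.31
```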

Next Lecture: Optimization, i.e., how do we solve max_w Σ_i log P(y(i) | x(i); w)?