Artificial Intelligence: Statistical learning methods. Chapter 20, AIMA (only ANNs & SVMs)

Artificial neural networks. The brain is a pretty intelligent system. Can we "copy" it? There are on the order of 10^11 neurons in the brain, and approximately 23 × 10^9 neurons in the male cortex (females have about 15% fewer).

The simple model: the McCulloch-Pitts model (1943). (Image from Neuroscience: Exploring the Brain by Bear, Connors, and Paradiso.) y = g(w0 + w1·x1 + w2·x2 + w3·x3)
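A minimal sketch of such a unit in Python; the weights, bias, and threshold function below are illustrative values, not taken from the lecture:

```python
import numpy as np

def mcculloch_pitts(x, w, w0, g):
    """Single McCulloch-Pitts-style unit: y = g(w0 + w·x)."""
    return g(w0 + np.dot(w, x))

# Example: a unit with hand-picked weights and a hard threshold.
step = lambda z: 1.0 if z >= 0 else 0.0
y = mcculloch_pitts(x=np.array([1.0, 0.0, 1.0]),
                    w=np.array([0.5, 0.5, 0.5]),
                    w0=-0.8,
                    g=step)
print(y)  # fires (1.0) because 0.5 + 0.5 - 0.8 = 0.2 >= 0
```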

Neuron firing rate. [Figure: firing rate plotted as a function of depolarization.]

Transfer functions g(z): the Heaviside function and the logistic function.
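As a sketch, the two transfer functions named above can be written as follows (standard forms; the slide's own plots are images and not reproduced here):

```python
import numpy as np

def heaviside(z):
    """Heaviside step: 0 for z < 0, 1 for z >= 0."""
    return np.where(z >= 0, 1.0, 0.0)

def logistic(z):
    """Logistic (sigmoid): smooth, differentiable squashing to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 11)
print(heaviside(z))
print(logistic(z))
```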

The simple perceptron. With the {-1,+1} representation, traditionally (early 1960s) trained with perceptron learning.

Perceptron learning. Repeat until no more errors are made:
1. Pick a random example [x(n), f(n)], where f(n) is the desired output.
2. If the classification is correct, i.e. if y(x(n)) = f(n), do nothing.
3. If the classification is wrong, update the parameters (η, the learning rate, is a small positive number).
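A minimal sketch of this procedure for the {-1,+1} representation, assuming the standard perceptron update w ← w + η·f(n)·x(n) on misclassified examples (the slide's exact update rule is an image and is not reproduced verbatim); the AND example below uses {-1,+1} coding for both inputs and outputs as an illustration:

```python
import numpy as np

def perceptron_train(X, f, eta=0.3, max_epochs=100):
    """Perceptron learning for targets f(n) in {-1, +1}.
    X is (N, d); a bias input of +1 is prepended to each example."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for n in np.random.permutation(len(Xb)):
            y = 1 if w @ Xb[n] >= 0 else -1          # y(x(n)) = sign(w^T x(n))
            if y != f[n]:                            # wrong: update the weights
                w += eta * f[n] * Xb[n]
                errors += 1
        if errors == 0:                              # no errors made: stop
            break
    return w

# The AND function with {-1, +1} coding, learning rate 0.3 as in the example.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
f = np.array([-1, -1, -1, 1])
print(perceptron_train(X, f, eta=0.3))
```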

Example: Perceptron learning, the AND function. [Truth table for f = AND(x1, x2) and network diagram.] Initial values: η = 0.3.

Example: Perceptron learning, the AND function. This one is correctly classified, no action.

Example: Perceptron learning, the AND function. This one is incorrectly classified, learning action.

Example: Perceptron learning, the AND function. This one is incorrectly classified, learning action.

Example: Perceptron learning, the AND function. This one is correctly classified, no action.

Example: Perceptron learning, the AND function. This one is incorrectly classified, learning action.

Example: Perceptron learning, the AND function. This one is incorrectly classified, learning action.

Example: Perceptron learning, the AND function. Final solution.

Perceptron learning. Perceptron learning is guaranteed to find a solution in finite time, if a solution exists. However, it cannot be generalized to more complex networks; it is better to use gradient descent, based on formulating an error over differentiable functions.

Gradient search. [Figure: error surface E(W) over the weights W; "go downhill" from W(k).] The "learning rate" (η) is set heuristically. Update: W(k+1) = W(k) + ΔW(k).
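A minimal sketch of gradient descent on a quadratic error surface, assuming the usual downhill step ΔW(k) = -η·∇E(W(k)) (the slide only shows the update schematically; the target function below is illustrative):

```python
import numpy as np

def gradient_descent(grad_E, W0, eta=0.1, steps=100):
    """Iterate W(k+1) = W(k) - eta * grad_E(W(k))."""
    W = np.asarray(W0, dtype=float)
    for _ in range(steps):
        W = W - eta * grad_E(W)
    return W

# Toy example: E(W) = ||W - [1, 2]||^2 has gradient 2*(W - [1, 2]).
target = np.array([1.0, 2.0])
grad_E = lambda W: 2.0 * (W - target)
print(gradient_descent(grad_E, W0=[0.0, 0.0], eta=0.1))  # approaches [1, 2]
```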

The Multilayer Perceptron (MLP). Combine several single-layer perceptrons. Each single-layer perceptron uses a sigmoid transfer function (smooth and differentiable). [Figure: layered network from input to output.] Can be trained using gradient descent.

Example: one hidden layer. Such a network can approximate any continuous function; the output activation is sigmoid or linear, and the hidden-unit activation is sigmoid.

Training: Backpropagation (Gradient descent)
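A minimal sketch of backpropagation for a one-hidden-layer network with sigmoid hidden units and a linear output, trained on squared error with plain batch gradient descent; the architecture size, learning rate, and XOR toy data are illustrative, not the lecture's code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, n_hidden=8, eta=0.5, epochs=20000, seed=0):
    """One hidden layer (sigmoid) + linear output, squared-error loss,
    trained with batch gradient descent (backpropagation)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, n_hidden))   # input -> hidden
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))   # hidden -> output
    b2 = np.zeros(1)
    for _ in range(epochs):
        # Forward pass
        H = sigmoid(X @ W1 + b1)          # hidden activations, (N, n_hidden)
        out = H @ W2 + b2                 # linear output, (N, 1)
        # Backward pass: gradients of 1/2 * mean squared error
        err = out - y.reshape(-1, 1)
        dW2 = H.T @ err / N
        db2 = err.mean(axis=0)
        dH = (err @ W2.T) * H * (1 - H)   # chain rule through the sigmoid
        dW1 = X.T @ dH / N
        db1 = dH.mean(axis=0)
        # Gradient descent step
        W1 -= eta * dW1
        b1 -= eta * db1
        W2 -= eta * dW2
        b2 -= eta * db2
    return W1, b1, W2, b2

# Toy usage: learn XOR, which a single perceptron cannot represent.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)
W1, b1, W2, b2 = train_mlp(X, y)
# Should approach [0, 1, 1, 0]; exact values depend on the random initialization.
print(sigmoid(X @ W1 + b1) @ W2 + b2)
```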

Support vector machines

Linear classifier on a linearly separable problem There are infinitely many lines that have zero training error. Which line should we choose?

Linear classifier on a linearly separable problem. There are infinitely many lines that have zero training error. Which line should we choose? Choose the line with the largest margin: the "large margin classifier".

Linear classifier on a linearly separable problem. There are infinitely many lines that have zero training error. Which line should we choose? Choose the line with the largest margin: the "large margin classifier". The training examples that lie on the margin are the "support vectors".

Computing the margin. The plane separating the two classes is defined by w^T x = a, and the dashed margin planes are given by w^T x = a ± b.

Computing the margin. Divide by b and define the new w = w/b and θ = a/b; we have thereby fixed a scale for w and b.

Computing the margin. We have w^T x+ − θ = +1 and w^T x− − θ = −1 for points x+ and x− on the two margin planes, which gives margin = 2/||w||.
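A small numeric check of this formula (a sketch with an arbitrary w and θ, not values from the lecture): the distance between the planes w^T x = θ + 1 and w^T x = θ − 1 should equal 2/||w||.

```python
import numpy as np

w = np.array([3.0, 4.0])           # arbitrary weight vector, ||w|| = 5
theta = 0.7

# Take any point on the plane w^T x = theta + 1 ...
x_plus = (theta + 1) * w / (w @ w)
# ... and measure its distance to the plane w^T x = theta - 1.
dist = abs(w @ x_plus - (theta - 1)) / np.linalg.norm(w)

print(dist, 2 / np.linalg.norm(w))  # both 0.4 = 2/||w||
```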

Linear classifier on a linearly separable problem. Maximizing the margin is equivalent to minimizing ||w||, subject to the constraints w^T x(n) − θ ≥ +1 for all examples in the positive class and w^T x(n) − θ ≤ −1 for all examples in the negative class. This is a quadratic programming problem; the constraints can be included with Lagrange multipliers.

Quadratic programming problem. Minimize the cost (the Lagrangian L_p); the minimum of L_p occurs at the maximum of the Wolfe dual. Only scalar products appear in the dual cost. IMPORTANT!
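For reference, a sketch of the standard hard-margin Lagrangian and Wolfe dual in this lecture's notation (standard textbook form, with desired outputs f(n) ∈ {−1, +1} and Lagrange multipliers α_n ≥ 0; the slide's own formulas are images and are not reproduced verbatim):

L_p = (1/2) ||w||^2 − Σ_n α_n [ f(n) (w^T x(n) − θ) − 1 ]

maximize L_D = Σ_n α_n − (1/2) Σ_n Σ_m α_n α_m f(n) f(m) x(n)^T x(m)
subject to α_n ≥ 0 and Σ_n α_n f(n) = 0,

with w = Σ_n α_n f(n) x(n) at the optimum. Note that the training data enter the dual only through the scalar products x(n)^T x(m).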

Linear Support Vector Machine. Test phase: the predicted output is y(x) = sign(Σ_n α_n f(n) x(n)^T x − θ), where θ is determined e.g. by looking at one of the support vectors. Still only scalar products in the expression.

How do we deal with the nonlinear case? Project the data into a high-dimensional space Z. There we know that it will be linearly separable (due to the VC dimension of the linear classifier). We don't even have to know the projection...!

The scalar-product kernel trick. If we can find a kernel K such that K[x(i), x(j)] = Φ(x(i))^T Φ(x(j)), where Φ is the projection into Z, then we don't even have to know the mapping Φ to solve the problem...

Valid kernels (Mercer's theorem). Define the matrix K with entries K_ij = K[x(i), x(j)] over the training data. If K is symmetric, K = K^T, and positive semi-definite, then K[x(i), x(j)] is a valid kernel.
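A small sketch of this check for the Gaussian kernel on random data (the data and kernel width are illustrative):

```python
import numpy as np

def gaussian_kernel(xi, xj, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||xi - xj||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                       # 20 random 3-D points
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

print(np.allclose(K, K.T))                         # symmetric
print(np.all(np.linalg.eigvalsh(K) >= -1e-10))     # eigenvalues >= 0 (PSD)
```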

Examples of kernels: first, the Gaussian kernel; second, the polynomial kernel (with degree d = 1 we have the linear SVM). The linear SVM is often used with good success on high-dimensional data (e.g. text classification).
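A minimal sketch of fitting SVMs with these kernels using scikit-learn (the toy dataset and parameter values are illustrative, not from the lecture):

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data: two Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-1.0, size=(50, 2)),
               rng.normal(loc=+1.0, size=(50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

for clf in (SVC(kernel='linear', C=10.0),              # linear SVM
            SVC(kernel='poly', degree=3, C=10.0),      # polynomial kernel
            SVC(kernel='rbf', gamma=0.5, C=10.0)):     # Gaussian (RBF) kernel
    clf.fit(X, y)
    print(clf.kernel, clf.score(X, y), len(clf.support_))
```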

Example: Robot color vision (competition 1999). Classify the Lego pieces into red, blue, and yellow; also classify white balls, the black sideboard, and the green carpet.

What the camera sees (RGB space). [Scatter plot of pixels in RGB space with yellow, green, and red clusters labelled.]

Mapping RGB (3D) to rgb (2D)
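A likely form of this mapping is the standard chromaticity normalization sketched below (the slide's exact formula is an image and may differ): dividing by the total intensity leaves two free coordinates, which matches the 2-D input used in the next slide.

```python
import numpy as np

def rgb_normalize(RGB):
    """Map RGB (3-D) to normalized rg chromaticity (2-D):
    r = R / (R + G + B), g = G / (R + G + B).
    This removes overall intensity and keeps only colour."""
    RGB = np.asarray(RGB, dtype=float)
    s = RGB.sum(axis=-1, keepdims=True)
    s[s == 0] = 1.0                      # avoid division by zero for black pixels
    return (RGB / s)[..., :2]            # keep (r, g); b = 1 - r - g is redundant

print(rgb_normalize([[200, 50, 30], [20, 5, 3]]))  # same colour, different intensity
```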

Lego in normalized rgb space. Input is 2-D: (x1, x2). Output is 6-D: {red, blue, yellow, green, black, white}.

MLP classifier: E_train = 0.21%, E_test = 0.24%. MLP trained with Levenberg-Marquardt; training time (150 epochs): 51 seconds.

SVM classifier: E_train = 0.19%, E_test = 0.20%. SVM with parameter value 1000; training time: 22 seconds.