Review for test #3: Radial basis functions, SVM, SOM


How many prototype vectors will be generated in the SOM application illustrated?
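
As a reminder of why the illustration determines the count: a SOM learns exactly one prototype (weight) vector per output node, so the number of prototypes equals the size of the output lattice. A minimal Python sketch, with a hypothetical 10 x 8 lattice and input dimension d = 3 standing in for the illustration:

import numpy as np

# One prototype (weight) vector per output node: a rows x cols lattice
# yields rows * cols prototypes. The 10 x 8 lattice and d = 3 are
# hypothetical, since the illustration is not reproduced here.
rows, cols, d = 10, 8, 3
prototypes = np.random.rand(rows, cols, d)
print(prototypes.reshape(-1, d).shape[0])   # 80 prototype vectors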

Are the bars that illustrate the convergence of this SOM elastic nets, semantic maps, or U-matrices (UMATs)?

Convert the last bar in this illustration to a gray-scale U-matrix (UMAT).

Is this elastic net covering input space or the lattice of output nodes? What are the dimensions of the output-node array in the SOM that produced this elastic net?

Use the stars to draw a boundary on the cluster that contains horse and cow

Three local minima of the U-matrix are shown. Draw the stars of the 3 clusters that contain these minima.

What is wrong with these equations as a start to the development of an SVM for soft-margin hyperplanes?
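
The slide's equations are not reproduced here, but for comparison, a sketch of the standard soft-margin primal they should reduce to (labels r^t in {-1, +1}, slack variables ξ^t):

\min_{w,\,w_0,\,\xi} \ \tfrac{1}{2}\lVert w \rVert^2 + C \sum_t \xi^t
\quad \text{s.t.} \quad r^t (w^T x^t + w_0) \ge 1 - \xi^t, \qquad \xi^t \ge 0 .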

Given that g(x) = w^T x + w0 is a discriminant for the classes, with g(x) = ±1 on the margins of the hyperplane, find 2 ways to show that the margin width is 2/||w||. (Lecture notes for E. Alpaydın, Introduction to Machine Learning 2e, © The MIT Press, 2010.)

Data point x^t with r^t = 1 is misclassified. What are the bounds on the hinge loss if x^t is in the margins? What are the bounds on the hinge loss if x^t is outside the margins? Would the answers to these questions be different if r^t = -1?
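
As a quick check on these bounds, a small Python sketch of the hinge loss max(0, 1 - r^t g(x^t)); the g(x) values below are hypothetical, and "in the margins" means 0 < r^t g(x^t) < 1:

def hinge_loss(g_x, r):
    # Hinge loss max(0, 1 - r * g(x)) for a label r in {-1, +1}
    return max(0.0, 1.0 - r * g_x)

print(hinge_loss(0.4, +1))   # in the margins (0 < g(x) < 1): loss in (0, 1) -> 0.6
print(hinge_loss(-0.7, +1))  # misclassified (g(x) < 0): loss > 1 -> 1.7
print(hinge_loss(1.5, +1))   # correct side, outside the margins: loss = 0 -> 0.0
print(hinge_loss(-0.4, -1))  # r = -1 is symmetric: same bounds with g(x) negated -> 0.6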

One-Class SVM: consider a sphere with center a and radius R. Find a and R that define a soft boundary on high-density data: (a) data not involved in finding the sphere; (b) data on the sphere (ξ^t = 0), used to find R given a; (c) data outside the sphere (ξ^t > 0). Objective: distinguish (a) & (b) from (c). What are the primal variables?

Add Lagrange multipliers α^t ≥ 0 and γ^t ≥ 0 for the constraints. Set the derivatives of L_p with respect to the primal variables R, a, and ξ^t to zero; among other relations this gives 0 ≤ α^t ≤ C. Substituting back into L_p, we get the dual to be maximized. What happened to the R^2 term in L_p?
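
A sketch of the standard derivation behind these steps (one-class SVM with center a, radius R, and slacks ξ^t):

L_p = R^2 + C \sum_t \xi^t - \sum_t \alpha^t \big( R^2 + \xi^t - \lVert x^t - a \rVert^2 \big) - \sum_t \gamma^t \xi^t

\frac{\partial L_p}{\partial R} = 0 \Rightarrow \sum_t \alpha^t = 1, \qquad
\frac{\partial L_p}{\partial a} = 0 \Rightarrow a = \sum_t \alpha^t x^t, \qquad
\frac{\partial L_p}{\partial \xi^t} = 0 \Rightarrow \gamma^t = C - \alpha^t \Rightarrow 0 \le \alpha^t \le C

L_d = \sum_t \alpha^t (x^t)^T x^t - \sum_s \sum_t \alpha^s \alpha^t (x^s)^T x^t

Since \sum_t \alpha^t = 1, the R^2 terms cancel when substituting back into L_p, which answers the last question.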

One-Class SVM, no slack variables: ||x^t - a||^2 = (x^t - a)^T (x^t - a). What are the primal variables? What are the derivatives of L_p with respect to the primal variables? What relationships result from setting these derivatives to zero?

Optimal soft-margin hyperplane with slack variables. What are the primal variables? What are the dual variables? What is the meaning of any other variables?

One-Class SVM: What are the primal variables? What are the dual variables? What is the meaning of any other variables?

ν-SVM: another approach to soft margins. ν is a regularization parameter shown to be an upper bound on the fraction of instances in the margin. ρ is a primal variable related to the optimum margin width, which equals 2ρ/||w||. The additional primal variables are w, w0, and the slack variables. Maximize the dual.
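
For reference, a sketch of the standard ν-SVM primal in this notation:

\min_{w,\,w_0,\,\xi,\,\rho} \ \tfrac{1}{2}\lVert w \rVert^2 - \nu \rho + \frac{1}{N} \sum_t \xi^t
\quad \text{s.t.} \quad r^t (w^T x^t + w_0) \ge \rho - \xi^t, \qquad \xi^t \ge 0, \quad \rho \ge 0 .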

The Kernel Trick: transform the inputs x^t by basis functions z^t = φ(x^t); g(z) = w^T z is linear in the feature space, so g(x) = w^T φ(x) is a non-linear discriminant in the input space. The explicit transformation is unnecessary: the discriminant can be written as g(x) = Σ_t α^t r^t K(x^t, x). Where did this equation come from? The kernel K is a function defined on the input space.

In a 2-D attribute space, the quadratic kernel K(x^t, x) = [(x^t)^T x + 1]^2 expands to 1 + 2 x_1^t x_1 + 2 x_2^t x_2 + (x_1^t x_1)^2 + (x_2^t x_2)^2 + 2 x_1^t x_2^t x_1 x_2. Kernel machines are based on a transformation to the feature space defined by K(x^t, x) = φ(x^t)^T φ(x), where z^t = φ(x^t). How are the features related to the attributes in this case?
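
A minimal sketch verifying that expansion numerically: the 6-D feature map φ(x) = (1, √2 x_1, √2 x_2, x_1^2, x_2^2, √2 x_1 x_2) reproduces the kernel as an ordinary dot product (the two points below are hypothetical):

import numpy as np

def phi(x):
    # Explicit feature map whose dot product reproduces [(x^t)^T x + 1]^2 in 2-D
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2,
                     np.sqrt(2) * x1 * x2])

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])   # hypothetical points
print((x @ y + 1.0) ** 2)      # kernel evaluated in input space: 4.0
print(phi(x) @ phi(y))         # dot product in feature space: 4.0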

How is the input related to the hidden layer in a Radial Basis Function (RBF) network? How is the output related to the hidden layer?
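
A minimal sketch of the two relationships, assuming Gaussian hidden units (the centers, spreads, and weights below are hypothetical): the input reaches the hidden layer only through its distances to the centers, and the output is a plain linear combination of the hidden activations.

import numpy as np

def rbf_forward(x, centers, spreads, w, w0):
    # Input -> hidden: each hidden unit is a Gaussian of the distance
    # from x to its center (the non-linear part of the network)
    h = np.exp(-0.5 * (np.linalg.norm(centers - x, axis=1) / spreads) ** 2)
    # Hidden -> output: an ordinary weighted linear sum of the activations
    return w @ h + w0

centers = np.array([[0.0, 0.0], [2.0, 2.0]])   # hypothetical centers
spreads = np.array([1.0, 1.0])
w, w0 = np.array([0.5, -0.3]), 0.1
print(rbf_forward(np.array([1.0, 1.0]), centers, spreads, w, w0))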

Clustering the data set with N = 5 by K-means with K = 2 produced the Gaussians φ_j(x) = exp(-½(||x - m_j||/s_j)^2), j = 1, 2. Set up the linear system of equations that determines the weights connecting the hidden layer to the output.

Solve the normal equations D^T D w = D^T r for the vector w. What are the dimensions of this linear system of equations?
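
A sketch under the slide's assumptions (N = 5 examples, K = 2 basis functions plus a bias column, so D is 5 x 3, D^T D is 3 x 3, and w has 3 entries); the random entries stand in for the Gaussian basis-function outputs:

import numpy as np

# D: 5 x 3 design matrix (bias column plus K = 2 basis-function columns)
rng = np.random.default_rng(0)
D = np.column_stack([np.ones(5), rng.random((5, 2))])
r = rng.random(5)                        # hypothetical targets

w = np.linalg.solve(D.T @ D, D.T @ r)    # (D^T D) w = D^T r, a 3 x 3 system
print(w.shape)                           # (3,)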

K-means has converged, so the cluster means m_j can be used as the centers of the Gaussian basis functions. How do we get the spreads s_j?
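
One common heuristic, sketched in Python (not the only choice; half the distance to the nearest other center is also used): take s_j to be the RMS distance of cluster j's members from its mean. The toy data below is hypothetical.

import numpy as np

def spreads_from_clusters(X, labels, centers):
    # s_j = RMS distance of cluster j's members from its converged mean m_j
    s = np.empty(len(centers))
    for j, m in enumerate(centers):
        d = np.linalg.norm(X[labels == j] - m, axis=1)
        s[j] = np.sqrt(np.mean(d ** 2))
    return s

X = np.array([[0.0], [1.0], [9.0], [10.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.5], [9.5]])
print(spreads_from_clusters(X, labels, centers))   # [0.5 0.5]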

The input data has dimension d > 2, and I believe the data forms clusters. How can I investigate the number of clusters in the data?

Agglomerative Clustering: start with N groups, each containing one instance, and merge the two closest groups at each iteration. Options for the distance between groups G_i and G_j:
Single-link: smallest distance between all possible pairs
Complete-link: largest distance between all possible pairs
Average-link: distance between centroids (the averages of the inputs in the clusters at each iteration)
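
A minimal sketch using SciPy's hierarchical-clustering routines to answer the previous slide's question as well (the toy points are hypothetical; method='complete' gives complete-link instead):

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Toy 1-D points embedded in 2-D; single-link merges the closest pair first
X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [5.0, 0.0], [9.0, 0.0]])
Z = linkage(X, method='single')   # rows: (group i, group j, merge distance, size)
dendrogram(Z)                     # draw the merge tree
plt.show()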

Example: single-linked clusters and the corresponding dendrogram. Grid spacing is h = 1. Is the dendrogram consistent with single-linkage clustering of the data? Is the answer different for complete-linkage clustering?