1
Review for Test #3: Radial Basis Functions, SVM, SOM
Based on Lecture Notes for E. Alpaydın, Introduction to Machine Learning 2e, © The MIT Press (2010, V1.0)
2
How many prototype vectors will be generated in the SOM application illustrated?
3
Are the bars that illustrate convergence of this SOM elastic nets, semantic maps, or U-matrices (UMATs)?
4
Convert the last bar in this illustration to a gray-scale U-matrix (UMAT).
5
Is this elastic net covering input space or the lattice of output nodes? What are the dimensions of the output-node array in the SOM that produced this elastic net?
6
Use the stars to draw a boundary on the cluster that contains horse and cow.
7
Three local minima of the U-matrix are shown. Draw the stars of the three clusters that contain these minima.
8
What is wrong with these equations as a start to the development of an SVM for soft-margin hyperplanes?
9
Given that g(x) = wᵀx + w₀ is a discriminant for the classes on the margins of the hyperplane, find two ways to show that …
10
Data point xᵗ with rᵗ = 1 is misclassified.
What are the bounds on hinge loss if xᵗ is in the margins? What are the bounds on hinge loss if xᵗ is outside the margins? Would the answer to these questions be different if rᵗ = −1?
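For reference, a sketch of the standard hinge-loss definition (from Alpaydın's text, not shown on the slide; yᵗ = g(xᵗ) is the discriminant output):

    L_hinge(yᵗ, rᵗ) = 0            if rᵗyᵗ ≥ 1
    L_hinge(yᵗ, rᵗ) = 1 − rᵗyᵗ     otherwise

The bounds follow from where rᵗyᵗ falls relative to 0 and 1 in each case.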
11
One-Class SVMs
Consider a sphere with center a and radius R. Find a and R that define a soft boundary on high-density data:
(a) data not involved in finding the sphere
(b) data on the sphere (ξᵗ = 0), used to find R given a
(c) data outside the sphere (ξᵗ > 0)
Objective: distinguish (a) and (b) from (c). What are the primal variables?
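For reference, a sketch of the corresponding primal (the standard soft-boundary formulation consistent with the cases above):

    min over R, a, ξᵗ:  R² + C Σₜ ξᵗ
    subject to:         ‖xᵗ − a‖² ≤ R² + ξᵗ  and  ξᵗ ≥ 0  for all t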
12
Add Lagrange multipliers αᵗ ≥ 0 and γᵗ ≥ 0 for the constraints.
Set the derivatives with respect to the primal variables R, a, and ξᵗ to 0, which gives 0 ≤ αᵗ ≤ C. Substituting back into Lₚ, we get the dual to be maximized. What happened to the R² term in Lₚ?
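A sketch of the algebra, assuming the primal above:

    Lₚ = R² + C Σₜ ξᵗ − Σₜ αᵗ(R² + ξᵗ − ‖xᵗ − a‖²) − Σₜ γᵗξᵗ

    ∂Lₚ/∂R = 0   ⇒  Σₜ αᵗ = 1
    ∂Lₚ/∂a = 0   ⇒  a = Σₜ αᵗxᵗ
    ∂Lₚ/∂ξᵗ = 0  ⇒  αᵗ = C − γᵗ, hence 0 ≤ αᵗ ≤ C

Since Σₜ αᵗ = 1, the term R²(1 − Σₜ αᵗ) vanishes, so R² drops out of the dual:

    L_d = Σₜ αᵗ(xᵗ)ᵀxᵗ − Σₛ Σₜ αˢαᵗ(xˢ)ᵀxᵗ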
13
One-Class SVMs: no slack variables
‖xᵗ − a‖² = (xᵗ − a)ᵀ(xᵗ − a). What are the primal variables? What are the derivatives of Lₚ with respect to the primal variables? What relationships result from setting these derivatives to zero?
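A sketch of the no-slack primal in the same notation, so the questions parallel the previous slide:

    min over R, a:  R²   subject to  ‖xᵗ − a‖² ≤ R²  for all t
    Lₚ = R² − Σₜ αᵗ(R² − ‖xᵗ − a‖²),   αᵗ ≥ 0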
14
Optimal soft margin hyperplane with slack variables
What are the primal variables? What are the dual variables? What is the meaning of any other variables?
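For reference, a sketch of the standard soft-margin primal (as in Alpaydın, Ch. 13):

    min over w, w₀, ξᵗ:  ½‖w‖² + C Σₜ ξᵗ
    subject to:          rᵗ(wᵀxᵗ + w₀) ≥ 1 − ξᵗ  and  ξᵗ ≥ 0  for all t

with Lagrange multipliers αᵗ and μᵗ attached to the two constraint sets in the dual.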
15
One-Class SVMs
What are the primal variables? What are the dual variables? What is the meaning of any other variables?
16
ν-SVM: another approach to soft margins
ν is a regularization parameter, shown to be an upper bound on the fraction of instances in the margin. ρ is a primal variable related to the optimum margin width = 2ρ/‖w‖. Additional primal variables are w, w₀, and the slack variables ξᵗ. Maximize the dual.
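For reference, a sketch of the ν-SVM primal (the standard formulation matching the description above):

    min over w, w₀, ρ, ξᵗ:  ½‖w‖² − νρ + (1/N) Σₜ ξᵗ
    subject to:             rᵗ(wᵀxᵗ + w₀) ≥ ρ − ξᵗ,  ξᵗ ≥ 0,  ρ ≥ 0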
17
The Kernel Trick: transform inputs xᵗ by basis functions
zᵗ = φ(xᵗ). g(z) = wᵀz is a linear discriminant in the feature space; g(x) = wᵀφ(x) is non-linear in the input space. The explicit transformation is unnecessary. Where did this equation come from? The kernel is a function defined on the input space.
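A sketch of where the kernelized discriminant comes from (the standard dual stationarity condition for w):

    w = Σₜ αᵗrᵗφ(xᵗ)
    g(x) = wᵀφ(x) = Σₜ αᵗrᵗ φ(xᵗ)ᵀφ(x) = Σₜ αᵗrᵗ K(xᵗ, x)

so only K(xᵗ, x) = φ(xᵗ)ᵀφ(x) is ever evaluated; φ itself is never computed explicitly.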
18
In 2D attribute space, the quadratic kernel
K = [(xᵗ)ᵀx + 1]² becomes
1 + 2x₁ᵗx₁ + 2x₂ᵗx₂ + 2x₁ᵗx₂ᵗx₁x₂ + (x₁ᵗ)²x₁² + (x₂ᵗ)²x₂²
Kernel machines are based on transformation to a feature space defined by K(xᵗ, x) = φ(xᵗ)ᵀφ(x), where zᵗ = φ(xᵗ). How are the features related to the attributes in this case?
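A quick numerical check of this expansion (a sketch; phi below is the feature map implied by the expansion, not code from the slides):

    import numpy as np

    def phi(x):
        # Feature map implied by the quadratic kernel in 2D:
        # [1, sqrt(2)*x1, sqrt(2)*x2, sqrt(2)*x1*x2, x1**2, x2**2]
        x1, x2 = x
        s = np.sqrt(2.0)
        return np.array([1.0, s * x1, s * x2, s * x1 * x2, x1**2, x2**2])

    xt = np.array([0.5, -1.0])
    x = np.array([2.0, 3.0])
    K_direct = (xt @ x + 1.0) ** 2     # kernel evaluated in input space
    K_feature = phi(xt) @ phi(x)       # dot product in feature space
    print(np.isclose(K_direct, K_feature))   # True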
19
How is the input related to the hidden layer in a Radial Basis Function (RBF) network? How is the output related to the hidden layer?
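A minimal sketch of the two relationships (names here are illustrative; centers mⱼ and widths sⱼ as in the following slides):

    import numpy as np

    def rbf_forward(x, centers, widths, w, w0):
        # Input -> hidden: Gaussian activations based on distance to each center
        p = np.exp(-0.5 * (np.linalg.norm(x - centers, axis=1) / widths) ** 2)
        # Hidden -> output: a linear combination of the activations plus a bias
        return w @ p + w0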
20
Clustering the data set with N = 5 by K-means with K = 2 produced the Gaussians
φⱼ(x) = exp(−½(‖x − mⱼ‖/sⱼ)²), j = 1, 2. Set up the linear system of equations that determines the weights connecting the hidden layer to the output.
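For concreteness, a sketch of the system this sets up (a bias weight w₀ is an assumption here; drop the column of ones if the output has no bias):

    D is the 5 × 3 design matrix with rows [1, φ₁(xᵗ), φ₂(xᵗ)],  t = 1, …, 5
    DᵀDw = Dᵀr,  where w = (w₀, w₁, w₂)ᵀ and r holds the five targets rᵗ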
21
Solve the normal equations DᵀDw = Dᵀr for the vector w
What are the dimensions of this linear system of equations?
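A minimal numpy sketch (the shapes are an assumption matching the previous slide's setup):

    import numpy as np

    D = np.random.rand(5, 3)                 # N = 5 rows, K + 1 = 3 columns
    r = np.random.rand(5)                    # one target per instance
    w = np.linalg.solve(D.T @ D, D.T @ r)    # solves the 3 x 3 normal equations
    print(w.shape)                           # (3,)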
22
K-means has converged. The converged K-means centers can be used as the mᵢ in the Gaussian basis functions. How do we get the sᵢ?
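One common heuristic (an assumption, not given on the slide) is to set each width from the spread of the cluster's own members:

    import numpy as np

    def cluster_width(members, center):
        # RMS distance of a cluster's members to its K-means center
        return np.sqrt(np.mean(np.sum((members - center) ** 2, axis=1)))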
23
Input data has dimension d > 2. I believe the data forms clusters. How can I investigate the number of clusters in the data?
24
Agglomerative Clustering
Start with N groups, each containing one instance, and merge the two closest groups at each iteration. Options for the distance between groups Gᵢ and Gⱼ (see the scipy sketch after this list):
Single-link: smallest distance between all possible pairs
Complete-link: largest distance between all possible pairs
Average-link, distance between centroids (the average of the inputs in each cluster at each iteration)
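A minimal scipy sketch (X is a hypothetical N × d data array):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    X = np.random.rand(20, 4)            # N = 20 instances, d = 4 > 2
    Z = linkage(X, method='single')      # 'complete' and 'average' also available
    dendrogram(Z)                        # merge heights suggest the number of clusters
    plt.show()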
25
Example: single-linked clusters
Dendrogram; grid spacing is h = 1. Is the dendrogram consistent with single-linkage clustering of the data? Is the answer different for complete-linkage clustering?