Review for Test #3: Radial Basis Functions, SVM, SOM
How many prototype vectors will be generated in the SOM application illustrated?
Are the bars that illustrate convergence of this SOM elastic nets, semantic maps, or U-matrices (UMATs)?
Convert the last bar in this illustration to a gray-scale U-matrix (UMAT).
Is this elastic net covering the input space or the lattice of output nodes? What are the dimensions of the output-node array in the SOM that produced this elastic net?
Use the stars to draw a boundary around the cluster that contains "horse" and "cow".
3 local minima of the U-matrix are shown. Draw the stars of the 3 clusters that contain these minima.
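The SOM questions above refer to figures not reproduced here, but the mechanics can be sketched. A minimal 1-D SOM in Python (lattice size, learning rate, and data below are illustrative assumptions, not taken from the slides); note that the number of prototype vectors equals the number of output nodes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a 1-D SOM lattice of 10 output nodes over 2-D inputs.
n_nodes, dim = 10, 2
W = rng.random((n_nodes, dim))          # one prototype vector per output node
X = rng.random((200, dim))              # toy input data

eta, sigma = 0.5, 2.0                   # learning rate, neighborhood width
for epoch in range(20):
    for x in X:
        winner = np.argmin(np.linalg.norm(W - x, axis=1))   # best-matching unit
        lattice_dist = np.abs(np.arange(n_nodes) - winner)  # distance on the lattice
        h = np.exp(-lattice_dist**2 / (2 * sigma**2))       # neighborhood function
        W += eta * h[:, None] * (x - W)                     # pull neighbors toward x
    eta *= 0.9
    sigma *= 0.9                                            # anneal both over time

print(W.shape)  # (10, 2): 10 prototype vectors, one per output node
```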
What is wrong with these equations as a start to the development of an SVM for soft-margin hyperplanes?
Given: a discriminant for the classes on the margins of the hyperplane. Find 2 ways to show that the stated margin relation holds.
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)
Data point xt with rt = 1 is misclassified. What are the bounds on the hinge loss if xt is inside the margin? What are the bounds on the hinge loss if xt is outside the margin? Would the answers to these questions be different if rt = -1?
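A quick numeric check of the hinge-loss bounds (the loss definition L = max(0, 1 - r·g(x)) is standard; the sample values of g(x) are hypothetical):

```python
# Hinge loss for a labeled point: L(r, g) = max(0, 1 - r*g(x)).
def hinge(r, g):
    return max(0.0, 1.0 - r * g)

# rt = +1, misclassified means g(x) < 0:
print(hinge(+1, -0.5))   # 1.5 -> inside the margin band (-1 < g < 0): loss in (1, 2)
print(hinge(+1, -3.0))   # 4.0 -> outside the margin (g < -1): loss > 2
# rt = -1 is symmetric: the same bounds hold with the sign of g(x) flipped.
print(hinge(-1, +0.5))   # 1.5
```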
One-Class SVM. Consider a sphere with center a and radius R. Find a and R that define a soft boundary on high-density data: (a) data not involved in finding the sphere; (b) data on the sphere (ξt = 0), used to find R given a; (c) data outside the sphere (ξt > 0). Objective: distinguish (a) & (b) from (c). What are the primal variables?
Add Lagrange multipliers αt ≥ 0 and γt ≥ 0 for the constraints. Set the derivatives of Lp with respect to the primal variables R, a, and ξt to zero; this gives 0 ≤ αt ≤ C. Substituting back into Lp we get the dual to be maximized. What happened to the R² term in Lp?
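As a check on the question above, the standard one-class-SVM elimination can be reconstructed (notation assumed to match Alpaydın's; this is not copied from the slide):

```latex
L_p = R^2 + C\sum_t \xi^t
      - \sum_t \alpha^t\bigl(R^2 + \xi^t - \|x^t - a\|^2\bigr)
      - \sum_t \gamma^t \xi^t

\frac{\partial L_p}{\partial R} = 2R\Bigl(1 - \sum_t \alpha^t\Bigr) = 0
  \;\Rightarrow\; \sum_t \alpha^t = 1

\frac{\partial L_p}{\partial a} = 0 \;\Rightarrow\; a = \sum_t \alpha^t x^t,
\qquad
\frac{\partial L_p}{\partial \xi^t} = C - \alpha^t - \gamma^t = 0
  \;\Rightarrow\; 0 \le \alpha^t \le C

L_d = \sum_t \alpha^t (x^t)^T x^t
      - \sum_s \sum_t \alpha^s \alpha^t (x^s)^T x^t
```

Because the stationarity condition forces Σt αt = 1, the R² terms collect into R²(1 − Σt αt) = 0, which is why R² disappears from the dual.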
One-Class SVM, no slack variables. ||xt - a||² = (xt - a)T(xt - a). What are the primal variables? What are the derivatives of Lp with respect to the primal variables? What relationships result from setting these derivatives to zero?
Optimal soft-margin hyperplane with slack variables. What are the primal variables? What are the dual variables? What is the meaning of any other variables?
One-Class SVM. What are the primal variables? What are the dual variables? What is the meaning of any other variables?
ν-SVM: another approach to soft margins. ν is a regularization parameter shown to be an upper bound on the fraction of instances in the margin. ρ is a primal variable related to the optimum margin width = 2ρ/||w||. Additional primal variables are w, w0, and the slack variables. Maximize the dual.
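For reference, the standard ν-SVM primal (due to Schölkopf et al.; the notation here is assumed to match the slide's):

```latex
\min_{w,\, w_0,\, \xi,\, \rho}\;
  \tfrac{1}{2}\|w\|^2 - \nu\rho + \frac{1}{N}\sum_t \xi^t
\quad \text{s.t.} \quad
  r^t\bigl(w^T x^t + w_0\bigr) \ge \rho - \xi^t,
  \qquad \xi^t \ge 0, \qquad \rho \ge 0
```

Maximizing the margin width 2ρ/||w|| trades off against the slack penalty, with ν controlling the balance.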
The Kernel Trick: transform inputs xt by basis functions zt = φ(xt). Then g(zt) = wTzt is a linear discriminant, and g(xt) = wTφ(xt) is non-linear. Explicit transformation is unnecessary. Where did this equation come from? The kernel is a function defined on the input space.
In 2-D attribute space, the quadratic kernel K(xt, x) = [(xt)Tx + 1]² becomes a dot product in feature space. Kernel machines are based on the transformation to a feature space defined by K(xt, x) = φ(xt)Tφ(x), where zt = φ(xt). How are the features related to the attributes in this case?
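One way to answer the feature/attribute question is to verify numerically that the quadratic kernel equals a dot product of explicit features. The feature map below is the standard expansion for 2-D inputs; the test points are arbitrary:

```python
import numpy as np

# For 2-D inputs, (x.y + 1)^2 expands term-by-term into phi(x).phi(y)
# with the six features below (constant, linears, cross term, squares).
def phi(x):
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     np.sqrt(2) * x1 * x2,
                     x1**2, x2**2])

x = np.array([0.3, -1.2])
y = np.array([2.0, 0.7])
K = (x @ y + 1.0)**2
print(np.isclose(K, phi(x) @ phi(y)))  # True
```

So the features are the original attributes (scaled by √2), their pairwise product, their squares, and a constant.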
How is the input related to the hidden layer in a Radial Basis Function (RBF) network? How is the output related to the hidden layer?
Clustering the data set with N = 5 by K-means with K = 2 produced the Gaussians φj(x) = exp(-½(||x - mj||/sj)²), j = 1, 2. Set up the linear system of equations that determines the weights connecting the hidden layer to the output.
Solve the normal equations DTDw = DTr for the vector w. What are the dimensions of this linear system of equations?
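A sketch of the normal-equations setup for the N = 5, K = 2 case (the centers, spreads, and targets below are made-up placeholders; in the slides they would come from K-means and the training data). D is N × (K+1) with a bias column, so DTD is 3 × 3:

```python
import numpy as np

rng = np.random.default_rng(1)

X = rng.random(5)                 # N = 5 scalar inputs (toy data)
m = np.array([0.25, 0.75])        # assumed Gaussian centers m_j
s = np.array([0.2, 0.2])          # assumed Gaussian spreads s_j
r = rng.random(5)                 # desired outputs

# Design matrix D: one column per Gaussian basis function plus a bias column.
G = np.exp(-0.5 * ((X[:, None] - m[None, :]) / s[None, :])**2)  # shape (5, 2)
D = np.hstack([G, np.ones((5, 1))])                             # shape (5, 3)

# Normal equations D^T D w = D^T r: a 3x3 system for the 3 output weights.
w = np.linalg.solve(D.T @ D, D.T @ r)
print(w.shape)  # (3,)
```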
K-means has converged. The cluster means mi can be used as the centers of the Gaussian basis functions. How do we get the si?
The input data has dimension d > 2, and I believe the data forms clusters. How can I investigate the number of clusters in the data?
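One common way to investigate the number of clusters is the elbow method: run K-means for several values of K and look for where the within-cluster sum of squares stops dropping sharply. A self-contained sketch (the synthetic data and the naive K-means below are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic d = 5 data drawn around three hypothetical cluster centers.
d = 5
X = np.vstack([rng.normal(c, 0.3, size=(50, d)) for c in (0.0, 3.0, 6.0)])

def kmeans_inertia(X, k, iters=50):
    """Naive K-means; returns the within-cluster sum of squares (inertia)."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None])**2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return ((X - centers[labels])**2).sum()

# Inertia vs. K: the "elbow" should appear near the true cluster count (3 here).
inertias = [kmeans_inertia(X, k) for k in range(1, 6)]
print([round(v, 1) for v in inertias])
```

A dendrogram from hierarchical clustering (next slide) is another standard way to probe cluster structure when d > 2 rules out direct plotting.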
Agglomerative Clustering: start with N groups, each with one instance, and merge the two closest groups at each iteration. Options for the distance between groups Gi and Gj: Single-link: smallest distance between all possible pairs. Complete-link: largest distance between all possible pairs. Average-link: distance between centroids (the average of the inputs in each cluster, recomputed at each iteration).
Example: single-link clusters. Dendrogram with grid spacing h = 1. Is the dendrogram consistent with single-linkage clustering of the data? Is the answer different for complete-linkage clustering?
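Single- and complete-linkage dendrograms can be compared directly with SciPy's `linkage`; the toy 1-D points below (spacing loosely modeled on a grid with h = 1) are an assumption, not the slide's data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Five 1-D points: a tight run {0, 1, 2} and a distant pair {10, 11}.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])

# Single-link merges groups by the smallest pairwise distance between them,
# complete-link by the largest, so their merge heights can differ.
single = linkage(X, method='single')
complete = linkage(X, method='complete')

# Each linkage row is (cluster i, cluster j, merge distance, new size).
print(single[:, 2])    # merge heights under single linkage
print(complete[:, 2])  # merge heights under complete linkage
```

For these points, single linkage joins {0,1,2} entirely at height 1 and makes the final merge at 8 (= 10 − 2), while complete linkage closes {0,1,2} at height 2 and makes the final merge at 11 (= 11 − 0): the two dendrograms are genuinely different.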