Published by Harmony Coxen. Modified over 10 years ago.
Christoph F. Eick
Questions and Topics Review Nov. 30, 2010

1. Give an example of a problem that might benefit from feature creation.
2. How does DENCLUE form clusters? Why does DENCLUE use grid-cells? What are the main differences between DENCLUE and DBSCAN?
3. Compute the Silhouette of the following clustering that consists of 2 clusters: {(0,0), (0,1), (2,2)} {(3,2), (3,3)}.
4. Compare Decision Trees, Support Vector Machines, and K-NN with respect to the number of decision boundaries each approach uses!
5. K-NN is a lazy approach; what does it mean? What are the disadvantages of K-NN’s lazy approach? Do you see any advantages in using K-NN’s lazy approach?
6. Why do some support vector machine approaches map examples from a lower dimensional space to a higher dimensional space?
7. What is the role of slack variables in the Linear/SVM/Non-separable approach (textbook pages 266-270); what do they measure? What properties of hyperplanes are maximized by the objective function f(w) (on page 268) in the approach?

Silhouette: for an individual point i
– Calculate a = average distance of i to the points in its cluster
– Calculate b = min (average distance of i to points in another cluster)
– The silhouette coefficient for the point is then given by: s = (b-a)/max(a,b)
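The silhouette definition above can be applied directly to the clustering from the Silhouette question. The following is a minimal sketch, not a reference solution. It assumes Euclidean distance, assumes that a excludes the point itself from the average, and assumes that the silhouette of the whole clustering is the mean of the per-point coefficients. The helper names (`mean_distance`, `silhouette`) are made up for this illustration.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

# The two clusters from the review question.
clusters = [
    [(0, 0), (0, 1), (2, 2)],
    [(3, 2), (3, 3)],
]

def mean_distance(p, points):
    """Average Euclidean distance from p to every point in `points`."""
    return sum(dist(p, q) for q in points) / len(points)

def silhouette(p, own_index, clusters):
    """Silhouette coefficient s = (b - a) / max(a, b) for a single point p."""
    # Assumption: p itself is left out when averaging over its own cluster.
    # (A singleton cluster would need special handling; none occurs here.)
    own = [q for q in clusters[own_index] if q != p]
    a = mean_distance(p, own)
    # b: smallest average distance from p to the points of any other cluster.
    b = min(
        mean_distance(p, other)
        for j, other in enumerate(clusters)
        if j != own_index
    )
    return (b - a) / max(a, b)

scores = {
    p: silhouette(p, k, clusters)
    for k, cluster in enumerate(clusters)
    for p in cluster
}
# Assumption: the clustering's silhouette is the mean over all points.
overall = sum(scores.values()) / len(scores)

for p, s in scores.items():
    print(p, round(s, 3))
print("clustering silhouette:", round(overall, 3))
```

The point (2,2) receives a negative coefficient because its average distance to the second cluster is smaller than its average distance to the other members of its own cluster, which suggests it sits in the wrong cluster.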
Support Vector Machines
What if the problem is not linearly separable?
Tan, Steinbach, Kumar, Eick: NN-Classifiers and Support Vector Machines
Linear SVM for Non-linearly Separable Problems

What if the problem is not linearly separable?
– Introduce slack variables
– Need to minimize an objective with two terms:
   – the inverse size of the margin between the hyperplanes
   – a term that measures the training error, weighted by a parameter C
– Subject to one constraint per training example (i=1,..,N); the slack variable allows each constraint to be violated to a certain degree
– C is chosen using a validation set, trying to keep the margins wide while keeping the training error low.
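The annotations on this slide match the standard soft-margin formulation of the linear SVM. Assuming the slide follows that formulation, and assuming the slack variables enter the objective linearly (some textbook presentations raise the slack sum to a power k instead), the optimization problem can be written as:

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad
  \frac{\lVert \mathbf{w} \rVert^{2}}{2} \;+\; C \sum_{i=1}^{N} \xi_i
\qquad \text{subject to} \qquad
  y_i \,(\mathbf{w} \cdot \mathbf{x}_i + b) \;\ge\; 1 - \xi_i ,
  \qquad \xi_i \ge 0 ,
  \qquad i = 1, \dots, N
```

Here the margin between the two hyperplanes has width 2/‖w‖, so minimizing ‖w‖²/2 widens the margin. A slack value of 0 means example i lies on or outside the margin on its correct side, a value between 0 and 1 means it lies inside the margin but is still classified correctly, and a value above 1 means it is misclassified.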
Questions and Topics Review Nov. 30, 2010

1. Discussion of Problem 1/2 of Assignment 4
2. Give an example of a problem that might benefit from feature creation.
3. How does DENCLUE form clusters? Why does DENCLUE use grid-cells? What are the main differences between DENCLUE and DBSCAN?
4. Compute the Silhouette of the following clustering that consists of 2 clusters: {(0,0), (0,1), (2,2)} {(3,2), (3,3)}.
5. Compare Decision Trees, Support Vector Machines, and K-NN with respect to the number of decision boundaries each approach uses!
   – DT: many, rectangular for numerical attributes
   – K-NN: many, convex polygons (Voronoi cells)
   – SVM: one, hyperplane
6. K-NN is a lazy approach; what does it mean? What are the disadvantages of K-NN’s lazy approach? Do you see any advantages in using K-NN’s lazy approach?
   … advantages: for quickly changing streaming data, learning the model might be a waste of time, and a lazy approach might be better …
7. Why do some support vector machine approaches map examples from a lower dimensional space to a higher dimensional space?
   To make them linearly separable.
8. What is the role of slack variables in the Linear/SVM/Non-separable approach (textbook pages 266-270); what do they measure? What properties of hyperplanes are maximized by the objective function f(w) (on page 268) in the approach?
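The answer "to make them linearly separable" for the question on mapping examples to a higher dimensional space can be illustrated with a toy example. The sketch below is not taken from the slides: the data set, the mapping x → (x, x²), and the hand-picked hyperplane are all assumptions chosen for illustration. It shows one-dimensional data that no single threshold can separate, and the same data becoming separable by a hyperplane after the mapping.

```python
# Toy 1-D data (hypothetical): the label is +1 when |x| >= 2, otherwise -1.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [+1, +1, -1, -1, +1, +1]

def separable_by_threshold(xs, ys):
    """True if a single threshold on the line splits the two classes."""
    pos = [x for x, y in zip(xs, ys) if y == +1]
    neg = [x for x, y in zip(xs, ys) if y == -1]
    return max(pos) < min(neg) or max(neg) < min(pos)

def feature_map(x):
    """Map a 1-D example into 2-D: x -> (x, x^2)."""
    return (x, x * x)

def linear_classifier(z, w=(0.0, 1.0), b=-2.5):
    """Sign of w . z + b; w and b are hand-picked here, not learned."""
    score = w[0] * z[0] + w[1] * z[1] + b
    return +1 if score > 0 else -1

print("separable in 1-D:", separable_by_threshold(xs, ys))

predictions = [linear_classifier(feature_map(x)) for x in xs]
print("separable after mapping:", predictions == ys)
```

In the mapped space the hyperplane x² = 2.5 separates the classes, whereas in the original space the positive examples lie on both sides of the negative ones. Kernel-based SVMs rely on the same idea, but typically avoid computing the mapping explicitly.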