Christoph F. Eick Questions and Topics Review Nov. 22, 2011

Presentation transcript:

Christoph F. Eick Questions and Topics Review Nov. 22, 2011

1. Assume you have to do feature selection for a classification task. What are the characteristics of features (attributes) you might remove from the dataset prior to running the classification algorithm?
2. How is region discovery in spatial datasets different from traditional clustering?
3. What are the unique characteristics of hierarchical clustering?
4. Compute the Silhouette of the following clustering that consists of 2 clusters: {(0,0), (0,1), (2,2)} and {(3,2), (3,3)}. To be discussed in the review on Dec. 1!
   Silhouette: for an individual point i
   - Calculate a = average distance of i to the points in its own cluster
   - Calculate b = min(average distance of i to the points in another cluster)
   - The silhouette coefficient for the point is then given by: s = (b - a) / max(a, b)
5. Compare Decision Trees, Support Vector Machines, and K-NN with respect to the number of decision boundaries each approach uses!
6. K-NN is a lazy approach; what does that mean? What are the disadvantages of K-NN's lazy approach? Do you see any advantages in using K-NN's lazy approach?
7. Why do some support vector machine approaches map examples from a lower-dimensional space to a higher-dimensional space?
8. What is the role of slack variables in the Linear SVM / Non-separable approach (textbook pages ); what do they measure? What properties of hyperplanes are maximized by the objective function f(w) (on page 268) in this approach?
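A minimal sketch for question 4, assuming Euclidean distance and Python/NumPy; the function name and the printed output format are illustrative and not part of the review material. It evaluates a, b, and s = (b - a)/max(a, b) for every point of the two clusters given above.

```python
import numpy as np

# Silhouette sketch for the clustering {(0,0), (0,1), (2,2)} and {(3,2), (3,3)}.
clusters = [
    np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0]]),
    np.array([[3.0, 2.0], [3.0, 3.0]]),
]

def silhouette(point, own, others):
    # a = average distance of the point to the other points in its own cluster
    a = np.mean([np.linalg.norm(point - q) for q in own if not np.array_equal(q, point)])
    # b = minimum over the other clusters of the average distance to that cluster
    b = min(np.mean([np.linalg.norm(point - q) for q in other]) for other in others)
    return (b - a) / max(a, b)

for i, cluster in enumerate(clusters):
    others = [c for j, c in enumerate(clusters) if j != i]
    for p in cluster:
        print(tuple(p), round(silhouette(p, cluster, others), 3))
```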

Christoph F. Eick Support Vector Machines
What if the problem is not linearly separable?
[Figure: a two-class example (squares vs. circles) with the square-class hyperplane and the circle-class hyperplane.]

Tan, Steinbach, Kumar, Eick: NN-Classifiers and Support Vector Machines
Linear SVM for Non-linearly Separable Problems
What if the problem is not linearly separable?
- Introduce slack variables ξ_i ≥ 0; a slack variable allows the constraint for example i to be violated to a certain degree and measures the resulting error.
- Need to minimize: f(w) = ||w||^2 / 2 + C * Σ_{i=1..N} ξ_i, where ||w||^2 / 2 is the inverse of the size of the margin between the hyperplanes, Σ ξ_i measures the error, and C is a parameter weighting the two terms.
- Subject to (i = 1,...,N): y_i (w · x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0.
- C is chosen using a validation set, trying to keep the margins wide while keeping the training error low.
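A minimal sketch of the formulation above, assuming Python/NumPy; the candidate hyperplane (w, b), the toy data, and C = 1 are made-up illustrations, and the code only evaluates the objective for a fixed hyperplane rather than solving the optimization problem.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    # Slack xi_i = max(0, 1 - y_i (w . x_i + b)): zero if example i lies on the
    # correct side of its class margin, otherwise its distance past the margin.
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - margins)
    # f(w) = ||w||^2 / 2 (narrow-margin penalty) + C * sum of slacks (training error)
    return 0.5 * np.dot(w, w) + C * xi.sum(), xi

# Toy example: two "square" points and two "circle" points in 2-D.
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 3.0], [4.0, 3.0]])
y = np.array([-1, -1, +1, +1])
w, b = np.array([0.5, 0.5]), -1.75   # a hand-picked candidate hyperplane
f, xi = soft_margin_objective(w, b, X, y, C=1.0)
print("objective:", f, "slacks:", xi)
```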

Christoph F. Eick Answers Review Nov. 22, 2011

1. Assume you have to do feature selection for a classification task. What are the characteristics of features (attributes) you might remove from the dataset prior to running the classification algorithm?
   Redundant attributes; irrelevant attributes.
2. How is region discovery in spatial datasets different from traditional clustering?
   Clustering is performed in the subspace of the spatial attributes; e.g. clusters are contiguous in the space of the spatial attributes. There is a separation between spatial and non-spatial attributes: the non-spatial attributes are only used by the objective functions / cluster evaluation measures.
3. What are the unique characteristics of hierarchical clustering?
   It computes a dendrogram and multiple clusterings; a dendrogram captures hierarchical relationships between clusters; hierarchical clustering relies on agglomerative/divisive approaches that compute clusters based on a union operator: C = C1 ∪ C2.
4. …
5. Compare Decision Trees, Support Vector Machines, and K-NN with respect to the number of decision boundaries each approach uses!
   DT: many, rectangular (axis-parallel) boundaries for numerical attributes; K-NN: many, convex polygons (Voronoi cells); SVM: one, a single hyperplane.
7. Why do some support vector machine approaches map examples from a lower-dimensional space to a higher-dimensional space?
   To make them linearly separable.
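A minimal sketch for answer 7, assuming Python/NumPy; the 1-D data, the labels, and the feature map phi(x) = (x, x^2) are made-up illustrations of the idea that a problem that is not linearly separable in the original space can become linearly separable in a higher-dimensional space.

```python
import numpy as np

# A 1-D problem with +1 for large |x| and -1 for small |x|: no single threshold
# on x separates the classes in 1-D.
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([+1, +1, -1, -1, -1, +1, +1])

# After mapping each example to phi(x) = (x, x^2), a hyperplane (here x^2 = 4)
# separates the two classes in the higher-dimensional space.
phi = np.column_stack([x, x ** 2])
separable = np.all(phi[y == +1, 1] > 4.0) and np.all(phi[y == -1, 1] < 4.0)
print("separable by the hyperplane x^2 = 4 in the mapped space:", separable)
```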

Christoph F. Eick Answers (2) Review Nov. 22, 2011

6. K-NN is a lazy approach; what does that mean? What are the disadvantages of K-NN's lazy approach? Do you see any advantages in using K-NN's lazy approach?
   Lazy: it postpones "learning a model". Disadvantages: it is slow, as the model is "obtained" when classifying an object and not beforehand; (another disadvantage, not asked for here) there is no explicit model and therefore no way to show one to a domain expert, which makes it difficult to establish trust in the learnt model. Advantage of "being lazy": for quickly changing streaming data, learning a model might be a waste of time because the model changes quickly over time, so a lazy approach might be better.
8. What is the role of slack variables in the Linear SVM / Non-separable approach (textbook pages ); what do they measure? What properties of hyperplanes are maximized by the objective function f(w) (on page 268) in this approach?
   The slack variable ξ_i measures the distance of object i to its class hyperplane if the object is on the wrong side of that hyperplane, and is 0 if the example is on the correct side; ξ_i therefore measures the error associated with the i-th example. Soft-margin SVMs solve a 2-objective optimization problem, trying to minimize the errors associated with the examples (the ξ_i) while keeping the margins as wide as possible (minimizing ||w||); the parameter C determines how much emphasis is put on each of the two objectives.
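A minimal sketch of choosing C on a validation set, as mentioned on the earlier slide, assuming Python with scikit-learn; the synthetic data, the candidate values of C, and the train/validation split are made-up illustrations rather than part of the course material.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, roughly linearly separable data with some label noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0, 1, -1)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_C, best_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X_train, y_train)
    acc = clf.score(X_val, y_val)          # validation accuracy for this C
    if acc > best_acc:
        best_C, best_acc = C, acc
print("chosen C:", best_C, "validation accuracy:", round(best_acc, 3))
```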