SVM Lab. Material borrowed from a tutorial by David Meyer, FH Technikum Wien, Austria. See: http://cran.r-project.org/web/packages/e1071/vignettes/svmdoc.pdf



Packages
# Start by loading the relevant libraries:
#   e1071
#   mlbench
# If mlbench isn't available, you will have to install it first.
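The slide does not show the actual commands; a minimal sketch of the install-and-load step (assuming a CRAN mirror is configured) is:

# Install the packages if they are not already available, then load them
if (!requireNamespace("e1071", quietly = TRUE)) install.packages("e1071")
if (!requireNamespace("mlbench", quietly = TRUE)) install.packages("mlbench")
library(e1071)    # provides svm() and tune.svm()
library(mlbench)  # provides the Glass data set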

Glass Dataset
# Retrieve/access the "Glass" data from the mlbench package
data(Glass, package = "mlbench")
# The description of the Glass data set is on the following slide.
# Number of attributes: 10 (including an Id#) plus the class
# attribute -- all attributes are continuously valued
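A quick sanity check, not on the original slide, is to look at the structure and the class balance right after loading:

str(Glass)         # 214 observations of 10 variables: 9 numeric measurements plus the factor Type
table(Glass$Type)  # how many samples fall into each glass type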

Attribute Information:
 1. Id number: 1 to 214
 2. RI: refractive index
 3. Na: Sodium (unit of measurement: weight percent in the corresponding oxide, as for attributes 4-10)
 4. Mg: Magnesium
 5. Al: Aluminum
 6. Si: Silicon
 7. K: Potassium
 8. Ca: Calcium
 9. Ba: Barium
10. Fe: Iron
(Note: the mlbench copy of Glass drops the Id column, so the data frame holds the nine measurements plus the class attribute Type in column 10, which is why the code below refers to column 10.)

Class Information: Type of glass (the class attribute)
1 building_windows_float_processed
2 building_windows_non_float_processed
3 vehicle_windows_float_processed
4 vehicle_windows_non_float_processed (none in this database)
5 containers
6 tableware
7 headlamps

Create Training and Test Sets
# Create a row index
index <- 1:nrow(Glass)
# Create an index of test samples by randomly selecting 1/3 of the samples
testindex <- sample(index, trunc(length(index)/3))
# Create the test set
testset <- Glass[testindex, ]
# Create the training set
trainset <- Glass[-testindex, ]
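Because sample() draws at random, the split (and therefore every accuracy reported later) changes from run to run. If you want a reproducible split, you can seed the generator first; the seed value below is arbitrary and not part of the original lab:

set.seed(123)  # any fixed value gives a repeatable train/test split
testindex <- sample(index, trunc(length(index)/3))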

Train the SVM Model
# Train the svm model using:
#   "Type" (column 10) as the dependent variable,
#   cost = 100 as the penalty cost for C-classification
#     (this is the 'C' constant of the regularization term
#      in the Lagrange formulation),
#   gamma = 1 as the radial basis kernel function-specific parameter
svm.model <- svm(Type ~ ., data = trainset, cost = 100, gamma = 1)
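Before predicting it can help to inspect what was fitted; calling summary() on the model object (not shown on the slide) reports the kernel, the parameter values, and the number of support vectors per class:

summary(svm.model)  # kernel type, cost, gamma, and support-vector counts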

Apply SVM Model
# Use the SVM to predict the classification for the test set
svm.pred <- predict(svm.model, testset[, -10])
# Compute the SVM confusion matrix
table(pred = svm.pred, true = testset[, 10])
# Determine the accuracy
t <- table(pred = svm.pred, true = testset[, 10])
sum(diag(t)) / sum(t)
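Overall accuracy can hide weak performance on the rare glass types (types 3, 5, and 6 have few samples). A quick way to see this, not part of the original slide, is to read per-class recall off the same confusion matrix:

diag(t) / colSums(t)  # for each true class, the fraction that was predicted correctly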

Optimize Parameters
# Approach: grid search with 10-fold cross-validation
# Note: a random mixing precedes the partitioning of the data
# Optimize the parameters of the svm with the RBF kernel
# The grid search iterates over gamma = 2^-4 through 2
# and cost = 2 through 2^7
# The returned object reports the best gamma & cost
# and the corresponding classification error
obj <- tune.svm(Type ~ ., data = Glass, gamma = 2^(-4:1), cost = 2^(1:7))

Optimize Parameters
# Inspect the results
# Note: the results will vary unless you set the seed for the
# random number generator, which is used to mix the data
# before the partitioning
> obj

Parameter tuning of 'svm':
- sampling method: 10-fold cross validation
- best parameters:
    gamma  cost
   0.0625   128
- best performance: 0.2898268

# Note: the performance is reported as the cross-validation error.
# The accuracy is 1 - error, in this case 1 - 0.2898268 = 0.7101732.
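The tuned object also stores the winning settings, so a natural next step (a sketch; the slides stop at inspecting obj) is to refit on the training set with the selected gamma and cost and score the held-out test set. Because tune.svm above was run on the full Glass data set, the test rows took part in tuning, so this estimate is slightly optimistic; a stricter protocol would tune on trainset only.

# Refit using the parameters chosen by the grid search
best.gamma <- obj$best.parameters$gamma
best.cost  <- obj$best.parameters$cost
svm.tuned  <- svm(Type ~ ., data = trainset, cost = best.cost, gamma = best.gamma)
# Evaluate on the held-out test set
tuned.pred <- predict(svm.tuned, testset[, -10])
t2 <- table(pred = tuned.pred, true = testset[, 10])
sum(diag(t2)) / sum(t2)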

Investigate Data From Midterm Exam
# Recall the Mystery data set
testSet <- Mystery[1:436, ]
trainSet <- Mystery[467:1389, ]
# The accuracies so far were:
#   Naïve Bayes:       26.7%
#   1st decision tree: 38.4%
#   2nd decision tree: 67%

Investigate Data From Midterm Exam
# Learn the model using trainSet
svm.model <- svm(class ~ Feature1 + Feature2 + Feature3 + Feature4 + Feature5 +
                   Feature6 + Feature7 + Feature8 + Feature9,
                 data = trainSet, cost = 2, gamma = 0.25)
# Classify the data in testSet
svm.pred <- predict(svm.model, testSet[, -1])
# Create the confusion matrix
t <- table(pred = svm.pred, true = testSet[, 1])
# Calculate the accuracy
sum(diag(t)) / sum(t)
# What do you think?

Try training with the entire dataset (this is for educational purposes only -- DO NOT DO THIS IN PRACTICE)
# Learn the model using the full Mystery data set (training and test rows together)
svm.model <- svm(class ~ Feature1 + Feature2 + Feature3 + Feature4 + Feature5 +
                   Feature6 + Feature7 + Feature8 + Feature9,
                 data = Mystery, cost = 2, gamma = 0.25)
# Classify the data in testSet (which the model has already seen during training)
svm.pred <- predict(svm.model, testSet[, -1])
# Create the confusion matrix
t <- table(pred = svm.pred, true = testSet[, 1])
# Calculate the accuracy
sum(diag(t)) / sum(t)
# What do you think?

Another online resource is: http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Classification/SVM