Labs: Trees, Dimension Reduction, Multi-dimensional Scaling, SVM

Presentation transcript:

Labs: Trees, Dimension Reduction, Multi-dimensional Scaling, SVM
Peter Fox
Data Analytics – ITWS-4600/ITWS-6600/MATP-4450/CSCI-4960
Group 3 Lab 1, October 26, 2018

Weighted kNN
group3/lab1_kknn1.R
Make sure you look carefully at the results.
Apply it to other datasets!
We will discuss and interpret in class.
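A minimal weighted-kNN sketch in the spirit of lab1_kknn1.R, using the kknn package on the built-in iris data (the lab script itself may use a different dataset and settings):

library(kknn)                       # weighted k-nearest neighbours
set.seed(1)
idx <- sample(nrow(iris), 100)      # simple train/test split
train <- iris[idx, ]
test  <- iris[-idx, ]
fit <- kknn(Species ~ ., train, test, k = 7, kernel = "triangular")
table(predicted = fitted(fit), true = test$Species)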

Rpart – recursive partitioning and Conditional Inference
group3/lab1_rpart1.R
group3/lab1_rpart2.R
group3/lab1_rpart3.R
group3/lab1_rpart4.R
Try rpart for “Rings” on the Abalone dataset
group3/lab1_ctree1.R
group3/lab1_ctree2.R
group3/lab1_ctree3.R
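A hedged sketch of the “Rings” exercise, assuming the Abalone data has already been read into a data frame called abalone with a numeric Rings column (the UCI file has no header, so the column names are whatever you assign):

library(rpart)
# abalone assumed loaded, e.g. via read.csv on the UCI abalone.data file
rings.rp <- rpart(Rings ~ ., data = abalone, method = "anova")  # regression tree
printcp(rings.rp)                 # complexity-parameter table
plot(rings.rp); text(rings.rp)    # quick look at the fitted tree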

randomForest
group3/lab1_randomforest1.R
Do your own Random Forest, i.e. different implementations such as cforest {party}, on the other datasets.
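For orientation, a minimal randomForest call on a built-in dataset; cforest {party} takes a similar formula interface (the lab script differs in detail):

library(randomForest)
set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 500, importance = TRUE)
print(rf)          # shows the out-of-bag error and confusion matrix
varImpPlot(rf)     # variable importance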

Trees for the Titanic
data(Titanic)
rpart, ctree, hclust, randomForest for: Survived ~ .
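Note that data(Titanic) gives a 4-way contingency table, not case-level data. One hedged way to prepare it for the tree functions is to expand the counts into one row per passenger, roughly:

library(rpart)
data(Titanic)
titanic.df <- as.data.frame(Titanic)     # columns: Class, Sex, Age, Survived, Freq
# replicate each combination Freq times to get one row per passenger
titanic.cases <- titanic.df[rep(seq_len(nrow(titanic.df)), titanic.df$Freq), 1:4]
surv.rp <- rpart(Survived ~ ., data = titanic.cases, method = "class")
plot(surv.rp); text(surv.rp)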

Run through these demos

library(EDR)      # effective dimension reduction
library(dr)
library(clustrd)
install.packages("edrGraphicalTools")
library(edrGraphicalTools)
demo(edr_ex1)
demo(edr_ex2)
demo(edr_ex3)
demo(edr_ex4)

Some examples – group3/
lab1_dr1.R
lab1_dr2.R
lab1_dr3.R
lab1_dr4.R

MDS – group3/
lab1_mds1.R
lab1_mds2.R
lab1_mds3.R
http://www.statmethods.net/advstats/mds.html
http://gastonsanchez.com/blog/how-to/2013/01/23/MDS-in-R.html
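Before the scripts, a minimal classical MDS sketch on the built-in eurodist road distances (the linked tutorials cover non-metric MDS and more):

data(eurodist)                       # road distances between European cities
mds.fit <- cmdscale(eurodist, k = 2) # classical (metric) MDS into 2 dimensions
plot(mds.fit[, 1], -mds.fit[, 2], type = "n", xlab = "", ylab = "", asp = 1)
text(mds.fit[, 1], -mds.fit[, 2], labels = labels(eurodist), cex = 0.7)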

R – many ways (of course)

# assumes dist.au (a distance matrix between Australian cities) and city.names
# have already been created, e.g. following the MDS tutorial linked above
library(igraph)
g <- graph.full(nrow(dist.au))                       # complete graph, one vertex per city
V(g)$label <- city.names
layout <- layout.mds(g, dist = as.matrix(dist.au))   # MDS-derived vertex coordinates
plot(g, layout = layout, vertex.size = 3)

Work through these… lecture next week
lab1_svm1.R -> lab1_svm11.R
lab1_svm_rpart1.R
Exercise various parts of SVM: parameters, kernels, etc.
Karatzoglou et al. 2006 - http://aquarius.tw.rpi.edu/html/DA/v15i09.pdf
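One convenient way to exercise the cost/gamma parameters is e1071's tune.svm, sketched here on iris with an illustrative grid (the lab scripts use their own datasets and ranges):

library(e1071)
set.seed(1)
tuned <- tune.svm(Species ~ ., data = iris,
                  gamma = 10^(-2:1), cost = 10^(0:3))  # small illustrative grid
summary(tuned)           # cross-validated error for each parameter combination
tuned$best.parameters    # chosen gamma and cost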

Ozone

library(e1071)
library(rpart)
data(Ozone, package="mlbench")
# http://math.furman.edu/~dcs/courses/math47/R/library/mlbench/html/Ozone.html
# for field codes
## split data into a train and test set
index <- 1:nrow(Ozone)
testindex <- sample(index, trunc(length(index)/3))
testset <- na.omit(Ozone[testindex, -3])
trainset <- na.omit(Ozone[-testindex, -3])
# V4 (the daily maximum ozone reading) is continuous, so svm() fits eps-regression here
svm.model <- svm(V4 ~ ., data = trainset, cost = 1000, gamma = 0.0001)
svm.pred <- predict(svm.model, testset[, -3])
crossprod(svm.pred - testset[, 3]) / length(testindex)   # mean squared error
See: http://cran.r-project.org/web/packages/e1071/vignettes/svmdoc.pdf

Glass

library(e1071)
library(rpart)
data(Glass, package="mlbench")
index <- 1:nrow(Glass)
testindex <- sample(index, trunc(length(index)/3))
testset <- Glass[testindex, ]
trainset <- Glass[-testindex, ]
# cost = the "C" constant (upper bound on the Lagrange multipliers)
svm.model <- svm(Type ~ ., data = trainset, cost = 100, gamma = 1)
svm.pred <- predict(svm.model, testset[, -10])

> table(pred = svm.pred, true = testset[,10])
    true
pred  1  2  3  5  6  7
   1 12  9  1  0  0  0
   2  6 19  6  5  2  2
   3  1  0  2  0  0  0
   5  0  0  0  0  0  0
   6  0  0  0  0  1  0
   7  0  1  0  0  0  4

Example lab1_svm1.R

n <- 150             # number of data points
p <- 2               # dimension
sigma <- 1           # standard deviation of the distributions
meanpos <- 0         # centre of the distribution of positive examples
meanneg <- 3         # centre of the distribution of negative examples
npos <- round(n/2)   # number of positive examples
nneg <- n - npos     # number of negative examples
# Generate the positive and negative examples
xpos <- matrix(rnorm(npos*p, mean=meanpos, sd=sigma), npos, p)
xneg <- matrix(rnorm(nneg*p, mean=meanneg, sd=sigma), nneg, p)
x <- rbind(xpos, xneg)
# Generate the labels
y <- matrix(c(rep(1, npos), rep(-1, nneg)))
# Visualize the data
plot(x, col=ifelse(y>0, 1, 2))
legend("topleft", c('Positive','Negative'), col=seq(2), pch=1, text.col=seq(2))

Example 1

Train/test

ntrain <- round(n*0.8)        # number of training examples
tindex <- sample(n, ntrain)   # indices of training samples
xtrain <- x[tindex, ]
xtest <- x[-tindex, ]
ytrain <- y[tindex]
ytest <- y[-tindex]
istrain <- rep(0, n)
istrain[tindex] <- 1
# Visualize
plot(x, col=ifelse(y>0, 1, 2), pch=ifelse(istrain==1, 1, 2))
legend("topleft", c('Positive Train','Positive Test','Negative Train','Negative Test'),
       col=c(1,1,2,2), pch=c(1,2,1,2), text.col=c(1,1,2,2))

Comparison of test classifier

Example ctd

library(kernlab)   # ksvm and the alpha/alphaindex/b accessors
svp <- ksvm(xtrain, ytrain, type="C-svc", kernel='vanilladot', C=100, scaled=c())
# General summary
svp
# Attributes that you can access
attributes(svp)   # did you look?
# For example, the support vectors
alpha(svp)
alphaindex(svp)
b(svp)            # remember b?
# Use the built-in function to pretty-plot the classifier
plot(svp, data=xtrain)

> # For example, the support vectors
> alpha(svp)
[[1]]
[1]  71.05875  28.94125 100.00000
> alphaindex(svp)
[1] 10 74 93
> b(svp)
[1] -17.3651

Do SVM for iris
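A minimal sketch of what this slide asks for (held-out split and confusion table; try different kernels and costs too):

library(e1071)
set.seed(1)
idx <- sample(nrow(iris), 100)
iris.svm <- svm(Species ~ ., data = iris[idx, ], kernel = "radial")
table(pred = predict(iris.svm, iris[-idx, ]), true = iris$Species[-idx])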

SVM for Swiss
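One reading of this exercise, sketched with the built-in swiss data: Fertility is continuous, so svm() defaults to eps-regression (the lab may intend a different target or setup):

library(e1071)
data(swiss)
swiss.svm <- svm(Fertility ~ ., data = swiss)                 # eps-regression by default
sqrt(mean((predict(swiss.svm, swiss) - swiss$Fertility)^2))   # in-sample RMSE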

e.g. Probabilities…

library(kernlab)
data(promotergene)
## create test and training set
ind <- sample(1:dim(promotergene)[1], 20)
genetrain <- promotergene[-ind, ]
genetest <- promotergene[ind, ]
## train a support vector machine
gene <- ksvm(Class ~ ., data=genetrain, kernel="rbfdot",
             kpar=list(sigma=0.015), C=70, cross=4, prob.model=TRUE)
## predict gene type probabilities on the test set
genetype <- predict(gene, genetest, type="probabilities")

Result

> genetype
                +           -
 [1,] 0.205576217 0.794423783
 [2,] 0.150094660 0.849905340
 [3,] 0.262062226 0.737937774
 [4,] 0.939660586 0.060339414
 [5,] 0.003164823 0.996835177
 [6,] 0.502406898 0.497593102
 [7,] 0.812503448 0.187496552
 [8,] 0.996382257 0.003617743
 [9,] 0.265187582 0.734812418
[10,] 0.998832291 0.001167709
[11,] 0.576491204 0.423508796
[12,] 0.973798660 0.026201340
[13,] 0.098598411 0.901401589
[14,] 0.900670101 0.099329899
[15,] 0.012571774 0.987428226
[16,] 0.977704079 0.022295921
[17,] 0.137304637 0.862695363
[18,] 0.972861575 0.027138425
[19,] 0.224470227 0.775529773
[20,] 0.004691973 0.995308027

R-SVM
http://www.stanford.edu/group/wonglab/RSVMpage/r-svm.tar.gz
http://www.stanford.edu/group/wonglab/RSVMpage/R-SVM.html
Read/skim the paper
Explore this method on a dataset of your choice, e.g. one of the R built-in datasets

kernlab
http://aquarius.tw.rpi.edu/html/DA/svmbasic_notes.pdf
Some scripts: lab1_svm12.R, lab1_svm13.R

kernlab, svmpath and klaR
http://aquarius.tw.rpi.edu/html/DA/v15i09.pdf
Start at page 9 (bottom)