Labs: Trees, Dimension Reduction, Multi-dimensional Scaling, SVM
Peter Fox
Data Analytics ITWS-4600/ITWS-6600/MATP-4450/CSCI-4960
Group 3 Lab 1, October 26, 2018
Weighted kNN: group3/lab1_kknn1.R
Make sure you look carefully at the results, and apply it to other datasets! We will discuss and interpret in class.
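As a warm-up (this is a sketch, not the contents of lab1_kknn1.R), weighted kNN with the kknn package on the built-in iris data; the 70/30 split, k, and kernel are illustrative choices:

library(kknn)

set.seed(123)
idx   <- sample(nrow(iris), 0.7 * nrow(iris))   # 70/30 train/test split
train <- iris[idx, ]
test  <- iris[-idx, ]

# weighted kNN: neighbours are weighted by the chosen kernel
fit <- kknn(Species ~ ., train, test, k = 7, kernel = "triangular")
table(predicted = fitted(fit), true = test$Species)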
rpart – recursive partitioning – and conditional inference trees
rpart: group3/lab1_rpart1.R, group3/lab1_rpart2.R, group3/lab1_rpart3.R, group3/lab1_rpart4.R
Try rpart for "Rings" on the Abalone dataset (a sketch follows below).
ctree: group3/lab1_ctree1.R, group3/lab1_ctree2.R, group3/lab1_ctree3.R
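A possible starting point for the Abalone exercise, assuming the usual UCI copy of the data (adjust the URL/path if you already load Abalone another way):

library(rpart)

abalone <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data",
                    header = FALSE,
                    col.names = c("Sex", "Length", "Diameter", "Height", "WholeWeight",
                                  "ShuckedWeight", "VisceraWeight", "ShellWeight", "Rings"))

fit <- rpart(Rings ~ ., data = abalone)   # Rings is numeric, so this is a regression tree
printcp(fit)                              # complexity/pruning table
plot(fit); text(fit)                      # quick look at the splits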
randomForest: group3/lab1_randomforest1.R
Build your own random forests, i.e. try different implementations, such as cforest {party}, on the other datasets.
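One way to try "different implementations", sketched on iris purely for illustration (swap in the other lab datasets):

library(randomForest)
library(party)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 500, importance = TRUE)
print(rf)        # OOB error estimate and confusion matrix
varImpPlot(rf)   # variable importance

cf <- cforest(Species ~ ., data = iris,
              controls = cforest_unbiased(ntree = 500))
table(predict(cf, OOB = TRUE), iris$Species)   # out-of-bag predictions vs. truth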
Trees for the Titanic: data(Titanic)
Fit rpart, ctree, hclust, and randomForest for: Survived ~ .
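Note that data(Titanic) is a four-way contingency table, so it must be expanded to one row per passenger before fitting; the expansion below is one common way to do it (an assumption about the setup, not the official lab code). hclust is unsupervised, so it gets a distance matrix on the predictors rather than the formula:

library(rpart)
library(party)
library(randomForest)

data(Titanic)
tdf     <- as.data.frame(Titanic)                          # Class, Sex, Age, Survived, Freq
titanic <- tdf[rep(seq_len(nrow(tdf)), tdf$Freq), 1:4]     # replicate each row Freq times

rp <- rpart(Survived ~ ., data = titanic, method = "class")
ct <- ctree(Survived ~ ., data = titanic)
set.seed(1)
rf <- randomForest(Survived ~ ., data = titanic)

# crude numeric encoding of the factor predictors, just to feed hclust
d  <- dist(data.matrix(titanic[, 1:3]))
hc <- hclust(d)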
Run through these demos:
library(EDR) # effective dimension reduction
library(dr)
library(clustrd)
install.packages("edrGraphicalTools")
library(edrGraphicalTools)
demo(edr_ex1)
demo(edr_ex2)
demo(edr_ex3)
demo(edr_ex4)
Some examples – group3/lab1_dr1.R, lab1_dr2.R, lab1_dr3.R, lab1_dr4.R
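If you do not have the lab scripts handy, here is a minimal sliced inverse regression sketch with the dr package; the bundled ais athlete data and the chosen predictors are assumptions made purely for illustration:

library(dr)
data(ais)   # Australian Institute of Sport athletes, shipped with dr

# estimate effective dimension-reduction directions for lean body mass
fit <- dr(LBM ~ Wt + Ht + RCC + WCC, data = ais, method = "sir")
summary(fit)   # estimated directions and tests of dimension
plot(fit)      # data plotted against the estimated directions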
MDS – group3/lab1_mds1.R, lab1_mds2.R, lab1_mds3.R
http://www.statmethods.net/advstats/mds.html
http://gastonsanchez.com/blog/how-to/2013/01/23/MDS-in-R.html
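For orientation, classical (metric) MDS in base R, using the built-in eurodist road distances as a stand-in dataset (the lab scripts may use different data):

d   <- eurodist                       # built-in dist object (European cities)
fit <- cmdscale(d, k = 2)             # classical MDS into 2 dimensions
plot(fit[, 1], fit[, 2], type = "n", xlab = "Coordinate 1", ylab = "Coordinate 2")
text(fit[, 1], fit[, 2], labels = attr(d, "Labels"), cex = 0.8)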
R – many ways (of course):
library(igraph)
g <- graph.full(nrow(dist.au))
V(g)$label <- city.names
layout <- layout.mds(g, dist = as.matrix(dist.au))
plot(g, layout = layout, vertex.size = 3)
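Here dist.au and city.names come from the Gaston Sanchez post linked above (pairwise distances between Australian cities). A self-contained variant, substituting the built-in eurodist data, might look like this:

library(igraph)
dist.eu    <- as.matrix(eurodist)
city.names <- attr(eurodist, "Labels")

g <- graph.full(nrow(dist.eu))          # complete graph, one vertex per city
V(g)$label <- city.names
layout <- layout.mds(g, dist = dist.eu) # MDS-based layout from the distance matrix
plot(g, layout = layout, vertex.size = 3)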
Work through these… (lecture next week): lab1_svm1.R → lab1_svm11.R, and lab1_svm_rpart1.R
Exercise the various parts of SVM: parameters, kernels, etc.
Karatzoglou et al. 2006 – http://aquarius.tw.rpi.edu/html/DA/v15i09.pdf
Ozone
> library(e1071)
> library(rpart)
> data(Ozone, package="mlbench")
> # http://math.furman.edu/~dcs/courses/math47/R/library/mlbench/html/Ozone.html # for field codes
> ## split data into a train and test set
> index <- 1:nrow(Ozone)
> testindex <- sample(index, trunc(length(index)/3))
> testset <- na.omit(Ozone[testindex, -3])
> trainset <- na.omit(Ozone[-testindex, -3])
> # V4 (daily maximum ozone) is numeric, so svm() fits eps-regression here
> svm.model <- svm(V4 ~ ., data = trainset, cost = 1000, gamma = 0.0001)
> svm.pred <- predict(svm.model, testset[, -3])
> crossprod(svm.pred - testset[, 3]) / length(testindex)
See: http://cran.r-project.org/web/packages/e1071/vignettes/svmdoc.pdf
Glass
library(e1071)
library(rpart)
data(Glass, package="mlbench")
index <- 1:nrow(Glass)
testindex <- sample(index, trunc(length(index)/3))
testset <- Glass[testindex, ]
trainset <- Glass[-testindex, ]
# cost = the "C" constant: the penalty for constraint violation (upper bound on the Lagrange multipliers)
svm.model <- svm(Type ~ ., data = trainset, cost = 100, gamma = 1)
svm.pred <- predict(svm.model, testset[, -10])
> table(pred = svm.pred, true = testset[,10])
    true
pred  1  2  3  5  6  7
   1 12  9  1  0  0  0
   2  6 19  6  5  2  2
   3  1  0  2  0  0  0
   5  0  0  0  0  0  0
   6  0  0  0  0  1  0
   7  0  1  0  0  0  4
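To reduce the confusion matrix to a single number, overall accuracy can be computed as below (your counts will differ because the split is random):

# proportion of test-set glass types classified correctly
tab <- table(pred = svm.pred, true = testset[, 10])
sum(diag(tab)) / sum(tab)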
Example lab1_svm1.R
n <- 150       # number of data points
p <- 2         # dimension
sigma <- 1     # standard deviation of the distributions
meanpos <- 0   # centre of the distribution of positive examples
meanneg <- 3   # centre of the distribution of negative examples
npos <- round(n/2)   # number of positive examples
nneg <- n - npos     # number of negative examples
# Generate the positive and negative examples
xpos <- matrix(rnorm(npos*p, mean = meanpos, sd = sigma), npos, p)
xneg <- matrix(rnorm(nneg*p, mean = meanneg, sd = sigma), nneg, p)
x <- rbind(xpos, xneg)
# Generate the labels
y <- matrix(c(rep(1, npos), rep(-1, nneg)))
# Visualize the data
plot(x, col = ifelse(y > 0, 1, 2))
legend("topleft", c('Positive', 'Negative'), col = seq(2), pch = 1, text.col = seq(2))
Example 1
Train/test
ntrain <- round(n*0.8)        # number of training examples
tindex <- sample(n, ntrain)   # indices of training samples
xtrain <- x[tindex, ]
xtest  <- x[-tindex, ]
ytrain <- y[tindex]
ytest  <- y[-tindex]
istrain <- rep(0, n)
istrain[tindex] <- 1
# Visualize
plot(x, col = ifelse(y > 0, 1, 2), pch = ifelse(istrain == 1, 1, 2))
legend("topleft", c('Positive Train', 'Positive Test', 'Negative Train', 'Negative Test'),
       col = c(1, 1, 2, 2), pch = c(1, 2, 1, 2), text.col = c(1, 1, 2, 2))
Comparison of test classifier
Example ctd
library(kernlab)
svp <- ksvm(xtrain, ytrain, type = "C-svc", kernel = 'vanilladot', C = 100, scaled = c())
# General summary
svp
# Attributes that you can access
attributes(svp)   # did you look?
# For example, the support vectors
alpha(svp)
alphaindex(svp)
b(svp)            # remember b?
# Use the built-in function to pretty-plot the classifier
plot(svp, data = xtrain)

> # For example, the support vectors
> alpha(svp)
[[1]]
[1]  71.05875  28.94125 100.00000
> alphaindex(svp)
[1] 10 74 93
> b(svp)
[1] -17.3651
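A natural next step (assumed here, it is not shown on the slide) is to score the held-out points from the earlier train/test split and tabulate the result:

# predict the labels of the test points with the trained ksvm model
ypred <- predict(svp, xtest)
table(truth = ytest, prediction = ypred)
sum(ypred == ytest) / length(ytest)   # test-set accuracy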
Do SVM for iris
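A hedged starting point for the iris exercise with e1071::svm; the split, kernel, and tuning values are illustrative, not prescribed:

library(e1071)

set.seed(42)
idx      <- sample(nrow(iris), trunc(nrow(iris) * 2/3))
trainset <- iris[idx, ]
testset  <- iris[-idx, ]

model <- svm(Species ~ ., data = trainset, kernel = "radial", cost = 1, gamma = 0.25)
pred  <- predict(model, testset[, -5])       # column 5 is Species
table(pred = pred, true = testset$Species)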
SVM for Swiss
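Assuming "Swiss" refers to the built-in swiss dataset (an assumption): Fertility is numeric, so svm() defaults to eps-regression rather than classification. A minimal sketch:

library(e1071)

set.seed(42)
idx      <- sample(nrow(swiss), trunc(nrow(swiss) * 2/3))
trainset <- swiss[idx, ]
testset  <- swiss[-idx, ]

model <- svm(Fertility ~ ., data = trainset, cost = 10, gamma = 0.1)
pred  <- predict(model, testset[, -1])       # column 1 is Fertility
sqrt(mean((pred - testset$Fertility)^2))     # test-set RMSE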
e.g. Probabilities…
library(kernlab)
data(promotergene)
## create test and training set
ind <- sample(1:dim(promotergene)[1], 20)
genetrain <- promotergene[-ind, ]
genetest <- promotergene[ind, ]
## train a support vector machine
gene <- ksvm(Class ~ ., data = genetrain, kernel = "rbfdot",
             kpar = list(sigma = 0.015), C = 70, cross = 4, prob.model = TRUE)
## predict gene type probabilities on the test set
genetype <- predict(gene, genetest, type = "probabilities")
Result
> genetype
                +           -
 [1,] 0.205576217 0.794423783
 [2,] 0.150094660 0.849905340
 [3,] 0.262062226 0.737937774
 [4,] 0.939660586 0.060339414
 [5,] 0.003164823 0.996835177
 [6,] 0.502406898 0.497593102
 [7,] 0.812503448 0.187496552
 [8,] 0.996382257 0.003617743
 [9,] 0.265187582 0.734812418
[10,] 0.998832291 0.001167709
[11,] 0.576491204 0.423508796
[12,] 0.973798660 0.026201340
[13,] 0.098598411 0.901401589
[14,] 0.900670101 0.099329899
[15,] 0.012571774 0.987428226
[16,] 0.977704079 0.022295921
[17,] 0.137304637 0.862695363
[18,] 0.972861575 0.027138425
[19,] 0.224470227 0.775529773
[20,] 0.004691973 0.995308027
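To compare these probabilities with the observed classes, one option (an illustrative extra step, not part of the kernlab example) is to take the higher-probability column as the predicted class:

# convert the probability matrix into class labels and cross-tabulate
predclass <- colnames(genetype)[apply(genetype, 1, which.max)]
table(pred = predclass, true = genetest$Class)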
R-SVM
http://www.stanford.edu/group/wonglab/RSVMpage/r-svm.tar.gz
http://www.stanford.edu/group/wonglab/RSVMpage/R-SVM.html
Read/skim the paper. Explore this method on a dataset of your choice, e.g. one of the R built-in datasets.
kernlab http://aquarius.tw.rpi.edu/html/DA/svmbasic_notes.pdf Some scripts: lab1_svm12.R, lab1_svm13.R
kernlab, svmpath and klaR http://aquarius.tw.rpi.edu/html/DA/v15i09.pdf Start at page 9 (bottom)