Interpreting: MDS, DR, SVM Factor Analysis

Interpreting: MDS, DR, SVM Factor Analysis Peter Fox Data Analytics – ITWS-4600/ITWS-6600/MATP-4450 Group 3 Module 10, March 29, 2018

MDS labs: lab3_mds1.R, lab3_mds2.R, lab3_mds3.R. References: http://www.statmethods.net/advstats/mds.html and http://gastonsanchez.com/blog/how-to/2013/01/23/MDS-in-R.html

eurodist – the built-in R dataset of distances between 21 European cities, the classic example for classical (metric) MDS.
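
The lab scripts walk through the details; as a minimal sketch (not the lab code itself), classical MDS on eurodist can be run with cmdscale():
# Classical (metric) MDS on the built-in eurodist distances -- a sketch, not lab3_mds*.R
fit <- cmdscale(eurodist, k = 2)   # embed the 21 cities in two dimensions
x <- fit[, 1]
y <- -fit[, 2]                     # flip the second axis so the map reads north-up
plot(x, y, type = "n", asp = 1, xlab = "", ylab = "", main = "Classical MDS of eurodist")
text(x, y, labels = labels(eurodist), cex = 0.7)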

You worked on these: lab3_svm1.R through lab3_svm11.R, and lab3_svm_rpart1.R. Karatzoglou et al. 2006 - http://aquarius.tw.rpi.edu/html/DA/v15i09.pdf. Who worked on this starting from page 9 (bottom)?

Ozone
> library(e1071)
> library(rpart)
> data(Ozone, package="mlbench")
> # http://math.furman.edu/~dcs/courses/math47/R/library/mlbench/html/Ozone.html # for field codes
> ## split data into a train and test set
> index <- 1:nrow(Ozone)
> testindex <- sample(index, trunc(length(index)/3))
> testset <- na.omit(Ozone[testindex,-3])
> trainset <- na.omit(Ozone[-testindex,-3])
> # V4 (maximum ozone reading) is numeric, so svm() fits eps-regression, as in the vignette
> svm.model <- svm(V4 ~ ., data = trainset, cost = 1000, gamma = 0.0001)
> svm.pred <- predict(svm.model, testset[,-3])
> crossprod(svm.pred - testset[,3]) / length(testindex)   # (approximate) mean squared test error
See: http://cran.r-project.org/web/packages/e1071/vignettes/svmdoc.pdf

Glass
library(e1071)
library(rpart)
data(Glass, package="mlbench")
index <- 1:nrow(Glass)
testindex <- sample(index, trunc(length(index)/3))
testset <- Glass[testindex,]
trainset <- Glass[-testindex,]
svm.model <- svm(Type ~ ., data = trainset, cost = 100, gamma = 1)
svm.pred <- predict(svm.model, testset[,-10])

> table(pred = svm.pred, true = testset[,10])
    true
pred  1  2  3  5  6  7
   1 12  9  1  0  0  0
   2  6 19  6  5  2  2
   3  1  0  2  0  0  0
   5  0  0  0  0  0  0
   6  0  0  0  0  1  0
   7  0  1  0  0  0  4
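
The diagonal of this confusion matrix holds the correctly classified test cases. As a small follow-up not on the original slide, the overall accuracy can be read straight off the table:
# Overall test-set accuracy from the confusion matrix (follow-up, not on the slide)
conf <- table(pred = svm.pred, true = testset[,10])
sum(diag(conf)) / sum(conf)   # fraction of test cases classified correctly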

Example lab3_svm1.R
n <- 150             # number of data points
p <- 2               # dimension
sigma <- 1           # standard deviation of the distributions
meanpos <- 0         # centre of the distribution of positive examples
meanneg <- 3         # centre of the distribution of negative examples
npos <- round(n/2)   # number of positive examples
nneg <- n-npos       # number of negative examples
# Generate the positive and negative examples
xpos <- matrix(rnorm(npos*p,mean=meanpos,sd=sigma),npos,p)
xneg <- matrix(rnorm(nneg*p,mean=meanneg,sd=sigma),nneg,p)
x <- rbind(xpos,xneg)
# Generate the labels
y <- matrix(c(rep(1,npos),rep(-1,nneg)))
# Visualize the data
plot(x,col=ifelse(y>0,1,2))
legend("topleft",c('Positive','Negative'),col=seq(2),pch=1,text.col=seq(2))

Example 1

Train/test
ntrain <- round(n*0.8)       # number of training examples
tindex <- sample(n,ntrain)   # indices of training samples
xtrain <- x[tindex,]
xtest <- x[-tindex,]
ytrain <- y[tindex]
ytest <- y[-tindex]
istrain <- rep(0,n)
istrain[tindex] <- 1
# Visualize
plot(x,col=ifelse(y>0,1,2),pch=ifelse(istrain==1,1,2))
legend("topleft",c('Positive Train','Positive Test','Negative Train','Negative Test'),
       col=c(1,1,2,2), pch=c(1,2,1,2), text.col=c(1,1,2,2))

Comparison of test classifier

Example ctd
library(kernlab)   # ksvm(), alpha(), alphaindex(), b() come from kernlab
svp <- ksvm(xtrain,ytrain,type="C-svc", kernel='vanilladot', C=100, scaled=c())
# General summary
svp
# Attributes that you can access
attributes(svp)   # did you look?
# For example, the support vectors
alpha(svp)
alphaindex(svp)
b(svp)            # remember b?
# Use the built-in function to pretty-plot the classifier
plot(svp,data=xtrain)

> # For example, the support vectors
> alpha(svp)
[[1]]
[1]  71.05875  28.94125 100.00000
> alphaindex(svp)
[1] 10 74 93
> b(svp)
[1] -17.3651

SVM for iris
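
The lab script has the full exercise; a minimal sketch of what an SVM on the built-in iris data might look like, assuming the kernlab ksvm() interface used above:
# Sketch of an SVM on iris (assumed setup, not the lab script itself)
library(kernlab)
data(iris)
irisidx <- sample(1:nrow(iris), round(0.7*nrow(iris)))   # 70/30 train/test split
iristrain <- iris[irisidx, ]
iristest <- iris[-irisidx, ]
irissvm <- ksvm(Species ~ ., data = iristrain, kernel = "rbfdot", C = 10, cross = 4)
table(pred = predict(irissvm, iristest), true = iristest$Species)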

SVM for Swiss
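
Again only a sketch: every column of the built-in swiss data is numeric, so an SVM here is naturally a regression, e.g. predicting Fertility from the other indicators (the exact setup in the lab may differ):
# Sketch of SVM regression on swiss (assumed setup, not the lab script itself)
library(e1071)
data(swiss)
swissidx <- sample(1:nrow(swiss), round(0.7*nrow(swiss)))
swisstrain <- swiss[swissidx, ]
swisstest <- swiss[-swissidx, ]
swiss.svm <- svm(Fertility ~ ., data = swisstrain)    # numeric response -> eps-regression
swiss.pred <- predict(swiss.svm, swisstest)
mean((swiss.pred - swisstest$Fertility)^2)            # test mean squared error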

e.g. Probabilities…
library(kernlab)
data(promotergene)
## create test and training set
ind <- sample(1:dim(promotergene)[1],20)
genetrain <- promotergene[-ind, ]
genetest <- promotergene[ind, ]
## train a support vector machine with a probability model
gene <- ksvm(Class~.,data=genetrain,kernel="rbfdot",
             kpar=list(sigma=0.015),C=70,cross=4,prob.model=TRUE)
## predict gene type probabilities on the test set
genetype <- predict(gene,genetest,type="probabilities")

Result
> genetype
                +           -
 [1,] 0.205576217 0.794423783
 [2,] 0.150094660 0.849905340
 [3,] 0.262062226 0.737937774
 [4,] 0.939660586 0.060339414
 [5,] 0.003164823 0.996835177
 [6,] 0.502406898 0.497593102
 [7,] 0.812503448 0.187496552
 [8,] 0.996382257 0.003617743
 [9,] 0.265187582 0.734812418
[10,] 0.998832291 0.001167709
[11,] 0.576491204 0.423508796
[12,] 0.973798660 0.026201340
[13,] 0.098598411 0.901401589
[14,] 0.900670101 0.099329899
[15,] 0.012571774 0.987428226
[16,] 0.977704079 0.022295921
[17,] 0.137304637 0.862695363
[18,] 0.972861575 0.027138425
[19,] 0.224470227 0.775529773
[20,] 0.004691973 0.995308027
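
Each row gives the predicted probability of the "+" and "-" class for one test sequence. As a small follow-up not on the slide, the probabilities can be turned into hard labels and checked against the truth:
# Convert class probabilities to hard labels and compare with the truth (follow-up, not on the slide)
genelabel <- colnames(genetype)[apply(genetype, 1, which.max)]
table(pred = genelabel, true = genetest$Class)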

kernlab: http://aquarius.tw.rpi.edu/html/DA/svmbasic_notes.pdf. Some scripts: lab1_svm12.R, lab1_svm13.R

These: example_exploratoryFactorAnalysis.R, run on dataset_exploratoryFactorAnalysis.csv (on the website). http://rtutorialseries.blogspot.com/2011/10/r-tutorial-series-exploratory-factor.html (this was the example skipped over in the lecture). See also http://www.statmethods.net/advstats/factor.html and http://stats.stackexchange.com/questions/1576/what-are-the-differences-between-factor-analysis-and-principal-component-analysi. Do these: lab2_fa{1,2,4,5}.R
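
The linked tutorial uses factanal() for the exploratory factor analysis; a minimal sketch, assuming dataset_exploratoryFactorAnalysis.csv contains only numeric item columns (check the file on the course website for the actual layout):
# Exploratory factor analysis sketch -- assumes the CSV holds only numeric item scores
efa.data <- read.csv("dataset_exploratoryFactorAnalysis.csv")
efa.fit <- factanal(efa.data, factors = 2, rotation = "varimax")
print(efa.fit, digits = 2, cutoff = 0.3, sort = TRUE)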

Factor Analysis
library(psych)   # provides irt.fa(), score.irt() and the iqitems/ability data
data(iqitems)    # data(ability)
ability.irt <- irt.fa(ability)
ability.scores <- score.irt(ability.irt, ability)
data(attitude)
cor(attitude)
# Compute eigenvalues and eigenvectors of the correlation matrix.
pfa.eigen <- eigen(cor(attitude))
pfa.eigen$values
# set a value for the number of factors (for clarity)
factors <- 2
# Extract and transform two components.
pfa.eigen$vectors[, 1:factors] %*%
  diag(sqrt(pfa.eigen$values[1:factors]), factors, factors)

Glass
index <- 1:nrow(Glass)
testindex <- sample(index, trunc(length(index)/3))
testset <- Glass[testindex,]
trainset <- Glass[-testindex,]
cor(testset[,-10])   # correlations of the nine numeric predictors (Type, column 10, is a factor)
Factor Analysis?
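
One way to start on the "Factor Analysis?" question is to check how much of the variance in the nine numeric measurements a few components capture; a sketch using PCA (the eigen-decomposition relative of the factor analysis above), under the assumption that Type (column 10) is excluded:
# Quick variance check before attempting factor analysis on Glass (a sketch, not the lab answer)
glass.pca <- prcomp(trainset[, -10], scale. = TRUE)   # drop the Type factor, standardize
summary(glass.pca)          # proportion of variance explained per component
glass.pca$rotation[, 1:3]   # loadings of the first three components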