Interpreting: MDS, DR, SVM Factor Analysis

Presentation transcript:

Interpreting: MDS, DR, SVM Factor Analysis
Peter Fox and Greg Hughes
Data Analytics – ITWS-4600/ITWS-6600
Group 3 Module 10, March 20, 2017

This?

library(EDR)      # effective dimension reduction
library(dr)
library(clustrd)
##### install.packages("edrGraphicalTools") ##### ?
library(edrGraphicalTools)
demo(edr_ex1)
demo(edr_ex2)
demo(edr_ex3)
demo(edr_ex4)

Some examples (group3/):
lab3_dr1.R
lab3_dr2.R
lab3_dr3.R
lab3_dr4.R
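
Before running the packaged demos and lab scripts above, it can help to see dimension reduction in base R alone. The following is a minimal PCA sketch on the built-in iris measurements; it is an illustration only, not one of the lab scripts.

# Minimal baseline: PCA as a linear dimension reduction of the iris measurements
data(iris)
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
summary(pca)                       # proportion of variance captured by each component
plot(pca$x[, 1:2], col = iris$Species,
     xlab = "PC1", ylab = "PC2")   # the data projected onto the first two PCs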

Spellman

MDS
lab3_mds1.R
lab3_mds2.R
lab3_mds3.R
http://www.statmethods.net/advstats/mds.html
http://gastonsanchez.com/blog/how-to/2013/01/23/MDS-in-R.html

Eurodist
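
Eurodist is the classic distance-matrix example used in the links above. A minimal classical-MDS sketch on R's built-in eurodist road distances (base R only; the 2-D embedding and axis flip are choices made here for readability):

# Classical (metric) MDS on road distances between 21 European cities
data(eurodist)
fit <- cmdscale(eurodist, k = 2)      # embed the distance matrix in two dimensions
plot(fit[, 1], -fit[, 2], type = "n", # flip the second axis so the layout looks map-like
     xlab = "Coordinate 1", ylab = "Coordinate 2", main = "cmdscale(eurodist)")
text(fit[, 1], -fit[, 2], labels = rownames(fit), cex = 0.7)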

You worked on these…
lab3_svm1.R –> lab3_svm11.R
lab3_svm_rpart1.R
Karatzoglou et al. 2006 - http://aquarius.tw.rpi.edu/html/DA/v15i09.pdf
Who worked on this starting from page 9 (bottom)?

Ozone

> library(e1071)
> library(rpart)
> data(Ozone, package="mlbench")
> # http://math.furman.edu/~dcs/courses/math47/R/library/mlbench/html/Ozone.html # for field codes
> ## split data into a train and test set
> index <- 1:nrow(Ozone)
> testindex <- sample(index, trunc(length(index)/3))
> testset <- na.omit(Ozone[testindex, -3])
> trainset <- na.omit(Ozone[-testindex, -3])
> ## V4 (daily maximum ozone) is numeric, so svm() fits a regression model here
> svm.model <- svm(V4 ~ ., data = trainset, cost = 1000, gamma = 0.0001)
> svm.pred <- predict(svm.model, testset[, -3])
> ## mean squared error on the test set
> crossprod(svm.pred - testset[, 3]) / length(testindex)
See: http://cran.r-project.org/web/packages/e1071/vignettes/svmdoc.pdf
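
The svmdoc vignette cited above follows this with a regression tree on the same split for comparison; a sketch of that step (using the variables already defined on this slide):

> ## for comparison: a regression tree on the same train/test split
> rpart.model <- rpart(V4 ~ ., data = trainset)
> rpart.pred <- predict(rpart.model, testset[, -3])
> crossprod(rpart.pred - testset[, 3]) / length(testindex)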

Glass

library(e1071)
library(rpart)
data(Glass, package="mlbench")
index <- 1:nrow(Glass)
testindex <- sample(index, trunc(length(index)/3))
testset <- Glass[testindex, ]
trainset <- Glass[-testindex, ]
svm.model <- svm(Type ~ ., data = trainset, cost = 100, gamma = 1)
svm.pred <- predict(svm.model, testset[, -10])

> table(pred = svm.pred, true = testset[,10])
      true
pred    1  2  3  5  6  7
   1   12  9  1  0  0  0
   2    6 19  6  5  2  2
   3    1  0  2  0  0  0
   5    0  0  0  0  0  0
   6    0  0  0  0  1  0
   7    0  1  0  0  0  4
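
To reduce this confusion matrix to a single figure, one simple follow-up (not on the slide) is overall accuracy: correct predictions on the diagonal divided by all test cases.

tab <- table(pred = svm.pred, true = testset[, 10])
sum(diag(tab)) / sum(tab)    # fraction of test-set glass types predicted correctly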

Example: lab3_svm1.R

n <- 150             # number of data points
p <- 2               # dimension
sigma <- 1           # variance of the distribution
meanpos <- 0         # centre of the distribution of positive examples
meanneg <- 3         # centre of the distribution of negative examples
npos <- round(n/2)   # number of positive examples
nneg <- n - npos     # number of negative examples
# Generate the positive and negative examples
xpos <- matrix(rnorm(npos*p, mean=meanpos, sd=sigma), npos, p)
xneg <- matrix(rnorm(nneg*p, mean=meanneg, sd=sigma), nneg, p)
x <- rbind(xpos, xneg)
# Generate the labels
y <- matrix(c(rep(1, npos), rep(-1, nneg)))
# Visualize the data
plot(x, col=ifelse(y>0, 1, 2))
legend("topleft", c('Positive','Negative'), col=seq(2), pch=1, text.col=seq(2))

Example 1

Train/test

ntrain <- round(n*0.8)      # number of training examples
tindex <- sample(n, ntrain) # indices of training samples
xtrain <- x[tindex, ]
xtest  <- x[-tindex, ]
ytrain <- y[tindex]
ytest  <- y[-tindex]
istrain <- rep(0, n)
istrain[tindex] <- 1
# Visualize
plot(x, col=ifelse(y>0, 1, 2), pch=ifelse(istrain==1, 1, 2))
legend("topleft", c('Positive Train','Positive Test','Negative Train','Negative Test'),
       col=c(1,1,2,2), pch=c(1,2,1,2), text.col=c(1,1,2,2))

Comparison of test classifier

Example ctd

library(kernlab)   # provides ksvm(), alpha(), alphaindex(), b()
svp <- ksvm(xtrain, ytrain, type="C-svc", kernel='vanilladot', C=100, scaled=c())
# General summary
svp
# Attributes that you can access
attributes(svp)    # did you look?
# For example, the support vectors
alpha(svp)
alphaindex(svp)
b(svp)             # remember b?
# Use the built-in function to pretty-plot the classifier
plot(svp, data=xtrain)

> # For example, the support vectors
> alpha(svp)
[[1]]
[1]  71.05875  28.94125 100.00000
> alphaindex(svp)
[1] 10 74 93
> b(svp)
[1] -17.3651
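
A natural next step is to score the trained classifier on the held-out test points from the previous slide; a minimal sketch using the variables defined above:

# Predict the labels of the test set and summarise the errors
ypred <- predict(svp, xtest)
table(Truth = ytest, Prediction = ypred)    # confusion matrix on the test set
sum(ypred == ytest) / length(ytest)         # test-set accuracy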

SVM for iris
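
The iris figures for this slide are not reproduced here; the following is a minimal e1071 sketch of the kind of fit they would be based on (the one-third holdout and the default radial kernel are assumptions made for illustration):

library(e1071)
data(iris)
set.seed(1)
idx  <- sample(nrow(iris), nrow(iris) / 3)      # hold out a third of the rows
fit  <- svm(Species ~ ., data = iris[-idx, ])   # radial kernel by default
pred <- predict(fit, iris[idx, ])
table(pred = pred, true = iris$Species[idx])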

SVM for Swiss
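
For the swiss data a natural target is numeric (Fertility), so the analogous sketch is support vector regression rather than classification; again this is an illustration under assumed choices, not the lab code:

library(e1071)
data(swiss)
set.seed(1)
idx  <- sample(nrow(swiss), trunc(nrow(swiss) / 3))
fit  <- svm(Fertility ~ ., data = swiss[-idx, ])   # eps-regression for a numeric response
pred <- predict(fit, swiss[idx, ])
mean((pred - swiss$Fertility[idx])^2)              # test mean squared error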

e.g. Probabilities…

library(kernlab)
data(promotergene)
## create test and training set
ind <- sample(1:dim(promotergene)[1], 20)
genetrain <- promotergene[-ind, ]
genetest  <- promotergene[ind, ]
## train a support vector machine
gene <- ksvm(Class ~ ., data=genetrain, kernel="rbfdot",
             kpar=list(sigma=0.015), C=70, cross=4, prob.model=TRUE)
## predict gene type probabilities on the test set
genetype <- predict(gene, genetest, type="probabilities")

Result

> genetype
                 +           -
 [1,] 0.205576217 0.794423783
 [2,] 0.150094660 0.849905340
 [3,] 0.262062226 0.737937774
 [4,] 0.939660586 0.060339414
 [5,] 0.003164823 0.996835177
 [6,] 0.502406898 0.497593102
 [7,] 0.812503448 0.187496552
 [8,] 0.996382257 0.003617743
 [9,] 0.265187582 0.734812418
[10,] 0.998832291 0.001167709
[11,] 0.576491204 0.423508796
[12,] 0.973798660 0.026201340
[13,] 0.098598411 0.901401589
[14,] 0.900670101 0.099329899
[15,] 0.012571774 0.987428226
[16,] 0.977704079 0.022295921
[17,] 0.137304637 0.862695363
[18,] 0.972861575 0.027138425
[19,] 0.224470227 0.775529773
[20,] 0.004691973 0.995308027
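
If hard class labels are wanted instead of probabilities, the same fitted model can be reused with the default prediction type; a short sketch:

genepred <- predict(gene, genetest)            # default type returns the predicted class
table(pred = genepred, true = genetest$Class)  # compare against the true promoter labels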

kernlab
http://aquarius.tw.rpi.edu/html/DA/svmbasic_notes.pdf
Some scripts: lab3_svm12.R, lab3_svm13.R

These

example_exploratoryFactorAnalysis.R on dataset_exploratoryFactorAnalysis.csv (on website)
http://rtutorialseries.blogspot.com/2011/10/r-tutorial-series-exploratory-factor.html (this was the example skipped over in the lecture)
http://www.statmethods.net/advstats/factor.html
http://stats.stackexchange.com/questions/1576/what-are-the-differences-between-factor-analysis-and-principal-component-analysi
Do these: Lab10b_fa{1,2,4,5}_2016.R

Factor Analysis

library(psych)     # irt.fa(), score.irt(), and the iqitems/ability data
data(iqitems)      # data(ability)
ability.irt <- irt.fa(ability)
ability.scores <- score.irt(ability.irt, ability)
data(attitude)
cor(attitude)
# Compute eigenvalues and eigenvectors of the correlation matrix.
pfa.eigen <- eigen(cor(attitude))
pfa.eigen$values
# set a value for the number of factors (for clarity)
factors <- 2
# Extract and transform two components.
pfa.eigen$vectors[, 1:factors] %*%
  diag(sqrt(pfa.eigen$values[1:factors]), factors, factors)
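
The eigen-decomposition above extracts components by hand. For comparison, base R's factanal() fits a maximum-likelihood factor model to the same attitude data; the two-factor, varimax-rotation choices below are assumptions made for illustration:

# Maximum-likelihood factor analysis of the attitude survey, for comparison
fa <- factanal(attitude, factors = 2, rotation = "varimax", scores = "regression")
print(fa, digits = 2, cutoff = 0.3)   # loadings, uniquenesses, and the fit test
head(fa$scores)                       # factor scores for the first few respondents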

Glass

data(Glass, package="mlbench")
index <- 1:nrow(Glass)
testindex <- sample(index, trunc(length(index)/3))
testset <- Glass[testindex, ]
trainset <- Glass[-testindex, ]
cor(testset[, -10])   # drop the Type factor (column 10) before correlating
Factor Analysis?
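
One way to approach the "Factor Analysis?" prompt: work on the numeric columns only. Maximum-likelihood factanal() may fail to converge on these highly correlated oxide measurements, so PCA is shown as the exploratory fallback; this is a sketch, not the lab answer.

glass.num <- testset[, -10]                        # numeric features only (Type dropped)
glass.pca <- prcomp(glass.num, center = TRUE, scale. = TRUE)
summary(glass.pca)                                 # variance explained per component
# factanal(glass.num, factors = 3)                 # ML factor analysis; may not converge here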