Peter Fox, Data Analytics – ITWS-4963/ITWS-6965, Week 11a, April 7, 2014: Support Vector Machines, Decision Trees, Cross-validation



Reading?

Probabilities

```r
library(kernlab)
data(promotergene)

## create test and training set
ind <- sample(1:dim(promotergene)[1], 20)
genetrain <- promotergene[-ind, ]
genetest <- promotergene[ind, ]

## train a support vector machine
gene <- ksvm(Class ~ ., data = genetrain, kernel = "rbfdot",
             kpar = list(sigma = 0.015), C = 70, cross = 4, prob.model = TRUE)

## predict gene type probabilities on the test set
genetype <- predict(gene, genetest, type = "probabilities")
```

Result: printing `genetype` gives a 20 × 2 matrix of class probabilities with columns `+` and `-`, one row ([1,] to [20,]) per test sequence.

Glass

```r
library(e1071)
library(rpart)
data(Glass, package = "mlbench")

index <- 1:nrow(Glass)
testindex <- sample(index, trunc(length(index)/3))
testset <- Glass[testindex, ]
trainset <- Glass[-testindex, ]

svm.model <- svm(Type ~ ., data = trainset, cost = 100, gamma = 1)
svm.pred <- predict(svm.model, testset[, -10])  # column 10 is Type
```

Confusion matrix of predicted (rows, `pred`) versus true (columns, `true`) glass type:

```r
table(pred = svm.pred, true = testset[, 10])
```

Now what?

```r
# now what happens?
rpart.model <- rpart(Type ~ ., data = trainset)
rpart.pred <- predict(rpart.model, testset[, -10], type = "class")
```

General idea behind trees

Although all classifiers based on decision trees share the same basic philosophy, there are many ways to construct one. When selecting a tree-building algorithm, the most important design choices are:
– the criterion for choosing the feature to split on at each node;
– how to compute the partition of the set of examples;
– when to decide that a node is a leaf;
– the criterion for selecting the class assigned to each leaf.
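To make these choices concrete, here is a minimal sketch of one common splitting criterion, the Gini impurity (the default for classification trees in rpart), used to choose a threshold on a single numeric feature, together with majority vote as the leaf-class rule. The helpers `gini`, `best_split` and `leaf_class` are hypothetical names written for this illustration, not functions from any package.

```r
# Gini impurity of a vector of class labels: 1 - sum_k p_k^2
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Try every midpoint between consecutive sorted unique values of x and keep
# the one with the largest impurity decrease (parent impurity minus the
# size-weighted impurity of the two children)
best_split <- function(x, y) {
  v <- sort(unique(x))
  if (length(v) < 2) return(NULL)          # no split is possible
  cuts <- (head(v, -1) + tail(v, -1)) / 2
  parent <- gini(y)
  gain <- sapply(cuts, function(cut) {
    left <- y[x < cut]
    right <- y[x >= cut]
    parent - (length(left) * gini(left) + length(right) * gini(right)) / length(y)
  })
  list(cut = cuts[which.max(gain)], gain = max(gain))
}

# Class assigned to a leaf: the majority class among its samples
leaf_class <- function(y) names(which.max(table(y)))

s <- best_split(iris$Petal.Length, iris$Species)
print(s)
print(leaf_class(iris$Species[iris$Petal.Length < s$cut]))
```

On iris the best petal-length threshold isolates setosa, which then forms a pure leaf.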

Decision trees have several important advantages:
– They can be applied to many types of data (numeric and categorical).
– The final structure of the classifier is quite simple and can be stored and handled compactly.
– They handle conditional information very efficiently, subdividing the space into sub-spaces that are handled individually.
– They are usually robust and relatively insensitive to misclassified examples in the training set.
– The resulting trees are usually easy to understand and can be used to gain a better understanding of the phenomenon in question. This is perhaps the most important of these advantages.

Stopping – leaves on the tree

Several conditions can be used to stop the recursive process; the algorithm stops splitting a node when any one of them is true:
– All samples in the node belong to the same class (have the same label), i.e. the node is already "pure".
– Most of the samples are already of the same class. This generalizes the first condition with an error threshold.
– There are no remaining attributes on which the samples can be further partitioned.
– There are no samples for the branch of the test attribute.
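In rpart, stopping rules of this kind are exposed as arguments of `rpart.control()`: `minsplit` (the minimum number of observations a node needs before a split is attempted), `minbucket` (the minimum leaf size), `cp` (a split must improve the fit by at least this fraction) and `maxdepth`. The sketch below uses the kyphosis data shipped with rpart to show that loosening these thresholds grows a larger tree and tightening them a smaller one; the specific values are illustrative choices, not recommendations, and `n_leaves` is a hypothetical helper.

```r
library(rpart)

# Default stopping rules: minsplit = 20, minbucket = round(minsplit/3), cp = 0.01
fit_default <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                     method = "class")

# Looser rules: allow tiny nodes and any split that improves the fit at all
loose <- rpart.control(minsplit = 2, minbucket = 1, cp = 0, xval = 0)
fit_loose <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class", control = loose)

# Stricter rule: stop after one level of splits
strict <- rpart.control(maxdepth = 1, xval = 0)
fit_strict <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                    method = "class", control = strict)

# Number of terminal nodes (leaves) in a fitted tree
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

sizes <- sapply(list(strict = fit_strict, default = fit_default, loose = fit_loose),
                n_leaves)
print(sizes)
```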

Recursive partitioning

Recursive partitioning is a fundamental tool in data mining. It helps us explore the structure of a data set while developing easy-to-visualize decision rules for predicting a categorical (classification tree) or continuous (regression tree) outcome. The rpart programs build classification or regression models of a very general structure using a two-stage procedure; the resulting models can be represented as binary trees.

Recursive partitioning

The tree is built in two stages:
– First, the single variable that best splits the data into two groups is found ("best" will be defined later). The data are separated, and the process is applied separately to each subgroup, and so on recursively, until the subgroups either reach a minimum size or no further improvement can be made.
– Second, cross-validation is used to trim (prune) the full tree back.
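The two stages can be sketched with rpart itself: grow a deliberately large tree with `cp = 0` (stage one), let rpart's built-in 10-fold cross-validation (`xval = 10`) fill in the `xerror` column of the complexity table, then prune back to the complexity parameter with the smallest cross-validated error (stage two). Picking the minimum `xerror` is one common rule; the seed and `minsplit = 5` are illustrative assumptions.

```r
library(rpart)
set.seed(1)  # the cross-validation folds are random

# Stage 1: grow a large tree; rpart also runs 10-fold CV over its subtrees
full <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
              control = rpart.control(cp = 0, minsplit = 5, xval = 10))

# Stage 2: prune at the cp value with the lowest cross-validated error
cpt <- full$cptable
best_cp <- cpt[which.min(cpt[, "xerror"]), "CP"]
pruned <- prune(full, cp = best_cp)

# Number of leaves before and after pruning
sizes <- c(full = sum(full$frame$var == "<leaf>"),
           pruned = sum(pruned$frame$var == "<leaf>"))
print(sizes)
```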

Why are we so careful about this? Because we will USE these trees, i.e. apply them to make decisions about what things are and what to do with them!

```
> printcp(rpart.model)

Classification tree:
rpart(formula = Type ~ ., data = trainset)

Variables actually used in tree construction:
[1] Al Ba Mg RI

Root node error: 92/143 = 0.64336

n= 143

  CP nsplit rel error xerror xstd
```
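The `rel error` and `xerror` columns printed by `printcp()` are scaled so that the root node has error 1; multiplying them by the root node error (92/143 for the Glass tree) turns them into absolute misclassification rates. A minimal sketch on the kyphosis data from rpart (whose root node error is 17/81) follows; the variable names are arbitrary.

```r
library(rpart)
set.seed(42)  # xerror comes from randomly assigned cross-validation folds

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Root node error: misclassifications at the root divided by the number of cases
root_err <- fit$frame$dev[1] / fit$frame$n[1]

cpt <- fit$cptable
abs_err <- cbind(nsplit = cpt[, "nsplit"],
                 resubstitution = cpt[, "rel error"] * root_err,  # training error
                 cv = cpt[, "xerror"] * root_err)                 # cross-validated estimate
print(round(abs_err, 3))
```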

```r
plotcp(rpart.model)
```

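```
> rsq.rpart(rpart.model)

Classification tree:
rpart(formula = Type ~ ., data = trainset)

Variables actually used in tree construction:
[1] Al Ba Mg RI

Root node error: 92/143 = 0.64336

n= 143

  CP nsplit rel error xerror xstd

Warning message:
In rsq.rpart(rpart.model) : may not be applicable for this method
```

The warning appears because rsq.rpart() is meant for regression (method = "anova") trees; for a classification tree such as this one its R-square plots should not be relied on.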

rsq.rpart

```
> print(rpart.model)
n= 143

node), split, n, loss, yval, (yprob)
      * denotes terminal node

 1) root ( )
   2) Ba< ( )
     4) Al< ( )
       8) RI>= ( )
        16) RI< ( ) *
        17) RI>= ( )
          34) RI>= ( )
            68) Mg>= ( ) *
            69) Mg< ( ) *
          35) RI< ( ) *
       9) RI< ( ) *
     5) Al>= ( )
      10) Mg>= ( ) *
      11) Mg< ( ) *
   3) Ba>= ( ) *
```
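rpart numbers nodes so that the children of node k are 2k (left) and 2k + 1 (right), which is why the Glass tree jumps from node 17 to nodes 34 and 35. The node numbers are the row names of the fitted object's `frame`, so the parent and depth of any node can be recovered arithmetically. A small sketch on the kyphosis tree; the data frame `info` is only for display.

```r
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Node numbers shown by print(fit) are the row names of fit$frame
nodes <- as.integer(rownames(fit$frame))

# Parent of node k is k %/% 2; the root (node 1) has no parent
parent <- ifelse(nodes == 1L, NA_integer_, nodes %/% 2L)

info <- data.frame(node = nodes, parent = parent,
                   depth = floor(log2(nodes)),          # root has depth 0
                   leaf = fit$frame$var == "<leaf>")
print(info)
```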

Tree plot

Usage: plot(object, uniform=FALSE, branch=1, compress=FALSE, nspace, margin=0, minbranch=.3, args)

```r
plot(rpart.model, compress = TRUE)
text(rpart.model, use.n = TRUE)
```

And if you are brave: summary(rpart.model) (the output runs to several pages).

Wait, did anyone LOOK at the data?

```
> names(Glass)
[1] "RI" "Na" "Mg" "Al" "Si" "K" "Ca" "Ba" "Fe" "Type"
> head(Glass)
  RI Na Mg Al Si K Ca Ba Fe Type
```

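rpart.pred

Typing `rpart.pred` prints the predicted Type for each test row, followed by a `Levels:` line listing the glass types.

```r
plot(rpart.pred)  # bar chart of the predicted class counts
```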
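To compare classifiers such as the SVM and rpart models above, the confusion matrix is usually reduced to an accuracy: the sum of its diagonal divided by the number of test cases. Because mlbench may not be installed, this sketch assumes the built-in iris data instead of Glass; the one-third test split mirrors the Glass example and the seed is arbitrary.

```r
library(rpart)
set.seed(7)

# Hold out one third of the rows, as in the Glass example
index <- seq_len(nrow(iris))
testindex <- sample(index, trunc(length(index) / 3))
testset <- iris[testindex, ]
trainset <- iris[-testindex, ]

fit <- rpart(Species ~ ., data = trainset)
pred <- predict(fit, testset[, -5], type = "class")  # column 5 is Species

# Rows are predictions, columns are true classes; correct cases lie on the diagonal
cm <- table(pred = pred, true = testset$Species)
accuracy <- sum(diag(cm)) / sum(cm)
print(cm)
print(accuracy)
```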

Hierarchical clustering

```r
dswiss <- dist(as.matrix(swiss))
hs <- hclust(dswiss)
plot(hs)
```

ctree

```r
require(party)
swiss_ctree <- ctree(Fertility ~ Agriculture + Education + Catholic, data = swiss)
plot(swiss_ctree)
```


hclust for iris
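The code for the iris dendrogram is not part of this transcript. A minimal sketch that mirrors the swiss example clusters the four numeric measurements (complete linkage, the `hclust` default), cuts the tree into three groups with `cutree()` and cross-tabulates the groups against the species. The choice of k = 3 is an assumption matching the three species.

```r
# Euclidean distances on the four numeric columns (Species excluded)
diris <- dist(as.matrix(iris[, 1:4]))
hi <- hclust(diris)          # complete linkage by default
plot(hi, labels = FALSE)

# Cut the dendrogram into three clusters and compare with the species labels
groups <- cutree(hi, k = 3)
print(table(cluster = groups, species = iris$Species))
```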

```r
plot(iris_ctree)
```

Ctree

```
> iris_ctree <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data=iris)
> print(iris_ctree)

     Conditional inference tree with 4 terminal nodes

Response:  Species
Inputs:  Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
Number of observations:  150

1) Petal.Length <= 1.9; criterion = 1, statistic =
  2)*  weights = 50
1) Petal.Length > 1.9
  3) Petal.Width <= 1.7; criterion = 1, statistic =
    4) Petal.Length <= 4.8; criterion = 0.999, statistic =
      5)*  weights = 46
    4) Petal.Length > 4.8
      6)*  weights = 8
  3) Petal.Width > 1.7
    7)*  weights = 46
```

```r
plot(iris_ctree, type = "simple")
```

New dataset to work with trees

```r
library(rpart)  # also provides the kyphosis data

fitK <- rpart(Kyphosis ~ Age + Number + Start, method = "class", data = kyphosis)
printcp(fitK)  # display the results
plotcp(fitK)   # visualize cross-validation results
summary(fitK)  # detailed summary of splits

# plot tree
plot(fitK, uniform = TRUE, main = "Classification Tree for Kyphosis")
text(fitK, use.n = TRUE, all = TRUE, cex = .8)

# create attractive postscript plot of tree
post(fitK, file = "kyphosistree.ps", title = "Classification Tree for Kyphosis")
# might need to convert to PDF (distill)
```


```r
# prune at the cp value with the lowest cross-validated error (xerror)
pfitK <- prune(fitK, cp = fitK$cptable[which.min(fitK$cptable[, "xerror"]), "CP"])
plot(pfitK, uniform = TRUE, main = "Pruned Classification Tree for Kyphosis")
text(pfitK, use.n = TRUE, all = TRUE, cex = .8)
post(pfitK, file = "ptree.ps", title = "Pruned Classification Tree for Kyphosis")
```

```r
require(party)
fitK <- ctree(Kyphosis ~ Age + Number + Start, data = kyphosis)  # replaces the rpart fitK
plot(fitK, main = "Conditional Inference Tree for Kyphosis")
```

```r
plot(fitK, main = "Conditional Inference Tree for Kyphosis", type = "simple")
```

randomForest

```r
require(randomForest)
fitKF <- randomForest(Kyphosis ~ Age + Number + Start, data = kyphosis)
print(fitKF)       # view results
importance(fitKF)  # importance of each predictor
```

```
Call:
 randomForest(formula = Kyphosis ~ Age + Number + Start, data = kyphosis)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 1

        OOB estimate of  error rate: 20.99%
Confusion matrix:
        absent present class.error

> importance(fitKF)
       MeanDecreaseGini
Age
Number
Start
```

Random forests improve predictive accuracy by growing a large number of trees, each on a bootstrap sample of the cases and each considering only a random subset of the variables at every split. A case is classified by every tree in this "forest", and the final predicted outcome combines the results across all trees (an average in regression, a majority vote in classification).
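The bootstrap-and-vote idea can be sketched with rpart alone. Note that the code below is plain bagging, which differs from randomForest in that every split may use all variables rather than a random subset. Out-of-bag (OOB) predictions use, for each case, only the trees whose bootstrap sample did not contain it, which is the idea behind the OOB error rate that randomForest reports. The number of trees (`B = 100`) and the seed are arbitrary choices.

```r
library(rpart)
set.seed(2014)

d <- kyphosis
n <- nrow(d)
B <- 100                                   # number of bootstrapped trees
lev <- levels(d$Kyphosis)
votes <- matrix(0, n, length(lev), dimnames = list(NULL, lev))  # OOB vote counts

for (b in seq_len(B)) {
  boot <- sample(n, n, replace = TRUE)     # bootstrap sample of cases
  oob <- setdiff(seq_len(n), boot)         # cases left out of this sample
  tree <- rpart(Kyphosis ~ Age + Number + Start, data = d[boot, ], method = "class")
  if (length(oob) > 0) {
    p <- predict(tree, d[oob, ], type = "class")
    idx <- cbind(oob, match(as.character(p), lev))  # (case, predicted class) cells
    votes[idx] <- votes[idx] + 1
  }
}

# Majority vote over the trees for which each case was out of bag
has_vote <- rowSums(votes) > 0
oob_pred <- factor(lev[max.col(votes[has_vote, , drop = FALSE], ties.method = "first")],
                   levels = lev)
oob_error <- mean(oob_pred != d$Kyphosis[has_vote])
print(oob_error)
```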

More on another dataset

```r
# Regression Tree Example
library(rpart)
# build the tree
fitM <- rpart(Mileage ~ Price + Country + Reliability + Type, method = "anova", data = cu.summary)
printcp(fitM)  # display the results
```

```
….
Root node error: /60 =

n=60 (57 observations deleted due to missingness)

  CP nsplit rel error xerror xstd
```
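For `method = "anova"`, rpart chooses the split that most reduces the sum of squared deviations about the node mean, SS_T - (SS_L + SS_R). A minimal base-R sketch of that criterion on one numeric predictor follows; `ss` and `best_ss_split` are hypothetical helper names, and the built-in mtcars data (mpg against weight) stand in for cu.summary.

```r
# Sum of squared deviations about the mean
ss <- function(y) sum((y - mean(y))^2)

# Best threshold on numeric x for a continuous response y:
# the largest reduction SS_T - (SS_L + SS_R)
best_ss_split <- function(x, y) {
  v <- sort(unique(x))
  cuts <- (head(v, -1) + tail(v, -1)) / 2
  total <- ss(y)
  gain <- sapply(cuts, function(cut) total - ss(y[x < cut]) - ss(y[x >= cut]))
  list(cut = cuts[which.max(gain)],
       improve = max(gain) / total)  # fraction of the node's SS removed by the split
}

res <- best_ss_split(mtcars$wt, mtcars$mpg)
print(res)
```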

Mileage…

```r
plotcp(fitM)   # visualize cross-validation results
summary(fitM)  # detailed summary of splits
```

```r
par(mfrow = c(1, 2))  # two plots side by side
rsq.rpart(fitM)       # R-square and cross-validated relative error vs. number of splits
```

```r
# plot tree
plot(fitM, uniform = TRUE, main = "Regression Tree for Mileage")
text(fitM, use.n = TRUE, all = TRUE, cex = .8)

# prune the tree
pfitM <- prune(fitM, cp = )  # from cptable

# plot the pruned tree
plot(pfitM, uniform = TRUE, main = "Pruned Regression Tree for Mileage")
text(pfitM, use.n = TRUE, all = TRUE, cex = .8)
```


```r
# Conditional Inference Tree for Mileage
require(party)
fit2M <- ctree(Mileage ~ Price + Country + Reliability + Type, data = na.omit(cu.summary))
```

Example

```r
n <- 150            # number of data points
p <- 2              # dimension
sigma <- 1          # standard deviation of each coordinate (variance = sigma^2)
meanpos <- 0        # centre of the distribution of positive examples
meanneg <- 3        # centre of the distribution of negative examples
npos <- round(n/2)  # number of positive examples
nneg <- n - npos    # number of negative examples

# Generate the positive and negative examples
xpos <- matrix(rnorm(npos*p, mean = meanpos, sd = sigma), npos, p)
xneg <- matrix(rnorm(nneg*p, mean = meanneg, sd = sigma), nneg, p)
x <- rbind(xpos, xneg)

# Generate the labels
y <- matrix(c(rep(1, npos), rep(-1, nneg)))
```

Train/test

```r
ntrain <- round(n*0.8)       # number of training examples
tindex <- sample(n, ntrain)  # indices of training samples
xtrain <- x[tindex, ]
xtest <- x[-tindex, ]
ytrain <- y[tindex]
ytest <- y[-tindex]
istrain <- rep(0, n)
istrain[tindex] <- 1
```

Example

```r
library(kernlab)
svp <- ksvm(xtrain, ytrain, type = "C-svc", kernel = 'vanilladot', C = 100, scaled = c())
```

```
> alpha(svp)
[[1]]
[1]
> svp
Support Vector Machine object of class "ksvm"

SV type: C-svc  (classification)
 parameter : cost C = 100

Linear (vanilla) kernel function.

Number of Support Vectors : 8

Objective Function Value :
Training error :
```

Example

```
> alphaindex(svp)
[[1]]
[1]
> b(svp)
[1]
```

```r
plot(svp, data = xtrain)
```

Cross-validation

Cross-validation is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used when the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice, i.e. in predictive and prescriptive analytics.

Cross-validation

In a prediction problem, a model is usually given a dataset of known data on which it is trained (the training dataset) and a dataset of unknown, previously unseen data against which it is tested (the testing dataset). Sound familiar?

Cross-validation

The goal of cross-validation is to define a dataset for "testing" the model during the training phase (the validation dataset), in order to limit problems such as overfitting, and to give insight into how the model will generalize to an independent dataset (e.g. unknown data from a real problem).

Common types of cross-validation
– K-fold
– 2-fold (do you know this one?)
– Repeated random sub-sampling
– Leave-out sub-sampling (e.g. leave-one-out)

Lab on Friday… to try these out.
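K-fold cross-validation fits in a few lines of R with rpart: assign the rows at random to K roughly equal folds, train on K - 1 folds, test on the held-out fold, and combine the errors. K = 2 gives 2-fold cross-validation and K equal to the number of rows gives leave-one-out. The function name `kfold_error`, the use of the kyphosis data and the seed are illustrative assumptions.

```r
library(rpart)

# Estimate the misclassification rate of an rpart classifier by K-fold CV
kfold_error <- function(formula, data, K = 10, seed = 1) {
  set.seed(seed)
  n <- nrow(data)
  folds <- sample(rep(seq_len(K), length.out = n))  # random fold label per row
  response <- all.vars(formula)[1]
  errors <- numeric(K)
  for (k in seq_len(K)) {
    test <- folds == k
    fit <- rpart(formula, data = data[!test, ], method = "class")
    pred <- predict(fit, data[test, ], type = "class")
    errors[k] <- mean(pred != data[test, response])
  }
  # Weight each fold's error by its size to get the overall error rate
  sum(errors * tabulate(folds, nbins = K)) / n
}

f <- Kyphosis ~ Age + Number + Start
e10 <- kfold_error(f, kyphosis, K = 10)               # 10-fold
e2 <- kfold_error(f, kyphosis, K = 2)                 # 2-fold
eloo <- kfold_error(f, kyphosis, K = nrow(kyphosis))  # leave-one-out
print(c(tenfold = e10, twofold = e2, loo = eloo))
```

Repeated random sub-sampling would instead draw a fresh random train/test split (as in the Glass example) many times and average the test errors.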

Admin info (keep/print this slide)
Class: ITWS-4963/ITWS-6965
Hours: 12:00pm-1:50pm Tuesday/Friday
Location: SAGE 3101
Instructor: Peter Fox
Instructor contact: (do not leave a
Contact hours: Monday** 3:00-4:00pm (or by appt)
Contact location: Winslow 2120 (sometimes Lally 207A announced by )
TA: Lakshmi Chenicheri
Web site:
– Schedule, lectures, syllabus, reading, assignments, etc.

Ozone

```r
library(e1071)
library(rpart)
data(Ozone, package = "mlbench")  # beware of curly quotes (“ ”) copied from slides

## split data into a train and test set
index <- 1:nrow(Ozone)
testindex <- sample(index, trunc(length(index)/3))
testset <- na.omit(Ozone[testindex, -3])
trainset <- na.omit(Ozone[-testindex, -3])

svm.model <- svm(V4 ~ ., data = trainset, cost = 1000, gamma = )
svm.pred <- predict(svm.model, testset[, -3])

## mean squared error; divide by nrow(testset) because na.omit() drops rows
crossprod(svm.pred - testset[, 3]) / nrow(testset)
```

See: