Decision Tree Lab

Load the iris data and display it as a sanity check (just type iris at the prompt). Then load the rpart package, installing it first if necessary; a sketch follows below.
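A minimal setup sketch (the slide does not spell out the exact commands, so treat these as one reasonable way to do it):

# iris ships with base R; data() makes the loading explicit
data(iris)
head(iris)   # sanity check: first six rows
str(iris)    # 150 obs. of 5 variables; Species is a factor

# install rpart only if it is not already available
if (!requireNamespace("rpart", quietly = TRUE)) install.packages("rpart")
library(rpart)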

We will use rpart() to build the tree (the result is saved in a variable named fit). First, understand the arguments to rpart():
rpart(formula, data=, method=, control=)
– formula: outcome ~ predictor1 + predictor2 + …
– data: specifies the data frame
– method: "class" for a classification tree
– control: optional parameters for controlling tree growth
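The full argument list and default values are documented in the help pages; the generic pattern looks like this (an illustrative skeleton with placeholder names, not a runnable model):

?rpart           # help for the fitting function
?rpart.control   # help for the growth-control parameters

# fit <- rpart(outcome ~ pred1 + pred2, data = mydata,
#              method = "class", control = rpart.control(...))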

In the case of the iris dataset:
– formula: Species ~ Petal.Length + Petal.Width + Sepal.Length + Sepal.Width
– data = iris
– method = "class"
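An aside not on the slides: because Species is the only non-predictor column in iris, the same formula can be written with R's dot shorthand, which expands to "all remaining columns":

# these two formulas are equivalent for the iris data frame:
Species ~ Petal.Length + Petal.Width + Sepal.Length + Sepal.Width
Species ~ .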

In the case of the iris dataset:
– control = rpart.control(minsplit=2, cp=0.001)
i.e. a node must contain at least 2 observations for a split to be attempted (minsplit), and a split must improve the overall fit by a factor of at least 0.001 (cp, the cost-complexity parameter).
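For comparison, rpart's defaults (minsplit = 20, cp = 0.01) are far more conservative, so these lab settings deliberately let the tree grow much deeper:

# defaults, for reference: rpart.control(minsplit = 20, cp = 0.01)
ctrl <- rpart.control(minsplit = 2, cp = 0.001)   # the lab's permissive settings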

Altogether:

fit <- rpart(Species ~ Petal.Length + Petal.Width + Sepal.Length + Sepal.Width,
             method = "class", data = iris,
             control = rpart.control(minsplit = 2, cp = 0.001))

Examine the decision tree:

print(fit)
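The slides stop at print(fit); as an extra check not in the original lab, you can cross-tabulate the tree's predictions on the training data against the true species:

pred <- predict(fit, iris, type = "class")       # predicted class for each row
table(actual = iris$Species, predicted = pred)   # confusion matrix

With minsplit = 2 and cp = 0.001 the tree will typically fit the training data almost perfectly; that is overfitting, so accuracy on held-out data would be the fairer test.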

Plot the decision tree:

plot(fit, uniform = TRUE, main = "Classification Tree for Iris Dataset")

Label the tree:

text(fit, use.n = TRUE, all = TRUE, cex = 0.7)
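The base-graphics plot can look cramped; as an optional extra beyond the lab, the rpart.plot package renders the same fitted tree more legibly:

if (!requireNamespace("rpart.plot", quietly = TRUE)) install.packages("rpart.plot")
rpart.plot::rpart.plot(fit)   # splits, class labels, and proportions in one view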