Group 1 Lab 2 exercises and Assignment 2


1 Group 1 Lab 2 exercises and Assignment 2
Peter Fox Data Analytics – ITWS-4600/ITWS-6600/MATP-4450 Group 1, Lab 2, February 1, 2018

2 Labs
2a. Regression: new multivariate dataset
2b. kNN: new Abalone dataset
2c. Kmeans: (sort of) new Iris dataset
Do all three for Assignment 2
And then general exercises

3 The Dataset(s)
http://aquarius.tw.rpi.edu/html/DA
See slides: DataAnalytics2018_Assignment_2.pptx on LMS or (under week 3)
Code fragments on the following slides (i.e. they will not run as-is) are available as group1/lab2_knn1.R, etc.

4 Remember a few useful commands
head(<object>)
tail(<object>)
summary(<object>)
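
A minimal sketch of these three commands, assuming R's built-in iris data frame as a stand-in for whichever lab dataset is loaded (the variable names are illustrative only):

```r
# iris ships with base R; it stands in here for a lab dataset (assumption)
data(iris)

first_rows <- head(iris)      # first 6 rows by default
last_rows  <- tail(iris, 3)   # last 3 rows
stats      <- summary(iris)   # per-column summaries (quartiles, factor counts)

print(first_rows)
print(last_rows)
print(stats)
```

Running these on any newly loaded object is a quick check that the read step produced the columns and types expected.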

5 Regression Exercises
Using the EPI dataset (under /EPI on web), find the single most important factor in increasing the EPI in a given region.
Examine distributions down to the leaf nodes and build up an EPI “model”.

6 boxplot(ENVHEALTH,ECOSYSTEM)

7 qqplot(ENVHEALTH,ECOSYSTEM)
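
The two plot calls above can be tried without the EPI file. The sketch below uses synthetic vectors as stand-ins for ENVHEALTH and ECOSYSTEM (an assumption; these are not the real EPI values) and sends graphics to a null device so it also runs non-interactively:

```r
set.seed(1)
# Synthetic stand-ins for the two EPI columns (assumption, not real data)
env_health <- runif(150, min = 0, max = 100)
ecosystem  <- rnorm(150, mean = 50, sd = 15)

pdf(NULL)  # null graphics device: no file is written
# Side-by-side boxplots; the return value holds the five-number summaries
bp <- boxplot(env_health, ecosystem, names = c("ENVHEALTH", "ECOSYSTEM"))
# Quantile-quantile plot of one sample against the other
qq <- qqplot(env_health, ecosystem, xlab = "ENVHEALTH", ylab = "ECOSYSTEM")
abline(0, 1, lty = 2)  # points close to this line suggest similar distributions
invisible(dev.off())

print(bp$stats)
```

Drop the pdf(NULL) and dev.off() lines to see the plots in an interactive session.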

8 ENVHEALTH / ECOSYSTEM
> shapiro.test(ENVHEALTH)
Shapiro-Wilk normality test
data: ENVHEALTH
W = , p-value = 1.083e
Reject.
> shapiro.test(ECOSYSTEM)
data: ECOSYSTEM
W = , p-value =
~reject
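
How the reject / do-not-reject decision is read off the output can be sketched on synthetic samples (an assumption; the EPI values are not reproduced here). The null hypothesis of the Shapiro-Wilk test is that the sample is drawn from a normal distribution, so a small p-value is evidence against normality:

```r
set.seed(42)
# Synthetic samples (assumption): one normal, one strongly skewed
normal_sample <- rnorm(200, mean = 50, sd = 10)
skewed_sample <- rexp(200, rate = 0.1)

sw_normal <- shapiro.test(normal_sample)
sw_skewed <- shapiro.test(skewed_sample)

print(sw_normal)
print(sw_skewed)

# Decision at the conventional 0.05 level (the level is a choice, not a rule)
reject_skewed <- sw_skewed$p.value < 0.05
print(reject_skewed)
```

The W statistic is close to 1 for normal-looking data and drops as the sample departs from normality.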

9 Kolmogorov-Smirnov (KS) test
> ks.test(ENVHEALTH,ECOSYSTEM)
Two-sample Kolmogorov-Smirnov test
data: ENVHEALTH and ECOSYSTEM
D = , p-value = 5.413e-07
alternative hypothesis: two-sided
Warning message:
In ks.test(ENVHEALTH, ECOSYSTEM) : p-value will be approximate in the presence of ties
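
A small sketch of the two-sample KS test on synthetic data (an assumption; not the EPI columns). D is the largest vertical distance between the two empirical cumulative distribution functions, and a small p-value suggests the samples come from different distributions:

```r
set.seed(7)
# Two synthetic samples with clearly different means (assumption)
a <- rnorm(100, mean = 0, sd = 1)
b <- rnorm(100, mean = 3, sd = 1)

ks <- ks.test(a, b)
print(ks)

d_stat  <- unname(ks$statistic)  # the D statistic, between 0 and 1
p_value <- ks$p.value
```

Continuous random draws like these contain no repeated values, so the ties warning shown on the slide does not appear here; it arises when the data contain duplicate values.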

10 Linear and least-squares
> EPI_data <- read.csv("EPI_data.csv")
> attach(EPI_data)
> boxplot(ENVHEALTH,DALY,AIR_H,WATER_H)
> lmENVH<-lm(ENVHEALTH~DALY+AIR_H+WATER_H)
> lmENVH
… (what should you get?)
> summary(lmENVH)
…
> cENVH<-coef(lmENVH)
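
To see what lm() and coef() return without the EPI file, here is a sketch on synthetic data. The column names mirror the slide, but the values and the weights 0.5 / 0.25 / 0.25 are arbitrary choices for illustration, not EPI facts. It passes data= to lm() instead of using attach(), which is a swapped-in convention rather than the slide's approach:

```r
set.seed(123)
n <- 200
# Synthetic predictors (hypothetical values on a 0-100 scale)
sim <- data.frame(
  DALY    = runif(n, 0, 100),
  AIR_H   = runif(n, 0, 100),
  WATER_H = runif(n, 0, 100)
)
# Response built from known, arbitrary weights plus a little noise
sim$ENVHEALTH <- 0.5 * sim$DALY + 0.25 * sim$AIR_H + 0.25 * sim$WATER_H +
  rnorm(n, sd = 1)

# Least-squares fit; data= avoids relying on attach()
fit <- lm(ENVHEALTH ~ DALY + AIR_H + WATER_H, data = sim)
print(summary(fit))

coefs <- coef(fit)  # named vector: (Intercept), DALY, AIR_H, WATER_H
print(coefs)
```

Because the response was built from known weights, the fitted coefficients should land close to them, which is a useful sanity check before turning to real data.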

11 Predict
> DALYNEW<-c(seq(5,95,5))
> AIR_HNEW<-c(seq(5,95,5))
> WATER_HNEW<-c(seq(5,95,5))
> # columns must carry the predictor names used in lm(), or predict() will not use NEW
> NEW<-data.frame(DALY=DALYNEW,AIR_H=AIR_HNEW,WATER_H=WATER_HNEW)
> pENV<- predict(lmENVH,NEW,interval="prediction")
> cENV<- predict(lmENVH,NEW,interval="confidence")
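
The difference between the two interval types can be sketched on a one-predictor synthetic model (an assumption; the data and coefficients are made up). A confidence interval bounds the mean response at each new point, while a prediction interval bounds a single new observation and is therefore wider:

```r
set.seed(321)
# Synthetic single-predictor data (hypothetical values)
sim <- data.frame(DALY = runif(100, 0, 100))
sim$ENVHEALTH <- 10 + 0.8 * sim$DALY + rnorm(100, sd = 5)
fit <- lm(ENVHEALTH ~ DALY, data = sim)

# The column name must match the predictor name used in the formula
new_data <- data.frame(DALY = seq(5, 95, 5))
p_pred <- predict(fit, new_data, interval = "prediction")
p_conf <- predict(fit, new_data, interval = "confidence")

# Both return a matrix with columns fit, lwr, upr
width_pred <- p_pred[, "upr"] - p_pred[, "lwr"]
width_conf <- p_conf[, "upr"] - p_conf[, "lwr"]
print(head(cbind(p_pred, width_pred, width_conf)))
```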

12 Repeat for AIR_E and CLIMATE

13 Classification Exercises (group1/lab2_knn1.R)
> library(class) # provides knn()
> nyt1<-read.csv("nyt1.csv")
> nyt1<-nyt1[which(nyt1$Impressions>0 & nyt1$Clicks>0 & nyt1$Age>0),]
> nnyt1<-dim(nyt1)[1] # shrink it down!
> sampling.rate=0.9
> num.test.set.labels=nnyt1*(1.-sampling.rate)
> training <-sample(1:nnyt1,sampling.rate*nnyt1, replace=FALSE)
> train<-subset(nyt1[training,],select=c(Age,Impressions))
> testing<-setdiff(1:nnyt1,training)
> test<-subset(nyt1[testing,],select=c(Age,Impressions))
> cg<-nyt1$Gender[training]
> true.labels<-nyt1$Gender[testing]
> classif<-knn(train,test,cg,k=5)
> #
> classif
> attributes(.Last.value)
> # interpretation to come!
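
The same train/test pattern can be run end to end on the built-in iris data, used here as a stand-in for nyt1.csv (an assumption, as are the variable names). The sketch also compares predictions with the held-out labels, which the fragment above sets up with true.labels but does not yet show:

```r
library(class)  # knn() lives in the 'class' package

set.seed(2018)
n <- nrow(iris)
sampling_rate <- 0.9

# Random 90% of row indices for training, the rest for testing
train_idx <- sample(1:n, sampling_rate * n, replace = FALSE)
test_idx  <- setdiff(1:n, train_idx)

features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
train <- iris[train_idx, features]
test  <- iris[test_idx, features]
train_labels <- iris$Species[train_idx]
true_labels  <- iris$Species[test_idx]

# Classify each test row by majority vote among its 5 nearest training rows
pred <- knn(train, test, cl = train_labels, k = 5)

confusion <- table(predicted = pred, actual = true_labels)
accuracy  <- mean(pred == true_labels)
print(confusion)
print(accuracy)
```

kNN works on raw distances, so when features sit on very different scales (as Age and Impressions may), scaling them first is worth considering.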

14 Classification Exercises (group1/lab2_knn2.R)
2 examples in the script

15 Clustering Exercises group1/lab2_kmeans1.R
group1/lab2_kmeans2.R – plotting up results from the iris clustering
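
A minimal sketch of iris clustering with kmeans(), not the contents of the lab scripts themselves. The choice of three centers, nstart = 20, and the plotted feature pair are assumptions made for illustration:

```r
set.seed(10)
features <- iris[, 1:4]  # the four numeric measurements; Species is left out

# nstart > 1 reruns the algorithm from several random starts and keeps the best
km <- kmeans(features, centers = 3, nstart = 20)

# Cross-tabulate cluster assignments against the known species
comparison <- table(cluster = km$cluster, species = iris$Species)
print(km$size)
print(comparison)

pdf(NULL)  # null graphics device so the sketch runs non-interactively
plot(iris$Petal.Length, iris$Petal.Width, col = km$cluster, pch = 19,
     xlab = "Petal.Length", ylab = "Petal.Width")
# Mark the cluster centers on the same two axes
points(km$centers[, c("Petal.Length", "Petal.Width")], pch = 8, cex = 2)
invisible(dev.off())
```

Cluster numbers are arbitrary labels, so the table is read by which species dominates each row, not by matching cluster 1 to the first species.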

