Peter Fox and Greg Hughes Data Analytics – ITWS-4600/ITWS-6600

1 Lab exercises: working with real datasets, plotting, more regression, kNN and K-means…
Peter Fox and Greg Hughes Data Analytics – ITWS-4600/ITWS-6600 Group 2, Lab 1, February 9, 2017

2 Plot tools/ tips
pairs, gpairs, scatterplot.matrix, clustergram, etc.
data() # precip, presidents, iris, swiss, sunspot.month (!), environmental, ethanol, ionosphere
More script fragments in R will be available on the web site ( )
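As a starting point for the plotting tools listed on this slide, here is a minimal sketch using base R's pairs() on the built-in iris data. The color mapping and the plot title are illustrative choices, not taken from the lab scripts.

```r
# Scatterplot matrix of the four numeric iris measurements,
# with points colored by species. iris ships with base R.
data(iris)

# Illustrative color choice: one color per species level
species_cols <- c("red", "darkgreen", "blue")[as.integer(iris$Species)]

pairs(iris[, 1:4],
      col = species_cols,
      pch = 19,
      main = "iris: pairwise scatterplots")
```

The same call works on any data frame of numeric columns, so it can be tried on swiss or the other datasets named above.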

3 Scripts – work through these
See in folder group2/ lab1_pairs1.R lab1_splom.R lab1_gpairs1.R lab1_mosaic.R lab1_spm.R lab1_wknn.R lab1_kknn1.R lab1_kknn2.R lab1_kknn3.R lab1_kmeans1.R lab1_ctree2.R lab1_nyt.R lab1_bronx1.R lab1_bronx2.R

4 K Nearest Neighbors (classification)
Script – group2/lab1_nyt.R
> nyt1<-read.csv("nyt1.csv")
… from week 3b slides or script
> classif<-knn(train,test,cg,k=5) #
> head(true.labels)
[1]
> head(classif)
[1]
Levels: 0 1
> ncorrect<-true.labels==classif
> table(ncorrect)["TRUE"] # or
> length(which(ncorrect))
What do you conclude?
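The counting step on this slide can be tried end to end without nyt1.csv. The sketch below substitutes the built-in iris data for the NYT data, and the 70/30 split and seed are assumptions made only for illustration. The variable names train, test, cg, true.labels and classif mirror the slide.

```r
library(class)  # provides knn()

set.seed(1)     # arbitrary seed so the split is repeatable
data(iris)

# Hypothetical 70/30 train/test split (not from the lab script)
train_id    <- sample(nrow(iris), size = 105)
train       <- iris[train_id, 1:4]
test        <- iris[-train_id, 1:4]
cg          <- iris$Species[train_id]    # class labels of the training rows
true.labels <- iris$Species[-train_id]   # held-out labels to compare against

classif <- knn(train, test, cg, k = 5)

# Same bookkeeping as on the slide
ncorrect <- true.labels == classif
table(ncorrect)

# Fraction correct, plus a confusion matrix showing where the errors fall
accuracy <- mean(ncorrect)
accuracy
table(predicted = classif, actual = true.labels)
```

The confusion matrix is usually more informative than the single count of correct predictions, because it shows which classes are being confused with each other.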

5 NYC Housing data

6 Bronx 1 = Regression
> plot(log(bronx$GROSS.SQUARE.FEET), log(bronx$SALE.PRICE) )
> m1<-lm(log(SALE.PRICE)~log(GROSS.SQUARE.FEET),data=bronx)
You were reminded that log(0) is … not fun. THINK through what you are doing…
Filtering is somewhat inevitable:
> bronx<-bronx[which(bronx$GROSS.SQUARE.FEET>0 & bronx$LAND.SQUARE.FEET>0 & bronx$SALE.PRICE>0),]
Lab5b_bronx1_2016.R
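To see why the filter is needed, here is a small sketch on synthetic data. The data frame toy is a made-up stand-in for bronx that reuses only the column names from the slide. The values and the number of zero rows are invented for illustration.

```r
# log(0) is -Inf, which a least-squares fit cannot use
log(0)

set.seed(42)
n <- 200

# Synthetic stand-in for the Bronx sales data (invented values):
# the first 10 rows have zero gross square feet, the last 5 have zero price
toy <- data.frame(
  GROSS.SQUARE.FEET = c(rep(0, 10), round(runif(n - 10, 500, 5000))),
  LAND.SQUARE.FEET  = round(runif(n, 1000, 6000)),
  SALE.PRICE        = c(round(runif(n - 5, 1e5, 1e6)), rep(0, 5))
)

# Keep only rows where every variable that will be logged is strictly positive
keep <- toy$GROSS.SQUARE.FEET > 0 &
        toy$LAND.SQUARE.FEET  > 0 &
        toy$SALE.PRICE        > 0
toy_clean <- toy[keep, ]

# How many rows did the filter remove?
nrow(toy) - nrow(toy_clean)

# The log-log regression now sees only finite values
m1 <- lm(log(SALE.PRICE) ~ log(GROSS.SQUARE.FEET), data = toy_clean)
coef(m1)
```

Counting the dropped rows is worth doing on the real data as well, since a filter that removes a large share of the sales changes what the model describes.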

7 Interpreting this!
Call:
lm(formula = log(SALE.PRICE) ~ log(GROSS.SQUARE.FEET), data = bronx)
Residuals:
Min 1Q Median 3Q Max
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) <2e-16 ***
log(GROSS.SQUARE.FEET) <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.95 on 2435 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 2435 DF, p-value: < 2.2e-16
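One way to read a log-log fit like this: the slope is an elasticity, so a 1% change in square footage goes with roughly a slope-percent change in price. The sketch below checks that reading on simulated data. The true slope of 0.8, the intercept, and the noise level are invented for illustration and say nothing about the Bronx coefficients.

```r
set.seed(7)

# Simulate prices that follow a power law in square footage
# (all parameter values are made up for this illustration)
sqft  <- runif(500, 500, 5000)
price <- exp(8 + 0.8 * log(sqft) + rnorm(500, sd = 0.3))

fit <- lm(log(price) ~ log(sqft))

# Estimated elasticity; it should land near the simulated 0.8
b1 <- unname(coef(fit)[2])
b1

# Multiplicative change in predicted price when square footage doubles
2^b1

# Share of variance in log(price) explained by log(sqft)
summary(fit)$r.squared
```

On the real output, also compare the residual standard error (1.95 on the log scale) against the spread of log(SALE.PRICE) before trusting the significance stars.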

8 Plots – tell me what they tell you!

9 Solution model 2
> m2<-lm(log(SALE.PRICE)~log(GROSS.SQUARE.FEET)+log(LAND.SQUARE.FEET)+factor(NEIGHBORHOOD),data=bronx)
> summary(m2)
> plot(resid(m2))
#
> m2a<-lm(log(SALE.PRICE)~0+log(GROSS.SQUARE.FEET)+log(LAND.SQUARE.FEET)+factor(NEIGHBORHOOD),data=bronx)
> summary(m2a)
> plot(resid(m2a))
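The only difference between m2 and m2a is the 0+ term, which drops the intercept. The sketch below shows what that does when a factor is in the model, using the built-in iris data as a stand-in. The choice of variables is arbitrary and not from the lab script.

```r
data(iris)

# With intercept: one species level is absorbed into the baseline
m_int   <- lm(log(Sepal.Length) ~ log(Petal.Length) + factor(Species),
              data = iris)

# Without intercept: every species level gets its own coefficient
m_noint <- lm(log(Sepal.Length) ~ 0 + log(Petal.Length) + factor(Species),
              data = iris)

names(coef(m_int))
names(coef(m_noint))

# The two parameterizations give the same fitted values...
all.equal(unname(fitted(m_int)), unname(fitted(m_noint)))

# ...but R-squared is computed against zero rather than the mean
# when there is no intercept, so the two values are not comparable
c(with_intercept    = summary(m_int)$r.squared,
  without_intercept = summary(m_noint)$r.squared)
```

So a jump in R-squared between summary(m2) and summary(m2a) does not by itself mean that m2a fits better. The residual plots are the fairer comparison.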

10 How do you interpret this residual plot?

11 Solution model 3 and 4
> m3<-lm(log(SALE.PRICE)~0+log(GROSS.SQUARE.FEET)+log(LAND.SQUARE.FEET)+factor(NEIGHBORHOOD)+factor(BUILDING.CLASS.CATEGORY),data=bronx)
> summary(m3)
> plot(resid(m3))
#
> m4<-lm(log(SALE.PRICE)~0+log(GROSS.SQUARE.FEET)+log(LAND.SQUARE.FEET)+factor(NEIGHBORHOOD)*factor(BUILDING.CLASS.CATEGORY),data=bronx)
> summary(m4)
> plot(resid(m4))
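Models m3 and m4 differ only in joining the two factors with + (additive effects) or * (main effects plus their interaction). A minimal sketch of that difference, using the built-in warpbreaks data as a stand-in because it has two ready-made factors (wool and tension):

```r
data(warpbreaks)

# Additive: the tension effect is assumed to be the same for both wools
m_add <- lm(log(breaks) ~ wool + tension, data = warpbreaks)

# Interaction: a * b expands to a + b + a:b, so each
# wool/tension combination can have its own level
m_int <- lm(log(breaks) ~ wool * tension, data = warpbreaks)

names(coef(m_add))
names(coef(m_int))

# F-test for whether the extra interaction terms earn their keep
anova(m_add, m_int)
```

With many neighborhoods and building classes, the interaction in m4 can add a very large number of coefficients, some of them for combinations with few or no sales. That is worth keeping in mind when reading summary(m4).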

12 And this one?

13 Bronx 2 = complex example
See lab1_bronx2.R: manipulation, mapping, knn, kmeans

14

15 KNN! Did you loop over k?
{
knnpred<-knn(mapcoord[trainid,3:4],mapcoord[testid,3:4],cl=mapcoord[trainid,2],k=5)
knntesterr<-sum(knnpred!=mappred$class)/length(testid)
}
knntesterr
[1]
What do you think?
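A loop over k might look like the sketch below. It uses the built-in iris data in place of mapcoord, and the split size, seed, and grid of k values are assumptions chosen for illustration. The names trainid, testid, knnpred and knntesterr follow the slide.

```r
library(class)  # provides knn()

set.seed(2)     # arbitrary seed so the split is repeatable
data(iris)

# Hypothetical split: 100 training rows, 50 test rows
trainid <- sample(nrow(iris), 100)
testid  <- setdiff(seq_len(nrow(iris)), trainid)

# Odd values of k only (an illustrative choice of grid)
ks <- seq(1, 15, by = 2)
knntesterr <- numeric(length(ks))

for (i in seq_along(ks)) {
  knnpred <- knn(iris[trainid, 1:4], iris[testid, 1:4],
                 cl = iris$Species[trainid], k = ks[i])
  # Test error: share of test rows whose predicted class is wrong
  knntesterr[i] <- sum(knnpred != iris$Species[testid]) / length(testid)
}

data.frame(k = ks, test_error = knntesterr)

# k with the lowest test error on this particular split
ks[which.min(knntesterr)]
```

The best k depends on the split, so repeating the loop with different seeds, or using cross-validation, gives a steadier answer than a single run.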

16 Plot()

