Group 1 Lab 2 exercises and Assignment 2

Peter Fox
Data Analytics – ITWS-4600/ITWS-6600/MATP-4450
Group 1, Lab 2, February 1, 2018

Labs
2a. Regression: new multivariate dataset
2b. kNN: new Abalone dataset
2c. Kmeans: (sort of) new Iris dataset
Do all three for Assignment 2
And then general exercises

The Dataset(s)
http://aquarius.tw.rpi.edu/html/DA
See slides: DataAnalytics2018_Assignment_2.pptx on LMS or https://tw.rpi.edu/web/Courses/DataAnalytics/2018 (under week 3).
The code on the following slides consists of fragments (it will not run as-is); the full scripts are available as group1/lab2_knn1.R, etc.

Remember a few useful commands:
head(<object>)
tail(<object>)
summary(<object>)
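As a quick illustration of these inspection commands, here is a minimal sketch. It assumes R's built-in iris data frame as a stand-in for whatever object is loaded in the lab; the variable names are hypothetical.

```r
# Built-in iris data stands in for any data frame loaded in the lab (assumption).
data(iris)

first_rows <- head(iris)      # first 6 rows by default
last_rows  <- tail(iris, 3)   # last 3 rows
overview   <- summary(iris)   # per-column summaries (quartiles, counts)

print(first_rows)
print(last_rows)
print(overview)
```

Running these three commands right after reading a file is a cheap way to catch wrong column types or unexpected missing values before modelling.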

Regression Exercises
Using the EPI dataset (under /EPI on the web), find the single most important factor in increasing the EPI in a given region.
Examine distributions down to the leaf nodes and build up an EPI “model”.

boxplot(ENVHEALTH,ECOSYSTEM)

qqplot(ENVHEALTH,ECOSYSTEM)

ENVHEALTH / ECOSYSTEM
> shapiro.test(ENVHEALTH)
Shapiro-Wilk normality test
data: ENVHEALTH
W = 0.9161, p-value = 1.083e-08
------- Reject.
> shapiro.test(ECOSYSTEM)
data: ECOSYSTEM
W = 0.9813, p-value = 0.02654
----- ~reject (rejected at the 0.05 level, but not at 0.01)
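The reject / do-not-reject reading of a Shapiro-Wilk result can be sketched on simulated data. This is only an illustration: the samples, the seed, and the 0.05 threshold are assumptions, not values from the EPI dataset.

```r
# Simulated samples (assumption): one drawn from a normal distribution,
# one from a strongly skewed exponential distribution.
set.seed(42)
normal_sample <- rnorm(200, mean = 50, sd = 10)
skewed_sample <- rexp(200, rate = 0.1)

sw_normal <- shapiro.test(normal_sample)
sw_skewed <- shapiro.test(skewed_sample)

alpha <- 0.05  # conventional significance level (assumption)
# The null hypothesis is "the data are normally distributed";
# a p-value below alpha leads to rejecting normality.
cat("normal sample: W =", sw_normal$statistic, " p =", sw_normal$p.value, "\n")
cat("skewed sample: W =", sw_skewed$statistic, " p =", sw_skewed$p.value, "\n")
cat("reject normality for skewed sample:", sw_skewed$p.value < alpha, "\n")
```

A small p-value only says the sample is unlikely under normality; pairing the test with a Q-Q plot shows how the data depart from it.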

Kolmogorov-Smirnov (KS) test
> ks.test(ENVHEALTH,ECOSYSTEM)
Two-sample Kolmogorov-Smirnov test
data: ENVHEALTH and ECOSYSTEM
D = 0.2965, p-value = 5.413e-07
alternative hypothesis: two-sided
Warning message:
In ks.test(ENVHEALTH, ECOSYSTEM) : p-value will be approximate in the presence of ties
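A two-sample KS test compares the empirical distribution functions of two samples; D is the largest vertical gap between them. The sketch below uses simulated continuous data (an assumption, chosen so that no ties occur and the tie warning does not appear).

```r
# Two simulated samples whose means differ by one standard deviation (assumption).
set.seed(1)
x <- rnorm(150, mean = 0)
y <- rnorm(150, mean = 1)

ks <- ks.test(x, y)  # two-sided by default

cat("D =", ks$statistic, " p-value =", ks$p.value, "\n")
# A small p-value suggests the two samples do not come from the same distribution.
```

With real, rounded measurements, repeated values (ties) are common, and `ks.test` then reports an approximate p-value, as in the warning shown on the slide.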

Linear and least-squares
> EPI_data <- read.csv("EPI_data.csv")
> attach(EPI_data)
> boxplot(ENVHEALTH,DALY,AIR_H,WATER_H)
> lmENVH<-lm(ENVHEALTH~DALY+AIR_H+WATER_H)
> lmENVH
… (what should you get?)
> summary(lmENVH)
…
> cENVH<-coef(lmENVH)

Predict
> DALYNEW<-c(seq(5,95,5))
> AIR_HNEW<-c(seq(5,95,5))
> WATER_HNEW<-c(seq(5,95,5))
> # column names in the new data frame must match the predictor names used in lm()
> NEW<-data.frame(DALY=DALYNEW,AIR_H=AIR_HNEW,WATER_H=WATER_HNEW)
> pENV<- predict(lmENVH,NEW,interval="prediction")
> cENV<- predict(lmENVH,NEW,interval="confidence")
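The fit-then-predict steps can be tried without the EPI file. The sketch below simulates a data frame with the same column names; the weights 0.5, 0.25, 0.25 and the noise level are arbitrary simulation choices (assumptions), not the lab's answer.

```r
# Simulated stand-in for the EPI columns (assumption: values and weights are made up).
set.seed(7)
n <- 100
sim <- data.frame(
  DALY    = runif(n, 0, 100),
  AIR_H   = runif(n, 0, 100),
  WATER_H = runif(n, 0, 100)
)
sim$ENVHEALTH <- 0.5 * sim$DALY + 0.25 * sim$AIR_H + 0.25 * sim$WATER_H +
  rnorm(n, sd = 2)

# Passing data = sim avoids attach() and keeps the variable lookup explicit.
fit <- lm(ENVHEALTH ~ DALY + AIR_H + WATER_H, data = sim)

# New data must use the same column names as the model's predictors.
new_data <- data.frame(
  DALY    = seq(5, 95, 5),
  AIR_H   = seq(5, 95, 5),
  WATER_H = seq(5, 95, 5)
)

p_int <- predict(fit, new_data, interval = "prediction")  # interval for a new observation
c_int <- predict(fit, new_data, interval = "confidence")  # interval for the mean response

print(coef(fit))
print(head(p_int))
print(head(c_int))
```

Both calls return a matrix with columns `fit`, `lwr`, and `upr`. The prediction interval is wider than the confidence interval at every row because it also accounts for the residual noise of a single new observation.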

Repeat for AIR_E and CLIMATE

Classification Exercises (group1/lab2_knn1.R)
> library(class) # provides knn()
> nyt1<-read.csv("nyt1.csv")
> nyt1<-nyt1[which(nyt1$Impressions>0 & nyt1$Clicks>0 & nyt1$Age>0),]
> nnyt1<-dim(nyt1)[1] # shrink it down!
> sampling.rate=0.9
> num.test.set.labels=nnyt1*(1.-sampling.rate)
> training <-sample(1:nnyt1,sampling.rate*nnyt1, replace=FALSE)
> train<-subset(nyt1[training,],select=c(Age,Impressions))
> testing<-setdiff(1:nnyt1,training)
> test<-subset(nyt1[testing,],select=c(Age,Impressions))
> cg<-nyt1$Gender[training]
> true.labels<-nyt1$Gender[testing]
> classif<-knn(train,test,cg,k=5) #
> classif
> attributes(.Last.value) # interpretation to come!
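The same train/test split and `knn()` call can be run end to end on a dataset that ships with R. This sketch swaps in the built-in iris data for nyt1.csv (an assumption) and adds one step the fragment stops short of: comparing the predictions with the held-out true labels. All variable names are hypothetical.

```r
library(class)  # provides knn(); a recommended package bundled with standard R installs

set.seed(123)
data(iris)
n <- nrow(iris)

sampling_rate <- 0.9
train_idx <- sample(1:n, round(sampling_rate * n), replace = FALSE)
test_idx  <- setdiff(1:n, train_idx)

train <- iris[train_idx, 1:4]          # the four numeric measurements
test  <- iris[test_idx, 1:4]
train_labels <- iris$Species[train_idx]
true_labels  <- iris$Species[test_idx]

# Each test row gets the majority label among its 5 nearest training rows.
pred <- knn(train, test, train_labels, k = 5)

confusion <- table(predicted = pred, actual = true_labels)
accuracy  <- mean(pred == true_labels)

print(confusion)
cat("accuracy:", accuracy, "\n")
```

The confusion table shows which classes get mixed up, which is usually more informative than the single accuracy figure. Because kNN relies on distances, predictors on very different scales may need rescaling first.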

Classification Exercises (group1/lab2_knn2.R)
Two examples in the script

Clustering Exercises
group1/lab2_kmeans1.R
group1/lab2_kmeans2.R – plotting the results from the iris clustering
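For orientation, a k-means run on the iris measurements might look like the sketch below. It is not the content of the lab scripts; the choice of three centers, `nstart = 20`, and the plotted columns are assumptions made for illustration.

```r
set.seed(101)
data(iris)
features <- iris[, 1:4]   # cluster on the numeric columns only

# nstart = 20 reruns the algorithm from 20 random starts and keeps the best fit.
km <- kmeans(features, centers = 3, nstart = 20)

# Cross-tabulate cluster membership against the known species
# (the species labels are not used by kmeans itself).
cross_tab <- table(cluster = km$cluster, species = iris$Species)
print(cross_tab)
cat("total within-cluster sum of squares:", km$tot.withinss, "\n")

# Plot only in an interactive session so the script also runs in batch mode.
if (interactive()) {
  plot(features$Petal.Length, features$Petal.Width,
       col = km$cluster, pch = 19,
       xlab = "Petal.Length", ylab = "Petal.Width",
       main = "k-means clusters on iris (k = 3)")
}
```

Cluster numbers are arbitrary labels, so the cross-tabulation should be read by looking at which species dominates each row, not by matching cluster 1 to the first species.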