Data Analytics – ITWS-4600/ITWS-6600/MATP-4450


1 Data Analytics – ITWS-4600/ITWS-6600/MATP-4450
Assignment 2 (in lab). Peter Fox. Week 3, February 1, 2018.

2 Lab consists of
Regression: new multivariate dataset
kNN: new Abalone dataset
Kmeans: (sort of) new Iris dataset
Need all three for Assignment 2

3 The Dataset(s) http://aquarius.tw.rpi.edu/html/DA
Two new ones: dataset_multipleRegression.csv, abalone.csv
And …. Visit this link:
Code fragments (i.e. they will not run as-is) are on the following slides as Lab2_knn1.R, etc.

4 How does this work?
The following slides have 3 tasks for you to complete individually.
Once you complete one or all of them, please raise your hand or approach me or Akshay to review what you obtained (10% of grade).
There is nothing to hand in* if you complete it today.
*If you do not complete part or all of it today, that is okay, but you will need to schedule a time to show your results or submit screenshots and descriptions via LMS.

5 Regression (1) Retrieve this dataset: dataset_multipleRegression.csv
Using the unemployment rate (UNEM) and the number of spring high school graduates (HGRAD), predict this year's fall enrollment (ROLL), given UNEM=9% and HGRAD=100,000.
Repeat, adding per capita income (INC) to the model. Predict ROLL if INC=$30,000.
Summarize and compare the two models. Comment on the significance of the predictors.
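A minimal sketch of how the two regression models might be fit and compared in R. The data frame below is a synthetic stand-in with made-up values (the real file is not reproduced here), and it assumes the CSV has columns named ROLL, UNEM, HGRAD and INC with UNEM recorded in percent; the name `enroll` is hypothetical.

```r
# Synthetic stand-in for dataset_multipleRegression.csv; all values are made up.
# With the real file: enroll <- read.csv("dataset_multipleRegression.csv")
set.seed(42)
n <- 30
enroll <- data.frame(
  UNEM  = runif(n, 5, 11),         # unemployment rate (assumed to be in percent)
  HGRAD = runif(n, 9000, 20000),   # spring high school graduates
  INC   = runif(n, 1800, 3400)     # per capita income
)
enroll$ROLL <- -9000 + 450 * enroll$UNEM + 0.4 * enroll$HGRAD +
  4 * enroll$INC + rnorm(n, sd = 500)

# Model 1: enrollment from unemployment and graduates.
m1 <- lm(ROLL ~ UNEM + HGRAD, data = enroll)
# Model 2: the same predictors plus per capita income.
m2 <- lm(ROLL ~ UNEM + HGRAD + INC, data = enroll)

# Predictions at the values given in the task.
p1 <- predict(m1, newdata = data.frame(UNEM = 9, HGRAD = 100000))
p2 <- predict(m2, newdata = data.frame(UNEM = 9, HGRAD = 100000, INC = 30000))

# Coefficient tables (p-values) and an F-test comparing the nested models.
print(summary(m1))
print(summary(m2))
print(anova(m1, m2))
```

The `summary()` output gives per-coefficient significance, while `anova(m1, m2)` tests whether adding INC improves the fit.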

6 Classification (2) Retrieve the abalone.csv dataset
The task is to predict the age of abalone from physical measurements. The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope: a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age.
Perform kNN classification to predict Age (Rings). Interpretation is not required.
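One possible shape for the kNN task, sketched with `knn()` from the `class` package. The data frame is a synthetic stand-in for abalone.csv: the column names (`length`, `diameter`, `height`, `rings`), the cut points for the age classes, the 70/30 split and the choice of k are all assumptions made for illustration, not values taken from the lab files.

```r
library(class)  # provides knn()

# Hypothetical stand-in for abalone.csv; column names and values are assumptions.
# With the real file: abalone <- read.csv("abalone.csv")
set.seed(1)
n <- 300
rings <- sample(3:20, n, replace = TRUE)
abalone <- data.frame(
  length   = 0.20 + 0.020 * rings + rnorm(n, sd = 0.05),
  diameter = 0.15 + 0.015 * rings + rnorm(n, sd = 0.04),
  height   = 0.05 + 0.005 * rings + rnorm(n, sd = 0.02),
  rings    = rings
)

# Bin rings into three age classes (cut points chosen only for illustration).
abalone$age <- cut(abalone$rings, breaks = c(-Inf, 8, 11, Inf),
                   labels = c("young", "adult", "old"))

# Min-max normalize predictors so no single measurement dominates the distance.
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
features <- as.data.frame(
  lapply(abalone[, c("length", "diameter", "height")], normalize)
)

# 70/30 train/test split.
train_idx <- sample(n, size = round(0.7 * n))
train_x <- features[train_idx, ]
test_x  <- features[-train_idx, ]
train_y <- abalone$age[train_idx]
test_y  <- abalone$age[-train_idx]

# k near sqrt(training size) is a common rule of thumb, not a tuned value.
k <- round(sqrt(length(train_idx)))
pred <- knn(train = train_x, test = test_x, cl = train_y, k = k)

# Confusion matrix and overall accuracy on the held-out rows.
confusion <- table(actual = test_y, predicted = pred)
accuracy <- sum(diag(confusion)) / sum(confusion)
print(confusion)
print(accuracy)
```

Because kNN is distance-based, the normalization step matters whenever the measurements are on different scales.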

7 Clustering (3) The Iris dataset (in R use data("iris") to load it)
The 5th column is the species, and you want to find how many clusters there are without using that information.
Create a new data frame and remove the fifth column.
Apply kmeans (you choose k) with 1000 iterations.
Use table(iris[,5],<your clustering>) to assess your results.
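The clustering steps can be sketched as follows. The choice k = 3 and the seed are assumptions for illustration (the task leaves k up to you), and "1000 iterations" is read here as `iter.max = 1000`; the names `iris_features`, `fit` and `result` are hypothetical.

```r
data("iris")

# New data frame without the species column (column 5).
iris_features <- iris[, -5]

set.seed(123)  # kmeans starts from random centers, so fix the seed for repeatability
k <- 3         # illustrative choice; try other values and compare
fit <- kmeans(iris_features, centers = k, iter.max = 1000)

# Cross-tabulate the true species against the cluster assignments.
result <- table(iris[, 5], fit$cluster)
print(result)
```

Cluster numbers are arbitrary labels, so read the table by checking whether each species falls mostly into a single column rather than expecting a diagonal.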

