Practice Project Overview


1 Practice Project Overview
CSCE 4143: Data Mining, Yueyang Wang, Spring 2019

2 Data: Adult dataset

3 Description of the dataset
Figure 1: Boxplots of the numeric attributes. Online Source:

4 Data Preprocessing: Remove records with unknown (?) values from both the train and test data sets
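A minimal plain-Python sketch of this step, assuming the comma-separated layout of the UCI Adult files, where a comma and a space separate the fields and a missing value is written as `?`. The sample rows and the helpers `read_records` and `drop_unknown` are hypothetical, and only 4 of the 15 columns are shown.

```python
import csv
import io

# Hypothetical rows in the Adult file layout (only 4 of the 15 columns shown).
SAMPLE = """39, State-gov, Bachelors, <=50K
50, ?, HS-grad, >50K
38, Private, ?, <=50K
53, Private, 11th, <=50K
"""

def read_records(text):
    """Parse comma-separated records; skipinitialspace drops the blank after each comma."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [row for row in reader if row]  # skip empty lines

def drop_unknown(rows):
    """Keep only records in which no field is the missing-value marker '?'."""
    return [row for row in rows if "?" not in row]

records = read_records(SAMPLE)
clean = drop_unknown(records)
print(len(records), len(clean))  # 4 2
```

Running the same function over both files keeps the train and test sets cleaned by an identical rule.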

5 Data Preprocessing: Remove all continuous attributes
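One way to express this step, as a sketch: the column order below follows the UCI `adult.names` description, whose six continuous attributes are age, fnlwgt, education-num, capital-gain, capital-loss and hours-per-week. The name `income` for the class column follows these slides; the helper `drop_continuous` and the sample record are illustrative.

```python
# Column order of the Adult data files; "income" is the class label.
COLUMNS = ["age", "workclass", "fnlwgt", "education", "education-num",
           "marital-status", "occupation", "relationship", "race", "sex",
           "capital-gain", "capital-loss", "hours-per-week", "native-country",
           "income"]

CONTINUOUS = {"age", "fnlwgt", "education-num", "capital-gain",
              "capital-loss", "hours-per-week"}

def drop_continuous(row, columns=COLUMNS, continuous=CONTINUOUS):
    """Return (names, values), keeping only the categorical attributes and the label."""
    keep = [i for i, name in enumerate(columns) if name not in continuous]
    return [columns[i] for i in keep], [row[i] for i in keep]

# Illustrative record in the Adult layout.
row = ["39", "State-gov", "77516", "Bachelors", "13", "Never-married",
       "Adm-clerical", "Not-in-family", "White", "Male", "2174", "0", "40",
       "United-States", "<=50K"]
names, values = drop_continuous(row)
print(names)
print(values)
```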

6 Q1.a Build a decision tree classifier (single tree) and report the per-class accuracy measures (TP rate, FP rate, precision, recall, F1) on the test data. (Weka)
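Weka prints these measures in its "Detailed Accuracy By Class" table. The hypothetical helper below computes them one-vs-rest from true and predicted labels, to make each column concrete. It uses TP rate = TP/(TP+FN), which is the same quantity as recall; FP rate = FP/(FP+TN); precision = TP/(TP+FP); and F1 = 2·precision·recall/(precision+recall). The labels are made up.

```python
def per_class_metrics(y_true, y_pred):
    """One-vs-rest TP rate, FP rate, precision, recall and F1 for every class label."""
    labels = sorted(set(y_true) | set(y_pred))
    report = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        tn = len(y_true) - tp - fp - fn
        tp_rate = tp / (tp + fn) if tp + fn else 0.0  # identical to recall
        fp_rate = fp / (fp + tn) if fp + tn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * tp_rate / (precision + tp_rate)
              if precision + tp_rate else 0.0)
        report[c] = {"tp_rate": tp_rate, "fp_rate": fp_rate,
                     "precision": precision, "recall": tp_rate, "f1": f1}
    return report

y_true = ["<=50K", "<=50K", ">50K", ">50K", "<=50K"]
y_pred = ["<=50K", ">50K", ">50K", "<=50K", "<=50K"]
for label, m in per_class_metrics(y_true, y_pred).items():
    print(label, {k: round(v, 3) for k, v in m.items()})
```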

7 Q1.a Build a decision tree classifier (single tree) and report the per-class accuracy measures (TP rate, FP rate, precision, recall, F1) on the test data. (scikit-learn)
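The following is a minimal scikit-learn sketch. It assumes the records are already cleaned and reduced to categorical attributes. `DecisionTreeClassifier` needs numeric input, so this version one-hot encodes the attributes first. The encoding choice, the toy rows, and the FP-rate loop are assumptions of the sketch. `classification_report` covers precision, recall (TP rate) and F1, and the FP rate is derived from `confusion_matrix`.

```python
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up records with two categorical attributes (workclass, sex).
X_train = [["Private", "Male"], ["Private", "Female"], ["Self-emp-inc", "Male"],
           ["State-gov", "Female"], ["Self-emp-inc", "Female"], ["Private", "Male"]]
y_train = ["<=50K", "<=50K", ">50K", "<=50K", ">50K", "<=50K"]
X_test = [["Self-emp-inc", "Male"], ["Private", "Female"], ["State-gov", "Male"]]
y_test = [">50K", "<=50K", ">50K"]

# One-hot encode the nominal attributes, then fit a single decision tree.
model = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                      DecisionTreeClassifier(random_state=0))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Precision, recall (= TP rate) and F1 for each class.
print(classification_report(y_test, y_pred, zero_division=0))

# The report has no FP rate; derive it per class from the confusion matrix.
labels = sorted(set(y_test) | set(y_pred))
cm = confusion_matrix(y_test, y_pred, labels=labels)
for i, label in enumerate(labels):
    fp = cm[:, i].sum() - cm[i, i]
    tn = cm.sum() - cm[i, :].sum() - cm[:, i].sum() + cm[i, i]
    print(label, "FP rate:", fp / (fp + tn) if fp + tn else 0.0)
```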

8 Q1.b Build a naïve Bayes classifier and report the per-class accuracy measures (TP rate, FP rate, precision, recall, F1) on the test data. (Weka)

9 Q1.b Build a naïve Bayes classifier and report the per-class accuracy measures (TP rate, FP rate, precision, recall, F1) on the test data. (scikit-learn)
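For purely nominal attributes, scikit-learn's `CategoricalNB` is one reasonable choice. It expects each attribute as non-negative integer codes, so this sketch puts an `OrdinalEncoder` in front of it. `alpha=1.0` is Laplace smoothing. The toy rows are made up. With the default encoder, every test value must also appear in the training data. The FP rate can be derived from the confusion matrix exactly as in the decision-tree case.

```python
from sklearn.metrics import classification_report
from sklearn.naive_bayes import CategoricalNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

# Tiny made-up records with two categorical attributes (workclass, sex).
X_train = [["Private", "Male"], ["Private", "Female"], ["Self-emp-inc", "Male"],
           ["State-gov", "Female"], ["Self-emp-inc", "Female"], ["Private", "Male"]]
y_train = ["<=50K", "<=50K", ">50K", "<=50K", ">50K", "<=50K"]
X_test = [["Self-emp-inc", "Male"], ["Private", "Female"], ["State-gov", "Male"]]
y_test = [">50K", "<=50K", ">50K"]

# Encode each category as an integer code, then fit a categorical naive Bayes.
model = make_pipeline(OrdinalEncoder(), CategoricalNB(alpha=1.0))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, zero_division=0))
```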

10 Data Preprocessing: Use one-hot encoding to transform multi-valued categorical attributes. (Weka)

11 Data Preprocessing: Use one-hot encoding to transform multi-valued categorical attributes
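A plain-Python sketch of one-hot encoding. In Weka, the unsupervised `NominalToBinary` attribute filter does the equivalent job. The helper `one_hot` is hypothetical and encodes every column it is given, including two-valued ones. Reusing the training domains for the test set keeps the columns aligned, so a value never seen in training encodes as all zeros.

```python
def one_hot(records, column_names, domains=None):
    """Expand every categorical column into one 0/1 column per value in its domain.

    Pass the `domains` returned for the training data when encoding the test
    data so both sets get the same columns; unseen test values encode as all zeros.
    """
    if domains is None:
        domains = [sorted({r[j] for r in records}) for j in range(len(column_names))]
    header = [f"{name}={value}"
              for name, dom in zip(column_names, domains) for value in dom]
    encoded = [[1 if r[j] == value else 0
                for j, dom in enumerate(domains) for value in dom]
               for r in records]
    return header, encoded, domains

train = [["Private", "Male"], ["State-gov", "Female"], ["Private", "Female"]]
header, train_rows, domains = one_hot(train, ["workclass", "sex"])
print(header)      # ['workclass=Private', 'workclass=State-gov', 'sex=Female', 'sex=Male']
print(train_rows)  # [[1, 0, 0, 1], [0, 1, 1, 0], [1, 0, 1, 0]]

test = [["Self-emp-inc", "Male"]]
_, test_rows, _ = one_hot(test, ["workclass", "sex"], domains)
print(test_rows)   # [[0, 0, 0, 1]]
```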

12 Data Preprocessing: For each numerical attribute, use its mean value as a threshold to transform it into a binary attribute. (Python)
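A minimal sketch of mean thresholding. The helper `binarize_by_mean` is hypothetical. It assumes "above the mean" maps to 1 and "at or below the mean" maps to 0; the slide does not say which side the mean itself falls on. The threshold computed on the training data is reused for the test data.

```python
from statistics import mean

def binarize_by_mean(values, threshold=None):
    """Map each value to 1 if it exceeds the threshold (default: the column mean), else 0."""
    if threshold is None:
        threshold = mean(values)
    return [1 if v > threshold else 0 for v in values], threshold

ages = [39, 50, 38, 53, 28]          # made-up training column
binary_ages, age_mean = binarize_by_mean(ages)
print(age_mean, binary_ages)         # 41.6 [0, 1, 0, 1, 0]

# Reuse the training mean for the test column.
test_binary, _ = binarize_by_mean([45, 30], threshold=age_mean)
print(test_binary)                   # [1, 0]
```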

13 Q2.a Run the k-means clustering algorithm on the training data with k = 3, 5, and 10, using a distance function of your choice, and report the cluster centroids. Panels: K=3, K=5, K=10
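The following is a plain-Python sketch of Lloyd's k-means with a pluggable distance function (Euclidean by default). It starts from k randomly chosen rows. The update step always uses the mean, which is the standard k-means choice for Euclidean distance. With a different distance, such as Manhattan, a median update (k-medians) would be the matching choice. The helper names and the toy points are hypothetical. On the real training data, call it with k = 3, 5 and 10 and print `centroids`.

```python
import random

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def kmeans(points, k, distance=euclidean, max_iter=100, seed=0):
    """Lloyd's k-means: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k random starting rows
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

data = [[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]]
centroids, labels = kmeans(data, k=2)
print(centroids, labels)
```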

14 Q2.a Run the k-means clustering algorithm on the training data with k = 3, 5, and 10, using a distance function of your choice, and report the cluster centroids. Transform income (<=50K, >50K) into a binary attribute.
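A small sketch of the income mapping, assuming 1 for `>50K` and 0 for `<=50K` (the slide does not fix which value is 1). The UCI test file writes its labels with a trailing period (`>50K.`), so the hypothetical helper strips it before comparing.

```python
def income_to_binary(label):
    """Map the Adult class label to 1 for '>50K' and 0 for '<=50K'."""
    # Strip surrounding blanks and the trailing period used in the test file.
    label = label.strip().rstrip(".")
    if label == ">50K":
        return 1
    if label == "<=50K":
        return 0
    raise ValueError(f"unexpected income label: {label!r}")

print([income_to_binary(s) for s in ["<=50K", ">50K", " >50K.", "<=50K."]])  # [0, 1, 1, 0]
```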

15 Q2.b Using the last 10 records of the test data, run the kNN algorithm with k = 3, 5, and 10 and report the prediction accuracy. Panels: K=3, K=5, K=10
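Below is a minimal kNN sketch with majority voting. It assumes binary feature vectors, such as those produced by the one-hot and mean-threshold steps, and uses Hamming distance; both choices are assumptions, and any distance function can be passed in. Ties in the vote go to the class that appears first among the nearest records, because `Counter.most_common` keeps first-encountered order for equal counts. The data and helper names are hypothetical. With the real data, use k = 3, 5 and 10.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k, distance=hamming):
    """Majority vote among the k training records closest to the query."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: distance(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def knn_accuracy(train_X, train_y, test_X, test_y, k):
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)

train_X = [[1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0], [1, 0, 0]]
train_y = [1, 1, 0, 0, 0]
test_X = [[1, 1, 0], [0, 0, 1]]
test_y = [0, 1]

last_X, last_y = test_X[-10:], test_y[-10:]  # the last 10 test records
for k in (1, 3, 5):
    print(k, knn_accuracy(train_X, train_y, last_X, last_y, k))
```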

16 Q2.b Using the last 10 records of the test data, run the kNN algorithm with k = 3, 5, and 10 and report the prediction accuracy.

17 Q3. Using the training data from step 2, build an SVM classifier and report its prediction accuracy on the test data. (Weka)

18 Q3. Using the training data from step 2, build an SVM classifier and report its prediction accuracy on the test data.
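A minimal scikit-learn sketch on made-up binary features; in practice these would be the preprocessed rows from step 2. A linear kernel with `C=1.0` is an assumption. Weka's SMO uses a degree-1 polynomial kernel by default, so a linear kernel is a reasonable counterpart for comparing the two runs.

```python
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Made-up binary feature vectors; label 1 stands for >50K.
X_train = [[1, 0, 1], [1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
y_train = [1, 1, 1, 0, 0, 0]
X_test = [[1, 0, 0], [0, 1, 1]]
y_test = [1, 0]

clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
```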

19 Q4. Using the training data from step 2, build a neural network classifier and report its prediction accuracy on the test data. (Weka)

20 Q4. Using the training data from step 2, build a neural network classifier and report its prediction accuracy on the test data.
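One way to do this in scikit-learn is a multilayer perceptron. In this sketch, the single hidden layer of 8 units, the `lbfgs` solver (often a good fit for very small training sets), and the toy rows are all assumptions, not settings from the slides. Accuracy on real data depends on these choices and on the random seed.

```python
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier

# Made-up binary feature vectors; label 1 stands for >50K.
X_train = [[1, 0, 1], [1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
y_train = [1, 1, 1, 0, 0, 0]
X_test = [[1, 0, 0], [0, 1, 1]]
y_test = [1, 0]

# One hidden layer of 8 units, trained with L-BFGS.
clf = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs",
                    max_iter=1000, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print("accuracy:", acc)
```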

21 Questions?

