1
Practice Project Overview
CSCE 4143: Data Mining, Yueyang Wang, Spring 2019
2
Data: Adult dataset
3
Description of dataset
Figure 1: Boxplots of numeric attributes. Online Source:
4
Data Preprocessing: Remove records with unknown (?) values from both train and test data sets
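A minimal sketch of this filtering step, assuming the records are comma-separated text with "?" marking an unknown value. The helper name drop_unknown and the two sample records are made up for illustration; the same function would be applied to both the train and the test file.

```python
import csv
import io


def drop_unknown(rows, missing="?"):
    """Keep only the rows in which no field equals the missing-value marker."""
    return [row for row in rows if all(field.strip() != missing for field in row)]


# Two made-up records in a comma-separated layout (illustrative only).
sample = io.StringIO(
    "39, State-gov, Bachelors, <=50K\n"
    "54, ?, HS-grad, >50K\n"
)
rows = [row for row in csv.reader(sample) if row]  # skip blank lines
clean_rows = drop_unknown(rows)
print(len(rows), "->", len(clean_rows))
```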
5
Data Preprocessing: Remove all continuous attributes
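One way to sketch this step in plain Python, assuming the usual column order of the Adult dataset. The column name "income" for the class label follows the wording used later in these slides, and the sample record is illustrative only.

```python
# Attribute names of the Adult dataset, in file order ("income" is the class label).
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country", "income",
]
# The six continuous attributes to be removed.
CONTINUOUS = {
    "age", "fnlwgt", "education-num",
    "capital-gain", "capital-loss", "hours-per-week",
}


def drop_continuous(row):
    """Return a dict holding only the categorical attributes and the class label."""
    return {
        name: value.strip()
        for name, value in zip(COLUMNS, row)
        if name not in CONTINUOUS
    }


# Illustrative record.
row = ["39", "State-gov", "77516", "Bachelors", "13", "Never-married",
       "Adm-clerical", "Not-in-family", "White", "Male", "2174", "0", "40",
       "United-States", "<=50K"]
kept = drop_continuous(row)
print(kept)
```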
6
Q1.a Build a decision tree classifier (single tree) and report per-class accuracy measures (TP rate, FP rate, precision, recall, F1) on the test data. Using Weka.
7
Q1.a Build a decision tree classifier (single tree) and report per-class accuracy measures (TP rate, FP rate, precision, recall, F1) on the test data. Using Scikit-Learn.
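The per-class measures requested here can be computed from true and predicted labels alone, independently of the tool that produced the predictions. The sketch below is a hand-rolled illustration, not the output format of Weka or Scikit-Learn; the function name per_class_report and the toy labels are assumptions. Note that TP rate and recall are the same quantity.

```python
def per_class_report(y_true, y_pred):
    """Compute TP rate, FP rate, precision, recall and F1 for every class."""
    report = {}
    pairs = list(zip(y_true, y_pred))
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        tn = len(pairs) - tp - fp - fn
        recall = tp / (tp + fn) if tp + fn else 0.0      # identical to TP rate
        fp_rate = fp / (fp + tn) if fp + tn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[cls] = {
            "tp_rate": recall, "fp_rate": fp_rate,
            "precision": precision, "recall": recall, "f1": f1,
        }
    return report


# Toy labels, made up for illustration.
y_true = [">50K", ">50K", "<=50K", "<=50K", "<=50K"]
y_pred = [">50K", "<=50K", "<=50K", "<=50K", ">50K"]
report = per_class_report(y_true, y_pred)
for cls, scores in report.items():
    print(cls, scores)
```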
8
Q1.b Build a naïve Bayesian classifier and report per-class accuracy measures (TP rate, FP rate, precision, recall, F1) on the test data. Using Weka.
9
Q1.b Build a naïve Bayesian classifier and report per-class accuracy measures (TP rate, FP rate, precision, recall, F1) on the test data. Using Scikit-Learn.
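To make the idea concrete, here is a from-scratch sketch of a naïve Bayes classifier over categorical attributes. It is not the Weka or Scikit-Learn implementation; the add-one (Laplace) smoothing, the function names, and the toy records are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    domains = defaultdict(set)           # attribute index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict_nb(model, row):
    """Return the class with the highest log-posterior for one record."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Add-one smoothing avoids zero probabilities (an assumption here).
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(domains[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy (workclass, education) records, made up for illustration.
rows = [("Private", "Bachelors"), ("Private", "HS-grad"),
        ("Self-emp", "Bachelors"), ("Private", "HS-grad")]
labels = [">50K", "<=50K", ">50K", "<=50K"]
model = train_nb(rows, labels)
print(predict_nb(model, ("Private", "Bachelors")))
print(predict_nb(model, ("Private", "HS-grad")))
```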
10
Data Preprocessing: Use one-hot encoding to transform multi-valued categorical attributes. Using Weka.
11
Data Preprocessing: Use one-hot encoding to transform multi-valued categorical attributes.
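One-hot encoding replaces a categorical attribute with one 0/1 column per distinct value. The sketch below is a plain-Python illustration with made-up helper names and values. Learning the category list from the train data only, and encoding an unseen value as all zeros, are assumptions.

```python
def fit_one_hot(values):
    """Return the sorted list of distinct categories seen in the train data."""
    return sorted(set(values))


def one_hot(value, categories):
    """Encode one value as a 0/1 vector; an unseen value becomes all zeros."""
    return [1 if value == category else 0 for category in categories]


# Made-up workclass values for illustration.
train_workclass = ["Private", "State-gov", "Private", "Self-emp-inc"]
categories = fit_one_hot(train_workclass)
encoded = [one_hot(value, categories) for value in train_workclass]
print(categories)
print(encoded)
```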
12
Data Preprocessing: For each numerical attribute, use the mean value as a threshold to transform it into a binary attribute. Using Python.
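A small sketch of the mean-threshold transformation. Two details are assumptions rather than something the slide states: the mean is computed on the train data and reused for the test data, and values strictly above the mean map to 1. The ages are made up.

```python
def mean_threshold(train_values):
    """Mean of a numerical attribute, computed on the train data."""
    return sum(train_values) / len(train_values)


def binarize(values, threshold):
    """Map each value to 1 if it is above the threshold, else 0."""
    return [1 if value > threshold else 0 for value in values]


# Made-up ages for illustration.
train_age = [25, 38, 52, 45]
test_age = [30, 41]
threshold = mean_threshold(train_age)
train_bits = binarize(train_age, threshold)
test_bits = binarize(test_age, threshold)
print(threshold, train_bits, test_bits)
```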
13
Q2.a Run the k-means clustering algorithm over the train data with varied k values (3, 5, 10), based on your chosen distance function, and report the centroids of the clusters. [Results shown for K=3, K=5, K=10]
14
Q2.a Run the k-means clustering algorithm over the train data with varied k values (3, 5, 10), based on your chosen distance function, and report the centroids of the clusters. Transform income (<=50K, >50K) to binary.
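The following is a minimal sketch of Lloyd's k-means iteration, assuming Euclidean distance as the chosen distance function and taking the first k points as initial centroids (a simplification; random initialization is more common). The binary toy vectors are made up. On 0/1 data, each centroid coordinate is the fraction of cluster members that have that attribute set.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with Euclidean distance; returns the final centroids."""
    centroids = [tuple(p) for p in points[:k]]  # simplistic initialization
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[j]  # keep the centroid of an empty cluster
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids


# Made-up binary records (for example, after one-hot encoding).
points = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 0, 0),
          (0, 0, 1, 0), (1, 1, 0, 0), (0, 0, 1, 1)]
centroids = kmeans(points, k=2)
for centroid in centroids:
    print(centroid)
```

For the assignment, the same call would be repeated with k set to 3, 5, and 10 on the full train data.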
15
Q2.b Use the last 10 records from test data and use kNN algorithm (with varied k values, 3, 5, 10) to report the prediction accuracy. [Results shown for K=3, K=5, K=10]
16
Q2.b Use the last 10 records from test data and use kNN algorithm (with varied k values, 3, 5, 10) to report the prediction accuracy.
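A plain-Python sketch of kNN with majority voting, assuming Euclidean distance over the encoded attributes. The toy vectors and labels are made up, and because the toy train set has only six records, k is varied over 1, 3, 5 here instead of 3, 5, 10.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Predict the majority label among the k nearest train records."""
    neighbors = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def knn_accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of test records whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y)
    )
    return hits / len(test_x)


# Made-up encoded records for illustration.
train_x = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 1), (0, 1, 1), (0, 0, 0)]
train_y = [">50K", ">50K", ">50K", "<=50K", "<=50K", "<=50K"]
test_x = [(1, 1, 0), (0, 0, 1)]
test_y = [">50K", "<=50K"]

# Keep only the last 10 test records (the slice also works on shorter lists).
last_x, last_y = test_x[-10:], test_y[-10:]
for k in (1, 3, 5):
    print(k, knn_accuracy(train_x, train_y, last_x, last_y, k))
```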
17
Q3. Using the train datasets from step 2, build an SVM classifier and report the prediction accuracy on the test data. Using Weka.
18
Q3. Using the train datasets from step 2, build an SVM classifier and report the prediction accuracy on the test data.
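As an illustration of what an SVM classifier optimizes, here is a sketch of a linear SVM trained by batch subgradient descent on the regularized hinge loss. This is a swapped-in teaching technique, not the solver used by Weka or any other library; the hyperparameters, function names, and toy data are assumptions. Labels are coded as +1 and -1 (for example, >50K and <=50K).

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * |w|^2 + mean hinge loss by batch subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


def accuracy(w, b, X, y):
    """Fraction of records classified correctly."""
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(X)


# Made-up, linearly separable toy data.
train_X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
train_y = [1, 1, 1, -1, -1, -1]
test_X = [(4, 1), (-1, -4)]
test_y = [1, -1]

w, b = train_linear_svm(train_X, train_y)
print("test accuracy:", accuracy(w, b, test_X, test_y))
```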
19
Q4. Using the train datasets from step 2, build a neural network classifier and report the prediction accuracy on the test data. Using Weka.
20
Q4. Using the train datasets from step 2, build a neural network classifier and report the prediction accuracy on the test data.
21
Questions?