Presentation is loading. Please wait.

Presentation is loading. Please wait.

TeamMember1 TeamMember2 Machine Learning Project 2016/2017-2

Similar presentations


Presentation on theme: "TeamMember1 TeamMember2 Machine Learning Project 2016/2017-2"— Presentation transcript:

1 TeamMember1 TeamMember2 Machine Learning Project 2016/2017-2
MyTeam TeamMember1 TeamMember2 Machine Learning Project 2016/2017-2

2 Titanic Kaggle Challenge
Task What sorts of people were likely to survive? Binary classification Database Instances are people Basic features: Ticket class, Sex, Age, #siblings, #parents, Ticket number, Passanger fare, Port of embarkation Tools python and scikit-learn

3 Evaluation methodology
75/25 train/eval split: Train set: 891 instances Evaluation set: 419 instances Evaluation metric: accuracy (ratio of correctly classified instances)

4 Baseline Most frequent class classifier: 64.13% Baseline system:
NaiveBayes classifier Given features as they are accuracy of 78.48%

5 Feature Extraction (TeamMember1)
Split level and number of cabin Pl. B45 -> B, 45 (2 features) Missing age attributes Replacement with mean Mean grouped by sex and ticket type Bins for age child < 18 <= adult < 60 <= senior Title as new feature Dr, Mr, Mrs, stb.

6 Extra techniques (TeamMember1)
Feature selection keep only k-best features evaluation of features: Chi2 PMI

7 Results -- Features (TeamMember1)

8 Results – Extra techniques (TeamMember1)

9 Machine Learning models (TeamMember2)
NaiveBayes SVM Neural Nets (3-layer) C 4.5 RandomForestClassifier K-NN GridSearch for parameter-tuning for each of the classifiers

10 Extra techniques (TeamMember2)
Dimension reduction PCA SVD t-SNE

11 Results – Models (TeamMember2)

12 Results – Extra techniques (TeamMember2)

13 Results – class level evaluation (TeamMember2)
Best system features: … t-SNE to 6 dimensions Random Forest accuracy of 85.2% Class Precision Recall F1 0.78 0.92 0.85 1 0.80 0.54 0.64

14 Conclusions Best features Best classifier
cabin level title Best classifier RandomForest Feature selection and dimension reduction could further help 78.5% -> 85.2%


Download ppt "TeamMember1 TeamMember2 Machine Learning Project 2016/2017-2"

Similar presentations


Ads by Google