MyTeam TeamMember1 TeamMember2 Machine Learning Project 2016/2017-2
Titanic Kaggle Challenge
Task: What sorts of people were likely to survive? Binary classification.
Dataset: instances are people. Basic features: ticket class, sex, age, #siblings, #parents, ticket number, passenger fare, port of embarkation.
Tools: Python and scikit-learn.
Evaluation methodology
75/25 train/eval split:
- Train set: 891 instances
- Evaluation set: 419 instances
Evaluation metric: accuracy (ratio of correctly classified instances)
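The accuracy metric is simple enough to state in a few lines of Python; the labels below are illustrative toy values, not the project's predictions:

```python
def accuracy(y_true, y_pred):
    """Ratio of correctly classified instances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy example: 4 of 5 predictions match the gold labels
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(accuracy(y_true, y_pred))  # 0.8
```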
Baseline
Most frequent class classifier: 64.13% accuracy
Baseline system: Naive Bayes classifier with the features as they are: accuracy of 78.48%
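Both baselines map onto standard scikit-learn classes. A minimal sketch on a hypothetical feature matrix and labels (so the scores are not the slide's 64.13% / 78.48%):

```python
from sklearn.dummy import DummyClassifier
from sklearn.naive_bayes import GaussianNB

# Hypothetical numeric features and labels, class 0 in the majority
X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 0, 1, 1]

# Most frequent class classifier
majority = DummyClassifier(strategy="most_frequent").fit(X, y)

# Naive Bayes on the features as they are
nb = GaussianNB().fit(X, y)

print(majority.predict([[9]]))  # always the majority class, 0
print(nb.predict([[5]]))        # 1: close to the class-1 training points
```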
Feature Extraction (TeamMember1)
Cabin split into level and number, e.g. B45 -> B, 45 (2 features)
Missing age attributes: replaced with the mean, grouped by sex and ticket type
Age bins: child < 18 <= adult < 60 <= senior
Title as a new feature: Dr, Mr, Mrs, etc.
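These transformations are straightforward in pandas. The rows below are made up for illustration, and for brevity the mean imputation is shown globally rather than grouped by sex and ticket type:

```python
import pandas as pd

# Made-up rows mimicking the Titanic columns used above
df = pd.DataFrame({
    "Name":  ["Smith, Mr. John", "Doe, Mrs. Jane", "Roe, Dr. Max"],
    "Cabin": ["B45", "C23", None],
    "Age":   [40.0, None, 65.0],
})

# Cabin split into level and number: B45 -> B, 45 (2 features)
df["CabinLevel"] = df["Cabin"].str[0]
df["CabinNumber"] = df["Cabin"].str[1:]

# Title extracted from the name: Dr, Mr, Mrs, ...
df["Title"] = df["Name"].str.extract(r",\s*(\w+)\.", expand=False)

# Missing ages replaced with the mean (global here for brevity)
df["Age"] = df["Age"].fillna(df["Age"].mean())

# Age bins: child < 18 <= adult < 60 <= senior
df["AgeBin"] = pd.cut(df["Age"], bins=[0, 18, 60, 200],
                      labels=["child", "adult", "senior"], right=False)
```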
Extra techniques (TeamMember1)
Feature selection: keep only the k best features
Feature scoring: Chi2, PMI
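Chi2-based selection is available out of the box in scikit-learn (PMI would need a custom scoring function). A sketch on a hypothetical non-negative feature matrix:

```python
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical count-like features; chi2 requires non-negative values.
# Feature 0 is identical across classes, features 1 and 2 track the label.
X = [[1, 0, 5],
     [2, 0, 6],
     [1, 1, 0],
     [2, 1, 1]]
y = [0, 0, 1, 1]

# Keep only the k best features by chi2 score
selector = SelectKBest(chi2, k=2).fit(X, y)
X_new = selector.transform(X)
print(selector.get_support())  # feature 0 dropped
```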
Results – Features (TeamMember1)
Results – Extra techniques (TeamMember1)
Machine Learning models (TeamMember2)
Naive Bayes, SVM, Neural Nets (3-layer), C4.5, RandomForestClassifier, k-NN
Grid search for parameter tuning of each classifier
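A sketch of the grid-search step for one of the models; the parameter grid and data below are hypothetical, since the slides don't list the actual grids:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the Titanic feature matrix
X, y = make_classification(n_samples=100, n_features=8, random_state=0)

# Hypothetical grid; in practice each classifier gets its own grid
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50], "max_depth": [3, None]},
    cv=3,
    scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```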
Extra techniques (TeamMember2)
Dimension reduction: PCA, SVD, t-SNE
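All three reducers have scikit-learn implementations. A sketch on random stand-in data; note that t-SNE only offers fit_transform, so unlike PCA/SVD it cannot project unseen evaluation instances:

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))  # stand-in feature matrix

X_pca = PCA(n_components=2).fit_transform(X)
X_svd = TruncatedSVD(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)
```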
Results – Models (TeamMember2)
Results – Extra techniques (TeamMember2)
Results – class level evaluation (TeamMember2)
Best system: features …, t-SNE to 6 dimensions, Random Forest; accuracy of 85.2%

Class | Precision | Recall | F1
0     | 0.78      | 0.92   | 0.85
1     | 0.80      | 0.54   | 0.64
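Per-class precision/recall/F1 like the table above comes directly from sklearn.metrics; the labels below are toy values, so the numbers differ from the slide's:

```python
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical gold labels and predictions for the two classes
y_true = [0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 1]

prec, rec, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1]
)
print(prec, rec, f1)
```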
Conclusions
Best features: cabin level, title
Best classifier: RandomForest
Feature selection and dimension reduction could further help: 78.5% -> 85.2%