Published by Cuthbert Palmer. Modified over 9 years ago.
2
Goal: Predict who survived the Titanic disaster
Process steps (as labeled on the slide): Submit Predictions, Statistics & Analysis, Data Management, Hypotheses, Get Data
Score = number of passengers in the test dataset whose fate is correctly predicted
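The score described on this slide is a count of correct predictions. A minimal sketch in base R follows; the vectors `actual` and `predicted` are hypothetical values made up for illustration, not Titanic data.

```r
# Hypothetical outcome vectors for illustration only (1 = survived, 0 = did not).
actual    <- c(1, 0, 0, 1, 0, 1)
predicted <- c(1, 0, 1, 1, 0, 0)

# Score as defined on the slide: number of passengers predicted correctly.
score <- sum(predicted == actual)

# The same score expressed as a share of the test set.
accuracy <- score / length(actual)

print(score)
print(accuracy)
```

Dividing by the test-set size makes scores comparable across datasets of different sizes.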
3
Training and Test Data
Training Data: N = 891, 39% survived
Test Data: N = 418
All Titanic Passengers: N = 2,223
Develop Model
How similar is the Test Data to the Training Data? If similar, the model should do well. If different, the model could perform poorly.
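One simple way to check how similar the two datasets are is to compare the distribution of each predictor side by side. The sketch below uses synthetic data frames named `train` and `test`; the row counts follow the slide, but every value is randomly generated and the column choice is an assumption.

```r
set.seed(7)  # reproducible synthetic data

# Synthetic stand-ins: sizes follow the slide, values are made up.
train <- data.frame(sex = sample(c("female", "male"), 891, replace = TRUE),
                    age = round(runif(891, 1, 70)))
test  <- data.frame(sex = sample(c("female", "male"), 418, replace = TRUE),
                    age = round(runif(418, 1, 70)))

# Categorical predictor: compare class proportions in each dataset.
sex_share <- rbind(train = prop.table(table(train$sex)),
                   test  = prop.table(table(test$sex)))
print(round(sex_share, 2))

# Numeric predictor: compare summary statistics in each dataset.
print(summary(train$age))
print(summary(test$age))
```

Large gaps between the two rows, or between the two summaries, would suggest the model may not transfer well to the test data.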
4
Kitchen Sink Over-Fitting?
5
Decision Tree Pruning
library(rpart)
# Limit the tree to two levels of splits to reduce over-fitting
model.6 <- rpart(survived ~ sex + age + pclass + sibsp + parch + fare + embarked, data = train_data, maxdepth = 2)
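To illustrate what the `maxdepth` setting does, the sketch below grows a deliberately permissive tree and a depth-limited tree on the same data, then counts the leaves of each. The data frame `d` is synthetic, the survival rule is made up, and the helper `n_leaves` is a hypothetical name, so this is only an illustration of the idea, not the presenter's analysis.

```r
library(rpart)

set.seed(1)  # reproducible synthetic data

# Synthetic stand-in data; all values are made up for illustration.
n <- 500
d <- data.frame(
  sex  = factor(sample(c("female", "male"), n, replace = TRUE)),
  age  = round(runif(n, 1, 70)),
  fare = round(rexp(n, 1 / 30), 2)
)
# Made-up rule: females survive more often than males.
d$survived <- factor(rbinom(n, 1, ifelse(d$sex == "female", 0.75, 0.2)))

# A permissive tree: cp = 0 and a small minsplit let it keep splitting on noise.
deep <- rpart(survived ~ sex + age + fare, data = d, method = "class",
              cp = 0, minsplit = 5)

# The same formula limited to two levels of splits, as on the slide.
shallow <- rpart(survived ~ sex + age + fare, data = d, method = "class",
                 maxdepth = 2)

# Count terminal nodes (leaves) in a fitted tree.
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

print(n_leaves(deep))
print(n_leaves(shallow))
```

A tree with `maxdepth = 2` can have at most four leaves, which keeps it from memorizing individual training rows.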
6
Hold Out and Cross-Validation
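A minimal sketch of both ideas, assuming a classification tree from `rpart` as the model: hold-out fits on one random half and scores on the other, while k-fold cross-validation holds out each fold once. The data frame `d` is synthetic and the helper `fit_and_score` is a hypothetical name introduced for this example.

```r
library(rpart)

set.seed(42)  # reproducible synthetic data and splits

# Synthetic stand-in for the training data; columns and values are made up.
n <- 400
d <- data.frame(
  sex    = factor(sample(c("female", "male"), n, replace = TRUE)),
  pclass = factor(sample(1:3, n, replace = TRUE)),
  age    = round(runif(n, 1, 70))
)
# Made-up rule: females survive more often than males.
d$survived <- factor(rbinom(n, 1, ifelse(d$sex == "female", 0.75, 0.2)))

# Fit a tree on one subset and return its accuracy on another subset.
fit_and_score <- function(train, test) {
  fit  <- rpart(survived ~ sex + pclass + age, data = train, method = "class")
  pred <- predict(fit, newdata = test, type = "class")
  mean(pred == test$survived)
}

# Hold-out: fit on a random half, score on the remaining half.
idx <- sample(n, n / 2)
holdout_acc <- fit_and_score(d[idx, ], d[-idx, ])

# 5-fold cross-validation: every row is held out exactly once.
k <- 5
fold <- sample(rep(1:k, length.out = n))
cv_acc <- sapply(1:k, function(i) fit_and_score(d[fold != i, ], d[fold == i, ]))

print(holdout_acc)
print(mean(cv_acc))
```

Averaging over folds gives a steadier estimate than a single hold-out split, at the cost of fitting the model k times.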
7
Random Forest: Multiple Trees
8
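Confusion Matrix
Models compared (as labeled on the slide): Random Forest, Gender, Decision Tree
Annotations: False Positives, False Negatives

          0      1   %Err
0       243     43    15%
1        38    122    24%
Total          446    18%

          0      1   %Err
0       250     36    13%
1        52    108    33%
Total          446    20%

          0      1   %Err
0       243     43    15%
1        52    108    33%
Total          446    21%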
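The error percentages in a confusion matrix can be reproduced from its counts. The sketch below uses the counts from the first matrix on this slide; treating rows as the actual class and columns as the predicted class is an assumption, since the slide text does not label the axes.

```r
# Counts copied from the first matrix on the slide.
# Assumption: rows are the actual class, columns are the predicted class.
cm <- matrix(c(243,  43,
                38, 122),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

# Per-row error: misclassified count divided by the row total.
row_err <- 1 - diag(cm) / rowSums(cm)

# Overall error: all misclassified counts divided by the grand total.
overall_err <- 1 - sum(diag(cm)) / sum(cm)

print(sum(cm))                   # 446
print(round(100 * row_err))      # 15 and 24
print(round(100 * overall_err))  # 18
```

The rounded results match the 15%, 24%, and 18% shown for that matrix.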
9
Model Ceiling
Values shown on the slide: 320, 418, 340
Gender Model Seems Realistic
10
Why a Model Ceiling?
Below are 4 pairs of passengers with very similar predictor variables; yet, within each pair, one survived and the other did not. At some point there just isn’t the data or variable to help make an accurate prediction.

survived | pclass | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked
1 | 2 | Louch, Mrs. Charles Alexander (Alice Adelaide Slow) | female | 42 | 1 | 0 | SC/AH 3085 | 26 | | S
0 | 2 | Carter, Mrs. Ernest Courtenay (Lilian Hughes) | female | 44 | 1 | 0 | 244252 | 26 | | S
1 | 3 | Asplund, Miss. Lillian Gertrud | female | 5 | 4 | 2 | 347077 | 31.3875 | | S
0 | 3 | Andersson, Miss. Ebba Iris Alfrida | female | 6 | 4 | 2 | 347082 | 31.275 | | S
1 | 1 | Bjornstrom-Steffansson, Mr. Mauritz Hakan | male | 28 | 0 | 0 | 110564 | 26.55 | C52 | S
0 | 1 | Long, Mr. Milton Clyde | male | 29 | 0 | 0 | 113501 | 30 | D6 | S
1 | 1 | Simonius-Blumer, Col. Oberst Alfons | male | 56 | 0 | 0 | 13213 | 35.5 | A26 | C
0 | 1 | Smith, Mr. James Clinch | male | 56 | 0 | 0 | 17764 | 30.6958 | A7 | C