Submit Predictions Statistics & Analysis Data Management Hypotheses Goal Get Data Predict whom survived the Titanic Disaster Score = Number of Passengers.


1

2 Goal: Predict who survived the Titanic disaster. Score = number of passengers in the test dataset whose fate is correctly predicted. Process steps: Get Data, Hypotheses, Data Management, Statistics & Analysis, Submit Predictions.
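The score defined on this slide is a count of matching predictions. A minimal sketch in R, using two short made-up vectors (`actual` and `predicted` are hypothetical and not taken from the competition data):

```r
# Hypothetical outcomes: 1 = survived, 0 = did not survive
actual    <- c(1, 0, 0, 1, 1, 0)
predicted <- c(1, 0, 1, 1, 0, 0)

# Score = number of passengers whose fate was predicted correctly
score <- sum(predicted == actual)

# Accuracy is the same score as a share of all passengers scored
accuracy <- score / length(actual)
```

On the real test dataset the denominator would be the 418 test passengers.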

3 Training and Test Data. All Titanic passengers: N = 2,223. Training data: N = 891, 39% survived. Test data: N = 418. Develop the model on the training data. How similar is the test data to the training data? If similar, the model should do well. If different, the model could perform poorly.

4 Kitchen Sink Over-Fitting?

5 Decision Tree Pruning model.6 <- rpart(survived ~ sex + age + pclass + sibsp + parch + fare + embarked, data = train_data, maxdepth=2)
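As a rough illustration of limiting tree size, the sketch below fits a default tree and a depth-limited tree with `rpart`. It assumes R's built-in `Titanic` table (columns `Class`, `Sex`, `Age`, `Survived`) as a stand-in for the slide's `train_data`, so the predictors differ from those in the slide's formula. The `cp = 0.02` pruning threshold is an arbitrary illustrative value.

```r
library(rpart)

# Assumption: the built-in Titanic table stands in for train_data.
# Expand the frequency table to one row per person.
counts     <- as.data.frame(Titanic)
passengers <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                     c("Class", "Sex", "Age", "Survived")]

# Tree grown with rpart's default settings
full_tree <- rpart(Survived ~ Class + Sex + Age,
                   data = passengers, method = "class")

# Same model, but growth stops at depth 2 (as on the slide)
small_tree <- rpart(Survived ~ Class + Sex + Age,
                    data = passengers, method = "class",
                    control = rpart.control(maxdepth = 2))

# Alternative: grow first, then cut back by complexity parameter
pruned_tree <- prune(full_tree, cp = 0.02)

# Depth of each node, derived from rpart's node numbering
node_depth <- floor(log2(as.integer(rownames(small_tree$frame))))
```

Limiting depth stops growth early, while `prune()` removes branches after the tree is grown. Both give a smaller tree that is less likely to over-fit.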

6 Hold Out and Cross-Validation
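The two validation schemes named on this slide can be sketched as follows. This is one possible setup and not necessarily the presenter's: it assumes the built-in `Titanic` table as example data, a 50/50 hold-out split, and 5 folds, all chosen for illustration.

```r
library(rpart)
set.seed(42)  # arbitrary seed so the random splits are repeatable

# Assumption: built-in Titanic table expanded to one row per person
counts     <- as.data.frame(Titanic)
passengers <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                     c("Class", "Sex", "Age", "Survived")]
n <- nrow(passengers)

# Hold-out: fit on one half, measure accuracy on the other half
holdout_idx <- sample(n, size = floor(n / 2))
fit  <- rpart(Survived ~ ., data = passengers[-holdout_idx, ], method = "class")
pred <- predict(fit, passengers[holdout_idx, ], type = "class")
holdout_acc <- mean(pred == passengers$Survived[holdout_idx])

# k-fold cross-validation: every row is held out exactly once
k    <- 5
fold <- sample(rep(1:k, length.out = n))
cv_acc <- sapply(1:k, function(i) {
  fit_i  <- rpart(Survived ~ ., data = passengers[fold != i, ], method = "class")
  pred_i <- predict(fit_i, passengers[fold == i, ], type = "class")
  mean(pred_i == passengers$Survived[fold == i])
})
cv_mean_acc <- mean(cv_acc)
```

The hold-out estimate depends on a single random split, while the cross-validated mean averages over `k` splits and is usually more stable.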

7 Random Forest: Multiple Trees

8 Confusion Matrix
Models compared: Random Forest, Gender, Decision Tree

          0     1   %Err
0       243    43    15%
1        38   122    24%
Total         446    18%

          0     1   %Err
0       250    36    13%
1        52   108    33%
Total         446    20%

          0     1   %Err
0       243    43    15%
1        52   108    33%
Total         446    21%

Callouts: False Positives, False Negatives
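The %Err figures can be recomputed from the counts. The sketch below uses the first matrix on the slide (243, 43, 38, 122) and assumes rows are actual outcomes and columns are predictions. The slide does not state this, but the per-row error rates match that reading.

```r
# Counts from the first confusion matrix on the slide.
# Assumption: rows = actual outcome, columns = predicted outcome.
cm <- matrix(c(243,  43,
                38, 122),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

# Per-class error: share of each actual class that was misclassified
class_err <- 1 - diag(cm) / rowSums(cm)

# Overall error: all misclassified passengers over all passengers
overall_err <- 1 - sum(diag(cm)) / sum(cm)

# Under the row/column assumption above:
false_positives <- cm["0", "1"]  # predicted survived, actually did not
false_negatives <- cm["1", "0"]  # predicted did not survive, actually did

round(100 * class_err)    # 15 and 24, matching the slide
round(100 * overall_err)  # 18, matching the slide
```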

9 Model Ceiling 320 418 340 Gender Model Seems Realistic

10 Why a Model Ceiling? Below are 4 pairs of passengers with very similar predictor variables; yet, within each pair, one survived and the other did not. At some point there just isn't the data or variable to help make an accurate prediction.

survived | pclass | Name | sex | age | sibsp | parch | ticket | Fare | cabin | embarked
1 | 2 | Louch, Mrs. Charles Alexander (Alice Adelaide Slow) | female | 42 | 1 | 0 | SC/AH 3085 | 26 | | S
0 | 2 | Carter, Mrs. Ernest Courtenay (Lilian Hughes) | female | 44 | 1 | 0 | 244252 | 26 | | S
1 | 3 | Asplund, Miss. Lillian Gertrud | female | 5 | 4 | 2 | 347077 | 31.3875 | | S
0 | 3 | Andersson, Miss. Ebba Iris Alfrida | female | 6 | 4 | 2 | 347082 | 31.275 | | S
1 | 1 | Bjornstrom-Steffansson, Mr. Mauritz Hakan | male | 28 | 0 | 0 | 110564 | 26.55 | C52 | S
0 | 1 | Long, Mr. Milton Clyde | male | 29 | 0 | 0 | 113501 | 30 | D6 | S
1 | 1 | Simonius-Blumer, Col. Oberst Alfons | male | 56 | 0 | 0 | 13213 | 35.5 | A26 | C
0 | 1 | Smith, Mr. James Clinch | male | 56 | 0 | 0 | 17764 | 30.6958 | A7 | C

