Predict who survived the Titanic disaster

Goal: Predict who survived the Titanic disaster
Hypotheses: Women and children first
Get Data: Read dataset into Excel, R, etc.
Data Management: Some Age data missing; analyze Gender only
Statistics & Analysis: Survival rate of 74% for women, 19% for men
Submit Predictions: 320 / 418 ≈ 76.6%
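The gender-only rule and the submission score can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's actual workflow: the function names and the toy records are hypothetical, and only the 320 / 418 figure comes from the slide.

```python
def predict_by_gender(passengers):
    """Predict 1 (survived) for every woman and 0 for every man."""
    return [1 if p["sex"] == "female" else 0 for p in passengers]


def accuracy(predicted, actual):
    """Share of predictions that match the known outcome."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical toy records, NOT rows from the real dataset.
toy = [
    {"sex": "female", "survived": 1},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 1},
    {"sex": "female", "survived": 1},
]

preds = predict_by_gender(toy)
toy_accuracy = accuracy(preds, [p["survived"] for p in toy])
print(preds, toy_accuracy)

# Score reported on the slide: 320 correct out of 418 test passengers.
slide_score = 320 / 418
print(f"{slide_score:.4f}")
```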
Predictor Variables

Variable   Description                          Type                  Hypothesis
pclass     Passenger Class                      Categorical, Ordinal  1st class 3rd
name       Name                                 Text
sex        Sex                                  Categorical
age        Age                                  Numeric
sibsp      Number of Siblings/Spouses Aboard    Integer
parch      Number of Parents/Children Aboard
ticket     Ticket Number
fare       Passenger Fare
cabin      Cabin
embarked   Port of Embarkation
Age

All: N = 891
Data: N = 714
Missing: N = 177
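A count like this can be produced by tallying the missing entries in the Age column. The sketch below assumes missing ages are represented as `None`; the helper name and the five-element list are hypothetical, and only the 891 / 714 / 177 counts come from the slide.

```python
def missing_summary(values):
    """Count all entries, entries with data, and missing (None) entries."""
    total = len(values)
    missing = sum(1 for v in values if v is None)
    return {
        "all": total,
        "data": total - missing,
        "missing": missing,
        "missing_share": missing / total,
    }


# Hypothetical toy column, NOT the real Age data.
ages = [22.0, None, 38.0, 26.0, None]
summary = missing_summary(ages)
print(summary)

# Share of passengers with no recorded age, using the slide's counts.
slide_share = 177 / 891
print(f"{slide_share:.3f}")
```

With the slide's counts, roughly one passenger in five has no recorded age, which is why the first pass analyzes Gender only.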
Decision Trees

Example tree: Survived, split into "Age less than X" and "Age greater than X"

Dependent variable (Y): continuous or categorical
Independent variables (X's): continuous or categorical

A decision tree can:
- Serve as a model (e.g. create rules)
- Make predictions
- Segment the data

At each node, the decision tree looks for the split of the sample that leads to the most differentiation on Y.
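The search for the X in "Age less than X" can be sketched as a scan over candidate thresholds. This is an illustration under assumptions: Gini impurity is used here only as one common way to measure differentiation on Y, and the ages and outcomes are hypothetical toy values.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(xs, ys):
    """Return (threshold, weighted impurity) for the best 'x < threshold' split."""
    best = (None, gini(ys))
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between neighbouring observed values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data, NOT the real Titanic records.
ages = [2, 5, 8, 30, 40, 55]
survived = [1, 1, 1, 0, 0, 1]

threshold, impurity = best_split(ages, survived)
print(threshold, impurity)
```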
Age
Decision Trees maximize data likelihood (minimize deviance).
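For a yes/no outcome, the deviance of a node can be taken as minus twice the log-likelihood of its labels under the node's own survival rate. The sketch below assumes that binomial form; the function name and the toy labels are hypothetical.

```python
import math


def node_deviance(labels):
    """Binomial deviance of a node: -2 * log-likelihood of its 0/1 labels."""
    n = len(labels)
    k = sum(labels)
    dev = 0.0
    # One term per class; a class with zero members contributes nothing.
    for count in (k, n - k):
        if count > 0:
            dev -= 2 * count * math.log(count / n)
    return dev


# Hypothetical toy labels, NOT the real Titanic outcomes.
parent = [1, 1, 1, 0, 0, 1]
left, right = [1, 1, 1], [0, 0, 1]

before = node_deviance(parent)
after = node_deviance(left) + node_deviance(right)
print(f"{before:.3f} {after:.3f}")
```

A split is worthwhile when the summed deviance of the child nodes is lower than the deviance of the parent, which is the same as saying the split makes the observed outcomes more likely.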
Prediction and Missing Values

Correlation, association of Age with other variables?

Variable   Description
pclass     Passenger Class
name       Name
sex        Sex
age        Age
sibsp      Number of Siblings/Spouses Aboard
parch      Number of Parents/Children Aboard
ticket     Ticket Number
fare       Passenger Fare
cabin      Cabin
embarked   Port of Embarkation
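If Age is associated with another variable, that variable can be used to fill in the missing ages. The sketch below shows one simple option, a group median by `pclass`; the slides do not say which method was used, so the choice of `pclass`, the function name, and the toy records are all assumptions.

```python
from collections import defaultdict
from statistics import median


def impute_age_by_group(passengers, key="pclass"):
    """Fill missing ages with the median age of passengers sharing `key`."""
    groups = defaultdict(list)
    for p in passengers:
        if p["age"] is not None:
            groups[p[key]].append(p["age"])
    # Fallback for groups in which no age is known at all.
    overall = median(a for ages in groups.values() for a in ages)

    filled = []
    for p in passengers:
        age = p["age"]
        if age is None:
            known = groups.get(p[key])
            age = median(known) if known else overall
        filled.append({**p, "age": age})
    return filled


# Hypothetical toy records, NOT rows from the real dataset.
toy = [
    {"pclass": 1, "age": 40},
    {"pclass": 1, "age": 50},
    {"pclass": 1, "age": None},
    {"pclass": 3, "age": 20},
    {"pclass": 3, "age": None},
    {"pclass": 2, "age": None},
]

filled = impute_age_by_group(toy)
print([p["age"] for p in filled])
```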
Predict who survived the Titanic disaster

Goal: Predict who survived the Titanic disaster
Hypotheses: Women and children first
Get Data: Read dataset into Excel, R, etc.
Data Management: Some Age data missing; analyze Gender only
Statistics & Analysis: Survival rate of 74% for women, 19% for men
Submit Predictions: 320 / 418 ≈ 76.6%
Gender
Gender and Age

The tree grows by optimizing only the split at the current node rather than optimizing the entire tree.
The tree stops growing when further splits become ineffective.
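Greedy growth with a stopping rule can be sketched as a short recursive function. This is an illustration only: Gini impurity, the `min_gain` cutoff, the `is_female` encoding, and the toy passengers are assumptions, not details taken from the slides.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, features=("is_female", "age")):
    """Best single split at this node only, as (gain, feature, threshold)."""
    parent = gini([r["survived"] for r in rows])
    best = None
    for feature in features:
        values = sorted({r[feature] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r["survived"] for r in rows if r[feature] < t]
            right = [r["survived"] for r in rows if r[feature] >= t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            gain = parent - child
            if best is None or gain > best[0]:
                best = (gain, feature, t)
    return best


def grow(rows, min_gain=0.01):
    """Grow greedily; stop when the best split no longer helps enough."""
    labels = [r["survived"] for r in rows]
    split = best_split(rows)
    if split is None or split[0] < min_gain:
        # Leaf: predict the majority outcome (ties count as survived).
        return {"predict": int(2 * sum(labels) >= len(labels))}
    _, feature, t = split
    left = [r for r in rows if r[feature] < t]
    right = [r for r in rows if r[feature] >= t]
    return {"feature": feature, "threshold": t,
            "left": grow(left, min_gain), "right": grow(right, min_gain)}


def predict(tree, row):
    """Follow the splits down to a leaf and return its prediction."""
    while "predict" not in tree:
        branch = "left" if row[tree["feature"]] < tree["threshold"] else "right"
        tree = tree[branch]
    return tree["predict"]


# Hypothetical toy passengers, NOT rows from the real dataset.
rows = [
    {"is_female": 1, "age": 30, "survived": 1},
    {"is_female": 1, "age": 48, "survived": 1},
    {"is_female": 1, "age": 60, "survived": 1},
    {"is_female": 0, "age": 4, "survived": 1},
    {"is_female": 0, "age": 6, "survived": 1},
    {"is_female": 0, "age": 30, "survived": 0},
    {"is_female": 0, "age": 45, "survived": 0},
    {"is_female": 0, "age": 50, "survived": 0},
]

tree = grow(rows)
print(tree)
```

On this toy data the first split is on gender, and age is used only within the male branch. Each split is chosen without looking ahead, and the pure child nodes stop growing because no further split improves them.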
Prediction: Gender + Age
Predict who survived the Titanic disaster

Goal: Predict who survived the Titanic disaster
Hypotheses: Women and children first
Get Data: Read dataset into Excel, R, etc.
Data Management: Some Age data missing; analyze Gender only
Statistics & Analysis:
Submit Predictions:
Predict who survived the Titanic disaster

Goal: Predict who survived the Titanic disaster
Hypotheses: Women and children first
Get Data: Read dataset into Excel, R, etc.
Data Management: Age + Gender
Statistics & Analysis:
Submit Predictions:
Kitchen Sink