Titanic and Decision Trees Supplement
Titanic Predictions and Decision Trees
Variable Selection Approaches
– Hypothesis Driven
– Data Driven
– Kitchen Sink
Algorithms
– Proportions / Pivot Tables
– Decision Trees
Model Evaluations
Predictor Variables
Many factors were brainstormed, several of them beyond what is available in the data set.

Variable | Description | Type | Hypothesis Driven | Data Driven | Kitchen Sink
pclass | Passenger Class | Categorical | Yes | |
name | Name | Text | | |
Sex | Sex | Categorical | Yes | |
age | Age | Numeric | Yes | |
sibsp | Siblings / Spouses | Integer | Yes | |
parch | Parents / Children | Integer | Yes | |
ticket | Ticket Number | Text | | |
fare | Passenger Fare | Numeric | Yes | |
cabin | Cabin | Text | | |
embarked | Port of Embarkation | Categorical | Yes | |
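Before any pivot table or model, each column has to be read with the type listed in the table above, and blank cells have to be recognized as missing data. The sketch below is one way to do that in plain Python. It assumes the CSV uses the deck's lowercase column names plus a 0/1 `survived` column; `COLUMN_TYPES` and `load_passengers` are hypothetical names, and a file such as `train.csv` is only an example path.

```python
import csv
import io

# Types follow the Predictor Variables table.
# ASSUMPTION: lowercase column names and a 0/1 "survived" column;
# rename the keys if your copy of the data set uses different headers.
COLUMN_TYPES = {
    "survived": int,
    "pclass": str,     # categorical (1, 2, 3), kept as a label
    "name": str,
    "sex": str,
    "age": float,
    "sibsp": int,
    "parch": int,
    "ticket": str,
    "fare": float,
    "cabin": str,
    "embarked": str,
}

def load_passengers(text):
    """Parse CSV text into a list of dicts, converting each column to its
    type and turning empty cells into None (missing data)."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for col, value in raw.items():
            convert = COLUMN_TYPES.get(col, str)  # unknown columns stay text
            value = (value or "").strip()
            row[col] = convert(value) if value != "" else None
        rows.append(row)
    return rows
```

Usage would look like `rows = load_passengers(open("train.csv").read())`; the later sketches all work on rows shaped like these dicts.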
Analytic Toolbox
[Figure: Types of Analysis vs. Number of Variables Analyzed. Techniques shown: Pivot Tables, Predictive Modeling, Correlation Matrices, Regression, Factor Analysis, Histograms, Applied Stats, Cluster Analysis, Decision Trees]
Predict who survived the Titanic disaster
Kaggle Submission
Analyses: Pivot Tables, Correlation Matrices, Logistic Regression?, Factor Analysis?, Histograms, Cluster Analysis?, Decision Trees
Variable selection approaches: Hypothesis Driven, Data Driven, Kitchen Sink
Which variables have the highest correlation with survival? pclass, name, Sex, age, sibsp, parch, embarked, fare
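One data-driven way to answer "which variables have the highest correlation with survival?" is to compute the Pearson correlation between each variable and the 0/1 survived column (with a binary variable on one side this is the point-biserial correlation). This is a minimal sketch, not the deck's method. The numeric encodings in `ENCODERS` are assumptions (for example, Sex becomes 1 for female and 0 otherwise, and pclass is treated as the number 1 to 3); name and embarked are left out because they would first need text processing or one-hot encoding.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.
    With a 0/1 variable on one side this is the point-biserial correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

# ASSUMPTION: hypothetical encodings that turn each column into a number.
ENCODERS = {
    "sex": lambda v: 1.0 if v == "female" else 0.0,
    "pclass": float,
    "age": float,
    "sibsp": float,
    "parch": float,
    "fare": float,
}

def correlations_with_survival(rows):
    """Return {variable: r}, skipping rows where that variable is missing,
    ordered from the strongest to the weakest absolute correlation."""
    result = {}
    for var, encode in ENCODERS.items():
        pairs = [(encode(r[var]), float(r["survived"]))
                 for r in rows if r.get(var) is not None]
        if len(pairs) < 2:
            continue  # not enough known values to correlate
        xs, ys = zip(*pairs)
        result[var] = pearson(xs, ys)
    return dict(sorted(result.items(), key=lambda kv: abs(kv[1]), reverse=True))
```

A correlation only measures a linear association, so a variable like age, whose effect may be concentrated among children, can look weaker here than it is in a pivot table or a tree split.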
Hypothesis Driven: Gender Only
Predict who survived the Titanic disaster
Hypothesis: Women and Children First
– Read dataset into Excel, R, etc.
– Analyze gender only: we have two categorical variables (sex and survived), so a pivot table works well.
Kaggle Submission: 320 / 418 ≈ 76.6% correct
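The gender-only analysis can be sketched in a few lines: a pivot table of survivors and totals by sex on the training rows, a rule that predicts survival for every female passenger, and an accuracy check. The function names are hypothetical, and the rows are assumed to be dicts with `sex` and a 0/1 `survived` value. The last line of the usage note reproduces the slide's arithmetic: 320 correct out of 418 is 76.555...%, which rounds to 76.6%.

```python
from collections import defaultdict

def pivot_survival(rows, key):
    """Pivot table: for each value of `key`, return
    (survivors, total, survival rate)."""
    counts = defaultdict(lambda: [0, 0])  # value -> [survivors, total]
    for r in rows:
        counts[r[key]][0] += r["survived"]
        counts[r[key]][1] += 1
    return {value: (s, n, s / n) for value, (s, n) in counts.items()}

def predict_gender_only(rows):
    """'Women and children first' reduced to gender: predict 1 (survived)
    for every female passenger and 0 for everyone else."""
    return [1 if r["sex"] == "female" else 0 for r in rows]

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

The pivot table can only be built on rows where survival is known; the rule it suggests is then applied to the passengers being predicted. As a check on the slide's score, `round(320 / 418 * 100, 1)` gives `76.6`.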
Hypothesis Driven: Pclass Only
Predict who survived the Titanic disaster
Hypothesis: People on Lower Decks Were Less Likely to Survive
– Read dataset into Excel, R, etc.
– Analyze Pclass only
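The proportions approach generalizes from gender to any single categorical variable: compute the survival rate for each group in the training data and predict the majority outcome for that group. The sketch below assumes rows are dicts with the grouping key (here `pclass`) and a 0/1 `survived` value; `fit_group_rule` and `predict_with_rule` are hypothetical names. A group whose survival rate is exactly 50% is predicted as not surviving, and a value never seen in training falls back to `default`.

```python
from collections import defaultdict

def fit_group_rule(rows, key):
    """Learn a one-variable rule: for each value of `key` (e.g. pclass),
    predict 1 if more than half of that group survived, else 0."""
    survivors = defaultdict(int)
    totals = defaultdict(int)
    for r in rows:
        survivors[r[key]] += r["survived"]
        totals[r[key]] += 1
    return {value: int(survivors[value] / totals[value] > 0.5) for value in totals}

def predict_with_rule(rule, rows, key, default=0):
    """Apply a learned rule; values not seen in training get `default`."""
    return [rule.get(r[key], default) for r in rows]
```

If every class has a survival rate below 50%, this rule predicts that nobody survives, which shows why Pclass alone may score worse than gender even when it is clearly related to survival.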
Hypothesis Driven: Age
Predict who survived the Titanic disaster
Hypothesis: Women and Children First
– Read dataset into Excel, R, etc.
– Age has missing data
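Because age has missing values, the first step is to count them and decide how to handle them. The sketch below counts missing entries and shows one simple option, filling them with the median of the known ages. Median imputation is an assumption here, not something the slides prescribe; analyzing only the rows with a known age, or treating "missing" as its own group, are equally valid choices. Rows are assumed to be dicts where a missing age is `None` or absent, and the function names are hypothetical.

```python
import statistics

def missing_count(rows, key):
    """Number of rows where `key` is missing (None or absent)."""
    return sum(1 for r in rows if r.get(key) is None)

def impute_median(rows, key):
    """Return copies of the rows with missing `key` values replaced by the
    median of the known values; the input rows are left unchanged."""
    known = [r[key] for r in rows if r.get(key) is not None]
    fill = statistics.median(known)
    return [dict(r, **{key: fill}) if r.get(key) is None else dict(r) for r in rows]
```

Imputation pulls every unknown age toward the middle, so it can blur exactly the "children first" effect being tested; it is worth comparing results with and without it.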
Hypothesis Driven: Age Only
Survived: categorical variable
Age: continuous variable
How do we visualize and analyze age vs. survived?
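A pivot table needs categories, but age is continuous. A decision tree handles this by searching for a threshold that splits the passengers. The sketch below is a one-level decision tree (a "stump") on age alone: it tries each distinct known age as a cut point, predicts survival on one side of it, and keeps the most accurate rule on the training rows. Scoring splits by accuracy is a simplification; full decision-tree implementations usually choose splits by Gini impurity or entropy. Rows with a missing age are skipped, ties keep the lowest threshold, and `best_age_split` is a hypothetical name.

```python
def best_age_split(rows):
    """Return (accuracy, threshold, young_survive) for the best single age
    split. young_survive=True means 'predict survived if age < threshold';
    False means 'predict survived if age >= threshold'."""
    known = [(r["age"], r["survived"]) for r in rows if r.get("age") is not None]
    best = None
    for threshold in sorted({age for age, _ in known}):
        for young_survive in (True, False):
            # Prediction is "survived" when the passenger is on the chosen side.
            correct = sum(
                ((age < threshold) == young_survive) == bool(survived)
                for age, survived in known
            )
            acc = correct / len(known)
            if best is None or acc > best[0]:
                best = (acc, threshold, young_survive)
    return best
```

A scatterplot or histogram of age split by survived is a good way to sanity-check whatever threshold this search returns.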
Hypothesis Driven: Univariate Summary
Model Summary

Variable | Used
Sex | X
Age |
Pclass |
Name |
sibsp |
fare |
parch |
embarked |

Analytics | Used
Pivot Tables | X
Scatterplots |
Decision Trees |

Kaggle Score: 76.6% (320 / 418)