Published by Daniella Stephens. Modified over 8 years ago.
1
Titanic and Decision Trees Supplement
2
Titanic Predictions and Decision Trees
Variable Selection Approaches
– Hypothesis Driven
– Data Driven
– Kitchen Sink
Algorithms
– Proportions / Pivot Tables
– Decision Trees
Model Evaluations
3
Predictor Variables
Many factors were brainstormed, including several beyond what is available in the data set.

Variable | Description | Type | Hyp Drive / Data Driven / Kitchen Sink
pclass | Passenger Class | Categorical | Yes
name | Name | Text |
Sex | | Categorical | Yes
age | Age | Numeric | Yes
sibsp | Siblings / Spouses | Integer | Yes
parch | Parents / Children | Integer | Yes
ticket | Ticket Number | Text |
fare | Passenger Fare | Numeric | Yes
cabin | Cabin | Text |
embarked | Port of Embarkation | Categorical | Yes
4
Analytic Toolbox
[Figure: Types of Analysis plotted against Number of Variables Analyzed (1 to 6+). Items shown: Histograms, Pivot Tables, Correlation Matrices, Applied Stats, Regression, Factor Analysis, Cluster Analysis, Decision Trees, Predictive Modeling]
5
Predict who survived the Titanic disaster
Variable selection: Hyp Drive, Data Driven, Kitchen Sink
Which variables have the highest correlation with survival?
Variables: pclass, name, Sex, age, sibsp, parch, embarked, fare
Tools: Pivot Tables, Correlation Matrices, Logistic Regression?, Factor Analysis?, Histograms, Cluster Analysis?, Decision Trees
Kaggle Submission
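One way to approach the correlation question on this slide is to compute a Pearson correlation between each candidate variable and the survival flag, then rank by absolute value. The sketch below is only an illustration: the rows are invented toy values (not the Kaggle data), `is_female` is an assumed 0/1 encoding of Sex, and the helper name `pearson` is hypothetical. Text variables such as name would need some encoding before they could be included.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Toy columns invented for illustration; NOT the real Titanic data.
data = {
    "survived":  [1, 1, 0, 0, 1, 0],
    "pclass":    [1, 1, 3, 3, 2, 3],
    "is_female": [1, 1, 0, 0, 1, 0],   # assumed 0/1 encoding of Sex
    "fare":      [80.0, 60.0, 8.0, 7.5, 20.0, 9.0],
}

target = data["survived"]
correlations = {
    name: pearson(column, target)
    for name, column in data.items()
    if name != "survived"
}

# Rank variables by the strength (absolute value) of their correlation.
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
for name in ranked:
    print(f"{name:10s} {correlations[name]:+.2f}")
```

A negative value for pclass in output like this would mean that higher class numbers (lower decks) go with lower survival, which is the direction the Pclass hypothesis expects.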
6
Hypothesis Driven: Gender Only
Predict who survived the Titanic disaster
Women and Children First
Read dataset into Excel, R, etc.
Analyze Gender Only
We have two categorical variables, so a pivot table works well.
Kaggle Submission: 320 / 418 = 76.5% correct
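The gender-only pivot table can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: the rows are invented toy values rather than the Kaggle file, and the function names (`pivot_counts`, `survival_rate`, `predict_by_gender`) and the 50% decision threshold are hypothetical choices, not something the slides specify.

```python
from collections import Counter

# Toy (sex, survived) rows invented for illustration; NOT the real Kaggle data.
rows = [
    ("female", 1), ("female", 1), ("female", 0), ("female", 1),
    ("male", 0), ("male", 0), ("male", 1), ("male", 0),
]

def pivot_counts(rows):
    """Count passengers for every (sex, survived) pair, like a 2x2 pivot table."""
    return Counter(rows)

def survival_rate(rows):
    """Proportion of survivors within each sex."""
    totals, survivors = Counter(), Counter()
    for sex, survived in rows:
        totals[sex] += 1
        survivors[sex] += survived
    return {sex: survivors[sex] / totals[sex] for sex in totals}

def predict_by_gender(rates, sex):
    """Predict survival (1) when the group's survival rate exceeds 50% (assumed threshold)."""
    return 1 if rates[sex] > 0.5 else 0

counts = pivot_counts(rows)
rates = survival_rate(rows)

for sex in rates:
    print(sex, "died:", counts[(sex, 0)], "survived:", counts[(sex, 1)],
          "rate:", rates[sex], "prediction:", predict_by_gender(rates, sex))
```

With a rule like this, every passenger in the test set gets the majority outcome of their gender group, which is the kind of one-variable model the Kaggle submission on this slide scores.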
7
Hypothesis Driven: Pclass Only
Predict who survived the Titanic disaster
People on Lower Decks Less Likely to Survive
Read dataset into Excel, R, etc.
Pclass
8
Hypothesis Driven: Pclass Only
Predict who survived the Titanic disaster
People on Lower Decks Less Likely to Survive
Read dataset into Excel, R, etc.
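The Pclass hypothesis can be checked the same way as any other categorical variable: group passengers by class and compare survival proportions. The sketch below uses invented toy rows, not the real data, and `rate_by_class` is a hypothetical helper name.

```python
from collections import defaultdict

# Toy (pclass, survived) rows invented for illustration; NOT the real Titanic data.
passengers = [
    (1, 1), (1, 1), (1, 0),
    (2, 1), (2, 0),
    (3, 0), (3, 0), (3, 1), (3, 0),
]

def rate_by_class(passengers):
    """Survival proportion for each passenger class, ordered by class number."""
    groups = defaultdict(list)
    for pclass, survived in passengers:
        groups[pclass].append(survived)
    return {pclass: sum(flags) / len(flags) for pclass, flags in sorted(groups.items())}

class_rates = rate_by_class(passengers)
for pclass, rate in class_rates.items():
    print(f"pclass {pclass}: {rate:.0%} survived")
```

If the hypothesis holds in the real data, the proportions should fall as the class number rises.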
9
Hypothesis Driven: Age
Predict who survived the Titanic disaster
Women and Children First
Read dataset into Excel, R, etc.
Age has Missing Data
10
Hypothesis Driven: Age Only
Survived: Categorical Variable
Age: Continuous Variable
How do we visualize and analyze age vs. survived?
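One common answer to the age question is to bin the continuous variable into groups so it can be tabulated against the categorical outcome, keeping missing ages in their own group rather than dropping them. The sketch below is an illustration only: the records are invented toy values, and the bin edges and labels in `BINS` are assumptions, not cut-offs taken from the slides.

```python
# Toy (age, survived) records invented for illustration; None marks a missing age.
records = [
    (4, 1), (8, 1), (15, 0), (22, 0), (30, 1),
    (None, 0), (45, 0), (62, 0), (None, 1), (35, 1),
]

# Assumed bin edges: each bin covers low <= age < high.
BINS = [(0, 13, "child"), (13, 18, "teen"), (18, 60, "adult"), (60, 200, "senior")]

def age_group(age):
    """Map an age to its bin label; missing or out-of-range ages become 'unknown'."""
    if age is None:
        return "unknown"
    for low, high, label in BINS:
        if low <= age < high:
            return label
    return "unknown"

def survival_by_age_group(records):
    """Survival proportion per age group, with missing ages kept as their own group."""
    totals = {}
    for age, survived in records:
        label = age_group(age)
        count, survivors = totals.get(label, (0, 0))
        totals[label] = (count + 1, survivors + survived)
    return {label: survivors / count for label, (count, survivors) in totals.items()}

age_rates = survival_by_age_group(records)
for label, rate in age_rates.items():
    print(f"{label:8s} {rate:.0%} survived")
```

Once age is binned like this, the result is a pivot table of two categorical variables again, and the same proportions can be drawn as a bar chart or compared with a histogram of age split by outcome.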
11
Hypothesis Driven: Univariate Summary
12
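Model Summary

Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7
Sex | X | | | | | |
Age | | | | | | |
Pclass | | | | | | |
Name | | | | | | |
sibsp | | | | | | |
fare | | | | | | |
parch | | | | | | |
embarked | | | | | | |

Analytics | 1 | 2 | 3 | 4 | 5 | 6 | 7
Pivot Tables | X | | | | | |
Scatterplots | | | | | | |
Decision Trees | | | | | | |

Kaggle Score | 76.5 | | | | | |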