Predict who survived the Titanic Disaster

Presentation transcript:

Predict who survived the Titanic Disaster
Goal: Predict who survived the Titanic disaster
Hypotheses: Women and children first
Get Data: Read dataset into Excel, R, etc.
Data Management: Some age data missing; analyze gender only
Statistics & Analysis: 74% of women survived, 19% of men
Submit Predictions: 320 / 418 ≈ 76.6%
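The gender-only rule from this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the records, the field names `sex` and `survived`, and the helper functions are all hypothetical, and the toy rows are not taken from the real dataset.

```python
# Hypothetical toy records; these are NOT rows from the real Titanic data.
passengers = [
    {"sex": "female", "survived": 1},
    {"sex": "female", "survived": 1},
    {"sex": "female", "survived": 0},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 0},
    {"sex": "male", "survived": 1},
]


def predict_by_gender(passenger):
    """Gender-only rule: predict survival (1) for women, death (0) for men."""
    return 1 if passenger["sex"] == "female" else 0


def accuracy(rows, predict):
    """Fraction of rows whose prediction matches the recorded outcome."""
    correct = sum(predict(r) == r["survived"] for r in rows)
    return correct / len(rows)


toy_accuracy = accuracy(passengers, predict_by_gender)

# The submission score on the slide is the same kind of ratio:
# correctly predicted passengers divided by passengers scored.
slide_score = 320 / 418
```

The same `accuracy` helper could score any other rule, which makes it easy to compare a gender-only rule against a rule that also uses age.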

Predictor Variables
Variable | Description | Type | Hypothesis
pclass | Passenger Class | Categorical, Ordinal | 1st class 3rd
name | Name | Text |
sex | Sex | Categorical |
age | Age | Numeric |
sibsp | Number of Siblings/Spouses Aboard | Integer |
parch | Number of Parents/Children Aboard | |
ticket | Ticket Number | |
fare | Passenger Fare | |
cabin | Cabin | |
embarked | Port of Embarkation | |

Age
All: N = 891
Data: N = 714
Missing: N = 177

Decision Trees
Example split on Survived: Age less than X / Age greater than X
Dependent variable (Y): continuous or categorical
Independent variables (X's): continuous or categorical
A decision tree can:
Serve as a model (e.g. create rules)
Make predictions
Segment the data
At each node, the decision tree looks for the split of the sample that leads to the most differentiation on Y.
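As a rough sketch of how a tree might search for the threshold X on age, the Python below tries every midpoint between observed ages and keeps the one with the lowest weighted impurity. Gini impurity is used here as one common measure of "differentiation on Y"; that choice is an assumption, as are the function names and the toy data.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure node)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_age_split(ages, survived):
    """Return (threshold, weighted_impurity) for the best 'age < threshold' split."""
    n = len(survived)
    values = sorted(set(ages))
    best = (None, gini(survived))  # start from the unsplit node
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # midpoint between neighbouring ages
        left = [s for a, s in zip(ages, survived) if a < threshold]
        right = [s for a, s in zip(ages, survived) if a >= threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical toy data, not taken from the Titanic dataset.
ages = [2, 5, 9, 20, 30, 40, 50, 60]
survived = [1, 1, 1, 0, 0, 1, 0, 0]
threshold, impurity = best_age_split(ages, survived)
```

On this toy data the search separates the three youngest passengers from everyone else, which is the kind of "Age less than X" rule the slide describes.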

Age

Decision Trees maximize data likelihood (minimize deviance).
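To make "minimize deviance" concrete, the sketch below computes the binomial deviance of a node, that is, minus twice the log-likelihood of the outcomes under the node's own survival rate, and compares a node before and after a split. The counts are hypothetical round numbers chosen only to mirror the 74% and 19% rates quoted earlier; the function name is also an assumption.

```python
import math


def node_deviance(labels):
    """Binomial deviance of a node: -2 * log-likelihood at the node's own rate."""
    n = len(labels)
    k = sum(labels)
    if n == 0 or k == 0 or k == n:
        return 0.0  # a pure (or empty) node fits its data perfectly
    p = k / n
    return -2 * (k * math.log(p) + (n - k) * math.log(1 - p))


# Hypothetical counts: 100 women with 74 survivors, 100 men with 19 survivors.
women = [1] * 74 + [0] * 26
men = [1] * 19 + [0] * 81

root_deviance = node_deviance(women + men)
split_deviance = node_deviance(women) + node_deviance(men)
reduction = root_deviance - split_deviance
```

A positive `reduction` means the split increases the likelihood of the observed outcomes, so a larger reduction marks a more useful split.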

Prediction and Missing Values
Correlation, Association of Age with other Variables?
Variable | Description
pclass | Passenger Class
name | Name
sex | Sex
age | Age
sibsp | Number of Siblings/Spouses Aboard
parch | Number of Parents/Children Aboard
ticket | Ticket Number
fare | Passenger Fare
cabin | Cabin
embarked | Port of Embarkation
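If age turns out to be associated with another variable, one possible way to use that association is to fill each missing age with the median age of passengers who share the same value of that variable. The slides do not say which method was used, so the sketch below is only an assumption, with hypothetical rows and helper names.

```python
from statistics import median

# Hypothetical toy rows; None marks a missing age.
rows = [
    {"pclass": 1, "age": 40.0},
    {"pclass": 1, "age": 50.0},
    {"pclass": 1, "age": None},
    {"pclass": 3, "age": 20.0},
    {"pclass": 3, "age": 24.0},
    {"pclass": 3, "age": None},
]


def median_age_by(rows, key):
    """Median of the known ages within each group defined by `key`."""
    groups = {}
    for r in rows:
        if r["age"] is not None:
            groups.setdefault(r[key], []).append(r["age"])
    return {group: median(ages) for group, ages in groups.items()}


def impute_age(rows, key):
    """Return copies of the rows with missing ages replaced by the group median.

    Assumes every group has at least one known age; otherwise a KeyError is raised.
    """
    medians = median_age_by(rows, key)
    return [
        dict(r, age=medians[r[key]]) if r["age"] is None else dict(r)
        for r in rows
    ]


filled = impute_age(rows, "pclass")
```

Grouping by `pclass` is just an example; any variable from the table above that shows an association with age could be passed as `key`.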

Gender

Gender and Age
The tree grows by optimizing only the split at the current node rather than optimizing the entire tree.
The tree stops growing when a further split becomes ineffective.
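The greedy growth and stopping behaviour described here can be sketched as a small recursive function. This is an illustration under assumptions, not the software used in the presentation: Gini impurity as the split criterion, a `min_gain` threshold as the stopping rule, the 0/1 `female` encoding, and the toy rows are all hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels, features):
    """Greedy step: best (gain, feature, threshold) for this node only."""
    parent = gini(labels)
    best = (0.0, None, None)
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if parent - child > best[0]:
                best = (parent - child, f, t)
    return best


def grow(rows, labels, features, min_gain=0.01):
    """Grow a tree node by node; stop when no split gains at least min_gain."""
    gain, f, t = best_split(rows, labels, features)
    if f is None or gain < min_gain:
        # Leaf: predict the majority outcome (ties go to 1).
        return {"predict": int(2 * sum(labels) >= len(labels))}
    left = [i for i, r in enumerate(rows) if r[f] < t]
    right = [i for i, r in enumerate(rows) if r[f] >= t]
    return {
        "feature": f,
        "threshold": t,
        "left": grow([rows[i] for i in left], [labels[i] for i in left], features, min_gain),
        "right": grow([rows[i] for i in right], [labels[i] for i in right], features, min_gain),
    }


def predict(tree, row):
    """Follow the splits from the root down to a leaf."""
    while "predict" not in tree:
        tree = tree["left"] if row[tree["feature"]] < tree["threshold"] else tree["right"]
    return tree["predict"]


# Hypothetical toy passengers: female is 1 for women, 0 for men.
rows = [
    {"female": 1, "age": 30}, {"female": 1, "age": 28}, {"female": 1, "age": 45},
    {"female": 0, "age": 6}, {"female": 0, "age": 4}, {"female": 0, "age": 25},
    {"female": 0, "age": 40}, {"female": 0, "age": 60},
]
labels = [1, 1, 1, 1, 1, 0, 0, 0]
tree = grow(rows, labels, ["female", "age"])
```

On this toy data the tree splits on gender first and then on age among the men, echoing the "women and children first" hypothesis. Because each split is chosen only for its own node, the result is not guaranteed to be the best possible tree overall.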

Prediction: Gender + Age

Predict who survived the Titanic Disaster
Goal: Predict who survived the Titanic disaster
Hypotheses: Women and children first
Get Data: Read dataset into Excel, R, etc.
Data Management: Some age data missing; analyze gender only
Statistics & Analysis
Submit Predictions

Predict who survived the Titanic Disaster
Goal: Predict who survived the Titanic disaster
Hypotheses: Women and children first
Get Data: Read dataset into Excel, R, etc.
Data Management: Age + Gender
Statistics & Analysis
Submit Predictions

Kitchen Sink
