1
Logistic regression: Who survived the Titanic?
2
The sinking of the Titanic
The Titanic sank on April 14th 1912 with 2228 souls on board; 705 survived. A dataset of 1309 passengers has survived. Who survived?
3
The data
Columns: pclass, survived, name, sex, age, sibsp, parch
Example rows (name; sex; age):
Allen, Miss. Elisabeth Walton; female; 29
Allison, Master. Hudson Trevor; male; 0.9167
Allison, Miss. Helen Loraine; female; 2
Allison, Mr. Hudson Joshua Creighton; male; 30
Allison, Mrs. Hudson J C (Bessie Waldo Daniels); female; 25
Anderson, Mr. Harry; male; 48
Andrews, Miss. Kornelia Theodosia; female; 63
Andrews, Mr. Thomas Jr; male; 39
Appleton, Mrs. Edward Dale (Charlotte Lamson); female; 53
Sibsp is the number of siblings and/or spouses accompanying.
Parch is the number of parents and/or children accompanying.
Some values are missing.
Can we predict who will survive Titanic II?
4
Analyzing the data in a (too) simple manner
Associations between factors without considering interactions
9
Could we use multiple linear regression to predict survival?
Multiple linear regression: response variable is defined between -inf and +inf; normally distributed.
Logistic regression: response variable is defined between 0 and 1; Bernoulli distributed.
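To illustrate why a straight line is a poor fit for a 0/1 response, here is a minimal sketch that fits ordinary least squares to a small binary outcome. The data are invented for illustration only (they are not the Titanic data), and the name `predict_linear` is hypothetical.

```python
# Toy data, invented for illustration only (not the Titanic dataset).
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]  # binary outcome, e.g. 1 = survived

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares for a single predictor.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x


def predict_linear(x):
    """Prediction of the fitted straight line; not restricted to [0, 1]."""
    return intercept + slope * x


# At the ends of the range the "probability" falls below 0 and above 1.
print(predict_linear(0), predict_linear(5))
```

The fitted line gives values outside the 0 to 1 range at both ends, which cannot be read as probabilities.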
10
Logit transformation is modeled linearly
The logistic function
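The formulas on this slide are not in the text, so the sketch below assumes the standard definitions: the logit is the log of the odds, log(p / (1 - p)), and the logistic function 1 / (1 + exp(-z)) is its inverse. The function names are chosen here for illustration.

```python
import math


def logit(p):
    """Log-odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logistic(z):
    """Inverse of the logit: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# The linear model is written on the logit scale; the logistic
# function converts the linear predictor back to a probability.
p = 0.3
z = logit(p)
print(z, logistic(z))
```

A linear predictor z of 0 corresponds to a probability of 0.5; large positive z approaches 1 and large negative z approaches 0.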
11
The sigmoidal curve
12
The sigmoidal curve
The intercept basically just shifts the curve along the input variable.
13
The sigmoidal curve
The intercept basically just shifts the curve along the input variable.
Large regression coefficient → the risk factor strongly influences the probability.
14
The sigmoidal curve
The intercept basically just shifts the curve along the input variable.
Large regression coefficient → the risk factor strongly influences the probability.
Positive regression coefficient → the risk factor increases the probability.
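The effect of the two parameters can be checked numerically. This is a minimal sketch for a single input variable x with p = 1 / (1 + exp(-(intercept + coefficient * x))); the function names and the example values are assumptions made for illustration.

```python
import math


def probability(x, intercept, coefficient):
    """Logistic curve for one input variable."""
    return 1.0 / (1.0 + math.exp(-(intercept + coefficient * x)))


def midpoint(intercept, coefficient):
    """Input value at which the probability is exactly 0.5."""
    return -intercept / coefficient


# Changing the intercept moves the point where p = 0.5.
print(midpoint(0.0, 1.0), midpoint(2.0, 1.0))

# A larger coefficient makes the probability change faster around the midpoint.
print(probability(1.0, 0.0, 1.0), probability(1.0, 0.0, 3.0))

# A positive coefficient gives a rising curve, a negative one a falling curve.
print(probability(1.0, 0.0, 1.0), probability(1.0, 0.0, -1.0))
```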
15
Logistic regression of the Titanic data
16
Logistic regression of the Titanic data
Summary of data.
Coding of the dependent variable.
Coding of the categorical explanatory variable: first class: 1, second class: 2, third class: reference.
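The coding of pclass with third class as the reference can be sketched as two 0/1 indicator variables. The function name `code_pclass` is hypothetical, and this is only an illustration of reference (dummy) coding, not SPSS's internal implementation.

```python
def code_pclass(pclass):
    """Return (pclass_1, pclass_2) indicators; third class is the reference."""
    if pclass not in (1, 2, 3):
        raise ValueError("pclass must be 1, 2 or 3")
    return (1 if pclass == 1 else 0, 1 if pclass == 2 else 0)


# Third class gets (0, 0), so its effect is absorbed by the intercept and
# the two coefficients are read relative to third class.
for pclass in (1, 2, 3):
    print(pclass, code_pclass(pclass))
```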
17
Logistic regression of the Titanic data
A fit of the null model, basically just the intercept. Usually not interesting.
The total probability of survival is 500/1309 = 0.382. The cutoff is 0.5, so all are classified as non-survivors.
Basically tests if the null model is sufficient. It almost certainly is not.
Shows that survival is related to pclass (which is not in the null model).
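The null model can be reproduced by hand from the two counts on the slide. In an intercept-only logistic model the fitted probability equals the observed proportion, so the intercept is the log-odds of survival. The variable names below are chosen for illustration.

```python
import math

survivors = 500  # from the slide
total = 1309     # from the slide
non_survivors = total - survivors

# Fitted probability of the intercept-only model.
p_survival = survivors / total

# Intercept on the logit scale: log of the odds of survival.
intercept = math.log(survivors / non_survivors)

# p_survival is below the 0.5 cutoff, so everyone is classified as a
# non-survivor and only the non-survivors are classified correctly.
accuracy_all_non_survivors = non_survivors / total

print(round(p_survival, 3), round(intercept, 3),
      round(accuracy_all_non_survivors, 3))
```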
18
Logistic regression of the Titanic data
Omnibus test: uses the likelihood ratio (LR) to describe whether adding the pclass variable to the model makes it better. It did! But only compared with the null model, so no surprise.
Model summary: other measures of the goodness of fit.
Classification table: by including pclass, 67.7% of the passengers were correctly categorized.
Variables in the equation: the first line repeats that pclass has a significant effect on survival. B is the fitted logistic parameter. Exp(B) is the odds ratio, so the odds of survival are 4.7 ( ) times higher than for passengers in third class (reference class).
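The relation between B, Exp(B) and probabilities can be sketched as follows. The odds ratio 4.7 is taken from the slide; the baseline survival probability of 0.25 for the reference class is a made-up value used only to show the conversion, not a result from the analysis.

```python
import math


def odds(p):
    """Odds corresponding to a probability."""
    return p / (1.0 - p)


def probability_from_odds(o):
    """Probability corresponding to odds."""
    return o / (1.0 + o)


odds_ratio = 4.7         # Exp(B) from the slide
b = math.log(odds_ratio)  # the coefficient B on the logit scale

baseline_p = 0.25  # hypothetical probability in the reference class
p_other = probability_from_odds(odds(baseline_p) * odds_ratio)

print(round(b, 3), round(p_other, 3))
```

An odds ratio multiplies the odds, not the probability: with these illustrative numbers, odds 4.7 times higher move the probability from 0.25 to about 0.61, not to 4.7 times 0.25.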
19
Logistic regression of the Titanic data now adding family relations
‘3 or more’ is set as the reference group by SPSS
20
Logistic regression of the Titanic data now adding family relations
The model correctly classifies 79.1% of the passengers
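The percentage of correctly classified passengers comes from a classification table at a cutoff of 0.5. A minimal sketch of that calculation follows; the probabilities and outcomes are invented toy values, and treating a probability equal to the cutoff as a predicted survivor is an assumption.

```python
def classification_table(probabilities, outcomes, cutoff=0.5):
    """Count cases by (observed, predicted) class, both coded 0 or 1."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for p, observed in zip(probabilities, outcomes):
        predicted = 1 if p >= cutoff else 0
        table[(observed, predicted)] += 1
    return table


def percent_correct(table):
    """Share of cases on the diagonal of the table, in percent."""
    correct = table[(0, 0)] + table[(1, 1)]
    return 100.0 * correct / sum(table.values())


# Toy values for illustration only.
toy_probabilities = [0.9, 0.2, 0.6, 0.4, 0.1]
toy_outcomes = [1, 0, 0, 1, 0]

toy_table = classification_table(toy_probabilities, toy_outcomes)
print(toy_table, percent_correct(toy_table))
```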
21
Logistic regression of the Titanic data now adding family relations
Basically all factors seem to affect the probability of survival.
22
How was it with age?
Linear associations are easy to model, because the factor enters the predictive value directly. But it does not really look linear; maybe a third-order polynomial? Three new factors for age are calculated: the first, second, and third order of the age divided by the standard deviation.
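The three age factors can be sketched as powers of a scaled age. The standard deviation 14.41 appears in the calculation on slide 24; centring the age at 30 is an assumption inferred from the same calculation, where a 25-year-old enters as -5. The function name is hypothetical.

```python
MEAN_AGE = 30.0  # assumption: inferred from the -5 used for a 25-year-old
SD_AGE = 14.41   # standard deviation as used on slide 24


def age_terms(age):
    """Return the first-, second- and third-order age factors."""
    z = (age - MEAN_AGE) / SD_AGE
    return z, z ** 2, z ** 3


first, second, third = age_terms(25)
print(first, second, third)
```

Scaling before taking powers keeps the three factors on comparable magnitudes, which makes the fitted coefficients easier to compare.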
23
How was it with age?
The third-order age factor did not add significantly to the model. With the third-order polynomial added, the model correctly categorizes 79.4% vs 79.1% before. ParChild is no longer a significant factor and can be omitted from the model.
24
Using the model to predict survival
Omitting the second- and third-order age and ParChild factors.
What is the probability that a 25-year-old woman, accompanied only by her husband and holding a second-class ticket, would survive the Titanic?
z = -0.589*(-5)/14.41 + 1.718 + 2.552 =
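As a sketch, the expression can be evaluated exactly as it is written on the slide and passed through the logistic function. Only the terms shown are used; any term not shown (an intercept, for example) is not included, and the slide does not say which factor each of the two constant terms belongs to, so they are kept unlabelled here.

```python
import math

age_coefficient = -0.589  # from the slide
scaled_age = -5 / 14.41   # from the slide: 25-year-old, divided by the SD
other_terms = [1.718, 2.552]  # from the slide; factor labels not stated

# Linear predictor, using only the terms given on the slide.
z = age_coefficient * scaled_age + sum(other_terms)

# Logistic function converts the linear predictor to a probability.
p = 1.0 / (1.0 + math.exp(-z))

print(round(z, 3), round(p, 3))
```

With these terms alone, z is about 4.47, which corresponds to a probability of about 0.99.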
25
Analysing interaction of selected factors
Interactions: pclass * sex, age * sex, pclass * Siblings/Parents.
But the model does not converge…
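An interaction term is the product of the coded values of the two factors involved. The sketch below builds the pclass * sex and age * sex terms for one passenger; the function and column names are hypothetical, third class and male are assumed to be the reference categories, and the pclass * Siblings/Parents terms are left out for brevity.

```python
def design_row(is_female, pclass_1, pclass_2, age_z):
    """Main effects plus pclass*sex and age*sex interaction terms."""
    return {
        "female": is_female,
        "pclass_1": pclass_1,
        "pclass_2": pclass_2,
        "age": age_z,
        # Interactions: products of the coded main-effect values.
        "pclass_1*female": pclass_1 * is_female,
        "pclass_2*female": pclass_2 * is_female,
        "age*female": age_z * is_female,
    }


# A second-class woman with a scaled age of -0.5 (illustrative values).
row = design_row(is_female=1, pclass_1=0, pclass_2=1, age_z=-0.5)
print(row)
```

Each interaction adds one more coefficient per combination of categories, so cells with few or no passengers can make the fit unstable.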
26
Analysing interaction of selected factors
Collapsing the sibling/spouse number eradicated their mutual interaction
27
Is it realistic that Leonardo survives and the chick dies?