Generalized linear models

Presentation transcript:

Generalized linear models: Unfortunately, the standard REGRESSION procedure in SPSS does not give these statistics, so use Analyze > Generalized Linear Models instead.

Generalized linear models: The default model type is linear. Add Min LDL achieved as the dependent variable, as in REGRESSION in SPSS. Next, go to Predictors.

Generalized linear models: Predictors. WARNING! Make sure you add the predictors to the correct box: categorical predictors go in the FACTORS box and continuous predictors go in the COVARIATES box.
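To illustrate why the two boxes differ, here is a minimal Python sketch of indicator (dummy) coding, which is the usual way a categorical factor enters a linear model, while a continuous covariate is used as it is. The function name `dummy_code`, the example values and the choice of reference level are assumptions made for illustration; SPSS performs its own coding internally and may pick a different reference category.

```python
def dummy_code(values, reference):
    """Turn one categorical variable into 0/1 indicator columns.

    One column is created per level other than the reference level,
    so a factor with L levels contributes L - 1 parameters.
    """
    levels = sorted(set(values) - {reference})
    return {f"is_{level}": [1 if v == level else 0 for v in values]
            for level in levels}


# Hypothetical data: a categorical factor and a continuous covariate.
gender = ["F", "M", "F", "M"]
age = [54.0, 61.0, 47.0, 70.0]   # continuous: enters the model unchanged

print(dummy_code(gender, reference="F"))
```

A continuous variable placed in the factors box would be treated as one level per distinct value, which is why the warning matters.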

Generalized linear models: Model. Add all factors and covariates to the model as main effects.

Generalized linear models: Parameter estimates. Note that these are identical to the REGRESSION output.

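Generalized linear models: Goodness-of-fit. Note that the output gives the log likelihood and AIC = 2835, where AIC = -2 × log likelihood + 2 × number of parameters. With a log likelihood of -1409.6 and 7 parameters the formula gives 2833.2; the reported 2835 matches the formula with 8 parameters (2835.2), which suggests the output counts one additional estimated parameter. A footnote in the output explains that a smaller AIC is ‘better’.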
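As a minimal sketch of the AIC arithmetic, the Python function below (the name `aic` is hypothetical and not part of SPSS) evaluates -2 × log likelihood + 2 × k for the log likelihood reported above. Both k = 7 and k = 8 are shown because which count the software uses is an assumption here, not something the output states.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 * log likelihood + 2 * parameter count."""
    return -2.0 * log_likelihood + 2.0 * n_params


log_lik_full = -1409.6  # log likelihood reported for the full model

# Assumption: k = 7 counts the regression parameters only;
# k = 8 additionally counts one more estimated parameter.
for k in (7, 8):
    print(k, round(aic(log_lik_full, k), 1))
```

Because the penalty term grows by 2 for every extra parameter, a larger model must raise the log likelihood by more than 1 per added parameter to achieve a lower AIC.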

Let Science or Clinical factors guide selection: ‘Optimal’ model. The log likelihood is a measure of GOODNESS-OF-FIT. Seek the ‘optimal’ model that maximises the log likelihood or minimises the AIC.

Model                                  Log likelihood   p   AIC
1  Full model                          -1409.6          7   2835.6
2  Non-significant variables removed   -1413.6          4   2837.2

The change in AIC is 1.6.

1) Let Science or Clinical factors guide selection. Key points: Results demonstrate a significant association with baseline LDL, Age and Adherence. There are difficult choices with Gender, smoking and BMI. AIC only changes by 1.6 when they are removed. Generally, changes of 4 or more in AIC are considered important.
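The rule of thumb above can be written as a small check. This is only a sketch: the function name `compare_by_aic` is hypothetical, the threshold of 4 is the guideline quoted in the key points, and the AIC values are those reported for the full and reduced models.

```python
def compare_by_aic(aic_a, aic_b, threshold=4.0):
    """Return the absolute AIC difference and whether it reaches the threshold."""
    diff = abs(aic_a - aic_b)
    return round(diff, 1), diff >= threshold


aic_full = 2835.6      # full model, as reported
aic_reduced = 2837.2   # non-significant variables removed, as reported

diff, important = compare_by_aic(aic_full, aic_reduced)
print(diff, important)
```

A difference below the threshold means AIC alone does not separate the two models, so the choice falls back on scientific or clinical judgement.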

1) Let Science or Clinical factors guide selection. Key points: Conclude that there is little to choose between the models. AIC is actually lower with the larger model, and Gender and BMI are considered important factors, so keep the larger model, but this has to be justified. Model building is manual, logical, transparent and under your control.

2) Use automatic selection procedures. These are based on automatic mechanical algorithms, usually related to statistical significance. Common ones are stepwise, forward or backward elimination. They can be selected in SPSS using ‘Method’ in the dialogue box.

2) Use automatic selection procedures (e.g. Stepwise): Select Method = Stepwise.

2) Use automatic selection procedures (e.g. Stepwise): 1st step; 2nd step; Final Model.
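To make the mechanics of a forward procedure concrete, here is a hedged Python sketch. It is not the SPSS algorithm: SPSS stepwise uses significance-based entry and removal criteria, whereas this sketch swaps in AIC as the entry criterion so that it stays self-contained. The helper names (`solve`, `gaussian_aic`, `forward_select`) and the synthetic data are assumptions for illustration only.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gaussian_aic(columns, y):
    """AIC of a least-squares fit with intercept (variance counted as a parameter)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    log_lik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    return -2 * log_lik + 2 * (k + 1)


def forward_select(predictors, y):
    """Add, one at a time, the predictor that lowers AIC most; stop when none does."""
    selected = []
    best = gaussian_aic([], y)  # intercept-only model
    while len(selected) < len(predictors):
        scores = {name: gaussian_aic([predictors[v] for v in selected + [name]], y)
                  for name in predictors if name not in selected}
        name = min(scores, key=scores.get)
        if scores[name] >= best:
            break
        selected.append(name)
        best = scores[name]
    return selected, best


# Synthetic data (not the LDL data): y depends on x_signal only.
x_signal = [float(i) for i in range(12)]
x_noise = [float((7 * i) % 5) for i in range(12)]
wiggle = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1, -0.2, 0.1]
y = [2.0 + 3.0 * s + w for s, w in zip(x_signal, wiggle)]

selected, final_aic = forward_select({"x_signal": x_signal, "x_noise": x_noise}, y)
print(selected, round(final_aic, 1))
```

Once a variable has entered in this sketch it is never removed, which is the defining difference between plain forward selection and stepwise selection.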

2) Change in AIC with Stepwise selection. Note: Only available from Generalized Linear Models.

Step   Model          Log likelihood   AIC      Change in AIC   No. of parameters (p)
1      Baseline LDL   -1423.1          2852.2   -               2
2      + Adherence    -1418.0          2844.1   8.1             3
3      + Age          -1413.6          2837.2   6.9             4
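The ‘Change in AIC’ column is simply the drop in AIC from one step to the next. The short Python sketch below recomputes it from the AIC values reported for each step; the variable names are illustrative only.

```python
# (model, log likelihood, AIC) for each stepwise step, as reported.
steps = [
    ("Baseline LDL", -1423.1, 2852.2),
    ("+ Adherence", -1418.0, 2844.1),
    ("+ Age", -1413.6, 2837.2),
]

# Drop in AIC between consecutive steps (previous AIC minus current AIC).
changes = [round(prev[2] - cur[2], 1) for prev, cur in zip(steps, steps[1:])]

for (model, _, _), change in zip(steps[1:], changes):
    print(model, change)
```

Both drops exceed the guideline of 4, so each added variable gives an improvement in fit that is considered important.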

2) Advantages and disadvantages of stepwise. Advantages: it is simple to implement; it gives a parsimonious model; selection is certainly objective. Disadvantages: selection is unstable, because stepwise considers many models that are very similar; the p-value of a variable on entry may be smaller than its p-value once the procedure is finished, so p-values are exaggerated; predictions in an external dataset are usually worse for stepwise procedures, which tend to add bias.

2) Automatic procedures: Backward elimination. Backward elimination starts by eliminating the least significant factor from the full model and has a few advantages over forward selection: the modeller has to consider the ‘full’ model and sees results for all factors simultaneously; correlated factors can remain in the model (in forward methods they may not even enter); criteria for removal tend to be more lax in backward elimination, so the model ends up with more parameters.

2) Use automatic selection procedures (e.g. Backward): Select Method = Backward.

2) Backward elimination in SPSS: 1st step, Gender removed; 2nd step, BMI removed; Final Model.
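The backward procedure can be sketched in the same spirit. Again this is not what SPSS does internally: SPSS removes the least significant factor according to a p-value criterion, while this sketch uses AIC as the removal criterion to stay self-contained. The helper names and the synthetic data are assumptions for illustration.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gaussian_aic(columns, y):
    """AIC of a least-squares fit with intercept (variance counted as a parameter)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    log_lik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    return -2 * log_lik + 2 * (k + 1)


def backward_eliminate(predictors, y):
    """Start from the full model; drop the predictor whose removal lowers AIC most."""
    kept = list(predictors)
    best = gaussian_aic([predictors[v] for v in kept], y)
    while kept:
        scores = {name: gaussian_aic([predictors[v] for v in kept if v != name], y)
                  for name in kept}
        name = min(scores, key=scores.get)
        if scores[name] >= best:
            break  # no single removal improves AIC
        kept.remove(name)
        best = scores[name]
    return kept, best


# Synthetic data (not the LDL data): y depends on x_signal only.
x_signal = [float(i) for i in range(12)]
x_noise = [float((7 * i) % 5) for i in range(12)]
wiggle = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1, -0.2, 0.1]
y = [2.0 + 3.0 * s + w for s, w in zip(x_signal, wiggle)]

kept, final_aic = backward_eliminate({"x_signal": x_signal, "x_noise": x_noise}, y)
print(kept, round(final_aic, 1))
```

Because every predictor is fitted together at the start, the modeller sees the full model first, which is the advantage of backward elimination noted earlier.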

Summary of automatic selection: Automatic selection may not give the ‘optimal’ model (it may leave out important factors). Different methods may give different results (forward vs. backward elimination). Backward elimination is preferred as it is less stringent. Models are too easily fitted in SPSS! Model assessment still requires some thought.

3) A mixture of automatic procedures and self selection: Use automatic procedures as a guide. Think about which factors are important and add those ‘important’ factors. Do not blindly follow statistical significance. Consider AIC for the ‘best’ model.

Summary of model selection: Selection of factors for multiple linear regression models requires some judgement. Automatic procedures are available, but treat their results with caution. They are easily fitted in SPSS. Check the AIC or log likelihood for fit.

Summary: Multiple regression models are the most used analytical tool in quantitative research. They are easily fitted in SPSS. Model assessment requires some thought. Parsimony is better: Occam’s Razor.

Reference: Donnelly LA, Palmer CNA, Whitley AL, Lang C, Doney ASF, Morris AD, Donnan PT. Apolipoprotein E genotypes are associated with lipid lowering response to statin treatment in diabetes: A Go-DARTS study. Pharmacogenetics and Genomics, 2008; 18: 279-87.

Remember Occam’s Razor: ‘Entia non sunt multiplicanda praeter necessitatem’ (‘Entities must not be multiplied beyond necessity’). William of Ockham, 14th-century friar and logician, 1288-1347.

Practical on Multiple Regression: Read in ‘LDL Data.sav’. Try fitting a multiple regression model on Min LDL achieved using forward and backward elimination. Are the results the same? Add factors other than those considered in the presentation, such as BMI and smoking. Remember that the goal is to assess the association of APOE with LDL response. Try fitting multiple regression models for Min Chol achieved. Is the model similar to that found for Min LDL Chol?