Conditional Test Statistics

Suppose that we are considering two log-linear models and that Model 2 is a special case of Model 1; that is, the parameters of Model 2 are a subset of the parameters of Model 1. Also assume that Model 1 has been shown to adequately fit the data.

In this case one is interested in testing whether the differences in the expected frequencies between Model 1 and Model 2 are simply due to random variation. The likelihood ratio chi-square statistic that achieves this goal is

$$G^2(2|1) = G^2(2) - G^2(1),$$

which, if Model 2 is correct, has an approximate chi-square distribution with degrees of freedom equal to the difference of the two models' degrees of freedom.
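As a minimal sketch (not from the slides), this conditional test can be computed directly from each model's goodness-of-fit G2 and residual degrees of freedom as reported by standard software; the example values are taken from the model table later in this section.

```python
# A small helper for the conditional likelihood-ratio test between two
# nested log-linear models, given each model's goodness-of-fit G2 and
# residual degrees of freedom.
from scipy.stats import chi2

def conditional_test(g2_2, df_2, g2_1, df_1):
    """G2(2|1) = G2(2) - G2(1), referred to a chi-square distribution on
    df_2 - df_1 degrees of freedom (Model 2 is nested within Model 1)."""
    g2_cond = g2_2 - g2_1
    df_cond = df_2 - df_1
    return g2_cond, df_cond, chi2.sf(g2_cond, df_cond)

# Example using values from the model table below: compare [1][3][24]
# (simpler, Model 2) with [1][24][34] (richer, Model 1) to test adding [34].
g2, df, p = conditional_test(22.4, 17, 18.0, 16)
print(f"G2(2|1) = {g2:.1f} on {df} d.f., p = {p:.3f}")  # 4.4 on 1 d.f., p = 0.036
```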

Example

Goodness-of-fit tests for the all-k-factor models, and conditional tests for zero k-factor interactions

Conclusions:
- The four-factor interaction is not significant: G2(3|4) = 0.7 (p = 0.705).
- The all-three-factor model provides an adequate fit: G2(3) = 0.7 (p = 0.705).
- The three-factor interactions are not significantly different from 0: G2(2|3) = 9.2 (p = 0.239).
- The all-two-factor model provides an adequate fit: G2(2) = 9.9 (p = 0.359).
- There are significant two-factor interactions: G2(1|2) = 33.0 (p = 0.00083).
We conclude that the model should contain the main effects and some two-factor interactions.
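Note the additivity that links these statistics: the goodness-of-fit statistic of the simpler model decomposes into that of the richer model plus the conditional statistic, for example

$$G^2(2) = G^2(3) + G^2(2|3) = 0.7 + 9.2 = 9.9.$$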

There may also be a natural sequence of progressively more complicated models that one might want to identify. In the laundry detergent example the variables are:
1. Softness of laundry used
2. Previous use of Brand M
3. Temperature of laundry water used
4. Preference for Brand X over Brand M

A natural order for increasingly complex models which should be considered might be:
[1][2][3][4]: the all-main-effects model, i.e. independence amongst all four variables.
[1][3][24]: since previous use of Brand M may be highly related to preference for Brand M, add the 2-4 interaction first.
[1][34][24]: Brand M is recommended for hot water, so add the 3-4 interaction second.
[13][34][24]: Brand M is also recommended for soft laundry, so add the 1-3 interaction third.
[13][234] and [134][234]: finally, add some possible three-factor interactions.

Likelihood ratio G2 for various models (d.f. = residual degrees of freedom):

Model                    d.f.    G2
[1][3][24]                17    22.4
[1][24][34]               16    18.0
[13][24][34]              14    11.9
[13][23][24][34]          13    11.2
[12][13][23][24][34]      11    10.1
[1][234]                  14    14.5
[134][24]                 10    12.2
[13][234]                 12     8.4
[24][34][123]              9      --
[123][234]                 8     5.6
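As a quick sketch, using only the table rows that report both d.f. and G2, the conditional tests between successive nested models in the natural order can be computed directly:

```python
# Conditional tests along the nested sequence of models above. Each test
# asks whether the terms added at that step are needed:
# G2(simpler | richer) = G2(simpler) - G2(richer).
from scipy.stats import chi2

sequence = [                      # (model, d.f., G2), simplest first
    ("[1][3][24]",   17, 22.4),
    ("[1][24][34]",  16, 18.0),
    ("[13][24][34]", 14, 11.9),
    ("[13][234]",    12,  8.4),
]
for (m2, df2, g2_2), (m1, df1, g2_1) in zip(sequence, sequence[1:]):
    g2_c, df_c = g2_2 - g2_1, df2 - df1
    print(f"{m2} vs {m1}: G2 = {g2_c:.1f} on {df_c} d.f., "
          f"p = {chi2.sf(g2_c, df_c):.3f}")
```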

Stepwise selection procedures:
- Forward selection
- Backward elimination

Forward selection: Starting with a model that underfits the data, log-linear parameters not in the model are added step by step until a model that does fit is achieved. At each step the most significant log-linear parameter is added to the model. To determine the significance of an added parameter we use the statistic G2(2|1) = G2(2) - G2(1), where Model 1 contains the parameter and Model 2 does not. (A sketch of one step of each procedure follows the description of backward elimination below.)

Backward elimination: Starting with a model that overfits the data, log-linear parameters in the model are deleted step by step until a model is reached that continues to fit the data with the smallest number of significant parameters. At each step the least significant log-linear parameter is deleted from the model. To determine the significance of a deleted parameter we use the same statistic G2(2|1) = G2(2) - G2(1), where Model 1 contains the parameter and Model 2 does not.
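Below is a rough sketch of one step of each procedure, relying on the standard equivalence between hierarchical log-linear models and Poisson regression on the cell counts. The helper names (g2_df, forward_step, backward_step) and the data frame `table` (one row per cell, with factor columns plus a `count` column) are illustrative assumptions, not from the slides, and for brevity the sketch ignores the hierarchy constraint that a term marginal to a retained higher-order term cannot be deleted.

```python
# Sketch of one step of forward selection and backward elimination for
# log-linear models, fitted as Poisson GLMs on the cell counts.
import statsmodels.api as sm
from scipy.stats import chi2

def g2_df(terms, table):
    """Fit the log-linear model with the given terms; return (G2, residual d.f.)."""
    fit = sm.GLM.from_formula("count ~ " + " + ".join(terms),
                              data=table, family=sm.families.Poisson()).fit()
    return fit.deviance, fit.df_resid

def forward_step(terms, candidates, table, alpha=0.05):
    """Return the most significant candidate term to add, or None to stop."""
    g2_2, df_2 = g2_df(terms, table)               # Model 2: without the parameter
    best = None
    for term in candidates:
        g2_1, df_1 = g2_df(terms + [term], table)  # Model 1: with the parameter
        p = chi2.sf(g2_2 - g2_1, df_2 - df_1)      # G2(2|1) on df2 - df1 d.f.
        if p < alpha and (best is None or p < best[1]):
            best = (term, p)
    return best

def backward_step(terms, table, alpha=0.05):
    """Return the least significant term to delete, or None to stop."""
    g2_1, df_1 = g2_df(terms, table)               # Model 1: with the parameter
    worst = None
    for term in terms:
        reduced = [t for t in terms if t != term]
        g2_2, df_2 = g2_df(reduced, table)         # Model 2: without the parameter
        p = chi2.sf(g2_2 - g2_1, df_2 - df_1)
        if p > alpha and (worst is None or p > worst[1]):
            worst = (term, p)
    return worst
```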

K = Knowledge
N = Newspaper
R = Radio
S = Reading
L = Lectures

Continuing after 10 steps

The final step

The best model was found at the previous step: [LN][KLS][KR][KN][LR][NR][NS]

Logit Models. To date we have not worried about whether any of the variables were dependent or independent variables. The logit model is used when we have a single binary dependent variable.

The variables:
Type of seedling (T): longleaf seedling or slash seedling
Depth of planting (D): too low or too high
Mortality (M, the dependent variable): dead or alive

The log-linear model. Note: $m_{ij1}$ = the number dead when T = i and D = j; $m_{ij2}$ = the number alive when T = i and D = j; and $m_{ij1}/m_{ij2}$ = the mortality ratio when T = i and D = j.

Hence, since the saturated log-linear model can be written

$$\log m_{ijk} = u + u_{T(i)} + u_{D(j)} + u_{M(k)} + u_{TD(ij)} + u_{TM(ik)} + u_{DM(jk)} + u_{TDM(ijk)},$$

every u-term not involving M cancels in the difference $\log m_{ij1} - \log m_{ij2}$.

The logit model:

$$\log\frac{m_{ij1}}{m_{ij2}} = w + w_{T(i)} + w_{D(j)} + w_{TD(ij)},$$

where, using the sum-to-zero constraints for the binary variable M (so that $u_{M(2)} = -u_{M(1)}$, etc.),

$$w = 2u_{M(1)}, \quad w_{T(i)} = 2u_{TM(i1)}, \quad w_{D(j)} = 2u_{DM(j1)}, \quad w_{TD(ij)} = 2u_{TDM(ij1)}.$$

Thus, corresponding to a log-linear model there is a logit model predicting the log ratio of the expected frequencies of the two categories of the dependent variable. Moreover, (k+1)-factor interactions with the dependent variable in the log-linear model determine k-factor interactions in the logit model: k + 1 = 1 gives the constant term of the logit model, and k + 1 = 2 gives its main effects.
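For instance, applying this correspondence to the fitted model [TM][TD][DM] below: the terms involving M, namely [M], [TM] and [DM], give respectively the constant and the main effects of T and D in the logit model, and there is no T-by-D logit interaction because [TDM] is absent:

$$\log\frac{m_{ij1}}{m_{ij2}} = w + w_{T(i)} + w_{D(j)}.$$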

1 = Depth, 2 = Mortality, 3 = Type

Log-linear parameters for the model [TM][TD][DM]

Logit model for predicting mortality

The best model found by forward selection was [LN][KLS][KR][KN][LR][NR][NS]. To fit a logit model predicting K (Knowledge), we need to fit a log-linear model containing the important interactions with K, namely [LNRS][KLS][KR][KN]; the full interaction [LNRS] among the independent variables is included because the logit model conditions on their joint distribution. The logit model will contain:
- main effects for L (Lectures), N (Newspapers), R (Radio) and S (Reading), and
- a two-factor interaction effect for L and S.

The logit parameters for the model [LNRS][KLS][KR][KN] (multiplicative effects are given in brackets; logit parameters = 2 × log-linear parameters):

The constant term: -0.226 (0.798)

The main effects on Knowledge:
Lectures:   Lect  0.268 (1.307);  None -0.268 (0.765)
Newspaper:  News  0.324 (1.383);  None -0.324 (0.723)
Reading:    Solid 0.340 (1.405);  Not  -0.340 (0.712)
Radio:      Radio 0.150 (1.162);  None -0.150 (0.861)

The two-factor interaction effect of Reading and Lectures on Knowledge
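As a quick check (a sketch, not from the slides), the bracketed multiplicative effects are simply exp of the logit parameters, and the effects combine by multiplying the odds. The illustrative prediction below omits the Reading-by-Lectures interaction term, whose value is cut off above:

```python
# Verify the bracketed multiplicative effects (exp of each logit parameter)
# and combine them into an illustrative predicted odds of knowledge. The
# L-by-S interaction is omitted because its value is truncated above.
import math

params = {"constant": -0.226, "Lectures": 0.268, "Newspaper": 0.324,
          "Reading": 0.340, "Radio": 0.150}

for name, w in params.items():
    print(f"{name}: exp({w:+.3f}) = {math.exp(w):.3f}")  # matches bracketed values

# Respondent exposed to lectures, newspapers, solid reading and radio:
log_odds = sum(params.values())        # = 0.856, ignoring the L-by-S term
odds = math.exp(log_odds)
print(f"odds ~ {odds:.2f}, probability ~ {odds / (1 + odds):.2f}")
```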