Log-linear and logistic models

Log-linear and logistic models
- Generalised linear model
- ANOVA revisited
- Log-linear model: Poisson distribution
- Logistic model: Binomial distribution
- Deviances
- R commands for log-linear and logistic models

ANOVA revisited
Let us recall the purpose of ANOVA: we want to test for differences between the effects of different parameters, and there may be several sets of parameters. If there are only two parameters then a t-test is suitable for testing the difference between means. If there are more than two, we design the experiment according to one of the standard schemes (n-way crossed, n-fold nested, or a mixture of them). Once we have the results of the experiment we fit various linear models under different hypotheses and calculate likelihood ratio (LR) tests. The LR test turns out to be related to the ratio of the sums of squares under the different hypotheses, and this ratio is related to the F-distribution (if the observations are normally distributed). If the F-value is large enough we say that the differences between means are significant; if it is small we say they are not significant, and we can remove some parameters from our model.

One of the assumptions of the ANOVA model is that the observations are normally distributed. If the number of observations is large enough this assumption works very well, but there are cases where ANOVA based on the linear model is not adequate. Examples:
- Outcomes are success or failure. In this case the binomial distribution is more adequate.
- The outcome is a number of occurrences. In this case the Poisson distribution is more adequate.
A further feature of the binomial and Poisson distributions is that, being discrete, they can be applied to categorical variables.

Generalised linear model
Linear models are useful when the distribution of the observations is, or can be approximated by, the normal distribution. Even when that is not the case, for a large number of observations the normal distribution is a safe assumption. However, there are many cases when a different model should be used. The generalised linear model is a way of extending linear models to a wide range of distributions. If the distribution of the observations belongs to the generalised exponential family, and the mean value (or some function of it) is linear in the input parameters, then the generalised linear model can be used. Recall the generalised exponential family:

f(y; μ, φ) = exp{ (y A(μ) − B(μ)) / S(φ) + C(y, φ) }

The normal, Poisson and binomial distributions belong to the generalised exponential family (note that the parameters we are considering are the mean values, and for simplicity we take S(φ) = 1). Other members of this family include the gamma, the exponential and many others. If some function of the mean (of μ, λ and π for the cases above) is a linear function of the input parameters then the model can be handled using the generalised linear model. Usually this function is taken to be A(μ).
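
Written out explicitly (a reconstruction in the slide's A, S notation, with S(φ) = 1), the three members mentioned above take the exponential-family form:

```latex
% Normal (unit variance): A(\mu) = \mu
f(y;\mu) = \exp\Big\{ y\mu - \tfrac{\mu^2}{2} - \tfrac{y^2}{2} - \tfrac{1}{2}\log 2\pi \Big\}

% Poisson: A(\lambda) = \log\lambda
f(y;\lambda) = \exp\{\, y\log\lambda - \lambda - \log y! \,\}

% Binomial (n trials): A(\pi) = \log\frac{\pi}{1-\pi}
f(y;\pi) = \exp\Big\{ y\log\tfrac{\pi}{1-\pi} + n\log(1-\pi) + \log\tbinom{n}{y} \Big\}
```

Reading off A in each case gives the identity, the logarithm and the logit, which are exactly the link functions used in the slides that follow.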

Generalised linear model: cont
Without loss of generality the general exponential family can be written:

f(y; μ, φ) = exp{ (y A(μ) − B(μ)) / S(φ) + C(y, φ) }

If we assume that the form of the distribution is the same for all observations but the parameters μ_i differ (while φ is the same for all observations), then the generalised linear model maximises the likelihood function (for independent observations):

log L(β, φ) = Σ_i [ (y_i A(μ_i) − B(μ_i)) / S(φ) + C(y_i, φ) ],   with A(μ_i) = x_i'β
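
As a concrete illustration of this maximisation (a hypothetical sketch, not code from the slides), here is a pure-Python Newton-Raphson fit of a Poisson log-linear model with one covariate, log μ_i = b0 + b1·x_i; the function name `fit_poisson` and the toy data are assumptions made for the example:

```python
import math

def fit_poisson(x, y, iters=25):
    """Newton-Raphson maximisation of the Poisson log-likelihood
    for the log-linear model log(mu_i) = b0 + b1 * x_i."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # score (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information (negative Hessian), a 2x2 matrix here
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# toy data: with x = (0, 1) and y = (2, 6) the saturated fit is exact,
# so the estimates converge to b0 = log 2 and b1 = log 3
b0, b1 = fit_poisson([0, 1], [2, 6])
```

In practice one would also monitor convergence of the score to zero and add step-halving; R's glm does the equivalent maximisation via iteratively reweighted least squares.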

Poisson distribution: log-linear model
If the distribution of the observations is Poisson then a log-linear model can be used. Recall that the Poisson distribution belongs to the exponential family and the function A of the mean value is the logarithm, so it can be handled using the generalised linear model. A log-linear model is appropriate when the outcomes are frequencies (expressed as integers). After fitting a log-linear model we can find the estimated mean using the exponential function:

μ̂ = exp(x'β̂)

Example: relation between gray hair and age.

  gray hair   under 40   over 40
  yes            27         18
  no             33         22

It is similar to a two-way crossed ANOVA model. We can analyse this type of data using the log-linear model.
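
For the table above, the independence (main-effects-only) log-linear model can be checked by hand; the following plain-Python sketch (an illustration, not the slides' own method) computes the fitted expected counts and both goodness-of-fit statistics:

```python
import math

# 2x2 table from the slide: rows = gray hair (yes / no), cols = age (under 40 / over 40)
obs = [[27, 18],
       [33, 22]]

n = sum(sum(row) for row in obs)
row_tot = [sum(row) for row in obs]
col_tot = [sum(obs[i][j] for i in range(2)) for j in range(2)]

# expected counts under the independence log-linear model: mu_ij = r_i * c_j / n
fit = [[row_tot[i] * col_tot[j] / n for j in range(2)] for i in range(2)]

# deviance (G^2) and Pearson chi-squared (X^2), both ~ chi^2 with 1 df here
G2 = 2 * sum(obs[i][j] * math.log(obs[i][j] / fit[i][j])
             for i in range(2) for j in range(2))
X2 = sum((obs[i][j] - fit[i][j]) ** 2 / fit[i][j]
         for i in range(2) for j in range(2))
```

For this particular table the margins reproduce the observed counts exactly (e.g. 45·60/100 = 27), so G² = X² = 0: the table shows no association between gray hair and age.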

Binomial distribution: logistic model
If the distribution of the results of the experiment is binomial, i.e. the outcome is 0 or 1 (success or failure), then the logistic model can be used. Recall that the function A of the mean value has the form:

A(π) = log( π / (1 − π) )

This function has a special name, the logit. It has several advantages:
- If logit(π) has been estimated then we can find π, and it is between 0 and 1.
- If the probability of success is larger than that of failure then this function is positive, otherwise it is negative.
- Swapping success and failure changes only the sign of this function.
This model can be used when the outcomes are binary (0 and 1). If logit(π) is linear then we can find π:

π = exp(x'β) / (1 + exp(x'β))

For the logistic model either grouped variables (fractions of successes) or individual items (every individual has a success (1) or a failure (0)) can be used. The ratio of the probability of success to the probability of failure is also called the odds.
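
The properties listed above are easy to verify numerically; a minimal sketch (the function names are mine, not from the slides):

```python
import math

def logit(p):
    # log odds: log(probability of success / probability of failure)
    return math.log(p / (1.0 - p))

def inv_logit(x):
    # inverse logit: maps any real number back to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# logit is positive when success is more likely than failure, negative otherwise,
# and swapping success and failure only changes the sign: logit(0.25) == -logit(0.75)
```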

Deviances
In the linear model we maximise the likelihood under the full model and under the hypothesis. The ratio of the values of the likelihood function under the two hypotheses (null and alternative) is related to the F-distribution; the interpretation is how much the variance would increase if we removed part of the model (the null hypothesis). In log-linear and logistic model analysis the likelihood function is again maximised under the null and the alternative hypotheses. Then the logarithm of the ratio of the likelihood values under these two hypotheses is asymptotically related to the chi-squared distribution:

D = −2 log( L(null) / L(alternative) ) ~ χ²  (asymptotically)

That is why in log-linear and logistic regression it is usual to talk about deviances and chi-squared statistics instead of variances and F-statistics. Analysis based on log-linear and logistic models (and on generalised linear models in general) is usually called analysis of deviances; the reason is that the chi-squared statistic is related to the deviation of the fitted model from the observations. Another test is based on Pearson's chi-squared statistic. The two tests behave similarly as the number of observations increases.
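
A small numerical illustration (the two-group data are hypothetical): for grouped binomial outcomes, the deviance between the common-probability null model and the per-group alternative is twice the log of the likelihood ratio:

```python
import math

# hypothetical grouped binomial data: k successes out of n trials in two groups
k = [12, 3]
n = [20, 20]

# null hypothesis: one common success probability for both groups
p0 = sum(k) / sum(n)                      # 15/40 = 0.375

# alternative: a separate probability per group (the saturated model here)
p1 = [ki / ni for ki, ni in zip(k, n)]    # 0.6 and 0.15

def loglik(probs):
    # binomial log-likelihood, omitting the constant binomial coefficients
    return sum(ki * math.log(pi) + (ni - ki) * math.log(1.0 - pi)
               for ki, ni, pi in zip(k, n, probs))

# deviance: twice the log-likelihood ratio, asymptotically chi-squared
# with 1 degree of freedom (one extra parameter in the alternative)
D = 2 * (loglik(p1) - loglik([p0, p0]))
```

Here D ≈ 9.10, well above the 5% critical value of 3.84 for χ² with 1 df, so the two groups differ significantly in their success probabilities.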

R commands for log-linear model
A log-linear model can be analysed using the generalised linear model. Once the factors, the data and the formula have been decided, we can use:

result <- glm(data ~ formula, family = 'poisson')

This gives us the fitted model. Then we can use:

anova(result, test = 'Chisq')
plot(result)
summary(result)

Interpretation of the results is similar to that for linear-model ANOVA tables, and degrees of freedom are defined similarly. The only difference is that deviances are used instead of sums of squares.

R commands for logistic regression
Similar to the log-linear model: decide what the data, the factors and the formula are, then use the generalised linear model to fit:

result <- glm(data ~ formula, family = 'binomial')

and then analyse using:

anova(result, test = 'Chisq')
summary(result)
plot(result)

Bootstrap
There are different ways of applying the bootstrap in these cases:
- sample the original observations together with the design matrix;
- sample the residuals and add them to the fitted values (for each member of the family and each link function this should be done differently);
- use the estimated parameters and do parametric sampling:
  1. fit the model (using glm and the family of distributions)
  2. for each cell in the design matrix find the parameters of the distribution
  3. sample using the distribution with these parameters
  4. fit the model again and save the coefficients (or any other statistics of interest)
  5. repeat steps 3 and 4 B times
  6. build distributions and other properties of the estimates
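
The parametric variant can be sketched in pure Python (an illustration with a hand-rolled Poisson sampler; the gray-hair table and the choice of G² as the saved statistic are just examples):

```python
import math
import random

rng = random.Random(0)  # fixed seed for reproducibility

def rpois(lam):
    # Knuth's algorithm for one Poisson(lam) draw
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def fit_independence(tab):
    # steps 1-2: MLE of the independence log-linear model, mu_ij = r_i * c_j / n
    n = sum(map(sum, tab))
    r = [sum(row) for row in tab]
    c = [sum(tab[i][j] for i in range(2)) for j in range(2)]
    return [[r[i] * c[j] / n for j in range(2)] for i in range(2)]

obs = [[27, 18], [33, 22]]           # the gray hair / age table
mu = fit_independence(obs)

B = 200
boot_g2 = []
for _ in range(B):                   # step 5: repeat B times
    # step 3: sample each cell from the fitted Poisson distribution
    tab = [[rpois(mu[i][j]) for j in range(2)] for i in range(2)]
    m = fit_independence(tab)
    # step 4: refit and save the statistic of interest (deviance G^2 here)
    g2 = 2 * sum(tab[i][j] * math.log(tab[i][j] / m[i][j])
                 for i in range(2) for j in range(2) if tab[i][j] > 0)
    boot_g2.append(g2)
# step 6: boot_g2 now approximates the sampling distribution of G^2
```

Under the fitted model G² is asymptotically χ² with 1 degree of freedom, so the bootstrap distribution can be compared against that reference.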

Exercises: generalised linear models
1. Show that the gamma distribution belongs to the exponential family (r is a constant):

f(y; λ, r) = λ^r y^(r−1) e^(−λy) / Γ(r),   y > 0

2. Find the moment generating function of the natural exponential family:

f(y; θ) = exp{ θy − b(θ) + c(y) }

Hint: use the fact that the density of the distribution is normalised to 1:

∫ exp{ θy − b(θ) + c(y) } dy = 1

Then use the definition of the moment generating function. Find the first and the second moments.
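
A worked sketch of the moment-generating-function exercise, using the normalisation trick from the hint (with the standard natural-family notation b, c):

```latex
% Normalisation gives, for every \theta:
% \int \exp\{\theta y + c(y)\}\,dy = e^{b(\theta)}.
M(t) = \mathbb{E}\,e^{tY}
     = \int e^{ty}\, e^{\theta y - b(\theta) + c(y)}\,dy
     = e^{-b(\theta)} \int e^{(\theta + t)y + c(y)}\,dy
     = e^{b(\theta + t) - b(\theta)}.
% Differentiating at t = 0:
% M'(0) = b'(\theta) = \mathbb{E}Y,
% M''(0) = b''(\theta) + b'(\theta)^2 = \mathbb{E}Y^2,
% so \operatorname{Var}(Y) = b''(\theta).
```

This is the standard result that the cumulants of a natural exponential family are the derivatives of b(θ).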

References
Myers, R.H., Montgomery, D.C. and Vining, G.G., Generalized Linear Models: With Applications in Engineering and the Sciences. Wiley.