GENERAL LINEAR MODELS Oneway ANOVA, GLM Univariate (n-way ANOVA, ANCOVA)


BASICS
- The dependent variable is continuous.
- Independent variables are either categorical (nominal; factor, CLASS) or continuous (covariate).
- The question: do the group means of the dependent variable differ across the groups defined by the independent variables?
- Main effects, interactions and nested effects can be modeled.
- Often used for testing hypotheses with experimental data.

BASICS
                                    Factor A (industry)    Factor A (industry)
                                    Level 1 (manufact)     Level 2 (trade)
Factor B (size) Level 1 (small)     cell                   cell
Factor B (size) Level 2 (medium)    cell                   cell
Factor B (size) Level 3 (large)     cell                   cell
- 3 x 2 full factorial design (full: each cell has observations).
- Balanced design: each cell has an equal number of observations.

ASSUMPTIONS
- Enough observations in each group? (n > 20)
- Independence of observations.
- Similarity of variance-covariance matrices: no problem if the largest group variance is less than 1.5 times the smallest group variance (less than 4 times, if the design is balanced).
- Normality.
- Linearity.
- No outlier observations.
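The variance rule of thumb above can be checked directly from the raw data. The following is a minimal sketch, assuming each group is given as a plain list of numbers; the function name `variance_ratio_ok` and the example data are made up for illustration.

```python
from statistics import variance

def variance_ratio_ok(groups):
    """Apply the rule of thumb: largest/smallest group variance < 1.5 (< 4 if balanced)."""
    variances = [variance(g) for g in groups]        # sample variances (n - 1 denominator)
    ratio = max(variances) / min(variances)
    balanced = len({len(g) for g in groups}) == 1    # all groups have the same size
    limit = 4.0 if balanced else 1.5
    return ratio, balanced, ratio < limit

# Hypothetical data: three groups of equal size (balanced design)
balanced_groups = [[4.0, 5.0, 6.0, 7.0], [3.0, 5.0, 6.0, 8.0], [5.0, 6.0, 7.0, 8.0]]
# Hypothetical data: the last group is smaller (unbalanced design)
unbalanced_groups = [[4.0, 5.0, 6.0, 7.0], [3.0, 5.0, 6.0, 8.0], [5.0, 6.0, 7.0]]

print(variance_ratio_ok(balanced_groups))
print(variance_ratio_ok(unbalanced_groups))
```

The same variance ratio can pass in a balanced design and fail in an unbalanced one, which is why the function reports whether the design is balanced alongside the verdict.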

STEPS OF INTERPRETATION
1. Model significance? F-test and R square. Use Welch's test if the group variances are unequal (this can be tested using the Levene or Brown-Forsythe test).
2. Significance of effects? F-test and partial eta squared.
3. Which group differences are significant? Post hoc or contrast tests.
4. What are the group differences like? Estimated marginal means for the groups.

Oneway ANOVA
- A continuous dependent variable (y) and one categorical independent variable (x) with at least 3 categories; k = number of categories.
- Assumptions: y is normally distributed with equal variance in each x category.
- H0: the mean of y is the same in all x categories.
- The variance of y is divided into two components: within groups (error) and between groups (model, treatment).
- Test statistic = between mean square / within mean square; under H0 it follows the F-distribution with k-1 and n-k degrees of freedom.
- The F-test can be replaced by Welch's test if the variances are unequal.
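The decomposition described above can be computed by hand. This is a minimal sketch with made-up data; the function name `oneway_anova` is hypothetical, and the p-value is left out because it would need an F-distribution routine from a statistics package.

```python
def oneway_anova(groups):
    """Return sums of squares, degrees of freedom and the F statistic for a one-way ANOVA."""
    k = len(groups)                                   # number of categories
    n = sum(len(g) for g in groups)                   # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between groups (model): group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups (error): observations around their own group mean
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return {"ss_between": ss_between, "ss_within": ss_within,
            "df": (df_between, df_within), "F": f_value}

# Hypothetical data: three groups of three observations
result = oneway_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(result)
```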

Oneway ANOVA
- If the F-test is significant, you can use post hoc tests for pairwise comparison of means across the groups.
- Alternatively (in experiments) you can define contrasts ex ante.

SAS: oneway ANOVA

- BF (Brown-Forsythe) or Levene test, H0: group variances are equal.
- Welch: use this instead of the F-test if the variances are not equal.
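Both homogeneity tests are ordinary one-way ANOVAs run on transformed data. The sketch below assumes the squared-deviation form of Levene's test (the form named in the SAS output title "ANOVA of Squared Deviations from Group Means") and the usual Brown-Forsythe form based on absolute deviations from group medians. Function names and data are made up, and only the F statistics are computed, not p-values.

```python
from statistics import mean, median

def f_statistic(groups):
    """One-way ANOVA F statistic (between mean square / within mean square)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def levene_squared(groups):
    """Levene: ANOVA on squared deviations from the group means."""
    return f_statistic([[(y - mean(g)) ** 2 for y in g] for g in groups])

def brown_forsythe(groups):
    """Brown-Forsythe: ANOVA on absolute deviations from the group medians."""
    return f_statistic([[abs(y - median(g)) for y in g] for g in groups])

# Hypothetical data: same spread, different location
same_spread = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]
# Hypothetical data: clearly different spread
different_spread = [[1.0, 2.0, 3.0, 4.0], [0.0, 10.0, 20.0, 30.0]]

print(levene_squared(same_spread), brown_forsythe(same_spread))
print(levene_squared(different_spread), brown_forsythe(different_spread))
```

A large statistic speaks against H0 (equal variances); a shift in location alone leaves both statistics at zero.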

SAS: oneway ANOVA, post hoc tests

SAS: oneway ANOVA

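MODEL FIT (SAS output, values not shown)
- ANOVA table columns: Source | DF | Sum of Squares | Mean Square | F Value | Pr > F
- ANOVA table rows: Model (Pr > F <.0001), Error, Corrected Total
- Fit statistics: R-Square | Coeff Var | Root MSE | deathrate Mean
- Class Level Information: Class | Levels | Values, for class_popgrowth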

EQUALITY OF VARIANCES (SAS output, values not shown)
- Levene's Test for Homogeneity of deathrate Variance (ANOVA of Squared Deviations from Group Means). Columns: Source | DF | Sum of Squares | Mean Square | F Value | Pr > F. Rows: class_popgrowth, Error.
- Welch's ANOVA for deathrate. Columns: Source | DF | F Value | Pr > F. Rows: class_popgrowth (Pr > F <.0001), Error.
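For reference, the Welch statistic reported in output of this kind can be sketched as follows. This uses the standard textbook form of Welch's ANOVA (weights n_i / s_i^2); the function name and data are hypothetical, and the p-value is omitted because it needs an F-distribution routine.

```python
from statistics import mean, variance

def welch_anova(groups):
    """Welch's F statistic with its numerator and denominator degrees of freedom."""
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    w = [ni / variance(g) for ni, g in zip(n, groups)]     # weight = n_i / s_i^2
    w_sum = sum(w)
    weighted_mean = sum(wi * mi for wi, mi in zip(w, m)) / w_sum
    numerator = sum(wi * (mi - weighted_mean) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    lam = sum((1 - wi / w_sum) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return numerator / denominator, df1, df2

# Hypothetical two-group data: here Welch's F equals the squared Welch t statistic
f_two, df1_two, df2_two = welch_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(f_two, df1_two, df2_two)

# Hypothetical three-group data with unequal variances and group sizes
print(welch_anova([[1.0, 2.0, 3.0, 4.0], [2.0, 6.0, 10.0], [5.0, 5.5, 6.0, 6.5, 7.0]]))
```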

GROUP MEANS (SAS output, values not shown)
- Columns: Level of class_popgrowth | N | deathrate Mean | Std Dev

POST HOC TEST (SAS output, values not shown)
- Comparisons significant at the 0.05 level are indicated by ***.
- Columns: class_popgrowth Comparison | Difference Between Means | Simultaneous 95% Confidence Limits
- Six comparisons in the table are marked ***.

BOXPLOTS

Multiway ANOVA, GLM
- A continuous dependent variable y and two or more categorical independent variables (factorial design).
- ANCOVA, if there are also continuous independent variables (covariates).
- Main effects and interaction effects can be modeled.
- A factor is fixed if all of its groups are present in the data, and random if only a random sample of its groups is represented in the data.
- Eta squared = SSK/SST expresses what percentage of the variance in y is explained by x. Not available in EG; use SAS code: model y = x1 x2 / ss3 EFFECTSIZE;
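As a small illustration of the effect size measure, the sketch below computes eta squared for one factor, reading SSK as the sum of squares attributable to the factor (an assumption about the slide's notation) and SST as the total sum of squares. The partial version, which appears in the interpretation steps, divides by effect plus error instead. Names and data are made up.

```python
def eta_squared(groups):
    """Eta squared = factor sum of squares / total sum of squares."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_factor = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_total = sum((y - grand) ** 2 for g in groups for y in g)
    return ss_factor / ss_total

def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared = SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)

# Hypothetical data: one factor with three groups
eta = eta_squared([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(eta)                              # share of variance in y explained by the factor
print(partial_eta_squared(26.0, 6.0))   # with a single factor the two measures coincide
```

With several effects in the model the two measures differ, because the partial version removes the variance explained by the other effects from the denominator.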

INTERACTION EFFECT
- Synergy of two factors: the effect of one factor is different in the groups of the other factor.
- Crossing effect = interaction effect.
- Ordinal: the lines in the means plot have different slopes but do not cross.
- Disordinal: the lines cross in the means plot.

NO INTERACTION
- Size and industry both have a significant main effect.
- No interaction: homogeneity of slopes.

INTERACTIONS
- Ordinal interaction: the effect of size is stronger in manufacturing than in trade.
- Disordinal interaction: the effect of size has a different sign in manufacturing and in trade.
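The distinction can be read off the cell means. The following is a descriptive sketch only (it says nothing about statistical significance), using made-up cell means for two size levels in two industries; the function name and the tolerance are assumptions.

```python
def classify_interaction(effect_in_group_1, effect_in_group_2, tol=1e-9):
    """Compare the effect of one factor within the two groups of the other factor."""
    if abs(effect_in_group_1 - effect_in_group_2) <= tol:
        return "no interaction"        # parallel lines in the means plot
    if effect_in_group_1 * effect_in_group_2 < 0:
        return "disordinal"            # effects have opposite signs
    return "ordinal"                   # same direction, different strength

# Hypothetical cell means of the dependent variable
cell_means = {
    ("manufact", "small"): 10.0, ("manufact", "large"): 16.0,
    ("trade", "small"): 11.0, ("trade", "large"): 13.0,
}
# Effect of size (large minus small) within each industry
effect_manufact = cell_means[("manufact", "large")] - cell_means[("manufact", "small")]
effect_trade = cell_means[("trade", "large")] - cell_means[("trade", "small")]

print(classify_interaction(effect_manufact, effect_trade))
```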

NESTED EFFECTS
- Nested effect B(A): "B nested within A".
- Example: size(industry), where the effect of size is estimated separately for each industry group.
- The difference between a nested effect and an interaction effect is that the main effect of B (size) is not included in the model.
- The slope of B (size) is different in each category of A (industry).

ESTIMATED GROUP MEANS
- Estimated marginal means, or LS (least squares) means.
- Predicted group means are calculated using the estimated model coefficients.
- The effects of the other independent variables are controlled for.
- They are generally not equal to the group means observed in the sample.
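A small example shows why the estimated means differ from the sample means in an unbalanced design. The sketch assumes a full two-factor model with interaction and no covariates, in which case the LS mean of a factor level is the unweighted average of its cell means. The data and the function name are made up.

```python
from statistics import mean

def ls_and_raw_means(cells, level):
    """Return (LS mean, raw sample mean) for one level of the first factor."""
    own = {b: obs for (a, b), obs in cells.items() if a == level}
    ls_mean = mean(mean(obs) for obs in own.values())         # average of the cell means
    raw_mean = mean(y for obs in own.values() for y in obs)   # pooled observations
    return ls_mean, raw_mean

# Hypothetical unbalanced data: (industry, size) -> observations
cells = {
    ("manufact", "small"): [10.0, 12.0],
    ("manufact", "large"): [20.0, 22.0, 24.0, 26.0],
    ("trade", "small"): [8.0, 10.0, 12.0, 14.0],
    ("trade", "large"): [18.0, 20.0],
}

print(ls_and_raw_means(cells, "manufact"))
print(ls_and_raw_means(cells, "trade"))
```

The raw mean for manufacturing is pulled upward because large firms dominate that group; the LS mean gives each size class the same weight, which is what "controlling for" the other factor means here.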

SUM OF SQUARES
- Type I SS (sequential) does not control for the effects of independent variables that are specified later in the model.
- Type II SS controls for the effects of all other independent variables.
- Types III and IV SS are better in unbalanced designs; use Type IV if there are empty cells.

POST HOC TESTS
- Multiple comparison procedures, mean separation tests.
- The idea is to control the inflated risk of Type I error that results from doing many pairwise tests, each at the 5% risk level.
- E.g. Bonferroni, Scheffe, Sidak,...
- Tukey-Kramer is usually the most powerful choice for all pairwise comparisons.
- H0: equal group means. Rejection means that the group means are not equal, but failure to reject does not necessarily mean that they are equal (small sample size -> low power -> failure to reject the null).
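The size of the problem is easy to quantify. The sketch below counts the pairwise tests among k groups, shows how the familywise Type I error risk grows (assuming, for simplicity, independent tests), and gives the per-comparison significance levels used by the Bonferroni and Sidak corrections. The function names are made up.

```python
def n_pairwise(k):
    """Number of pairwise comparisons among k groups."""
    return k * (k - 1) // 2

def familywise_error(alpha, m):
    """Chance of at least one false rejection in m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha, m):
    """Per-comparison level under the Bonferroni correction."""
    return alpha / m

def sidak_alpha(alpha, m):
    """Per-comparison level under the Sidak correction."""
    return 1 - (1 - alpha) ** (1 / m)

k = 4                     # hypothetical number of groups
m = n_pairwise(k)
print(m, familywise_error(0.05, m))
print(bonferroni_alpha(0.05, m), sidak_alpha(0.05, m))
```

Bonferroni is slightly stricter than Sidak, and both hold the familywise risk at or below the nominal 5%.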

ANCOVA
- The model includes a covariate (= a continuous independent variable, often one whose effect you want to control for).
- Idea: regress y on the covariate, then run an ANOVA in which the factors explain the residual.
- The relationship between the covariate and y must be linear, and the slope is assumed to be the same at all factor levels.
- The covariate and the factor should not be strongly related to each other.
- Do not include too many covariates: at most 0.1*n - (k - 1).
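The common-slope assumption leads to the classical adjusted group means: each group's mean of y is moved along the pooled within-group regression line to the overall mean of the covariate. The sketch below assumes one factor and one covariate; the data and the function name are made up.

```python
def ancova_adjusted_means(groups):
    """groups: label -> list of (covariate, y) pairs. Returns (pooled slope, adjusted means)."""
    all_x = [x for pts in groups.values() for x, _ in pts]
    grand_x = sum(all_x) / len(all_x)
    sxx = sxy = 0.0
    group_means = {}
    for label, pts in groups.items():
        x_bar = sum(x for x, _ in pts) / len(pts)
        y_bar = sum(y for _, y in pts) / len(pts)
        # Pool the within-group sums of squares and cross-products
        sxx += sum((x - x_bar) ** 2 for x, _ in pts)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in pts)
        group_means[label] = (x_bar, y_bar)
    slope = sxy / sxx    # common slope assumed at all factor levels
    adjusted = {label: y_bar - slope * (x_bar - grand_x)
                for label, (x_bar, y_bar) in group_means.items()}
    return slope, adjusted

# Hypothetical data: group "b" has higher covariate values than group "a"
data = {
    "a": [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)],
    "b": [(3.0, 10.0), (4.0, 12.0), (5.0, 14.0)],
}
slope, adjusted = ancova_adjusted_means(data)
print(slope, adjusted)
```

The raw group means of y differ by 7, but part of that gap comes from the covariate; after adjustment the difference shrinks to 3.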

SAS: analyze – ANOVA – linear models

Effects to be estimated. To add an interaction, first select both variables, then click Cross.

Sums of squares

Other options, defaults ok

Post hoc tests

Plots

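SAS code
    PROC GLM DATA=libname.datafilename
            PLOTS(ONLY)=DIAGNOSTICS(UNPACK)
            PLOTS(ONLY)=RESIDUALS
            PLOTS(ONLY)=INTPLOT
        ;
        /* categorical factors */
        CLASS Elinkaari Perheyr;
        /* covariate ln_hlo, two main effects and their interaction; Type III SS, */
        /* parameter estimates (SOLUTION) and effect sizes (EFFECTSIZE)           */
        MODEL growthorient=ln_hlo Elinkaari Perheyr Elinkaari*Perheyr
            / SS3 SOLUTION SINGULAR=1E-07 EFFECTSIZE
        ;
        /* LS means with Bonferroni-adjusted pairwise differences */
        LSMEANS Elinkaari Perheyr Elinkaari*Perheyr / PDIFF ADJUST=BON ;
    RUN;
    QUIT;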

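Model significance and fit (SAS output, most values not shown)
- Class Level Information: Class | Levels | Values
  Elinkaari (phase): 3 levels, values 2 3 4
  Perheyr (family): 2 levels, values 0 1
- Number of Observations Read: 181
- Number of Observations Used: 132
- ANOVA table columns: Source | DF | Sum of Squares | Mean Square | F Value | Pr > F
- ANOVA table rows: Model, Error, Corrected Total
- Fit statistics: R-Square | Coeff Var | Root MSE | growthorient Mean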

Significance of predictors (SAS output, values not shown)
- Columns: Source | DF | Type III SS | Mean Square | F Value | Pr > F
- Rows: ln_hlo (employees), Elinkaari (phase), Perheyr (family), Elinkaari*Perheyr (Phase*Family)

EFFECT SIZE OF PREDICTORS (SAS output, values not shown)
- Total Variation Accounted For: Semipartial Eta-Square | Semipartial Omega-Square | Conservative 95% Confidence Limits
- Partial Variation Accounted For: Partial Eta-Square | Partial Omega-Square | 95% Confidence Limits
- Rows (Source): ln_hlo, Elinkaari, Perheyr, Elinkaari*Perheyr

Parameter estimates (SAS output, values not shown)
- Columns: Parameter | Estimate | Standard Error | t Value | Pr > |t|
- Intercept: B, Pr > |t| <.0001
- ln_hlo (employees)
- Elinkaari 2 (growth): B
- Elinkaari 3 (mature): B
- Elinkaari 4 (decline): B ...
- Perheyr 0 (non family): B
- Perheyr 1 (family): B ...
- Elinkaari*Perheyr: six rows, one per cell, all flagged B

Prediction for 6 cells (most coefficient values are missing from the transcript)
- Elinkaari=2 & Perheyr=0 (growth phase, non family): Growth = *ln_hlo – = *ln_hlo
- Elinkaari=3 & Perheyr=0 (mature phase, non family): Growth = *ln_hlo – 0.04 – = *ln_hlo
- Elinkaari=4 & Perheyr=0 (decline phase, non family): Growth = *ln_hlo – = *ln_hlo
- Elinkaari=2 & Perheyr=1 (growth phase, family): Growth = *ln_hlo = *ln_hlo
- Elinkaari=3 & Perheyr=1 (mature phase, family): Growth = *ln_hlo = *ln_hlo
- Elinkaari=4 & Perheyr=1 (decline phase, family): Growth = *ln_hlo = *ln_hlo
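The mechanics of a cell prediction can be sketched as follows: intercept, plus the covariate slope times ln_hlo, plus the estimates for the cell's phase, family status and their combination. All coefficient values below are HYPOTHETICAL placeholders, not the estimates from the output above. Treating the last level of each factor (Elinkaari=4, Perheyr=1) as the zero reference is an assumption about the parameterization.

```python
import math

# HYPOTHETICAL coefficients, for illustration only
params = {
    "intercept": 3.0,
    "ln_hlo": 0.2,
    "elinkaari": {2: 0.5, 3: 0.1, 4: 0.0},        # 4 (decline) assumed to be the reference
    "perheyr": {0: -0.3, 1: 0.0},                 # 1 (family) assumed to be the reference
    "interaction": {(2, 0): -0.1, (3, 0): 0.1},   # all other cells contribute 0
}

def predict(params, elinkaari, perheyr, ln_hlo):
    """Predicted growth orientation for one cell at a given covariate value."""
    return (params["intercept"]
            + params["ln_hlo"] * ln_hlo
            + params["elinkaari"][elinkaari]
            + params["perheyr"][perheyr]
            + params["interaction"].get((elinkaari, perheyr), 0.0))

# Predictions for all six cells at 20 employees (ln_hlo = ln 20)
for phase in (2, 3, 4):
    for family in (0, 1):
        print(phase, family, predict(params, phase, family, math.log(20)))
```

For each cell the constant terms collapse into a single intercept, so every cell gets its own line in ln_hlo with a shared slope.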

Parameter estimates
- SAS note: "The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable."
- This warning always occurs if you have categorical independent variables in the model. SAS can nevertheless estimate the coefficients.

Homoskedasticity

Outlier diagnostics

Residual distribution

Model fit

Influence diagnostics

Residual vs. covariate

Significance of group differences, main effects (SAS output, values not shown)
- Elinkaari (phase): columns growthorient LSMEAN | LSMEAN Number; rows 2 growth, 3 mature, 4 decline.
- Least Squares Means for effect Elinkaari: Pr > |t| for H0: LSMean(i)=LSMean(j), dependent variable growthorient, matrix indexed by i/j.
- Perheyr (family): columns growthorient LSMEAN | H0: LSMean1=LSMean2, Pr > |t|.

Significance of group differences, interaction (SAS output, values not shown)
- Columns: Phase | Family | growthorient LSMEAN | LSMEAN Number; phases 2 growth, 3 mature, 4 decline.
- Least Squares Means for effect Elinkaari*Perheyr: Pr > |t| for H0: LSMean(i)=LSMean(j), dependent variable growthorient, matrix indexed by i/j.
- Result: non-family firms in the growth phase differ from non-family firms in the mature phase.

REPORTING GLM
- Model fit: F + df + p and R Square.
- Nature and significance of effects: parameter estimates (B + s.e. + t + p) and F + p.
- Estimated group means (means plot).
- Post hoc test results.

Means plot, with employees held at its mean value (20)