Categorical Independent Variables STA302 Fall 2015.

Slides:



Advertisements
Similar presentations
Topic 12: Multiple Linear Regression
Advertisements

STA305 Spring 2014 This started with excerpts from STA2101f13
Introduction to Regression with Measurement Error STA431: Spring 2015.
Exploratory Factor Analysis
Definition  Regression Model  Regression Equation Y i =  0 +  1 X i ^ Given a collection of paired data, the regression equation algebraically describes.
Regression Part II One-factor ANOVA Another dummy variable coding scheme Contrasts Multiple comparisons Interactions.
Logistic Regression STA302 F 2014 See last slide for copyright information 1.
1 1 Slide © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
Ch. 14: The Multiple Regression Model building
1 1 Slide © 2003 South-Western/Thomson Learning™ Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
Introduction to Regression with Measurement Error STA431: Spring 2013.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 13-1 Chapter 13 Introduction to Multiple Regression Statistics for Managers.
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Slides by JOHN LOUCKS & Updated by SPIROS VELIANITIS.
Normal Linear Model STA211/442 Fall Suggested Reading Davison’s Statistical Models, Chapter 8 The general mixed linear model is defined in Section.
Introduction to Linear Regression and Correlation Analysis
MAT 254 – Probability and Statistics Sections 1,2 & Spring.
Normal Linear Model STA211/442 Fall Suggested Reading Faraway’s Linear Models with R, just start reading. Davison’s Statistical Models, Chapter.
Double Measurement Regression STA431: Spring 2013.
SAS: The last of the great mainframe stats packages STA431 Winter/Spring 2015.
Maximum Likelihood See Davison Ch. 4 for background and a more thorough discussion. Sometimes.
Introduction to Regression with Measurement Error STA302: Fall/Winter 2013.
1 1 Slide © 2016 Cengage Learning. All Rights Reserved. The equation that describes how the dependent variable y is related to the independent variables.
1 1 Slide © 2005 Thomson/South-Western Slides Prepared by JOHN S. LOUCKS St. Edward’s University Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
1 1 Slide © 2007 Thomson South-Western. All Rights Reserved OPIM 303-Lecture #9 Jose M. Cruz Assistant Professor.
1 1 Slide © 2007 Thomson South-Western. All Rights Reserved Chapter 13 Multiple Regression n Multiple Regression Model n Least Squares Method n Multiple.
1 1 Slide © 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
1 1 Slide Multiple Regression n Multiple Regression Model n Least Squares Method n Multiple Coefficient of Determination n Model Assumptions n Testing.
Applications The General Linear Model. Transformations.
Regression Part II One-factor ANOVA Another dummy variable coding scheme Contrasts Multiple comparisons Interactions.
Chapter 14 Introduction to Multiple Regression
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Chapter 15 Multiple Regression n Multiple Regression Model n Least Squares Method n Multiple.
HAWKES LEARNING SYSTEMS Students Matter. Success Counts. Copyright © 2013 by Hawkes Learning Systems/Quant Systems, Inc. All rights reserved. Section 12.4.
Factorial ANOVA More than one categorical explanatory variable STA305 Spring 2014.
Logistic Regression STA2101/442 F 2014 See last slide for copyright information.
Regression. Population Covariance and Correlation.
Regression Part II One-factor ANOVA Another dummy variable coding scheme Contrasts Multiple comparisons Interactions.
CORRELATION: Correlation analysis Correlation analysis is used to measure the strength of association (linear relationship) between two quantitative variables.
Chapter 13 Multiple Regression
STA 286 week 131 Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression.
Statistics for Business and Economics 8 th Edition Chapter 11 Simple Regression Copyright © 2013 Pearson Education, Inc. Publishing as Prentice Hall Ch.
Linear Discriminant Analysis (LDA). Goal To classify observations into 2 or more groups based on k discriminant functions (Dependent variable Y is categorical.
Confirmatory Factor Analysis Part Two STA431: Spring 2013.
Categorical Independent Variables STA302 Fall 2013.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 14 Comparing Groups: Analysis of Variance Methods Section 14.3 Two-Way ANOVA.
STA302: Regression Analysis. Statistics Objective: To draw reasonable conclusions from noisy numerical data Entry point: Study relationships between variables.
General Linear Model.
1 1 Slide © 2011 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
Copyright ©2011 Pearson Education, Inc. publishing as Prentice Hall 14-1 Chapter 14 Introduction to Multiple Regression Statistics for Managers using Microsoft.
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc.. Chap 14-1 Chapter 14 Introduction to Multiple Regression Basic Business Statistics 10 th Edition.
Introduction to Multiple Regression Lecture 11. The Multiple Regression Model Idea: Examine the linear relationship between 1 dependent (Y) & 2 or more.
1 1 Slide The Simple Linear Regression Model n Simple Linear Regression Model y =  0 +  1 x +  n Simple Linear Regression Equation E( y ) =  0 + 
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.Chap 14-1 Statistics for Managers Using Microsoft® Excel 5th Edition Chapter.
STA302: Regression Analysis. Statistics Objective: To draw reasonable conclusions from noisy numerical data Entry point: Study relationships between variables.
The Sample Variation Method A way to select sample size This slide show is a free open source document. See the last slide for copyright information.
Logistic Regression For a binary response variable: 1=Yes, 0=No This slide show is a free open source document. See the last slide for copyright information.
© 2001 Prentice-Hall, Inc.Chap 13-1 BA 201 Lecture 19 Measure of Variation in the Simple Linear Regression Model (Data)Data.
Introduction Many problems in Engineering, Management, Health Sciences and other Sciences involve exploring the relationships between two or more variables.
29 October 2009 MRC CBU Graduate Statistics Lectures 4: GLM: The General Linear Model - ANOVA & ANCOVA1 MRC Cognition and Brain Sciences Unit Graduate.
Decomposition of Sum of Squares
Logistic Regression STA2101/442 F 2017
Poisson Regression STA 2101/442 Fall 2017
Interactions and Factorial ANOVA
Logistic Regression Part One
Statistical Analysis of the Randomized Block Design
STA302: Regression Analysis
Multinomial Logit Models
Simple Linear Regression
Prepared by Lee Revere and John Large
Regression and Categorical Predictors
Presentation transcript:

Categorical Independent Variables STA302 Fall 2015

Categorical means unordered categories Like Field of Study: Humanities, Sciences, Social Sciences Could number them 1 2 3, but what would the regression coefficients mean? But you really want them in your regression model.

One Categorical Explanatory Variable X=1 means Drug, X=0 means Placebo Population mean is For patients getting the drug, mean response is For patients getting the placebo, mean response is

Sample regression coefficients for a binary explanatory variable X=1 means Drug, X=0 means Placebo Predicted response is For patients getting the drug, predicted response is For patients getting the placebo, predicted response is

Regression test of Same as an independent t-test Same as a oneway ANOVA with 2 categories Same t, same F, same p-value. Now extend to more than 2 categories

Drug A, Drug B, Placebo x 1 = 1 if Drug A, Zero otherwise x 2 = 1 if Drug B, Zero otherwise Fill in the table

Drug A, Drug B, Placebo x 1 = 1 if Drug A, Zero otherwise x 2 = 1 if Drug B, Zero otherwise Regression coefficients are contrasts with the category that has no indicator – the reference category

Indicator dummy variable coding with intercept Need p-1 indicators to represent a categorical explanatory variable with p categories. If you use p dummy variables, columns of the X matrix are linearly dependent. Regression coefficients are contrasts with the category that has no indicator. Call this the reference category.

Now add a quantitative variable (covariate) x 1 = Age x 2 = 1 if Drug A, Zero otherwise x 3 = 1 if Drug B, Zero otherwise

A common error Categorical explanatory variable with p categories p dummy variables (rather than p-1) And an intercept There are p population means represented by p+1 regression coefficients - not unique

But suppose you leave off the intercept Now there are p regression coefficients and p population means The correspondence is unique, and the model can be handy -- less algebra Called cell means coding

Cell means coding: p indicators and no intercept This model is equivalent to the one with the intercepts

Add a covariate: x 4

Do the residuals add to zero with cell means coding? If so, SST = SSR + SSE And we have R 2 Let j denote an nx1 column of ones. If there is a (k+1)x1 vector a with Xa=j, the residuals add up to zero. So the answer is Yes.

Copyright Information This slide show was prepared by Jerry Brunner, Department of Statistical Sciences, University of Toronto. It is licensed under a Creative Commons Attribution - ShareAlike 3.0 Unported License. Use any part of it as you like and share the result freely. These Powerpoint slides will be available from the course website: