General Qualitative Data, and “Dummy Variables”


General Qualitative Data, and “Dummy Variables”
How might we have represented “make-of-car” in the motorpool case, had there been more than just two makes?
– Assume that Make takes four categorical values (Ford, Honda, BMW, and Sterling). Choose one value as the “foundation” case, and create three 0/1 (“no”/“yes”) “dummy” variables, one for each of the other three values. These three variables jointly represent the four-valued qualitative Make variable. Here are the details.
– We’ll use this representational trick to include “day of game” (Friday, Saturday, or Sunday) in a model that predicts attendance at a professional indoor soccer team’s home games. Here is the example.
– Using this trick requires that we extend the “significance level” (with respect to whether a variable “belongs” in the model) from single variables to groups of variables, because the dummies that represent one qualitative variable enter or leave the model together. This is done via “analysis of variance” (ANOVA).
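The dummy coding described above can be sketched in a few lines of Python. This is a minimal illustration, not the deck's own material: the function name `dummy_code`, the choice of Ford as the foundation case, and the six sample rows are all assumptions made for the example.

```python
def dummy_code(values, foundation):
    """Return one 0/1 column per category other than the foundation case."""
    categories = sorted(set(values))
    if foundation not in categories:
        raise ValueError(f"foundation case {foundation!r} does not occur in the data")
    return {
        f"D_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != foundation
    }


# Hypothetical sample: the make of six cars (made-up rows, for illustration only).
makes = ["Ford", "Honda", "BMW", "Sterling", "Ford", "BMW"]

# Ford is assumed to be the foundation case, so it gets no column of its own:
# a Ford is the row in which all three dummies are 0.
dummies = dummy_code(makes, foundation="Ford")

for name, column in dummies.items():
    print(name, column)
```

Four categories yield three columns. Adding a fourth column for the foundation case would make the columns sum to 1 in every row, duplicating the model's constant term.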

Discounts on Car Purchases: Does Salesperson Identity Matter?
Assume there are five salesfolk: Andy, Bob, Chuck, Dave, and Ed. Take one (e.g., Andy) as the foundation case, and add four new “dummy” variables:
– \(D_B\) = 1 if the salesperson is Bob, 0 otherwise
– \(D_C\) = 1 if the salesperson is Chuck, 0 otherwise
– \(D_D\) = 1 if the salesperson is Dave, 0 otherwise
– \(D_E\) = 1 if the salesperson is Ed, 0 otherwise
The coefficient of each dummy variable (in the most-complete model) is the difference between the average discount that salesperson gives a customer and the average discount Andy would give the same customer.
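Why each dummy coefficient measures a difference from Andy can be seen from a short derivation. The notation is assumed for illustration: \(x_1, \dots, x_k\) stand for the other explanatory variables in the model, and the \(\beta\)'s for the true coefficients.

```latex
\[
\text{Discount} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
  + \beta_B D_B + \beta_C D_C + \beta_D D_D + \beta_E D_E + \varepsilon
\]
% Andy is the foundation case, so all four dummies are 0 for him:
\[
E[\text{Discount} \mid x, \text{Andy}] = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\]
% For Bob, D_B = 1 and the other three dummies are 0:
\[
E[\text{Discount} \mid x, \text{Bob}] = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \beta_B
\]
% Subtracting, for the same customer x:
\[
E[\text{Discount} \mid x, \text{Bob}] - E[\text{Discount} \mid x, \text{Andy}] = \beta_B
\]
```

The same subtraction for any two non-foundation salespeople gives the difference of their coefficients, for example \(\beta_C - \beta_B\) for Chuck versus Bob.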

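Does Salesperson Identity Matter?
Imagine that, after adding the new variables (four new columns of data) to your model, the regression yields:
\[ \text{Discount}_{\text{pred}} = b_0 + b_{\text{Age}} \cdot \text{Age} - b_{\text{Income}} \cdot \text{Income} + b_{\text{Sex}} \cdot \text{Sex} + 240 \cdot D_B + (-300) \cdot D_C + (-50) \cdot D_D + 370 \cdot D_E \]
(The numeric values of the intercept and of the Age, Income, and Sex coefficients are missing from this transcript and are shown as symbols. The coefficients 240 and 370 follow from the comparisons below.)
– With similar customers, you’d expect Bob to give a discount $240 higher than Andy would.
– With similar customers, you’d expect Chuck to give a discount $300 lower than Andy would, $540 lower than Bob would, and also lower than Dave (by $250) and Ed (by $670) would.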
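The comparisons on this slide are all differences between dummy coefficients, which a short Python sketch can make explicit. The names `DUMMY_COEF` and `expected_gap` are hypothetical. Ed's coefficient of 370 is not legible in the transcript; it is inferred here from the statement that Chuck's discount is $670 lower than Ed's.

```python
# Dummy-variable coefficients from the slide. Andy is the foundation case,
# so his "coefficient" is 0 by construction. Ed's 370 is inferred (assumption):
# Chuck is $670 below Ed and $300 below Andy, so Ed is $370 above Andy.
DUMMY_COEF = {"Andy": 0, "Bob": 240, "Chuck": -300, "Dave": -50, "Ed": 370}


def expected_gap(a, b):
    """Expected discount given by `a` minus that given by `b`, same customer."""
    return DUMMY_COEF[a] - DUMMY_COEF[b]


for other in ("Andy", "Bob", "Dave", "Ed"):
    print(f"Chuck vs {other}: {expected_gap('Chuck', other):+d}")
```

Because the customer's Age, Income, and Sex terms are identical on both sides of the comparison, they cancel, which is why only the dummy coefficients are needed.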

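Does “Salesperson” Interact with “Sex”?
Are some of the salesfolk better at selling to a particular Sex of customer?
– Add \(D_B\), \(D_C\), \(D_D\), \(D_E\), and the products \(D_B \cdot \text{Sex}\), \(D_C \cdot \text{Sex}\), \(D_D \cdot \text{Sex}\), \(D_E \cdot \text{Sex}\) to the model.
– Imagine that your regression yields:
\[ \text{Discount}_{\text{pred}} = b_0 + b_{\text{Age}} \cdot \text{Age} - b_{\text{Income}} \cdot \text{Income} + b_{\text{Sex}} \cdot \text{Sex} + 240 \cdot D_B - 350 \cdot D_C + 75 \cdot D_D + 10 \cdot D_E - 375 \cdot (D_B \cdot \text{Sex}) - 150 \cdot (D_C \cdot \text{Sex}) - 50 \cdot (D_D \cdot \text{Sex}) + b_{E,\text{Sex}} \cdot (D_E \cdot \text{Sex}) \]
– Interpret this back in the “conceptual” model:
\[ \text{Discount}_{\text{pred}} = b_0 + b_{\text{Age}} \cdot \text{Age} - b_{\text{Income}} \cdot \text{Income} + b_{\text{Sex}} \cdot \text{Sex} + (240 - 375 \cdot \text{Sex}) \cdot D_B + (-350 - 150 \cdot \text{Sex}) \cdot D_C + (75 - 50 \cdot \text{Sex}) \cdot D_D + (10 + b_{E,\text{Sex}} \cdot \text{Sex}) \cdot D_E \]
(Coefficients shown as symbols are missing from this transcript.)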

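\[ \text{Discount}_{\text{pred}} = b_0 + b_{\text{Age}} \cdot \text{Age} - b_{\text{Income}} \cdot \text{Income} + b_{\text{Sex}} \cdot \text{Sex} + (240 - 375 \cdot \text{Sex}) \cdot D_B + (-350 - 150 \cdot \text{Sex}) \cdot D_C + (75 - 50 \cdot \text{Sex}) \cdot D_D + (10 + b_{E,\text{Sex}} \cdot \text{Sex}) \cdot D_E \]
(Coefficients shown as symbols are missing from this transcript.)
– Given a male (Sex = 0) customer, you’d expect Bob (\(D_B = 1\)) to give a greater discount than Andy, by $240 – $375 × 0 = $240.
– Given a female (Sex = 1) customer, you’d expect Bob to give a smaller discount than Andy: the difference is $240 – $375 × 1 = –$135, i.e., $135 less.
– Chuck has been giving smaller discounts than Andy to both men and women, and Dave and Ed have been giving larger discounts than Andy to both sexes.
– We could take the same approach to investigate whether “Salesperson” interacts with Age, by also including \(D_B \cdot \text{Age}\), \(D_C \cdot \text{Age}\), \(D_D \cdot \text{Age}\), \(D_E \cdot \text{Age}\) in our model.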
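The sex-specific comparisons above amount to evaluating "main coefficient + interaction coefficient × Sex" for each salesperson. The sketch below does that arithmetic in Python; the names `MAIN`, `INTERACTION`, and `gap_vs_andy` are hypothetical. Ed is left out because his interaction coefficient is not legible in the transcript.

```python
# Coefficients of the dummies (MAIN) and of the dummy-times-Sex products
# (INTERACTION), as given in the conceptual model. Ed is omitted because
# his interaction coefficient is not recoverable from the transcript.
MAIN = {"Bob": 240, "Chuck": -350, "Dave": 75}
INTERACTION = {"Bob": -375, "Chuck": -150, "Dave": -50}


def gap_vs_andy(salesperson, sex):
    """Expected discount minus Andy's for the same customer.

    sex follows the slide's coding: 0 = male, 1 = female.
    """
    if sex not in (0, 1):
        raise ValueError("sex must be 0 (male) or 1 (female)")
    return MAIN[salesperson] + INTERACTION[salesperson] * sex


for person in MAIN:
    print(person, "male:", gap_vs_andy(person, 0), "female:", gap_vs_andy(person, 1))
```

A sign change between the two columns, as for Bob, is what signals that the salesperson's effect depends on the customer's sex rather than merely shifting in size.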

Outliers
An outlier is a sample observation which fails to “fit” with the rest of the sample data. Such observations may distort the results of an entire study.
– Types of outliers (three)
– Identification of outliers (via “model analysis”)
– Dealing with outliers (perhaps yielding a better model)
These issues are dealt with here.