Lecture 28 Categorical variables: –Review of slides from lecture 27 (reprint of lecture 27 categorical variables slides with typos corrected) –Practice.


Lecture 28
Categorical variables:
–Review of slides from lecture 27 (reprint of lecture 27 categorical variables slides with typos corrected)
–Practice problem
Review of main themes from course
Directions for future study

Sex discrimination revisited
At the beginning of the course, in case study 1.2, we examined data from a sex discrimination case. There was strong evidence that male clerks are paid more than female clerks. But the bank’s defense lawyers say that this is because the males have more education and experience, i.e., there are omitted confounding variables.

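Multiple regression model for sex discrimination
Let’s look at controlling for education level first. To examine the bank’s claim, we want to look at $E(\text{salary} \mid \text{educ}, \text{sex} = \text{male})$ and compare it to $E(\text{salary} \mid \text{educ}, \text{sex} = \text{female})$, the mean salaries of men and women with the same education level. How do we incorporate a categorical explanatory variable into multiple regression? Dummy variables.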

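Dummy variables
Define $\text{sexdummy} = 1$ if the employee is male and $\text{sexdummy} = 0$ if the employee is female. Multiple regression model: $E(\text{salary} \mid \text{educ}, \text{sex}) = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{sexdummy}$. $\beta_2$, the coefficient on the dummy variable for sex, is the difference in mean earnings between the populations of men and women with the same education level.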
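
The role of the dummy coefficient can be checked numerically. What follows is a minimal pure-Python sketch, not the JMP workflow used in the lecture. The data are made up and noise-free, the coding male = 1 / female = 0 is an assumption, and `ols` is a hypothetical helper that solves the normal equations. Because the salaries are generated from known coefficients, the fit should recover them, and the coefficient on the dummy equals the gap in mean salary between men and women with the same education.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical, noise-free data (NOT the bank data from the case study).
educ = [8, 12, 12, 15, 16, 8, 10, 12, 15, 16]   # years of education
sex = ["M", "M", "M", "M", "M", "F", "F", "F", "F", "F"]

# Dummy coding (assumed): 1 for male, 0 for female.
sexdummy = [1 if s == "M" else 0 for s in sex]

# Salaries generated from salary = 4000 + 100*educ + 500*sexdummy.
salary = [4000 + 100 * e + 500 * d for e, d in zip(educ, sexdummy)]

# Design matrix columns: intercept, educ, sexdummy.
X = [[1, e, d] for e, d in zip(educ, sexdummy)]
b0, b1, b2 = ols(X, salary)

print(f"intercept = {b0:.2f}, educ = {b1:.2f}, sexdummy = {b2:.2f}")
```

With real data the fit would not be exact, and the interpretation of the dummy coefficient would come with a standard error and confidence interval.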

Categorical variables in JMP
To color and mark the points by a categorical variable such as Sex, click the red triangle to the left of the first column and select Color or Mark by Column. Select Set Marker by Value to use a different marker for each value of the column.

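Parallel Regression Lines
The model implies that mean salary is $\beta_0 + \beta_1\,\text{educ}$ for females and $(\beta_0 + \beta_2) + \beta_1\,\text{educ}$ for males, where $\beta_1$ is the coefficient on education and $\beta_2$ is the coefficient on the sex dummy. The regression lines for males and females as education varies are parallel: there is no interaction between sex and education.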

Plot produced by JMP version 5 in Fit Model output that shows the parallel regression lines and the actual observations.

Interactions with Dummy Variables
The model without an interaction assumes that the difference between men’s and women’s mean salaries at a fixed level of education is the same for all levels of education. There might instead be an interaction between sex and education: the difference between men and women might depend on the level of education.

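Interaction Model
Multiple regression model that allows for interaction between sex and education: $E(\text{salary} \mid \text{educ}, \text{sex}) = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{sexdummy} + \beta_3\,\text{sexdummy} \times \text{educ}$. To add the interaction in JMP, create a new column sexdummy*educ: right click on the column, select Formula and use the formula sexdummy*educ. The difference in mean salary between men and women of the same education level, $\beta_2 + \beta_3\,\text{educ}$, depends on the education level.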
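
One way to see what the interaction term does: with a single dummy and a single continuous variable, fitting the interaction model by least squares gives the same two lines as fitting a separate simple regression to each group. The sketch below uses that fact with made-up data (not the bank data) and assumes the coding sexdummy = 1 for male. The names `fit_line` and `gap` are hypothetical helpers introduced for illustration.

```python
def fit_line(x, y):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical data (NOT the bank data): education and salary by group.
educ_f = [8, 10, 12, 15, 16]
salary_f = [4800, 5050, 5150, 5500, 5650]
educ_m = [8, 12, 12, 15, 16]
salary_m = [5100, 5800, 5700, 6200, 6350]

a_f, s_f = fit_line(educ_f, salary_f)   # female line
a_m, s_m = fit_line(educ_m, salary_m)   # male line

# With sexdummy = 1 for male (assumed coding), the interaction model's
# coefficients are differences between the two fitted lines.
coef_sexdummy = a_m - a_f        # difference in intercepts
coef_interaction = s_m - s_f     # difference in slopes


def gap(educ):
    """Estimated male-minus-female mean salary at a given education level."""
    return coef_sexdummy + coef_interaction * educ


for e in (8, 12, 16):
    print(f"educ = {e}: estimated gap = {gap(e):.1f}")
```

Because the two slopes differ, the estimated gap changes with education, which is exactly what the parallel-lines model rules out.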

The model with one continuous explanatory variable, one categorical variable and an interaction is called the separate regression lines model because the regression lines of y on the continuous explanatory variable for the two levels of the dummy variable are “separate,” neither coincident nor parallel.

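Multiple regression with education, experience and sex
We can easily control for both education and experience in the sex discrimination case by adding them both to the multiple regression. A model without interactions is: $E(\text{salary} \mid \text{educ}, \text{exper}, \text{sex}) = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{exper} + \beta_3\,\text{sexdummy}$. Note that $\beta_3$ is the difference between the mean salaries of males and females with the same education and experience levels.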

Course Summary
Techniques:
–Methods for comparing two groups
–Methods for comparing more than two groups (one-way ANOVA F test, multiple comparisons)
–Method for testing hypotheses about the distribution of a nominal variable in one population (chi-squared test)
–Simple and multiple linear regression for predicting a response variable based on explanatory variables and (with a randomized experiment or no omitted confounding variables) finding the causal effect of explanatory variables on a response variable.

Course Summary Cont.
Key messages:
–Always do a randomized experiment if possible. Inferences about causal effects from observational studies require the always questionable assumption that there are no omitted confounding variables. Similarly, always take a random sample if possible.
–p-values only assess whether there is strong evidence against the null hypothesis. They do not provide information about practical significance.
–Always form confidence intervals for the parameters (e.g., difference in means, regression coefficients) in addition to making point estimates and doing hypothesis tests. Confidence intervals provide information about the accuracy of the estimate and the practical significance of the finding.

Course Summary Cont.
Key messages:
–Beware of multiple comparisons and data snooping. Use the Tukey-Kramer method or Bonferroni to adjust for multiple comparisons.
–Simple/multiple linear regression is a powerful method for making predictions of a variable y based on explanatory variables. However, beware of extrapolation.
–Multiple regression can be used to control for known confounding variables in order to obtain good estimates of the causal effect of a variable on an outcome. However, if there are omitted confounding variables, the estimate of the causal effect will be biased. The sign and magnitude of the bias are indicated by the omitted variable bias formula.
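
The last message can be illustrated with a small numerical sketch. In the two-variable case, the coefficient from the "short" regression (confounder omitted) equals the coefficient from the "long" regression plus the omitted variable's coefficient times the slope from regressing the omitted variable on the included one. The data below are made up and noise-free, the coefficient values and the coding sexdummy = 1 for male are assumptions, and `slope` is a hypothetical helper.

```python
def slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: sexdummy (1 = male, assumed coding) and education.
sexdummy = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
educ = [12, 14, 15, 16, 16, 8, 10, 12, 12, 15]

# Noise-free salaries from a "long" model with known coefficients.
beta_sex, beta_educ = 500, 100
salary = [4000 + beta_sex * d + beta_educ * e for d, e in zip(sexdummy, educ)]

# Short regression: salary on sexdummy only (education omitted).
short_coef = slope(sexdummy, salary)

# Auxiliary regression: omitted variable (educ) on the included variable.
delta = slope(sexdummy, educ)

# Omitted variable bias formula: short = long + beta_omitted * delta.
predicted_short = beta_sex + beta_educ * delta
bias = short_coef - beta_sex

print(f"short = {short_coef:.1f}, formula = {predicted_short:.1f}, bias = {bias:.1f}")
```

Here the men in the made-up data have more education on average, so omitting education overstates the sex coefficient. Had education been unrelated to sex (delta = 0), the short regression would show no bias.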

Directions for Future Study
–Stat 500: Applied Regression and Analysis of Variance. Offered next fall. Natural follow-up to Stat 112, giving a more advanced treatment of the topics in 112.
–Stat 501: Introduction to Nonparametric Methods and Log-linear Models. Offered this spring. Follow-up to Stat 500.
–Stat 430: Probability. Will be offered next fall and next spring.
–Stat 431: Statistical Inference. Will be offered next fall and next spring.
–Stat 210: Sample Survey Design.
–Stat 202: Intermediate Statistics.