Regression Analysis Using JMP

Regression Analysis Using JMP
Mark Seiss, Dept. of Statistics
February 28, 2012

Presentation Outline
1. Simple Linear Regression
2. Multiple Linear Regression
3. Regression with Binary and Count Response Variables

Presentation Outline
- Questions/Comments
- Individual Goals/Interests

Simple Linear Regression
1. Definition
2. Correlation
3. Model and Estimation
4. Coefficient of Determination (R²)
5. Assumptions
6. Example

Simple Linear Regression
Simple linear regression (SLR) is used to study the relationship between a variable of interest and one other variable.
- Both variables must be continuous.
- The variable of interest is known as the response or dependent variable.
- The other variable is known as the explanatory or independent variable.
Objectives:
- Determine the significance of the explanatory variable in explaining the variability in the response (not necessarily causation).
- Predict values of the response variable for given values of the explanatory variable.

Simple Linear Regression
Scatterplots are used to graphically examine the relationship between two quantitative variables:
- Linear or non-linear
- Positive or negative

Simple Linear Regression
[Scatterplot examples: positive linear relationship, negative linear relationship, non-linear relationship, no relationship]

Simple Linear Regression
Correlation measures the strength of the linear relationship between two quantitative variables.
Pearson correlation coefficient:
- Assumes normality
- Calculation: r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²]
Spearman's rho and Kendall's tau are used for non-normal quantitative variables.
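The Pearson calculation can be sketched in plain Python. This is an illustration, not JMP output; the function name `pearson_r` and the toy data are our own.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# A perfect positive linear relationship gives r = 1;
# a perfect negative one gives r = -1.
```

With `x = [1, 2, 3, 4, 5]` and `y = [2, 4, 6, 8, 10]` this returns exactly 1.0.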

Simple Linear Regression
Properties of the Pearson correlation coefficient:
- −1 ≤ r ≤ 1
- Positive values of r: as one variable increases, the other increases
- Negative values of r: as one variable increases, the other decreases
- Values close to 0 indicate no linear relationship between the two variables
- Values close to +1 or −1 indicate strong linear relationships
Important note: correlation does not imply causation.

Simple Linear Regression
Pearson correlation coefficient, general guidelines:
- 0 ≤ |r| < 0.2: very weak linear relationship
- 0.2 ≤ |r| < 0.4: weak linear relationship
- 0.4 ≤ |r| < 0.6: moderate linear relationship
- 0.6 ≤ |r| < 0.8: strong linear relationship
- 0.8 ≤ |r| ≤ 1.0: very strong linear relationship

Simple Linear Regression
The simple linear regression model
Basic model: response = deterministic + stochastic
- Deterministic: model of the linear relationship between X and Y
- Stochastic: variation, uncertainty, and miscellaneous factors
Model: yᵢ = β₀ + β₁xᵢ + εᵢ, where
- yᵢ = value of the response variable for the i-th observation
- xᵢ = value of the explanatory variable for the i-th observation
- β₀ = y-intercept
- β₁ = slope
- εᵢ = random error, iid Normal(0, σ²)

Simple Linear Regression
Least squares estimation:
- Estimates: b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)², b₀ = ȳ − b₁x̄
- Predicted values: ŷᵢ = b₀ + b₁xᵢ
- Residuals: eᵢ = yᵢ − ŷᵢ
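The least squares estimates, predicted values, and residuals can be computed in a few lines of pure Python. This is an illustrative sketch, not the JMP implementation; `fit_slr` is a name we made up.

```python
def fit_slr(x, y):
    """Least squares fit of y = b0 + b1*x, plus fitted values and residuals."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    # b1 = Sxy / Sxx, b0 = ybar - b1 * xbar
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    yhat = [b0 + b1 * xi for xi in x]              # predicted values
    resid = [yi - yh for yi, yh in zip(y, yhat)]   # residuals
    return b0, b1, yhat, resid
```

For data generated exactly as y = 3 + 2x, the fit recovers b₀ = 3 and b₁ = 2 with zero residuals.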

Simple Linear Regression
Interpretation of parameters:
- β₀: value of Y when X = 0
- β₁: change in the value of Y for a 1-unit increase in X (the slope of the line)
Hypothesis testing:
- β₀: test whether the true y-intercept is different from 0
  - Null hypothesis: β₀ = 0; alternative hypothesis: β₀ ≠ 0
- β₁: test whether the slope is different from 0
  - Null hypothesis: β₁ = 0; alternative hypothesis: β₁ ≠ 0

Simple Linear Regression
Analysis of variance (ANOVA) for simple linear regression:

Source   df     Sum of Squares   Mean Square       F Ratio        P-value
Model    1      SSR              MSR = SSR/1       F₁ = MSR/MSE   P(F(1, n−2) > F₁)
Error    n−2    SSE              MSE = SSE/(n−2)
Total    n−1    SST

Coefficient of Determination (R²)
- Percent of variation in the response variable (Y) that is explained by the least squares regression line
- 0 ≤ R² ≤ 1
- Calculation: R² = SSR/SST = 1 − SSE/SST
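The R² calculation is a one-liner given fitted values. A minimal sketch (the helper name `r_squared` is ours, and it assumes the fitted values have already been computed):

```python
def r_squared(y, yhat):
    """R^2 = 1 - SSE/SST: share of the variation in y explained by the fit."""
    ybar = sum(y) / len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # residual sum of squares
    sst = sum((yi - ybar) ** 2 for yi in y)               # total sum of squares
    return 1 - sse / sst
```

A perfect fit gives R² = 1; predicting the mean of y for every observation gives R² = 0.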

Simple Linear Regression
Assumptions of simple linear regression:
1. Independence
   - Residuals are independent of each other
   - Related to the method in which the data were collected, or to time-related data
   - Checked by plotting time collected vs. residuals
   - Parametric test: Durbin-Watson test
2. Constant variance
   - Variance of the residuals is constant
   - Checked by plotting predicted values vs. residuals
   - Parametric test: Brown-Forsythe test

Simple Linear Regression
Assumptions of simple linear regression (cont.):
3. Normality
   - Residuals are normally distributed
   - Checked by evaluating histograms and normal-quantile plots of the residuals
   - Parametric test: Shapiro-Wilk test

Simple Linear Regression
[Constant variance, plots of fitted values vs. residuals: a good residual plot shows no pattern; a bad residual plot shows variability increasing with the predicted values]

Simple Linear Regression
[Normality, histogram and Q-Q plot of residuals: one example where the normality assumption is appropriate, one where it is not]

Simple Linear Regression
Some remedies:
- Non-constant variance: weighted least squares
- Non-normality: Box-Cox transformation
- Dependence: autoregressive models

Simple Linear Regression
Example dataset: chirps of ground crickets
Pierce (1949) measured the frequency (the number of wing vibrations per second) of chirps made by a ground cricket at various ground temperatures.
Filename: chirp.jmp

Simple Linear Regression Questions/Comments about Simple Linear Regression

Multiple Linear Regression
1. Definition
2. Categorical Explanatory Variables
3. Model and Estimation
4. Adjusted Coefficient of Determination
5. Assumptions
6. Model Selection
7. Example

Multiple Linear Regression
Explanatory variables come in two types: continuous and categorical.
Continuous predictor variables:
- Examples: time, grade point average, test score, etc.
- Coded with one parameter: βₖxₖ
Categorical predictor variables:
- Examples: sex, political affiliation, marital status, etc.
- The actual value assigned to a category is not important, e.g. sex coded as Male/Female, M/F, 1/2, or 0/1
- Coded differently than continuous variables

Multiple Linear Regression
Categorical explanatory variables:
- Consider a categorical explanatory variable with L categories
- One category is selected as the reference category; the assignment is arbitrary
- The variable is represented by L−1 dummy variables, which keeps the model identifiable
Effect coding (used in JMP):
- xₖ = 1 if the explanatory variable is equal to category k
- xₖ = 0 otherwise
- xₖ = −1 for all k if the explanatory variable equals the reference category L
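The effect coding scheme described above can be illustrated with a small helper. This is our own sketch, not JMP code; treating the last listed category as the reference is an assumption for the example.

```python
def effect_code(value, categories):
    """Effect coding: L categories -> L-1 columns.
    The last entry of `categories` is treated as the reference category
    and is coded -1 in every column; category k gets a 1 in column k,
    and all other categories get 0."""
    ref = categories[-1]
    if value == ref:
        return [-1] * (len(categories) - 1)
    return [1 if value == c else 0 for c in categories[:-1]]
```

For categories ["A", "B", "C"]: "A" codes to [1, 0], "B" to [0, 1], and the reference "C" to [-1, -1].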

Multiple Linear Regression
Similar to simple linear regression, except now there is more than one explanatory variable, which may be quantitative and/or qualitative.
Model: yᵢ = β₀ + β₁x₁ᵢ + β₂x₂ᵢ + … + βₚxₚᵢ + εᵢ, where
- yᵢ = value of the response variable for the i-th observation
- xₖᵢ = value of explanatory variable k for the i-th observation
- β₀ = y-intercept
- βₖ = parameter corresponding to explanatory variable k
- εᵢ = random error, iid Normal(0, σ²)

Multiple Linear Regression
Least squares estimation:
- Predicted values: ŷᵢ = b₀ + b₁x₁ᵢ + … + bₚxₚᵢ
- Residuals: eᵢ = yᵢ − ŷᵢ

Multiple Linear Regression
Interpretation of parameters:
- β₀: value of Y when all explanatory variables equal 0
- βₖ: change in the value of Y for a 1-unit increase in Xₖ, in the presence of the other explanatory variables
Hypothesis testing:
- β₀: test whether the true y-intercept is different from 0
  - Null hypothesis: β₀ = 0; alternative hypothesis: β₀ ≠ 0
- βₖ: test whether the change in Y for a 1-unit increase in Xₖ is different from 0, in the presence of the other explanatory variables
  - Null hypothesis: βₖ = 0; alternative hypothesis: βₖ ≠ 0

Multiple Linear Regression
Adjusted coefficient of determination:
- R² is the percent of variation in the response variable (Y) explained by the least squares regression line with explanatory variables x₁, x₂, …, xₚ
- Calculation: R² = 1 − SSE/SST
- The R² value will increase as explanatory variables are added to the model
- The adjusted R² introduces a penalty for the number of explanatory variables: R²adj = 1 − (1 − R²)(n − 1)/(n − p − 1)

Multiple Linear Regression
Other model evaluation statistics:
- Akaike Information Criterion (AIC or AICc)
- Schwarz Information Criterion (SIC), also known as the Bayesian Information Criterion (BIC)
- Mallows' Cₚ
- Prediction Sum of Squares (PRESS)

Multiple Linear Regression
Model selection has two goals:
- Complex enough to fit the data well
- Simple enough to interpret, without overfitting the data
Study the effect of each explanatory variable on the response Y:
- Continuous variable: graph Y versus X
- Categorical variable: boxplot of Y for the categories of X

Multiple Linear Regression
Model selection (cont.): multicollinearity
- Correlations among the explanatory variables, which inflate the variances of the parameter estimates
- Reduces the apparent significance of the affected variables
- Can occur whenever several explanatory variables are used in the model

Multiple Linear Regression
Algorithmic model selection:
- Backward selection: start with all explanatory variables in the model and remove those that are insignificant
- Forward selection: start with no explanatory variables in the model and add the best explanatory variables one at a time
- Stepwise selection: start with two forward selection steps, then alternate backward and forward selection steps until there are no variables to add or remove

Multiple Linear Regression
Example dataset: discrimination in salaries
A researcher was interested in whether there was discrimination in the salaries of tenure-track professors at a small college. The researcher collected six variables from 52 professors.
Filename: Salary.xls
Reference: S. Weisberg (1985). Applied Linear Regression, Second Edition. New York: John Wiley and Sons. Page 194.

Multiple Linear Regression
Other multiple linear regression issues:
- Outliers
- Interaction terms
- Higher-order terms

Multiple Linear Regression Questions/Comments about Multiple Linear Regression

Regression with Non-Normal Response
1. Logistic Regression with Binary Response
2. Poisson Regression with Count Response

Logistic Regression
Consider a binary response variable:
- A variable with two outcomes, one represented by a 1 and the other by a 0
- Examples:
  - Does the person have a disease? Yes or no
  - Who is the person voting for? McCain or Obama
  - Outcome of a baseball game? Win or loss

Logistic Regression
Consider the linear probability model P(yᵢ = 1) = β₀ + β₁xᵢ, where
- yᵢ = response for observation i
- xᵢ = quantitative explanatory variable
Predicted values represent the probability that Y = 1 given X.
Issue: the predicted probabilities for some subjects fall outside of the [0, 1] range.

Logistic Regression
Consider the logistic regression model
log(πᵢ / (1 − πᵢ)) = β₀ + β₁xᵢ, equivalently πᵢ = exp(β₀ + β₁xᵢ) / (1 + exp(β₀ + β₁xᵢ))
Predicted values from this regression equation fall between 0 and 1.

Logistic Regression
Interpretation of the coefficient β: the odds ratio
- The odds ratio is a statistic that measures the odds of one event relative to the odds of another.
- If the probability of event 1 is π₁ and the probability of event 2 is π₂, the odds ratio of event 1 to event 2 is [π₁/(1 − π₁)] / [π₂/(1 − π₂)]
- The odds ratio ranges from 0 to infinity:
  - A value between 0 and 1 indicates the odds of event 2 are greater
  - A value between 1 and infinity indicates the odds of event 1 are greater
  - A value equal to 1 indicates the events are equally likely
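These definitions translate directly into code. An illustrative sketch (the probabilities and helper names are our own):

```python
import math

def odds(p):
    """Odds of an event that occurs with probability p."""
    return p / (1 - p)

def odds_ratio(p1, p2):
    """Odds of event 1 relative to the odds of event 2."""
    return odds(p1) / odds(p2)

def odds_ratio_from_beta(beta):
    """In logistic regression, exp(beta) is the fitted odds ratio
    for a one-unit increase in x."""
    return math.exp(beta)
```

For example, `odds_ratio(0.75, 0.5)` is 3: at probability 0.75 the odds are 3-to-1, versus even odds at 0.5.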

Logistic Regression
Example dataset: a researcher is interested in how GRE exam scores, GPA, and the prestige of a student's undergraduate institution affect admission into graduate school.
Filename: Admittance.csv
Important note: JMP models the probability of the 0 category.

Poisson Regression
Consider a count response variable:
- The response variable is the number of occurrences in a given time frame
- Outcomes equal to 0, 1, 2, …
- Examples:
  - Number of penalties during a football game
  - Number of customers who shop at a store on a given day
  - Number of car accidents at an intersection

Poisson Regression
Consider the linear model μᵢ = β₀ + β₁xᵢ, where
- yᵢ = response for observation i, with mean μᵢ
- xᵢ = quantitative explanatory variable for observation i
Issue: the predicted values range from −∞ to +∞, but the mean of a count must be non-negative.

Poisson Regression
Consider the Poisson log-linear model log(μᵢ) = β₀ + β₁xᵢ
- Predicted response values fall between 0 and +∞
- In the case of a single predictor, an increase of one unit in x multiplies μ by a factor of exp(β₁)
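The multiplicative interpretation can be checked numerically. A small sketch with toy coefficients of our own choosing:

```python
import math

def poisson_mean(b0, b1, x):
    """Fitted mean of the log-linear model: mu = exp(b0 + b1 * x)."""
    return math.exp(b0 + b1 * x)

# Moving x -> x + 1 multiplies the fitted mean by exp(b1),
# regardless of the starting value of x, and the mean is always positive.
```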

Poisson Regression
Example dataset: researchers are interested in the number of awards earned by students at a high school. Possible explanatory variables include the type of program in which the student was enrolled (vocational, general, or academic) and the student's score on the final math exam.
Filename: Awards.csv

Attendee Questions If time permits