More on regression
Petter Mostad, 2005-10-24

More on indicator variables
– If an independent variable is an indicator variable, cases where it is 1 will just have an addition to the constant term.
– To allow different slopes for these cases, additional variables must be added (products of predictors and indicators).
– By viewing the constant term as a data column, we can express the models more symmetrically.
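A minimal sketch of this in Python (not from the original slides; the data is simulated, and numpy and statsmodels are assumed available):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)            # continuous predictor
d = rng.integers(0, 2, size=n)    # indicator variable (0/1)
# Simulated truth: different intercept AND slope when d == 1
y = 1.0 + 2.0 * x + 1.5 * d + 0.8 * d * x + rng.normal(scale=0.5, size=n)

# Constant term as an explicit data column; d shifts the intercept,
# d * x (product of predictor and indicator) gives the d == 1 cases
# their own slope
X = np.column_stack([np.ones(n), x, d, d * x])
fit = sm.OLS(y, X).fit()
print(fit.params)  # intercept, slope, intercept shift, slope shift
```

Putting the constant in as an explicit column of ones is the "symmetric" view mentioned above: every coefficient, including the intercept, multiplies a data column.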

Several indicator variables
– A model with two indicator variables will assume that the effect of one indicator adds to the effect of the other.
– If this may be unsuitable, use an additional interaction variable (product of the indicators).
– For categorical variables with m possible values, use m-1 indicators.
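A small illustration of m-1 dummy coding and of an indicator interaction (hypothetical data; pandas assumed available):

```python
import numpy as np
import pandas as pd

# Hypothetical categorical variable with m = 3 levels
color = pd.Series(["red", "green", "blue", "green", "red"])

# m - 1 = 2 indicator columns; the dropped level ("blue") becomes the
# reference category, absorbed into the constant term
dummies = pd.get_dummies(color, drop_first=True, dtype=float)
print(dummies)  # columns: green, red

# An interaction of two indicator variables is just their product:
d1 = np.array([0, 1, 1, 0])
d2 = np.array([1, 1, 0, 0])
print(d1 * d2)  # 1 only where both indicators are 1
```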

Logistic regression
– What if the dependent variable is an indicator variable?
– The model then has two stages: first, we predict a value z_i from the predictors as before; then the probability of indicator value 1 is given by p_i = exp(z_i) / (1 + exp(z_i)).
– Given data, we can estimate the coefficients in a similar way as before.
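A minimal logistic regression sketch on simulated data (statsmodels assumed available; `sm.Logit` estimates the coefficients by maximum likelihood):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
z = -0.5 + 1.2 * x                # linear predictor z_i
p = 1.0 / (1.0 + np.exp(-z))      # logistic link: P(y_i = 1)
y = rng.binomial(1, p)            # 0/1 dependent variable

X = sm.add_constant(x)
fit = sm.Logit(y, X).fit(disp=0)
print(fit.params)  # estimates of the coefficients in z_i
```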

Experimental design
– So far, we have considered the data as given. To the extent that we can control what data we have, how should we choose to set the independent variables?
  – Choice of variables
  – Choice of values for these variables

Choice of variables
– Include variables which you believe have a clear influence on the dependent variable, even if the variable is "uninteresting": this helps find the true relationship between the "interesting" variables and the dependent variable.
– Avoid including a pair (or a set) of variables whose values are clearly linearly related.

Multicollinearity
– To discover it, make plots and compute correlations (or regress one predictor on the others; a packaged version of this check is sketched below).
– To deal with it:
  – Remove unnecessary variables
  – Define and compute an "index" combining the related variables
  – If the variables are kept, the model can still be used for prediction
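The "regress one predictor on the others" check is packaged in statsmodels as the variance inflation factor, VIF = 1 / (1 - R^2). A sketch on simulated collinear data (statsmodels assumed available):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
# Large VIF values (often > 10 is cited) flag collinear predictors
for i in range(1, X.shape[1]):  # skip the constant column
    print(f"VIF for x{i}: {variance_inflation_factor(X, i):.1f}")
```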

Specification bias
– Unless two independent variables are uncorrelated, the estimation of one will influence the estimation of the other.
– Not including one variable may therefore bias the estimation of the others (omitted-variable bias).
– Thus, one should be humble when interpreting regression results: there are probably always variables one could have added.

Choice of values
– The values should have a good spread: again, avoid collinearity.
– They should cover the range for which the model will be used.
– For categorical variables, one may choose to combine levels in a systematic way.

Generating experimental designs
– For n binary variables, there are 2^n ways to set them in different combinations.
– If 2^n is too big, there are systematic ways (fractional factorial designs) to choose a subset of these 2^n experiments.
– If 2^n is too small, we can run several experiments at each setting.
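Enumerating the full 2^n factorial design is a one-liner; a sketch in Python:

```python
from itertools import product

# Full factorial design: all 2^n combinations of n binary variables
n = 3
design = list(product([0, 1], repeat=n))
for run in design:
    print(run)  # 8 runs: (0, 0, 0), (0, 0, 1), ..., (1, 1, 1)
```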

Heteroscedasticity – what is it?
– In the standard regression model it is assumed that all the error terms ε_i have the same variance.
– If the variance varies with the independent variables or with the dependent variable, the model is heteroscedastic.
– Sometimes it is clear that the data exhibit such properties.

Heteroscedasticity – why does it matter?
– Our standard methods for estimation, confidence intervals, and hypothesis testing assume equal variances.
– If we go on and use these methods anyway, our answers might be quite wrong!

Heteroscedasticity – how to detect it?
– Fit a regression model and study the residuals:
  – plot them against the independent variables
  – plot them against the predicted values of the dependent variable
– Possibility: test for heteroscedasticity by regressing the squared residuals on the predicted values (a sketch of this follows below).
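A sketch of that auxiliary regression on simulated heteroscedastic data (statsmodels assumed available):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1, 10, size=n)
# Heteroscedastic errors: the standard deviation grows with x
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x, size=n)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# Auxiliary regression: squared residuals on the predicted values.
# A clearly nonzero slope suggests heteroscedasticity.
aux = sm.OLS(fit.resid ** 2, sm.add_constant(fit.fittedvalues)).fit()
print(aux.params, aux.pvalues)
```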

Heteroscedasticity – what to do about it?
– Use a transformation of the dependent variable:
  – log-linear models
– If the standard deviation of the errors appears to be proportional to the predicted values, a two-stage regression analysis is a possibility (see the sketch below).
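One way to carry out such a two-stage analysis is weighted least squares: fit once by ordinary least squares, then reweight each observation by the inverse of its estimated variance. A sketch under the assumption that the error standard deviation is proportional to the predicted value:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(1, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x, size=n)
X = sm.add_constant(x)

# Stage 1: ordinary least squares to get predicted values
ols = sm.OLS(y, X).fit()

# Stage 2: weight each observation by 1 / (predicted value)^2,
# since the error sd is assumed proportional to the predicted value
weights = 1.0 / ols.fittedvalues ** 2
wls = sm.WLS(y, X, weights=weights).fit()
print(wls.params)
```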

Dependence over time
– Sometimes y_1, y_2, …, y_n are not completely independent observations (given the independent variables):
  – Lagged values: y_i may depend on y_{i-1} in addition to its independent variables
  – Autocorrelated errors: successive observations y_i, y_{i+1}, … depend similarly on unobserved variables

Lagged values
– In this case, we may run a multiple regression just as before, but including the previous dependent value y_{i-1} as a predictor variable for y_i.
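A sketch on simulated data, where the lagged value simply becomes one more column in the design matrix (statsmodels assumed available):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=n)
y = np.zeros(n)
for i in range(1, n):
    y[i] = 1.0 + 0.6 * y[i - 1] + 0.5 * x[i] + rng.normal(scale=0.3)

# Regress y_i on x_i and on the lagged value y_{i-1}
Y = y[1:]
X = sm.add_constant(np.column_stack([x[1:], y[:-1]]))
fit = sm.OLS(Y, X).fit()
print(fit.params)  # intercept, coefficient of x_i, coefficient of y_{i-1}
```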

Autocorrelated errors
– In the standard regression model, the errors are independent. Using the standard regression formulas anyway can lead to errors: typically, the uncertainty in the result is underestimated.
  – Example: taking observations closer and closer together in time will not increase your knowledge about the regression parameters beyond a certain point.

Autocorrelation – how to detect it?
– Plot the residuals against time!
– The Durbin-Watson test compares the possibility of independent errors with a first-order autoregressive model. The test statistic is
  d = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2
  where the e_i are the residuals; values of d near 2 are consistent with independent errors, values near 0 with positive autocorrelation.
– The test is available as an option in SPSS.
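The same statistic is available outside SPSS as well; for example, a sketch using statsmodels on simulated AR(1) errors:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(6)
n = 200
x = rng.normal(size=n)
# First-order autoregressive errors with rho = 0.7
e = np.zeros(n)
for i in range(1, n):
    e[i] = 0.7 * e[i - 1] + rng.normal(scale=0.5)
y = 1.0 + 2.0 * x + e

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(fit.resid))  # well below 2 here: positive autocorrelation
```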

Autocorrelation – what to do about it?
– It is possible to use a two-stage regression procedure:
  – If a first-order autoregressive error model with parameter ρ is appropriate, then the transformed model
    y_i - ρ y_{i-1} = β_0 (1 - ρ) + β_1 (x_i - ρ x_{i-1}) + u_i
    will have uncorrelated errors u_i.
  – Estimate ρ from the Durbin-Watson statistic (ρ ≈ 1 - d/2), and estimate the regression coefficients from the transformed model above.
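A sketch of this two-stage (Cochrane-Orcutt-style) procedure, reusing the simulated AR(1) data from the previous sketch:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(6)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for i in range(1, n):
    e[i] = 0.7 * e[i - 1] + rng.normal(scale=0.5)
y = 1.0 + 2.0 * x + e

# Stage 1: OLS fit, then estimate rho from the Durbin-Watson statistic
ols = sm.OLS(y, sm.add_constant(x)).fit()
rho = 1.0 - durbin_watson(ols.resid) / 2.0

# Stage 2: regression on the transformed variables
y_star = y[1:] - rho * y[:-1]
x_star = x[1:] - rho * x[:-1]
fit = sm.OLS(y_star, sm.add_constant(x_star)).fit()
beta1 = fit.params[1]
beta0 = fit.params[0] / (1.0 - rho)  # recover the original intercept
print(rho, beta0, beta1)
```

For an iterated version of the same idea, statsmodels also provides `sm.GLSAR`, which fits a regression with autoregressive errors.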