Warsaw Summer School 2015, OSU Study Abroad Program Regression.


Linear Relationship A line is a mathematical function that can be expressed through the formula Y = a + bX, where Y and X are our variables. Y, the dependent variable, is expressed as a linear function of the independent (explanatory) variable X.

Linear Relationship

The constant a = the value of Y at the point where the line Y = a + bX intersects the Y-axis (also called the intercept). The slope b equals the change in Y for a one-unit increase in X (a one-unit increase in X corresponds to a change of b units in Y). The slope describes the rate of change in Y-values as X increases. Verbal interpretation of the slope of the line: “rise over run”: the rise divided by the run (the change in the vertical distance divided by the change in the horizontal distance).

Cartesian Coordinate System Variables X, Y and their linear function: The formula Y = a + bX expresses the dependent (response) variable Y as a linear function of the independent (explanatory) variable X. The formula maps out a straight-line graph with slope b and Y-intercept a.

Basics Linear Relationship: Y = a + bX The constant a is the value of Y when X = 0. For X = 0 we have: Y = a + b*0 = a. The constant a is the value of Y where the line Y = a + bX intersects the Y-axis. The slope b equals the change in Y for a one-unit increase in X. This means that a one-unit increase in X corresponds to a change of b units in Y. Thus, the slope describes the rate of change in the Y-values as X increases. Generally, b = (Y - a) / X for any point (X, Y) on the line with X ≠ 0.
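For example (with made-up numbers): if a = 2 and b = 0.5, then at X = 0 the line gives Y = 2, each one-unit increase in X raises Y by 0.5, and at X = 10 we get Y = 2 + 0.5*10 = 7.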

Model vs Reality The function Y = a + bX is a model. In reality the observed data points do not all fall on one line.

The Scattergram and Least Squares Method The graphical plot of observed values (X, Y) is called a scattergram, scatter diagram, or scatter plot. A regression function is a function that describes how the expected value of the dependent (response) variable Y changes according to the values of an independent (explanatory) variable X.

Regression This expected value is estimated by a linear function: Ŷ = a + bX, where Ŷ = the predicted value of the dependent variable Y, a = the intercept (the value of Y when X = 0), b = the regression coefficient (the slope), indicating the amount of change in Y given a unit change in X, and X = the independent variable.

Regression Ŷ = a + bX, where b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and a = Ȳ - bX̄
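As a minimal sketch, these two formulas can be computed directly from paired data; the Python below uses made-up numbers purely for illustration (the slides themselves work in Stata).

```python
# Least-squares slope and intercept from the formulas above:
# b = sum of products / sum of squares for X, a = Ybar - b * Xbar
x = [1, 2, 3, 4, 5]
y = [2.0, 2.9, 4.1, 4.9, 6.2]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))   # Σ(X - X̄)(Y - Ȳ)
ssx = sum((xi - x_bar) ** 2 for xi in x)                        # Σ(X - X̄)²

b = sp / ssx            # slope
a = y_bar - b * x_bar   # intercept

print(f"Y-hat = {a:.3f} + {b:.3f} * X")
```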

Method of Least Squares The prediction errors, called residuals, are defined as the differences between observed and predicted values of Y: E = Y - (a + bX) = Y - Ŷ. The regression line minimizes the sum of squared errors: SSE = Σ(Y - Ŷ)²

Method of Least Squares The method of least squares provides the prediction equation Ŷ = a + bX having the minimal value of SSE. The least squares estimates a and b are the values determining the prediction equation for which the sum of squared errors SSE is a minimum.

Covariance In regression analysis we ask: to what extent could we predict Y knowing our variable X? Prediction means that the values of X and Y go together, or co-vary. The covariance is based on the sum of products, SP: SP = Σ(X - X̄)(Y - Ȳ). The sum of squares for X: SSx = Σ(X - X̄)². Note that in the regression equation of Y on X, Ŷ = bX + a, we have b = SP / SSx

Interpretation of b The slope of the line, b, has the verbal interpretation “rise over run”: the rise divided by the run. This means that the change in the vertical distance is divided by the change in the horizontal distance. The steeper the hill, the higher the slope: you go “up” more rapidly than you go over. The line can also have a negative slope; with a negative slope you are going “downhill” rather than “uphill.” b > 0, positive relationship; b < 0, negative relationship; b = 0, no relationship

Unstandardized and standardized coefficients If both variables, IV and DV, are expressed in z-scores, the constant a equals zero. We obtain Beta coefficients that tell us the following: how many standard deviations the DV changes when the IV changes by one standard deviation.
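As a rough illustration of this point (the data are invented): converting both variables to z-scores and refitting gives a constant of essentially zero and a slope equal to the standardized Beta.

```python
import statistics as st

x = [1, 2, 3, 4, 5, 6]
y = [1.5, 3.1, 2.8, 4.4, 5.0, 6.1]

def zscores(values):
    m, s = st.mean(values), st.pstdev(values)   # standardize: subtract mean, divide by SD
    return [(v - m) / s for v in values]

zx, zy = zscores(x), zscores(y)

sp = sum(a * b for a, b in zip(zx, zy))
ssx = sum(a * a for a in zx)

beta = sp / ssx                               # standardized slope (Beta)
constant = st.mean(zy) - beta * st.mean(zx)   # ~0, since both standardized means are 0

print(f"Beta = {beta:.3f}, constant = {constant:.2e}")
```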

Two or more IVs Ŷ = a + b₁X₁ + b₂X₂ Ŷ = β₁X₁ + β₂X₂ Ŷ = a + b₁X₁ + b₂X₂ + … + bₖ₋₁Xₖ₋₁ + bₖXₖ Ŷ = β₁X₁ + β₂X₂ + … + βₖ₋₁Xₖ₋₁ + βₖXₖ
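A minimal sketch of the two-IV case, assuming NumPy is available (the data and variable names are invented for illustration):

```python
import numpy as np

x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)
y  = np.array([3.1, 3.9, 6.2, 6.8, 9.1, 9.7])

# Design matrix: a column of ones for the constant a, plus the two IVs
X = np.column_stack([np.ones_like(x1), x1, x2])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares solution
a, b1, b2 = coef
print(f"Y-hat = {a:.3f} + {b1:.3f}*X1 + {b2:.3f}*X2")
```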

Coefficients and variables The estimated parameters b₁, b₂, ..., bₖ are partial regression coefficients. They differ from the regression coefficients for bivariate relationships between Y and each explanatory variable. Three criteria for choosing the number of independent (explanatory) variables: (1) Theory (2) Parsimony (3) Sample size

R² Coefficient of determination (explained variance) for two variables: r² = [SS(total) - SS(error)] / SS(total). Stata provides the value of the coefficient of determination for the multiple regression: R² = [SS(total) - SS(error)] / SS(total)

Sum of squares R² is the proportion of variance explained by X₁, X₂, ..., Xₖ. Therefore, 1 - R² is the proportion of unexplained variance.
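A small sketch of the same idea in Python, reusing the simple-regression fit from earlier (numbers are illustrative):

```python
x = [1, 2, 3, 4, 5]
y = [2.0, 2.9, 4.1, 4.9, 6.2]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ssx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
a = y_bar - b * x_bar

ss_total = sum((yi - y_bar) ** 2 for yi in y)                      # SS(total)
ss_error = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))   # SS(error)

r2 = (ss_total - ss_error) / ss_total
print(f"R-squared = {r2:.3f}, unexplained proportion = {1 - r2:.3f}")
```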

Adjusted R-square Adjusted R-square is a modification of R-square that adjusts for the number of terms in a model. R-square always increases when a new term is added to a model, but adjusted R-square increases only if the new term improves the model more than would be expected by chance.
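In symbols (using the formula given for the Stata output further below): Adjusted R² = 1 - (1 - R²)(N - 1)/(N - k - 1), where N is the number of cases and k the number of predictors. For example, with R² = 0.60, N = 30 and k = 3: 1 - 0.40 × 29/26 ≈ 0.554.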

Sum of Squares The regression SUM OF SQUARES is defined as: SS(regression) = SS(total) – SS(error)

Mean square The regression MEAN SQUARE: MSS(regression) = SS(regression) / df(regression), where df(regression) = k and k is the number of variables. The MEAN SQUARE ERROR: MSS(error) = SS(error) / df(error), where df(error) = n - (k + 1), n is the number of cases, and k is the number of variables.

F The null hypothesis H₀: b₁ = b₂ = … = bₖ = 0. F = MSS(model) / MSS(error). The sampling distribution of this statistic is the F-distribution.
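A hedged sketch of this test, assuming SciPy is available for the F-distribution (the sums of squares are invented):

```python
from scipy.stats import f

ss_model, ss_error = 120.0, 80.0    # illustrative sums of squares
k, n = 2, 30                        # k variables, n cases

ms_model = ss_model / k             # MSS(model), df = k
ms_error = ss_error / (n - (k + 1)) # MSS(error), df = n - (k + 1)

F = ms_model / ms_error
p = f.sf(F, k, n - (k + 1))         # upper-tail area of the F-distribution
print(f"F({k}, {n - (k + 1)}) = {F:.2f}, p = {p:.4f}")
```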

t The test of H₀: bₖ = 0 evaluates whether Y and X are statistically dependent, ignoring other variables. We use the t statistic t = b / σ_b, where σ_b is the standard error of b, computed from the mean square error SS(error) / (n - 2).
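A minimal sketch for the simple-regression case, assuming SciPy for the t-distribution; the standard error below uses the usual form √[ (SS(error)/(n - 2)) / Σ(X - X̄)² ], which the slide abbreviates.

```python
from scipy.stats import t as t_dist

x = [1, 2, 3, 4, 5]
y = [2.0, 2.9, 4.1, 4.9, 6.2]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ssx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
a = y_bar - b * x_bar

ss_error = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_b = (ss_error / (n - 2) / ssx) ** 0.5        # standard error of b

t_stat = b / se_b
p = 2 * t_dist.sf(abs(t_stat), n - 2)           # two-sided test of H0: b = 0
print(f"t = {t_stat:.2f}, p = {p:.4f}")
```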

ANOVA ANALYSIS OF VARIANCE How much of the variance is explained by the values of the nominal variable? Total sum of squared variation from the mean: SS(total) = Σ [X – X̄(total)]²

ANOVA The between-group variation represents the squared deviations of every group mean from the total mean: SS(between) = Σ [X̄(group) – X̄(total)]². The within-group sum of squares is the sum of squared deviations of every raw score from its group mean: SS(within) = Σ [X – X̄(group)]²

ANOVA Mean Squares: MSS(between) = SS(between) / df(between) where df(between) = k – 1 MSS(within) = SS(within) / df(within) where df(within) = N - k
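A small sketch of these quantities and the resulting F, with each group-mean deviation counted once per case (the groups and values are made up):

```python
# One-way ANOVA sums of squares, mean squares, and F for three illustrative groups
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 8.0, 9.0],
    "C": [5.0, 6.0, 7.0],
}

all_values = [v for g in groups.values() for v in g]
grand_mean = sum(all_values) / len(all_values)
group_means = {name: sum(g) / len(g) for name, g in groups.items()}

ss_between = sum(len(g) * (group_means[name] - grand_mean) ** 2 for name, g in groups.items())
ss_within  = sum((v - group_means[name]) ** 2 for name, g in groups.items() for v in g)

k, n = len(groups), len(all_values)
ms_between = ss_between / (k - 1)   # df(between) = k - 1
ms_within  = ss_within / (n - k)    # df(within)  = N - k

print(f"F = {ms_between / ms_within:.2f}")
```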

F F-statistic: F = MSS(between) / MSS(within). The larger the F-value, the greater the impact of group membership on the dependent variable.

F Compare: ANOVA: F = MSS(between) / MSS(within). Regression: F = MSS(regression) / MSS(error)

Stata

ANOVA Source - Model, Residual, and Total. The total variance is partitioned into the variance which can be explained by the independent variables (Model) and the variance which is not explained by the independent variables (Residual, sometimes called Error). SS - Sum of Squares associated with the three sources of variance: Total, Model, and Residual. df - Degrees of freedom associated with the sources of variance. The total variance has N - 1 degrees of freedom. The model degrees of freedom equal the number of coefficients (including the intercept) minus 1. The residual degrees of freedom are the total DF minus the model DF. MS - Mean Squares, the Sums of Squares divided by their respective DF.

Regression Number of obs - the number of observations used in the regression analysis. F - the F-statistic is the Mean Square Model divided by the Mean Square Residual; the numbers in parentheses are the Model and Residual degrees of freedom. Prob > F - the p-value associated with the F-statistic. It is used in testing the null hypothesis that all of the model coefficients are 0. R-squared - the proportion of variance in the dependent variable which can be explained by the independent variables. Adj R-squared - an adjustment of the R-squared that penalizes the addition of extraneous predictors to the model. Adjusted R-squared is computed using the formula 1 - (1 - Rsq)(N - 1)/(N - k - 1), where k is the number of predictors. Root MSE - the standard deviation of the error term; it is the square root of the Mean Square Residual (or Error).
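A short sketch showing how these headline numbers hang together, using invented sums of squares:

```python
# Reconstructing R-squared, Adjusted R-squared, Root MSE, and F from an ANOVA table
ss_model, ss_residual = 150.0, 50.0
n, k = 40, 3                                    # cases and predictors

ss_total = ss_model + ss_residual
r2 = ss_model / ss_total
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # the Adj R-squared formula above

ms_residual = ss_residual / (n - k - 1)
root_mse = ms_residual ** 0.5                   # Root MSE = sqrt(Mean Square Residual)
f_stat = (ss_model / k) / ms_residual

print(f"R2 = {r2:.3f}, Adj R2 = {adj_r2:.3f}, Root MSE = {root_mse:.3f}, "
      f"F({k}, {n - k - 1}) = {f_stat:.2f}")
```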