
Regression and correlation analysis (RaKA)

Investigating the relationships between statistical characteristics:
Investigating the relationship between qualitative characteristics (e.g. A and B) is called measurement of association.
Investigating the relationship between quantitative characteristics is the subject of regression and correlation analysis.


Regression and correlation analysis examines causal dependency: the relationship between cause and effect, where one or more causes (attributes, independent variables) produce a resulting effect, the dependent variable:

Y = f(X_1, X_2, ..., X_k; B_0, B_1, ..., B_p) + e

where Y is the dependent variable (the effect), X_1, ..., X_k are the independent variables (the causes), B_0, ..., B_p are the unknown parameters of the functional relationship, and e represents random, unspecified effects.

Example of false correlation. One of the famous spurious correlations: as skirt length gets shorter, stock quotations get higher. Apart from the fact that it does not always hold, this is a false, or spurious, correlation.

Examples of statistical (loose) dependence:
Examining how the consumption of pork depends on income, the prices of pork, beef and poultry, tradition, and other unspecified or random effects.
Examining the dependence of GNP on labour and capital.
Investigating whether the nutrition of the population depends on the degree of economic development of the country.

The opposite of statistical dependence is functional dependence:

Y = f(X_1, X_2, ..., X_k; B_0, B_1, ..., B_p)

where the dependent variable is uniquely determined by the functional relationship. Examples come from physics and chemistry; this kind of relationship is not the subject of statistical investigation.

Regression and correlation analysis (RaKA) has two basic tasks:
Regression: (a) find a functional relationship by which the dependent variable changes with the change of the independent variables, i.e. find a suitable regression line (function); (b) estimate the parameters of the regression function.
Correlation: measure the strength of the examined dependence (relationship).

Illustration of the correlation field in two cases (scatter plots of y against x).

According to the number of independent variables we distinguish:
Simple dependence, when we consider only one independent variable X and investigate the relationship between Y and X.
Multiple dependence, when we consider at least two independent variables X_1, X_2, ..., X_k, for k ≥ 2.

Simple regression and correlation analysis. Consider statistical characteristics X and Y which are in a linear relationship in the population:

Y = B_0 + B_1 X + e

The point estimate of the regression function is a straight line y_j = b_0 + b_1 x_j + e_j, with coefficients calculated from the sample data. Which method should we use?

The least squares method (LSM). We look for the coefficients that minimize the sum of squared errors; setting the partial derivatives to zero gives a set of p + 1 equations with p + 1 unknown parameters => ordinary least squares method (OLS).

Principle of the LSM. From y_j = b_0 + b_1 x_j + e_j we can write y_j = y'_j + e_j, hence e_j = y_j - y'_j. The method minimizes the sum of squared deviations:

Σ (e_j)^2 = Σ (y_j - y'_j)^2 → min

It can be proved that the coefficients b_0, b_1, ..., b_p determined by OLS are the best estimates of the parameters B_0, B_1, ..., B_p if the random errors meet the assumptions:

E(e_j) = 0,  D(e_j) = E(e_j^2) = σ^2,  E(e_j1 · e_j2) = 0 for each j_1 ≠ j_2

Verbal formulation: random errors are required to have zero mean and constant variance, and to be independent.

The coefficients of the simple regression function can be derived by minimizing

Σ (y_j - b_0 - b_1 x_j)^2

with respect to b_0 and b_1.

After transformation we get two normal equations with two unknown parameters:

Σ y_j = n b_0 + b_1 Σ x_j
Σ x_j y_j = b_0 Σ x_j + b_1 Σ x_j^2

The system of equations can be solved by the elimination method or by using determinants. We obtain the coefficients b_0 and b_1.

The procedure for calculating the coefficients of the linear regression function (LRF) uses a calculation table:

x_j      y_j      x_j y_j      x_j^2
x_1      y_1      x_1 y_1      x_1^2
x_2      y_2      x_2 y_2      x_2^2
...      ...      ...          ...
x_n      y_n      x_n y_n      x_n^2
Σx_j     Σy_j     Σx_j y_j     Σx_j^2
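To make the procedure concrete, here is a minimal Python sketch on hypothetical data (the arrays x and y are invented for the example): it forms the column sums from the table above and solves the two normal equations for b_0 and b_1.

```python
import numpy as np

# Hypothetical sample data: x = independent variable, y = dependent variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

# Column sums from the calculation table: sum(x_j), sum(y_j), sum(x_j y_j), sum(x_j^2).
sx, sy, sxy, sxx = x.sum(), y.sum(), (x * y).sum(), (x ** 2).sum()

# Solving the two normal equations for b1 and b0 (elimination in closed form).
b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
b0 = (sy - b1 * sx) / n
print(b0, b1)  # intercept and regression coefficient
```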

Interpretation of simple linear regression coefficients:
b_0 ... intercept: the expected value of the dependent variable if the independent variable is equal to zero.
b_1 ... regression coefficient: expresses the change in the dependent variable if the independent variable changes by one unit.
If b_1 > 0, the correlation (dependence) is positive; if b_1 < 0, it is negative.

Properties of the least squares method: the regression function passes through the point with coordinates x̄ and ȳ (the sample means).

When can OLS be applied? If the regression function is linear in parameters (LiP), or if we can transform the regression function to be linear in parameters. Consider in which of the following regression functions OLS can be used.

Some types of simple regression functions are shown in the slides that follow (hyperbolic, logarithmic, exponential, power).

Examples from micro- and macroeconomics: the Phillips curve, the Cobb–Douglas production curve, Engel curves, the curve of economic growth. Any others?

Examining the consumption of selected commodities (depending on the level of GNP).

Comparison of two cases of correlation (scatter plots of y against x): which correlation is tighter?

Confidence intervals for linear regression. In addition to point estimates of the parameters of linear regression functions, interval estimates of the parameters, called confidence intervals, are often calculated. Confidence intervals can be computed from the standard deviations of the parameters and the residual variance. The residual variance, if all conditions of the classical linear model are satisfied, is an unbiased estimate of the variance σ^2 of the random errors and is calculated as

s^2 = Σ (y_j - y'_j)^2 / (n - p)

where p is the number of parameters of the regression function.

Interval estimate of any parameter of the regression line. If the assumptions formulated in the classical linear model hold, the statistic

t = (b_i - B_i) / s(b_i)

has a t distribution with n - p degrees of freedom. For the chosen confidence level 1 - α, the confidence interval for the parameter B_i is given by

b_i ± t_{1-α/2}(n - p) · s(b_i)

Analogously, a confidence interval is constructed for the other parameter and for the regression line itself:

y'_j ± t_{1-α/2}(n - p) · s(y'_j)

where t_{1-α/2}(n - p) is the quantile of the t distribution with n - p degrees of freedom (for the regression line, n - 2).
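A short Python sketch of these interval estimates, continuing the example above (it reuses x, y, n, b0 and b1 from the earlier block; the 95% confidence level is an assumption for illustration):

```python
import numpy as np
from scipy import stats

# Fitted values, residuals and the residual variance s^2 (p = 2 parameters for a line).
y_hat = b0 + b1 * x
s2 = ((y - y_hat) ** 2).sum() / (n - 2)

# Standard errors of the coefficients of the straight line.
sxx_c = ((x - x.mean()) ** 2).sum()
se_b1 = np.sqrt(s2 / sxx_c)
se_b0 = np.sqrt(s2 * (1.0 / n + x.mean() ** 2 / sxx_c))

# Confidence intervals b_i +/- t_{1-alpha/2}(n - 2) * s(b_i), here with alpha = 0.05.
t_q = stats.t.ppf(0.975, df=n - 2)
ci_b0 = (b0 - t_q * se_b0, b0 + t_q * se_b0)
ci_b1 = (b1 - t_q * se_b1, b1 + t_q * se_b1)
```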

Role of correlation: to examine the tightness (strength) of the dependence. We use various correlation indices, which should be bounded in the interval [0, 1] and, within that interval, increase with the strength of the dependence.

Correlation analysis provides methods and techniques for verifying the explanatory power of a quantified regression model as a whole and of its parts. This verification leads to the calculation of numerical characteristics which, in concentrated form, describe the quality of the calculated model.

Index of correlation and index of determination. In the population the index of correlation is I_yx; its estimate from sample data is i_yx (est I_yx = i_yx). The principle lies in the decomposition of the variability of the dependent variable Y into: the total variability of the dependent variable; the variability explained by the regression function; and the variability unexplained by the regression function (the residual variability).

It is obvious that the following relationship holds: T = E + U, where T is the total sum of squares (of deviations), E is the explained sum of squares, and U is the unexplained (residual) sum of squares.

Index of correlation: i_yx = sqrt(E / T). Index of determination: i_yx^2 = E / T = 1 - U / T.

The index of determination can take values from 0 to 1. When the value of the index is close to 1, a great proportion of the total variability is explained by the model; conversely, when it is close to zero, a low proportion of the total variability is explained by the model. The index of determination is commonly used as a criterion when deciding about the shape of the regression function. However, if the compared regression functions have different numbers of parameters, it is necessary to adjust the index of determination to the corrected form:

i_adj^2 = 1 - (1 - i_yx^2) · (n - 1) / (n - p)
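The decomposition and both indices are easy to compute directly; a minimal sketch continuing the example above (reusing y, y_hat and n from the earlier blocks):

```python
# Decomposition of variability: T = E + U.
T = ((y - y.mean()) ** 2).sum()   # total sum of squares
U = ((y - y_hat) ** 2).sum()      # unexplained (residual) sum of squares
E = T - U                         # explained sum of squares

p = 2                             # number of parameters of the straight line
i2 = E / T                        # index of determination
i_corr = i2 ** 0.5                # index of correlation
i2_adj = 1 - (1 - i2) * (n - 1) / (n - p)  # corrected (adjusted) index
```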

Variability    Sum of squares    Degrees of freedom    Variance              F test
Explained      E                 p - 1                 s_E^2 = E / (p - 1)   F = s_E^2 / s_U^2
Unexplained    U                 n - p                 s_U^2 = U / (n - p)
Total          T                 n - 1

The test criterion in the table can be used for simultaneously testing the significance of the regression model, the index of determination and the index of correlation. We compare the calculated value of the F test with the quantile F_{1-α}(p - 1, n - p) of the F distribution with p - 1 and n - p degrees of freedom:
if F ≤ F_{1-α}(p - 1, n - p), the regression model is insignificant, as are the index of correlation and the index of determination;
if F > F_{1-α}(p - 1, n - p), the regression model is statistically significant, as are the index of correlation and the index of determination.
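A minimal sketch of this F test in Python, continuing the example (reusing E, U, n and p from the previous block; alpha = 0.05 is an assumed significance level):

```python
from scipy import stats

# F statistic from the ANOVA table and the critical quantile F_{1-alpha}(p-1, n-p).
F = (E / (p - 1)) / (U / (n - p))
F_crit = stats.f.ppf(0.95, dfn=p - 1, dfd=n - p)
model_significant = F > F_crit
```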

For a detailed evaluation of the quality of the parameters of the regression model, t tests are used. We formulate the null hypothesis

H_0: B_i = 0, for i = 0, 1    against    H_1: B_i ≠ 0

where under H_0 we assume a zero, therefore insignificant, effect of the variable to which the parameter belongs. The test criterion is defined by the relationship

t = b_i / s(b_i)

where b_i is the value of the parameter of the regression function and s(b_i) is the standard error of the parameter. We compare the calculated value of the test criterion with the quantile t_{1-α/2}(n - p) of the t distribution at significance level α:
if |t| ≤ t_{1-α/2}(n - p), we do not reject the null hypothesis about the insignificance of the parameter;
if |t| > t_{1-α/2}(n - p), we reject the null hypothesis and confirm the statistical significance of the parameter.
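The same t test in a short Python sketch, continuing the example (reusing b1, se_b1 and n; alpha = 0.05 assumed):

```python
from scipy import stats

# t statistic for H0: B1 = 0 and the two-sided critical value t_{1-alpha/2}(n - 2).
t_b1 = b1 / se_b1
t_crit = stats.t.ppf(0.975, df=n - 2)
reject_H0 = abs(t_b1) > t_crit    # True => B1 is statistically significant
```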

Nonlinear regression and correlation analysis. In addition to linear regression functions, nonlinear functions, possibly with two or more parameters, are used very often in practice. Some nonlinear regression functions can be suitably transformed to be linear in parameters, and we can then use the method of least squares. Most often, we can transform a nonlinear function with two parameters to the shape

y' = b_0 + b_1 x'

where y' and x' are suitable transformations of the original variables.

We estimate the regression function in the form y' = b_0 + b_1 x', which is then fitted as a linear function. Not all nonlinear functions can be converted in this way, only those which are linear in parameters, i.e. for which some linearising transformation exists; most often it is a substitution (e.g. x' = 1/x) or a logarithmic transformation (e.g. taking logarithms of both sides).

Hyperbolic function: y = b_0 + b_1 / x, linearised by the substitution x' = 1/x.

Logarithmic function: y = b_0 + b_1 log x, linearised by the substitution x' = log x.

Exponential function: y = b_0 · b_1^x, linearised by taking logarithms: log y = log b_0 + x log b_1.

Power function (Cobb–Douglas production function): y = b_0 · x^{b_1}, linearised by taking logarithms: log y = log b_0 + b_1 log x.
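As an illustration of a linearising transformation, here is a sketch fitting the power function by OLS on the logarithms (reusing the hypothetical x and y from the first block; all values must be positive for the logarithms to exist):

```python
import numpy as np

# log y = log b0 + b1 * log x: fit a straight line to the transformed data.
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
b1_pow = slope                 # exponent of the power function
b0_pow = np.exp(intercept)     # back-transformed original intercept parameter
```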

Similarly, it is possible to modify some nonlinear functions with more parameters, such as the second-degree parabola: y = b_0 + b_1 x + b_2 x^2.

Second-degree hyperbola: y = b_0 + b_1 / x + b_2 / x^2.

It should be noted that the transformed regression functions do not always have the same parameters as the original nonlinear regression function, so it is necessary to compute the original parameters backwards from the estimated parameters of the transformed function. Estimates of the original parameters obtained in this way do not have optimal statistical properties, but they are often sufficient for solving specific tasks. Some regression functions cannot be adjusted or transformed into functions linear in parameters. Estimates of the parameters of such functions are obtained using various approximate or iterative methods. Most of them are based on the gradual improvement of initial estimates, which may be, e.g., expert estimates or estimates obtained from selected points.
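For functions with no linearising transformation, one possibility is an iterative least squares routine such as scipy.optimize.curve_fit; a hedged sketch on the same hypothetical data (the model form and starting values are invented for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

# An illustrative form that is not linear in parameters.
def model(x, c0, c1, c2):
    return c0 + c1 * np.exp(c2 * x)

# Initial estimates (e.g. expert guesses) that the iteration gradually improves.
p_init = [1.0, 1.0, 0.1]
params, cov = curve_fit(model, x, y, p0=p_init)
```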

Multiple regression and correlation analysis. Suppose that the dependent variable Y and the explanatory (independent) variables X_i, i = 1, 2, ..., k, are in the linear relationship already mentioned in previous sections, which can be written:

Y = B_0 + B_1 X_1 + B_2 X_2 + ... + B_k X_k + e

which we estimate by:

y_j = b_0 + b_1 x_1j + b_2 x_2j + ... + b_k x_kj + e_j

The coefficients, which are estimates of the parameters, should meet the condition of the least squares method:

Σ e_j^2 = Σ (y_j - b_0 - b_1 x_1j - ... - b_k x_kj)^2 → min

Since we assume a particular shape of the regression function, we can insert it into the previous relationship and look for the minimum of this function; i.e., we determine the minimum similarly to the case of a simple regression equation, using partial derivatives of the function.

This leads to the system of normal equations, which in matrix notation is

(X'X) b = X'y

The solution of this system of equations gives the coefficients of the linear regression equation. As for the simple linear relationship, we can calculate the estimate of the parameters from the matrix equation

b = (X'X)^{-1} X'y

The quality of the regression model can be evaluated similarly to the simple linear relationship described in the previous section.
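A minimal numpy sketch of the matrix equation on hypothetical data with two explanatory variables (all numbers invented for the example); solving the normal equations directly is numerically preferable to forming the explicit inverse:

```python
import numpy as np

# Hypothetical data: two explanatory variables and the dependent variable.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y_m = np.array([3.1, 3.9, 8.0, 8.1, 12.2])

# Design matrix X with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(x1), x1, x2])

# b = (X'X)^{-1} X'y, computed via a linear solve of the normal equations.
b = np.linalg.solve(X.T @ X, X.T @ y_m)
```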


Important terms:
Correlation analysis – a group of techniques to measure the association between two variables.
Dependent variable – the variable that is being predicted or estimated.
Independent variable – the variable that provides the basis for estimation; it is the predictor variable.
Coefficient of correlation – a measure of the strength of the linear relationship.
Coefficient of determination – the proportion of the total variation in the dependent variable Y that is explained, or accounted for, by the variation in the independent variable X.

Regression equation – an equation that expresses the relationship between variables.
Least squares principle – determining the regression equation by minimizing the sum of the squares of the vertical distances between the actual Y values and the predicted values of Y.