Measures of relationship Dr. Omar Al Jadaan. Agenda Correlation – Need – meaning, simple linear regression – analysis – prediction.



Correlation
Correlation is a statistical measure of the relationship between two variables. Possible correlations range from +1 to –1. A zero correlation indicates that there is no linear relationship between the variables. A correlation of –1 indicates a perfect negative correlation, meaning that as one variable goes up, the other goes down. A correlation of +1 indicates a perfect positive correlation, meaning that both variables move in the same direction together.

Correlation
Why do we need correlation? To discover the patterns of association between the variables and to infer a mathematical model of the relationship.

Linear regression
Simple linear regression is the least squares estimator of a linear regression model with a single predictor variable. In other words, simple linear regression fits a straight line through the set of n points in such a way that the sum of squared residuals of the model (that is, the squared vertical distances between the data points and the fitted line) is as small as possible. The adjective simple refers to the fact that the model has only one predictor variable. The fitted line has a slope equal to the correlation between y and x multiplied by the ratio of the standard deviations of these variables (standard deviation of y over standard deviation of x). The intercept of the fitted line is such that the line passes through the center of mass (x̄, ȳ) of the data points.
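The relationship between the slope, the correlation, and the standard deviations can be checked numerically. The following is a minimal sketch; the function name fit_line and the study-hours data are made up for illustration and do not come from the lecture.

```python
from math import sqrt

def fit_line(xs, ys):
    """Fit y = intercept + slope * x using slope = r * (sd_y / sd_x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = ss_xy / sqrt(ss_xx * ss_yy)          # Pearson correlation
    # sd_y / sd_x equals sqrt(ss_yy / ss_xx) because the (n - 1) factors cancel
    slope = r * sqrt(ss_yy / ss_xx)
    # The fitted line passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept, r

# Hypothetical data: hours studied and test score
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
slope, intercept, r = fit_line(hours, scores)
print(slope, intercept, r)
```

Evaluating the fitted line at the mean of x returns the mean of y, which is the "center of mass" property described above.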

The purpose of the least-squares method is to find the equation of the straight line that fits the data in the sense of least squares.
Assumptions of regression:
– The errors are normally distributed, with mean zero at each value of x.
– The variation around the line of regression is constant for all values of x (that is, the errors vary by the same amount for small x as for large x).
– The errors are independent for all values of x.
– The relationship between x and y is postulated to be linear.

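The linear model is y = β0 + β1x + ε. To calculate the estimates b0 and b1 of β0 and β1 we have to calculate
$$SS_{xx} = \sum (x - \bar{x})^2, \qquad SS_{xy} = \sum (x - \bar{x})(y - \bar{y})$$
$$b_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$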

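Example (study hours and score)
Table columns: hours studied (x); observed test score (y); x − mean(x); (x − mean(x))²; y − mean(y); (x − mean(x))(y − mean(y)); predicted y; y − predicted y; (y − predicted y)². The table ends with a row of means and the values of SSE, SSxx, SSxy, b0 and b1.
Values and notes that survive from the table: SSxx = 215.6; SSxy = 352.8; the column y − predicted y sums to zero (up to rounding); SSE is the minimum value of the sum of squared residuals that any straight line can achieve for these data.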
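The table's bookkeeping can be sketched in a few lines. The data below are hypothetical (the slide's rows are not available here); only SSxx = 215.6 and SSxy = 352.8 are taken from the example, and the slope they imply follows from b1 = SSxy / SSxx.

```python
def least_squares(xs, ys):
    """Return (b0, b1) computed from SSxx and SSxy."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def sse_for(b0, b1, xs, ys):
    """Sum of squared residuals for an arbitrary line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data (not the slide's table): hours studied and score
hours = [2, 4, 6, 8, 10]
scores = [60, 66, 75, 79, 90]
b0, b1 = least_squares(hours, scores)
residuals = [y - (b0 + b1 * x) for x, y in zip(hours, scores)]
print(sum(residuals))                    # zero up to rounding error
print(sse_for(b0, b1, hours, scores))    # smallest SSE any line can reach

# Slope implied by the two sums reported in the example
b1_from_slide = 352.8 / 215.6
print(round(b1_from_slide, 3))
```

Nudging either coefficient away from the least-squares values increases the SSE, which is what "minimum value" means in the table note.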

Exercise
The following table shows the sales of a prominent product, in millions, by year. Assuming the trend continues in 2004, predict the sales in 2004.
Table columns: Year; Year (coded); Sales (millions).

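Solution
The fitted line has the form Sales = b0 + b1 × (year coded). Substituting the coded year 7 for 2004 gives predicted sales of 5.7 million. From the line equation we can conclude that with each passing year the sales decrease by 11 million.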

Inference about the slope of the regression line
The purpose of the test is to determine whether a given value c is reasonable for the slope of the population regression line (H0: β1 = c). The test of H0: β1 = 0 determines whether a straight line should be fit to the data. If the null hypothesis is not rejected, the data give no evidence that a straight line models the relationship between x and y.

Assumptions: The regression model is y = β0 + β1x + ε, where β1 is the slope of the model. To test the null hypothesis that β1 equals some value c, we divide the difference (b1 − c), where b1 is the estimated slope, by the standard error of b1.

For the test of H0: β1 = 0 against Ha: β1 ≠ 0, the statistic is
$$t = \frac{b_1 - 0}{SE(b_1)}$$
which follows a Student t distribution with n − 2 degrees of freedom, where SE(b1) is the standard error of b1.
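A minimal sketch of the slope test follows. It assumes the usual formula SE(b1) = s / sqrt(SSxx) with s = sqrt(SSE / (n − 2)); the function name slope_t_test and the weight and pressure readings are hypothetical, not the lecture's data. The critical value 2.306 is the two-tailed 5% point of Student's t with 8 degrees of freedom.

```python
from math import sqrt

def slope_t_test(xs, ys, c=0.0):
    """Return (b1, SE(b1), t) for testing H0: beta1 = c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))        # estimated standard error of the model
    se_b1 = s / sqrt(ss_xx)        # standard error of the slope
    t = (b1 - c) / se_b1
    return b1, se_b1, t

# Hypothetical readings for 10 patients (weight in pounds, systolic pressure)
weights = [165, 167, 180, 155, 212, 175, 190, 210, 200, 149]
systolic = [130, 133, 150, 128, 151, 146, 150, 140, 148, 125]

T_CRIT = 2.306  # two-tailed, alpha = 0.05, n - 2 = 8 degrees of freedom
for c in (0.0, 1.0):
    b1, se_b1, t = slope_t_test(weights, systolic, c)
    verdict = "reject H0" if abs(t) > T_CRIT else "do not reject H0"
    print(f"H0: beta1 = {c}: t = {t:.2f}, {verdict}")
```

Passing the critical value in as a constant keeps the sketch free of third-party dependencies; a statistics library would normally supply it.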

Example
The following table shows the systolic blood pressure readings and weights for 10 newly diagnosed patients with high blood pressure.
Table columns: Patient; Systolic (y); Weight (x), in pounds.
We would like to test the hypothesis that the systolic blood pressure increases by one point for each pound of additional weight.

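Solution
The regression output lists the Coefficients, Standard Error, t Stat and P-value for the Intercept and for the X Variable (weight).
The regression equation is systolic = b0 + b1 × weight. The test statistic is computed with c = 1, the estimated slope b1, and the standard error of b1 taken from the output.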

We calculate t = (b1 − 1)/SE(b1). At α = 0.05, the critical t values with 8 degrees of freedom are ±2.306. The data refute the null hypothesis: each additional pound increases the systolic blood pressure by less than 1 point. The t value (3.83) shown in the table, along with the two-tailed p-value (0.005), is for the null hypothesis H0: β1 = 0 against Ha: β1 ≠ 0.

The coefficient of correlation
The coefficient of correlation is used to measure the strength of the linear relationship between two random variables. A measure closely related to the slope of the regression line is the Pearson correlation coefficient, r.

The value of r lies in the range −1 to +1.
– If the points fall on a straight line with a positive slope, then r = +1.
– If the points fall on a straight line with a negative slope, then r = −1.
– If the points form a shotgun (randomly scattered) pattern, then r is close to 0.
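The two extreme cases can be verified with a short computation. This sketch uses the standard formula r = SSxy / sqrt(SSxx · SSyy); the function name pearson_r and the points are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient r = SSxy / sqrt(SSxx * SSyy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return ss_xy / sqrt(ss_xx * ss_yy)

xs = [1, 2, 3, 4]
r_up = pearson_r(xs, [3, 5, 7, 9])      # points on y = 1 + 2x
r_down = pearson_r(xs, [9, 7, 5, 3])    # points on y = 11 - 2x
print(r_up, r_down)
```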

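Example
The correlations for systolic blood pressure and weight are given as a correlation matrix with rows and columns Systolic and Weight: the diagonal entries are 1, and the off-diagonal entry is the correlation between Systolic and Weight. Because the fitted slope for weight is positive, the linear relation is positive.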

The coefficient of determination
The coefficient of determination is used to measure the strength of the linear relationship between the dependent and independent variables.
Assumptions: the analysis of variance (ANOVA) for simple linear regression may be represented as follows.

Source                  d.f.    Sum of squares   Mean square          F-value
Explained variation     1       SSR              MSR = SSR/1          F = MSR/MSE
Unexplained variation   n − 2   SSE              MSE = SSE/(n − 2)
Total                   n − 1   SS(total)

The symbol r² represents the ratio SSR/SS(total) and is called the coefficient of determination. It measures the proportion of the variation in y that is explained by x. SSR is also called the explained variation or regression variation; SSE is also called the unexplained variation or residual variation.
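The ANOVA quantities can be computed directly from a fitted line. The sketch below uses hypothetical data and a hypothetical function name, anova_table; it illustrates that SSR + SSE = SS(total) and that r² = SSR / SS(total).

```python
def anova_table(xs, ys):
    """Return the ANOVA quantities for simple linear regression as a dict."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * x for x in xs]
    ssr = sum((f - mean_y) ** 2 for f in fitted)           # explained variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in ys)               # total variation
    msr = ssr / 1
    mse = sse / (n - 2)
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "MSR": msr, "MSE": mse, "F": msr / mse, "r2": ssr / sst}

# Hypothetical data, chosen so that the fit is good but not perfect
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
table = anova_table(xs, ys)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```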

Example
The following table shows the contraceptive prevalence (x) and the fertility rate (y) for ten countries.

Country       Contraceptive (x)   Fertility (y)
Thailand      69                  2.3
Costa Rica    71                  3.5
Turkey        62                  3.4
Mexico        55                  4
Zimbabwe      46                  5.4
Jordan        35                  5.5
Ghana         14                  6
Pakistan      13                  5
Sudan         10                  4.8
Nigeria       7                   5.7

SUMMARY OUTPUT
Regression Statistics: Multiple R; R Square; Adjusted R Square; Standard Error; Observations = 10.
ANOVA: df, SS, MS, F and Significance F for the Regression, Residual and Total rows.
Coefficient table: Coefficients, Standard Error, t Stat, P-value, Lower 95% and Upper 95% for the Intercept and for contraceptive (x).

The coefficient of determination is shown as R Square = 65.8%. It may be computed as r² = SSR/SS(total). The interpretation is that about 65.8% of the variation in fertility rates is explained by the variation in contraceptive prevalence.
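The reported R Square can be reproduced from the ten country values in the example table. This is a minimal sketch; the variable names are illustrative, and it uses the equivalent shortcut r² = SSxy² / (SSxx · SSyy) for simple linear regression.

```python
# Values from the example table: contraceptive prevalence (x), fertility rate (y)
contraceptive = [69, 71, 62, 55, 46, 35, 14, 13, 10, 7]
fertility = [2.3, 3.5, 3.4, 4.0, 5.4, 5.5, 6.0, 5.0, 4.8, 5.7]

n = len(contraceptive)
mean_x = sum(contraceptive) / n
mean_y = sum(fertility) / n
ss_xx = sum((x - mean_x) ** 2 for x in contraceptive)
ss_yy = sum((y - mean_y) ** 2 for y in fertility)
ss_xy = sum((x - mean_x) * (y - mean_y)
            for x, y in zip(contraceptive, fertility))

slope = ss_xy / ss_xx                       # negative: fertility falls as prevalence rises
r_squared = ss_xy ** 2 / (ss_xx * ss_yy)    # same value as SSR / SS(total)
print(round(r_squared, 3))                  # about 0.658, matching R Square = 65.8%
```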

Using the model for estimation and prediction
The estimated regression equation ŷ = b0 + b1x can be used to predict the value of y for a given value of x. The same equation can also be used to estimate the mean value of y at that x.
Example: suppose you would like to estimate the systolic blood pressure of a patient who weighs 250 pounds. Simply substitute the weight into the equation: systolic = b0 + b1 × (250) = 157.6.

We would expect the prediction interval to be wider than the confidence interval; that is, the interval estimate of the expected value of y will be narrower than the prediction interval for the same value of x and the same confidence level.

A (1 − α)100% prediction interval for an individual new value of y at x = x0 is
$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}}$$
where ŷ = b0 + b1x0, the t value is based on (n − 2) degrees of freedom, s is the estimated standard error of the regression model, n is the sample size, and x0 is the fixed value of x.

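A (1 − α)100% confidence interval for the mean value of y at x = x0 is
$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}}$$
with ŷ, t, s, n and x0 defined as for the prediction interval.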
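The two intervals differ only by the extra "1 +" under the square root, which is why the prediction interval is always the wider one. The sketch below assumes the standard textbook formulas with s = sqrt(SSE / (n − 2)); the function name intervals_at and the data are hypothetical, and the critical t value is passed in by the caller (2.306 for 8 degrees of freedom at the 95% level).

```python
from math import sqrt

def intervals_at(xs, ys, x0, t_crit):
    """Return (y_hat, ci_half_width, pi_half_width) at x = x0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))                       # standard error of the model
    leverage = 1 / n + (x0 - mean_x) ** 2 / ss_xx
    y_hat = b0 + b1 * x0
    ci_half = t_crit * s * sqrt(leverage)         # mean of y at x0
    pi_half = t_crit * s * sqrt(1 + leverage)     # individual new y at x0
    return y_hat, ci_half, pi_half

# Hypothetical data: 10 patients, so n - 2 = 8 degrees of freedom
a1c = [5.8, 6.1, 6.5, 6.9, 7.2, 7.6, 8.0, 8.4, 9.1, 9.5]
glucose = [118, 126, 139, 148, 160, 171, 178, 196, 212, 226]
y_hat, ci_half, pi_half = intervals_at(a1c, glucose, x0=7.0, t_crit=2.306)
print(f"95% CI: {y_hat - ci_half:.1f} to {y_hat + ci_half:.1f}")
print(f"95% PI: {y_hat - pi_half:.1f} to {y_hat + pi_half:.1f}")
```

Both intervals are narrowest when x0 equals the mean of x and widen as x0 moves away from it.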

Example
The following table shows the results of an experiment conducted on 15 diabetic patients. The independent variable x was the hemoglobin A1C value, taken after 3 months of recording the fasting blood glucose value each morning and averaging the values over the three-month period. The average fasting glucose was the dependent variable y. We wish to set a 95% prediction interval for the average glucose reading of an individual diabetic who has a hemoglobin A1C value of 7.0, as well as a 95% confidence interval for the mean over all diabetics with a hemoglobin A1C value of 7.0.

Table columns: Patient; x, hemoglobin A1C; y, average fasting blood sugar over the 3-month period.

SUMMARY OUTPUT
Regression Statistics: Multiple R; R Square; Adjusted R Square; Standard Error; Observations = 15.
ANOVA: df, SS, MS, F and Significance F for the Regression, Residual and Total rows.
Coefficient table: Coefficients, Standard Error, t Stat, P-value, Lower 95% and Upper 95% for the Intercept and for the X Variable.

Here ŷ = b0 + b1 × (7.0). The 95% confidence interval and the 95% prediction interval are both centered on ŷ and use the t value with n − 2 = 13 degrees of freedom.

We are 95% confident that an individual diabetic with a hemoglobin A1C value of 7.0 had a fasting blood sugar over the past 3 months that averaged between the limits of the prediction interval. We are 95% confident that diabetics with a hemoglobin A1C value of 7.0 had a mean average fasting blood sugar over the past 3 months between the limits of the confidence interval.

Exercises
1. Give the deterministic equation for the line passing through each of the following pairs of points:
a) (1, 1.5) and (3, 8.5)
b) (0, 1) and (2, −3)
c) (0, 3.1) and (1, 4.8)
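The method for exercises of this kind can be sketched with a small helper. The function name line_through and the pair of points used here are illustrative and are deliberately not one of the exercise pairs.

```python
def line_through(p, q):
    """Return (intercept, slope) of the line through two points with different x."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)     # rise over run
    intercept = y1 - slope * x1       # solve y1 = intercept + slope * x1
    return intercept, slope

# Illustrative points (2, 3) and (4, 7)
intercept, slope = line_through((2, 3), (4, 7))
print(f"y = {intercept} + {slope}x")
```

A deterministic line has no error term, so both points satisfy the equation exactly.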

Reference
This lecture was prepared from Advanced Statistics Demystified, Dr. Larry J. Stephens, McGraw-Hill.