
Lecture on Correlation and Regression Analyses

REVIEW - Variable
A variable is a characteristic that changes or varies over time or across different individuals or objects under consideration.
Broad classification of variables: QUALITATIVE; QUANTITATIVE (DISCRETE or CONTINUOUS).

Types of Variable: Qualitative
Assumes values that are not numerical but can be categorized; categories may be identified either by non-numerical descriptions or by numeric codes.

Types of Variable: Quantitative
Indicates the quantity or amount of a characteristic; the data are always numeric and can be discrete or continuous.

Types of Quantitative Variables
Discrete: a variable with a finite or countable number of possible values.
Continuous: a variable that assumes any value in a given interval.

Levels/Scales of Measurement
Data may be classified into four hierarchical levels of measurement: Nominal, Ordinal, Interval, and Ratio.
Note: The type of statistical analysis that is appropriate for a particular variable depends on its level of measurement.

NOMINAL SCALE
Data collected are labels, names, or categories. Frequencies or counts of observations belonging to the same category can be obtained. It is the lowest level of measurement.

ORDINAL SCALE Data collected are labels with implied ordering. The difference between two data labels is meaningless.

INTERVAL SCALE Data can be ordered or ranked. The difference between two data values is meaningful. Data at this level may lack an absolute zero point.

RATIO SCALE Data have all the properties of the interval scale. The number zero indicates the absence of the characteristic being measured. It is the highest level of measurement.

Learning Points – PART II
1. What is correlation analysis?
2. What is regression analysis?
3. When do we use correlation analysis?
4. When do we use regression analysis?
5. How do regression and correlation analysis compare?

CORRELATION ANALYSIS
A statistical technique used to determine the strength of the relationship between two variables, X and Y. It provides a measure of the strength of the linear relationship between two variables measured on at least an interval scale.

ILLUSTRATION
The UP Admissions Office may be interested in the relationship between the UPCAT scores in Math and in Reading Comprehension of UPCAT qualifiers.

ILLUSTRATION
A social scientist might be concerned with how a city's crime rate is related to its unemployment rate.

ILLUSTRATION
A nutritionist might try to relate the quantity of carbohydrates consumed in the diet to the amount of sugar in the blood of diabetic individuals.

PEARSON'S CORRELATION COEFFICIENT, ρ

\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}

where
σ_XY = covariance between X and Y
σ_X = standard deviation of the X values
σ_Y = standard deviation of the Y values
N = number of paired observations in the population

PEARSON'S CORRELATION COEFFICIENT, ρ
X and Y increase (or decrease) together: ρ > 0. [Scatter plot: Y increases as X increases.]

PEARSON'S CORRELATION COEFFICIENT, ρ
X increases (decreases) while Y decreases (increases): ρ < 0. [Scatter plot: Y decreases as X increases.]

PEARSON'S CORRELATION COEFFICIENT, ρ
X and Y have no linear relationship: ρ = 0. [Scatter plot: no pattern.]

SAMPLE CORRELATION COEFFICIENT, r

r = \frac{s_{XY}}{s_X s_Y}

where
s_XY = sample covariance of the X and Y values
s_X = sample standard deviation of the X values
s_Y = sample standard deviation of the Y values
n = sample size
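A minimal Python sketch (not part of the original slides) that computes r directly from this definition; the hours/scores values below are hypothetical placeholders, not the lecture's 20-student data set.

```python
import numpy as np

# Hypothetical data: hours studied (X) and exam scores (Y) -- placeholder values only.
hours = np.array([1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0, 7.5])
scores = np.array([55.0, 60.0, 62.0, 70.0, 68.0, 75.0, 80.0, 88.0])

n = len(hours)
s_xy = np.sum((hours - hours.mean()) * (scores - scores.mean())) / (n - 1)  # sample covariance
s_x = hours.std(ddof=1)   # sample standard deviation of X
s_y = scores.std(ddof=1)  # sample standard deviation of Y

r = s_xy / (s_x * s_y)
print(f"r = {r:.3f}")                     # from the definition
print(np.corrcoef(hours, scores)[0, 1])   # cross-check with NumPy's built-in
```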

QUALITATIVE INTERPRETATION OF ρ AND r

Absolute value of the correlation coefficient    Strength of linear relationship
0.0 – 0.2                                        Very weak
0.2 – 0.4                                        Weak
0.4 – 0.6                                        Moderate
0.6 – 0.8                                        Strong
0.8 – 1.0                                        Very strong
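If this table is used programmatically, a small helper can map |r| to the lecture's labels. The boundary handling below (upper endpoints inclusive) is an assumption, since the table's ranges overlap at their endpoints.

```python
def strength_of_linear_relationship(r: float) -> str:
    """Map |r| to the qualitative labels in the lecture's table (assumed closed upper endpoints)."""
    a = abs(r)
    if a <= 0.2:
        return "Very weak"
    elif a <= 0.4:
        return "Weak"
    elif a <= 0.6:
        return "Moderate"
    elif a <= 0.8:
        return "Strong"
    else:
        return "Very strong"

print(strength_of_linear_relationship(0.74))  # -> Strong
```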

EXAMPLE
It is of interest to study the relationship between the number of hours spent studying and the student's grade in an examination. A random sample of twenty students is selected and the data are given in the following table. Compute and interpret the sample correlation coefficient.

[Table: Student, Hours Studied, Score (%) for the 20 sampled students]

SCATTER PLOT
[Scatter plot of Examination Score (Y) against Number of Hours Spent Studying (X)]


Sample Correlation Coefficient
Interpretation: There is a strong positive linear relationship between the number of hours a student spent studying for the exam and the student's exam score.

TEST OF HYPOTHESIS ABOUT ρ
Ho: ρ = 0; There is no linear relationship between X and Y.
vs.
Ha: ρ ≠ 0; There is a linear relationship between X and Y.
or Ha: ρ > 0; There is a positive linear relationship between X and Y.
or Ha: ρ < 0; There is a negative linear relationship between X and Y.

TEST OF HYPOTHESIS ABOUT ρ
The standardized form of the test statistic is

t_c = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}

which follows the Student's t distribution with n - 2 df when the null hypothesis is true. This is commonly referred to as the t-test for the correlation coefficient.

TEST OF HYPOTHESIS ABOUT ρ
Decision Rule, with a given level of significance, α:

Alternative Hypothesis           Decision Rule
Ha: ρ < 0 (one-tailed test)      Reject Ho if t_c < -t_α(n-2); fail to reject Ho otherwise.
Ha: ρ > 0 (one-tailed test)      Reject Ho if t_c > t_α(n-2); fail to reject Ho otherwise.
Ha: ρ ≠ 0 (two-tailed test)      Reject Ho if |t_c| > t_α/2(n-2); fail to reject Ho otherwise.
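A hedged Python sketch of this test, assuming hypothetical placeholder data and using scipy for the critical values; the function name and data are illustrative, not from the lecture.

```python
import numpy as np
from scipy import stats

def corr_t_test(x, y, alpha=0.05, alternative="greater"):
    """t-test of Ho: rho = 0 using t_c = r*sqrt(n-2)/sqrt(1-r^2) with n-2 df."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    t_c = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
    df = n - 2
    if alternative == "greater":        # Ha: rho > 0
        reject = t_c > stats.t.ppf(1 - alpha, df)
    elif alternative == "less":         # Ha: rho < 0
        reject = t_c < stats.t.ppf(alpha, df)
    else:                               # Ha: rho != 0
        reject = abs(t_c) > stats.t.ppf(1 - alpha / 2, df)
    return r, t_c, reject

# Placeholder data (not the lecture's 20-student data set)
hours = [1, 2, 2.5, 3, 4, 5, 6, 7.5]
scores = [55, 60, 62, 70, 68, 75, 80, 88]
print(corr_t_test(hours, scores, alpha=0.05, alternative="greater"))
```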

EXAMPLE
In the study of the relationship between the number of hours spent studying and the student's grade in an examination, is there evidence to say that a longer time spent studying is associated with higher exam scores, at the 5% level of significance?

Test of Hypothesis
Ho: ρ = 0; There is no linear relationship between the number of hours a student spent studying for the exam and his exam score.
Ha: ρ > 0; There is a positive linear relationship between the number of hours a student spent studying for the exam and his exam score.

Test of Hypothesis
Test statistic: t_c = r√(n-2) / √(1-r²)
Test procedure: one-tailed t-test for the correlation coefficient.
Decision rule: Reject Ho if t_c > t_tab = t_0.05(18) ≈ 1.734; fail to reject Ho otherwise.

Test of Hypothesis
Decision: Reject Ho.
Conclusion: At α = 5%, there is evidence to say that a longer time spent studying is associated with higher exam scores.

WORD OF CAUTION
Correlation is a measure of the strength of the linear relationship between two variables, with no suggestion of "cause and effect" or causal relationship. A correlation coefficient equal to zero only indicates lack of a linear relationship and does not discount the possibility that other forms of relationship may exist.

REGRESSION ANALYSIS
A statistical technique used to study the functional relationship between variables, which allows predicting the value of one variable, say Y, given the value of another variable, say X.

REGRESSION ANALYSIS
Y: dependent variable, a variable whose variation/value depends on that of another.
X: independent variable, a variable whose variation/value does not depend on that of another.

ILLUSTRATION
The relationship between the number of hours spent studying and the student's exam score may be expressed in equation form. This equation may be used to predict the student's exam score knowing the number of hours the student spent studying.

ILLUSTRATION
A child's height is studied to see whether it is related to his father's height, such that some equation can be used to predict a child's height given his father's height. Sales of a product may be related to the corresponding advertising expenditures.

SAMPLE REGRESSION MODEL

\hat{Y}_i = b_0 + b_1 X_i

where
b_0 = estimated Y-intercept; the predicted value of Y when X = 0
b_1 = estimated slope of the line; measures the change in the predicted value of Y per unit change in X

ESTIMATORS (least squares)

b_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1\bar{X}, \qquad s^2 = \frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n-2}

where
\bar{Y} = mean of the Y values
\bar{X} = mean of the X values
s^2 = estimated common variance of the Y's
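A short Python sketch of these least-squares computations, using hypothetical placeholder data rather than the lecture's data set.

```python
import numpy as np

# Placeholder data (hypothetical, not the lecture's data set)
x = np.array([1, 2, 2.5, 3, 4, 5, 6, 7.5])
y = np.array([55, 60, 62, 70, 68, 75, 80, 88])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)  # slope
b0 = y.mean() - b1 * x.mean()                                             # intercept
y_hat = b0 + b1 * x
s2 = np.sum((y - y_hat)**2) / (len(x) - 2)  # estimated common variance (MSE)

print(f"Y-hat = {b0:.2f} + {b1:.2f} X,  s^2 = {s2:.2f}")
```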

EXAMPLE
In the previous example, we may want to predict the examination score of a student given the number of hours he spent studying.
Estimated regression line: \hat{Y}_i = b_0 + b_1 X_i, with b_0 and b_1 computed from the 20-student data.
The predicted exam score for X_i = 2.5 is approximately 69.

EXAMPLE
[Table of Student, Hours Studied, and Score (%), repeated from the earlier example]

TEST OF HYPOTHESIS ABOUT β1
Ho: β1 = β1,0, where β1,0 is the hypothesized value of β1
vs. Ha: β1 ≠ β1,0  or  Ha: β1 > β1,0  or  Ha: β1 < β1,0

TEST OF HYPOTHESIS ABOUT β1
The standardized form of the test statistic is

t_c = \frac{b_1 - \beta_{1,0}}{s_{b_1}}, \qquad s_{b_1} = \sqrt{\frac{s^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}}

and it follows the Student's t distribution with n - 2 df when the null hypothesis is true. This is commonly referred to as the t-test for the regression coefficient.

TEST OF HYPOTHESIS ABOUT β1
Decision Rule, with a given level of significance, α:

Alternative Hypothesis               Decision Rule
Ha: β1 < β1,0 (one-tailed test)      Reject Ho if t_c < -t_α(n-2); fail to reject Ho otherwise.
Ha: β1 > β1,0 (one-tailed test)      Reject Ho if t_c > t_α(n-2); fail to reject Ho otherwise.
Ha: β1 ≠ β1,0 (two-tailed test)      Reject Ho if |t_c| > t_α/2(n-2); fail to reject Ho otherwise.
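A sketch of the t-test for the slope under the same placeholder-data assumption; beta1_0 is the hypothesized value (for example, 1 in the example that follows).

```python
import numpy as np
from scipy import stats

def slope_t_test(x, y, beta1_0=0.0, alpha=0.05, alternative="greater"):
    """t-test of Ho: beta1 = beta1_0 in simple linear regression, with n-2 df."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = np.sum((x - x.mean())**2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    s2 = np.sum((y - (b0 + b1 * x))**2) / (n - 2)   # MSE
    se_b1 = np.sqrt(s2 / sxx)                       # standard error of b1
    t_c = (b1 - beta1_0) / se_b1
    df = n - 2
    if alternative == "greater":
        reject = t_c > stats.t.ppf(1 - alpha, df)
    elif alternative == "less":
        reject = t_c < stats.t.ppf(alpha, df)
    else:
        reject = abs(t_c) > stats.t.ppf(1 - alpha / 2, df)
    return b1, t_c, reject
```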

EXAMPLE
Using the previous example, test at α = 5% whether a student's examination score will increase by at least 1 percent with an additional hour of study time.
Ho: β1 ≤ 1  vs.  Ha: β1 > 1
Test statistic: t_c = (b_1 - 1) / s_{b_1}
Test procedure: one-tailed t-test for the regression coefficient.

EXAMPLE
Decision rule: Reject Ho if t_c > t_tab = t_0.05(18) ≈ 1.734; otherwise, fail to reject Ho.
Computations: b_1, s_{b_1}, and t_c = (b_1 - 1)/s_{b_1} are computed from the sample data.

EXAMPLE
Decision: Since t_c > t_tab, we reject Ho.
Conclusion: At α = 5%, the student's exam score will increase by at least 1 percent for each additional hour of study time.

TEST OF HYPOTHESIS ABOUT β0
Ho: β0 = β0,0, where β0,0 is the hypothesized value of β0
vs. Ha: β0 ≠ β0,0  or  Ha: β0 > β0,0  or  Ha: β0 < β0,0

TEST OF HYPOTHESIS ABOUT β0
The standardized form of the test statistic is

t_c = \frac{b_0 - \beta_{0,0}}{s_{b_0}}, \qquad s_{b_0} = \sqrt{s^2\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right)}

and it follows the Student's t distribution with n - 2 df when the null hypothesis is true. This is commonly referred to as the t-test for the regression constant.

TEST OF HYPOTHESIS ABOUT β0
Decision Rule, with a given level of significance, α:

Alternative Hypothesis               Decision Rule
Ha: β0 < β0,0 (one-tailed test)      Reject Ho if t_c < -t_α(n-2); fail to reject Ho otherwise.
Ha: β0 > β0,0 (one-tailed test)      Reject Ho if t_c > t_α(n-2); fail to reject Ho otherwise.
Ha: β0 ≠ β0,0 (two-tailed test)      Reject Ho if |t_c| > t_α/2(n-2); fail to reject Ho otherwise.
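Only the standard error changes relative to the slope test. A brief sketch of s_{b_0} and t_c, again on hypothetical placeholder data, with 60 used as the hypothesized intercept to mirror the passing-score example that follows.

```python
import numpy as np

# Same placeholder data as before (hypothetical, not the lecture's data set)
x = np.array([1, 2, 2.5, 3, 4, 5, 6, 7.5])
y = np.array([55, 60, 62, 70, 68, 75, 80, 88])

n = len(x)
sxx = np.sum((x - x.mean())**2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - (b0 + b1 * x))**2) / (n - 2)          # MSE
se_b0 = np.sqrt(s2 * (1.0 / n + x.mean()**2 / sxx))    # standard error of b0
t_c = (b0 - 60) / se_b0   # e.g. testing against a hypothesized intercept of 60
print(b0, se_b0, t_c)
```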

EXAMPLE
At α = 5%, test if the data indicate that the student will fail (a score less than 60) if he did not study.
Ho: β0 ≥ 60  vs.  Ha: β0 < 60
Test statistic: t_c = (b_0 - 60) / s_{b_0}
Test procedure: one-tailed t-test for the regression constant.

EXAMPLE
Decision rule: Reject Ho if t_c < -t_0.05(18) ≈ -1.734; otherwise, fail to reject Ho.
Computations: b_0, s_{b_0}, and t_c = (b_0 - 60)/s_{b_0} are computed from the sample data.

EXAMPLE
Decision: Since t_c falls in the rejection region, we reject Ho.
Conclusion: At α = 5%, the student will get a score less than 60, that is, the student will fail, if he/she did not study for the examination.

ADEQUACY OF THE MODEL
Coefficient of Determination (R²): the proportion of the total variation in Y that is explained by X, usually expressed in percent.
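A small Python sketch computing R² as 1 - SSE/SST on hypothetical placeholder data; in simple linear regression this equals the square of the sample correlation coefficient r.

```python
import numpy as np

# Placeholder data (hypothetical, not the lecture's data set)
x = np.array([1, 2, 2.5, 3, 4, 5, 6, 7.5])
y = np.array([55, 60, 62, 70, 68, 75, 80, 88])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean())**2)   # total variation in Y
sse = np.sum((y - y_hat)**2)      # variation left unexplained by the model
r_squared = 1 - sse / sst         # proportion of variation explained by X
print(f"R^2 = {r_squared:.2%}")
```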

EXAMPLE Interpretation: Around 55% of the total variation in examination scores is explained by the number of hours spent studying. The remaining 45% is explained by other variables not in the model, or by the fact that the relationship is not exactly linear.

SUMMARY 1. Correlation analysis 2. Regression analysis 3. Application with computer output 4. Interpretation

Regression analysis posits a cause-and-effect relationship, in which you can predict the value of one variable given the values of the other variable(s).

Correlation analysis describes a relationship between two variables but without the causality clause. In policy analysis, regression analysis is usually used to forecast certain events. For example, our trend line is an example of a regression analysis.

Illustrations: Knowing the effect of TV spot advertising on the number of people visiting the Family Planning clinic would allow the population commission official to decide rationally whether or not to increase the amount to be spent on TV spot advertising. The officer would be able to predict how many people the commission would be able to attract to the Family Planning clinic if it increased the number of TV ads run. (See series p.176)

The relationship between two variables (in our example, the number of TV ad runs and the number of people visiting the Family Planning clinic) can be summarized by a line. This is called the regression line. This is the line that we will use to predict the value of one variable, given the other.

Formula of the regression line:

Y = a + bX + e

where
a = the Y-intercept, or the value of Y when X = 0
b = the slope of the line
e = the error term

Example: Relationship between TV ads and the number of people visiting the family planning clinic.
[Table: Municipality, Number of TV ads (X), Number of people visiting the clinic (Y)]

The equation of the line is Y = a + bX, with the estimates obtained from the data above.
If X = 5, our predicted value for Y is 29.2; if X = 7, our predicted value for Y is 40.7.
Interpretation: An increase of one in the number of TV ad runs will generate a 5.76 increase in the number of people visiting the family planning clinic. So the family planning officer can now proceed with evaluating the cost effectiveness of the program ads.
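A minimal prediction sketch in Python. The intercept and slope below are placeholders chosen only to be consistent with the 5.76 slope quoted in the interpretation; they are not the actual estimates from the municipality data.

```python
# Hypothetical coefficients -- placeholders, not the fitted values from the slide's data.
a, b = 0.4, 5.76

def predict_visitors(tv_ads: float) -> float:
    """Predicted number of clinic visitors for a given number of TV ad runs."""
    return a + b * tv_ads

for x in (5, 7):
    print(x, round(predict_visitors(x), 1))
```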

Coefficient of Determination
The coefficient of determination is the percent of the variation in Y explained or accounted for by the variability of X. It is derived by squaring R and multiplying by 100, so it is expressed in percentage terms. Thus, if R = 0.9, the coefficient of determination will be 81%.
Formula: coefficient of determination = R² × 100% = (SSR/SST) × 100%.

Hypothesis Testing for a and b
We use the t-statistic to test the hypothesis that a and b are significantly different from zero.
Excel analysis of the problem:

[Excel regression Summary Output, with degrees of freedom: regression df = k, residual df = n-(k+1), total df = n-1]
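For readers without Excel, a roughly equivalent summary table can be produced with Python's statsmodels; the data below are placeholders, not the TV-ads data from the example.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data (hypothetical); substitute the TV-ads (X) and clinic-visits (Y) columns.
x = np.array([2, 3, 4, 5, 6, 7, 8])
y = np.array([12, 17, 24, 29, 35, 41, 46])

X = sm.add_constant(x)        # adds the intercept term a
model = sm.OLS(y, X).fit()    # ordinary least squares fit
print(model.summary())        # coefficients, t-statistics, p-values, R^2, and ANOVA df
```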

DUMMY VARIABLE
Represents a nominal or categorical variable in the regression model.
For example: Y = b0 + b1X1 + b2X2, where Y = scores, X1 = hours spent studying, and X2 = sex (M/F), taking the value 1 if male and 0 otherwise.
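A sketch of fitting this dummy-variable model in Python with statsmodels; the data and the 1/0 coding of X2 below are hypothetical placeholders.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: exam scores, hours studied, and a 1/0 dummy for sex (1 = male).
hours = np.array([1, 2, 2.5, 3, 4, 5, 6, 7.5])
male = np.array([1, 0, 1, 0, 0, 1, 0, 1])            # dummy variable X2
scores = np.array([55, 60, 62, 70, 68, 75, 80, 88])

X = sm.add_constant(np.column_stack([hours, male]))  # columns: [1, X1, X2]
model = sm.OLS(scores, X).fit()
print(model.params)  # b0, b1 (per hour studied), b2 (shift in predicted score for males)
```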