

INTRODUCTORY LINEAR REGRESSION

3.1 SIMPLE LINEAR REGRESSION
- Curve fitting
- Inferences about estimated parameters
- Adequacy of the model
- Linear correlation
3.2 MULTIPLE LINEAR REGRESSION

Introduction:
- Regression is a statistical procedure for establishing the relationship between two or more variables.
- This is done by fitting a linear equation to the observed data.
- The regression line is then used by the researcher to see the trend and make predictions of values for the data.
- There are two types of relationship: simple (two variables) and multiple (more than two variables).

3.1 The Simple Linear Regression Model is an equation that describes a dependent variable (Y) in terms of an independent variable (X) plus random error:

Y = β0 + β1X + ε

where
β0 = intercept of the line with the Y-axis
β1 = slope of the line
ε = random error

The random error ε is the deviation of a data point from its deterministic value. The regression line is estimated from the collected data by fitting a straight line to the data set and obtaining the equation of that line, ŷ = b0 + b1x.

Example 3.1:
1) A nutritionist studying weight-loss programs might want to find out whether reducing carbohydrate intake can help a person lose weight.
   a) X is the carbohydrate intake (independent variable).
   b) Y is the weight (dependent variable).
2) An entrepreneur might want to know whether increasing the cost of packaging his new product will have an effect on the sales volume.
   a) X is the cost.
   b) Y is the sales volume.

3.1.1 CURVE FITTING (SCATTER PLOTS)
- A scatter plot is a graph of ordered pairs (x, y).
- The purpose of a scatter plot is to describe the nature of the relationship between the independent variable X and the dependent variable Y in a visual way.
- The independent variable x is plotted on the horizontal axis and the dependent variable y on the vertical axis.

SCATTER DIAGRAM

Positive linear relationship: the regression line E(y) = β0 + β1x slopes upward; the slope β1 is positive.

Negative linear relationship: the regression line slopes downward; the slope β1 is negative.

No relationship: the regression line is horizontal at the intercept β0; the slope β1 is 0.

LINEAR REGRESSION MODEL
A linear regression line can be developed from a freehand plot of the data.

Example 3.2: The given table contains values for two variables, X and Y. Plot the given data and draw a freehand estimated regression line. [table and plot shown on slide]


INFERENCES ABOUT ESTIMATED PARAMETERS
Least Squares Method
The least squares method is commonly used to determine values for b0 and b1 that ensure the best fit of the estimated regression line to the sample data points. The straight line fitted to the data set is the line ŷ = b0 + b1x.

LEAST SQUARES METHOD
Theorem: Given the sample data, the coefficients of the least squares line are:

i) y-intercept for the estimated regression equation:

b0 = ȳ - b1x̄

where x̄ and ȳ are the means of x and y respectively.

LEAST SQUARES METHOD
ii) Slope for the estimated regression equation:

b1 = Sxy / Sxx

where
Sxy = Σxy - (Σx)(Σy)/n
Sxx = Σx² - (Σx)²/n

LEAST SQUARES METHOD
Given any value of x, the predicted value of the dependent variable, ŷ, can be found by substituting x into the equation ŷ = b0 + b1x.
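As a sketch, the least squares formulas above can be computed directly in a few lines of Python. The data here is hypothetical, made up purely for illustration (it is not from any example in the text):

```python
import numpy as np

# Hypothetical illustrative data (not taken from the text's examples)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

n = len(x)
Sxy = np.sum(x * y) - np.sum(x) * np.sum(y) / n  # corrected sum of cross-products
Sxx = np.sum(x**2) - np.sum(x)**2 / n            # corrected sum of squares of x

b1 = Sxy / Sxx                 # slope
b0 = y.mean() - b1 * x.mean()  # y-intercept

# Predicted value for any x, by substitution into y_hat = b0 + b1*x
y_hat_at_6 = b0 + b1 * 6.0
```

The same estimates can be obtained with `np.polyfit(x, y, 1)`, which makes a convenient cross-check.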

Example 3.3: Students' scores in history
The data below represent scores obtained by ten primary school students before and after they were taken on a tour to the museum (which is supposed to increase their interest in history).

Before, x / After, y: [table of scores shown on slide]

a) Fit a linear regression model with "before" as the explanatory variable and "after" as the dependent variable.
b) Predict the score a student would obtain "after" if he scored 60 marks "before".


ADEQUACY OF THE MODEL
COEFFICIENT OF DETERMINATION (r²)
- The coefficient of determination is a measure of the variation of the dependent variable (Y) that is explained by the regression line and the independent variable (X).
- The symbol for the coefficient of determination is r² (or R²).
- If r = 0.90, then r² = 0.81. It means that 81% of the variation in the dependent variable (Y) is accounted for by the variation in the independent variable (X).
- The rest of the variation, 0.19 or 19%, is unexplained and is called the coefficient of nondetermination.
- The formula for the coefficient of nondetermination is 1 - r².

COEFFICIENT OF DETERMINATION (r²)
Relationship among SST, SSR, and SSE:

SST = SSR + SSE

where
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

The coefficient of determination is:

r² = SSR / SST

Example 3.4
1) If r = 0.919, find the value of r² and explain it.
Solution: r² = 0.919² ≈ 0.845. It means that about 84% of the variation in the dependent variable (Y) is explained by the variation in the independent variable (X).
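The decomposition SST = SSR + SSE and the resulting r² can be sketched as follows (hypothetical data, for illustration only):

```python
import numpy as np

# Hypothetical data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1, b0 = np.polyfit(x, y, 1)  # least squares slope and intercept
y_hat = b0 + b1 * x           # fitted values

SST = np.sum((y - y.mean())**2)  # total sum of squares
SSE = np.sum((y - y_hat)**2)     # error sum of squares
SSR = SST - SSE                  # regression sum of squares

r2 = SSR / SST   # coefficient of determination
nondet = 1 - r2  # coefficient of nondetermination
```

For simple linear regression, r² computed this way equals the square of the Pearson correlation between x and y.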

LINEAR CORRELATION (r)
- Correlation measures the strength of the linear relationship between two variables.
- It is also known as Pearson's product moment coefficient of correlation.
- The symbol for the sample coefficient of correlation is r; for the population, ρ.
- Formula:

r = Sxy / √(Sxx · Syy)

where Syy = Σy² - (Σy)²/n.

Properties of r:
- -1 ≤ r ≤ 1
- Values of r close to 1 imply a strong positive linear relationship between x and y.
- Values of r close to -1 imply a strong negative linear relationship between x and y.
- Values of r close to 0 imply little or no linear relationship between x and y.

Refer to Example 3.3: Students' scores in history
c) Calculate the value of r and interpret its meaning.
Solution: [computation shown on slide]
Thus, there is a strong positive linear relationship between the scores obtained before (x) and after (y).

Refer to Example 3.3:
The sign of b1 in the equation is "+", so r is positive. Calculate the sums of squares first; then use the second formula for r. r = [value shown on slide]
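A sketch of the correlation formula, again with made-up data (here chosen to trend downward, so r should come out close to -1; note that r always carries the same sign as the slope b1):

```python
import math
import numpy as np

# Hypothetical data with a clear downward trend (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.2, 8.1, 5.9, 4.2, 1.8])

n = len(x)
Sxy = np.sum(x * y) - np.sum(x) * np.sum(y) / n
Sxx = np.sum(x**2) - np.sum(x)**2 / n
Syy = np.sum(y**2) - np.sum(y)**2 / n

r = Sxy / math.sqrt(Sxx * Syy)  # Pearson correlation coefficient
```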

Assumptions About the Error Term ε
1. The error ε is a random variable with mean of zero.
2. The variance of ε, denoted by σ², is the same for all values of the independent variable.
3. The values of ε are independent.
4. The error ε is a normally distributed random variable.

TEST OF SIGNIFICANCE
To determine whether X provides information for predicting Y, we proceed with testing the hypothesis. Two tests are commonly used:
i) t Test
ii) F Test

1) t-Test
1. Determine the hypotheses:
   H0: β1 = 0 (no linear relationship)
   H1: β1 ≠ 0 (linear relationship exists)
2. Specify the level of significance and find the critical value.
3. Compute the test statistic t (and its p-value).

1) t-Test (continued)
4. Determine the rejection rule. Reject H0 if:
   t < -t(α/2, n-2) or t > t(α/2, n-2), or p-value < α
5. Conclusion: if H0 is rejected, there is a significant relationship between variables X and Y.
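The t-test steps above can be sketched numerically. The data is hypothetical, and the critical value t(0.025, 3) ≈ 3.182 is hard-coded from a t table rather than computed:

```python
import math
import numpy as np

# Hypothetical data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

n = len(x)
Sxy = np.sum(x * y) - np.sum(x) * np.sum(y) / n
Sxx = np.sum(x**2) - np.sum(x)**2 / n
b1 = Sxy / Sxx
b0 = y.mean() - b1 * x.mean()

# Test statistic: t = b1 / s(b1), where s(b1) = sqrt(MSE / Sxx)
SSE = np.sum((y - (b0 + b1 * x))**2)
MSE = SSE / (n - 2)
t_stat = b1 / math.sqrt(MSE / Sxx)

# Rejection rule at alpha = 0.05 (two-tailed), df = n - 2 = 3
t_crit = 3.182  # t(0.025, 3) from a t table (assumed value)
reject_H0 = abs(t_stat) > t_crit
```

With this near-perfectly linear data the t statistic is very large, so H0: β1 = 0 is rejected.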

2) F-Test
1. Determine the hypotheses:
   H0: β1 = 0 (no linear relationship)
   H1: β1 ≠ 0 (linear relationship exists)
2. Specify the level of significance.
3. Compute the test statistic F = MSR/MSE, with 1 degree of freedom (df) in the numerator and n - 2 degrees of freedom (df) in the denominator.
4. Determine the rejection rule. Reject H0 if:
   p-value < α, or F > Fα

2) F-Test (continued)
5. Conclusion: if H0 is rejected, there is a significant relationship between variables X and Y.

Refer to Example 3.3: Students' scores in history
d) Test to determine whether their scores before and after the trip are related, at the given significance level α.
Solution:
1. H0: β1 = 0 (no linear relationship)
   H1: β1 ≠ 0 (linear relationship exists)

4. Rejection rule: [critical value shown on slide]
5. Conclusion: Thus, we reject H0. The score before (x) has a linear relationship with the score after (y) the trip.

ANALYSIS OF VARIANCE (ANOVA)
The value of the test statistic F for an ANOVA test is calculated as:

F = MSR / MSE

To calculate MSR and MSE, first compute the regression sum of squares (SSR) and the error sum of squares (SSE).

ANALYSIS OF VARIANCE (ANOVA)
General form of the ANOVA table:

Source of Variation | df    | Sum of Squares | Mean Squares      | Test Statistic
Regression          | 1     | SSR            | MSR = SSR/1       | F = MSR/MSE
Error               | n - 2 | SSE            | MSE = SSE/(n - 2) |
Total               | n - 1 | SST            |                   |

ANOVA test:
1) Hypotheses: H0: β1 = 0; H1: β1 ≠ 0
2) Select the distribution to use: the F-distribution.
3) Calculate the value of the test statistic F.
4) Determine the rejection and nonrejection regions.
5) Make a decision: reject H0 or fail to reject H0.
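The ANOVA table entries can be sketched as below, with the same style of hypothetical data (in simple linear regression, this F equals the square of the t statistic for the slope):

```python
import numpy as np

# Hypothetical data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

n = len(x)
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

SST = np.sum((y - y.mean())**2)
SSE = np.sum((y - y_hat)**2)
SSR = SST - SSE

MSR = SSR / 1        # regression mean square (df = 1)
MSE = SSE / (n - 2)  # error mean square (df = n - 2)
F = MSR / MSE        # test statistic

# Print the table in the general form given above
print(f"Regression  df={1:<5} SS={SSR:8.4f}  MS={MSR:8.4f}  F={F:.2f}")
print(f"Error       df={n-2:<5} SS={SSE:8.4f}  MS={MSE:8.4f}")
print(f"Total       df={n-1:<5} SS={SST:8.4f}")
```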

Example 3.5
The manufacturer of Cardio Glide exercise equipment wants to study the relationship between the number of months since the glider was purchased and the length of time the equipment was used last week.
1) Determine the regression equation.
2) At α = 0.01, test whether there is a linear relationship between the variables.

Solution (1):
Regression equation: [computed values shown on slide]

Solution (2):
1) Hypotheses: H0: β1 = 0 (no linear relationship); H1: β1 ≠ 0 (linear relationship exists)
2) Distribution: use the F-distribution table.
3) Test statistic: F = MSR/MSE = 17.303; using the p-value approach, the significance value is 0.003.
4) Rejection region: Since the F statistic is greater than the F table value (17.303 > F table), we reject H0; or, since the p-value is less than α (0.003 < 0.01), we reject H0.
5) Thus, there is a linear relationship between the variables (months X and hours Y).

3.2 MULTIPLE LINEAR REGRESSION
In multiple regression there are several independent variables (X) and one dependent variable (Y). The multiple regression model:

y = β0 + β1x1 + β2x2 + ... + βpxp + ε

This equation describes how the dependent variable y is related to the independent variables x1, x2, ..., xp, where β0, β1, ..., βp are the parameters and ε is a random variable called the error term.

MULTIPLE REGRESSION MODEL
- Multiple regression analysis is used when a statistician thinks there are several independent variables contributing to the variation of the dependent variable.
- This analysis can then be used to increase the accuracy of predictions for the dependent variable over those based on one independent variable alone.

Estimated Multiple Regression Equation

ŷ = b0 + b1x1 + b2x2 + ... + bpxp

In multiple regression analysis, we interpret each regression coefficient as follows: bi represents an estimate of the change in y corresponding to a one-unit increase in xi when all other independent variables are held constant.

MULTIPLE COEFFICIENT OF DETERMINATION (R²)
- As with simple regression, R² is the coefficient of multiple determination: the amount of variation in the dependent variable explained by the regression model.
- Formula: R² = SSR / SST

MULTIPLE CORRELATION COEFFICIENT (R)
- In multiple regression, as in simple regression, the strength of the relationship between the independent variables and the dependent variable is measured by the correlation coefficient R.

MODEL ASSUMPTIONS
- The errors are normally distributed with mean 0 and variance Var(ε) = σ².
- The errors are statistically independent; the error for any value of Y is unaffected by the error for any other Y-value.
- The X-variables are linear additive (i.e., can be summed).

ANALYSIS OF VARIANCE (ANOVA)
General form of the ANOVA table:

Source     | df        | Sum of Squares | Mean Squares          | Test Statistic
Regression | p         | SSR            | MSR = SSR/p           | F = MSR/MSE
Error      | n - p - 1 | SSE            | MSE = SSE/(n - p - 1) |
Total      | n - 1     | SST            |                       |
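A sketch of fitting a two-predictor multiple regression by least squares and filling in the ANOVA quantities. All data here is hypothetical (it is not the Butler Trucking data used later):

```python
import numpy as np

# Hypothetical data: two independent variables, one dependent variable
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y  = np.array([5.1, 4.9, 9.2, 8.8, 13.1, 12.7, 17.2, 16.8])

n, p = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])  # design matrix with intercept column

# Least squares coefficients b = (b0, b1, b2)
b, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ b
SST = np.sum((y - y.mean())**2)
SSE = np.sum((y - y_hat)**2)
SSR = SST - SSE

R2 = SSR / SST                       # multiple coefficient of determination
F = (SSR / p) / (SSE / (n - p - 1))  # overall-significance test statistic
```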

TEST OF SIGNIFICANCE
- In simple linear regression, the F and t tests provide the same conclusion.
- In multiple regression, the F and t tests have different purposes:
  - The F test is used to determine whether a significant relationship exists between the dependent variable and the set of all the independent variables; it is referred to as the test for overall significance.
  - The t test is used to determine whether each of the individual independent variables is significant. A separate t test is conducted for each independent variable in the model; each is referred to as a test for individual significance.

Testing for Significance: F Test (Overall Significance)
Hypotheses:
  H0: β1 = β2 = ... = βp = 0
  H1: one or more of the parameters is not equal to zero.
Test statistic:
  F = MSR/MSE
Rejection rule:
  Reject H0 if p-value < α or F > Fα,
  where Fα is based on an F distribution with p degrees of freedom in the numerator and n - p - 1 degrees of freedom in the denominator.

Testing for Significance: t Test (Individual Parameters)
Hypotheses:
  H0: βi = 0
  H1: βi ≠ 0
Test statistic:
  t = bi / s(bi)
Rejection rule:
  Reject H0 if p-value < α, or if t < -t(α/2) or t > t(α/2),
  where t(α/2) is based on a t distribution with n - p - 1 degrees of freedom.
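The individual t tests can be sketched by computing each coefficient's standard error from the diagonal of (X'X)⁻¹. The data is hypothetical, and the critical value t(0.025, 5) ≈ 2.571 is hard-coded from a t table:

```python
import numpy as np

# Hypothetical two-predictor data (illustration only)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y  = np.array([5.1, 4.9, 9.2, 8.8, 13.1, 12.7, 17.2, 16.8])

n, p = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y  # least squares coefficients (b0, b1, b2)

SSE = np.sum((y - X @ b)**2)
MSE = SSE / (n - p - 1)

# s(bi) = sqrt(MSE * i-th diagonal element of (X'X)^(-1))
se_b = np.sqrt(MSE * np.diag(XtX_inv))
t_stats = b / se_b  # one t statistic per coefficient

# Compare each |t| with t(alpha/2, n - p - 1); here t(0.025, 5) = 2.571
significant = np.abs(t_stats) > 2.571
```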

Example:
An independent trucking company, the Butler Trucking Company, makes deliveries throughout southern California. The managers want to estimate the total daily travel time for their drivers. They believe the total daily travel time is closely related to the number of miles traveled in making the deliveries.
a) Determine whether there is a relationship among the variables, using α = 0.05.
b) Use the t test to determine the significance of each independent variable. What is your conclusion at the 0.05 level of significance?

Solution:
a) Hypotheses: H0: β1 = β2 = 0; H1: one or more of the parameters is not equal to zero.
Test statistic: F = 32.88
Rejection region: F > 4.74
Since 32.88 > 4.74, we reject H0 and conclude that there is a significant relationship between travel time (Y) and the two independent variables, miles traveled and number of deliveries.

Solution:
b) Hypotheses: H0: β1 = 0; H1: β1 ≠ 0
Test statistic: t = 6.18
Rejection region: |t| > 2.365
Since 6.18 > 2.365, we reject H0 and conclude that there is a significant relationship between travel time (Y) and miles traveled (X1).

Solution:
b) Hypotheses: H0: β2 = 0; H1: β2 ≠ 0
Test statistic: t = 4.18
Rejection region: |t| > 2.365
Since 4.18 > 2.365, we reject H0 and conclude that there is a significant relationship between travel time (Y) and number of deliveries (X2).

End of Chapter 3