10-1 COMPLETE BUSINESS STATISTICS by AMIR D. ACZEL & JAYAVEL SOUNDERPANDIAN, 6th edition (SIE)

10-2 Chapter 10 Simple Linear Regression and Correlation

10-3 Chapter 10: Simple Linear Regression and Correlation
- Using Statistics
- The Simple Linear Regression Model
- Estimation: The Method of Least Squares
- Error Variance and the Standard Errors of Regression Estimators
- Correlation
- Hypothesis Tests about the Regression Relationship
- How Good is the Regression?
- Analysis of Variance Table and an F Test of the Regression Model
- Residual Analysis and Checking for Model Inadequacies
- Use of the Regression Model for Prediction
- The Solver Method for Regression

10-4 LEARNING OBJECTIVES
After studying this chapter, you should be able to:
- Determine whether a regression experiment would be useful in a given instance
- Formulate a regression model
- Compute a regression equation
- Compute the covariance and the correlation coefficient of two random variables
- Compute confidence intervals for regression coefficients
- Compute a prediction interval for the dependent variable

10-5 LEARNING OBJECTIVES (continued)
After studying this chapter, you should be able to:
- Test hypotheses about regression coefficients
- Conduct an ANOVA experiment using regression results
- Analyze residuals to check whether the assumptions about the regression model are valid
- Solve regression problems using spreadsheet templates
- Apply the covariance concept to linear composites of random variables
- Use the LINEST function to carry out a regression

10-6 Using Statistics
Regression refers to the statistical technique of modeling the relationship between variables. In simple linear regression, we model the relationship between two variables. One of the variables, denoted by Y, is called the dependent variable, and the other, denoted by X, is called the independent variable. The model we will use to depict the relationship between X and Y will be a straight-line relationship. A graphical sketch of the pairs (X, Y) is called a scatter plot.

10-7 Using Statistics (scatter plot of the advertising and sales data; figure omitted)

10-8 Examples of Other Scatterplots (six X-Y scatterplot panels; figures omitted)

10-9 Model Building
The inexact nature of the relationship between advertising and sales suggests that a statistical model might be useful in analyzing the relationship. A statistical model separates the systematic component of a relationship from the random component:
Data → Statistical model = Systematic component + Random errors
In ANOVA, the systematic component is the variation of means between samples or treatments (SSTR) and the random component is the unexplained variation (SSE). In regression, the systematic component is the overall linear relationship, and the random component is the variation around the line.

10-10 10-2 The Simple Linear Regression Model
The population simple linear regression model:
Y = β₀ + β₁X + ε
where β₀ + β₁X is the nonrandom (systematic) component and ε is the random component, and:
- Y is the dependent variable, the variable we wish to explain or predict;
- X is the independent variable, also called the predictor variable;
- ε is the error term, the only random component in the model, and thus the only source of randomness in Y;
- β₀ is the intercept of the systematic component of the regression relationship;
- β₁ is the slope of the systematic component.
The conditional mean of Y: E[Y|X] = β₀ + β₁X.

10-11 Picturing the Simple Linear Regression Model
The simple linear regression model gives an exact linear relationship between the expected or average value of Y, the dependent variable, and X, the independent or predictor variable:
E[Yᵢ] = β₀ + β₁Xᵢ
Actual observed values of Y differ from the expected value by an unexplained or random error:
Yᵢ = E[Yᵢ] + εᵢ = β₀ + β₁Xᵢ + εᵢ
(Regression plot omitted: the line E[Y] = β₀ + β₁X with intercept β₀ and slope β₁; the error εᵢ is the vertical distance from an observed Yᵢ to the line at Xᵢ.)

10-12 Assumptions of the Simple Linear Regression Model
- The relationship between X and Y is a straight-line relationship.
- The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term εᵢ.
- The errors εᵢ are normally distributed with mean 0 and variance σ², and the errors are uncorrelated (not related) in successive observations. That is: ε ~ N(0, σ²).
(Figure omitted: identical normal distributions of errors, all centered on the regression line.)
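As an illustration (not from the text), a minimal Python sketch that generates data satisfying these assumptions; the values of β₀, β₁, and σ are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)

beta0, beta1, sigma = 5.0, 2.0, 1.5    # arbitrary illustrative parameters
x = np.linspace(0, 10, 50)             # X values are fixed, not random
eps = rng.normal(0.0, sigma, x.size)   # errors are i.i.d. N(0, sigma^2)
y = beta0 + beta1 * x + eps            # Y = beta0 + beta1*X + epsilon
```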

10-13 Estimation: The Method of Least Squares
Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line.
The estimated regression equation:
Y = b₀ + b₁X + e
where b₀ estimates the intercept of the population regression line, β₀; b₁ estimates the slope of the population regression line, β₁; and e stands for the observed errors, the residuals from fitting the estimated regression line b₀ + b₁X to a set of n points.

10-14 Fitting a Regression Line (four figure panels omitted: the data; three errors from a fitted line; three errors from the least squares regression line; errors from the least squares regression line are minimized)

10-15 Errors in Regression (figure omitted: the error at each Xᵢ is the vertical distance between the observed point and the fitted line)

10-16 Least Squares Regression (figure omitted: SSE plotted as a surface over the (b₀, b₁) plane; the least squares estimates b₀ and b₁ locate the point at which SSE is minimized with respect to b₀ and b₁)

10-17 Sums of Squares, Cross Products, and Least Squares Estimators
SS_X = Σ(X − X̄)² = ΣX² − (ΣX)²/n
SS_Y = Σ(Y − Ȳ)² = ΣY² − (ΣY)²/n
SS_XY = Σ(X − X̄)(Y − Ȳ) = ΣXY − (ΣX)(ΣY)/n
Least squares estimators: b₁ = SS_XY/SS_X and b₀ = Ȳ − b₁X̄.
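These estimators translate directly into code; a minimal sketch (illustrative, not the text's spreadsheet template):

```python
import numpy as np

def least_squares(x, y):
    """Return (b0, b1) for the least-squares line of y on x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    ss_x  = np.sum(x * x) - x.sum() ** 2 / n        # SS_X
    ss_xy = np.sum(x * y) - x.sum() * y.sum() / n   # SS_XY
    b1 = ss_xy / ss_x                               # slope
    b0 = y.mean() - b1 * x.mean()                   # intercept
    return b0, b1
```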

10-18 Example 10-1
(Data table omitted: one row per observation, with columns Miles, Dollars, Miles², and Miles × Dollars.)
Column totals: ΣMiles = 79,448; ΣDollars = 106,605; ΣMiles² = 293,426,946; Σ(Miles × Dollars) = 390,185,014.
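Plugging these totals into the computational formulas gives the estimates; a sketch, assuming n = 25 observations as in the textbook's data set:

```python
n = 25                                    # observations in Example 10-1
sum_x, sum_y = 79_448, 106_605            # sum of Miles, sum of Dollars
sum_x2, sum_xy = 293_426_946, 390_185_014

ss_x  = sum_x2 - sum_x ** 2 / n           # SS_X
ss_xy = sum_xy - sum_x * sum_y / n        # SS_XY
b1 = ss_xy / ss_x                         # slope, about 1.2553 dollars per mile
b0 = sum_y / n - b1 * sum_x / n           # intercept, about 274.85
```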

10-19 Template (partial output) that can be used to carry out a Simple Regression

10-20 Template (continued) that can be used to carry out a Simple Regression

10-21 Template (continued) that can be used to carry out a Simple Regression
Residual Analysis: the plot shows the absence of a relationship between the residuals and the X-values (miles).

10-22 Template (continued) that can be used to carry out a Simple Regression
Note: the normal probability plot is approximately linear. This would indicate that the normality assumption for the errors has not been violated.

10-23 Total Variance and Error Variance (two figure panels omitted: what you see when looking at the total variation of Y, and what you see when looking along the regression line at the error variance of Y)

10-24 Error Variance and the Standard Errors of Regression Estimators
(Figure omitted: square and sum all regression errors to find SSE.)
The error variance σ² is estimated by MSE = SSE/(n − 2), with s = √MSE.

10-25 Standard Errors of Estimates in Regression
s(b₁) = s/√SS_X and s(b₀) = s·√(ΣX²/(n·SS_X)), where s = √MSE.

10-26 Confidence Intervals for the Regression Parameters
A (1 − α)100% confidence interval for the slope: b₁ ± t_(α/2, n−2)·s(b₁), and similarly for the intercept using b₀ and s(b₀).
(Figure omitted: the least-squares point estimate of b₁ with its upper and lower 95% bounds; 0 falls outside the interval, so it is not a possible value of the regression slope at 95%.)
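A sketch of the slope interval in Python, assuming SciPy for the t critical value (illustrative, not the text's template):

```python
import numpy as np
from scipy import stats

def slope_ci(x, y, alpha=0.05):
    """(1 - alpha) confidence interval for the slope beta_1."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    ss_x = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / ss_x
    b0 = y.mean() - b1 * x.mean()
    s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))  # sqrt(MSE)
    se_b1 = s / np.sqrt(ss_x)                                # s(b1)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return b1 - t_crit * se_b1, b1 + t_crit * se_b1
```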

10-27 Template (partial output) that can be used to obtain Confidence Intervals for β₀ and β₁

10-28 Correlation
The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables. The population correlation, denoted by ρ, can take on any value from −1 to 1.
- ρ = −1 indicates a perfect negative linear relationship
- −1 < ρ < 0 indicates a negative linear relationship
- ρ = 0 indicates no linear relationship
- 0 < ρ < 1 indicates a positive linear relationship
- ρ = 1 indicates a perfect positive linear relationship
The absolute value of ρ indicates the strength or exactness of the relationship.

10-29 Illustrations of Correlation (six scatterplot panels omitted: ρ = 0, ρ = −0.8, ρ = 0.8, ρ = 0, ρ = −1, ρ = 1)

10-30 Covariance and Correlation
For Example 10-1, the sample correlation is r = SS_XY/√(SS_X·SS_Y).
*Note: If r > 0 then b₁ > 0, since both have the sign of SS_XY.

10-31 Hypothesis Tests for the Correlation Coefficient
H₀: ρ = 0 (no linear relationship)
H₁: ρ ≠ 0 (some linear relationship)
Test statistic: t(n − 2) = r/√((1 − r²)/(n − 2))
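A minimal sketch of this test (illustrative; SciPy supplies the two-sided p-value):

```python
import numpy as np
from scipy import stats

def correlation_test(x, y):
    """t test of H0: rho = 0 against H1: rho != 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    dx, dy = x - x.mean(), y - y.mean()
    r = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    t = r / np.sqrt((1 - r ** 2) / (n - 2))      # t(n-2) statistic
    p = 2 * stats.t.sf(abs(t), df=n - 2)         # two-sided p-value
    return r, t, p
```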

10-32 10-6 Hypothesis Tests about the Regression Relationship
(Three figure panels omitted: constant Y, unsystematic variation, and a nonlinear relationship.)
A hypothesis test for the existence of a linear relationship between X and Y:
H₀: β₁ = 0
H₁: β₁ ≠ 0
Test statistic for the existence of a linear relationship between X and Y:
t(n − 2) = b₁/s(b₁)
where b₁ is the least-squares estimate of the regression slope and s(b₁) is the standard error of b₁. When the null hypothesis is true, the statistic has a t distribution with n − 2 degrees of freedom.

10-33 Hypothesis Tests for the Regression Slope
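Stated in code, the slope test uses the same machinery as the confidence interval above; a sketch (illustrative):

```python
import numpy as np
from scipy import stats

def slope_test(x, y):
    """t test of H0: beta_1 = 0; returns (b1, t, two-sided p-value)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    ss_x = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / ss_x
    resid = y - (y.mean() + b1 * (x - x.mean()))   # residuals from the fitted line
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))      # sqrt(MSE)
    t = b1 / (s / np.sqrt(ss_x))                   # t(n-2) = b1 / s(b1)
    return b1, t, 2 * stats.t.sf(abs(t), df=n - 2)
```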

10-34 How Good is the Regression?
The coefficient of determination, r², is a descriptive measure of the strength of the regression relationship: a measure of how well the regression line fits the data.
(Figure omitted: at a given point, the total deviation Y − Ȳ splits into the explained deviation Ŷ − Ȳ and the unexplained deviation Y − Ŷ.)
r² is the percentage of total variation explained by the regression.

10-35 The Coefficient of Determination (three figure panels omitted, showing the split of SST into SSR and SSE for r² = 0, r² = 0.50, and r² = 0.90)
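A short sketch computing r² from the SSE/SST decomposition (illustrative):

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination: r^2 = SSR/SST = 1 - SSE/SST."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    y_hat = y.mean() + b1 * (x - x.mean())   # fitted values
    sse = np.sum((y - y_hat) ** 2)           # unexplained variation
    sst = np.sum((y - y.mean()) ** 2)        # total variation
    return 1 - sse / sst
```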

10-36 Analysis-of-Variance Table and an F Test of the Regression Model
Regression: SS = SSR, df = 1, MS = MSR = SSR/1, F = MSR/MSE
Error: SS = SSE, df = n − 2, MS = MSE = SSE/(n − 2)
Total: SS = SST, df = n − 1
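A sketch of this F test (for simple regression the regression degrees of freedom equal 1; SciPy supplies the p-value):

```python
import numpy as np
from scipy import stats

def regression_f_test(x, y):
    """F test of the regression model: F = MSR / MSE on (1, n-2) df."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    y_hat = y.mean() + b1 * (x - x.mean())
    ssr = np.sum((y_hat - y.mean()) ** 2)     # explained variation
    sse = np.sum((y - y_hat) ** 2)            # unexplained variation
    f = (ssr / 1) / (sse / (n - 2))
    return f, stats.f.sf(f, 1, n - 2)         # F statistic and p-value
```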

10-37 Template (partial output) that displays Analysis of Variance and an F Test of the Regression Model

10-38 Residual Analysis and Checking for Model Inadequacies

10-39 Normal Probability Plot of the Residuals: Flatter than Normal (figure omitted)

10-40 Normal Probability Plot of the Residuals: More Peaked than Normal (figure omitted)

10-41 Normal Probability Plot of the Residuals: Positively Skewed (figure omitted)

10-42 Normal Probability Plot of the Residuals: Negatively Skewed (figure omitted)

10-43 Use of the Regression Model for Prediction
Point prediction: a single-valued estimate of Y for a given value of X, obtained by inserting the value of X in the estimated regression equation.
Prediction interval:
- For a value of Y given a value of X: accounts for the variation in the regression line estimate and the variation of points around the regression line.
- For an average value of Y given a value of X: accounts for the variation in the regression line estimate only.

10-44 Errors in Predicting E[Y|X] (two figure panels omitted): 1) uncertainty about the slope of the regression line (upper and lower limits on the slope); 2) uncertainty about the intercept of the regression line (upper and lower limits on the intercept)

10-45 Prediction Interval for E[Y|X]
(Figure omitted: regression line with its prediction band for E[Y|X].)
The prediction band for E[Y|X] is narrowest at the mean value of X. The prediction band widens as the distance from the mean of X increases. Predictions become very unreliable when we extrapolate beyond the range of the sample itself.

10-46 Additional Error in Predicting an Individual Value of Y (figure omitted): 3) variation around the regression line; the prediction band for an individual Y is wider than the prediction band for E[Y|X]

10-47 Prediction Interval for a Value of Y
A (1 − α)100% prediction interval for an individual Y at X = x:
ŷ ± t_(α/2, n−2)·s·√(1 + 1/n + (x − x̄)²/SS_X)

10-48 Prediction Interval for the Average Value of Y
A (1 − α)100% prediction interval for E[Y|X] at X = x:
ŷ ± t_(α/2, n−2)·s·√(1/n + (x − x̄)²/SS_X)
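Both intervals in one hedged sketch (illustrative; x0 is the X value at which we predict):

```python
import numpy as np
from scipy import stats

def prediction_intervals(x, y, x0, alpha=0.05):
    """(1 - alpha) intervals at x0: for an individual Y and for E[Y|X=x0]."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    ss_x = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / ss_x
    b0 = y.mean() - b1 * x.mean()
    s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    y0 = b0 + b1 * x0                              # point prediction
    lever = 1 / n + (x0 - x.mean()) ** 2 / ss_x
    half_y    = t_crit * s * np.sqrt(1 + lever)    # individual Y
    half_mean = t_crit * s * np.sqrt(lever)        # average E[Y|X=x0]
    return (y0 - half_y, y0 + half_y), (y0 - half_mean, y0 + half_mean)
```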

10-49 Template Output with Prediction Intervals

10-50 The Solver Method for Regression
The Solver macro available in Excel can also be used to conduct a simple linear regression. See the text for instructions.

10-51 Linear Composites of Dependent Random Variables
The case of independent random variables: for independent random variables X₁, X₂, …, Xₙ, the expected value of the sum is
E(X₁ + X₂ + … + Xₙ) = E(X₁) + E(X₂) + … + E(Xₙ)
and the variance of the sum is
V(X₁ + X₂ + … + Xₙ) = V(X₁) + V(X₂) + … + V(Xₙ)

10-52 Linear Composites of Dependent Random Variables
The case of independent random variables with weights: for independent random variables X₁, X₂, …, Xₙ with respective weights α₁, α₂, …, αₙ, the expected value of the weighted sum is
E(α₁X₁ + α₂X₂ + … + αₙXₙ) = α₁E(X₁) + α₂E(X₂) + … + αₙE(Xₙ)
and the variance of the weighted sum is
V(α₁X₁ + α₂X₂ + … + αₙXₙ) = α₁²V(X₁) + α₂²V(X₂) + … + αₙ²V(Xₙ)

10-53 Covariance of Two Random Variables X₁ and X₂
The covariance between two random variables X₁ and X₂ is given by:
Cov(X₁, X₂) = E{[X₁ − E(X₁)][X₂ − E(X₂)]}
A simpler measure of covariance is given by:
Cov(X₁, X₂) = ρ·SD(X₁)·SD(X₂)
where ρ is the correlation between X₁ and X₂.

10-54 Linear Composites of Dependent Random Variables
The case of dependent random variables with weights: for dependent random variables X₁, X₂, …, Xₙ with respective weights α₁, α₂, …, αₙ, the variance of the weighted sum is
V(α₁X₁ + α₂X₂ + … + αₙXₙ) = α₁²V(X₁) + α₂²V(X₂) + … + αₙ²V(Xₙ) + 2α₁α₂Cov(X₁, X₂) + … + 2αₙ₋₁αₙCov(Xₙ₋₁, Xₙ)
(the covariance terms run over every pair i < j).
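As a sketch (the weights and covariance matrix below are made-up illustrative numbers), the quadratic form wᵀΣw over a covariance matrix Σ reproduces exactly these variance and 2·αᵢαⱼ·Cov(Xᵢ, Xⱼ) terms:

```python
import numpy as np

w = np.array([0.5, 0.3, 0.2])        # weights alpha_1..alpha_3 (illustrative)
cov = np.array([[4.0, 1.2, 0.6],     # Cov(X_i, X_j); diagonal holds variances
                [1.2, 9.0, 2.1],
                [0.6, 2.1, 1.0]])

var_sum = w @ cov @ w                # V(alpha_1*X_1 + alpha_2*X_2 + alpha_3*X_3)
print(var_sum)                       # 2.582 for these numbers
```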