1
Chapter 12 Simple Linear Regression
宇传华
2
Terminology
Linear regression: 线性回归
Response (dependent) variable: 反应(因)变量
Explanatory (independent) variable: 解释(自)变量
Linear regression model: 线性回归模型
Regression coefficient: 回归系数
Slope: 斜率
Intercept: 截距
Method of least squares: 最小二乘法
Error sum of squares or residual sum of squares: 残差(剩余)平方和
Coefficient of determination: 决定系数
Outlier: 异常点(值)
Homoscedasticity: 方差齐同
Heteroscedasticity: 方差非齐同
3
Contents
An example
12.2 The Simple Linear Regression Model
12.3 Estimation: The Method of Least Squares
12.4 Error Variance and the Standard Errors of Regression Estimators
12.5 Confidence Intervals for the Regression Parameters
12.6 Hypothesis Tests about the Regression Relationship
12.7 How Good is the Regression?
12.8 Analysis of Variance Table and an F Test of the Regression Model
12.9 Residual Analysis
12.10 Prediction Interval and Confidence Interval
4
An example
Table. IL-6 levels in brain and serum (pg/ml) of 10 patients with subarachnoid hemorrhage (蛛网膜下腔出血)

Patient i   Serum IL-6 x (pg/ml)   Brain IL-6 y (pg/ml)
1           22.4                   134.0
2           51.6                   167.0
3           58.1                   132.3
4           25.1                   80.2
5           65.9                   100.0
6           79.7                   139.1
7           75.3                   187.2
8           32.4                   97.2
9           96.4                   192.3
10          85.7                   199.4
5
Scatterplot
This scatterplot plots pairs of observations, with serum IL-6 on the x-axis and brain IL-6 on the y-axis. We notice that:
1. Larger (smaller) values of brain IL-6 tend to be associated with larger (smaller) values of serum IL-6.
2. The points tend to be scattered around a positively sloped straight line.
3. The pairs of serum IL-6 and brain IL-6 values do not lie exactly on a straight line.
The scatterplot therefore reveals a clear tendency rather than a precise linear relationship. The line represents the nature of the relationship on average.
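The strength of the linear tendency seen in the scatterplot can be put into a single number with the Pearson correlation coefficient r. The sketch below is a minimal illustration in plain Python. It assumes the ten (x, y) pairs from the IL-6 table above. The variable names are illustrative only.

```python
import math

# Serum IL-6 (x) and brain IL-6 (y), pg/ml, from the IL-6 table
x = [22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7]
y = [134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Pearson r: the sign gives the direction of the tendency, |r| its strength
r = sxy / math.sqrt(sxx * syy)
print(f"r = {r:.3f}")
```

A positive r matches the positively sloped line around which the points scatter. r falls short of 1 because the points do not lie exactly on a line.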
6
Examples of Other Scatterplots
[Figure: several example scatterplots of Y against X.]
7
12.2 The Simple Linear Regression Model
The population simple linear regression model:
$y = \alpha + \beta x + \varepsilon$, or equivalently $\mu_{y|x} = \alpha + \beta x$,
where $\alpha + \beta x$ is the nonrandom (systematic) component and $\varepsilon$ is the random component.
Here $y$ is the dependent (response) variable, the variable we wish to explain or predict. $x$ is the independent (explanatory) variable, also called the predictor variable. $\varepsilon$ is the error term; it is the only random component in the model and thus the only source of randomness in $y$. $\mu_{y|x}$ is the mean of $y$ for a specified value of $x$, also called the conditional mean of $y$. $\alpha$ is the intercept of the systematic component of the regression relationship, and $\beta$ is its slope.
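The roles of the systematic and random components can be illustrated by simulation. The sketch below assumes hypothetical population values α = 70, β = 1.2 and σ = 30. These are chosen only for illustration and are not estimated from any data. The code draws many y values at one fixed x. Their average should settle near the conditional mean μ_{y|x} = α + βx, and their spread should settle near σ.

```python
import random
import statistics

# Hypothetical population parameters, for illustration only
ALPHA, BETA, SIGMA = 70.0, 1.2, 30.0


def simulate_y(x, rng):
    """Draw one y from y = alpha + beta*x + epsilon, with epsilon ~ N(0, sigma^2)."""
    systematic = ALPHA + BETA * x       # nonrandom component, mu_{y|x}
    epsilon = rng.gauss(0.0, SIGMA)     # the only random component
    return systematic + epsilon


rng = random.Random(2025)               # fixed seed for reproducibility
x0 = 50.0
draws = [simulate_y(x0, rng) for _ in range(100_000)]

mu_true = ALPHA + BETA * x0             # conditional mean mu_{y|x0}
print(f"mu_y|x = {mu_true:.1f}, simulated mean = {statistics.fmean(draws):.2f}")
print(f"sigma = {SIGMA:.1f}, simulated SD = {statistics.stdev(draws):.2f}")
```

Re-running with a different seed changes the individual draws but not where they centre. Every bit of randomness in y enters through epsilon.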
8
Picturing the Simple Linear Regression Model
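The simple linear regression model posits (假定) an exact linear relationship between the expected (average) value of the dependent variable Y and the independent (predictor) variable X:
$\mu_{y|x} = \alpha + \beta x$
Actual observed values of Y (y) differ from the expected value $\mu_{y|x}$ by an unexplained or random error $\varepsilon$:
$y = \mu_{y|x} + \varepsilon = \alpha + \beta x + \varepsilon$
[Figure: regression plot of the line $\mu_{y|x} = \alpha + \beta x$. It shows the intercept $\alpha$, the slope $\beta$ as the change in $\mu_{y|x}$ per 1-unit increase in X, and the error $\varepsilon$ as the vertical distance from an observed y to the line.]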
9
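Errors in Regression
$y_i = a + b x_i + e_i$, where $\hat{y}_i = a + b x_i$ is the fitted value on the estimated regression line and $e_i = y_i - \hat{y}_i$ is the error (residual) for observation i.
[Figure: an observed point $(x_i, y_i)$ and its vertical distance $e_i$ to the fitted line.]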
10
12.3 Estimation: The Method of Least Squares
The sum of squared errors in regression is:
$\mathrm{SSE} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ (SSE: 残差平方和)
The least squares regression line is the line that minimizes SSE with respect to the estimates a and b.
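Set the partial derivatives of SSE with respect to a and b to zero. This gives the normal equations, whose solution is the closed form below. The shorthand $S_{xx}$ and $S_{xy}$ for the corrected sums of squares and cross-products is introduced here for brevity. It is not notation taken from the original slides.

```latex
\begin{aligned}
\mathrm{SSE}(a, b) &= \sum_{i=1}^{n} (y_i - a - b x_i)^2 \\
\frac{\partial\,\mathrm{SSE}}{\partial a} &= -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0 \\
\frac{\partial\,\mathrm{SSE}}{\partial b} &= -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0 \\
b &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}} \\
a &= \bar{y} - b\,\bar{x}
\end{aligned}
```

The first normal equation forces the fitted line through the point $(\bar{x}, \bar{y})$.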
11
Example 12-1
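The worked numbers of this example are not preserved in the slide text. As a sketch, the closed-form least-squares estimates b = S_xy / S_xx and a = ȳ − b·x̄ can be applied directly to the IL-6 table in plain Python. The names `sxx` and `sxy` are illustrative shorthand, not from the original.

```python
# Serum IL-6 (x) and brain IL-6 (y), pg/ml, from the IL-6 table
x = [22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7]
y = [134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sum of squares of x and cross-products of x and y
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx              # least-squares slope
a = y_bar - b * x_bar      # least-squares intercept; line passes through (x_bar, y_bar)
print(f"y_hat = {a:.2f} + {b:.4f} x")
```

The estimates can be compared against the REGRESSION output of Excel's Data Analysis toolkit.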
12
Example 12-1: Using a Computer (Excel)
The regression output is created by selecting the REGRESSION (回归) option in the DATA ANALYSIS (数据分析) toolkit. After a full installation of Office, the Data Analysis add-in can be installed through the menu Tools > Add-Ins.
13
Total Variance and Error Variance
[Figure, two panels of Y against X. Left: the total variation of Y, what you see when looking at the values of Y alone. Right: the error variance of Y, what you see when looking along the regression line.]
14
12.4 Error Variance and the Standard Errors of Regression Estimators
Square and sum all regression errors to find SSE.
[Figure: regression errors of the data points around the fitted line, with Y plotted against X.]
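The step "square and sum all regression errors" can be sketched for the IL-6 data as follows. The code assumes the least-squares fit computed from the table. It divides SSE by n − 2 to estimate the error variance, because two parameters (a and b) were estimated from the data.

```python
import math

# Serum IL-6 (x) and brain IL-6 (y), pg/ml, from the IL-6 table
x = [22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7]
y = [134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar

# Regression errors (residuals), then their squared sum
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

mse = sse / (n - 2)     # s^2, the estimate of the error variance sigma^2
s = math.sqrt(mse)      # standard error of estimate
print(f"SSE = {sse:.2f}, s^2 = {mse:.2f}, s = {s:.2f}")
```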
15
Standard Errors of Estimates in Regression
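The formulas on this slide are not preserved. The standard textbook forms are sketched below. Here $s^2$ is the error variance estimate and $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$; the shorthand $S_{xx}$ is introduced here and is not from the original slides.

```latex
\begin{aligned}
s^2 &= \mathrm{MSE} = \frac{\mathrm{SSE}}{n - 2}, \qquad s = \sqrt{s^2} \\
s_b &= \frac{s}{\sqrt{S_{xx}}} \\
s_a &= s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}
\end{aligned}
```

A wider spread of the x values (larger $S_{xx}$) makes the slope estimate more precise.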
16
12.5 Confidence Intervals for the Regression Parameters
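A 95% confidence interval for the slope has the form b ± t_{0.025, n−2}·s_b, and the interval for the intercept uses s_a in the same way. The sketch below applies these to the IL-6 data. It hard-codes the critical value t_{0.025, 8} = 2.306 from a t table. That constant is an assumption valid only for n = 10; other sample sizes need a different table entry.

```python
import math

# Serum IL-6 (x) and brain IL-6 (y), pg/ml, from the IL-6 table
x = [22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7]
y = [134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar

# Error variance estimate and standard errors of the two estimators
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))
s_b = s / math.sqrt(sxx)
s_a = s * math.sqrt(1 / n + x_bar ** 2 / sxx)

# Two-sided 95% critical value t_{0.025, 8}, copied from a t table (n = 10 only)
T_CRIT = 2.306

ci_b = (b - T_CRIT * s_b, b + T_CRIT * s_b)
ci_a = (a - T_CRIT * s_a, a + T_CRIT * s_a)
print(f"95% CI for beta:  ({ci_b[0]:.3f}, {ci_b[1]:.3f})")
print(f"95% CI for alpha: ({ci_a[0]:.2f}, {ci_a[1]:.2f})")
```

An interval for β that excludes 0 points to the same conclusion as rejecting H0: β = 0 at the 5% level.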
17
12.6 Hypothesis Tests about the Regression Relationship
[Figure: three scatterplots of Y against X for which $H_0: \beta = 0$ holds: constant Y, unsystematic variation, and a nonlinear relationship.]
A hypothesis test for the existence of a linear relationship between X and Y:
$H_0: \beta = 0$
$H_1: \beta \neq 0$
Test statistic for the existence of a linear relationship between X and Y:
$t_{(n-2)} = \frac{b}{s_b}$
where b is the least-squares estimate of the regression slope and $s_b$ is the standard error of b. When the null hypothesis is true, the statistic has a t distribution with n - 2 degrees of freedom.
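The test can be carried out for the IL-6 data as below. The sketch assumes a two-sided test at the 5% level. It compares |t| with the table value t_{0.025, 8} = 2.306 instead of computing a p-value, since no statistics library is assumed. The hard-coded critical value applies only to n − 2 = 8.

```python
import math

# Serum IL-6 (x) and brain IL-6 (y), pg/ml, from the IL-6 table
x = [22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7]
y = [134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)   # standard error of b

t_b = b / s_b                  # test statistic under H0: beta = 0
T_CRIT = 2.306                 # t_{0.025, 8} from a t table (assumption: n = 10)
reject = abs(t_b) > T_CRIT     # two-sided test at the 5% level
print(f"t = {t_b:.3f} with {n - 2} df; reject H0: {reject}")
```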
18
Hypothesis Tests for the Regression Slope
19
12.7 How Good is the Regression?
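The coefficient of determination, $R^2$ (决定系数), is a descriptive measure of the strength of the regression relationship: it measures how well the regression line fits the data.
[Figure: for one point, the total deviation $y_i - \bar{y}$ splits into the explained deviation $\hat{y}_i - \bar{y}$ and the unexplained deviation $y_i - \hat{y}_i$.]
$R^2$ is the percentage of the total variation that is explained by the regression:
$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}$, where $\mathrm{SST} = \sum (y_i - \bar{y})^2$ and $\mathrm{SSR} = \sum (\hat{y}_i - \bar{y})^2$.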
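The split of the total variation can be checked numerically. The sketch below, assuming the IL-6 table data, computes SST, SSR and SSE from the least-squares fit. It then forms R² = SSR / SST.

```python
# Serum IL-6 (x) and brain IL-6 (y), pg/ml, from the IL-6 table
x = [22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7]
y = [134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar

y_hat = [a + b * xi for xi in x]
sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained by the regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained

r2 = ssr / sst
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}, R^2 = {r2:.3f}")
```

The identity SST = SSR + SSE holds exactly for a least-squares line with an intercept. The two forms of R² therefore agree up to rounding.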
20
The Coefficient of Determination 决定系数
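[Figure: three panels showing how SST splits into SSE and SSR: $R^2 = 0$ (SSE = SST), $R^2 = 0.50$ (SSR and SSE each account for half of SST), and $R^2 = 0.90$ (SSR accounts for 90% of SST).]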
21
12.8 Analysis of Variance Table and an F Test of the Regression Model
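The table on this slide is not preserved. A sketch of the usual ANOVA table for simple linear regression can be built from the IL-6 data, as below. It uses F = MSR / MSE with (1, n − 2) degrees of freedom. The critical value F_{0.05}(1, 8) = 5.32 is taken from an F table and is an assumption valid only for n = 10.

```python
import math

# Serum IL-6 (x) and brain IL-6 (y), pg/ml, from the IL-6 table
x = [22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7]
y = [134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar

y_hat = [a + b * xi for xi in x]
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

msr = ssr / 1              # regression has 1 df
mse = sse / (n - 2)        # residual has n - 2 df
f_stat = msr / mse
F_CRIT = 5.32              # F_{0.05}(1, 8) from an F table (assumption: n = 10)

print(f"{'Source':<12}{'df':>4}{'SS':>12}{'MS':>12}")
print(f"{'Regression':<12}{1:>4}{ssr:>12.2f}{msr:>12.2f}")
print(f"{'Residual':<12}{n - 2:>4}{sse:>12.2f}{mse:>12.2f}")
print(f"{'Total':<12}{n - 1:>4}{sst:>12.2f}")
print(f"F = {f_stat:.3f} with (1, {n - 2}) df; significant at 5%: {f_stat > F_CRIT}")

# In simple linear regression the F test is equivalent to the t test: F = t^2
t_b = b / math.sqrt(mse / sxx)
```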
22
12.9 Residual Analysis
[Figure: four residual plots.]
Homoscedasticity: the residuals appear completely random, with no indication of model inadequacy.
Curved pattern in the residuals, resulting from an underlying nonlinear relationship.
Residuals exhibit a linear trend with time (residuals plotted against time).
Heteroscedasticity: the variance of the residuals changes when x changes.
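Residual analysis normally means plotting residuals against x (or against time). As a text-only stand-in, the sketch below lists the IL-6 residuals sorted by x so that curvature or a changing spread can be scanned by eye. It assumes the table data. With only ten points, any visual judgment is rough. The final two sums are not a check of the model assumptions. Least-squares residuals satisfy Σe = 0 and Σx·e = 0 by construction, so these sums only confirm the arithmetic.

```python
# Serum IL-6 (x) and brain IL-6 (y), pg/ml, from the IL-6 table
x = [22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7]
y = [134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Residuals in order of x: look for curvature or a spread that grows with x
for xi, ei in sorted(zip(x, residuals)):
    print(f"x = {xi:5.1f}   residual = {ei:8.2f}")

# Algebraic properties of least-squares residuals (arithmetic check only)
print(f"sum(e) = {sum(residuals):.2e}")
print(f"sum(x*e) = {sum(xi * ei for xi, ei in zip(x, residuals)):.2e}")
```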
23
Assumptions of the Simple Linear Regression Model
1. Linearity (线性): the relationship between X and Y is a straight-line relationship.
2. The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term $\varepsilon$.
3. Independence (独立): the errors are uncorrelated in successive observations.
4. Normality (正态) and equal variance (等方差): the errors are normally distributed with mean 0 and constant variance $\sigma^2$, that is, $\varepsilon \sim N(0, \sigma^2)$.
Together these are the LINE assumptions of the simple linear regression model: Linearity, Independence, Normality, and Equal variance.
[Figure: identical normal distributions of errors, all centered on the regression line $\mu_{y|x} = \alpha + \beta x$; at each x, $y \sim N(\mu_{y|x}, \sigma^2_{y|x})$.]
24
12.10 Prediction Interval and Confidence Interval
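Point prediction: a single-valued estimate of Y for a given value of X, obtained by inserting that value of X into the estimated regression equation $\hat{y} = a + bx$.
Prediction interval: an interval for an individual value of Y given a value of X. Its width reflects two sources of variation: the variation in the estimate of the regression line, and the variation of the points around the regression line.
Confidence interval: an interval for the average value of Y given a value of X. Its width reflects only the variation in the estimate of the regression line.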
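For a given value $x_0$, both intervals are centred on the point prediction $\hat{y}_0 = a + b x_0$. The prediction interval carries an extra 1 under the square root, which accounts for the variation of individual points around the line. These are the standard textbook forms at the 95% level. $S_{xx} = \sum (x_i - \bar{x})^2$ is shorthand introduced here, not notation from the original slides.

```latex
\begin{aligned}
\text{CI for } \mu_{y|x_0}: &\quad \hat{y}_0 \pm t_{0.025,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \\
\text{PI for } y_0: &\quad \hat{y}_0 \pm t_{0.025,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
\end{aligned}
```

Both intervals are narrowest at $x_0 = \bar{x}$ and widen as $x_0$ moves away from it, which gives the bands their curved shape.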
25
Confidence Interval for the Average Value of Y and Prediction Interval for the Individual Value of Y
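Both intervals can be computed for the IL-6 data at a chosen serum level. The sketch below uses x0 = 60.0 pg/ml. That value is hypothetical, picked only for illustration. The code also hard-codes t_{0.025, 8} = 2.306 from a t table, which is valid only for n = 10.

```python
import math

# Serum IL-6 (x) and brain IL-6 (y), pg/ml, from the IL-6 table
x = [22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7]
y = [134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

T_CRIT = 2.306   # t_{0.025, 8} from a t table (assumption: n = 10)
x0 = 60.0        # hypothetical serum IL-6 value, for illustration only

y0_hat = a + b * x0                                          # point prediction
se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)     # for the mean of Y
se_indiv = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)  # for one new Y

ci = (y0_hat - T_CRIT * se_mean, y0_hat + T_CRIT * se_mean)
pi = (y0_hat - T_CRIT * se_indiv, y0_hat + T_CRIT * se_indiv)
print(f"point prediction at x0 = {x0}: {y0_hat:.2f}")
print(f"95% CI for mean Y:       ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI for individual Y: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval always contains the confidence interval at the same x0, because it adds the scatter of individual points around the line.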
26
Summary
1. Regression analysis is applied for prediction and control: it describes how the dependent variable Y changes with the independent variable X.
2. The principle of least squares for estimating the regression parameters is to minimize the residual sum of squares.
3. The coefficient of determination, $R^2$, is a descriptive measure of the strength of the regression relationship.
4. There are two bands: a confidence band for the mean of Y and a prediction band for individual values of Y.
5. Residual analysis is used to check whether the assumptions of the model hold.
27
Assignments
1. What are the main distinctions and associations between correlation analysis and simple linear regression?
2. What is the least squares method for estimating a regression line?
3. Describe the main steps in fitting a simple linear regression model to data.
28
Main distinctions
1. Data requirements: correlation analysis requires that both x and y follow a normal distribution; simple linear regression requires only that y follow a normal distribution.
2. Application: correlation analysis measures the association between two random variables (x and y are treated symmetrically); simple linear regression measures how y changes with x (x is the independent variable, y is the dependent variable).
3. Units: r is dimensionless and has no unit of measurement; b has units of y per unit of x.
29
Main associations
1. $t_r = t_b$: the t statistic for testing $H_0: \rho = 0$ equals the t statistic for testing $H_0: \beta = 0$.
2. r and b have the same sign.
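Both associations can be checked numerically on the IL-6 data. The sketch below computes the correlation test statistic t_r = r·√(n − 2) / √(1 − r²) and the slope test statistic t_b = b / s_b. It then confirms that they coincide and that r and b share a sign. Both follow algebraically from r² = SSR / SST.

```python
import math

# Serum IL-6 (x) and brain IL-6 (y), pg/ml, from the IL-6 table
x = [22.4, 51.6, 58.1, 25.1, 65.9, 79.7, 75.3, 32.4, 96.4, 85.7]
y = [134.0, 167.0, 132.3, 80.2, 100.0, 139.1, 187.2, 97.2, 192.3, 199.4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Correlation analysis: r and its t statistic
r = sxy / math.sqrt(sxx * syy)
t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Regression analysis: slope b and its t statistic
b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
t_b = b / (math.sqrt(sse / (n - 2)) / math.sqrt(sxx))

# r and b share the numerator sxy, so they always have the same sign
print(f"r = {r:.4f}, b = {b:.4f}, t_r = {t_r:.4f}, t_b = {t_b:.4f}")
```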