Chapter 1 Introduction
Ray-Bing Chen
Institute of Statistics, National University of Kaohsiung
1.1 Regression and Model Building
Regression analysis: a statistical technique for investigating and modeling the relationship between variables.
Applications: engineering, the physical and chemical sciences, economics, management, the life and biological sciences, and the social sciences.
Regression analysis may be the most widely used statistical technique.
Example: delivery time vs. delivery volume
–We suspect that the time required by a route deliveryman to load and service a machine is related to the number of cases of product delivered.
–Sample: 25 randomly chosen retail outlets.
–Observed for each outlet: the in-outlet delivery time and the volume of product delivered.
–Scatter diagram: displays the relationship between delivery time and delivery volume.
y: delivery time, x: delivery volume
$y = \beta_0 + \beta_1 x$
Error, $\varepsilon$:
–The difference between y and $\beta_0 + \beta_1 x$
–A statistical error, i.e. a random variable
–Accounts for the effects of the other variables on delivery time, measurement errors, …
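Simple linear regression model: $y = \beta_0 + \beta_1 x + \varepsilon$
–x: independent (predictor, regressor) variable
–y: dependent (response) variable
–$\varepsilon$: error
If x is fixed, y is determined by $\varepsilon$.
Suppose that $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$. Then
$E(y|x) = E(\beta_0 + \beta_1 x + \varepsilon) = \beta_0 + \beta_1 x$
$\mathrm{Var}(y|x) = \mathrm{Var}(\beta_0 + \beta_1 x + \varepsilon) = \sigma^2$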
The true regression line is a line of mean values: the height of the regression line at any x is the expected value of y for that x.
The slope, $\beta_1$: the change in the mean of y for a unit change in x.
The variability of y at x is determined by the variance of the error, $\sigma^2$.
Example:
–$E(y|x) = \beta_0 + \beta_1 x$ and $\mathrm{Var}(y|x) = \sigma^2$
–$y|x \sim N(\beta_0 + \beta_1 x, \sigma^2)$
–$\sigma^2$ small: the observed values will fall close to the line.
–$\sigma^2$ large: the observed values may deviate considerably from the line.
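The effect of $\sigma^2$ described above can be checked with a small simulation. The following is a minimal sketch, not part of the original slides: the parameter values, the chosen x, the sample size, and the helper name `simulate_y` are all assumptions made for illustration, and the errors are drawn from a normal distribution as in the example.

```python
import random

def simulate_y(x, beta0, beta1, sigma, n, rng):
    """Draw n observations of y = beta0 + beta1*x + eps, with eps ~ N(0, sigma^2)."""
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for _ in range(n)]

def sample_mean_and_variance(values):
    """Return the sample mean and the unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance

rng = random.Random(0)           # fixed seed so the run is repeatable
beta0, beta1 = 3.0, 2.0          # hypothetical parameter values
x = 10.0                         # a fixed value of the regressor
line_height = beta0 + beta1 * x  # E(y|x), the height of the true line at x

results = {}
for sigma in (0.5, 5.0):         # small versus large error standard deviation
    ys = simulate_y(x, beta0, beta1, sigma, 20000, rng)
    results[sigma] = sample_mean_and_variance(ys)
    mean, variance = results[sigma]
    print(f"sigma={sigma}: mean of y = {mean:.3f}, variance of y = {variance:.3f}")
```

For both values of sigma the sample mean of y should sit near the line height $\beta_0 + \beta_1 x$, while the sample variance should sit near $\sigma^2$, so the larger sigma spreads the observations much more widely around the same line.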
The regression equation is only an approximation to the true functional relationship between the variables.
Regression model: an empirical model.
The regression equation is valid only over the region of the regressor variables contained in the observed data!
Multiple linear regression model: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$
Linear: the model is called linear because it is linear in the parameters $\beta_0, \beta_1, \ldots, \beta_k$, not because y is a linear function of the x's.
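To make the meaning of "linear in the parameters" concrete, the two models below are offered as an illustration; they are standard textbook examples and do not appear in the original slides. The first is a linear regression model even though y is a curved function of x, because setting $x_1 = x$ and $x_2 = x^2$ puts it in the multiple linear regression form above. The second is not, because $\beta_1$ enters through an exponential.

```latex
% Linear in the parameters (a linear regression model), although curved in x:
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon

% Not linear in the parameters (a nonlinear regression model):
y = \beta_0 e^{\beta_1 x} + \varepsilon
```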
Two important objectives:
–Estimate the unknown parameters (fitting the model to the data): the method of least squares.
–Model adequacy checking: an iterative procedure to choose an appropriate regression model to describe the data.
Remarks:
–Regression models do not imply a cause-effect relationship between the variables.
–Regression analysis can aid in confirming a cause-effect relationship, but it cannot be the sole basis!
–Regression analysis is part of a broader data-analysis approach.
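The first objective, fitting the model to the data by least squares, can be sketched for the simple linear regression model as follows. This is a minimal illustration, not the slides' own code: it uses the standard closed-form least-squares estimates, the function name `fit_simple_linear` is hypothetical, and the five data points are made up (they are not the delivery-time data).

```python
def fit_simple_linear(xs, ys):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x + error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Corrected sum of squares of x and corrected sum of cross-products
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx           # estimated slope
    b0 = y_bar - b1 * x_bar    # estimated intercept
    return b0, b1

# Made-up illustrative data (an assumption, not from the original example)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_linear(xs, ys)

# Residuals (observed minus fitted) are the raw material for adequacy checking
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
print("residuals:", [round(r, 4) for r in residuals])
```

With an intercept in the model the least-squares residuals sum to zero, so the sum carries no information; adequacy checking looks instead at patterns in the individual residuals, for example when they are plotted against x.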
Data Collection
Three basic methods for collecting data:
–A retrospective study based on historical data
–An observational study
–A designed experiment (BEST)
Use of Regression
Several purposes:
–Data description
–Parameter estimation
–Prediction and estimation
–Control
Role of the Computer
Regression analysis requires the intelligent and artful use of the computer.
Software: SAS, SPSS, S-plus, R, MATLAB, …