1/18/2019 ST3131, Lecture 1
Chapter 1. Introduction
Questions:
Q1. What is Regression Analysis?
Q2. How to Do Regression Analysis?
Q3. Where to Use Regression Analysis?
Q1: What is Regression Analysis?
A useful tool for finding functional relationships among variables based on data, and for using these relationships in further analysis of the data.
Functional Relationships: mathematical formulas or equations that connect a response variable and several predictor variables.
Variables include:
Quantitative (Numerical) Variables (e.g. ), of 2 types: Continuous (e.g. ) and Discrete (e.g. ). A Binary Variable takes only 2 values, say 0 and 1.
Qualitative (Non-numerical) Variables: e.g., Neighborhood type ( ), Blood type ( ).
Exercise
Page 18, Problem 1.1 (b), (d), (f) and (h): Are the following Quantitative or Qualitative Variables? If the latter, state the possible categories.
(b). Number of children in a family
(d). Race
(f). Fuel consumption
(h). Political party preference
A General Regression Model
$Y = f(X_1, X_2, \ldots, X_p) + \varepsilon$
Response Variable: $Y$
Predictor Variables: $X_1, X_2, \ldots, X_p$
Measurement Error: $\varepsilon$
$f$: unknown regression function
Parametric Regression Models
Parametric Regression Models: f is a ( ) function.
Type 1: Linear Regression Model: f is a ( ) function:
$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon$
$\beta_0, \beta_1, \ldots, \beta_p$: Regression Parameters/Coefficients
Type 2: Nonlinear Regression Model: f is a ( ) function in ( ):
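To make the Linear/Nonlinear distinction concrete, here are two illustrative models. They are assumptions chosen for illustration, not the formulas from the original slide or from the exercise. What matters is how the parameters enter the model, not how the predictor enters.

```latex
% Illustrative examples only (not taken from the slides).
\begin{align*}
% Linear model: curved in X, but each beta enters linearly.
Y &= \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon \\
% Nonlinear model: beta_1 sits inside the exponential.
Y &= \beta_0 + e^{\beta_1 X} + \varepsilon
\end{align*}
```

The first model is still a Linear Regression Model because Y is linear in $\beta_0, \beta_1, \beta_2$; the second is not, because no relabeling of the predictor makes Y linear in $\beta_1$.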
Exercise
Linear Regression or Nonlinear Regression Model?
(a).
(b).
Types of Parametric Regression Models
Regression Type                     Conditions/Definitions
Univariate (Multivariate)           One (two or more) Quantitative Response variables
Simple (Multiple)                   Only one (two or more) Predictor variables
Linear / Nonlinear                  Response is Linear/Nonlinear in the Parameters
Logistic                            Response variable is Qualitative
Analysis of Variance (Covariance)   All (Some) Predictors are Qualitative variables
An Example
A company markets and repairs small computers. How fast an electronic component (Computer Unit, the predictor variable) can be repaired (Time, the response) is very important to the efficiency of the company. The variables in this example are Time and Units.
Questions of Interest
What is the relationship between the length of a service call (Time) and the number of electronic components (Computer Units) repaired?
In general, how long will it take to repair k computer units?
Computer Repair Data (Table 2.5, Page 27)
Units   Minutes
1       23
2       29
3       49
4       64
4       74
5       87
6       96
6       97
7       109
8       119
9       149
9       145
10      154
10      166
Pre-Data Analysis
To see how Time is related to computer Units, we can plot Time against Units. The plot shows the basic relationship between Time and Units and suggests what kind of model is suitable for the data.
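A rough sketch of such a plot, using only the Python standard library so that it runs anywhere. The function name text_scatter and the grid size are hypothetical choices, and the pairing of Units with Minutes is read from the flattened Table 2.5, so treat it as an assumption. In practice a plotting library would normally be used instead.

```python
# Computer repair data as read from Table 2.5 (Units, Minutes).
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]


def text_scatter(xs, ys, width=40, height=12):
    """Return a list of strings forming a crude character scatter plot."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each point to a column (x) and a row (y) of the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # Row 0 of the grid is the top line, so flip the y direction.
        grid[height - 1 - row][col] = "*"
    return ["".join(line) for line in grid]


# Time (Minutes) on the vertical axis, Units on the horizontal axis.
for line in text_scatter(units, minutes):
    print("|" + line)
print("+" + "-" * 40)
```

The points should run from the lower left to the upper right, which is the roughly linear, increasing pattern that motivates a linear model.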
Pre-Data Analysis
Scatter Plot (Time vs Units)
Some Simple Conclusions
Time is linearly related to computer Units.
Time increases with the number of Units.
The linearity is not exact; measurement errors exist.
Thus a Linear Regression Model can be used for the relation between Time and computer Units.
Simple Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, n$
where
X = Units, called the Independent, Explanatory or Predictor variable
Y = Time, called the Dependent or Response variable
$(X_i, Y_i)$: the i-th observation
$\beta_0$: called the Linear Regression Intercept
$\beta_1$: called the Linear Regression Slope
$\beta_0, \beta_1$: called Regression Parameters or Coefficients
$\varepsilon_i$: the i-th Measurement Error
n = number of observations
Least Squares Method
The Least Squares Method is often used to fit the above Linear Regression Model:
Find $\beta_0, \beta_1$ to minimize $S(\beta_0, \beta_1) = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$
Least: Minimization
Squares: Sum of Squared residuals
The solution is called the Least Squares Estimator of $(\beta_0, \beta_1)$, denoted as $(\hat{\beta}_0, \hat{\beta}_1)$.
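As a minimal sketch of the minimization for the computer repair data: the code below uses the standard closed-form solution for a straight-line fit, slope = S_xy / S_xx and intercept = y_bar - slope * x_bar. That formula is standard theory rather than something stated on this slide, and the function name least_squares_fit is hypothetical. The data pairing is read from Table 2.5.

```python
# Computer repair data as read from Table 2.5 (Units, Minutes).
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]


def least_squares_fit(xs, ys):
    """Return (b0, b1) minimizing sum((y - b0 - b1 * x) ** 2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx          # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1


b0_hat, b1_hat = least_squares_fit(units, minutes)
print(f"intercept = {b0_hat:.3f}, slope = {b1_hat:.3f}")
# prints: intercept = 4.162, slope = 15.509
```

These values agree with the fitted equation "Minutes=4.162+15.509* Units" quoted later in the lecture.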
Simple Linear Regression Fit
The Fitted Equation: Minutes = 4.162 + 15.509 * Units
Some Conclusions
The Least Squares fit gives the LS-Estimates $\hat{\beta}_0 = 4.162$ and $\hat{\beta}_1 = 15.509$.
The plot shows that the simple linear regression model fits the data well.
The fitted line increases with increasing Units.
The Resulting Model & Prediction
The resulting model is $\hat{Y} = 4.162 + 15.509 X$, where $\hat{Y}$, read as "Y hat", stands for the estimate at X. That is, "Minutes = 4.162 + 15.509 * Units".
Prediction: X = 1, $\hat{Y}$ = 4.162 + 15.509*1 = 19.67; X = 5, $\hat{Y}$ = 4.162 + 15.509*5 = 81.71; etc.
Interpretation: it takes about 19.67 minutes to repair 1 computer unit, and about 81.71 minutes to repair 5 computer units.
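The prediction arithmetic can be wrapped in a small helper. This is only a sketch: the coefficients are the rounded values quoted on the slide, and the names INTERCEPT, SLOPE and predict_minutes are hypothetical.

```python
# Rounded LS estimates quoted on the slide.
INTERCEPT = 4.162   # minutes
SLOPE = 15.509      # minutes per computer unit


def predict_minutes(units):
    """Estimated repair time in minutes for the given number of units."""
    return INTERCEPT + SLOPE * units


for k in (1, 5):
    print(f"{k} unit(s): about {predict_minutes(k):.2f} minutes")
# prints:
# 1 unit(s): about 19.67 minutes
# 5 unit(s): about 81.71 minutes
```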
Q2: How to Do Regression Analysis?
In summary, the steps in Regression Analysis are:
Step 1. State the Problem of Interest
Step 2. Select Potentially Relevant Variables
Step 3. Collect Relevant Data
Step 4. Specify a Model
Step 5. Choose a Fitting Method
Step 6. Fit the Chosen Model
Step 7. Check the Resulting Model
Step 8. Apply the Resulting Model for Prediction
Step 1: State the Problem of Interest
A key step in Regression Analysis. Different statements of the problem lead to different choices of response and predictor variables.
Example: Suppose we wish to determine whether or not an employer is discriminating against a given group of employees, say women. Data on Salary, Qualifications, and Sex are available.
Example 1: For the question "On average, are women paid less than equally qualified men?", we choose Salary as Response, and Qualification and Sex as Predictors.
Example 2: For the question "On average, are women more qualified than equally paid men?", we choose Qualification as Response, and Salary and Sex as Predictors.
Exercise
Page 19, Problem 1.3 (a), (b), (c). Which variable can be used as the Response, and which can be used as Predictors? Why?
(a). Number of cylinders, gasoline consumption of cars
(b). SAT scores, grade point average, college admission
(c). Supply and demand of certain goods
Step 2: Select Potentially Relevant Variables
The choice depends on the Problem of Interest. Usually, Y denotes the Response variable, and X1, X2, …, Xp denote the Predictor variables.
For the question "Is the price of a single house in a given geographical area high?", the response is the price of a single house (Y), and the predictor variables may include: area of the lot (X1), area of the house (X2), age of the house (X3), number of bathrooms (X4), type of neighborhood (X5), style of the house (X6), amount of real estate taxes (X7), etc.
Step 3: Data Collection
Data are collected for the chosen response and predictor variables. The collected data are usually recorded in the following form:
Observation No.   Response Y   Predictors X1, X2, …, Xp
1                 Y1           x11, x12, …, x1p
2                 Y2           x21, x22, …, x2p
…                 …            …
n                 Yn           xn1, xn2, …, xnp
Each column lists the observations for one variable; each row lists the observations of the response and all predictor variables for one case.
Step 4: Model Specification
Usually specified by experts in the area of study.
May be specified based on an initial analysis of the data.
At our level, the variables are given, and the model is often given, mostly Linear and Parametric.
Step 5: Method of Fitting
In this course, we mainly focus on the Least Squares Method.
Other methods, such as Weighted Least Squares, Maximum Likelihood, Ridge Regression and Principal Components, may be used from different viewpoints and in different contexts.
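Step 6: Fitted Model and Prediction
When the parameters are estimated, we get the fitted regression equation or formula; e.g., the Estimated Linear Regression Equation is:
$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \cdots + \hat{\beta}_p X_p$
$\hat{Y}$: Fitted value when (X1, …, Xp) is a data point. The n Fitted Values are:
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \cdots + \hat{\beta}_p x_{ip}, \quad i = 1, 2, \ldots, n$
$\hat{Y}$: Predicted Value when (X1, …, Xp) is not a data point.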
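The distinction between fitted and predicted values can be sketched with the computer repair example, where p = 1. The coefficients are the rounded estimates quoted earlier in the lecture, the data pairing is read from Table 2.5, and the variable names are hypothetical. The new point X = 11 is an arbitrary illustration; it lies outside the observed range of Units, so the prediction there is an extrapolation.

```python
# Computer repair data as read from Table 2.5 (Units, Minutes).
units = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

# Rounded LS estimates quoted in the lecture.
b0_hat, b1_hat = 4.162, 15.509

# Fitted values: the estimated equation evaluated at each observed X.
fitted = [b0_hat + b1_hat * x for x in units]

# Residuals: observed response minus fitted value.
residuals = [y - y_hat for y, y_hat in zip(minutes, fitted)]

# Predicted value: the same equation evaluated at an X not in the data.
new_units = 11
predicted = b0_hat + b1_hat * new_units

for x, y, y_hat, e in zip(units, minutes, fitted, residuals):
    print(f"Units={x:2d}  Minutes={y:3d}  fitted={y_hat:7.2f}  residual={e:6.2f}")
print(f"Predicted minutes for {new_units} units: {predicted:.2f}")
```

With exact (unrounded) Least Squares estimates the residuals sum to zero; with the rounded coefficients used here the sum is only approximately zero.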
Step 7: Model Checking
Check whether the assumptions of the regression model are valid. This is the Regression Diagnostics problem. The details will be given in Chapter 4.
Q3: Where to Use Regression Analysis?
Regression Analysis is one of the most widely used statistical tools. It has extensive applications in many subject areas, including:
Agricultural Sciences
Industrial and Labor Relations
History
Government
Environmental Sciences
Military, Economics, Finance, etc.
For details, read Section 1.3, Pages 3-7.
Reading Assignments
Read Chapter 1, Sections 1-4.
Read Chapter 2, Sections 1-4, thinking about the following questions:
a). What is the SLR model?
b). How to find the LS estimates of the parameters?
c). How to compute the LS estimates manually?