1
Lecture 17: Interaction Plots; Simple Linear Regression (Chapter 18.1-18.2). Homework 4 due Friday. JMP instructions for question 15.41 are actually for question 15.35.
2
18.1 Introduction In Chapters 18 to 20 we examine the relationship between interval variables via a mathematical equation. The motivation for using the technique: –Forecast the value of a dependent variable (y) from the value of independent variables \(x_1, x_2, \ldots, x_k\). –Analyze the specific relationships between the independent variables and the dependent variable.
3
Uses of Regression Analysis A building maintenance company plans to submit a bid on a contract to clean 40 corporate offices scattered throughout an office complex. The costs incurred by the company are proportional to the number of cleaning crews needed for this task. How many crews will be enough? The product manager in charge of a brand of children’s cereal would like to predict demand during the next year. She has available the following “predictor” variables: price of the product, number of children in the target market, price of competitors’ products, effectiveness of advertising, and annual sales this year and the previous year.
4
Uses of Regression Analysis A community in the Philadelphia area is interested in how crime rates affect property values. If low crime rates increase property values, the community might be able to cover the cost of increased police protection by gains in tax revenues from higher property values. A real estate agent wants to more accurately predict the selling price of houses. She believes the following variables affect the price of a house: Size of house (sq. feet), number of bedrooms, frontage of lot, condition and location.
5
18.2 The Model The model has a deterministic component and a probabilistic component. Most lots sell for $25,000. Building a house costs about $75 per square foot. House cost = 25000 + 75(Size) [Figure: house cost plotted against house size]
6
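18.2 The Model Most lots sell for $25,000, so House cost = 25000 + 75(Size). However, house costs vary even among houses of the same size! Since costs behave unpredictably, we add a random component: House cost = 25000 + 75(Size) + \(\varepsilon\). [Figure: house cost plotted against house size, with points scattered around the line]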
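To make the deterministic and random parts concrete, here is a minimal simulation sketch. The intercept and slope come from the slide; the spread of the random component (`NOISE_SD`), the house size, and the seed are assumptions chosen only for illustration.

```python
import random

LOT_PRICE = 25000      # deterministic intercept from the slide
COST_PER_SQFT = 75     # deterministic slope from the slide
NOISE_SD = 10000       # ASSUMED spread of the random component (not from the slides)

def deterministic_cost(size_sqft):
    """Deterministic part of the model: lot price plus building cost."""
    return LOT_PRICE + COST_PER_SQFT * size_sqft

def simulated_cost(size_sqft, rng):
    """Deterministic part plus a normally distributed random component."""
    return deterministic_cost(size_sqft) + rng.gauss(0, NOISE_SD)

rng = random.Random(0)   # fixed seed so the run is repeatable
size = 2000              # hypothetical house size in square feet

print(deterministic_cost(size))  # 175000
# Three houses of the same size end up with three different costs.
same_size_costs = [simulated_cost(size, rng) for _ in range(3)]
print([round(c) for c in same_size_costs])
```

Every simulated house has the same size, so all of the variation in the printed costs comes from the random component.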
7
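18.2 The Model The first order linear model: \(y = \beta_0 + \beta_1 x + \varepsilon\), where y = dependent variable, x = independent variable, \(\beta_0\) = y-intercept, \(\beta_1\) = slope of the line (\(\beta_1\) = Rise/Run), \(\varepsilon\) = error variable. \(\beta_0\) and \(\beta_1\) are unknown population parameters and are therefore estimated from the data. [Figure: a line in the (x, y) plane with intercept \(\beta_0\) and slope Rise/Run]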
8
Interpreting the Coefficients Roomsclean = 1.78 + 3.70*Number of Crews. \(b_0 = 1.78\) is called the y-intercept and \(b_1 = 3.70\) is called the slope. Interpretation of slope: “For every additional cleaning crew, we are able to clean an additional 3.70 rooms on average.” Interpretation of intercept: technically, the average number of rooms that can be cleaned with zero cleaning crews, but this doesn’t make sense here because it involves extrapolation.
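The slope interpretation can be checked numerically. The sketch below uses the fitted equation from the slide; the crew counts 4 and 5 are arbitrary illustrative values.

```python
B0 = 1.78  # y-intercept of the fitted line (from the slide)
B1 = 3.70  # slope of the fitted line (from the slide)

def predicted_rooms(crews):
    """Average number of rooms cleaned predicted by the fitted line."""
    return B0 + B1 * crews

# Adding one crew changes the prediction by exactly the slope,
# whatever the starting number of crews.
gain = predicted_rooms(5) - predicted_rooms(4)
print(round(gain, 2))  # 3.7
```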
9
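Simple Regression Model The data \((x_1, y_1), \ldots, (x_n, y_n)\) are assumed to be a realization of \(y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\), \(i = 1, \ldots, n\). \(\beta_0 + \beta_1 x_i\) is the “signal” and \(\varepsilon_i\) is “noise” (error). \(\beta_0\) and \(\beta_1\) are the unknown parameters of the model. Objective of regression is to estimate them. What is the interpretation of [symbol missing in the transcript]?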
10
18.3 Estimating the Coefficients The estimates are determined by –drawing a sample from the population of interest, –calculating sample statistics, –producing a straight line that cuts into the data. Question: What should be considered a good line?
11
The Least Squares (Regression) Line A good line is one that minimizes the sum of squared differences between the points and the line.
12
The Least Squares (Regression) Line Let us compare two lines for the data points (1,2), (2,4), (3,1.5), (4,3.2). The first line passes through (1,1), (2,2), (3,3), (4,4): Sum of squared differences = \((2 - 1)^2 + (4 - 2)^2 + (1.5 - 3)^2 + (3.2 - 4)^2 = 7.89\). The second line is horizontal at y = 2.5: Sum of squared differences = \((2 - 2.5)^2 + (4 - 2.5)^2 + (1.5 - 2.5)^2 + (3.2 - 2.5)^2 = 3.99\). The smaller the sum of squared differences, the better the fit of the line to the data.
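The comparison can be reproduced in a few lines. This sketch takes the first line to be y = x, an assumption read off the slide's arithmetic, where the predictions at x = 1, 2, 3, 4 are 1, 2, 3, 4. The squared terms for that line are 1, 4, 2.25 and 0.64.

```python
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]  # data points from the slide

def sum_squared_differences(points, line):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - line(x)) ** 2 for x, y in points)

def first_line(x):
    return x          # assumed: the line through (1,1), (2,2), (3,3), (4,4)

def second_line(x):
    return 2.5        # the horizontal line from the slide

ssd_first = sum_squared_differences(points, first_line)
ssd_second = sum_squared_differences(points, second_line)

print(round(ssd_first, 2))   # 7.89
print(round(ssd_second, 2))  # 3.99
```

By this criterion the horizontal line fits these four points better than the first line.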
13
The Estimated Coefficients To calculate the estimates of the line coefficients that minimize the sum of squared differences between the data points and the line, use the formulas: \(b_1 = \frac{\mathrm{cov}(x, y)}{s_x^2}\) and \(b_0 = \bar{y} - b_1 \bar{x}\). The regression equation that estimates the equation of the first order linear model is: \(\hat{y} = b_0 + b_1 x\).
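As a hedged illustration, the sketch below applies the standard least squares formulas, slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x), to the four points (1,2), (2,4), (3,1.5), (4,3.2) from the two-line comparison. Reusing those points is a choice made here for illustration; the lecture does not fit a line to them.

```python
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]  # reused from the two-line comparison

n = len(points)
x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n

# Sample covariance and sample variance both divide by n - 1,
# so the divisor cancels in the ratio; it is kept for readability.
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in points) / (n - 1)
var_x = sum((x - x_bar) ** 2 for x, _ in points) / (n - 1)

b1 = cov_xy / var_x        # slope estimate
b0 = y_bar - b1 * x_bar    # intercept estimate

# Sum of squared differences for the fitted line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in points)

print(round(b1, 2), round(b0, 2))  # 0.11 2.4
print(round(sse, 3))               # 3.807
```

The fitted line's sum of squared differences (3.807) is smaller than that of the horizontal line at 2.5 (3.99), as the least squares criterion requires.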
14
Typical Regression Analysis –Observe pairs of data. –Plot the data! See if a simple linear regression model seems reasonable. If necessary, transform the data. –Suspect (or hope) that the simple regression model (SRM) assumptions are justified. –Estimate the true regression line by the least squares (LS) regression line. –Check the model and make inferences.
15
The Simple Linear Regression Line Example 18.2 (Xm18-02) –A car dealer wants to find the relationship between the odometer reading and the selling price of used cars. –A random sample of 100 cars is selected, and the data recorded. –Find the regression line. Independent variable x: odometer reading. Dependent variable y: selling price.
16
Solution –Solving by hand: calculate the sample means of x and y, the sample variance of x, and the sample covariance of x and y, where n = 100.
17
Interpreting the Linear Regression Equation The fitted line is \(\hat{y} = 17067 - 0.0623x\). The slope is \(b_1 = -0.0623\): for each additional mile on the odometer, the price decreases by an average of $0.0623. The intercept is \(b_0\) = $17067. Do not interpret the intercept as the “Price of cars that have not been driven”: there is no data near x = 0. [Figure: fitted line with intercept 17067 and no data near x = 0]
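A short sketch of how the fitted equation is used, with the coefficients quoted on the slide. The odometer readings (40,000 miles and a 10,000-mile gap) are hypothetical values chosen for illustration. The lecture does not state the range of the sample, so whether they fall inside the observed data is an assumption.

```python
B0 = 17067.0   # intercept from the slide (dollars)
B1 = -0.0623   # slope from the slide (dollars per mile)

def predicted_price(odometer_miles):
    """Average selling price predicted by the fitted line."""
    return B0 + B1 * odometer_miles

# Two hypothetical cars whose odometer readings differ by 10,000 miles.
price_gap = predicted_price(50_000) - predicted_price(40_000)

print(round(predicted_price(40_000), 2))  # 14575.0
print(round(price_gap, 2))                # -623.0
```

The gap depends only on the slope: 10,000 extra miles correspond to an average price about $623 lower.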
18
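Fitted Values and Residuals The least squares line decomposes the data into two parts, \(y_i = \hat{y}_i + e_i\), where \(\hat{y}_i = b_0 + b_1 x_i\) are called the fitted or predicted values and \(e_i = y_i - \hat{y}_i\) are called the residuals. The residuals are estimates of the errors \(\varepsilon_i\).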
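The decomposition can be sketched on a tiny data set. The four points below are the ones used in the lecture's comparison of two lines; fitting a least squares line to them is an illustration added here, not a computation from the slides.

```python
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]

n = len(points)
x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n

# Least squares slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in points)
      / sum((x - x_bar) ** 2 for x, _ in points))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x, _ in points]                  # fitted values
residuals = [y - f for (_, y), f in zip(points, fitted)]   # residuals

for (x, y), f, e in zip(points, fitted, residuals):
    # Each observation is its fitted value plus its residual.
    print(x, y, round(f, 2), round(e, 2))

# For a least squares line with an intercept, the residuals sum to zero
# (up to floating point rounding).
print(abs(sum(residuals)) < 1e-9)  # True
```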
19
18.4 Error Variable: Required Conditions The error \(\varepsilon\) is a critical part of the regression model. Four requirements involving the distribution of \(\varepsilon\) must be satisfied. –The probability distribution of \(\varepsilon\) is normal. –The mean of \(\varepsilon\) is zero: \(E(\varepsilon) = 0\). –The standard deviation of \(\varepsilon\) is \(\sigma_\varepsilon\) for all values of x. –The set of errors associated with different values of y are all independent.
20
The Normality of \(\varepsilon\) From the first three assumptions we have: y is normally distributed with mean \(E(y) = \beta_0 + \beta_1 x\) and a constant standard deviation \(\sigma_\varepsilon\). The standard deviation remains constant, but the mean value changes with x: \(E(y|x_1) = \beta_0 + \beta_1 x_1\), \(E(y|x_2) = \beta_0 + \beta_1 x_2\), \(E(y|x_3) = \beta_0 + \beta_1 x_3\). [Figure: normal curves with equal spread centered on the line at \(x_1\), \(x_2\), \(x_3\)]
21
Estimating \(\sigma_\varepsilon\) The standard error of estimate (root mean squared error), \(s_\varepsilon\), is an estimate of \(\sigma_\varepsilon\). The standard error of estimate is basically the standard deviation of the residuals. If the simple regression model holds, then approximately –68% of the data will lie within one \(s_\varepsilon\) of the LS line. –95% of the data will lie within two \(s_\varepsilon\) of the LS line.
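The 68% and 95% statements can be checked by simulation. The sketch below assumes the usual definition of the standard error of estimate, the square root of SSE / (n - 2), since the slide's own formula did not survive in the transcript. All simulation settings (true coefficients, error spread, sample size, x range, seed) are illustrative assumptions.

```python
import math
import random

# ASSUMED simulation settings, chosen only for illustration.
BETA0, BETA1, SIGMA = 25000.0, 75.0, 10000.0
rng = random.Random(42)

xs = [rng.uniform(1000, 3000) for _ in range(1000)]
ys = [BETA0 + BETA1 * x + rng.gauss(0, SIGMA) for x in xs]

# Least squares fit.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# Standard error of estimate: sqrt(SSE / (n - 2)).
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)
s_eps = math.sqrt(sse / (n - 2))

# Share of observations within one and two standard errors of the line.
within_one = sum(abs(e) <= s_eps for e in residuals) / n
within_two = sum(abs(e) <= 2 * s_eps for e in residuals) / n

print(round(s_eps), round(within_one, 3), round(within_two, 3))
```

Because the simulated errors are normal with constant spread, `s_eps` should land near the assumed `SIGMA`, and the two shares should come out close to 0.68 and 0.95.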
22
Cleaning Crew Example Roomsclean=1.78+3.70*Number of Crews The building maintenance company is planning to submit a bid on a contract to clean 40 corporate offices scattered throughout an office complex. Currently, the company has only 11 cleaning crews. Will 11 crews be enough?
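A point-prediction sketch for this question, using the fitted equation from the slide. It treats each of the 40 offices as one room, which is an assumption. It also ignores the variability around the line, so it gives a rough answer only and does not replace a proper inference.

```python
import math

B0 = 1.78             # intercept of the fitted line (from the slide)
B1 = 3.70             # slope: additional rooms cleaned per additional crew
ROOMS_NEEDED = 40     # ASSUMED: one room per corporate office
CREWS_AVAILABLE = 11

def predicted_rooms(crews):
    """Average number of rooms cleaned predicted by the fitted line."""
    return B0 + B1 * crews

# Smallest whole number of crews whose predicted average reaches the target.
crews_required = math.ceil((ROOMS_NEEDED - B0) / B1)

print(crews_required)                               # 11
print(round(predicted_rooms(CREWS_AVAILABLE), 2))   # 42.48
```

On average 11 crews are predicted to clean about 42 rooms, just above the 40 needed. Individual outcomes vary around that average, so the margin is thin.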
23
Practice Problems 18.4, 18.10, 18.12