Copyright © 2011 Pearson Education, Inc. Linear Patterns Chapter 19.

19.1 Fitting a Line to Data What is the relationship between the price and weight of diamonds?  Use regression analysis to find an equation that summarizes the linear association between price and weight  The intercept and slope of the line estimate the fixed and variable costs in pricing diamonds Copyright © 2011 Pearson Education, Inc. 3 of 37

19.1 Fitting a Line to Data Consider Two Questions about Diamonds:  What’s the average price of diamonds that weigh 0.4 carat?  How much more do diamonds that weigh 0.5 carat cost? Copyright © 2011 Pearson Education, Inc. 4 of 37

19.1 Fitting a Line to Data Equation of a Line  Using a sample of diamonds of various weights, regression analysis produces an equation that relates weight to price.  Let y denote the response variable (price) and let x denote the explanatory variable (weight). Copyright © 2011 Pearson Education, Inc. 5 of 37

19.1 Fitting a Line to Data Equation of a Line  Identify the line fit to the data by an intercept and a slope.  The equation of the line is Estimated Price = Weight. Copyright © 2011 Pearson Education, Inc. 7 of 37

19.1 Fitting a Line to Data Least Squares  Residual: vertical deviations from the data points to the line ( ).  The best fitting line collectively makes the squares of residuals as small as possible (the choice of b 0 and b 1 minimizes the sum of the squared residuals). Copyright © 2011 Pearson Education, Inc. 8 of 37

19.2 Interpreting the Fitted Line Diamond Example  The least squares regression equation for relating diamond prices to weight is Estimated Price = 43 + 2670 Weight Copyright © 2011 Pearson Education, Inc. 11 of 37

19.2 Interpreting the Fitted Line Diamond Example  The average price of a diamond that weighs 0.4 carat is Estimated Price = 43 + 2,670(0.4) = $1,111  A diamond that weighs 0.5 carat costs $267 more, on average. Copyright © 2011 Pearson Education, Inc. 12 of 37

19.2 Interpreting the Fitted Line Interpreting the Intercept  The intercept is the portion of y that is present for all values of x (i.e., fixed cost, $43, per diamond).  The intercept estimates the average response when x = 0 (where the line crosses the y axis). Copyright © 2011 Pearson Education, Inc. 14 of 37

19.2 Interpreting the Fitted Line Interpreting the Slope  The slope estimates the marginal cost used to find the variable cost (i.e., marginal cost is $2,670 per carat).  While tempting, it is not correct to describe the slope as the change in y caused by changing x. Copyright © 2011 Pearson Education, Inc. 16 of 37

Another empirical problem Empirical problem: Class size and educational output.  Policy question: What is the effect of reducing class size by one student per class? by 8 students/class?  What is the right output (performance) measure? parent satisfaction. student personal development. future adult welfare. future adult earnings. performance on standardized tests.

What do data say about class sizes and test scores? The California Test Score Data Set All K-6 and K-8 California school districts (n = 420) Variables:  5 th grade test scores (Stanford-9 achievement test, combined math and reading), district average.  Student-teacher ratio (STR) = number of students in the district divided by number of full-time equivalent teachers.

An initial look at the California test score data:

Question: Do districts with smaller classes (lower STR) have higher test scores? And by how much?

The class size/test score policy question:  What is the effect of reducing STR by one student/teacher on test scores ?  Object of policy interest:.  This is the slope of the line relating test score and STR.

This suggests that we want to draw a line through the Test Score v.s. STR scatterplot, but how?

Linear Regression: Some Notation and Terminology The population regression line is

 β 0 and β 1 are “population” parameters?  We would like to know the population value of β 1  We don’t know β 1, so we must estimate it using data.

The Population Linear Regression Model— general notation  X is the independent variable or regressor.  Y is the dependent variable.  β 0 = intercept.  β 1 = slope.

 u i = the regression error.  The regression error consists of omitted factors, or possibly measurement error in the measurement of Y. In general, these omitted factors are other factors that influence Y, other than the variable X.

Application to the California Test Score-Class Size data  Estimated slope = = - 2.28  Estimated intercept = = 698.9  Estimated regression line: = 698.9 - 2.28 ST R

4M Example 19.1: ESTIMATING CONSUMPTION Motivation A utility company that sells natural gas in the Philadelphia area needs to estimate how much is used in homes in which their meters cannot be read. Copyright © 2011 Pearson Education, Inc. 17 of 37

4M Example 19.1: ESTIMATING CONSUMPTION Method Use regression analysis to find the equation that relates y (amount of gas consumed measured in CCF) to x (the average number of degrees below 65º during the billing period). The utility company has 4 years of data (n = 48 months) for one home. Copyright © 2011 Pearson Education, Inc. 18 of 37

4M Example 19.1: ESTIMATING CONSUMPTION Message During the summer, the home uses about 26.7 CCF of gas during the billing period. As the weather gets colder, the estimated average amount of gas consumed rises by 5.7 CCF for each additional degree below 65º. Copyright © 2011 Pearson Education, Inc. 21 of 37

Scattergram 1. Plot of all (x i, y i ) pairs 2. Suggests how well model will fit 0 20 40 60 0204060 x y

0 20 40 60 0204060 x y Thinking Challenge How would you draw a line through the points? How do you determine which line ‘fits best’?

迴歸分析的基本概念迴歸分析 (regression analysis) 以成對的資料點 (pair data) 研究兩個或兩個以上變數之間的關係以兩個變數為例, 所謂成對的資料點 (pair data) 係指觀察到的資料為 :, 如果經濟理論告訴我們 x 與 y 之間具有一定的關係, 我們可用 y = f (x) 來刻畫此關係舉例來說, 「個人所得」為「教育程度」所影響 ; 或者是「物價膨脹率」為「貨幣供給」所影響

y x Population Linear Regression Model Observed value  i = Random error

Ex : The population regression line and the error term

母體迴歸線簡單地說, 如果我們擁有母體資料, 母體迴歸線與相關係數一樣, 都可視為描繪這組母體資料的敘述統計量

Population & Sample Regression Models Population $ $ $ $ $ Unknown Relationship Random Sample $ $

y x Sample Linear Regression Model Unsampled observation  i = Random error Observed value ^

Least Squares ‘Best fit’ means difference between actual y values and predicted y values are a minimum But positive differences off-set negative Least Squares minimizes the Sum of the Squared Differences (SSE)

最佳預測式最佳的預測式 f (x) 極小化以下的方差和利用極小化方差和的概念來解出 (solve) 最佳的預測式 f (x) 的方法, 我們稱之為最小平方法 (method of least-squares)

最佳常數預測式如果我們沒有任何 x 的資訊, 對於 y 的最佳預測為何 ? 亦即, f (x i ) = c 我們稱此預測最佳常數預測式

其一階條件為因此,

Derivation of the OLS Estimators and are the values of b 0 and b 1 the above two normal equations.

From equations (1) and (2), and divide each term by n, we have

From (3),, substitute in (4) and collect terms, we have

定義 : 誤差 ( 殘差 )

Least Squares Example You’re a marketing analyst for Hasbro Toys. You gather the following data: Ad $Sales (Units) 11 21 32 42 54 Find the least squares line relating sales and advertising.

0 1 2 3 4 012345 Scattergram Sales vs. Advertising Sales Advertising

Parameter Estimation Solution Table 22 11111 21412 32946 421648 54251620 1510552637 xixi yiyi xixi yiyi xiyixiyi

Parameter Estimation Solution

0 1 2 3 4 012345 Regression Line Fitted to the Data Sales Advertising

Least Squares Thinking Challenge You’re an economist for the county cooperative. You gather the following data: Fertilizer (lb.)Yield (lb.) 43.0 65.5 106.5 129.0 Find the least squares line relating crop yield and fertilizer. © 1984-1994 T/Maker Co.

0 2 4 6 8 10 05 15 Scattergram Crop Yield vs. Fertilizer* Yield (lb.) Fertilizer (lb.)

Parameter Estimation Solution Table* 43.0 16 9.00 12 65.5 3630.25 33 106.510042.25 65 129.014481.00108 3224.0296162.50218 xixi 2 yiyi xixi yiyi xiyixiyi 2

Parameter Estimation Solution*

Regression Line Fitted to the Data* 0 2 4 6 8 10 05 15 Yield (lb.) Fertilizer (lb.)

Goodness of fit 如果我們設定 β 0 = μ y, β 1 = 0, 則迴歸線 y = μ y, 也就是說, 最佳常數預測式乃是最佳線性預測式的一個特例因此, 我們可以據此衡量, 在加入了 x 的資訊後, 對於預測 y 的預測力提升多少 ? 這就是迴歸線的配適度衡量 (goodness of fit) 簡單地說, 迴歸線的配適度衡量就是在比較 : 相對於最佳常數預測式, 最佳線性預測式增加了多少對 y 的解釋力

Goodness of fit y i − μ y 代表以最佳常數預測式預測 y 的預測誤差最佳線性預測式的預測誤差為我們可以將 y 變動的總變異拆解成迴歸線所不能解釋的變異以及可解釋變異 : 總變異 : 可解釋變異 : 不能解釋變異 :

Goodness of fit 總變異為可解釋變異與不能解釋變異的加總 : TV = EV + UV 因此, 我們可以用「可解釋變異」佔「總變異」的比例來衡量迴歸線的配適度 : 當越大, 代表總變異中有越多比例可以被迴歸線所解釋, 亦即迴歸線的配適度越佳

Goodness of fit 然而, 我們可以用另外一個角度來詮釋迴歸線的配適度衡量亦即可以用來衡量「在加入了 x 的資訊後, 對於預測 y 的預測力提升多少 ? 」迴歸線的配適度衡量同時也在比較 : 相對於最佳常數預測式, 最佳線性預測式增加了多少對 y 的解釋力

Goodness of fit 令 = 最佳常數預測式的預測誤差, 而 = 最佳線性預測式的預測誤差由於 TV = EV + UV, 則因此, 如果越大, 代表越小, 也就是說最佳線性預測式的預測誤差相對於最佳常數預測式的預測誤差越小, 亦即相對於最佳常數預測式, 最佳線性預測式所增加的解釋力越多

Copyright © 2011 Pearson Education, Inc. Linear Patterns Chapter 19.

Similar presentations

Presentation on theme: "Copyright © 2011 Pearson Education, Inc. Linear Patterns Chapter 19."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Copyright © 2011 Pearson Education, Inc. Linear Patterns Chapter 19.

Similar presentations

Presentation on theme: "Copyright © 2011 Pearson Education, Inc. Linear Patterns Chapter 19."— Presentation transcript:

Similar presentations

About project

Feedback