Statistics for Business and Economics Dr. TANG Yu Department of Mathematics Soochow University May 28, 2007.


1 Statistics for Business and Economics Dr. TANG Yu Department of Mathematics Soochow University May 28, 2007

2 Types of Correlation Positive correlation: slope is positive. Negative correlation: slope is negative. No correlation: slope is zero.
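As a minimal sketch of the three cases, the least-squares slope can be computed for three small data sets. The data and the helper name `slope` are hypothetical and chosen only so that the signs come out as described on the slide.

```python
def slope(x, y):
    """Least-squares slope: sum of cross-deviations over sum of squared x-deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
positive = slope(x, [2, 4, 5, 4, 5])  # y tends to rise with x
negative = slope(x, [5, 4, 5, 4, 2])  # y tends to fall with x
flat = slope(x, [3, 5, 2, 5, 3])      # no linear trend in y

print(positive, negative, flat)
```

A zero slope only rules out a linear trend; the third data set still varies, just not linearly with x.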

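3 Hypothesis Test For the simple linear regression model $y = \beta_0 + \beta_1 x + \varepsilon$: if x and y are linearly related, we must have $\beta_1 \neq 0$. We will use the sample data to test the following hypotheses about the parameter $\beta_1$: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$.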

4 Sampling Distribution Just as the sampling distribution of the sample mean, $\bar{X}$, depends on the mean, standard deviation and shape of the X population, the sampling distributions of the least squares estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ depend on the properties of the $\{Y_j\}$ sub-populations ($j = 1, \dots, n$). Given $x_j$, the properties of the $\{Y_j\}$ sub-population are determined by the error (random) variable $\varepsilon_j$.

5 Model Assumption As regards the probability distributions of $\varepsilon_j$ ($j = 1, \dots, n$), it is assumed that: (i) each $\varepsilon_j$ is normally distributed, so $Y_j$ is also normal; (ii) each $\varepsilon_j$ has zero mean, so $E(Y_j) = \beta_0 + \beta_1 x_j$; (iii) each $\varepsilon_j$ has the same variance $\sigma_\varepsilon^2$, so $Var(Y_j) = \sigma_\varepsilon^2$ is also constant; (iv) the errors are independent of each other, so $\{Y_i\}$ and $\{Y_j\}$, $i \neq j$, are also independent; (v) the error does not depend on the independent variable(s), so the effects of X and $\varepsilon$ on Y can be separated from each other.

6 Graph Show [Figure: normal curves for Y drawn at $x_i$ and $x_j$] $Y_i \sim N(\beta_0 + \beta_1 x_i;\ \sigma)$, $Y_j \sim N(\beta_0 + \beta_1 x_j;\ \sigma)$. The y distributions have the same shape at each x value.

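7 Sum of Squares Sum of squares due to error: $SSE = \sum (y_i - \hat{y}_i)^2$. Sum of squares due to regression: $SSR = \sum (\hat{y}_i - \bar{y})^2$. Total sum of squares: $SST = \sum (y_i - \bar{y})^2 = SSR + SSE$.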
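The three sums of squares can be sketched in a few lines of Python. The data below are hypothetical (they are not the example from the slides), and the fit uses the usual least-squares formulas for the intercept and slope.

```python
# Hypothetical data, chosen so the arithmetic is easy to follow by hand.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation

print(sse, ssr, sst)
```

For these data SSE is about 2.4, SSR about 3.6 and SST is 6, so the decomposition SST = SSR + SSE holds up to floating-point rounding.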

8 ANOVA Table
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F
Regression | SSR | 1 | MSR = SSR/1 | MSR/MSE
Error | SSE | n-2 | MSE = SSE/(n-2) |
Total | SST | n-1 | |

9 Example Total

10 SSE

11 SST and SSR

12 ANOVA Table
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F
Regression | 1824.3 | 1 | MSR = 1824.3 | 35.93
Error | 253.9 | 5 | MSE = 50.78 |
Total | 2078.2 | 6 | |
As F = 35.93 > 6.61, where 6.61 is the critical value of the F-distribution with 1 and 5 degrees of freedom at the 0.05 significance level, we reject $H_0$ and conclude that the relationship between x and y is significant.
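The arithmetic behind this table can be checked with a short sketch. It assumes the flattened table reads as SSR = 1824.3 with 1 degree of freedom and SSE = 253.9 with 5 degrees of freedom, which is the reading that reproduces the stated MSE and F. The critical value 6.61 is taken from the slide rather than computed.

```python
# Values as read from the slide's ANOVA table (an assumption about how
# the flattened digits split into columns).
ssr, df_regression = 1824.3, 1
sse, df_error = 253.9, 5

msr = ssr / df_regression  # mean square due to regression
mse = sse / df_error       # mean square error
f_stat = msr / mse

f_crit = 6.61              # F critical value for (1, 5) df at 0.05, as given on the slide
reject_h0 = f_stat > f_crit

print(round(mse, 2), round(f_stat, 2), reject_h0)
```

Under this reading the mean squares and the F ratio agree with the slide to two decimal places, and the degrees of freedom imply n = 7 observations.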

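13 Hypothesis Test For the simple linear regression model $y = \beta_0 + \beta_1 x + \varepsilon$: if x and y are linearly related, we must have $\beta_1 \neq 0$. We will use the sample data to test the following hypotheses about the parameter $\beta_1$: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$.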

14 Standard Errors Standard error of estimate: the sample standard deviation of $\varepsilon$, $s_\varepsilon = \sqrt{MSE} = \sqrt{SSE/(n-2)}$. Replacing $\sigma_\varepsilon$ with its estimate, $s_\varepsilon$, the estimated standard error of $\hat{\beta}_1$ is $s_{\hat{\beta}_1} = s_\varepsilon / \sqrt{\sum (x_i - \bar{x})^2}$.

15 t-test Hypotheses: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$. Test statistic: $t = \hat{\beta}_1 / s_{\hat{\beta}_1}$, where t follows a t-distribution with n-2 degrees of freedom.
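A minimal sketch of the t-test for the slope follows, using hypothetical data (not the example from the slides). The critical value 3.182 is the standard table value for a two-tailed test at the 0.05 level with 3 degrees of freedom and is hard-coded here instead of being computed.

```python
import math

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))   # standard error of estimate
se_b1 = s / math.sqrt(sxx)     # estimated standard error of the slope
t_stat = b1 / se_b1

t_crit = 3.182                 # table value t_{0.025} with n - 2 = 3 df
reject_h0 = abs(t_stat) > t_crit

print(round(t_stat, 4), reject_h0)
```

With these data the statistic is about 2.12, below the critical value, so $H_0$ is not rejected. In simple linear regression the square of this statistic equals the ANOVA F ratio (here 4.5), which is why the two tests always agree.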

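16 Rejection Rule Hypotheses: $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$. This is a two-tailed test: reject $H_0$ if $|t| > t_{\alpha/2}$, where $t_{\alpha/2}$ is taken from the t-distribution with n-2 degrees of freedom.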

17 Example Total

18 SSE

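19 Calculation The absolute value of the test statistic exceeds 2.571, where 2.571 is the critical value of the t-distribution with 5 degrees of freedom (upper-tail area 0.025, i.e. a two-tailed test at the 0.05 significance level), so we reject $H_0$ and conclude that the relationship between x and y is significant.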

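20 Confidence Interval $\hat{\beta}_1$ is an estimator of $\beta_1$. The estimated standard error of $\hat{\beta}_1$ is $s_{\hat{\beta}_1}$, and $(\hat{\beta}_1 - \beta_1) / s_{\hat{\beta}_1}$ follows a t-distribution with n-2 degrees of freedom. So the C% confidence interval estimator of $\beta_1$ is $\hat{\beta}_1 \pm t_{\alpha/2}\, s_{\hat{\beta}_1}$, with $\alpha = 1 - C/100$.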

21 Example The 95% confidence interval estimator of $\beta_1$ in the previous example is $\hat{\beta}_1 \pm 2.571\, s_{\hat{\beta}_1}$, i.e., from -12.87 to -5.15, which does not contain 0.
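The slope estimate and its standard error are not shown in this transcript, but they can be backed out of the reported interval. The sketch below assumes the interval is symmetric around $\hat{\beta}_1$ and uses the critical value 2.571 given for 5 degrees of freedom. The recovered values are inferences from rounded endpoints, not figures quoted from the slides.

```python
# Reported 95% interval for the slope and the t critical value (5 df).
lower, upper = -12.87, -5.15
t_crit = 2.571

b1_hat = (lower + upper) / 2      # the interval is centred on the estimate
half_width = (upper - lower) / 2
se_b1 = half_width / t_crit       # implied estimated standard error
t_stat = b1_hat / se_b1           # implied test statistic

contains_zero = lower <= 0 <= upper

print(round(b1_hat, 2), round(se_b1, 2), round(t_stat, 2), contains_zero)
```

An interval that excludes 0 leads to the same decision as the two-tailed t-test at the matching significance level: reject $H_0: \beta_1 = 0$.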

22 Regression Equation It is believed that the longer one studies, the better one's grade is. The final mark (Y) on study time (X) is supposed to follow the regression equation $y = \beta_0 + \beta_1 x + \varepsilon$. If the fit of the sample regression equation is satisfactory, it can be used to estimate the mean value of the dependent variable or to predict an individual value of it.

23 Estimate and Predict Predict: for a particular element of a Y sub-population. E.g.: What is the final mark of Tom, who spent 30 hours on studying? I.e., given x = 30, how large is y? Estimate: for the expected value of a Y sub-population. E.g.: What is the mean final mark of all those students who spent 30 hours on studying? I.e., given x = 30, how large is E(y)?

24 What Is the Same? For a given X value $x_g$, the point forecast (prediction) of Y and the point estimator of the mean of the {Y} sub-population are the same: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_g$. Ex.1 Estimate the mean final mark of students who spent 30 hours on study. Ex.2 Predict the final mark of Tom, when his study time is 30 hours.

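25 What Is the Difference? The interval prediction of Y and the interval estimation of the mean of the {Y} sub-population are different. The prediction: $\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$. The estimation: $\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$. The prediction interval is wider than the confidence interval.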
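The difference between the two intervals can be sketched with the standard textbook formulas for simple linear regression. The data, the chosen value `x_g = 4` and the hard-coded critical value 3.182 (t table, 0.025 upper tail, 3 degrees of freedom) are all illustrative assumptions, not values from the slides.

```python
import math

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_g = 4          # value of x at which we estimate and predict
t_crit = 3.182   # table value t_{0.025} with n - 2 = 3 df

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

# The point estimate and the point prediction coincide.
y_hat = b0 + b1 * x_g

# Half-widths: the prediction interval carries an extra "1 +" under the root
# for the variability of a single new observation.
distance_term = 1 / n + (x_g - x_bar) ** 2 / sxx
ci_half = t_crit * s * math.sqrt(distance_term)
pi_half = t_crit * s * math.sqrt(1 + distance_term)

print((y_hat - ci_half, y_hat + ci_half))  # interval for the mean of Y at x_g
print((y_hat - pi_half, y_hat + pi_half))  # interval for one new Y at x_g
```

Both intervals are centred on the same $\hat{y}$. Only the half-width changes, and the prediction half-width is larger at every $x_g$.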

26 Example Total

27 SSE

28 Estimation and Prediction For the given value $x_g$, the point forecast (prediction) of Y and the point estimator of the mean of the {Y} sub-population are the same: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_g$.

29 Estimation and Prediction For the same $x_g$, however, the interval estimate of the mean and the interval prediction of an individual value are different.

30 Data Needed The prediction The estimation For

31 Calculation Estimation Prediction

32 Moving Rule As $x_g$ moves away from $\bar{x}$ the interval becomes longer. That is, the shortest interval is found at $x_g = \bar{x}$. [Figure: the confidence interval at three values of $x_g$]

33 Moving Rule As $x_g$ moves away from $\bar{x}$ the interval becomes longer. That is, the shortest interval is found at $x_g = \bar{x}$. [Figure: the confidence interval at three values of $x_g$]
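The moving rule can be illustrated by tabulating the confidence-interval half-width over several values of $x_g$. This is a sketch on hypothetical data with a hard-coded table value for the t critical point. The helper name `ci_half_width` is illustrative.

```python
import math

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t_crit = 3.182   # table value t_{0.025} with n - 2 = 3 df

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

def ci_half_width(x_g):
    """Half-width of the confidence interval for the mean of Y at x_g."""
    return t_crit * s * math.sqrt(1 / n + (x_g - x_bar) ** 2 / sxx)

widths = {x_g: ci_half_width(x_g) for x_g in (1, 2, 3, 4, 5)}
narrowest = min(widths, key=widths.get)

for x_g, w in widths.items():
    print(x_g, round(w, 3))
```

The half-width depends on $x_g$ only through $(x_g - \bar{x})^2$, so it is symmetric about $\bar{x}$ (here 3) and smallest there.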

34 Interval Estimation Estimation Prediction

35 Residual Analysis Regression residual: the difference between an observed y value and its corresponding predicted value, $e_i = y_i - \hat{y}_i$. Properties of regression residuals: the mean of the residuals equals zero; the standard deviation of the residuals is equal to the standard deviation of the fitted regression model.
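A short sketch of the first property, on hypothetical data: the residuals of a least-squares line with an intercept sum to zero, up to floating-point rounding.

```python
# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

# Residual = observed value minus fitted value.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mean_residual = sum(residuals) / n

print([round(e, 2) for e in residuals], mean_residual)
```

The zero mean follows from the normal equation for the intercept, so it holds for any data set, not only for this one.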

36 Example

37 Residual Plot Against x

38 Residual Plot Against y-hat

39 Three Situations Good pattern; non-constant variance; model form not adequate.

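40 Standardized Residual Standard deviation of the ith residual: $s_{e_i} = s_\varepsilon \sqrt{1 - h_i}$, where $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$. Standardized residual for observation i: $e_i / s_{e_i}$, with $e_i = y_i - \hat{y}_i$.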

41 Standardized Residual Plot

42 Standardized Residual The standardized residual plot can provide insight about the assumption that the error term has a normal distribution. If the assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution, so approximately 95% of the standardized residuals are expected to lie between -2 and +2.
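Standardized residuals can be sketched as follows, assuming the usual textbook definition for simple linear regression: each residual is divided by $s\sqrt{1 - h_i}$, where $h_i$ is the leverage of observation i. The data are hypothetical.

```python
import math

# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # standard error of estimate

# Leverage of each observation and the standard deviation of its residual.
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
std_residuals = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

# Observations outside (-2, +2) deserve a closer look.
outside = [i for i, r in enumerate(std_residuals) if abs(r) >= 2]

print([round(r, 3) for r in std_residuals], outside)
```

With only five points the 95% guideline is a rough check at best. In this example every standardized residual lies inside the band.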

43 Detecting Outlier Outlier

44 Influential Observation Outlier

45 Influential Observation Influential observation

46 High Leverage Points Leverage of observation i: $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$.
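As an illustration of leverage, the sketch below computes $h_i$ for a hypothetical set of x values in which one point lies far from the rest. The cut-off $h_i > 6/n$ is a rule of thumb used in some introductory textbooks and is an assumption here. The slides do not state which threshold they use.

```python
# Hypothetical x values with one observation far from the others.
x = [10, 10, 15, 20, 20, 25, 70]

n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Leverage depends only on the x values, not on y.
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

threshold = 6 / n  # assumed rule of thumb, not taken from the slides
high_leverage = [i for i, h in enumerate(leverage) if h > threshold]

for xi, h in zip(x, leverage):
    print(xi, round(h, 3))
print(high_leverage)
```

Only the last observation (x = 70) is flagged, with a leverage of about 0.94. In simple linear regression the leverages always sum to 2, so one very large value leaves little for the remaining points.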

47 Contact Information Tang Yu ( 唐煜 ) ytang@suda.edu.cn http://math.suda.edu.cn/homepage/tangy

