
1 BA 555 Practical Business Analysis. Agenda: Review of Statistics; Confidence Interval Estimation; Hypothesis Testing; Linear Regression Analysis; Introduction Case Study: Cost of Manufacturing Computers; Simple Linear Regression.

2 The Empirical Rule (p. 5)

3 Review Example: Suppose that the average hourly earnings of production workers over the past three years were reported to be $12.27, $12.85, and $13.39, with standard deviations of $0.15, $0.18, and $0.23, respectively. The average hourly earnings of the production workers in your company also rose over the past three years, from $12.72 in 2002 and $13.35 in 2003 to $13.95 in 2004. Assume that the distribution of hourly earnings for all production workers is mound-shaped. Have your company's earnings become less and less competitive? Why or why not?

4 Review Example

Year | Industry average | Industry std. | Industry % increase | Company average | Company % increase | Z-score
2002 | 12.27 | 0.15 | – | 12.72 | – | 3.00
2003 | 12.85 | 0.18 | 4.73% | 13.35 | 4.95% | 2.77
2004 | 13.39 | 0.23 | 4.20% | 13.95 | 4.50% | 2.43
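A minimal sketch (not part of the original slides) of how the Z-scores in the table are computed from the industry mean and standard deviation for each year; small differences from the table are due to rounding:

```python
# Z-score of the company's wage relative to the industry distribution for each year.
industry = {2002: (12.27, 0.15), 2003: (12.85, 0.18), 2004: (13.39, 0.23)}  # (mean, std)
company = {2002: 12.72, 2003: 13.35, 2004: 13.95}

for year, (mean, std) in industry.items():
    z = (company[year] - mean) / std   # standardized company wage for that year
    print(f"{year}: z = {z:.2f}")
```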

5 The Empirical Rule: Generalize the results from the empirical rule. Justify the use of the mound-shaped distribution.

6 Sampling Distribution (p. 6): The sampling distribution of a statistic is the probability distribution of all possible values of the statistic that result when random samples of size n are repeatedly drawn from the population. When the sample size is large, what is the sampling distribution of the sample mean / sample proportion / difference of two sample means / difference of two sample proportions? NORMAL!
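A minimal simulation sketch (not from the slides) illustrating this point: even when the population is clearly non-normal, the sampling distribution of the sample mean for a reasonably large n is centered at the population mean with standard deviation close to the population standard deviation divided by sqrt(n), and its shape is approximately normal.

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # a skewed, non-normal population

n = 50                                                  # sample size
sample_means = [rng.choice(population, size=n).mean() for _ in range(5_000)]

print("population mean:", population.mean())
print("mean of sample means:", np.mean(sample_means))            # close to the population mean
print("std of sample means:", np.std(sample_means))              # close to sigma / sqrt(n)
print("population std / sqrt(n):", population.std() / np.sqrt(n))
```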

7 Central Limit Theorem (CLT) (p. 6)

8 CLT

9 Summary: Sampling Distributions. The sampling distribution of a sample mean; of a sample proportion; of the difference between two sample means; of the difference between two sample proportions.

10 Standard Deviations

11 Statistical Inference: Estimation. Research question: What is the parameter value? A sample of size n is drawn from the population. Tools (i.e., formulas): point estimator, interval estimator.

12 Confidence Interval Estimation (p. 7)

13 Example 1: Estimation for the population mean. A random sample of a company's weekly operating expenses for 48 weeks produced a sample mean of $5474 and a standard deviation of $764. Construct a 95% confidence interval for the company's mean weekly expenses. Example 2: Estimation for the population proportion.
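A minimal sketch (not from the slides) of the Example 1 calculation using a t critical value; the slides may instead use the z value 1.96 for this large sample, which gives a slightly narrower interval:

```python
from scipy import stats
import math

n, xbar, s = 48, 5474, 764
se = s / math.sqrt(n)                          # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)          # two-sided 95% critical value
lower, upper = xbar - t_crit * se, xbar + t_crit * se
print(f"95% CI: ({lower:.0f}, {upper:.0f})")   # roughly 5474 +/- 222
```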

14 Statistical Inference: Hypothesis Testing. Research question: Is the claim supported? A sample of size n is drawn from the population. Tools (i.e., formulas): z or t statistic.

15 Hypothesis Testing (p. 9)

16 Example: A bank has set a customer service goal that the mean waiting time for its customers will be less than 2 minutes. The bank randomly samples 30 customers and finds that the sample mean is 100 seconds. Assuming that the sample is from a normal distribution and the standard deviation is 28 seconds, can the bank safely conclude that the population mean waiting time is less than 2 minutes?
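A minimal sketch (not from the slides) of this example as a one-sided z-test with the standard deviation treated as known: H0: mu >= 120 seconds vs. Ha: mu < 120 seconds.

```python
from scipy import stats
import math

n, xbar, sigma, mu0 = 30, 100, 28, 120
z = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic, about -3.91
p_value = stats.norm.cdf(z)                 # lower-tail p-value
print(f"z = {z:.2f}, p-value = {p_value:.5f}")
# The p-value is far below 0.05, so the bank can conclude the mean wait is under 2 minutes.
```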

17 Setting Up the Rejection Region. Type I Error: if we reject H0 (accept Ha) when in fact H0 is true, this is a Type I error (a false alarm).

18 The P-Value of a Test (p. 11): The p-value, or observed significance level, is the smallest value of α for which the test results are statistically significant, i.e., for which the conclusion of rejecting H0 can be reached.

19 Regression Analysis: A technique to examine the relationship between an outcome variable (dependent variable, Y) and a group of explanatory variables (independent variables, X1, X2, …, Xk). The model allows us to understand (quantify) the effect of each X on Y. It also allows us to predict Y based on X1, X2, …, Xk.

20 Types of Relationship. Linear relationship: simple linear relationship, Y = β0 + β1X + ε; multiple linear relationship, Y = β0 + β1X1 + β2X2 + … + βkXk + ε. Nonlinear relationship: Y = β0 exp(β1X + ε), Y = β0 + β1X1 + β2X1² + ε, etc. We will focus only on linear relationships.

21 Simple Linear Regression Model. Population: the true effect of X on Y. Sample: the estimated effect of X on Y. Key questions: 1. Does X have any effect on Y? 2. If yes, how large is the effect? 3. Given X, what is the estimated Y?

22 Least Squares Method. The least squares line is the result of a statistical procedure for finding the "best-fitting" straight line: it minimizes the sum of squared deviations of the observed values of Y from those predicted by the line. (Slide figure contrasts a line whose deviations are minimized with a badly fitting line.)
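A minimal sketch (not from the slides) of the least squares computation on toy data, using the standard closed-form slope b1 = Sxy / Sxx and intercept b0 = ȳ - b1·x̄:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # toy data, purely illustrative
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # slope = Sxy / Sxx
b0 = y.mean() - b1 * x.mean()                  # intercept: the line passes through (x-bar, y-bar)
residuals = y - (b0 + b1 * x)
print(b0, b1, np.sum(residuals ** 2))          # this sum of squared errors is the minimum possible
```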

23 Case: Cost of Manufacturing Computers (pp. 13 – 45). A manufacturer produces computers. The goal is to quantify cost drivers and to understand the variation in production costs from week to week. The following production variables were recorded:
COST: the total weekly production cost (in $ millions)
UNITS: the total number of units (in 000s) produced during the week
LABOR: the total weekly direct labor cost (in $10K)
SWITCH: the total number of times that the production process was re-configured for different types of computers
FACTA: = 1 if the observation is from factory A; = 0 if from factory B

24 Raw Data (p. 14): How many possible regression models can we build?

25 Simple Linear Regression Model (pp. 17 – 26). Question 1: Is Labor a significant cost driver? This question leads us to think about the following model: Cost = f(Labor) + ε; specifically, Cost = β0 + β1 Labor + ε. Question 2: How well does this model perform? (How accurately can Labor predict Cost?) This question leads us to try other regression models and make comparisons.
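A hedged sketch of fitting this model in Python rather than Statgraphics; the file name "computers.csv" and the column names COST and LABOR are assumptions about how the case data might be stored, not details given in the slides:

```python
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("computers.csv")            # hypothetical path to the case data set
model = smf.ols("COST ~ LABOR", data=data).fit()
print(model.summary())                         # b0, b1, standard errors, t-tests, R-squared
print(model.conf_int(alpha=0.05))              # 95% confidence intervals for beta0 and beta1
```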

26 Initial Analysis (pp. 15 – 16). Summary statistics + plots (e.g., histograms, scatter plots) + correlations. Things to look for: features of the data (e.g., data range, outliers), since we do not want to extrapolate outside the data range where the relationship is unknown (or un-established); use summary statistics and graphs. Is the assumption of linearity appropriate? Is there inter-dependence among variables? Any potential problems? Use scatter plots and correlations.

27 Correlation (p. 15). ρ (rho): population correlation (its value is most likely unknown). r: sample correlation (its value can be calculated from the sample). Correlation is a measure of the strength of a linear relationship. Correlation falls between –1 and 1. There is no linear relationship if the correlation is close to 0. But, … (Slide figures show scatter plots for ρ = –1, –1 < ρ < 0, ρ = 0, 0 < ρ < 1, ρ = 1, and likewise for r.)

28 Correlation (p. 15). Is 0.9297 a ρ or an r? The output also shows the sample size and the p-value for testing H0: ρ = 0 vs. Ha: ρ ≠ 0.
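A minimal sketch (not from the slides) of computing a sample correlation r and the p-value for H0: ρ = 0 vs. Ha: ρ ≠ 0 on toy data; the arrays here are purely illustrative, not the case data:

```python
import numpy as np
from scipy import stats

x = np.array([2.1, 3.5, 4.0, 5.2, 6.1, 7.3])   # toy data, purely illustrative
y = np.array([1.0, 2.2, 2.1, 3.3, 3.0, 4.1])

r, p_value = stats.pearsonr(x, y)              # r estimates rho; a small p-value rejects rho = 0
print(f"r = {r:.4f}, p-value = {p_value:.4f}")
```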

29 Fitted Model (Least Squares Line) (p. 18). Test H0: β1 = 0 vs. Ha: β1 ≠ 0. In the output, which numbers are β1 or b1? β0 or b0? The output also reports Sb1, Sb0, b1, and b0. Degrees of freedom = n – k – 1, where n = sample size and k = # of Xs. ** Divide the p-value by 2 for a one-sided test; make sure there is at least weak evidence for doing this step.

30 Hypothesis Testing and Confidence Interval Estimation for β1 (pp. 19 – 20). The output provides Sb1, Sb0, b1, and b0. Degrees of freedom = n – k – 1, where k = # of independent variables. Q1: Does Labor have any impact on Cost? → hypothesis testing. Q2: If so, how large is the impact? → confidence interval estimation.
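A minimal sketch (not from the slides) of how the t-test and confidence interval for β1 follow from the regression output; the sample size and the standard error Sb1 below are placeholders, not the actual case values:

```python
from scipy import stats

n, k = 52, 1                        # placeholder sample size and number of Xs
b1, se_b1 = 0.0081, 0.0030          # placeholder slope estimate and its standard error
df = n - k - 1
t_crit = stats.t.ppf(0.975, df=df)  # two-sided 95% critical value

t_stat = b1 / se_b1                                   # tests H0: beta1 = 0
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)       # 95% confidence interval for beta1
print(f"t = {t_stat:.2f}, 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```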

31 Analysis of Variance (p. 21): not very useful in simple regression; useful in multiple regression.

32 Sum of Squares (p. 22). Syy = total variation in Y. SSE = remaining variation that cannot be explained by the model. SSR = Syy – SSE = variation in Y that has been explained by the model.
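A minimal sketch (not from the slides) of this decomposition on toy observed and fitted values; R-squared then follows as the explained share SSR / Syy:

```python
import numpy as np

y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])           # toy observed values
y_hat = np.array([1.06, 2.04, 3.02, 4.00, 4.98])  # toy fitted values from some model

syy = np.sum((y - y.mean()) ** 2)   # total variation in Y
sse = np.sum((y - y_hat) ** 2)      # variation left unexplained by the model
ssr = syy - sse                     # variation explained by the model
r_squared = ssr / syy
print(syy, sse, ssr, r_squared)
```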

33 Fit Statistics (pp. 23 – 24). 0.45199 × 0.45199 = 0.204295: in simple regression, R² is the square of the correlation r between X and Y.

34 Prediction (pp. 25 – 26). What is the predicted production cost of a given week, say Week 21 of the year, in which Labor = 5 (i.e., $50,000)? Point estimate: predicted cost = b0 + b1(5) = 1.0867 + 0.0081(5) = 1.12724 (million dollars). Margin of error? → prediction interval. What is the average production cost of a typical week in which Labor = 5? Point estimate: estimated cost = b0 + b1(5) = 1.0867 + 0.0081(5) = 1.12724 (million dollars). Margin of error? → confidence interval.

35 Prediction vs. Confidence Intervals (pp. 25 – 26). (Slide figure: data points with both interval bands around the fitted line.) The variation (margin of error) at both ends of the X range seems larger. Implication?
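A hedged sketch (not the slides' Statgraphics output) contrasting the two intervals at Labor = 5: the confidence interval is for the mean cost of weeks with Labor = 5, while the wider prediction interval is for the cost of a single such week. The file and column names are assumptions, as before:

```python
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("computers.csv")                        # hypothetical path to the case data
model = smf.ols("COST ~ LABOR", data=data).fit()

new = pd.DataFrame({"LABOR": [5]})
pred = model.get_prediction(new).summary_frame(alpha=0.05)
print(pred[["mean", "mean_ci_lower", "mean_ci_upper"]])    # confidence interval (mean response)
print(pred[["obs_ci_lower", "obs_ci_upper"]])              # prediction interval (single week), wider
```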

36 Another Simple Regression Model: Cost = β0 + β1 Units + ε (p. 27). A better model? Why?

37 Statgraphics. Simple regression analysis: Relate / Simple Regression; X = independent variable, Y = dependent variable. For prediction, click on the Tabular option icon and check Forecasts; right-click to change the X values. Multiple regression analysis: Relate / Multiple Regression. For prediction, enter values of the Xs in the Data Window and leave the corresponding Y blank, then click on the Tabular option icon and check Reports.

38 Normal Probabilities

39 Critical Values of t

