1
1 BA 555 Practical Business Analysis. Agenda: Review of Statistics (Confidence Interval Estimation; Hypothesis Testing). Linear Regression Analysis (Introduction; Case Study: Cost of Manufacturing Computers; Simple Linear Regression).
2
2 The Empirical Rule (p.5)
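The slide body for the empirical rule did not survive extraction. As a minimal sketch, assuming the usual statement of the rule for a normal (mound-shaped) distribution, the coverage within 1, 2, and 3 standard deviations of the mean can be checked with Python's standard library. The helper name `coverage_within` is illustrative, not from the course notes.

```python
from statistics import NormalDist

def coverage_within(k: float) -> float:
    """Probability that a normal variable falls within k standard deviations of its mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Roughly 68%, 95%, and 99.7% for k = 1, 2, 3
for k in (1, 2, 3):
    print(f"within {k} sd: {coverage_within(k):.4f}")
```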
3
3 Review Example Suppose that the average hourly earnings of production workers over the past three years were reported to be $12.27, $12.85, and $13.39, with standard deviations of $0.15, $0.18, and $0.23, respectively. The average hourly earnings of the production workers in your company also continued to rise over the past three years, from $12.72 in 2002 to $13.35 in 2003 and $13.95 in 2004. Assume that the distribution of the hourly earnings for all production workers is mound-shaped. Have the earnings in your company become less and less competitive? Why or why not?
4
4 Review Example
Year   Industry average   Industry std.   % increase   Company average   % increase   Z score
2002   12.27              0.15            -            12.72             -            3
2003   12.85              0.18            4.73%        13.35             4.95%        2.77
2004   13.39              0.23            4.20%        13.95             4.50%        2.43
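The Z scores in the review example can be reproduced with a short calculation: each is the company average minus the industry average, divided by the industry standard deviation. The sketch below uses the figures from the example; the dictionary and function names are illustrative. Rounding to two decimals gives 2.78 for 2003, where the slide lists 2.77 (apparently truncated).

```python
# Industry (mean, standard deviation) and company mean by year, from the review example
industry = {2002: (12.27, 0.15), 2003: (12.85, 0.18), 2004: (13.39, 0.23)}
company = {2002: 12.72, 2003: 13.35, 2004: 13.95}

def z_score(value: float, mean: float, std: float) -> float:
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / std

z_scores = {year: z_score(company[year], *industry[year]) for year in company}

for year, z in z_scores.items():
    print(year, round(z, 2))
```

A falling Z score means the company average sits fewer industry standard deviations above the industry average than it did the year before.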
5
5 The Empirical Rule Generalize the results from the empirical rule. Justify the use of the mound-shaped distribution.
6
6 Sampling Distribution (p.6) The sampling distribution of a statistic is the probability distribution for all possible values of the statistic that results when random samples of size n are repeatedly drawn from the population. When the sample size is large, what is the sampling distribution of the sample mean / the sample proportion / the difference of two sample means / the difference of two sample proportions? NORMAL!
7
7 Central Limit Theorem (CLT) (p.6)
8
8 CLT
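The CLT slides were figures that did not survive extraction. A small simulation can illustrate the idea: means of repeated samples from a skewed population cluster around the population mean with spread close to sigma / sqrt(n). This is a sketch under stated assumptions; the exponential population, the sample size of 50, and the function name `sample_means` are illustrative choices, not from the course notes.

```python
import math
import random
import statistics

def sample_means(n: int, reps: int, seed: int = 0) -> list:
    """Draw `reps` samples of size n from a skewed population and return their means."""
    rng = random.Random(seed)
    # Exponential population with rate 1: mean 1 and standard deviation 1
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

means = sample_means(n=50, reps=2000)
print("mean of sample means:", statistics.fmean(means))   # close to 1
print("sd of sample means:  ", statistics.stdev(means))   # close to 1 / sqrt(50)
print("sigma / sqrt(n):     ", 1 / math.sqrt(50))
```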
9
9 Summary: Sampling Distributions. The sampling distribution of a sample mean; of a sample proportion; of the difference between two sample means; of the difference between two sample proportions.
10
10 Standard Deviations
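The formulas on this slide were lost in extraction. Assuming it listed the usual textbook standard deviations (standard errors) for the four sampling distributions named in the summary, they can be sketched as below. The function names are illustrative, and the two-sample versions assume independent samples.

```python
import math

def se_mean(sigma: float, n: int) -> float:
    """Standard error of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def se_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def se_diff_means(sigma1: float, n1: int, sigma2: float, n2: int) -> float:
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

def se_diff_proportions(p1: float, n1: int, p2: float, n2: int) -> float:
    """Standard error of the difference between two independent sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print(se_proportion(0.5, 100))
```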
11
11 Statistical Inference: Estimation. Research Question: What is the parameter value? A sample of size n is drawn from the population. Tools (i.e., formulas): Point Estimator, Interval Estimator.
12
12 Confidence Interval Estimation (p.7)
13
13 Example 1: Estimation for the population mean. A random sample of 48 weeks of a company’s weekly operating expenses produced a sample mean of $5474 and a standard deviation of $764. Construct a 95% confidence interval for the company’s mean weekly expenses. Example 2: Estimation for the population proportion
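Example 1 can be worked as a large-sample interval, sample mean plus or minus z times s / sqrt(n). The sketch below assumes the normal (z) critical value is acceptable for n = 48; a t critical value with 47 degrees of freedom would give a slightly wider interval. The function name is illustrative.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(xbar: float, s: float, n: int, level: float = 0.95):
    """Large-sample confidence interval for a population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = mean_confidence_interval(xbar=5474, s=764, n=48)
print(f"95% CI: ${low:.2f} to ${high:.2f}")
```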
14
14 Statistical Inference: Hypothesis Testing. Research Question: Is the claim supported? A sample of size n is drawn from the population. Tools (i.e., formulas): z or t statistic.
15
15 Hypothesis Testing (p.9)
16
16 Example A bank has set up a customer service goal that the mean waiting time for its customers will be less than 2 minutes. The bank randomly samples 30 customers and finds that the sample mean is 100 seconds. Assuming that the sample is from a normal distribution and the standard deviation is 28 seconds, can the bank safely conclude that the population mean waiting time is less than 2 minutes?
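The bank example can be set up as a lower-tailed test of H0: mean = 120 seconds against Ha: mean < 120 seconds. The sketch below treats the 28-second standard deviation as a known population value and uses a z statistic; if 28 seconds is a sample standard deviation, a t statistic with 29 degrees of freedom would be the appropriate tool. The function name is illustrative.

```python
import math
from statistics import NormalDist

def lower_tail_z_test(xbar: float, mu0: float, sigma: float, n: int):
    """Return the z statistic and lower-tail p-value for H0: mu = mu0 vs. Ha: mu < mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_value = NormalDist().cdf(z)   # area to the left of the observed z
    return z, p_value

# Sample mean 100 seconds, claimed ceiling 2 minutes = 120 seconds
z, p = lower_tail_z_test(xbar=100, mu0=120, sigma=28, n=30)
print(f"z = {z:.3f}, p-value = {p:.6f}")
```

A p-value far below common significance levels such as 0.05 or 0.01 would support the claim that the mean waiting time is less than 2 minutes.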
17
17 Setting Up the Rejection Region. Type I Error: If we reject H0 (accept Ha) when in fact H0 is true, this is a Type I error. False alarm.
18
18 The P-Value of a Test (p.11) The p-value, or observed significance level, is the smallest value of α for which the test results are statistically significant, i.e., for which “the conclusion of rejecting H0 can be reached.”
19
19 Regression Analysis. A technique to examine the relationship between an outcome variable (dependent variable, Y) and a group of explanatory variables (independent variables, X1, X2, …, Xk). The model allows us to understand (quantify) the effect of each X on Y. It also allows us to predict Y based on X1, X2, …, Xk.
20
20 Types of Relationship. Linear Relationship: Simple Linear Relationship: Y = β0 + β1 X + ε. Multiple Linear Relationship: Y = β0 + β1 X1 + β2 X2 + … + βk Xk + ε. Nonlinear Relationship: Y = β0 exp(β1 X + ε); Y = β0 + β1 X1 + β2 X1² + …; etc. Will focus only on linear relationships.
21
21 Simple Linear Regression Model. Population: Y = β0 + β1 X + ε, where β1 is the true effect of X on Y. Sample: fitted line with intercept b0 and slope b1, where b1 is the estimated effect of X on Y. Key questions: 1. Does X have any effect on Y? 2. If yes, how large is the effect? 3. Given X, what is the estimated Y?
22
22 Least Squares Method. Least squares line: It is a statistical procedure for finding the “best-fitting” straight line. It minimizes the sum of squares of the deviations of the observed values of Y from those predicted by the line. [Figure panels: deviations are minimized (good fit) vs. bad fit.]
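For one explanatory variable, the least squares line has a closed-form solution: the slope is Sxy / Sxx and the intercept is the mean of Y minus the slope times the mean of X. The sketch below applies these formulas to a small made-up data set; the data and the function name are hypothetical and are not from the case study.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) minimizing the sum of squared deviations of ys from b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1

# Hypothetical data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares_line(xs, ys)
print(f"fitted line: y = {b0:.2f} + {b1:.2f} x")
```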
23
23 Case: Cost of Manufacturing Computers (pp.13 – 45) A manufacturer produces computers. The goal is to quantify cost drivers and to understand the variation in production costs from week to week. The following production variables were recorded:
COST: the total weekly production cost (in $millions).
UNITS: the total number of units (in 000s) produced during the week.
LABOR: the total weekly direct labor cost (in $10K).
SWITCH: the total number of times that the production process was re-configured for different types of computers.
FACTA: = 1 if the observation is from factory A; = 0 if from factory B.
24
24 Raw Data (p. 14) How many possible regression models can we build?
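One way to approach the counting question is to enumerate every non-empty subset of the four candidate explanatory variables. This sketch assumes COST is always the dependent variable and counts only models built from the variables as recorded (no transformations or interaction terms), which is a simplifying assumption rather than the slide's stated answer.

```python
from itertools import combinations

predictors = ["UNITS", "LABOR", "SWITCH", "FACTA"]

# Every non-empty subset of predictors defines one candidate model for COST
models = [subset
          for k in range(1, len(predictors) + 1)
          for subset in combinations(predictors, k)]

print(len(models))   # 2**4 - 1 subsets
for subset in models:
    print("COST ~ " + " + ".join(subset))
```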
25
25 Simple Linear Regression Model (pp. 17 – 26) Question 1: Is Labor a significant cost driver? This question leads us to think about the following model: Cost = f(Labor) + ε. Specifically, Cost = β0 + β1 Labor + ε. Question 2: How well does this model perform? (How accurately can Labor predict Cost?) This question leads us to try other regression models and compare them.
26
26 Initial Analysis (pp. 15 – 16) Summary statistics + plots (e.g., histograms + scatter plots) + correlations. Things to look for: (1) Features of the data (e.g., data range, outliers); we do not want to extrapolate outside the data range because the relationship there is unknown (or un-established). Use summary statistics and graphs. (2) Is the assumption of linearity appropriate? Is there inter-dependence among variables? Any potential problems? Use scatter plots and correlations.
27
27 Correlation (p. 15) ρ (rho): Population correlation (its value is most likely unknown). r: Sample correlation (its value can be calculated from the sample). Correlation is a measure of the strength of a linear relationship. Correlation falls between –1 and 1. No linear relationship if correlation is close to 0. But, …. [Scatter plot panels: ρ = –1; –1 < ρ < 0; ρ = 0; 0 < ρ < 1; ρ = 1. Likewise for the sample: r = –1; –1 < r < 0; r = 0; 0 < r < 1; r = 1.]
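The sample correlation r can be computed as Sxy / sqrt(Sxx · Syy). The sketch below uses made-up data to show the two extremes and one reading of the slide's "But, …" caveat: a correlation of 0 rules out a linear relationship, not every relationship. The data and the function name are hypothetical.

```python
import math

def sample_correlation(xs, ys):
    """Pearson sample correlation r between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [-2, -1, 0, 1, 2]
r_line = sample_correlation(xs, [2 * x + 1 for x in xs])   # perfect increasing line
r_down = sample_correlation(xs, [-x for x in xs])          # perfect decreasing line
r_curve = sample_correlation(xs, [x ** 2 for x in xs])     # exact, but nonlinear, relationship
print(r_line, r_down, r_curve)
```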
28
28 Correlation (p. 15) Is 0.9297 ρ or r? [Output annotations: sample size; p-value for H0: ρ = 0 vs. Ha: ρ ≠ 0.]
29
29 Fitted Model (Least Squares Line) (p.18) H0: β1 = 0 vs. Ha: β1 ≠ 0. [Output annotations: β1 or b1? β0 or b0? Estimates b1 and b0 with standard errors Sb1 and Sb0.] Degrees of freedom = n – k – 1, where n = sample size, k = # of Xs. ** Divide the p-value by 2 for a one-sided test. Make sure there is at least weak evidence for doing this step.
30
30 Hypothesis Testing and Confidence Interval Estimation for β (pp. 19 – 20) [Output annotations: estimates b1 and b0 with standard errors Sb1 and Sb0.] Degrees of freedom = n – k – 1, where k = # of independent variables. Q1: Does Labor have any impact on Cost? → Hypothesis Testing. Q2: If so, how large is the impact? → Confidence Interval Estimation.
31
31 Analysis of Variance (p. 21) - Not very useful in simple regression. - Useful in multiple regression.
32
32 Sum of Squares (p.22) Syy = total variation in Y. SSE = remaining variation that cannot be explained by the model. SSR = Syy – SSE = variation in Y that has been explained by the model.
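The decomposition Syy = SSR + SSE can be checked numerically. The sketch below fits a least squares line to a small made-up data set and computes the three quantities, along with the share of variation explained, SSR / Syy. The data and the function name are hypothetical and are not the case study data.

```python
def sums_of_squares(xs, ys):
    """Fit a least squares line and return (Syy, SSE, SSR)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    s_yy = sum((y - y_bar) ** 2 for y in ys)                      # total variation in Y
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))   # unexplained variation
    ssr = s_yy - sse                                              # explained variation
    return s_yy, sse, ssr

# Hypothetical data for illustration only
s_yy, sse, ssr = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r_squared = ssr / s_yy
print(s_yy, sse, ssr, r_squared)
```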
33
33 Fit Statistics (pp. 23 – 24) 0.45199 x 0.45199 = 0.204295
34
34 Prediction (pp. 25 – 26) What is the predicted production cost of a given week, say, Week 21 of the year, in which Labor = 5 (i.e., $50,000)? Point estimate: predicted cost = b0 + b1 (5) = 1.0867 + 0.0081 (5) ≈ 1.12724 (million dollars). Margin of error? → Prediction Interval. What is the average production cost of a typical week in which Labor = 5? Point estimate: estimated cost = b0 + b1 (5) = 1.0867 + 0.0081 (5) ≈ 1.12724 (million dollars). Margin of error? → Confidence Interval.
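The point estimate is the same for both questions; only the margin of error differs. As a minimal sketch using the rounded coefficients shown on the slide, the estimate works out to about 1.1272 million dollars; unrounded coefficients would shift the last digits slightly. The constant and function names are illustrative.

```python
# Rounded coefficients of the fitted line Cost = b0 + b1 * Labor, as shown on the slide
B0 = 1.0867   # intercept, in $ millions
B1 = 0.0081   # slope, in $ millions per $10K of direct labor cost

def predicted_cost(labor: float) -> float:
    """Point estimate of weekly production cost (in $ millions) for a given Labor value."""
    return B0 + B1 * labor

print(round(predicted_cost(5), 4))
```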
35
35 Prediction vs. Confidence Intervals (pp. 25 – 26) Variation (margin of error) on both ends seems larger. Implication?
36
36 Another Simple Regression Model: Cost = β0 + β1 Units + ε (p. 27) A better model? Why?
37
37 Statgraphics. Simple Regression Analysis: Relate / Simple Regression. X = independent variable, Y = dependent variable. For prediction, click on the Tabular option icon and check Forecasts. Right click to change X values. Multiple Regression Analysis: Relate / Multiple Regression. For prediction, enter values of Xs in the Data Window and leave the corresponding Y blank. Click on the Tabular option icon and check Reports.
38
38 Normal Probabilities
39
39 Critical Values of t