Example 10.1 Experimenting with a New Pizza Style at the Pepperoni Pizza Restaurant Concepts in Hypothesis Testing
2 Background Information The manager of Pepperoni Pizza Restaurant has recently begun experimenting with a new method of baking its pepperoni pizzas.
3 Background Information – cont’d He believes that the new method produces a better-tasting pizza, but he would like to base a decision on whether to switch from the old method to the new method on customer reactions. Therefore he performs an experiment.
4 The Experiment For 100 randomly selected customers who order a pepperoni pizza for home delivery, he includes both an old style and a free new style pizza in the order.
5 The Experiment – cont’d All he asks is that these customers rate the difference between pizzas on a -10 to +10 scale, where -10 means they strongly favor the old style, +10 means they strongly favor the new style, and 0 means they are indifferent between the two styles. Once he gets the ratings from the customers, how should he proceed?
6 Hypothesis Testing This example’s goal is to explain hypothesis testing concepts. We are not implying that the manager would, or should, use a hypothesis testing procedure to decide whether to switch methods.
7 Hypothesis Testing – cont’d First, hypothesis testing does not take costs into account. In this example, if the new method is more costly, that extra cost would be ignored by hypothesis testing. Second, even if the costs of the two pizza-making methods are equivalent, the manager might base his decision on a simple point estimate and possibly a confidence interval.
8 Null and Alternative Hypotheses Usually, the null hypothesis is labeled H 0 and the alternative hypothesis is labeled H a. The null and alternative hypotheses divide all possibilities into two nonoverlapping sets, exactly one of which must be true.
9 Null and Alternative Hypotheses – cont’d Traditionally, hypothesis testing has been phrased as a decision-making problem, where an analyst decides either to accept the null hypothesis or to reject it, based on the sample evidence.
10 One-Tailed Versus Two-Tailed Tests The form of the alternative hypothesis can be either one-tailed or two-tailed, depending on what the analyst is trying to prove. A one-tailed hypothesis is one where the only sample results that can lead to rejection of the null hypothesis are those in a particular direction. In the pizza example, these are the results where the sample mean rating is positive.
11 One-Tailed Versus Two-Tailed Tests – cont’d A two-tailed test is one where results in either of two directions can lead to rejection of the null hypothesis. Once the hypotheses are set up, it is easy to detect whether the test is one-tailed or two-tailed.
12 One-Tailed Versus Two-Tailed Tests – cont’d One-tailed alternatives are phrased in terms of “>” or “<”, whereas two-tailed alternatives are phrased in terms of “≠”. The real question is whether to set up hypotheses for a particular problem as one-tailed or two-tailed. There is no statistical answer to this question. It depends entirely on what we are trying to prove.
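The distinction between the three forms of alternative can be sketched in a few lines of Python. This is only an illustration: the function name `points_toward_alternative` and the sample means below are hypothetical, and the check says only whether a sample mean lies in a direction that could lead to rejection, not whether the evidence is strong enough to reject.

```python
def points_toward_alternative(sample_mean, mu0, alternative):
    """Return True if the sample mean lies in a direction that could
    lead to rejecting the null hypothesis (hypothetical helper)."""
    if alternative == ">":      # one-tailed, "greater than" variety
        return sample_mean > mu0
    if alternative == "<":      # one-tailed, "less than" variety
        return sample_mean < mu0
    if alternative == "!=":     # two-tailed: either direction counts
        return sample_mean != mu0
    raise ValueError("alternative must be '>', '<' or '!='")


# Hypothetical sample means, with mu0 = 0 as in the pizza example.
for mean in (-1.5, 2.0):
    for alt in (">", "<", "!="):
        print(mean, alt, points_toward_alternative(mean, 0.0, alt))
```

With the alternative μ > 0, a negative sample mean can never lead to rejection, however far below zero it falls.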
13 Types of Errors Whether one decides to accept or reject the null hypothesis, it might be the wrong decision. One might incorrectly reject the null hypothesis when it is true or incorrectly accept the null hypothesis when it is false. These errors are called type I and type II errors, respectively.
14 Types of Errors – cont’d In general, we commit a type I error when we incorrectly reject a null hypothesis that is true. We commit a type II error when we incorrectly accept a null hypothesis that is false. These ideas appear graphically below.
15 Types of Errors – cont’d While these errors seem to be equally serious, type I errors have traditionally been regarded as the more serious of the two. Therefore, the hypothesis-testing procedure favors caution in terms of rejecting the null hypothesis.
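The four possible outcomes of a test can also be laid out as a small lookup. The sketch below is illustrative only; the function name `classify_outcome` is hypothetical, and in practice we never know for certain whether the null hypothesis is true.

```python
def classify_outcome(null_is_true, reject_null):
    """Label a test decision given the (normally unknown) truth about H0."""
    if null_is_true and reject_null:
        return "type I error"        # rejected a true null hypothesis
    if not null_is_true and not reject_null:
        return "type II error"       # accepted a false null hypothesis
    return "correct decision"


# Print the 2 x 2 table of truth versus decision.
for null_is_true in (True, False):
    for reject_null in (True, False):
        truth = "H0 true " if null_is_true else "H0 false"
        decision = "reject" if reject_null else "accept"
        print(f"{truth} | {decision}: {classify_outcome(null_is_true, reject_null)}")
```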
16 Significance Level and Rejection Region The real question is how strong the evidence in favor of the alternative hypothesis must be to reject the null hypothesis. The analyst determines the probability of a type I error that he is willing to tolerate. This value is denoted by α and is most commonly equal to 0.05, although α = 0.01 and α = 0.10 are also frequently used.
17 Significance Level and Rejection Region – cont’d The value of α is called the significance level of the test. Then, given the value of α, we use statistical theory to determine the rejection region.
18 Significance Level and Rejection Region – cont’d If the sample falls into this region, we reject the null hypothesis; otherwise, we accept it. Sample evidence that falls into the rejection region is called statistically significant at the α level.
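To make the link between the significance level and the rejection region concrete, the sketch below works through a simplified case. It assumes a right-tailed test of μ ≤ 0 with normally distributed ratings and a known standard deviation, so that a z statistic can be used; the values n = 40 and sigma = 5.0, the seed, and the function name are illustrative assumptions, not figures from the example. It first prints the cutoff that defines the rejection region for each common significance level, then simulates many samples with the null hypothesis true to check that the fraction of false rejections is close to the chosen level.

```python
import random
from statistics import NormalDist, fmean

# Right-tail cutoffs: reject H0 when the z statistic exceeds the cutoff.
cutoffs = {alpha: NormalDist().inv_cdf(1.0 - alpha) for alpha in (0.10, 0.05, 0.01)}
for alpha, z_crit in cutoffs.items():
    print(f"alpha = {alpha:.2f}: reject H0 if z > {z_crit:.3f}")


def simulated_type_i_rate(alpha=0.05, n=40, sigma=5.0, trials=4000, seed=1):
    """Fraction of samples drawn with mu = 0 (H0 true) that fall in the
    rejection region. All default values are illustrative assumptions."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        z = fmean(sample) / (sigma / n ** 0.5)   # test statistic with mu0 = 0
        if z > z_crit:
            rejections += 1
    return rejections / trials


type_i_rate = simulated_type_i_rate()
print(f"simulated type I error rate at alpha = 0.05: {type_i_rate:.3f}")
```

A smaller significance level pushes the cutoff further into the tail, which is how the procedure builds in caution about rejecting the null hypothesis.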
19 Significance from p-values This approach is currently more popular than the significance level and rejection region approach. The idea is to avoid the use of the α level and instead simply report “how significant” the sample evidence is.
20 Significance from p-values – cont’d We do this by means of the p-value. The p-value is the probability of seeing a random sample at least as extreme as the observed sample, given that the null hypothesis is true. Here “extreme” is relative to the null hypothesis.
21 Significance from p-values – cont’d In general smaller p-values indicate more evidence in support of the alternative hypothesis. If a p-value is sufficiently small, almost any decision maker will conclude that rejecting the null hypothesis is the more “reasonable” decision.
22 Significance from p-values – cont’d How small is a “small” p-value? This is largely a matter of semantics, but if the
- p-value is less than 0.01, it provides “convincing” evidence that the alternative hypothesis is true;
- p-value is between 0.01 and 0.05, there is “strong” evidence in favor of the alternative hypothesis;
23 Significance from p-values – cont’d
- p-value is between 0.05 and 0.10, it is in a “gray area”;
- p-values greater than 0.10 are interpreted as weak or no evidence in support of the alternative.
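These rules of thumb translate directly into a small helper. The sketch below is one possible encoding; the function name is hypothetical, and assigning each boundary value (0.01, 0.05, 0.10) to the weaker category is our own assumption, since the guidelines above do not say how to treat the boundaries.

```python
def interpret_p_value(p):
    """Map a p-value to the informal evidence categories listed above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.01:
        return "convincing evidence for the alternative"
    if p < 0.05:
        return "strong evidence for the alternative"
    if p < 0.10:
        return "gray area"
    return "weak or no evidence for the alternative"


# Hypothetical p-values, one per category.
for p in (0.004, 0.03, 0.07, 0.25):
    print(f"p = {p}: {interpret_p_value(p)}")
```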
Example 10.1a Experimenting with a New Pizza Style at the Pepperoni Pizza Restaurant Hypothesis Tests for a Population Mean
Objective To use a one-sample t test to see whether consumers prefer the new style pizza to the old style.
Background Information Recall that the manager of the Pepperoni Pizza Restaurant is running an experiment to test the hypotheses of H 0 : μ ≤ 0 versus H a : μ > 0, where μ is the mean rating in the entire customer population. Here, each customer rates the difference between an old-style pizza and a new-style pizza on a -10 to +10 scale, where negative ratings favor the old-style pizza and positive ratings favor the new-style pizza.
PIZZA.XLS The ratings of 40 randomly selected customers and several summary statistics appear in this file and in the following table.
Summary Statistics From the summary statistics, we see that the sample mean is 2.10 and the sample standard deviation is approximately 4.72. The positive sample mean provides some evidence in favor of the alternative hypothesis, but given the rather large standard deviation and the boxplot of ratings shown on the next slide, does it provide enough evidence to reject H 0 ?
Summary Statistics – cont’d (Boxplot of ratings.)
Running the Test To run the test, we calculate the test statistic, using the borderline null hypothesis value μ 0 = 0, and report how much probability is beyond it in the right tail of the appropriate t distribution. We use the right tail because the alternative is one-tailed of the “greater than” variety.
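Running the Test – cont’d The test statistic is $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = 2.816$, where $\bar{x} = 2.10$ is the sample mean, $s$ is the sample standard deviation, and $n = 40$. The probability beyond this value in the right tail of the t distribution with n-1 = 39 degrees of freedom is approximately 0.004, which can be found in Excel with the function TDIST(2.816,39,1).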
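For readers without Excel, the same right-tail probability can be approximated with the Python standard library alone. The sketch below is not how Excel or StatTools computes it; it simply integrates the t density numerically with Simpson's rule, and the function names `t_pdf` and `t_right_tail` are our own.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_right_tail(t, df, steps=2000):
    """Approximate P(T > t) using Simpson's rule on [0, |t|] and symmetry.

    steps must be even for Simpson's rule.
    """
    if t < 0:
        return 1.0 - t_right_tail(-t, df, steps)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2          # odd points get 4, even points get 2
        total += weight * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t                # half the probability lies above 0


# Test statistic and degrees of freedom from the pizza example.
p_value = t_right_tail(2.816, 39)
print(f"right-tail probability: {p_value:.4f}")
```

The third argument of 1 in TDIST requests a one-tailed probability, which is what `t_right_tail` returns for a positive test statistic.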
Running the Test – cont’d The probability, 0.004, is the p-value for the test. It indicates that these sample results would be very unlikely if the null hypothesis were true. The manager has two choices: he can conclude that the null hypothesis is true and that a very unlikely sample occurred, or he can conclude that the alternative hypothesis is true and presumably switch to the new-style pizza. The second choice appears to be more reasonable.
Using StatTools Another way to interpret the results is in terms of traditional significance levels, but the p-value is the preferred method.
Using StatTools – cont’d The StatTools One-Sample procedure can be used to perform this analysis easily. To use it, select the StatTools/Statistical Inference/One-Sample Analysis menu item, and choose the Rating variable as the variable to analyze. Then fill in the dialog boxes as shown on the following slides.
One-Sample Hypothesis Test Dialog Box
The Results Most of this output should be familiar; it mirrors the previous calculations. The results are significant at the 1% level.
Conclusion Should the manager switch to the new-style pizza on the basis of these sample results? We would probably recommend “yes”. There is no indication that the new-style pizza costs any more to make than the old-style pizza, and the sample evidence is fairly convincing that customers, on average, will prefer the new-style pizza.
Conclusion – cont’d Therefore, unless there are reasons for not switching (for example, costs), we recommend the switch.