Inferential Statistics Part 2: Hypothesis Testing Chapter 9 p. 280 - 306.

1 Inferential Statistics Part 2: Hypothesis Testing Chapter 9 p. 280 - 306

2 Introduction Hypothesis testing is closely related to estimation (i.e., what we studied last week) The difference is that now we are posing a hypothesis that we want to test For example, rather than just estimating a population parameter using a sample, we may hypothesize that a sample is different from the population in some way Based on a sample statistic we can either accept or reject the hypothesis

3 Steps in Classical Hypothesis Testing 1: Formulate a hypothesis 2: Specify the sampling statistic and its distribution 3: Select a level of significance 4: Construct a decision rule 5: Compute a value of the test statistic 6: Decide to accept or reject the hypothesis

4 Formulate a hypothesis Null Hypothesis (H 0 ) – when the sample statistic follows the population parameter (e.g., when characteristics from a sample more or less match those from the population) Alternative Hypothesis (H A ) – When the sample statistic does not follow the population parameter Possible statements:

5 Formulate a hypothesis Which type of hypothesis (null or alternative) are we typically concerned with? How do “tails” of a distribution fit the statements? What does it mean to say these hypotheses are mutually exclusive & exhaustive?

6 Formulate a hypothesis Remember that the hypotheses are being tested using sample data that may contain sampling error This is why hypothesis testing falls under the category of inferential statistics We have to infer results based on a sample We can’t be completely certain of the results, so there is a degree of uncertainty associated with our answers To estimate this uncertainty we rely upon probability

7 Types of error Type 1 Error: when we falsely reject a null hypothesis; the probability of doing so is labeled α (i.e., α = P(type 1 error)) Type 2 Error: when we falsely accept a null hypothesis; the probability of doing so is labeled β (i.e., β = P(type 2 error))

              H 0 is true     H 0 is false
Reject H 0    Type 1 Error    No Error
Accept H 0    No Error        Type 2 Error
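The meaning of α can be checked with a small simulation. The sketch below is illustrative only: the population values (mean 50, standard deviation 3, n = 30), the seed, and the function name `type1_error_rate` are assumptions, not part of the slides. It draws many samples from a population where H 0 is true and counts how often a one-tailed z-test rejects it anyway.

```python
import random
from statistics import NormalDist, fmean

def type1_error_rate(alpha=0.05, n=30, trials=20000, seed=0):
    """Fraction of samples drawn under a true H0 that a one-tailed z-test rejects."""
    rng = random.Random(seed)
    mu0, sigma = 50.0, 3.0                       # assumed population; H0 is true here
    z_crit = NormalDist().inv_cdf(1 - alpha)     # upper critical z for this alpha
    critical_value = mu0 + z_crit * sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        sample_mean = fmean([rng.gauss(mu0, sigma) for _ in range(n)])
        if sample_mean > critical_value:         # a false rejection (type 1 error)
            rejections += 1
    return rejections / trials

rate = type1_error_rate()
print(f"empirical type 1 error rate: {rate:.3f}")
```

The printed rate should land close to the chosen α, since every rejection in this setup is a type 1 error by construction.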

8 Specify the sampling statistic and its distribution What sampling statistic should you choose for μ, σ, and pi respectively? What distributions will the sampling statistics have and how do we know? FYI, when used to test a hypothesis, sampling statistics are also called test statistics

9 Select a level of significance In classical hypothesis testing we are only concerned with type 1 error (α) For example: alpha of 0.1, 0.05, or 0.01 The value for alpha is called the significance level This means that if we reject H 0 we will be very confident that it is false How confident depends on the significance level The flip-side of this approach is that we are more likely to not reject a null hypothesis that is false

10 Select a level of significance How does this fit with the idea that we are typically concerned with H A rather than H 0 ? Answer: since the significance is tied to rejecting H 0 it is also linked with accepting H A This means that the hypotheses we make should be structured so that we are testing H A (i.e., rejecting H 0 should be scientifically interesting) To make this more clear, think about the opposite case: if we were really interested in accepting H 0 we would have no idea about the significance because we are ignoring type 2 error

11 Select a level of significance Whenever we report a decision about the null hypothesis (to reject it or not) we also report the statistical significance Example: The null hypothesis is rejected at the 0.05 significance level

12 Select a level of significance Which significance level we actually choose depends on the application When might we want a very small α? In geography 0.1, 0.05, and 0.01 are pretty typical It is also common to see results reported for multiple alphas

13 Construct a decision rule For this step we take the hypothesis we’ve defined and the significance level we’ve selected and determine the critical region and the critical values In other words, we take our values, and determine the thresholds for accepting or rejecting H 0

14 Construct a decision rule Critical Regions: if the sample statistic falls within these area(s) we will reject H 0 Critical Values: the thresholds that divide the critical region(s) from the non-critical region

15 Construct a decision rule For a test statistic with a normal distribution (e.g., x̄ and p) we make our decision rule using: For p, the equation is: For x̄ the equation is: Key things to remember How to calculate σ The number of tails
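The equations on this slide did not survive in the transcript. The standard textbook forms are sketched below, under the assumption that μ 0 and π 0 denote the hypothesized population mean and proportion (symbols introduced here for illustration, not taken from the slide). They are consistent with the critical value of 50.9037 worked out later in the Charlotte example.

```latex
% Upper critical value for a one-tailed test.
% For a lower-tailed test subtract instead; for a two-tailed test use \pm z_{\alpha/2}.
\bar{x}_{crit} = \mu_0 + z_{\alpha}\,\sigma_{\bar{x}}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}

p_{crit} = \pi_0 + z_{\alpha}\,\sigma_{p}, \qquad \sigma_{p} = \sqrt{\frac{\pi_0\,(1 - \pi_0)}{n}}
```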

16 Compute a value of the test statistic Here we just compute the values using equations we’re familiar with (e.g., x̄ and p) Note that constructing a decision rule and computing the values of a test statistic can also be done using z-values for the critical values and for the test statistic (see p. 289 for details)

17 Decide to accept or reject the hypothesis Now we just compare the test statistic with the critical values and make our decision to reject H 0 or not

18 Classical Hypothesis Testing Example Has the mean temperature of Charlotte increased over the last 30 years? This is an example for μ

19 Example Data Suppose Charlotte’s annual mean temp for the last: 150 years is 50 °F; 30 years is 53 °F. Suppose the population variance, σ², for these 150 years is 9 (so σ = 3) Assumptions: Each year is independent of other years The last 30 years act as a sample of the population of years since greenhouse gases have been emitted into the Earth’s atmosphere. (These 30 are all we have access to). These 30 years come from the same distribution.

20 Steps in Classical Hypothesis Testing 1: Formulate a hypothesis 2: Specify the sampling statistic and its distribution 3: Select a level of significance 4: Construct a decision rule 5: Compute a value of the test statistic 6: Decide to accept or reject the hypothesis

21 Step 1: Formulate a hypothesis Scientifically, we say our hypothesis is: the mean temperature of Charlotte has increased over the last 30 years Statistically, we develop Null hypothesis H 0 : Θ ≤ Θ 0 Alternative hypothesis H A : Θ > Θ 0 When we apply the data: Null hypothesis H 0 : μ ≤ 50 °F Alternative hypothesis H A : μ > 50 °F This is a 1-sided test

22 Step 2: Specify the sampling statistic and its distribution What sampling statistic should we use? What distribution will it have? Answers: The sample mean (in this case 53 °F) A normal distribution Our sample size is 30, which is just large enough to use the z rather than the t distribution This is an application of the central limit theorem

23 The sample statistic & the hypothesis If x̄ is below or near 50, we do not reject the null hypothesis: H 0 : μ ≤ 50 °F. If x̄ is far greater than 50, we reject the null hypothesis in favor of the alternative hypothesis: H A : μ > 50 °F. Why isn’t this simple comparison sufficient? Answer: because x̄ comes from just one sample and may contain sampling error We set a cutoff point for x̄, above which we reject our null hypothesis. This cutoff is set at a point where, if the null hypothesis were true, a value of x̄ this large or larger would be very unlikely (due to sampling variation alone).

24 Step 3: Select a level of significance This step is always somewhat arbitrary, but we’ll just use 0.05 This means that we’re willing to accept a 5% chance of having a type 1 error (i.e., rejecting H 0 when we should not)

25 Step 4: Construct a decision rule

26 So we say that we will reject H 0 if x̄ > 50.9037, with a significance level of 0.05

27 Steps 5 & 6 Step 5: Compute a value of the test statistic In this case we already have the test statistic (x̄ = 53) Step 6: Decide to reject the null hypothesis (or not) Now we just compare our test statistic with the critical value Since 53 > 50.9037 we will reject the null hypothesis and accept the alternative hypothesis
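Steps 4 through 6 of the Charlotte example can be reproduced in a few lines. This is a sketch using only the numbers given in the slides; the variable names are illustrative. The slide's critical value of 50.9037 appears to come from the rounded table value z = 1.65, so both that and the more precise z are shown.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 50.0, 3.0, 30        # hypothesized mean, population sd, sample size
x_bar, alpha = 53.0, 0.05            # sample mean and significance level

std_error = sigma / sqrt(n)          # standard deviation of the sample mean

z_table = 1.65                                   # rounded table value (assumed source of 50.9037)
z_exact = NormalDist().inv_cdf(1 - alpha)        # about 1.645

crit_table = mu0 + z_table * std_error
crit_exact = mu0 + z_exact * std_error

print(f"critical value with z = 1.65: {crit_table:.4f}")
print(f"critical value with exact z:  {crit_exact:.4f}")
print("reject H0" if x_bar > crit_table else "do not reject H0")
```

Either choice of z leads to the same decision here, because 53 is far above both critical values.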

28 Shortcomings of the classical approach The decision to reject the null hypothesis is binary No detail is given for how far the test statistic is from the critical value (e.g., is it just above it, or way above it) Different α values might lead to different decisions

29 The PROB-VALUE approach This approach addresses the shortcomings of the classical approach Basically it involves using the same equations, but flipping them around so that we solve for α In other words: At what level is the test statistic significant? What is the α (i.e., the probability of making a type 1 error)? Should we reject H 0 , how likely are we to be wrong?

30 The PROB-VALUE approach This is based on the equation: The difference from the classical approach is that now we look up the z-value to tell us the alpha (α)
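The equation on this slide was lost in the transcript. A likely form, assuming μ 0 denotes the hypothesized mean (a symbol introduced here, not used in the slides), is the standardized test statistic:

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
```

For an upper-tailed test the PROB-VALUE is then the area under the standard normal curve to the right of this z.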

31 PROB-VALUE example Charlotte Example: z = (53 − 50) / (3 / √30) = 5.477 Using a z-table, what alpha is associated with this z? Answer: α = 0.000000021602 This value is actually from Excel; the z-table in the book does not go up to 5.477 In other words, there is a 2.16 in 100 million chance of the null being falsely rejected
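The Excel lookup mentioned on this slide can be reproduced with Python's standard library. This is a sketch using the Charlotte numbers; the variable names are illustrative. The upper-tail area is computed as the CDF at −z, which equals 1 − CDF(z) by symmetry but keeps more precision for very small tails.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar = 50.0, 3.0, 30, 53.0   # values from the Charlotte example

z = (x_bar - mu0) / (sigma / sqrt(n))        # standardized test statistic
prob_value = NormalDist().cdf(-z)            # upper-tail area beyond z

print(f"z = {z:.3f}")
print(f"PROB-VALUE = {prob_value:.3e}")
```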

32 PROB-VALUE & alpha Remember that the PROB-VALUE is equivalent to finding the alpha associated with a z-value Therefore we can also use the PROB-VALUE to reject a H 0 (or not) Example: If our selected significance level is 0.05 And our PROB-VALUE is 0.00001 We’d reject the null hypothesis since 0.00001 < 0.05

33 Additional things to consider As with confidence intervals, when conducting a hypothesis test using μ we should use t instead of z when: n < 30, or we have s instead of σ (with n > 30 either is ok) As with confidence intervals, when conducting a hypothesis test using π we should use the binomial distribution instead of z when: n < 100 Example 9-4 in the book solves such a problem

34 Sample Problems Galore! We’re going to go through several examples that are reminiscent of problems on your homework and what will be on the exam

35 Key questions to ask before starting What is the test statistic? x̄ and p have slightly different equations, particularly for their standard deviations How many tails does the test have? This determines whether we use α or α/2, and whether we multiply the PROB-VALUE by 2 If we are doing a 1-tailed test, which critical value are we concerned about? H A uses <: lower critical value; H A uses >: upper critical value What distribution should we use (t, z, or binomial)?

36 Sample Problem #1 A census of UNC students found that students had, on average, 3.4 pets each while growing up with a standard deviation of 1.9 pets. A single dorm with 220 students had an average of 3.65 pets growing up. Assuming the students are assigned to the dorm at random (i.e., they are statistically independent), does this dorm have a higher than normal “pet history” with a 0.01 significance level?

37 Sample Problem #1 What is the test statistic? How many tails does the test have? Which critical value are we concerned about? Putting these together - what are H 0 and H A ?

38 Sample Problem #1 What are n, σ, and α? n = 220 σ = 1.9 α = 0.01 What distribution should we use and why? The z-distribution since n > 30 What is the z-value associated with α? z 0.01 = 2.33 What is the standard deviation of x̄?

39 Sample Problem #1 Critical Value Should we reject the null hypothesis?

40 Sample Problem #1 What would happen to the critical value if we changed the significance level to 0.05? Does this make us more or less likely to reject the null hypothesis?

41 Sample Problem #1 PROB-VALUE What values go in this equation? What do we do with the resulting z-value? What is the PROB-VALUE
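The questions posed for Sample Problem #1 can be checked numerically. The sketch below uses only the numbers in the problem statement plus the usual table value z = 1.645 for α = 0.05 (an assumption, since the slide does not give it); the helper name `upper_critical_value` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar = 3.4, 1.9, 220, 3.65   # values from the problem statement
std_error = sigma / sqrt(n)                   # standard deviation of the sample mean

def upper_critical_value(z_alpha):
    """Upper critical value of the sample mean for a one-tailed z-test."""
    return mu0 + z_alpha * std_error

crit_01 = upper_critical_value(2.33)          # z for alpha = 0.01, from the slide
crit_05 = upper_critical_value(1.645)         # z for alpha = 0.05, standard table value

z = (x_bar - mu0) / std_error
prob_value = NormalDist().cdf(-z)             # upper-tail area beyond z

print(f"critical value at 0.01: {crit_01:.4f}")
print(f"critical value at 0.05: {crit_05:.4f}")
print(f"z = {z:.3f}, PROB-VALUE = {prob_value:.4f}")
```

With these numbers the dorm mean of 3.65 falls below the 0.01 critical value but above the 0.05 critical value, which illustrates the earlier point that different α values can lead to different decisions.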

42 Sample Problem #2 A census of UNC students found that students had, on average, a 12 minute commute (walking, bicycling, bus, car, etc.) to their first class of the day. 16 randomly sampled students living off campus had an average commute of 17 minutes with a sample standard deviation of 4.5 minutes. Do students living off campus have a longer commute with a 0.05 significance level?

43 Sample Problem #2 What is the test statistic? How many tails does the test have? Which critical value are we concerned about? Putting these together - what are H 0 and H A ?

44 Sample Problem #2 What are n, s, and α? n = 16 s = 4.5 α = 0.05 What distribution should we use and why? The t-distribution since n < 30 and we have s instead of σ What is the t-value associated with α? t 0.05,15 = 1.75 What is the standard deviation of x̄?

45 Sample Problem #2 Critical Value Should we reject the null hypothesis?

46 Sample Problem #2 PROB-VALUE What values go in this equation? What do we do with the resulting t-value? What is the PROB-VALUE?
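Sample Problem #2 can be worked the same way, with t in place of z. Python's standard library has no t-distribution, so the sketch below approximates the tail area by integrating the t density with Simpson's rule. That numerical approach and the helper names `t_pdf` and `t_upper_tail` are choices made here for illustration; the slides presumably intend a t-table lookup.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=20000):
    """Approximate P(T > t) for t >= 0 via Simpson's rule on [0, t]."""
    h = t / steps                              # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3                 # half the area lies above zero

mu0, x_bar, s, n = 12.0, 17.0, 4.5, 16         # values from the problem statement
df = n - 1
std_error = s / sqrt(n)                        # 4.5 / 4 = 1.125
critical_value = mu0 + 1.75 * std_error        # t(0.05, 15) = 1.75, from the slide
t_stat = (x_bar - mu0) / std_error
prob_value = t_upper_tail(t_stat, df)

print(f"critical value: {critical_value:.4f}")
print(f"t = {t_stat:.3f}, PROB-VALUE = {prob_value:.2e}")
```

The sample mean of 17 minutes is well above the critical value, so the null hypothesis would be rejected at the 0.05 level.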

47 Sample Problem #3 A botanical index states that the average weight of a northern red oak acorn is 6 grams. A random sample of 101 acorns was collected from the red oaks in the quad and the acorns had an average weight of 5.6 grams and a sample standard deviation of 1.3 grams. Are the oak trees in the quad atypical from normal trees with a significance of 0.05?

48 Sample Problem #3 What is the test statistic? How many tails does the test have? Which critical value are we concerned about? Putting these together - what are H 0 and H A ?

49 Sample Problem #3 What are n, s, and α? n = 101 s = 1.3 α = 0.05 What distribution should we use and why? Either one would be ok, but since we’re using s we’ll go with t What is the t-value associated with α/2? t 0.025,100 = 1.98 Note how close this is to z 0.025 = 1.96 What is the standard deviation of x̄?

50 Sample Problem #3 Critical Value Should we reject the null hypothesis?

51 Sample Problem #3 PROB-VALUE What values go in this equation? What do we do with the resulting t-value? What is the PROB-VALUE?
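For reference, the arithmetic behind Sample Problem #3 can be worked out from the numbers given, treating it as a two-tailed test with the hypothesized mean of 6 grams. The values below are rounded.

```latex
\sigma_{\bar{x}} \approx \frac{s}{\sqrt{n}} = \frac{1.3}{\sqrt{101}} \approx 0.1294

\text{critical values} = 6 \pm 1.98 \times 0.1294 \approx 5.744 \ \text{and} \ 6.256

t = \frac{5.6 - 6}{0.1294} \approx -3.09
```

Since 5.6 is below the lower critical value of about 5.744 (equivalently, |t| exceeds 1.98), the null hypothesis would be rejected at the 0.05 level. Because the test is two-tailed, the PROB-VALUE is twice the tail area beyond |t|.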

52 Sample Problem #4 Suppose a census of UNC students found that 8 percent of students bike to class regularly. A random sample of 160 business majors found that 7 biked regularly. It would seem that business majors bike less than other students; what significance level does this statement have?

53 Sample Problem #4 What is the test statistic? How many tails does the test have? Which critical value are we concerned about? Putting these together - what are H 0 and H A ?

54 Sample Problem #4 What are n, π, and p? n = 160 π = 0.08 p = 7/160 = 0.04375 What distribution should we use and why? The z-distribution since we have proportions and a large n What is the standard deviation of p?

55 Sample Problem #4 PROB-VALUE What values go in this equation? What do we do with the resulting z-value? What is the PROB-VALUE

56 Sample Problem #4 What does a PROB-VALUE of 0.046 indicate about our statement?
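The PROB-VALUE quoted for Sample Problem #4 can be reproduced as follows. This is a sketch using the numbers in the problem statement; the variable names are illustrative, and the test is treated as lower-tailed because the claim is that business majors bike less.

```python
from math import sqrt
from statistics import NormalDist

pi0, n, bikers = 0.08, 160, 7          # census proportion, sample size, sample count
p = bikers / n                          # sample proportion, 0.04375

std_error = sqrt(pi0 * (1 - pi0) / n)   # standard deviation of p under H0
z = (p - pi0) / std_error
prob_value = NormalDist().cdf(z)        # lower-tail area, since H A is pi < 0.08

print(f"z = {z:.3f}")
print(f"PROB-VALUE = {prob_value:.4f}")
```

The result is about 0.0455, which matches the 0.046 on the slide once z is rounded to −1.69 for a table lookup. The statement is therefore significant at the 0.05 level but not at the 0.01 level.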

57 Statistical Significance vs. Practical Significance What are all these tests really telling us? They tell us about the presence of a difference (whether two values are equal or not), which can be scientifically uninteresting on its own Two approaches for managing this situation: Test only important hypotheses Use confidence intervals rather than hypothesis tests

