SUMMARY. Central limit theorem Statistical inference If we can’t conduct a census, we collect data from the sample of a population. Goal: make conclusions.


1 SUMMARY

2 Central limit theorem

3 Statistical inference If we can’t conduct a census, we collect data from a sample of the population. Goal: make conclusions about that population

4 Confidence interval Point estimate, interval estimate, margin of error, critical value, confidence interval, confidence level
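To make the terms on this slide concrete, here is a minimal sketch of a Z-based confidence interval. The function name `confidence_interval` and the input values (sample mean 50, population standard deviation 10, sample size 100) are hypothetical and chosen only for illustration; they do not come from the slides. The sketch also assumes the population standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper, margin_of_error) for a Z-based interval.

    Assumes the population standard deviation `sigma` is known.
    """
    # Critical value Z*: leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Margin of error = critical value * standard error of the mean.
    margin = z_star * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin


# Hypothetical inputs, for illustration only.
point_estimate = 50.0
lower, upper, margin = confidence_interval(point_estimate, sigma=10.0, n=100)
print(f"point estimate: {point_estimate}")
print(f"margin of error: {margin:.2f}")
print(f"interval estimate: ({lower:.2f}, {upper:.2f})")
```

The point estimate is the single sample mean, while the interval estimate is that mean plus or minus the margin of error at the chosen confidence level.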

5

6 HYPOTHESIS TESTING

7 Aim of hypothesis testing: decision making

8 Engagement self-assessment Hopefully, you like this course so far. How can we measure this? On a scale from 1 to 10, self-report how engaged you think you are during the lecture (1 is the lowest value, 10 is the highest value).

9 Engagement distribution (figure; horizontal axis: engagement score from 1 to 10)

10 Engagement distribution (figure; horizontal axis: engagement score from 1 to 10)

11 Hypothesis testing song

12 Null hypothesis (figure: population without the song and population with the song) Null hypothesis: I assume that the populations without and with the song are the same.

13 Sampling from population (figure; population mean: 7.8)

14 Sampling from population (figure; population mean: 7.8) The mean engagement of 30 randomly chosen students from the student population can lie anywhere on the blue curve.
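The idea that the sample mean "can lie anywhere on the blue curve" can be illustrated with a small simulation. This is only a sketch: the population standard deviation `SIGMA = 0.77` is an assumption (the transcript does not state it), and the population is assumed to be normal purely for convenience. Only the population mean of 7.8 and the sample size of 30 come from the slides.

```python
import random
from math import sqrt
from statistics import mean, stdev

MU = 7.8       # population mean, from the slides
SIGMA = 0.77   # ASSUMPTION: population standard deviation is not given in the slides
N = 30         # sample size, from the slides

rng = random.Random(0)  # fixed seed so the run is repeatable

# Draw many samples of 30 students and record each sample's mean.
sample_means = [
    mean([rng.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(5000)
]

print(f"mean of sample means: {mean(sample_means):.3f}")
print(f"spread of sample means: {stdev(sample_means):.3f}")
print(f"sigma / sqrt(n):        {SIGMA / sqrt(N):.3f}")
```

Under these assumptions the sample means cluster around the population mean, with a spread close to the standard error σ/√n, which is what the central limit theorem predicts.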

15 Hypothesis testing song (figure; population mean: 7.8, sample mean after the song: 8.2)

16 Did the song help? If we assume that the null hypothesis is true, what is the probability that we randomly chose 30 students whose mean engagement is at least 8.2? Use Z-tables. (figure; population mean: 7.8, sample mean: 8.2)
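The Z-table lookup asked for here can be sketched in code. The population standard deviation is not given in this transcript, so `sigma = 0.77` below is an assumption, picked because it roughly reproduces the 0.22% quoted on the following slides. The population mean, sample mean, and sample size are taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

mu = 7.8            # population mean under the null hypothesis (from the slides)
sample_mean = 8.2   # observed mean after the song (from the slides)
n = 30              # sample size (from the slides)
sigma = 0.77        # ASSUMPTION: not stated in the slides

# Standard error of the mean and the Z-score of the observed sample mean.
standard_error = sigma / sqrt(n)
z = (sample_mean - mu) / standard_error

# One-tailed probability of a sample mean at least this high if H0 is true.
p_value = 1 - NormalDist().cdf(z)

print(f"Z = {z:.2f}")
print(f"P(sample mean >= {sample_mean}) = {p_value:.4f}")
```

With a different standard deviation the Z-score and the probability would change, so the output should be read as an illustration of the calculation rather than as the course's exact numbers.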

17 Conclusion If we assume that the two populations are the same (i.e., the song does not influence engagement and the null hypothesis is true), then the observed sample mean of 8.2 arose only because we happened to choose 30 students who are substantially more engaged than expected (the expected value, i.e. the population mean, is 7.8). However, it is very unlikely that we would select such 30 students just by chance: the probability is only 0.22%.

18 Conclusion But we did observe a mean of 8.2 in our sample! How is that possible if the probability of observing a result at least this high is only 0.22%? The most plausible explanation is that our assumption that the two populations are identical is not correct. We should reject the null hypothesis. Conclusion: Because the probability is so low, we interpret 8.2 as a significant increase over 7.8, caused by the undeniable pedagogical qualities of the 'Hypothesis testing song'. HURRAY, THE SONG WORKS

19 Clear-cut decision We decided that the song works because the probability of 0.22% is so low that it is very unlikely that we observed the mean of 8.2 just by chance. However, what if the probability were 1%? Is that likely or unlikely? Or 3%? Or 5%? Or 8%? Or 15%? Where is the line between likely and unlikely? If you have such a line, you can make a clear-cut decision. If the probability is lower than this line, the null hypothesis is rejected. If the probability is higher than this line, we fail to reject the null hypothesis.

20 Lingo The probability by which we decide whether the result is likely or unlikely is called the p-value. In our example, the p-value is 0.0022. The line between likely and unlikely is called the α level (significance level). If the p-value is less than the α level, the result is considered unlikely.

21 α levels The α level (or significance level) is our criterion for deciding whether something is likely or unlikely. If the probability of getting the sample mean is less than the chosen α level, it is considered unlikely. Commonly used α levels: 0.05 (5%), 0.01 (1%), 0.001 (0.1%).
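As a small illustration of the "line between likely and unlikely", the sketch below compares the p-value from the song example (0.0022) with the three α levels listed on this slide. The helper name `is_unlikely` is hypothetical.

```python
def is_unlikely(p_value, alpha):
    """A result counts as unlikely when its p-value is below the alpha level."""
    return p_value < alpha


p_value = 0.0022  # p-value from the song example in the slides

# Check the same p-value against each of the three alpha levels.
decisions = {alpha: is_unlikely(p_value, alpha) for alpha in (0.05, 0.01, 0.001)}

for alpha, unlikely in decisions.items():
    verdict = "unlikely (reject H0)" if unlikely else "not unlikely (fail to reject H0)"
    print(f"alpha = {alpha}: {verdict}")
```

The same p-value leads to rejection at α = 0.05 and α = 0.01 but not at α = 0.001, which is why the α level has to be chosen before looking at the result.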

22 Hypothesis testing

23 Null hypothesis H₀ states that nothing happened, there is no change, no difference. Does the song improve students' engagement? H₀: students' engagement is the same regardless of the song. Does a diet coke taste different from a full-fat coke? H₀: there is no difference in taste between diet and full-fat cokes. Does this drug increase blood pressure? H₀: this drug causes no increase in blood pressure. Is this a fair coin? H₀: the coin is fair.

24 Alternative hypothesis The alternative hypothesis states the opposite of the null and is usually the hypothesis you are trying to prove. Does the song improve students' engagement? Hₐ: the song improved students' engagement. Does a diet coke taste different from a full-fat coke? Hₐ: a diet coke tastes worse or better than a full-fat coke. Does this drug increase blood pressure? Hₐ: this drug leads to an increase in blood pressure. Is this a fair coin? Hₐ: the coin is unfair.

25 Hypothesis testing Formulate the null hypothesis. There is no statistical testing without the null. You assume that H₀ is true. Then, you collect data. If you find enough evidence in your data for rejecting H₀, you do so and accept Hₐ. Otherwise, you fail to reject H₀. If H₀ is rejected at the given significance level (i.e., Hₐ is accepted), the result is called statistically significant.

26 Z-critical value If the Z-score of the sample mean is greater than the Z-critical value (Z*), we have evidence that this mean differs from that of the regular population (the population that did not watch the musical lesson). If the probability of obtaining a particular sample mean is less than the α level, the mean falls in the tail beyond Z*, which is called the critical region.

27 Critical regions

28 Test statistic

29 Quiz Sampling distribution Z*

30 Another engagement score

31

32 Two-tailed test Z = ?? A mean engagement score of 7.13 (Z = -4.83) is significant at p < 0.05
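For a two-tailed test, the p-value counts both tails of the standard normal distribution. The following sketch takes the Z-score from this slide (Z = -4.83) as given and computes the corresponding two-tailed p-value; it does not try to reconstruct the underlying mean or standard deviation, which are not in the transcript.

```python
from statistics import NormalDist

z = -4.83  # Z-score of the sample mean 7.13, as given on the slide

# Two-tailed p-value: probability of a Z-score at least this far from 0
# in either direction, assuming the null hypothesis is true.
p_two_tailed = 2 * NormalDist().cdf(-abs(z))

print(f"two-tailed p-value: {p_two_tailed:.2e}")
print(f"significant at 0.05:  {p_two_tailed < 0.05}")
print(f"significant at 0.001: {p_two_tailed < 0.001}")
```

Because the sign of Z is discarded with `abs`, the same code covers sample means that fall in the left tail, as here, or in the right tail.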

33 Z-critical values

α level   One-tailed Z*   Two-tailed Z*
0.05      ±1.65           ±1.96
0.01      ±2.32           ±2.57
0.001     ±3.08           ±3.27

In the critical region, we (most likely) did not get the sample mean by chance. The critical region can also be on the left.
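The critical values can also be computed directly from the standard normal distribution instead of being read from a Z-table. This is a sketch with a hypothetical helper `z_critical`. The exact values differ slightly in the second decimal from some entries on the slide (for example about 2.33 rather than 2.32, and about 3.29 rather than 3.27), presumably because the slide values come from a rounded table lookup.

```python
from statistics import NormalDist


def z_critical(alpha, two_tailed=False):
    """Positive Z-critical value for the given alpha level."""
    # A two-tailed test splits alpha evenly between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


print("alpha   one-tailed   two-tailed")
for alpha in (0.05, 0.01, 0.001):
    one = z_critical(alpha)
    two = z_critical(alpha, two_tailed=True)
    print(f"{alpha:<7} ±{one:.3f}       ±{two:.3f}")
```

For a left-sided critical region, the same magnitude applies with a negative sign.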

34 One-tailed and two-tailed one-tailed (directional) test two-tailed (non-directional) test

35 One-tailed or two-tailed In general, we use two-tailed tests. One exception to this general rule is when we’re comparing a new treatment with an established one. In such cases we often care only whether the new treatment is better than the old one, so we would use a one-tailed (directional) test.

36 Alternative hypothesis two-tailed test one-tailed test

37 Quiz: reject the null What does it mean to reject the null using a two-tailed test? Our sample mean falls within/outside the critical region. The Z-score of our sample mean is less than/greater than the Z-critical value. The probability of obtaining the sample mean is less than/greater than the α level.

38 Four steps of hypothesis testing
1. Formulate the null and the alternative hypothesis (this includes choosing a one- or two-tailed test).
2. Select the significance level α, the criterion by which we decide whether to reject the null hypothesis.
--- COLLECT DATA ---
3. Compute the p-value. The p-value is the probability that the data would be at least as extreme as those observed, if the null hypothesis were true.
4. Compare the p-value to the α level. If p ≤ α, the observed effect is statistically significant, the null is rejected, and the alternative hypothesis is accepted.
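The four steps can be put together in one small function. This is a minimal sketch of a one-sample Z-test, not the course's own implementation: the function name `z_test` and its `tail` argument are hypothetical, the population standard deviation is assumed to be known, and the value `sigma=0.77` in the usage example is an assumption, since the transcript does not state it.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu, sigma, n, alpha=0.05, tail="two"):
    """One-sample Z-test. Returns (z, p_value, reject_null).

    Step 1 is encoded by `mu` (the null) and `tail` (the alternative):
    "greater", "less", or "two". Step 2 is the choice of `alpha`.
    """
    z = (sample_mean - mu) / (sigma / sqrt(n))
    norm = NormalDist()
    # Step 3: p-value = probability of data at least as extreme, if H0 is true.
    if tail == "greater":
        p_value = 1 - norm.cdf(z)
    elif tail == "less":
        p_value = norm.cdf(z)
    elif tail == "two":
        p_value = 2 * norm.cdf(-abs(z))
    else:
        raise ValueError("tail must be 'greater', 'less', or 'two'")
    # Step 4: reject H0 when p <= alpha (the rule stated on the slide).
    return z, p_value, p_value <= alpha


# Song example from the slides; sigma is an ASSUMPTION.
z, p, reject = z_test(8.2, mu=7.8, sigma=0.77, n=30, alpha=0.05, tail="greater")
print(f"Z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Choosing `tail` and `alpha` before calling the function mirrors the order of the steps: both decisions are made before the data are examined.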

