Introduction to Data Analysis. Hypothesis Testing for means and proportions.



2 Today’s lecture Hypothesis testing (A&F 6) What’s a hypothesis? Probabilities of hypotheses being correct. Type I and type II errors.

3 What’s a hypothesis? Hypotheses = testable statements about the world. Hypotheses = falsifiable. We test hypotheses by attempting to see if they could be false, rather than ‘proving’ them to be true. All swans are white…? Popper said you cannot prove that all swans are white by counting white swans, but you can prove that not all swans are white by counting one black swan. We normally generate hypotheses from a combination of theory, past empirical work, common sense and anecdotal observations about the world; e.g. young people are more leftwing than their elders. This could come from observations about my friends, work showing this relationship in other countries, or theory suggesting that social ageing makes people more rightwing.

4 Some hypotheses (or not) My girlfriend spends too much on make-up. NO. It’s a normative claim about ‘too much’. My girlfriend spends more than me on make-up. NO. Falsifiable, but not really social science. Women spend more than men on make-up. HOORAY, a social scientific hypothesis. It’s falsifiable, and tells us something about the world. Make-up is used to marginalize the Other as a form of contemporary patriarchialist nihilism. NO. This is too vague to be falsifiable.

5 Hypothesis testing Two different types of hypothesis. Descriptive inference (e.g. old people are more religious than young people). Causal inference (e.g. old people are more religious than young people because they think about death more). We test statistical hypotheses using significance testing. This is a way of statistically testing a hypothesis by comparing the data we have to values predicted by the hypothesis.

6 Null and alternative hypotheses When we’re testing hypotheses, we want to choose between two conflicting statements. Null hypothesis (H 0 ) is directly tested. This is a statement that the parameter we are interested in takes the value corresponding to no effect. e.g. regarding religiosity, old people are the same as young people. Alternative hypothesis (H a ) contradicts the null hypothesis. This is a statement that the parameter falls into a different set of values than those predicted by H 0. e.g. regarding religiosity, old people are different to young people.

7 A (simple) hypothesis We’re interested in whether new measures to curb MEPs fraudulently claiming for flights are effective. Previously, the population of MEPs managed to claim 16 flights each on expenses (per month). After the new measures are introduced, we sample 100 MEPs and find that they are now managing to only charge 13½ flights a month each to expenses. The standard deviation for this sample is 10 flights. Are EU (well British and German anyway) taxpayers really getting better value for money though?

8 Aviatophobia or less fraud? So in this case the null hypothesis is that there has been no change. H 0 is that MEPs are claiming the same amount per month as they were before, and the difference is just because our sample happens to include a lot of aviatophobic MEPs (or something like that). The alternative hypothesis is that MEPs’ spending has been curbed. H a is that MEPs are claiming less than they were previously.

9 More formally Slightly more formally: The population we are interested in is MEPs after the changes. We want to know whether the population mean ‘number of flights claimed per month’ is different from 16. H 0 is that this population mean is equal to 16. H a is that this population mean is less than 16. The info we have is from one sample mean. Now we imagine that H 0 is true…

10 If H 0 is true…(1) The population mean will be equal to 16. Since n is large(ish) the sampling distribution will be normal and centred around 16. We can calculate the standard error of the sampling distribution.
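As a worked step, assuming the usual large-sample formula for the standard error of a mean (the sample standard deviation s divided by the square root of the sample size n), the MEP figures give:

```latex
\[
SE = \frac{s}{\sqrt{n}} = \frac{10}{\sqrt{100}} = 1
\]
```

On that assumption, the sample mean of 13½ lies 2½ standard errors below the H 0 mean of 16.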

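11 If H 0 is true…? (2) [Figure: normal sampling distribution under H 0, centred on the H 0 mean of 16, with the central 95% of the distribution marked; the sample mean of 13½ falls in the lower tail.]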

12 P-value (1) So we know that H 0 looks pretty unlikely (certainly less than a 2½ per cent chance), but we can actually give a more precise probability. We work out how many standard errors the sample mean is away from H 0 to produce a z-score, and then calculate a p-value from this with reference to the normal probability distribution.

13 P-value (2) If H 0 were correct it looks unlikely that we would get a sample that had a mean of 13½. In fact, if H 0 were true there would be just a 0.006 (or 0.6%) chance of drawing a sample with a mean this low or lower. We could set an (arbitrary) ‘significance level’ that our test must meet. Maybe we need to be 99% confident that we can reject the null hypothesis (i.e. the p-value is less than 1%). Best practice when we perform significance tests is simply to report the p-value, and to make the judgement that p-values of, say, 5% and below are probably good evidence the null hypothesis can be rejected.
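The z-score and p-value for the MEP example can be sketched in a few lines of Python. This is only an illustration: the function name `one_sample_z_test` is made up for this sketch, and it assumes a large-sample z-test with a one-sided (lower-tail) alternative, matching H a "less than 16".

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, null_mean, sample_sd, n):
    """Return (standard error, z-score, lower-tail p-value).

    Assumes n is large enough for the sampling distribution of the
    mean to be treated as normal (hypothetical helper, not from the slides).
    """
    se = sample_sd / sqrt(n)               # standard error of the mean
    z = (sample_mean - null_mean) / se     # standard errors away from H0
    p_lower = NormalDist().cdf(z)          # P(Z <= z) under H0
    return se, z, p_lower


# MEP example: H0 mean 16, sample mean 13.5, sample sd 10, n = 100
se, z, p = one_sample_z_test(sample_mean=13.5, null_mean=16, sample_sd=10, n=100)
print(f"SE = {se:.2f}, z = {z:.2f}, one-sided p = {p:.4f}")
# prints: SE = 1.00, z = -2.50, one-sided p = 0.0062
```

The p-value of about 0.006 agrees with the roughly 0.6% figure quoted on the slide.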

14 Steps for a hypothesis test 1. Check assumptions (i.e. normality, sample size, level of measurement). 2. State hypotheses: null and alternative. 3. Calculate the appropriate test statistic (e.g. z-score). 4. Calculate the associated p-value. 5. Interpret the result.

15 Interpreting a hypothesis test We NEVER accept the null hypothesis. We either reject or fail to reject based on our p-value. We may fail to reject the null hypothesis due to: small sample size; inappropriate research design; biased sample, etc.

16 Type I and type II errors A type I error occurs when we reject H 0, even though it is true. When H 0 is true, this is going to happen 5% of the time if we choose to reject H 0 when the p-value is less than 0.05. A type II error occurs when we do not reject H 0, even though it is false. If our ‘significance level’ is 0.05, then sometimes there will be a real difference and we won’t detect it. The more stringent the significance level, the more difficult it is to detect a real effect, but the more confident we can be that when we find an effect it is real.
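A small simulation can make the two error rates concrete. The sketch below is an illustration under stated assumptions, not something from the slides: it reuses the MEP set-up (H 0 mean 16, standard deviation 10, n = 100), assumes flights are normally distributed, picks a true mean of 14 for the "H 0 is false" case, and uses a one-sided z-test. The function name `rejection_rate` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def rejection_rate(true_mean, null_mean=16.0, sd=10.0, n=100,
                   alpha=0.05, trials=2000, seed=0):
    """Share of simulated samples in which a one-sided z-test rejects H0."""
    rng = random.Random(seed)                # fixed seed for repeatability
    z_crit = NormalDist().inv_cdf(alpha)     # about -1.645 when alpha = 0.05
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        z = (mean(sample) - null_mean) / (stdev(sample) / sqrt(n))
        if z < z_crit:                       # reject H0 in favour of "less than"
            rejections += 1
    return rejections / trials


type_i_rate = rejection_rate(true_mean=16.0)               # H0 true: false alarms
power = rejection_rate(true_mean=14.0)                     # H0 false: real effect found
type_ii_rate = 1 - power                                   # H0 false: effect missed
power_strict = rejection_rate(true_mean=14.0, alpha=0.01)  # stricter level

print(f"type I rate ~ {type_i_rate:.3f}, type II rate ~ {type_ii_rate:.3f}")
print(f"power at alpha 0.05: {power:.3f}, at alpha 0.01: {power_strict:.3f}")
```

The type I rate should come out close to the chosen significance level, and tightening the level from 0.05 to 0.01 can only lower the share of real effects detected, which is the trade-off described above.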

17 Making errors… There is a trade-off between the two types of error. Depending on what we’re doing we may be more willing to accept one sort or the other. Think of this as analogous to a legal trial: we don’t want the guilty to go free (a type II error), but we’d be even unhappier if we executed an innocent person (a type I error). In this case, we might want the significance level to be very low to minimize executing innocent people (but at the same time allowing lots of the guilty to go free).

18 Differences between 2 sample means More normally we’ve sampled two groups and wish to see if they differ. Let’s go back to our religion example. We might be interested in whether women’s church-going differs from men’s. We have 2 samples, 45 men and 55 women. The men have a mean attendance of 7 days a year, with standard deviation of 15. The women have a mean attendance of 10 days a year with standard deviation of 15.

19 Are women more or less religious than men? Essentially, to answer this we estimate the difference between the populations (the parameter) using the difference between the sample means (the statistic). We can run a significance test on this statistic, and work out whether our samples are likely to represent real differences between the populations of men and women. The null hypothesis is therefore that there is no difference between men’s mean churchgoing and women’s mean churchgoing.

20 Z-score So we work out the z-score as before.
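The calculation for the church-going example might look like the sketch below. It assumes the usual large-sample standard error for a difference between two independent means, sqrt(s1²/n1 + s2²/n2), and a null difference of zero; the function name `two_sample_z` is made up for illustration.

```python
from math import sqrt


def two_sample_z(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Return (standard error of the difference, z-score) for H0: no difference.

    Assumes two independent, reasonably large samples (hypothetical helper).
    """
    se = sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)   # SE of (mean_1 - mean_2)
    z = (mean_1 - mean_2 - 0) / se                 # null difference is zero
    return se, z


# Women: mean 10, sd 15, n = 55.  Men: mean 7, sd 15, n = 45.
se, z = two_sample_z(10, 15, 55, 7, 15, 45)
print(f"SE = {se:.2f}, z = {z:.2f}")
# prints: SE = 3.02, z = 0.99
```

Rounded, these are the standard error of about 3 and the z-score of about 1 used in the slides that follow.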

21 P-value The standard error of the estimator is ~3, so the z-score is ~1. We have no prior ideas of whether we think women are more or less religious than men, so we just want to test the possibility that they are not the same. i.e. we want to know how likely it would be to get an individual estimate of the difference between the sample means that is either 1 SE greater than the null hypothesis (i.e. zero) or 1 SE less than the null hypothesis.

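22 Two-sided tests (1) [Figure: normal sampling distribution under H 0, centred on the H 0 mean of 0, with SE = ~3. The difference between the sample means is 3 (i.e. sample women is 3 greater than sample men), one SE above zero; 68% of the distribution lies within one SE of the mean.]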

23 Two-sided tests (2) If H 0 is true, the probability of a difference between the two sample means being 3 or greater is 0.16. The probability of a difference between the two sample means being −3 or less is also 0.16. So the p-value for a 2-sided test is 2*(0.16) or 0.32. This value is high (much higher than our 5% cut off value), so we fail to reject H 0 that men and women do not differ in their church attendance.
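A two-sided p-value can be computed from a z-score by doubling one tail of the standard normal distribution. The helper below is a minimal sketch (the name `two_sided_p_value` is hypothetical) and uses the rounded z-score of 1 from the slides.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability, under H0, of a z-score at least this far from zero in either direction."""
    upper_tail = 1 - NormalDist().cdf(abs(z))   # P(Z >= |z|)
    return 2 * upper_tail                       # both tails are equally likely under H0


p = two_sided_p_value(1.0)
print(f"two-sided p = {p:.3f}")
# prints: two-sided p = 0.317
```

This matches the 2*(0.16) on the slide up to rounding, and is well above a 5% cut-off.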

24 Two-sided tests (3) Regardless of our theoretical expectations, the convention is to use two-tailed tests. Why? In essence, this makes it even more difficult to find results just due to chance. We normally don’t have very strong prior information about the difference. One-tailed tests are often the hallmark of someone trying to make something out of nothing. What does it mean to use a one-tailed test? It is not necessarily bad; in fact it is arguably MUCH smarter.

25 Significance tests and CIs (1) Notice that our significance test has ended up looking rather similar to the CIs. We could use a CI around the difference between the two sample means to test the hypothesis that they are the same. A 95% CI would just be the difference ± 1.96*SE. We’ve just worked out the SE (it’s approximately 3).

26 Significance tests and CIs (2) The 95% CI encloses zero (which was our null hypothesis, that women are the same as men). CIs and significance tests are doing the same job, just presenting the information in a slightly different way.
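The interval itself is quick to compute. The sketch below assumes a normal-based interval around the observed difference of 3 with the rounded standard error of 3; the function name `confidence_interval` is made up for illustration.

```python
from statistics import NormalDist


def confidence_interval(estimate, se, level=0.95):
    """Normal-based confidence interval: estimate +/- z* x SE."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    margin = z_star * se
    return estimate - margin, estimate + margin


low, high = confidence_interval(estimate=3, se=3)
print(f"95% CI for the difference: ({low:.2f}, {high:.2f})")
# prints: 95% CI for the difference: (-2.88, 8.88)
print("CI encloses zero:", low < 0 < high)
# prints: CI encloses zero: True
```

Because the interval runs from below zero to above zero, it leads to the same conclusion as the two-sided test at the 5% level: we fail to reject H 0.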

27 Proportions and significance tests This means of course that all I said about proportions and CIs applies to proportions and significance tests. A hypothesis related to a previous example is that one of the candidates does have a lead. Therefore the null hypothesis is that both candidates have a vote share of 50%.

28 Proportions example – the return Sample is 1000, proportion voting Democrat is 0.45 (or 45%). Null hypothesis is that the population proportion is 0.5. Thus, according to my sample, it is very likely that the Presidential race is not a dead-heat. Note: The SE is calculated with the null hypothesis proportion, not with the sample proportion. Remember, we are testing the null hypothesis, so the mean is the null hypothesis mean.
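The proportions example can be worked through as below. This is a sketch under assumptions: it uses a large-sample z-test, computes the SE from the null proportion as the note above says, and treats the test as two-sided because the hypothesis is only that one of the candidates has a lead. The function name `one_proportion_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(p_hat, p_null, n):
    """Return (standard error, z-score, two-sided p-value) for a sample proportion."""
    se = sqrt(p_null * (1 - p_null) / n)        # SE uses the null proportion
    z = (p_hat - p_null) / se
    p_value = 2 * NormalDist().cdf(-abs(z))     # both tails
    return se, z, p_value


# Sample of 1000, 45% voting Democrat, H0: population proportion is 0.5
se, z, p_value = one_proportion_z_test(p_hat=0.45, p_null=0.5, n=1000)
print(f"SE = {se:.4f}, z = {z:.2f}, two-sided p = {p_value:.4f}")
```

The sample proportion is a little over three standard errors below 0.5, so the p-value is far below 1%, consistent with the conclusion that the race is unlikely to be a dead heat.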

29 Summary Because we know the shape of the sampling distribution, we can work out: Ranges around an individual sample mean that will enclose the population mean X per cent of the time. How probable a particular sample mean would be if a hypothesis about the population mean were true. How probable a difference between two sample means would be if the population means for the groups were the same. All of the above for proportions. Although there are some small complications...

30 Things I haven’t mentioned The reality of statistical hypothesis testing is slightly more complicated than this in some ways, particularly if your sample sizes are small. If the sample size is below 30 or 40 then we need to use a different distribution called the t distribution. To use this we need to assume that the population distribution we are interested in is normal. As sample size increases, the t-distribution looks more and more similar to the z-distribution. This is why significance tests (even for big sample sizes) are often called t-tests. When looking at proportions we often use slightly different tests as well to accommodate the fact that a proportion cannot be greater than 1, or less than 0.