1 Lecture 6 Hypothesis Testing II: Proportions and 2 Populations Graduate School Quantitative Research Methods Gwilym Pryce

Presentation transcript:

1 Lecture 6 Hypothesis Testing II: Proportions and 2 Populations Graduate School Quantitative Research Methods Gwilym Pryce

2 Notices:
• Register

3 Aims & Objectives
• Aim: to continue our introduction to the method of hypothesis testing and to demonstrate a number of applications.
• Objectives: by the end of this lecture students should be able to carry out hypothesis tests on:
  – two population means
  – one population proportion
  – two population proportions

4 Plan:
• 1. Review of Significance
• 2. Review of one sample tests on the mean
• 3. Hypothesis tests about two population means
  – Homogeneous variances
  – Heterogeneous variances
• 4. Deciding on whether variances are equal
• 5. Hypothesis tests about proportions
  – One population
  – Two populations

5 Macro commands:

6 1. Review of Significance
• P = significance level = the probability of observing a sample mean at least as far from the hypothesised value as ours, given that our assumption about the population (denoted by “H0”) is true.
• So if we find that this probability is small, it might lead us to question our assumption about the population mean.
  – I.e. if our sample mean is a long way from our assumed population mean then either:
    it is a freak sample, or
    our assumption about the population mean is wrong.

7 If we conclude that it is our assumption about μ that is wrong and reject H0, we have to bear in mind that there is a chance that H0 was in fact true.
  – In other words: if we reject H0 whenever P ≤ 0.05, then in about one of every twenty tests where H0 is in fact true we will reject it wrongly.
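
The "one in twenty" idea can be checked with a small simulation. The sketch below is illustrative only: the population values (`MU0`, `SIGMA`), the sample size and the number of trials are made-up assumptions, and a z test with known standard deviation is used so that only the Python standard library is needed.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(6)  # fixed seed so the run is repeatable

# Hypothetical population in which H0 (mu = MU0) really is true.
MU0, SIGMA = 50.0, 10.0
N, TRIALS, ALPHA = 30, 2000, 0.05

std_normal = NormalDist()
false_rejections = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    z = (fmean(sample) - MU0) / (SIGMA / math.sqrt(N))
    p_two_tail = 2 * (1 - std_normal.cdf(abs(z)))
    if p_two_tail < ALPHA:
        false_rejections += 1  # H0 was true, but we rejected it

rate = false_rejections / TRIALS
print(f"Share of true-H0 samples rejected at the 5% level: {rate:.3f}")
```

The printed share should sit close to 0.05, which is what a 5% significance level means: a long-run rate of false rejection among cases where H0 holds.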

8 2. Review of one sample tests on the mean
• We introduced a common framework for hypothesis testing:
4 Steps of Hypothesis testing:
  Step (1) state H0 and H1
  Step (2) state α and formula
  Step (3) state decision rule
  Step (4) compute P & decide

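9 We also looked at 2 specific tests:
• Large sample sig. test on one mean:
  Formula: $z = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
  Macro syntax: H_L1M n=(?) x_bar=(?) m=(?) s=(?).
• Small sample sig. test on one mean:
  Formula: $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$, with n − 1 degrees of freedom
  Macro syntax: H_S1M n=(?) x_bar=(?) m=(?) s=(?).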

10 3. Hypothesis tests about two population means
• In SPSS this is called the “Independent Sample t-test”: go to Analyze, Compare Means...
• Two different formulas for computing t:
  – Equal variances (formula has an exact t-distribution)
  – Unequal variances (does not have an exact t-distribution)
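
For reference, the two formulas in their usual textbook form are sketched below. The pooled standard deviation symbol $s_p$ and the degrees-of-freedom remark are assumptions taken from the standard presentation of these tests, not from the slides themselves.

```latex
% Equal variances (pooled) form, with n_1 + n_2 - 2 degrees of freedom
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}

% Unequal variances form (only approximately t-distributed)
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
```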

11 Example where variances are different:
As part of your PhD, you want to test whether the new “Fun Phonics” (FP) reading method is better than the “Letterland” method. You examine the reading power of 6 year old children from two similar schools.
  – The first school used the FP method, and you found that this produced an average reading proficiency score of 53.7 (based on a sample of 22 children; s.d. = 11.5).
  – The second school used the Letterland method, and you found that this produced an average reading proficiency score of 42.51 (sample = 24; s.d. = 16.9).
Test whether the FP method produces higher results at the 1% significance level.

12
• Use the 4 steps and the following formula to test whether the FP method produces higher results at the 1% significance level:
  $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}}$
• Can you use the canned SPSS procedure to do this problem?
4 Steps of Hypothesis testing:
  Step (1) state H0 and H1
  Step (2) state α and formula
  Step (3) state decision rule
  Step (4) compute P & decide

13
(1) H0: μ_FP = μ_L (means are equal)
    H1: μ_FP > μ_L (upper tail test)
(2) α = 0.01 (implies critical t value of 2.528)
(3) Reject H0 iff P < α, i.e. if P < 0.01
(4) P = Prob(t > 2.644) < 0.01, so reject H0
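
As a rough check on step (4), the t statistic can be reproduced from the summary figures alone. This is a sketch under stated assumptions: the function name `welch_t` is hypothetical, and the Welch-Satterthwaite degrees of freedom shown are one common approximation, which may not be the rule behind the slide's critical value of 2.528.

```python
import math

def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Unequal-variances two-sample t statistic from summary data.

    Returns (t, df), where df is the Welch-Satterthwaite approximation
    (an assumption here; the lecture's macro may use a different rule).
    """
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Fun Phonics (sample 1) against Letterland (sample 2)
t_stat, df = welch_t(22, 53.7, 11.5, 24, 42.51, 16.9)

CRITICAL_T = 2.528  # critical value quoted on the slide for alpha = 0.01
reject_h0 = t_stat > CRITICAL_T
print(f"t = {t_stat:.3f}, approx df = {df:.1f}, reject H0: {reject_h0}")
```

Comparing t with the critical value is equivalent to comparing P with α for an upper tail test, so the decision matches step (3).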

14 Doing the calculation in SPSS:
• You cannot use the canned SPSS procedure unless you have the original data.
• But you can use the following macro commands:
  – Homogeneous variances: H_S2Mp n1=(?) n2=(?) x_bar1=(?) x_bar2=(?) s1=(?) s2=(?).
  – Heterogeneous variances: H_S2Md n1=(?) n2=(?) x_bar1=(?) x_bar2=(?) s1=(?) s2=(?).

15 For the Letterland/FP example we would use the different variances syntax:
• H_S2Md n1=(22) n2=(24) x_bar1=(53.7) x_bar2=(42.51) s1=(11.5) s2=(16.9).
• The upper tail sig. is below 0.01, i.e. there is less than a 1% chance of false rejection. We therefore reject H0 of equal means in favour of the alternative hypothesis that Fun Phonics results in higher reading scores on average than Letterland.

16 4. How do we decide on whether the variances are similar?
• Where variances are hugely different or exactly the same, the decision is simple.
• When there is any ambiguity, we can use one of two tests to help us:
  – Simple Ratio of Variances Test
  – Levene’s Test

17 Simple Ratio of Variances test:
• If we take the ratio of the variances of samples from two independent populations, that ratio has an F distribution in repeated samples when the population variances are equal:
  $F = s_1^2 / s_2^2$
• where the numerator degrees of freedom are n1 − 1 and the denominator degrees of freedom are n2 − 1.
NB Because the critical values for the F distribution are only calculated for the upper tail, if the F value you have calculated is less than one, you need to invert it (i.e. swap round the numerator and denominator, and swap the degrees of freedom with them).

18
• This is the formula behind the following command: H8_S2VF n1=(?) n2=(?) s1=(?) s2=(?).
• E.g. for the Letterland/FP example: H8_S2VF n1=(22) n2=(24) s1=(11.5) s2=(16.9).
• This tells us that there is less than a 5% chance of false rejection if we reject the null of equal variances. So we reject the null, i.e. we conclude that the population variances are different.
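
The ratio itself, including the inversion rule for values below one, can be sketched as follows. The function name `variance_ratio` is hypothetical, and the sketch stops at the F statistic and its degrees of freedom; looking up the significance level still needs F tables or the macro.

```python
def variance_ratio(n1, sd1, n2, sd2):
    """Return (F, numerator df, denominator df) for the ratio of variances.

    If s1^2 / s2^2 is below one, the ratio is inverted and the degrees of
    freedom are swapped, so that F can be compared with upper tail tables.
    """
    f = sd1 ** 2 / sd2 ** 2
    df_num, df_den = n1 - 1, n2 - 1
    if f < 1:
        f = 1 / f
        df_num, df_den = df_den, df_num
    return f, df_num, df_den

# Letterland/FP example: s1 = 11.5 (n1 = 22), s2 = 16.9 (n2 = 24)
f_stat, df_num, df_den = variance_ratio(22, 11.5, 24, 16.9)
print(f"F = {f_stat:.3f} with ({df_num}, {df_den}) degrees of freedom")
```

Here the raw ratio is below one, so the larger variance ends up in the numerator and the degrees of freedom become (23, 21).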

19 The Levene’s test
• If we have the original data we can use Levene’s test, which is a canned routine in SPSS.
• Levene’s test is more sophisticated & robust than the simple ratio of variances test:
  – If P (i.e. “sig.”) from the Levene’s test is small, reject the H0 of equal variances & use the unequal variances t-formula.
  – If P from the Levene’s test is large, do not reject H0 & use the equal variances t-formula to compute the test statistic.

20 SPSS output from the test of equal purchase prices between Cumberland and Durham (Nationwide data):

21 Two tails from one:
• Along with the Levene’s test results, SPSS automatically supplies t-test results for both the equal and different variances formulas.
• One problem with the SPSS t-test, however, is that it only gives the 2 tail sig., but you can work out the one tail sigs as follows:
  – The two tailed significance is twice the smaller of the two one tailed significances:
    2 tailed sig. = 2 × min[lower tail sig., upper tail sig.]
  – But it can be a bit confusing working out which one tail significance level is the one you want (see notes).
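
One way to untangle the one tail values is to use the sign of the t statistic, since the t distribution is symmetric about zero. The sketch below assumes exactly that; the function name `one_tail_sigs` and the input figures (2 tail sig. of 0.04, t of 2.1) are made up for illustration.

```python
def one_tail_sigs(two_tail_sig, t_value):
    """Split a two tailed significance into upper and lower tail values.

    Relies on symmetry: the smaller one tail sig. is half the two tailed
    sig. and lies on the same side as the sign of t.
    """
    half = two_tail_sig / 2
    if t_value >= 0:
        return {"upper": half, "lower": 1 - half}
    return {"upper": 1 - half, "lower": half}

# Hypothetical SPSS output: t = 2.1 with a 2 tail sig. of 0.04
sigs = one_tail_sigs(0.04, 2.1)
print(sigs)
```

With a positive t, the upper tail sig. is the small one, so it is the value to use when the alternative hypothesis says the first mean is larger.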

22 Testing for 2 means summary:
• If you’ve got the original data:
  – First do the Levene’s test in SPSS: Analyze, Compare Means, Independent Samples
  – Then do the appropriate macro t-test to avoid confusion: H_S2Mp for equal variances or H_S2Md for different variances
• If you don’t have the original data:
  – First do the ratio of variances test: H8_S2VF
  – Then do the appropriate macro t-test: H_S2Mp for equal variances or H_S2Md for different variances

23 5. Hypothesis tests on proportions: One population (large samples only)
• So far we have looked at:
  – how to make inferences about the population mean from our sample mean.
• But sometimes the variable of interest is categorical:
  – a household has or does not have insurance;
  – a person is homosexual or not homosexual;
  – a person has AIDS or does not have AIDS.

24
• In such cases, what we are interested in is the proportion of cases that fall into a particular category:
  – the proportion of households with insurance;
  – the proportion of people who are homosexual;
  – the proportion of people with AIDS.

25
• Calculating the sample proportion: p = x / n
  – where:
    x = number of cases with the attribute of interest, e.g. the number of households with insurance
    n = sample size

26 CLT and Proportions:
• Q/ Does the Central Limit Theorem apply to sample proportions?
• A/ Yes.
  – In large samples, proportions from repeated random samples will be approximately normally distributed around the population proportion π.
  – We can then translate any sample proportion onto the standard normal curve by calculating its z score:
    $z = \dfrac{p - \pi}{\sqrt{\pi (1 - \pi) / n}}$

27 Example: –E.g. 1 As a historian, you want to find the proportion of citizens in medieval Scotland that contracted the plague. From a sample of 400 parish records, you find that 22 died of the plague. The assumption in the literature has been that 10% of the population had died. Test whether this assumption is valid using both 2 and 1 tailed tests.

28 Summary of data: n = 400, x = 22, π = 0.1
  – (1) H0: π = 10%
        H1: π ≠ 10% (2-tailed test)
  – (2) α = 0.02, for example.

29 Data: n = 400, x = 22, π = 0.1
  – (3) Reject H0 iff P < α, i.e. if P < 0.02 (this will happen if |z_c| > 2.33, where 2.33 is the z value associated with α = 0.02 in a 2-tailed test. Since z_c = −3.00, we know we can safely reject H0).
  – (4) Calculate z: p = 22/400 = 0.055, so z_c = (0.055 − 0.1) / √(0.1 × 0.9 / 400) = −3.00.
    P = 2 × Prob(z < −3.00) = 2 × 0.00135 = 0.0027.
    Since P < 0.02 (i.e. less than a one in 50 chance of type I error) we can reject H0. In fact, the chances of incorrect rejection of H0 are about one in 370. I.e. the chances of observing p (our sample proportion) assuming H0 (π = 10%) to be true are so small that we are forced to question this assumption about π.

30 One tailed test:
  – (1) H0: π = 10%
        H1: π < 10% (lower tail test)
  – (2) α = 0.02
  – (3) Reject H0 iff P < α, i.e. if P < 0.02
  – (4) Lower tail sig. = P = Prob(z < −3.00) = 0.00135.
    Since P < 0.02 we can reject H0, knowing that the chances of incorrect rejection of H0 are less than one in 740:
      » our cut-off rule for rejecting H0 was no more than a one in 50 chance
      » one in 740 is a lot less than one in 50, so we can reject H0 with confidence.

31
• The macro syntax for one proportion tests is as follows:
• H6_L1P n=(400) x=(22) pi=(0.1).
Which comes to the same result.
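
The plague example can also be reproduced outside SPSS. The following is a minimal sketch using only the Python standard library; the function name `one_proportion_z` is hypothetical and is not a description of how the H6_L1P macro works internally.

```python
import math
from statistics import NormalDist

def one_proportion_z(n, x, pi0):
    """Large sample z test of H0: pi = pi0 for a single proportion.

    Returns (z, lower tail sig., upper tail sig., two tailed sig.).
    """
    p = x / n                              # sample proportion
    se = math.sqrt(pi0 * (1 - pi0) / n)    # standard error under H0
    z = (p - pi0) / se
    lower = NormalDist().cdf(z)
    upper = 1 - lower
    two_tail = 2 * min(lower, upper)
    return z, lower, upper, two_tail

# 22 plague deaths in 400 parish records, tested against pi = 0.1
z, lower, upper, two_tail = one_proportion_z(400, 22, 0.1)
print(f"z = {z:.2f}, lower tail sig. = {lower:.5f}, 2 tail sig. = {two_tail:.5f}")
```

Both the lower tail and the two tailed significance fall below α = 0.02, in line with the decisions reached on the preceding slides.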

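32 Hypothesis tests about two population proportions
• To test the hypothesis that the population proportions are equal, H0: π1 = π2, compute the z statistic:
  $z = \dfrac{p_1 - p_2}{SE_{Dp}}$
  where $SE_{Dp}$ is the pooled standard error:
  $SE_{Dp} = \sqrt{p (1 - p) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}$
  and
  $p = \dfrac{x_1 + x_2}{n_1 + n_2}$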

33 Example:
• Two surveys of mortgage payment protection insurance (MPPI) are carried out, one on single parents with 1 child and one on single parents with 3 children. Amongst the first group, 67 out of a sample of 300 were found to have taken out MPPI, compared with 15 out of a sample of 101 in the second group. Is take-up significantly lower amongst the households (HHs) with three children?
  p1 = 67/300 = 0.2233; p2 = 15/101 = 0.1485; p = (67 + 15)/(300 + 101) = 0.2045

34
  – (1) H0: π1 = π2
        H1: π1 > π2
  – (2) α = 0.01 (z* = 2.33)
  – (3) Reject H0 if P < 0.01
  – (4) z ≈ 1.61, so P = Prob(z > 1.61) ≈ 0.053. Take-up is not significantly lower amongst HHs with 3 children at the 1% sig. level, or even at the 5% significance level. I.e. we cannot say that the difference in proportions is anything more than the effect of sampling variation.

35
• H7_L2P n1=(300) n2=(101) x1=(67) x2=(15).
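
For comparison, the MPPI example can be worked through with the pooled standard error in a few lines of Python. This is a sketch under the usual large sample assumptions; the function name `two_proportion_z` is hypothetical and is not the H7_L2P macro itself.

```python
import math
from statistics import NormalDist

def two_proportion_z(n1, x1, n2, x2):
    """Large sample z test of H0: pi1 = pi2 using the pooled proportion.

    Returns (z, upper tail sig.), suited to H1: pi1 > pi2.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    upper = 1 - NormalDist().cdf(z)
    return z, upper

# Single parents with 1 child (67 of 300) against 3 children (15 of 101)
z2, upper2 = two_proportion_z(300, 67, 101, 15)
print(f"z = {z2:.3f}, upper tail sig. = {upper2:.4f}")
```

The upper tail significance comes out a little above 0.05, so H0 is not rejected at either the 1% or the 5% level.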

36 Plan:
• 1. Review of Significance
• 2. Review of one sample tests on the mean
• 3. Hypothesis tests about two population means
  – Homogeneous variances
  – Heterogeneous variances
• 4. Deciding on whether variances are equal
• 5. Hypothesis tests about proportions
  – One population
  – Two populations