DTC Quantitative Research Methods Statistical Inference II: Statistical Testing Thursday 7th November 2014  

Hypothesis testing Imagine that we know that the mean income of university graduates is £16,500. We then do a survey of 64 sociology graduates and find that they earn a mean income of £15,400, with a standard deviation of £4,000. Can we say that this is convincing evidence that sociology graduates earn less than other graduates? The null hypothesis here is that sociology graduates earn the same as other graduates: it is a hypothesis of no difference. The alternative hypothesis is that there is a difference. The null hypothesis (H0) is usually one of no difference, and the alternative hypothesis (Ha) is usually one of difference. When we carry out statistical tests we attempt, as here, to reject the null hypothesis at a 95% level of confidence (or sometimes at a 99% or 99.9% level).

Statistical significance A conclusion (e.g. that a difference or relationship exists) is statistically significant if the probability of reaching that conclusion when it is, in fact, erroneous falls below the chosen significance level (in social science research this is often 5% = 0.05 = 1 in 20). The significance level is sometimes referred to as alpha (α).

Hypothesis testing So, thinking about the example again: we know that the mean income of university graduates is £16,500, and our survey of 64 sociology graduates found a mean income of £15,400 with a standard deviation of £4,000. Can we say that this is convincing evidence that sociology graduates earn less than other graduates? If we construct a 95% confidence interval for the population mean income of sociology graduates it looks like this: 15,400 ± 1.96 × (4,000/√64) = 15,400 ± 1.96 × (4,000/8) = 15,400 ± 980, i.e. £14,420 to £16,380. The top of this range is still below the mean income for graduates generally (£16,500): there is no overlap. This means that there is less than a 5% chance that a difference as big as £1,100 would have occurred if there were in fact no difference between sociology graduates' mean income and the mean income for all graduates.
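
As a check on the arithmetic, here is a minimal Python sketch (not part of the original slides) reproducing the confidence interval:

import math

# Figures from the example
sample_mean = 15_400   # mean income of the 64 sociology graduates (£)
sample_sd = 4_000      # sample standard deviation (£)
n = 64                 # sample size

# Standard error of the sample mean: s / sqrt(n) = 4,000 / 8 = 500
se = sample_sd / math.sqrt(n)

# 95% confidence interval: mean +/- 1.96 standard errors
lower = sample_mean - 1.96 * se   # 15,400 - 980 = 14,420
upper = sample_mean + 1.96 * se   # 15,400 + 980 = 16,380
print(f"95% CI: £{lower:,.0f} to £{upper:,.0f}")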

p-values A p-value quantifies the statistical significance of a result more precisely. Specifically, it quantifies how likely a difference or relationship of equal or greater magnitude than that observed would be to occur if there were no difference/relationship in the population (i.e. if the null hypothesis were correct).

Back to the example… In the example, the standard error (i.e. the standard deviation of the sample mean) is equal to 4,000/√64 = 500. Thus the sample mean is 1,100/500 = 2.2 standard errors away from the suggested population mean. Statistical theory tells us that 95% of sample means fall within 1.96 standard errors of the population mean, and that 97.2% fall within 2.2 standard errors. Hence the p-value for the difference of 2.2 standard errors (which is a test statistic) is (100 − 97.2)/100 = 0.028. Since p < 0.05, the difference is statistically significant at the conventional 5% significance level.
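
The same p-value can be obtained exactly with scipy.stats (again a minimal sketch, not part of the original slides):

from scipy.stats import norm

population_mean = 16_500   # known mean income of all graduates (£)
sample_mean = 15_400       # sociology graduates' sample mean (£)
se = 500                   # standard error: 4,000 / sqrt(64)

# Test statistic: how many standard errors the sample mean lies from the
# hypothesised population mean
z = (sample_mean - population_mean) / se   # -1,100 / 500 = -2.2

# Two-tailed p-value: the probability of a difference at least this large,
# in either direction, if the null hypothesis is true
p = 2 * norm.sf(abs(z))
print(f"z = {z:.1f}, p = {p:.3f}")   # z = -2.2, p = 0.028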

Hypothesis testing Theory: You test particular hypotheses with reference to your sample statistics; however, these hypotheses are about underlying population characteristics (parameters). Procedure:
1. Set up the 'null' (and 'alternative') hypothesis.
2. Note the sample size and design.
3. Establish the sampling distribution under the assumption that the null hypothesis is true.
4. Identify the decision rule (i.e. what constitutes acceptance/rejection of the null hypothesis).
5. Compute the sample statistic(s) and apply the decision rule (N.B. this is where Type I and Type II errors can occur).

Error Types

                            Truth about population
Decision (hypothesis test)  H0 true             Ha true
Reject H0                   Type I error        Correct decision
Do not reject H0            Correct decision    Type II error

Note: Reducing the chance of one type of error occurring increases the chance that the other type will!

(Statistical) Power Power is defined as the probability that a test will correctly reject the null hypothesis, i.e. correctly conclude that there is a difference, relationship, etc. The probability of a Type II error is sometimes labelled beta (β), hence power equals 1-β. The power of a test depends on the size of the effect (which is, of course, unknown!)

What is the point of power? Power also depends on the sample size and the significance level chosen. So if we want to use the usual 5% significance level (to obtain '95% confidence' in our results) and we want to be able to identify an effect of a given size, we can calculate how likely we are, for a given sample size, to find an effect of that size, assuming such an effect exists. If the power of a test is low there is little point in applying it, which suggests a need for a larger sample.
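
To make this concrete, here is a hedged sketch (assuming a two-sided z-test with a known standard deviation, which is a simplification) that computes power for the earlier income example:

from scipy.stats import norm

def power_two_sided_z(effect, sd, n, alpha=0.05):
    """Approximate power of a two-sided z-test for a single mean.

    effect: the true difference from the null value we hope to detect
    sd:     the population standard deviation (assumed known)
    n:      the sample size
    """
    se = sd / n ** 0.5
    z_crit = norm.ppf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    shift = effect / se                # the effect measured in standard errors
    # Probability that the test statistic lands in either rejection region
    return norm.sf(z_crit - shift) + norm.cdf(-z_crit - shift)

# Using the income example: a £1,100 difference, SD £4,000, n = 64
print(f"power = {power_two_sided_z(1_100, 4_000, 64):.2f}")   # ~0.59

On these figures, a survey of 64 graduates would detect a true £1,100 difference only about 59% of the time, which illustrates why low power argues for a larger sample.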

Never innocent… Rather than deciding between 'guilty' and 'innocent', statistical tests decide between 'guilty' and 'not proven'. In other words, a statistically insignificant or non-significant result (sometimes indicated by NS rather than, say, p > 0.05) does not indicate that a difference or relationship does not exist, but simply that there is insufficient evidence to conclude that one does exist! This leaves open the possibility of a small difference or weak relationship which the statistical test was insufficiently powerful to identify…

Applying the logic of a statistical test… There are a large number of different statistical tests that use inferential methods to ask questions about different forms of differences/relationships:
- Is the sample mean sufficiently different from the suggested population mean that it is implausible that the suggested population mean is correct? Test the plausibility of a suggested population mean via a z-test. [This is what we've just done.]
- Are the means from two samples sufficiently different for it to be implausible that the populations from which they come are actually the same? Test via a two-sample t-test, or, if comparing more than two (sub-)samples (i.e. more than two groups), test for differences via Analysis of Variance (usually referred to as ANOVA).
- Are the observed frequencies in a cross-tabulation sufficiently different from what one would have expected to see if there were no relationship in the population for the idea that there is no relationship in the population to be implausible? Test this via a chi-square test (see the sketch below).
In each instance we are asking whether the difference between the actual (observed) data and what one would have expected to see, given some hypothesis H0, is sufficiently large that the hypothesis is implausible. Thus we are always trying to disprove a (null) hypothesis.
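
As an illustration of the third case, the sketch below runs a chi-square test on a hypothetical 2×2 cross-tabulation (the counts are invented for the example):

from scipy.stats import chi2_contingency

# Hypothetical cross-tabulation: rows = two groups, columns = two responses
observed = [[30, 20],
            [18, 32]]

# chi2_contingency compares the observed counts with the counts that would
# be expected under the null hypothesis of no relationship
chi2, p, df, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.3f}")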

(Two-sample) t-tests Test the null hypothesis, which is: H0: μ1 = μ2, or H0: μ1 − μ2 = 0, i.e. the equality of the two population means. The alternative hypothesis is: Ha: μ1 ≠ μ2, or Ha: μ1 − μ2 ≠ 0.

What does a t-test measure? [Figure not reproduced: a diagram comparing a treatment group (T) with a control group (C).] The figure depicts a comparison in experimental research; in most discussions the groups tend just to be labelled as groups 1 and 2, indicating different groups.

Example We want to compare the average amounts of television watched by Australian and by British children. We have a sample of Australian children and a sample of British children, so what we have, and what we want to do, are something like this:

Population of Australian children  <-- want to compare -->  Population of British children
        ^ inference                                                 ^ inference
Sample of Australian children                               Sample of British children

Example (continued) Here the dependent variable is the number of hours of TV watched each night, and the independent variable is nationality (or, perhaps, national context). When we are comparing means, SPSS calls the independent variable the grouping variable and the dependent variable the test variable. (For t-distribution critical values, and a more detailed view of statistics, go all the way to Australia: SurfStat.)

Example (continued) If the null hypothesis, hypothesising no difference between the two groups, were correct (and children thus watched the same average amount of television in Australia as in Britain), we would expect that, if we took repeated samples from the two groups, the difference in means between them would generally be small or zero. However, it is highly likely that the difference between any two particular samples will not be zero. We therefore need to know the sampling distribution of the difference between two sample means: we use this distribution to determine the probability of getting an observed difference of a given size between two sample means drawn from populations with no difference.

If we take a large number of random samples and calculate the difference between each pair of sample means, we will end up with a sampling distribution that has the following properties: It will be a t-distribution. The mean of the differences between sample means will be zero if the null hypothesis is correct: Mean(M1 − M2) = 0. The 'average' spread of scores around this mean of zero (the standard error of the difference, SD_M) is defined by the formula:

SD_M = √[ ((N1 − 1)s1² + (N2 − 1)s2²) / (N1 + N2 − 2) × (N1 + N2) / (N1 × N2) ]

This estimate 'pools' the variance in the two groups – just take it at face value!
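
A short simulation (an illustrative sketch; the population mean and SD are invented, with 20 children per group as in the example below) shows this sampling distribution emerging:

import numpy as np

rng = np.random.default_rng(0)

# Simulate the null hypothesis: both groups drawn from the SAME population
# (mean 180 minutes, SD 30 minutes; 20 children per group)
diffs = [rng.normal(180, 30, 20).mean() - rng.normal(180, 30, 20).mean()
         for _ in range(10_000)]

# Under H0 the differences centre on zero, and their spread is close to
# the standard error given by the formula: sqrt(30**2/20 + 30**2/20) = 9.49
print(f"mean of differences = {np.mean(diffs):.2f}")
print(f"SD of differences   = {np.std(diffs):.2f}")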

Back to the example… When we are choosing the test of significance it is important to note that: We are making an inference from TWO samples (of Australian and of British children), and these samples are independent (the number of hours of TV watched by British children doesn't affect the number of hours watched by Australian children, and vice versa); therefore we need a two-sample test (what SPSS calls an 'independent samples' t-test). The two samples are being compared in terms of an interval-ratio variable (hours of TV watched); therefore the relevant descriptive statistic is the mean. These facts lead us to select the two-sample t-test for the equality of means as the relevant test of significance.

Table 1. Descriptive statistics for the samples

Descriptive statistic   Australian sample   British sample
Mean                    166 minutes         187 minutes
Standard deviation      29 minutes          30 minutes
Sample size             20                  20

t-test of independent means: formulae

t = (M1 − M2) / SD_M,  with df = N1 + N2 − 2

Note: 1/N1 + 1/N2 = (N1 + N2) / (N1 × N2)

Where: M = mean; SD_M = standard error of the difference between means (the pooled formula above); N = number of subjects in a group; s = sample standard deviation of a group; df = degrees of freedom.

What are ‘degrees of freedom’? Degrees of freedom can be thought of as the ‘sources of variation’ in a particular situation. If we are comparing groups of 20, then within each group there are 19 (independent) sources of difference between the values for that group. Thus for the two groups combined there are 19+19 = 38 degrees of freedom (d.f.)

Example: Calculating the t-value Using the descriptive statistics from Table 1 (means of 166 and 187 minutes, standard deviations of 29 and 30 minutes, 20 children per sample):

SD_M = √[ ((20 − 1) × 29² + (20 − 1) × 30²) / (20 + 20 − 2) × (20 + 20) / (20 × 20) ] = 9.3

t_sample = (166 − 187) / 9.3 = −2.3
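
The same arithmetic in Python (a minimal sketch, not part of the original slides):

import math

# Summary statistics from Table 1
m1, s1, n1 = 166, 29, 20   # Australian sample
m2, s2, n2 = 187, 30, 20   # British sample

# Pooled standard error of the difference between means
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
sd_m = math.sqrt(pooled_var * (n1 + n2) / (n1 * n2))

t = (m1 - m2) / sd_m
print(f"SD_M = {sd_m:.1f}, t = {t:.2f}")   # SD_M = 9.3, t = -2.25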

Example: Obtaining a p-value for a t-value To obtain the p-value for this t-value we could consult a table of critical values for the t-distribution. Such a table may not have a row of probabilities for 38 degrees of freedom; in that case we would (to be cautious) refer to the row for the nearest reported number of degrees of freedom below the desired number – here, 30. For 30 degrees of freedom and a two-tailed test, the tabulated t-scores for p = 0.05 and p = 0.02 are 2.042 and 2.457. The absolute magnitude of our t-statistic (2.3) falls between these scores, so the p-value linked to it lies between 0.02 and 0.05. The result is therefore statistically significant at the 5% (0.05) level but not at the 2% or 1% (0.02 or 0.01) level. Of course, SPSS is set up to calculate exact p-values for test statistics such as the t-statistic (in this case the exact value is p = 0.030).
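
For comparison, scipy.stats gives the exact p-value directly, and can run the whole test from the summary statistics (again an illustrative sketch):

from scipy.stats import t as t_dist, ttest_ind_from_stats

# Exact two-tailed p-value for t = -2.25 with 38 degrees of freedom
p = 2 * t_dist.sf(2.25, 38)
print(f"p = {p:.3f}")   # ~0.030

# Or let scipy compute the test statistic and p-value in one step
t_stat, p_val = ttest_ind_from_stats(166, 29, 20, 187, 30, 20,
                                     equal_var=True)
print(f"t = {t_stat:.2f}, p = {p_val:.3f}")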

Example: Reporting the results “The mean number of minutes of TV watched by the sample of 20 British children is 187 minutes, which is 21 minutes higher than the mean of 166 minutes for the sample of 20 Australian children; this difference is statistically significant at the 0.05 level (t(38)= -2.3, p = 0.03, two-tailed test). Based on these results we can reject the hypothesis that British and Australian children watch the same average amount of television every night.”

Some final thoughts… ANOVA (Analysis of Variance) works on broadly similar principles, but is a technique that allows one to look simultaneously at differences between the means of more than two groups. Both t-tests and ANOVA make an assumption of homogeneity of variance (i.e. that the spread of values in each of the groups being considered is consistent). We will look at ANOVA in more detail later in the module. What is crucial to remember from this session are the principles of hypothesis testing: we start with a null hypothesis (of no difference in the population); using our sample, we test whether this is plausible; and the p-values that we get (and that we report) show the likelihood of the observed results if there were no difference. Therefore (to simplify), the lower the p-value, the more likely it is that there is a real difference between the groups. A reminder: the three things that affect the test statistic are the sample size (of each group), the size of the differences in the means (between groups), and the variability of scores (within each group).