The t-test Dr David Field

Summary. Andy Field, 3rd Edition, chapter 9 covers today's lecture (but the one sample t test and the formulas are not covered there). Topics:
- The t test for independent samples: comparing the means of two samples of subjects; degrees of freedom
- The paired samples t test (also known as repeated measures or related samples): comparing the means of two experimental conditions when the same sample participated in both conditions
- The one sample t test: comparing the mean of a single sample against a fixed value, e.g. 50% chance performance, or 0
- Effect size: the degree to which changes in the DV are caused by the IV relative to other sources of variation
- The 95% confidence interval of the difference between sample means
Chapter 9 of Andy Field is a big chapter, and it revises a lot of material from lectures 3 and 4. It is worth reading all of it, but I will also point out the most important parts. It is also a chapter where he does a few things a bit differently from me (e.g. effect size and the reporting of t tests), so use the versions covered in my lecture handouts.

The story so far... In workshop 3 you produced an error bar chart for the blueberries versus sugar cubes exam scores experiment. The 95% confidence intervals around the two means overlapped, so we could not be sure that the two samples were from different populations: we could not reject the null hypothesis, but nor could we be sure that we had failed to reject it. Today, we will use the t test to make a direct test of the null hypothesis for that experiment. By performing a t test you arrive at a significance level, or p value, which is the probability of obtaining the observed difference between the means of the experimental conditions by random sampling from a null distribution.

The t statistic (sections 9.3, 9.3.1). There are several variations on how t is calculated, depending on the type of experiment (independent samples, repeated measures, one sample), but these are variations on a general theme, and higher values of t reflect greater statistical significance. Schematically:

t = (difference between two means) / (variation in data)

t itself is a statistic, just like the SD or the median. It describes how much systematic variation there is in the data relative to the amount of unsystematic variation; it is a ratio. It is generally calculated in order to convert it to a p value.

[Figure: three pairs of sample distributions with the same difference between means but low, medium, and high variability; the low variability panel is labelled "t will be larger than 'medium'" and the high variability panel "t will be smaller than 'medium'".] In the following examples the difference between the means is the same in all three cases, but the variability of the samples is different. We have to judge the difference between two means relative to the spread or variability of the two distributions that the means are summarising. In the figure, we can be very confident that in the low variability case the two sample means come from two different populations, but we are much less confident of this in the high variability case, even though the difference between the sample means is the same size. The t statistic is a way of taking into account the size of the difference between two samples and the variability of the two samples at the same time.

The t statistic. The size of t can be increased in three ways: increase the difference between the two sample means, reduce the variability in the two samples, or increase the number of participants. It is fairly obvious that, all other things being equal, a bigger difference between the means is going to be more statistically significant: in terms of the schematic formula t = (difference between two means) / (variation in data), if the numerator (the thing on top, which gets divided by the denominator) is bigger then the result of the calculation will be bigger. In lecture 3 I gave an example of reducing the variation in two experimental groups based on the blueberries and exam scores example. In that example the experiment was repeated using participants who were selected to be similar in terms of IQ and studying habits, because both of these are factors other than the blueberries vs sugar IV that would produce variation in exam scores. Once the group of well matched participants was recruited you would divide it randomly between the two experimental conditions.
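To make the three levers concrete, here is a minimal Python sketch (not part of the original lecture; the helper function and all numbers are invented purely for illustration) that plugs summary statistics into the independent samples formula introduced a few slides later:

```python
# A minimal sketch (invented numbers) of the three ways to increase t, using
#   t = (mean1 - mean2) / sqrt(var1/n1 + var2/n2)
from math import sqrt

def independent_t(mean1, mean2, var1, var2, n1, n2):
    """t for two independent samples, from summary statistics."""
    se_diff = sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / se_diff

baseline          = independent_t(55, 50, 100, 100, 30, 30)     # ~1.94
bigger_difference = independent_t(60, 50, 100, 100, 30, 30)     # larger numerator
less_variability  = independent_t(55, 50, 25, 25, 30, 30)       # smaller denominator
more_participants = independent_t(55, 50, 100, 100, 120, 120)   # smaller denominator

print(baseline, bigger_difference, less_variability, more_participants)
# each change pushes t further from zero than the baseline
```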

The t statistic. There are several variations on how t is calculated, depending on the type of experiment (independent samples, repeated measures, one sample), but these are variations on a general theme:

t = (mean of sample 1 - mean of sample 2) / (SE of the difference)

The formula for calculating t is very similar to the formula for calculating the z test statistic that was covered in lecture 4. In both cases you are dividing the size of the experimental effect (systematic variation) by an estimate of the unsystematic or error variance in the data, producing a ratio. If that ratio is larger than 1, then there is more systematic variation than unsystematic variation in the data. In the different types of experiment, different methods are used to calculate the SE of the difference.

SE of the difference (see lecture 4). The SE of the difference is similar to the SE of a sample mean. It reflects an underlying sampling distribution of many possible repetitions of the experiment using that exact sample size, where in each case the difference obtained between the means would be slightly different. The key point is that the SE gets smaller as the sample size increases and as the underlying SD of the population of interest reduces. Because the formula t = (mean of sample 1 - mean of sample 2) / (SE of the difference) involves dividing by the SE of the difference, t is in units of SE, just as z scores are in units of SD.

SE of the difference for independent samples. Independent samples (or between subjects) is where separate samples of participants are tested in each experimental condition. We will use the effects of eating blueberries versus sugar cubes on exam scores as an illustration. In this case the SE of the difference has to take into account the fact that there are two samples contributing to the unsystematic variation in the scores, i.e. the variation that is not due to the IV. IQ and hours studied are things the researchers are not interested in that might contribute to the variability of scores in both samples of the blueberry experiment.

SE of the difference for independent samples.

SE of the difference = √( sample 1 variance / N sample 1  +  sample 2 variance / N sample 2 )

N is the number of participants in each group, and the groups may be unequal in size. Recall that the variance is the sum of squared differences from the mean divided by N - 1 (see lecture 1). This formula is very similar to that used in the z test (lecture 4), but the way that t is converted to a p value is slightly different from the way z is converted. Notice that the variance of a difference between two sample means is equal to the two separate sample variances added together (the "variance sum law"). This works against you as a researcher, requiring large sample sizes to make the denominator term of the equation for t smaller.

t formula for independent samples (section 9.5).

t = (sample 1 mean - sample 2 mean) / √( sample 1 variance / N sample 1  +  sample 2 variance / N sample 2 )

This is the full formula for calculating t for independent samples. It looks scary when you see it all together like this, but you know what each individual term in the equation means, and the key thing to remember is that all t statistics are a measure of the difference between two means divided by some estimate of the variability of the samples. The square root is used to put the SE back into the same measurement units as the means, so that you can meaningfully divide the mean difference by the error term. The variation due to the IV is estimated by the numerator, the difference between the two sample means. Variation in scores due to all sources other than the IV is estimated by the denominator, which adds together the variance from both groups (the variance of the mean difference is equal to the combined variance of the two groups).

Blueberry experiment example. Plugging in the blueberry data (blueberry group: mean 57.47, variance 106.8, N = 30; sugar group: mean 50.77, variance 94.8, N = 30):

t = (57.47 - 50.77) / √( 106.8/30 + 94.8/30 ) = 6.7 / 2.59 = 2.58
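As a cross-check, here is a minimal Python sketch (not part of the lecture; the summary statistics are simply copied from the slide above) that reproduces the worked example:

```python
# A minimal sketch checking the blueberry worked example with the
# independent samples t formula.
from math import sqrt

mean_blue, var_blue, n_blue = 57.47, 106.8, 30
mean_sugar, var_sugar, n_sugar = 50.77, 94.8, 30

se_diff = sqrt(var_blue / n_blue + var_sugar / n_sugar)   # SE of the difference
t = (mean_blue - mean_sugar) / se_diff

print(round(se_diff, 2), round(t, 2))   # -> 2.59 and 2.58, matching the slide
```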

Converting t to a p value (significance level). The t statistic is a measure of the difference between the sample means in units of SE (i.e. units of estimated measurement error). It is a measure of how big the effect of the IV is compared to unsystematic variation; random allocation of participants to conditions helps to ensure that other sources of variation do not co-vary with the IV. Two sample means will differ even under the null hypothesis, where the only source of variation in the data is unsystematic, so the question becomes: what is the probability of a value of t this big or bigger arising under the null hypothesis?

What differs from the z test? For the z test, the obtained value of z is referred to the standard normal distribution, and the proportion of the area under the curve with that value of z or greater is the p value. However, when sample sizes are small the standard normal distribution is actually a bad model of the null distribution: its tails are too thin and values close to the mean are too common. The t distribution has much fatter tails, accurately representing the chances of large differences between sample means occurring by chance when samples are small. Unlike the standard normal distribution, the t distribution changes shape as the sample size increases: as N increases the tails get thinner, and if N > 30 it is almost identical to the z distribution. What are the consequences for the z test of the standard normal distribution being a bad model of the null distribution for small sample sizes? The answer is that this increases the type 1 error rate (i.e. rejecting the null hypothesis when there is no real difference in the population).

The standard normal distribution compared to the t distribution when N = 2. Blue is the standard normal distribution; red is the t probability distribution if N = 2 in both groups. The proportion of area under the curve that is, e.g., > 2 is much greater for the t distribution: the proportion is 14.7% for t and 2.3% for z.

The standard normal distribution compared to the t distribution when N = 4. Blue is the standard normal distribution; red is the t probability distribution if N = 4 in both groups. The proportion of area under the curve that is, e.g., > 2 is now much smaller for the t distribution: the proportion is 5.0% for t and 2.3% for z. The green curves are for t distributions with smaller samples.
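The exact percentages depend on which degrees of freedom are plotted, but the qualitative pattern on these two slides can be checked with a minimal sketch like the one below (not part of the lecture), which prints the tail area beyond t = 2 for a range of degrees of freedom:

```python
# A minimal sketch of how the upper tail area beyond 2 shrinks towards the
# standard normal value as the degrees of freedom grow.
from scipy import stats

print(round(stats.norm.sf(2), 3))           # standard normal: ~0.023 (2.3%)
for df in (1, 2, 6, 30, 100):
    print(df, round(stats.t.sf(2, df), 3))  # t tail area beyond 2 for this df
# e.g. df=1 gives ~0.148, df=6 ~0.046, df=100 ~0.024, close to the normal value
```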

Degrees of Freedom (DOF) (Jane Superbrain 2.2). The shape of the t distribution under the null hypothesis changes as sample size (N) goes up. This is because the number of degrees of freedom increases as the two sample sizes go up, and the shape of the t distribution is directly specified by the DOF, not by N. (What I presented in the previous slides about the shape of the t distribution changing as the sample size goes up was therefore a slight simplification of the truth.) The degrees of freedom for the independent samples t test are given by (group one sample size - 1) + (group two sample size - 1). All statistics have degrees of freedom, and for some statistics the formula for working them out is more complicated. Fortunately, SPSS calculates DOF for you.

Degrees of freedom (DOF). The DOF are the number of elements in a calculation that are free to vary. If you know only that four numbers have a mean of 10, you can say nothing about what the four numbers might be. But if you also know that the first three numbers are 8, 9, and 11, then the fourth number must be 12. If you only know that the first two numbers are 8 and 9, there is still an infinite number of values that the third and fourth numbers could take. Therefore, this calculation of the mean of 4 numbers has 3 DOF. It is possible to ignore the topic of DOF and get through a Psychology degree; in practice, SPSS always calculates it for you, and you just copy the appropriate value of DOF out of the table and put it in brackets after t(). Students frequently copy the wrong number from SPSS into their reports for the DOF, and I suspect this is because they often don't really know what DOF is, so it is worth understanding it to prevent this error. If you ever get involved in more advanced statistics, and particularly with fitting mathematical models to data, then the concept of DOF becomes central to what you are doing, rather than just being a number that you take out of an SPSS output and write in brackets in your report.

Converting t to a p value (significance level). Having worked out the value of t and the number of DOF associated with the calculation of t, SPSS will report the exact probability (p value) of obtaining that value of t under the null hypothesis that both samples were selected randomly from populations with identical means. The p value is equal to the proportion of the area under the curve of the appropriate t distribution where the values are as big as or bigger than the obtained value of t. In the past you had to look up the p value yourself in statistical tables of DOF versus t at the back of a textbook. Remember that if p < 0.05 you can reject the null hypothesis and say that the effect of the IV on the DV was statistically significant. Note that, technically speaking, the two samples could come from two populations with identical means but different SDs and the null hypothesis of a t test would remain true. There are statistical tests that ask whether two samples come from populations with the same or different SDs, but this is not covered in this lecture.
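SPSS does this conversion for you; purely as an illustration (not part of the lecture), the same tail-area calculation can be done with scipy's t distribution, here using the t and DOF from the blueberry example on the next slide:

```python
# A minimal sketch: converting a t value and its degrees of freedom into a
# p value, i.e. the tail area of the appropriate t distribution.
from scipy import stats

t_value, dof = 2.58, 58                       # blueberry example values
p_one_tailed = stats.t.sf(t_value, dof)       # area above the obtained t
p_two_tailed = 2 * p_one_tailed               # both tails, as SPSS reports
print(round(p_one_tailed, 3), round(p_two_tailed, 3))   # ~0.006 and ~0.012
```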

Blueberry example. In the blueberry experiment there was a difference of 6.7% in the predicted direction, but the 95% confidence intervals around the means of the two groups overlapped, so it is useful to perform the t test to see if there is enough evidence to reject the null hypothesis. "The effect of supplementing diet with blueberries compared to sugar cubes on exam scores was statistically significant, t(58) = 2.58, p = 0.006 (one tailed)." This is an incomplete example of reporting experimental results; a full example will be given later. We write (one tailed) and divide the p value SPSS gives by 2 because the researchers predicted that blueberries would improve exam scores, rather than just changing them. t is sometimes negative, but this has no special meaning; it just depends which way round you enter the variables in the t test dialogue box in SPSS. In the blueberry example, p = 0.006 means that a difference in the means in this direction and of this size or greater would only occur 6 times out of 1000 repetitions of the experiment if the null hypothesis was true. Note the DOF: (30 - 1) + (30 - 1) = 58.

Related t-test (section 9.6). This is also known as paired samples or repeated measures. Use it when the same sample of participants performs in both the experimental and control conditions. For example, if you are interested in whether reaction times differ in the morning and the afternoon, it would be more efficient to use the same sample for both measurements. If you suspect that practice (or fatigue) effects will occur when the DV is measured twice in the same person, then counterbalance the order of testing to prevent order effects from being confounded with the IV: in the example you would test half the participants in the morning first and then in the afternoon, and the other half in the afternoon first and then in the morning. Order effects will then contribute only to unsystematic variation. One obvious advantage of this type of design is that you only need half the number of participants compared to independent samples. Sometimes repeated measures designs are not possible, because once the DV has been measured you cannot measure it again in the same person and maintain validity (e.g. retrospective judgments of time intervals).

Advantages of related t over independent t. In the blueberry experiment there was variation in scores in each group due to IQ and hours studied; this variation was included in the SE part of the equation. If you tested the same participants twice, once in the blueberry condition and once in the sugar cube condition, employing two equally difficult exams, then IQ scores and study habits would be equalised across the two experimental conditions. This would reduce the risk of a type 1 error occurring due to chance (sampling) differences in unsystematic variation between conditions: it would no longer be possible, by chance, for the majority of the higher IQ and hard working participants to end up in one of the two experimental conditions. If you constructed the experiment in this repeated measures way and used the independent samples t formula, this would already be an improvement on the separate samples design. But a bigger improvement in the sensitivity of the experiment can be achieved by changing how the SE is calculated.

Advantages of related t over independent t. In the repeated measures design, each score can be directly paired (compared) with a score in the other experimental condition. We can see that John scored 48 in the exam when supplemented with sugar cubes and 52 when he ate blueberries. There was no meaningful way to compare the score of an individual participant with any other individual in the other condition when independent samples were used. In the repeated measures case we can calculate a difference score for John, which is 4. If Jane scored 70 (sugar cubes) and 72 (blueberries) then her difference score is 2. Jane has a very different overall level of performance compared to John, but we can ignore this and calculate the error term using only the difference scores. In the independent samples t, the massive difference between John and Jane had to be included in the error term.

Data table for repeated measures t test.

Participant   Reaction time morning (sec)   Reaction time afternoon (sec)   Difference (sec) [afternoon - morning]
Peter         0.90                          0.80                            -0.10
Sarah         0.40                          0.30                            -0.10
John          0.30                          0.25                            -0.05
Jane          0.50                          0.45                            -0.05
Henry         1.20                          1.00                            -0.20
MEAN          0.66                          0.56                            -0.10

The reason that the SE of the difference is calculated differently in the paired t test is that we take advantage of the fact that there is some correlation in the scores between time 1 and time 2. In this example you can see that Henry is always the slowest subject, whether he is measured in the morning or the afternoon; likewise, John is always the fastest. But Henry is faster in the afternoon when compared to himself in the morning, and the same is true of John. Basically, everybody gets faster in the afternoon, even if only by a small amount. In fact, the variation in the scores due to time of day is quite small compared to the variation in the scores due to differences between individual participants. THE GREAT STRENGTH OF THE REPEATED MEASURES EXPERIMENTAL DESIGN IS THAT IT ALLOWS YOU TO DISCOUNT THE DIFFERENCES BETWEEN INDIVIDUALS AND FOCUS ON THE DIFFERENCES CAUSED BY THE IV. This is achieved by calculating the difference scores for each person, as in the fourth column here, and then calculating the SE of those differences. In effect, this calculation ignores the between-subjects differences in the denominator of the t statistic calculation. On the other hand, in the independent samples design you have to include the between-subjects differences in the denominator, which makes it bigger. If you are designing an experiment and you think that, for the DV you are measuring, the between-subjects differences will be large compared to the effects of the IV, you should try to use a repeated measures design if it is practical to do so.

Data table for independent samples t test.

Participant (group 1 - morning)   Reaction time morning (sec)   Participant (group 2 - afternoon)   Reaction time afternoon (sec)
Peter                             0.90                          Tom                                  1.00
Sarah                             0.40                          Rachel                               0.80
John                              0.30                          David                                0.30
Jane                              0.50                          Louise                               0.45
Henry                             1.20                          James                                0.25
MEAN                              0.66                                                               0.56

If the same reaction time experiment was performed using 5 subjects in the morning and 5 different subjects in the afternoon, and you happened to get the same scores in both conditions as in the earlier repeated measures example, leading to the same mean difference (this would be very unlikely to happen, but it is just an example to make the point), it would be much less clear that the experiment has detected a difference due to time of day of 0.10 sec, and much more likely that the difference of 0.1 sec is due to differences between individuals and random sampling. In the between subjects version of the experiment it is meaningless to calculate difference scores, and we have to include the variance of both groups in the denominator.

t formula for related samples.

related t = (mean difference between conditions) / (SD of the difference scores / √sample size)

For comparison, the independent samples formula is reproduced below:

independent t = (sample 1 mean - sample 2 mean) / √( sample 1 variance / N sample 1  +  sample 2 variance / N sample 2 )

In fact the SE term for the related samples test is calculated in the same way as the SE of a sample mean, but you operate on the difference scores rather than the raw scores. The two formulas look quite different, but they are quite similar really: the "mean difference" in the related t formula is exactly equal to the difference between the two condition means in the independent t formula. The key difference is that while the denominator in the independent samples test adds together two sources of variance (one for each sample), the denominator in the related samples t test only has to take into account a single source of variation, which in this case is the variation of the DIFFERENCE SCORES. Remember that difference scores are a good thing in the repeated measures case because they exclude between-subjects variation from the calculation of the t statistic; it is not possible to calculate difference scores in the independent samples t test. The SE here is just the formula for the SE you learned about in lecture 2. In words, the related samples t statistic is just the mean difference divided by the standard error of the difference.

Worked example of related t. The difference scores (sec) [afternoon - morning] are: Peter -0.10, Sarah -0.10, John -0.05, Jane -0.05, Henry -0.20; the mean difference is -0.10. The SD of the difference scores is 0.061 sec and N is 5. Therefore SE = 0.061 / square root of 5 = 0.061 / 2.24 = 0.027, and t = mean difference / SE = -0.10 / 0.027 = -3.65.
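The same result can be checked with a minimal Python sketch (not part of the lecture; the data are taken from the reaction time table above):

```python
# A minimal sketch checking the related (paired) samples worked example.
from scipy import stats

morning   = [0.90, 0.40, 0.30, 0.50, 1.20]
afternoon = [0.80, 0.30, 0.25, 0.45, 1.00]

result = stats.ttest_rel(afternoon, morning)   # uses afternoon - morning differences
print(round(result.statistic, 2), round(result.pvalue, 3))
# -> -3.65 and a two-tailed p of about 0.02, as reported on the next slide
```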

Converting related t to a p value. The null hypothesis is that the mean difference between the two conditions in the population is zero. For related t, DOF are given by sample size - 1; DOF are needed to determine the exact shape of the probability distribution, in exactly the same way as for the independent samples t. The value of t is referred to the t distribution, and the p value is the proportion of the area under the curve equal to or greater than the obtained t. This is the probability of a mean difference that big occurring by chance in a sample if the difference in the population was zero. The repeated measures t test is reported in the same way as the independent t test.

Reaction time example. "It took longer to respond in the morning than in the afternoon, t(4) = -3.65, p = 0.02 (two tailed)." This is an incomplete example of reporting experimental results; a full example will be given later. This is a two tailed test because the researchers thought that circadian rhythms would result in a different reaction time in the afternoon, but they were not able to predict the direction of the difference in advance. For a two tailed test you have to consider both tails of the probability distribution, in this case the area under the curve that is greater than 3.65 and less than -3.65. SPSS does this by default.

One sample t-test. Sometimes you want to test the null hypothesis that the mean of a single sample is equal to a specific value. For example, we might hypothesize that, on average, students drink more alcohol than people of similar age in the general population. Imagine that a previous study used a large and appropriately selected sample to establish that the mean alcohol consumption per week for UK adults between the ages of 18 and 22 is 15 units. We can make use of their large and expensive sample, which provides a very good estimate of the population mean. To test the hypothesis we could randomly select a single sample of students aged 18-22 and compare their alcohol consumption with that from the published study. The null hypothesis will be that the mean units of alcohol consumed per week in the student sample is 15. This is in part a practical issue: we could select 30 students and 30 non-students of similar age, but chances are that our small sample of non-students would not provide as accurate a picture of alcohol consumption in the general population as the big published study.

Data table for one sample t test.

Participant   Units of alcohol per week   Null hypothesis   Difference (units of alcohol)
Peter         20                          15                5
Sarah         16                          15                1
John          10                          15                -5
Jane          12                          15                -3
Henry         18                          15                3
MEAN          15.2                        15                0.2

As with the related samples t test, you calculate a difference score for each participant and also a mean difference.

One sample t formula.

one sample t = (mean difference from the null hypothesis value) / (SD of the sample / √sample size)

The denominator is the SE of the sample mean. For comparison, the related samples formula is:

related t = (mean difference between conditions) / (SD of the difference scores / √sample size)

The one sample t test formula converts the observed difference from the null hypothesis into units of standard error: if the original data was in units of alcohol, then the t statistic is in units of estimated measurement error. The t formula for related samples is presented for comparison; it is actually almost identical. The only real difference is that in the related t test you divide by the SE of the mean difference, whereas in the one sample t test you divide by the SE of the sample mean.

Worked example of one sample t. The difference scores (units of alcohol, relative to the null hypothesis value of 15) are: Peter 5, Sarah 1, John -5, Jane -3, Henry 3; the mean difference is 0.2. The SD of the difference scores is 4.14 units and N is 5. Therefore SE = 4.14 / square root of 5 = 4.14 / 2.24 = 1.85, and t = mean difference / SE = 0.2 / 1.85 = 0.108.
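Again this can be checked with a minimal Python sketch (not part of the lecture; the data are taken from the alcohol consumption table above):

```python
# A minimal sketch checking the one sample worked example.
from scipy import stats

units_per_week = [20, 16, 10, 12, 18]            # the five students' scores
result = stats.ttest_1samp(units_per_week, 15)   # null hypothesis mean = 15
print(round(result.statistic, 2), round(result.pvalue, 2))
# t is about 0.11 and the two-tailed p about 0.92: clearly non-significant
```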

Converting one sample t to a p value. The null hypothesis is that the population mean is the same as the fixed test value (15 in this case). DOF are given by sample size - 1; DOF are needed to determine the exact shape of the probability distribution, in exactly the same way as for the independent samples t. The value of t is referred to the t distribution, and the p value is the proportion of the area under the curve equal to or greater than the obtained t. This is the probability of the sample mean being that different from the fixed test value if the true population mean is equal to the fixed test value.

Alcohol consumption example. "We found no evidence that students consume more alcohol than other people in their age group, t(4) = 0.10, p > 0.05, NS." This is an incomplete example of reporting experimental results; a full example will be given later. NS is short for non-significant. The results of a t test are non-significant if p is larger than 0.05 (5%).

Effect size (Cohen's d). Statistical significance (i.e. the probability of the data under the null hypothesis being < 0.05) is not equivalent to scientific importance. If SPSS gives a p value of 0.003 for one t test and 0.03 for another, then the former is "statistically highly significant" but it is NOT a "bigger effect" or "more important" than the latter. This is because the SE of the difference depends partly on sample size and partly on the SD of the underlying population (as well as on the size of the mean difference); the SE of the mean difference is the denominator in all the t equations you have seen (i.e. the bit you divide by). If the sample size is large, a trivially small difference between the means can be statistically highly significant. If the two underlying populations of scores in an independent samples t test have a small SD, then a trivially small difference in means will again be statistically highly significant. A (made up) example of a trivial but highly statistically significant finding would be if you sampled 100 men from Reading and 100 men from Aberdeen, found that the men from Aberdeen were 1 mm taller, and this turned out to be statistically significant.

Effect size (Cohen's d). For these reasons it is becoming increasingly common to report effect size alongside a null hypothesis test.

d = (condition 1 mean - condition 2 mean) / [ (condition 1 SD + condition 2 SD) / 2 ]

The key difference from the t formula is that sample size is not part of the formula for d. Like z scores, d is expressed in units of SD; effect size is basically a z score of the difference, i.e. the size of the observed difference expressed as a proportion of the variation in the data. SPSS does not report effect size as part of the t test output, but you can easily calculate it yourself. Calculation of d is the same for repeated measures and independent designs. For a one sample t you would subtract the null hypothesis value instead of the condition 2 mean in the numerator, and for the denominator you would just use the SD of the single sample.

Effect size in the blueberry example.

d = (blueberry mean 57.47% - sugar mean 50.77%) / [ (blueberry SD 10.3% + sugar SD 9.7%) / 2 ]

d = 6.7 / 10.05 = 0.66. This means the mean difference of 6.7% between the two groups is equal to 0.66 pooled SDs.
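As a cross-check, here is a minimal sketch (not part of the lecture) of the d formula applied to the blueberry summary statistics; with the rounded SDs shown on the slide it comes out at roughly 0.67, and the small discrepancy from the quoted 0.66 is presumably just rounding of the SDs:

```python
# A minimal sketch of Cohen's d as defined above: the mean difference divided
# by the average of the two group SDs (summary statistics from the slides).
def cohens_d(mean1, mean2, sd1, sd2):
    return (mean1 - mean2) / ((sd1 + sd2) / 2)

d = cohens_d(57.47, 50.77, 10.3, 9.7)
print(round(d, 2))   # ~0.67 with these rounded SDs (the slide quotes 0.66)
```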

Interpreting effect size. Cohen (1988): d = 0.2 is a small effect, d = 0.5 is a medium effect, d = 0.8 is a large effect, and d greater than approximately 1.5 is a very large effect. Warning: Andy Field 3rd Edition uses a different measure of effect size, called r. For this course, you must use Cohen's d.

95% confidence interval of the difference (section 9.4.5). Previously, you calculated a 95% confidence interval around a sample mean; it is also possible to calculate a 95% confidence interval of the difference between two sample means. In the blueberry example, the difference in exam scores between the two groups was 6.7%. This difference is actually a point estimate produced by a single experiment, and in principle there is an underlying sampling distribution of similar experiments with the same sample size: each time you repeated the experiment the result (mean difference) would be slightly different. The point estimate of the difference can be converted to an interval estimate using a similar method to that used for an interval estimate based on a sample mean. Note that sometimes the lower bound of the 95% confidence interval includes numbers that are less than zero (negative numbers). This indicates that you could sometimes repeat the experiment and get a result that appears to go in the opposite direction to the one you have just completed, due to sampling error. In such cases you will probably also find that the t statistic is not large enough to reject the null hypothesis.

95% confidence interval of the difference. For the blueberry example, SPSS reports a 95% confidence interval of 1.5 to 11.9 for the underlying mean difference in the population. This means that if we replicated the experiment with the same sample sizes in the sugar and blueberry groups, we could be 95% confident of observing a difference in exam scores of somewhere between 1.5 and 11.9 percentage points. Reporting the 95% confidence interval of the difference is becoming common practice in journal articles.
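For illustration only (not part of the lecture), the interval SPSS reports can be reconstructed from the blueberry summary statistics as the mean difference plus or minus the critical t value (for 58 DOF) times the SE of the difference:

```python
# A minimal sketch of a 95% CI for the mean difference, built from the
# blueberry summary statistics: difference +/- critical t * SE of the difference.
from math import sqrt
from scipy import stats

mean_diff = 57.47 - 50.77                      # 6.7 percentage points
se_diff = sqrt(106.8 / 30 + 94.8 / 30)         # SE of the difference
t_crit = stats.t.ppf(0.975, df=58)             # two-tailed 95% critical value
lower, upper = mean_diff - t_crit * se_diff, mean_diff + t_crit * se_diff
print(round(lower, 1), round(upper, 1))        # -> roughly 1.5 and 11.9
```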

Reporting t test results. Here is an example of how to put all the different statistics from today's lecture into a paragraph: "Participants in the blueberry condition achieved higher exam scores (mean = 57.47%, SD = 10.3%) than those in the sugar cube condition (mean = 50.77%, SD = 9.7%). The mean difference between conditions was 6.7%, which is a medium effect size (d = 0.66); the 95% confidence interval for the estimated population mean difference is between 1.5 and 11.9%. An independent t test revealed that, if the null hypothesis were true, such a result would be highly unlikely to have arisen (t(58) = 2.58; p = 0.006 (one tailed))." This paragraph gives the reader more useful information than the traditional practice of reporting only the null hypothesis test. Always report descriptive statistics (mean, SD, effect size, confidence interval of the difference) before the inferential statistics that test the hypothesis and have a p value (for example, t).

Result of facial feedback workshop (Tues) “The 38 participants who held the pencil between their teeth rated the cartoon as funnier (mean = 5.6, SD = 1.80) than the 42 participants who held the pencil between their lips (mean = 4.7, SD = 1.78). The mean difference between conditions was 0.89, which is a medium effect size (d = 0.49); the 95% confidence interval for the estimated population mean difference is between 0.1 and 1.7. An independent t test revealed that, if how the pencil was held had no influence on how the cartoon was perceived, such a result would be unlikely to have arisen (t(78) = 2.21; p = 0.015 (one tailed)).” Note: DOF are (38-1) + (42-1) = 78 On Monday the group sizes were very unequal (??!!) and the result was quite different, and non-significant.