
1 Introduction to Hypothesis Testing Dr Jenny Freeman Mathematics & Statistics Help University of Sheffield

2 Learning outcomes By the end of the session you should:
Understand what is meant by a probability distribution
Understand the terminology needed for basic hypothesis testing
Understand the difference between a statistically significant difference and a meaningful difference

3 Download the slides from the MASH website
MASH > Resources > Statistics Resources > Workshop materials

4 Frequency distribution: birthweight
A histogram is a frequency distribution. Histograms are commonly used to look at the spread and shape of the data. Most babies are in the middle, with fewer babies at the extremes. The percentage of babies in the sample between 3 and 4 kg is 60%, and 15% of babies were above 4 kg. Would you expect the same results with a different sample? These data look like they are normally distributed…

5 Probability distributions example: Normal distribution
Sample data can be used to estimate population probabilities/percentages. For the normal distribution, if we have the sample mean and standard deviation, the population probability curve and associated probabilities can be estimated. Probability curve: a very smooth histogram used to estimate probabilities/percentages in the population. It describes the theoretical distribution of values for a population with the same mean and standard deviation as the sample data.
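As a minimal sketch of this idea (assuming Python with numpy, scipy and matplotlib; the simulated birthweights are a stand-in for the real sample, using the mean of 3.4 kg and SD of 0.57 kg quoted later):

```python
# Sketch: overlay a fitted normal probability curve on a sample histogram.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
birthweights = rng.normal(3.4, 0.57, size=200)   # stand-in for the real sample

mean, sd = birthweights.mean(), birthweights.std(ddof=1)  # sample mean and SD

x = np.linspace(mean - 4 * sd, mean + 4 * sd, 200)
plt.hist(birthweights, bins=15, density=True)           # frequency distribution
plt.plot(x, stats.norm.pdf(x, loc=mean, scale=sd))      # fitted probability curve
plt.xlabel("Birthweight (kg)")
plt.show()
```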

6 Probability distributions example: Normal distribution
The Normal distribution is: Bell shaped and symmetrical about the mean Completely described by the mean and standard deviation (i.e. if you know these two quantities, you can draw the entire curve) Sometimes called the Gaussian distribution after the German mathematician Carl Friedrich Gauss (1777 to 1855) The mean and the median are the SAME for normally distributed data

7 Estimating probabilities
We can think of this theoretical curve as representing the Normal probability distribution. The total area under the curve = 1. Can you think why? It can be used to estimate the probability of individuals having a particular range of values: the area under the curve over an interval gives the probability that a normal variable lies in that interval. The probability of X being greater than a but less than b is the area under the curve between the points a and b; the probability of X being greater than a is the area to the right of a; and the probability of X being less than a is the area to the left of a.

8 Estimating probabilities
The shaded area represents the probability (p) of obtaining a value greater than a, i.e. P(X > a) = p. For example, we could look at P(having a birthweight greater than 4 kg).

9 Using tables for probabilities
X = birth weight; Mean = 3.4 kg; Standard deviation = 0.57 kg
For 'greater than' probabilities, the probabilities get smaller as x increases.

Birthweight (x)   'More than' probability   'More than' percentage
1.5               0.9996                    99.96%
2.0               0.993                     99.3%
2.5               0.943                     94.3%
3.0               0.759                     75.9%
3.5               0.430                     43.0%
4.0               0.146                     14.6%
4.5               0.027                     2.7%
5.0               0.003                     0.3%

What if we want to calculate the probability of a baby weighing more than 4.2 kg?
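These 'more than' probabilities can also be computed in software rather than read from a table; a sketch assuming scipy, using the mean and SD above:

```python
# Sketch: 'more than' probabilities for birthweight ~ Normal(3.4, 0.57).
from scipy import stats

birthweight = stats.norm(loc=3.4, scale=0.57)

for x in [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]:
    print(f"P(X > {x}) = {birthweight.sf(x):.3f}")   # sf = 1 - cdf (right-hand tail)

print(f"P(X > 4.2) = {birthweight.sf(4.2):.3f}")     # roughly 0.08
```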

10 Using tables for probabilities
Probabilities tabulated for a distribution with mean = 3.4, SD = 0.57. What is the probability that a baby weighs more than 4.2 kg?

11 Estimating probabilities
The shaded area is the probability of getting a value between a and b: P(a < X < b). Given that probability tables usually show P(X > a), how might we work out P(a < X < b)? The probability of X lying between a and b is the area under the curve between the two points, so it can be found by subtracting one tail area from another:
P(a < X < b) = P(X > a) − P(X > b)
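The same subtraction can be done in code; a sketch assuming scipy and the birthweight distribution from earlier (the values of a and b are illustrative):

```python
# Sketch: P(a < X < b) from two right-tail probabilities, X ~ Normal(3.4, 0.57).
from scipy import stats

X = stats.norm(loc=3.4, scale=0.57)
a, b = 2.3, 4.5

p = X.sf(a) - X.sf(b)          # P(X > a) - P(X > b)
# equivalently: X.cdf(b) - X.cdf(a)
print(f"P({a} < X < {b}) = {p:.3f}")
```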

12 Estimating probabilities
Note that for continuous probability density functions, we estimate the probability for an interval (based on area), NOT the probability of a single value. This is because as the interval gets smaller, the area gets smaller, such that for a single value the area is zero and thus the exact probability of a single value is 0.

13 Exercise 1: Normal probabilities
X = Birthweight; Mean = 3.4 kg, SD = 0.57 kg. What's the probability of a baby weighing: more than 4.5 kg? More than 2.3 kg?

14 Exercise 1: Normal probabilities
X = Birthweight; Mean = 3.4 kg, SD = 0.57 kg. What's the probability of a baby weighing: less than or equal to 2.3 kg? Between 2.3 kg and 4.5 kg?

15 Key properties of normal distribution
Values are often discussed as being a number of standard deviations (s) from the mean (x̄): 68% of data lie within approximately 1 standard deviation above and below the mean. The normal distribution is completely determined by two parameters: the mean tells us where the distribution is centred (the location of the peak), and the standard deviation (the square root of the variance) controls the width of the distribution.

16 Key properties of normal distribution
The middle 95% is often used to describe 'most people'. For normally distributed data, 95% of data lie within 1.96 standard deviations above and below the mean (sometimes rounded, so approximately 95% of data lie within 2 standard deviations of the mean). Limits are calculated as: mean ± (1.96 × standard deviation), i.e. x̄ ± (1.96 × s).

17 95% of babies weigh between 2.28 and 4.52 kg

Example: What is the birthweight of most babies? (mean = 3.4 kg and SD = 0.57 kg) Limits are calculated as mean ± (1.96 × standard deviation), i.e. x̄ ± (1.96 × s). First calculate 1.96 SDs: 1.96 × 0.57 = 1.12. Lower limit: 3.4 − 1.12 = 2.28. Upper limit: 3.4 + 1.12 = 4.52. 95% of babies weigh between 2.28 and 4.52 kg.
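A sketch of the same calculation in code (assuming scipy), using both the 1.96 rule and the exact central interval of the fitted normal distribution:

```python
# Sketch: middle 95% range for birthweight ~ Normal(3.4, 0.57).
from scipy import stats

mean, sd = 3.4, 0.57

lower, upper = mean - 1.96 * sd, mean + 1.96 * sd     # 1.96-SD rule
print(f"Approximate 95% limits: {lower:.2f} to {upper:.2f} kg")

# Exact central 95% interval of the fitted normal distribution
print(stats.norm.interval(0.95, loc=mean, scale=sd))
```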

18 Travel time to work (sample of 30 journeys)
X = travel time in minutes. Mean = x̄ = 32.8; SD = s = 4.6. We can use this sample to estimate probabilities in the general population.

19 95% of journey times are between ___ and ___ minutes

Exercise 2: By how much can I expect my journey time to vary for direction A? (mean = 32.8 minutes and SD = 4.6 minutes) Limits are calculated as mean ± (1.96 × standard deviation), i.e. x̄ ± (1.96 × s). First calculate 1.96 SDs: 1.96 × ___ = ___. Lower limit: ___ − ___ = ___. Upper limit: ___ + ___ = ___. 95% of journey times are between ___ and ___ minutes.

20 Commonly used ranges (for info)
Interquartile range: contains the middle 50% of values for sample data.
Percentiles: measurements for people are often divided into percentiles, e.g. 'child's height is in the bottom 20% for their age'.
Normal ranges: based on sample data but used to represent individuals in the population; also known as reference ranges.
Confidence interval: used to give a range of values for a population parameter, e.g. the mean (discussed in the next section).

21 Standard Normal Distribution
A different probability distribution is needed for every combination of mean and SD. Before computers, probabilities were tabulated for just one special distribution (z), with a mean of 0 and SD of 1.

22 How do we get from a distribution with a mean ≠ 0 or SD ≠ 1 to the standard normal distribution, which has a mean of 0 and SD of 1? We standardise! As standardisation is used in other parts of statistics, we will cover it here

23 Distributions with mean ≠ 0
Standardise data in order to get mean = 0, SD = 1. Standardise using the following formula (where x is the original score for an individual and z represents the transformed score):
z = (x − mean) / SD
Standardising all values transforms the data from mean = 32.8, SD = 4.6 to mean = 0, SD = 1. Note: z is sometimes called a Z score or a standard deviation (SD) score.
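A sketch of standardising a set of values (assuming numpy; the times used are illustrative journey times rather than the full sample):

```python
# Sketch: convert raw scores to z (SD) scores: z = (x - mean) / SD.
import numpy as np

times = np.array([32, 35, 34, 40, 31, 39, 42, 27], dtype=float)  # illustrative journey times

mean = times.mean()
sd = times.std(ddof=1)          # sample standard deviation

z = (times - mean) / sd         # standardised scores
print(np.round(z, 1))
print(round(z.mean(), 10), round(z.std(ddof=1), 10))   # confirms mean 0, SD 1
```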

24 Example: 11 journeys to work
ID    Journey time (X)
1     32
2     35
3     34
4     40
5     31
6
7
8     39
9     42
10
11    27
Total 380;  Mean 34.5;  SD 4.4
Next, calculate X − mean for each journey.

25 Example: 11 journeys to work
ID    Journey time (X)   X − mean
1     32                 -2.5
2     35                  0.5
3     34                 -0.5
4     40                  5.5
5     31                 -3.5
6
7
8     39                  4.5
9     42                  7.5
10
11    27                 -7.5
Total 380;  Mean 34.5;  SD 4.4

26 Example: 11 journeys to work
ID    Journey time (X)   X − mean   (X − mean) / SD
1     32                 -2.5
2     35                  0.5
3     34                 -0.5
4     40                  5.5
5     31                 -3.5
6
7
8     39                  4.5
9     42                  7.5
10
11    27                 -7.5
Total 380;  Mean 34.5;  SD 4.4
Next, divide each difference by the SD to get (X − mean) / SD.

27 Example: 11 journeys to work
ID    Journey time (X)   X − mean   (X − mean) / SD
1     32                 -2.5       -0.6
2     35                  0.5        0.1
3     34                 -0.5       -0.1
4     40                  5.5        1.2
5     31                 -3.5       -0.8
6
7
8     39                  4.5        1.0
9     42                  7.5        1.7
10
11    27                 -7.5       -1.7
Total 380;  Mean 34.5;  SD 4.4

28 Exercise 3: calculating Z scores
A baby is born weighing 4.5 kg. Given the mean weight is 3.4 and SD is 0.57, calculate the Z score for this baby (x = individual score of interest):
z = (x − mean) / SD = (4.5 − 3.4) / 0.57 = 1.93
Is this within the 95% normal range? Use the normal distribution one-sided probability table to calculate the probability of getting a Z score above 1.93.
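A sketch of the same calculation in code (assuming scipy), rather than reading the tail probability from a table:

```python
# Sketch: z score for a 4.5 kg baby and its right-tail probability.
from scipy import stats

mean, sd = 3.4, 0.57
x = 4.5

z = (x - mean) / sd            # about 1.93
p_above = stats.norm.sf(z)     # P(Z > z), right-hand tail
print(f"z = {z:.2f}, P(Z > z) = {p_above:.3f}")
```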

29 Table: Normal curve tail probabilities (one tailed)
Table: Normal curve tail probabilities (one tailed). Standard normal probability in right-hand tail

30 Other probability distributions
Other distributions can be used to calculate probabilities. Which to use depends upon the data type, the distribution and the question to be answered. Each has a particular statistic that you calculate. Here are a few examples (there are many others):
F distribution: F statistic
t distribution: t statistic
χ2 distribution: χ2 statistic
The Z statistic is used for the standard normal distribution.
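In software the same "tail probability" idea carries across distributions; a sketch (assuming scipy) of the corresponding right-tail functions, with illustrative statistic values and degrees of freedom:

```python
# Sketch: right-tail (one-sided) probabilities for common test statistic distributions.
from scipy import stats

print(stats.norm.sf(1.96))              # z statistic, standard normal distribution
print(stats.t.sf(2.0, df=10))           # t statistic, t distribution with 10 df
print(stats.f.sf(3.0, dfn=2, dfd=20))   # F statistic, F distribution
print(stats.chi2.sf(11.2, df=5))        # chi-squared statistic, chi-squared distribution
```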

31 Hypothesis testing

32 Populations and samples
Taking a sample from a population: sample data are used to represent the whole population.

33 Types of statistics
Descriptive statistics summarise and describe the sample data we have collected.
Inferential statistics are obtained when we use sample data to infer something about the wider population.

34 Hypothesis testing
Hypothesis testing is a method of making decisions about populations using sample data. Sample data are used to decide which of two possible statements about a population is most likely to be true. We do this by comparing what we have observed to what we expected.

35 Hypothesis testing: main steps

36 Define study question Think carefully about the main research question. What do you want to know? What variables will be used to test the question? There are specific tests for different types of data Think about the analysis before carrying out the study

37 Null and alternative hypothesis (H0 & H1)
State your null hypothesis (H0): the statement you are looking for evidence to disprove. State your study (alternative) hypothesis (H1 or HA), which is usually the opposite of the null hypothesis.

38 The court case Members of a jury have to decide whether a person is guilty or innocent based on evidence Null: The person is innocent Alternative: The person is not innocent The null (innocent) can only be rejected if there is enough evidence to disprove it

39 The court case Decision: Convict/Release
A man may be guilty or innocent of a crime. He is presumed innocent unless there is evidence to suggest otherwise. Members of a jury have to decide whether a person is guilty or innocent based on evidence. Decision: Convict/Release

40 Null and alternative hypothesis (H0 & H1)

For comparing means the null is: there is no difference in the population means, H0: μA = μB, where μA is the population mean for group A and μB is the population mean for group B. When investigating relationships the null is: there is no association between x and y.

41 Exercise 4: Hypotheses What would the null and alternative hypotheses be for these research questions? Did class affect survival on board the Titanic? Do students who attend MASH workshops do better in their statistics module than those who do not?

42 Example: Module marks
10 students who attended a MASH workshop and 10 students who did not attend a MASH workshop. Results:

                       MASH   No MASH   Difference
Mean module mark (%)   68     64        4

Can we conclude that there is a difference between the populations?

43 Variation in single samples
Every sample taken from a population will contain different numbers, so the difference between means varies between samples. (The sample mean x̄ varies around the population mean μ.)

44 Test Statistic
A test statistic is a number calculated from sample data to decide whether or not to reject the null hypothesis about the population. It varies between different tests, and compares what we observed to what we would expect under the null hypothesis. For our test (final module mark):
Test statistic = difference between means / variability of the difference
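As an illustration only (the marks below are made-up, not the real MASH data), a two-sample comparison of means can be run like this, assuming scipy; the t statistic has exactly this "difference divided by its variability" form:

```python
# Sketch: two-sample t test comparing mean module marks (made-up data).
from scipy import stats

mash    = [72, 65, 70, 61, 75, 66, 68, 64, 73, 66]   # hypothetical marks, attended MASH
no_mash = [63, 60, 70, 58, 66, 62, 68, 61, 67, 65]   # hypothetical marks, did not attend

t_stat, p_value = stats.ttest_ind(mash, no_mash)     # difference in means / its variability
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```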

45 P values
If you repeated a study numerous times you would get a variety of test statistics, which form a distribution. P-value = probability of getting a test statistic as extreme as the one calculated, if the null is true.

46 Example: Module marks
Null: The mean module mark is the same for students who attended a MASH workshop and those who did not. If this is true, we would expect some test statistics to be negative and some positive just by chance.

47 Statistical significance
We say that our result is statistically significant if the p-value is less than some predefined level, referred to as the significance level (α), usually set at 5%. A small p-value means the null is unlikely to be true.
If p < 0.05: the result is statistically significant; reject the null in favour of the alternative.
If p ≥ 0.05: the result is not statistically significant; there is insufficient evidence to reject the null hypothesis.
We cannot say that the null hypothesis is true, only that there is not enough evidence to reject it.

48 Statistical significance
Null: The mean module mark for students who attend MASH workshops is the same as for students who do not attend MASH workshops.
Alternative: The mean module mark for students who attend MASH workshops is higher than for students who do not attend MASH workshops.
If p < 0.05: the result is statistically significant; reject the null in favour of the alternative. Conclusion: there is evidence to suggest that students who attend a MASH workshop do better in their statistics module than students who do not attend a MASH workshop.
If p ≥ 0.05: the result is not statistically significant; there is insufficient evidence to reject the null hypothesis. Conclusion: there is a LACK OF EVIDENCE to suggest that students who attend a MASH workshop do better in their statistics module than students who do not attend a MASH workshop.

49 Exercise 5: Statistical significance
The significance level is usually set at 5%; this is conventional rather than fixed, and for stronger evidence a level of 1% (0.01) could be used. The smaller the p-value, the more confident we are with our decision to reject. The p-value for the test of a difference in module marks between students who do and do not attend a MASH workshop was 0.02. What would you conclude, and how confident are you with your decision?

p-value          Decision
≥ 0.05           Do not reject
0.01 to 0.05     Evidence to reject
0.001 to 0.01    Strong evidence to reject
< 0.001          Overwhelming evidence to reject

50 Example: Module marks
As p < 0.05, there is evidence to suggest that students who attend a MASH workshop do better in their statistics module than students who do not attend a workshop. As p = 0.02, there is a 2% chance of rejecting the null when it is true (i.e. 1 in 50). What is the difference? For the sample tested, those who attended a MASH workshop scored 4 percentage points higher in their final module exam than those who did not attend a workshop.

51 Exercise 6: The magic 0.05 What’s the probability of getting a head?
What’s the probability of getting 2 heads in a row? If we toss the coin 4 times, what is the probability of getting 4 heads?

52 Errors
Whether you decide to reject the null or not, you may make a mistake… you may conclude there is a difference when in fact there isn't one (a false positive error)… or you may conclude there isn't a difference when in fact there is one (a false negative error).

53 Type 1 error, α
You commit a type 1 error when you reject the null when in fact it is true. Type 1 error:
This is the p-value
Probability of rejecting the null when it's true
Probability of committing a false positive error
Convicting an innocent man
Concluding that there is a difference when there is not
Example: P(Concluding that attending a MASH workshop makes a difference to the module marks, when it does not) = 0.02 (2%), i.e. a 1 in 50 chance that, in concluding there is a difference, you have made an error.

54 Type 2 error, β
You commit a type 2 error when you fail to reject the null when in fact it is false. Type 2 error:
Probability of accepting the null when it's false
Probability of committing a false negative error
Releasing a guilty man
Concluding that there is no difference when there is
Example: P(Concluding that attending a MASH workshop makes no difference when in fact it does)

55 Making a decision: choosing correctly

56 Making a decision The probability of rejecting the null hypothesis when it is actually false is called the POWER of the study (power = 1 − probability of committing a false negative error, i.e. 1 − β). It is the probability of concluding that there is a difference when a difference truly exists.
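Power calculations are usually done in software; as a minimal sketch, assuming the statsmodels package is available and using an illustrative effect size and sample size (not figures from this course):

```python
# Sketch: power of a two-sample t test (illustrative numbers only).
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower().power(effect_size=0.5,   # standardised difference in means
                              nobs1=30,          # sample size per group
                              alpha=0.05)        # significance level (type 1 error rate)
print(f"Power = {power:.2f}   (beta = {1 - power:.2f})")
```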

57 Making a decision Releasing a guilty man

58 Convicting an innocent man
Making a decision Convicting an innocent man An acceptable type 1 error rate is usually pre-determined as 0.05 or 5% (the significance level)

59 Making a decision: an alternative viewpoint

60 Example: Loaded die? I’m playing a game and get a six 4 times in a row
As a statistician, I'm thinking this is a bit weird. Could the die be loaded? I throw it 30 times and record the number each time. I want to see if there is statistical evidence to suggest that the die is loaded.

61 Example: Chi-squared test (χ2)
The chi-squared test is for categorical data. It compares the expected frequencies, assuming the null (of no difference) is true, with the observed frequencies from the study. P-values are calculated using the chi-squared distribution (χ2).

62 Hypothesis testing: main steps
Null hypothesis (H0): The die is fair. Alternative hypothesis (H1): The die is not fair. Set the significance level: standard procedure is to set this at 5%, so that we reject the null if p < 0.05 (keeping the probability of rejecting the null when it is true below 5%).

63 My observed values
Null hypothesis (H0): The die is fair

Outcome   Observed (Oi)
1         2
2         4
3         1
4         7
5         6
6         10
Total     30

64 What is the null distribution?
You each have a die – throw it 30 times and record the values you get.

Outcome   Observed (Oi)
1
2
3
4
5
6
Total

How many 6's do you each get?

65 Variation in number of 6’s
I’m expecting the variation across all your throws to look something like this: So if the null is true and the die is fair, we would still expect the number of sixes to vary from person to person

66 What are my expected values?
Outcome   Observed (Oi)
1         2
2         4
3         1
4         7
5         6
6         10
Total     30

How many 6's would I expect to get in 30 throws of the die?

67 What are my expected values?
Outcome   Observed (Oi)
1         2
2         4
3         1
4         7
5         6
6         10
Total     30

How many 6's would I expect to get in 30 throws of the die?
P(getting a six) = P(6) = 1/6
Expected number of 6's: E(6) = n × P(6) = 30 × 1/6 = 5
The probability of each number is the same, so the expected number of each is also 5.

68 Expected vs Observed The bar chart compares my observed and expected values The red bars represent the null hypothesis that there is no difference (each number is expected to occur the same number of times)

69 Chi-squared test statistic
The test statistic for the chi-squared test uses the sum of the squared differences between each pair of observed (O) and expected (E) values, each divided by the expected value:
χ2 test statistic = Σ (Oi − Ei)² / Ei, summed over the outcomes i = 1 to n

70 Chi-squared test statistic
First calculate the difference between the observed and expected values.

Outcome i   Observed (Oi)   Expected (Ei)   Difference (Oi − Ei)
1           2               5               -3
2           4               5               -1
3           1               5               -4
4           7               5                2
5           6               5                1
6           10              5                5
Total       30

What happens when you sum these differences?

71 Chi-squared test statistic
Square the differences.

Outcome i   Observed (Oi)   Expected (Ei)   Oi − Ei   (Oi − Ei)²
1           2               5               -3          9
2           4               5               -1          1
3           1               5               -4         16
4           7               5                2          4
5           6               5                1          1
6           10              5                5         25
Total       30

72 Chi-squared test statistic
Divide each squared difference by the expected value, 5.

Outcome i   Observed (Oi)   Expected (Ei)   Oi − Ei   (Oi − Ei)²   (Oi − Ei)²/Ei
1           2               5               -3          9           1.8
2           4               5               -1          1           0.2
3           1               5               -4         16           3.2
4           7               5                2          4           0.8
5           6               5                1          1           0.2
6           10              5                5         25           5.0
Total       30

73 Chi-squared test statistic
Sum the values of (Oi − Ei)²/Ei.

Outcome i   Observed (Oi)   Expected (Ei)   Oi − Ei   (Oi − Ei)²   (Oi − Ei)²/Ei
1           2               5               -3          9           1.8
2           4               5               -1          1           0.2
3           1               5               -4         16           3.2
4           7               5                2          4           0.8
5           6               5                1          1           0.2
6           10              5                5         25           5.0
Total       30                                                     11.2

How do we get a p-value?

74 Chi-squared distribution (χ2)
Chi-squared is a skewed distribution which varies depending on the degrees of freedom For calculating a test statistic for a one sample chi-squared: v = df = outcomes – 1 = 6 – 1 = 5 Outcomes = numbers 1 - 6

75 Chi-squared distribution (χ2, df=5)

76 P-value for my test
P-value = probability of getting a test statistic of at least 11.2 if the null is true and the die is fair. In Excel: =CHISQ.DIST.RT(TS, v) = CHISQ.DIST.RT(11.2, 5) = 0.048. As we square the differences in the test statistic, the chi-squared test is always one tailed. Decision: REJECT the null as p < 0.05.
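The same result can be obtained outside Excel; a sketch assuming scipy:

```python
# Sketch: chi-squared goodness-of-fit test for the 30 die throws.
from scipy import stats

observed = [2, 4, 1, 7, 6, 10]          # counts of outcomes 1..6
expected = [5, 5, 5, 5, 5, 5]           # 30 throws x 1/6 each

chi2_stat, p_value = stats.chisquare(observed, f_exp=expected)
print(f"chi-squared = {chi2_stat:.1f}, p = {p_value:.3f}")   # 11.2, about 0.048

# Equivalent p-value directly from the distribution (cf. Excel CHISQ.DIST.RT):
print(stats.chi2.sf(11.2, df=5))
```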

77 Conclusion
Null hypothesis (H0): The die is fair
Alternative hypothesis (H1): The die is not fair
REJECT the null as p < 0.05. If the null is rejected, conclude with the alternative hypothesis.
CONCLUSION: There is significant evidence (p = 0.048) to suggest that the die is NOT fair. There is only a 4.8% chance that we have made the wrong decision if the null is true. The smaller the p-value, the more confident you can be with your decision to reject the null hypothesis.

78 33% of throws were 6’s compared to only 3% of 3’s
Conclusion: If a chi-squared result is significant, use percentages to explain the differences.

Outcome i   Observed (Oi)   % of throws
1           2               6.7
2           4               13.3
3           1               3.3
4           7               23.3
5           6               20.0
6           10              33.3
Total       30

33% of throws were 6's compared to only 3% of 3's.

79 Exercise 7: Testing your own die
You all have a fair die – or do you???

Outcome i   Observed (Oi)   Expected (Ei)   Oi − Ei   (Oi − Ei)²   (Oi − Ei)²/Ei
1                           5
2                           5
3                           5
4                           5
5                           5
6                           5
Total

80 Chi squared distribution (χ2, df=5)
Right-hand tail probability for the χ2 distribution with df = 5. Rows give the integer part of the test statistic; columns give the first decimal place.

       0.0     0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
0      1.0000  0.9998  0.9991  0.9976  0.9953  0.9921  0.9880  0.9830  0.9770  0.9702
1      0.9626  0.9541  0.9449  0.9349  0.9243  0.9131  0.9012  0.8889  0.8761  0.8628
2      0.8491  0.8351  0.8208  0.8063  0.7915  0.7765  0.7614  0.7461  0.7308  0.7154
3      0.7000  0.6846  0.6692  0.6538  0.6386  0.6234  0.6083  0.5934  0.5786  0.5639
4      0.5494  0.5351  0.5210  0.5071  0.4934  0.4799  0.4666  0.4536  0.4408  0.4282
5      0.4159  0.4038  0.3920  0.3804  0.3690  0.3579  0.3471  0.3365  0.3262  0.3161
6      0.3062  0.2966  0.2872  0.2781  0.2692  0.2606  0.2521  0.2439  0.2359  0.2282
7      0.2206  0.2133  0.2062  0.1993  0.1926  0.1860  0.1797  0.1736  0.1676  0.1618
8      0.1562  0.1508  0.1456  0.1405  0.1355  0.1307  0.1261  0.1216  0.1173  0.1131
9      0.1091  0.1051  0.1013  0.0977  0.0941  0.0907  0.0874  0.0842  0.0811  0.0781
10     0.0752  0.0725  0.0698  0.0672  0.0647  0.0622  0.0599  0.0577  0.0555  0.0534
11     0.0514  0.0494  0.0476  0.0457  0.0440  0.0423  0.0407  0.0391  0.0376  0.0362
12     0.0348  0.0334  0.0321  0.0309  0.0297  0.0285  0.0274  0.0264  0.0253  0.0243
13     0.0234  0.0225  0.0216  0.0207  0.0199  0.0191  0.0184  0.0176  0.0169  0.0163
14     0.0156  0.0150  0.0144  0.0138  0.0133  0.0127  0.0122  0.0117  0.0113  0.0108
15     0.0104  0.0099  0.0095  0.0092  0.0088  0.0084  0.0081  0.0078  0.0074  0.0071
16     0.0068  0.0066  0.0063  0.0060  0.0058  0.0056  0.0053  0.0051  0.0049  0.0047
17     0.0045  0.0043  0.0041  0.0040  0.0038  0.0036  0.0035  0.0033  0.0032  0.0031
18     0.0029  0.0028  0.0027  0.0026  0.0025  0.0024  0.0023  0.0022  0.0021  0.0020
19     0.0019  0.0018  0.0017  0.0016  0.0015  0.0014  0.0013
20     0.0012  0.0011  0.0010  0.0009  0.0008

81 Exercise 7: Testing your own die
Null: Alternative: Test Statistic: P-value Conclusion:

82 Limitations of a hypothesis test
All that we know from a hypothesis test is how likely the difference we observed is, given that the null hypothesis is true. The results of a significance test do not tell us what the difference is or how large the difference is. To do this, we need to supplement the hypothesis test with an estimate of the likely size of the difference and its confidence interval, which will give us a range of values in which the true population difference is likely to lie.
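Confidence intervals are dealt with elsewhere, but as a minimal sketch (assuming scipy, and using made-up change-in-confidence scores rather than real data) a 95% interval for a mean change could be obtained like this:

```python
# Sketch: 95% confidence interval for a mean change (made-up confidence scores).
import numpy as np
from scipy import stats

change = np.array([8, 12, 5, 15, 10, 7, 14, 9, 11, 6])   # hypothetical changes in confidence

mean = change.mean()
se = stats.sem(change)                                    # standard error of the mean
ci = stats.t.interval(0.95, df=len(change) - 1, loc=mean, scale=se)

print(f"Mean change = {mean:.1f}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```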

83 Statistical significance & meaningful difference
Statistical significance does not necessarily mean the result is meaningful. Large sample sizes and small standard deviations lead to significant results even when the difference is small. A meaningful difference is one that is big enough to be worthwhile. Small sample sizes may not detect actual differences in the population. Giving a confidence interval (a range of values within which you expect the true value to lie – not covered here) will help you decide whether the difference is meaningful.

84 Example
Imagine you were asked to rate your confidence with statistics on a scale of 1 – 100 (1 being petrified!) before and after this course. What change in confidence would make enduring a two-day course worthwhile? A 1 point change? A 10 point change? A 20 point change?

85 The population mean could be negative (people feel less confident)
Statistical significance & meaningful difference, 95% confidence intervals added. Mean change = 0, CI (-5, 5). The population mean could be negative (people feel less confident). Decision: Don't go!

86 Statistical significance & meaningful difference, 95% confidence intervals added
Mean change = 10, CI (5, 15). There's evidence that the change is positive, but not by much. Decision: Think about it

87 Statistical significance & meaningful difference, 95% confidence intervals added
Huge variation in change CI (-1, 42) due to small sample size or large SD

88 Good chance of at least a 10 point change; CI (10, 30)
Statistical significance & meaningful difference, 95% confidence intervals added Good chance of at least a 10 point change; CI (10, 30)

89 Statistical significance & meaningful difference, 95% confidence intervals added

90 Summary: Hypothesis testing steps
State a clear research question and null
Summarise the sample data with statistics/graphs
Choose the right test
Run the correct analysis and obtain a p-value
If p < 0.05, conclude statistical significance (of a relationship/difference between means)
Conclude in terms of the research question, not forgetting to use summary statistics to explain any differences or relationship
Don't forget to check the assumptions!

91 Summary
Research questions need to be turned into a statement for which we can find evidence to disprove - the null hypothesis. The study data are reduced down to a single probability: the probability of observing our result, or one more extreme, if the null hypothesis is true (the p-value). We use this p-value to decide whether or not to reject the null hypothesis. BUT we need to remember that 'statistical significance' does not necessarily mean a difference is meaningful, or important. Confidence intervals should always be quoted with a hypothesis test to give the magnitude and precision of the difference. Remember to summarise the results: state not only whether the result is significant, but also what it is and what it means.

92 Learning outcomes You should now:
Understand what is meant by a probability distribution
Understand the terminology needed for basic hypothesis testing
Understand the difference between a statistically significant difference and a meaningful difference

93 Maths And Statistics Help
Statistics appointments: Mon-Fri (10am-1pm)
Statistics drop-in: Mon-Fri (10am-1pm), Weds (4-7pm)

94 Resources: All resources are available in paper form at MASH or on the MASH website

95 Contacts
Follow MASH on twitter: @mash_uos
Staff: Jenny Freeman, Basile Marquier, Marta Emmett

