Published by Sheila Willis, modified over 6 years ago
1
Introduction to Study Skills & Research Methods (HL10040)
Hypothesis Testing
Dr James Betts
2
Lecture Outline:
- What is Hypothesis Testing?
- Hypothesis Formulation
- Statistical Errors
- Effect of Study Design
- Test Procedures
- Test Selection
3
Statistics
- Descriptive: organising, summarising & describing data
- Inferential: generalising
- Correlational: relationships
Significance
4
Sampling Error
Effective sampling is essential to correctly generalise back to our target population, so that the dependent variable can be generalised from the sample (n) to the population (N).
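Sampling error can be illustrated with a small simulation: repeatedly draw samples of size n from one population and see how much the sample means vary. A minimal Python sketch, assuming a hypothetical normally distributed population (the N of 10,000, mean of 18 s and SD of 1.75 s are illustrative values, not data from this lecture):

```python
import random
import statistics


def sample_means(population, n, repeats, seed=1):
    """Return the means of `repeats` random samples (drawn without replacement) of size n."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]


# Hypothetical target population (N = 10,000 scores; mean 18 s, SD 1.75 s), for illustration only.
pop_rng = random.Random(0)
population = [pop_rng.gauss(18.0, 1.75) for _ in range(10_000)]
sigma = statistics.pstdev(population)

spread = {}
for n in (5, 25, 100):
    means = sample_means(population, n, repeats=2000)
    spread[n] = statistics.stdev(means)  # observed sampling error of the sample mean
    print(f"n = {n:>3}: SD of sample means = {spread[n]:.3f}, sigma/sqrt(n) = {sigma / n ** 0.5:.3f}")
```

The spread of the sample means shrinks roughly in proportion to 1/√n, which is the standard error of the mean (SEM) used in the independent t-test. A larger n only reduces random sampling error; it does not correct a biased sampling method.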
5
What is Hypothesis Testing?
Null Hypothesis: A = B
Alternative Hypothesis: A ≠ B
We also need to establish:
1) How unequal are these observations?
2) Are these observations reflective of the general population?
6
Example Hypotheses: Isometric Torque
Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?
Null Hypothesis: ♂ = ♀
Alternative Hypothesis: ♂ ≠ ♀
7
Example Hypotheses: Isometric Torque
Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?
Alternative (HA) or experimental (HE) Hypothesis: There is a significant difference in the DV between males and females.
Null Hypothesis (H0): There is not a significant difference in the DV between males and females.
n.b. These are 2-tailed hypotheses (the most common and generally recommended form).
8
Example Hypotheses: Isometric Torque
Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?
Useful analogy: the criminal trial. Imagine you are the prosecutor:
H0 = Defendant not guilty
HA = Defendant guilty
Your job is to provide sufficient evidence (i.e. ‘beyond reasonable doubt’) that the defendant is not innocent. The evidence is the data you have collected (fingerprints in court, lactate values in the lab), and once the defendant is proved ‘not innocent’ we are able to reject the null and accept what remains, i.e. the alternative.
IMPORTANTLY, what do you think we can relate ‘reasonable doubt’ to? The 0.05 significance level: we only ‘convict’ (reject H0 and accept HA) when evidence at least as strong as ours would occur less than 5% of the time if the defendant were innocent, i.e. we accept at most a 5% risk of convicting an innocent defendant.
Remember: the p-value does NOT tell us the probability that they are innocent, but rather the probability of finding our evidence assuming they are innocent.
9
Example Hypotheses: Isometric Torque
Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?
[Figure: population (N♀, N♂) and sample (n♀, n♂) distributions; x-axis: Sustained Isometric Torque (seconds)]
n.b. This is why effective sampling is so important...
10
Example Hypotheses: Isometric Torque
Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?
[Figure: population (N♀, N♂) and sample (n♀, n♂) distributions; x-axis: Sustained Isometric Torque (seconds)]
…poor/insufficient sampling can lead to errors…
11
Statistical Errors
Type 1 Errors
-Rejecting H0 when it is actually true
-Concluding a difference when one does not actually exist
Type 2 Errors
-Accepting H0 when it is actually false (e.g. previous slide)
-Concluding no difference when one does exist
Errors can occur due to biased/inadequate sampling, poor experimental design or the use of inappropriate tests (including non-parametric tests where a parametric test would be valid).
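Both error types can be estimated by simulating many studies. A minimal Python sketch, assuming normally distributed scores, 25 participants per group, a hypothetical SD of 1.73 s and, for the Type 2 case, an assumed true difference of 1 s between groups; the critical t of 2.011 (df = 48, two-tailed 0.05) is the tabled value used in the independent t-test example:

```python
import math
import random
import statistics

T_CRIT_DF48 = 2.011  # two-tailed 0.05 critical t for df = 48 (n = 25 per group)


def independent_t(a, b):
    """t for two independent samples with SEMdiff = sqrt(SEM_a^2 + SEM_b^2).

    With equal group sizes this equals Student's (pooled) t.
    """
    sem_a = statistics.stdev(a) / math.sqrt(len(a))
    sem_b = statistics.stdev(b) / math.sqrt(len(b))
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sem_a ** 2 + sem_b ** 2)


def rejection_rate(true_diff, sd=1.73, n=25, trials=2000, seed=42):
    """Share of simulated studies that reject H0 at the 0.05 level (two-tailed)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(true_diff, sd) for _ in range(n)]
        b = [rng.gauss(0.0, sd) for _ in range(n)]
        if abs(independent_t(a, b)) > T_CRIT_DF48:
            rejections += 1
    return rejections / trials


type1 = rejection_rate(true_diff=0.0)      # H0 true: every rejection is a Type 1 error
type2 = 1 - rejection_rate(true_diff=1.0)  # H0 false: every non-rejection is a Type 2 error
print(f"Type 1 error rate ~ {type1:.3f}; Type 2 error rate ~ {type2:.3f}")
```

With H0 true, roughly 5% of simulated studies reject it; with a real but modest difference and only 25 per group, a large share of studies fail to detect it. Re-running with a larger `n` (together with the matching critical t for the new df) shows the Type 2 rate falling.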
12
Back to Study Design
Independent Measures: individual scores in each data set are independent of one another
Repeated Measures: individual scores in each data set are dependent/paired/correlated
13
Back to Study Design: Pre-Experimental Designs
Independent Measures (2 distinct groups): individual scores in each data set are independent of one another
T O1
P O2 (placebo)
Repeated Measures (same individuals tested twice): individual scores in each data set are dependent/paired/correlated
O1 T O2
14
Back to Study Design: True-Experimental Designs
Independent Measures (random group assignment, R): individual scores in each data set are independent of one another
R O1 T O2
R O3 P O4 (placebo)
Repeated Measures: Cross-Over Design
(depends on how equivalent groups were achieved)
15
Example Hypotheses: Isometric Torque
Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?
So the above example is an independent measures design, which therefore requires an independent t-test (AKA Student’s (Gosset’s) t-test).
16
Independent t-test: Calculation
Is this a significant effect?
[Figure: sample distributions (n♀, n♂); x-axis: Sustained Isometric Torque (seconds)]
Group   Mean (s)   SD (s)   n
♀       18.5       1.74     25
♂       17.5       1.72     25
17
Independent t-test: Calculation
Step 1: Calculate the standard error of each mean (SEM = SD/√n, where √25 = 5)
SEM♀ = 1.74/5 = 0.348
SEM♂ = 1.72/5 = 0.344
18
Independent t-test: Calculation
Step 2: Calculate the standard error for the difference in means
SEMdiff = √(SEM♀² + SEM♂²) = 0.501
19
Independent t-test: Calculation
Step 3: Calculate the t statistic
t = (Mean♀ − Mean♂) / SEMdiff = (18.5 − 17.5) / 0.501 = 2.00
20
Independent t-test: Calculation
Step 4: Calculate the degrees of freedom (df)
df = (n♀ − 1) + (n♂ − 1) = (25 − 1) + (25 − 1) = 48
21
Independent t-test: Calculation
Step 5: Determine the critical value for t using a t-distribution table (n.b. use the 0.05 level for a 2-tailed test)
Degrees of Freedom   Critical t-ratio
44                   2.015
46                   2.013
48                   2.011
50                   2.009
22
Independent t-test: Calculation
Step 6: Compare t calculated with t critical
Calculated t = 2.00; critical t = 2.011
Therefore, t calculated < t critical. Result: n.s.
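Steps 1 to 6 can be reproduced from the summary statistics. A minimal Python sketch, assuming n = 25 in both groups (consistent with the SEMs and df = 48 above) and using the tabled critical value of 2.011:

```python
import math

T_CRIT = 2.011  # two-tailed 0.05 critical t for df = 48 (Step 5 table)


def independent_t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Steps 1-4: SEM of each mean, SEM of the difference, t statistic and df."""
    sem_a = sd_a / math.sqrt(n_a)                  # Step 1
    sem_b = sd_b / math.sqrt(n_b)
    sem_diff = math.sqrt(sem_a ** 2 + sem_b ** 2)  # Step 2
    t = (mean_a - mean_b) / sem_diff               # Step 3
    df = (n_a - 1) + (n_b - 1)                     # Step 4
    return sem_a, sem_b, sem_diff, t, df


sem_f, sem_m, sem_diff, t, df = independent_t_from_summary(18.5, 1.74, 25, 17.5, 1.72, 25)
print(f"SEM_f = {sem_f:.3f}, SEM_m = {sem_m:.3f}, SEMdiff = {sem_diff:.3f}, t = {t:.2f}, df = {df}")
print("significant" if abs(t) > T_CRIT else "n.s.")  # Step 6
```

Evaluated directly from the rounded summary values, √(0.348² + 0.344²) ≈ 0.489 rather than 0.501, giving t ≈ 2.04, just above the critical value of 2.011. With a result this close to the threshold, small numerical differences can flip the significant/n.s. verdict, which is one reason to report the exact P value and an effect size alongside it.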
23
Independent t-test: Calculation
Interpretation: P > 0.05, so reject HA & accept H0.
Conclusion: There is not a significant difference in the DV between males and females.
24
Independent t-test: Calculation
Evaluation: The wealth of available literature supports that females can sustain isometric contractions longer than males. This may suggest that the findings of the present study represent a type 2 error.
Possible solution: increase n.
25
Independent t-test: SPSS Output
Swim data from SPSS session 8: calculated t (ignore the sign) > critical t for df = 18 (2.101), so P < 0.05.
26
Repeated Measures Designs
As shown earlier, a repeated measures design implies that the data in each data set can be paired or correlated with one another. An independent t-test is inappropriate to analyse such data; instead, a paired t-test should be used…
27
Advantages of using Paired Data
Data from independent samples is heavily influenced by variance between subjects, i.e. this data would have a large SD associated with an independent t-test simply because some subjects performed better than others. HOWEVER…
[Figure: large SD (variance) between subjects]
28
Advantages of using Paired Data
Data from independent samples is heavily influenced by variance between subjects… …using the same participants on two occasions allows us to pair up the data… …now we can remove between-subject variance from subsequent analysis…
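The benefit of pairing can be seen by analysing the same Week 1/Week 2 scores from the paired t-test example both ways. A minimal Python sketch; the Week 2 scores assumed for subjects 4 and 7 (10 and 50) are reconstructed from the worked example's ∑D = 31, ∑D² = 137 and Week 2 mean (65.5) and SD (57.5):

```python
import math
import statistics

# Week 1 / Week 2 scores for 8 subjects; the Week 2 scores of subjects 4 and 7 (10 and 50)
# are reconstructed from the worked example's totals rather than read from the slide.
week1 = [10, 50, 20, 8, 115, 75, 45, 170]
week2 = [12, 52, 25, 10, 120, 80, 50, 175]
n = len(week1)

# Independent t-test: between-subject variance stays in the SEM of the difference.
sem_diff = math.sqrt(statistics.variance(week1) / n + statistics.variance(week2) / n)
t_independent = (statistics.mean(week2) - statistics.mean(week1)) / sem_diff

# Paired t-test: analyse each subject's change only, removing between-subject variance.
diffs = [w2 - w1 for w1, w2 in zip(week1, week2)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

print(f"SD week 1 = {statistics.stdev(week1):.1f}; SD of differences = {statistics.stdev(diffs):.2f}")
print(f"independent t = {t_independent:.2f}; paired t = {t_paired:.2f}")
```

The raw scores have an SD of about 57, but each subject's change varies by only about 1.6, so on identical data the paired t (about 7.06) is far larger than the independent t (about 0.14).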
29
Paired t-test: Calculation
Steps 1 & 2: Complete this table
Subject   Week 1   Week 2   Diff (D)   Diff² (D²)
1         10       12
2         50       52
3         20       25
4         8        10
5         115      120
6         75       80
7         45       50
8         170      175
∑D =      ∑D² =
30
Paired t-test: Calculation
Step 3: Calculate the t statistic
t = ∑D / √[(n × ∑D² − (∑D)²) / (n − 1)]
31
Paired t-test: Calculation
Step 3: Calculate the t statistic (∑D = 31; ∑D² = 137; n = 8)
t = 31 / √[(8 × 137 − 31²) / (8 − 1)] = 31 / √(135/7) = 7.06
32
Paired t-test: Calculation
Steps 4 & 5: Calculate the df (df = n − 1) and use a t-distribution table to find t critical
Degrees of Freedom   Critical t-ratio (0.05 level)   Critical t-ratio (0.01 level)
1                    12.71                           63.657
2                    4.303                           9.925
3                    3.182                           5.841
4                    2.776                           4.604
5                    2.571                           4.032
6                    2.447                           3.707
7                    2.365                           3.499
8                    2.306                           3.355
9                    2.262                           3.250
33
Paired t-test: Calculation
Step 6: Compare t calculated with t critical
Calculated t = 7.06; critical t (df = 7) = 2.365 at the 0.05 level and 3.499 at the 0.01 level
Therefore, t calculated > t critical. Result: significant
Group    Mean   SD     n
Week 1   61.6   56.6   8
Week 2   65.5   57.5   8
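The computational formula from Step 3 and the comparison in Step 6 can be sketched in Python as follows, using ∑D = 31, ∑D² = 137 and n = 8 from the worked example and the df = 7 critical values from the table in Steps 4 & 5:

```python
import math


def paired_t(sum_d, sum_d2, n):
    """Paired t from the totals: t = sum(D) / sqrt((n*sum(D^2) - sum(D)^2) / (n - 1)); df = n - 1."""
    t = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
    return t, n - 1


# Two-tailed critical t for df = 7, taken from the table in Steps 4 & 5.
T_CRIT_DF7 = {0.05: 2.365, 0.01: 3.499}

t, df = paired_t(sum_d=31, sum_d2=137, n=8)
print(f"t = {t:.2f}, df = {df}")
for alpha, t_crit in T_CRIT_DF7.items():
    verdict = "significant" if abs(t) > t_crit else "n.s."
    print(f"alpha = {alpha}: critical t = {t_crit} -> {verdict}")
```

The same t is obtained as mean(D) / (SD(D)/√n); the totals form is simply a convenient shortcut for hand calculation.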
34
Paired t-test: Calculation
Interpretation: P < 0.05 (indeed P < 0.01), so reject H0 & accept HA.
Conclusion: There is a significant difference in the DV between week 1 and week 2.
35
Paired t-test: SPSS Output
Push-up data from lecture 3: calculated t (ignore the sign) > critical t for df = 7 at both the 0.05 and 0.01 levels, so P < 0.01.
36
Parametric versus Non-Parametric
Both the t-tests just shown are parametric tests. These test for differences between means, so the mean must be an accurate descriptor of the data.
[Figure: normal vs non-normal distribution: is the mean an accurate descriptor?]
37
Example Hypotheses: Isometric Torque
Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?
[Figure: two normal distributions with Mean A and Mean B; x-axis: Sustained Isometric Torque (seconds)]
Normal distribution: the mean is appropriate, so use a t-test
38
Example Hypotheses: Isometric Torque
Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?
[Figure: two non-normal distributions with Mean A and Mean B; x-axis: Sustained Isometric Torque (seconds)]
NON-normal distribution: the mean is INappropriate, risking a Type 2 error
39
…assumptions of parametric analyses
-All means and paired differences are normally distributed (ND); this is the main consideration
-The sample is acquired from N through random sampling
-Data must be of at least the interval level of measurement (LOM)
-Data must be continuous
…but see Norman (2010) Adv. Health Sci. Educ.
40
Non-Parametric Tests
These tests use the median and make no assumptions about the shape of the distribution, i.e. they are ‘distribution free’. Mathematically, the raw values are ignored (i.e. the magnitudes of differences are not compared); instead, the data are analysed simply according to rank.
41
Non-Parametric Tests
Independent Measures: Mann-Whitney Test (e.g. exam grades (ordinal) from 14 students in 2 separate schools)
Repeated Measures: Wilcoxon Test
42
Mann-Whitney U: Calculation
Step 1: Rank all the data from both groups in one series, then total the ranks for each group
School A: J. S. (B-), L. D. (B-), H. L. (A+), M. J. (D-), T. M. (B+), T. S. (A-), P. H. (F). Median = B-; ∑RA =
School B: T. J. (D), M. M. (C+), K. S. (C+), P. S. (B-), R. M. (E), P. W. (C-), A. F. (A-). Median = C+; ∑RB =
43
Mann-Whitney U: Calculation
Step 2: Calculate two versions of the U statistic using:
U1 = (nA × nB) + nA(nA + 1)/2 − ∑RA
AND…
U2 = (nA × nB) + nB(nB + 1)/2 − ∑RB
44
Mann-Whitney U: Calculation
Step 2: Calculate two versions of the U statistic using:
U1 = (nA × nB) + nA(nA + 1)/2 − ∑RA
…OR, to save time, you can calculate U1 and then obtain U2 as follows:
U2 = (nA × nB) − U1
45
Mann-Whitney U: Calculation
Step 3: Select the smaller of the two U statistics (U1 = 17.5; U2 = 31.5)…now consult a table of critical values for the Mann-Whitney test
n    0.05   0.01
6    5      2
7    8      4
8    13     7
9    17     11
Calculated U must be less than or equal to critical U to conclude a significant difference.
Conclusion: Median A = Median B
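Steps 1 to 3 can be checked in Python. A minimal sketch, assuming an ordinal grade scale with F lowest and A+ highest (grades not used by either school are listed only to fix the ordering); tied grades receive the mean of the ranks they span, and the critical U of 8 is the n = 7, 0.05 value from the table above:

```python
# Assumed ordinal grade scale, lowest first; only the relative order matters.
GRADE_ORDER = ["F", "E", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]
U_CRIT_N7 = 8  # critical U for n = 7 per group at the 0.05 level (table in Step 3)


def average_ranks(values):
    """Rank all values in one series (1 = lowest); ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (sum RA, sum RB, U1, U2) using the Step 2 formulas."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(group_a + group_b)
    r_a, r_b = sum(ranks[:n_a]), sum(ranks[n_a:])
    u1 = n_a * n_b + n_a * (n_a + 1) / 2 - r_a
    u2 = n_a * n_b + n_b * (n_b + 1) / 2 - r_b
    return r_a, r_b, u1, u2


school_a = ["B-", "B-", "A+", "D-", "B+", "A-", "F"]
school_b = ["D", "C+", "C+", "B-", "E", "C-", "A-"]
r_a, r_b, u1, u2 = mann_whitney_u(
    [GRADE_ORDER.index(g) for g in school_a],
    [GRADE_ORDER.index(g) for g in school_b],
)
u = min(u1, u2)
print(f"sum RA = {r_a}, sum RB = {r_b}, U1 = {u1}, U2 = {u2}")
print("significant" if u <= U_CRIT_N7 else "n.s.")
```

Ranking from the highest grade downwards instead would swap U1 and U2 but leave the smaller U, and therefore the conclusion, unchanged.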
46
Mann-Whitney U: SPSS Output
Calculated U (lower value) = 17.5 > critical U (8), so P > 0.05 (n.s.)
47
Non-Parametric Tests
Independent Measures: Mann-Whitney Test
Repeated Measures: Wilcoxon Test (e.g. one group, pre-test/post-test, assumed non-normal)
48
Wilcoxon Signed Ranks: Calculation
Step 1: Rank all the differences in one series (ignoring signs), then total the positive and the negative ranks separately
Columns: Athlete | Pre-training OBLA (kph) | Post-training OBLA (kph) | Diff. | Rank | Signed Ranks
Athletes: J. S., L. D., H. L., M. J., T. M., T. S., P. H.
Diff.: … −7 … −3 …
Medians = ; ∑Signed Ranks =
49
Wilcoxon Signed Ranks: Calculation
Step 2: The smaller of the two T values is our test statistic (T+ = 18; T− = 10)…now consult a table of critical values for the Wilcoxon test
n    0.05
6    0
7    2
8    3
9    5
Calculated T must be less than or equal to critical T to conclude a significant difference.
Conclusion: Median A = Median B
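The signed-rank totals can be computed in the same way. The original OBLA values are not reproduced here, so this minimal Python sketch uses hypothetical post-minus-pre differences for 7 athletes purely to show the mechanics; zero differences are dropped before ranking (a common convention) and tied |D| values share their mean rank:

```python
def signed_rank_totals(diffs):
    """Return (T+, T-): rank |D| in one series, then total the ranks of + and - differences."""
    nonzero = [d for d in diffs if d != 0]  # zero differences dropped (a common convention)
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # tied |D| values share their mean rank
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    return t_plus, t_minus


# Hypothetical post-minus-pre differences for 7 athletes (illustrative only, not the slide data).
diffs = [2, -1, 4, 3, -5, 6, 7]
t_plus, t_minus = signed_rank_totals(diffs)
T = min(t_plus, t_minus)
T_CRIT_N7 = 2  # critical T for n = 7 at the 0.05 level (table in Step 2)
print(f"T+ = {t_plus}, T- = {t_minus}, T = {T}")
print("significant" if T <= T_CRIT_N7 else "n.s.")
```

With these illustrative values T = 6, which is greater than the critical T of 2 for n = 7, so the difference would be n.s. at the 0.05 level. Note that n for the table is the number of non-zero differences.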
50
Wilcoxon Signed Ranks: SPSS Output
Calculated T (lower value) = 10 > critical T (2), so P > 0.05 (n.s.)
51
So which stats test should you use?
Q1. What is the LOM? Nominal / Ordinal / Interval/Ratio
Q2. Are the data ND? Yes / No
Q3. Are the data paired or independent?
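The three questions can be written as a small decision function. A Python sketch under the assumption that the flowchart sends interval/ratio data that are ND to a t-test, ordinal or non-normal data to the matching non-parametric test, and nominal data to tests outside this lecture; the function and argument names are hypothetical:

```python
def choose_test(lom, normal=False, paired=False):
    """Pick one of this lecture's tests from the three flowchart questions.

    lom: "nominal", "ordinal" or "interval/ratio" (Q1); normal: are the data ND? (Q2);
    paired: are the data paired (repeated measures) rather than independent? (Q3).
    """
    if lom == "nominal":
        return None  # nominal data need tests not covered in this lecture
    parametric = lom == "interval/ratio" and normal  # ordinal data always go non-parametric
    if paired:
        return "paired t-test" if parametric else "Wilcoxon signed ranks test"
    return "independent t-test" if parametric else "Mann-Whitney U test"


print(choose_test("interval/ratio", normal=True, paired=False))
print(choose_test("ordinal", paired=True))
```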
52
Why do we use Hypothesis Testing?
-It is easy (i.e. data in, P value out)
-It provides the ‘Illusion of Scientific Objectivity’
-Everybody else does it.
53
Problems with Hypothesis Testing?
-P<0.05 is an arbitrary probability (why not P<0.06?): surely God loves 0.06 as much as 0.05
-The size of the effect is not expressed: i.e. we know A is bigger than B but not by how much exactly (so we would need to calculate effect size as well)
-The variability of this effect is not expressed: providing a P value alone also leaves us wondering how far individual measurements might have varied from the mean (so we need to include SD or SEM as well)
-Induction/deduction: reproducibility
-Overall, hypothesis testing ignores ‘judgement’. This relates back to the scientist’s desire for objectivity; it is important that we do not underestimate the value of our own judgement (i.e. if P=0.06, would it be smart to accept the null hypothesis and move on?)
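One common way to express the size of the effect is Cohen's d, the difference in means divided by the pooled SD. The lecture does not prescribe a particular effect-size measure, so this is a swapped-in example: a minimal Python sketch for the isometric torque summary data, assuming n = 25 per group:

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d: difference in means divided by the pooled SD."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


d = cohens_d(18.5, 1.74, 25, 17.5, 1.72, 25)
print(f"d = {d:.2f}")
```

For these data d is about 0.58, a medium effect by Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), which a bare significant/n.s. verdict does not convey.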