Hypothesis Testing
Introduction to Study Skills & Research Methods (HL10040)
Dr James Betts
Lecture Outline:
What is Hypothesis Testing?
Hypothesis Formulation
Statistical Errors
Effect of Study Design
Test Procedures
Test Selection
Statistics
Descriptive: organising, summarising & describing data
Inferential: generalising; significance
Correlational: relationships
Sampling Error
Inferential statistics allow the dependent variable to be generalised from the sample (n) to the population (N). Effective sampling is essential to correctly generalise back to our target population.
What is Hypothesis Testing?
Comparing two observations, A and B:
Null Hypothesis: A = B
Alternative Hypothesis: A ≠ B
We also need to establish: 1) How unequal are these observations? 2) Are these observations reflective of the general population?
Example Hypotheses: Isometric Torque
Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?
Null Hypothesis: ♂ = ♀
Alternative Hypothesis: ♂ ≠ ♀ (i.e. ♂ > ♀ or ♂ < ♀)
Null Hypothesis (H0): There is no significant difference in the DV between males and females.
Alternative Hypothesis (HA), or experimental hypothesis (HE): There is a significant difference in the DV between males and females.
n.b. These are 2-tailed hypotheses (they do not predict the direction of the difference); they are the most common and the more recommended form.
Useful analogy: the criminal trial. Imagine you are the prosecutor:
H0 = defendant not guilty
HA = defendant guilty
We must assume that the defendant is innocent until proven guilty, just as H0 is assumed to be true until the evidence is strong enough to reject it.
Figure: Sustained Isometric Torque (seconds) for the male and female populations (N♂, N♀) and samples (n♂, n♀).
n.b. This is why effective sampling is so important...
Figure: Sustained Isometric Torque (seconds) for the populations (N♂, N♀) and samples (n♂, n♀).
…poor/insufficient sampling can lead to errors…
Statistical Errors
Type 1 error: rejecting H0 when it is actually true, i.e. concluding a difference when one does not actually exist.
Type 2 error: accepting H0 when it is actually false (e.g. previous slide), i.e. concluding no difference when one does exist.
Errors can occur due to biased/inadequate sampling, poor experimental design or the use of inappropriate/non-parametric tests.
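To make the Type 1 error concrete, the sketch below (an illustration, not part of the original lecture) repeatedly draws two samples from the same hypothetical population, so H0 is true by construction, and counts how often a t-test at the 0.05 level still rejects H0. It anticipates the independent t-test calculation introduced later; the population mean (60 s), SD (10 s), group size (n = 10) and the helper names are all assumptions. The critical value 2.101 is the tabled 2-tailed 0.05 value for df = 18.

```python
import math
import random
import statistics

# Tabled 2-tailed critical t at the 0.05 level for df = 18 (two groups of n = 10).
T_CRITICAL_DF18 = 2.101


def independent_t(a, b):
    """t = (mean_a - mean_b) / sqrt(SEM_a^2 + SEM_b^2)."""
    sem_a = statistics.stdev(a) / math.sqrt(len(a))
    sem_b = statistics.stdev(b) / math.sqrt(len(b))
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sem_a**2 + sem_b**2)


def type1_error_rate(trials=4000, n=10, seed=1):
    """Fraction of trials that wrongly reject H0 when H0 is true.

    Both samples come from the SAME hypothetical population (mean 60 s, SD 10 s),
    so every 'significant' result here is a Type 1 error.
    """
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        a = [rng.gauss(60, 10) for _ in range(n)]
        b = [rng.gauss(60, 10) for _ in range(n)]
        if abs(independent_t(a, b)) > T_CRITICAL_DF18:
            false_positives += 1
    return false_positives / trials


print(f"Observed Type 1 error rate: {type1_error_rate():.3f}")  # expected to be near 0.05
```

The observed rate should hover around 0.05, which is exactly the risk of a Type 1 error we accept by choosing P < 0.05.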
Back to Study Design
Independent Measures: individual scores in each data set are independent of one another.
Repeated Measures: individual scores in each data set are dependent/paired/correlated.
Figure: Pre-experimental designs (T, O1, O2 notation). Independent measures: 2 distinct groups. Repeated measures: the same individuals tested twice.
Figure: True-experimental designs (O1 to O4 notation). Whether the data are independent or repeated measures depends on how equivalent groups were achieved: Random Group Assignment (R) gives independent measures; a Cross-Over Design gives repeated measures.
So the above example (males vs. females) is an independent measures design, which therefore requires an independent t-test, AKA Student's (Gosset's) t-test.
Independent t-test: Calculation
Figure: Sustained Isometric Torque (seconds) for the samples n♂ and n♀.
Table: Mean, SD and n for ♀ and ♂.
Is this a significant effect?
Step 1: Calculate the Standard Error for each mean (n = 25 per group, so √n = 5)
SEM♀ = SD/√n = 1.74/5 = 0.348
SEM♂ = SD/√n = 1.72/5 = 0.344
Step 2: Calculate the Standard Error for the difference in means
$\mathrm{SEM}_{\text{diff}} = \sqrt{\mathrm{SEM}_{♀}^2 + \mathrm{SEM}_{♂}^2} = 0.501$
Step 3: Calculate the t statistic
$t = \dfrac{\text{Mean}_{♀} - \text{Mean}_{♂}}{\mathrm{SEM}_{\text{diff}}} = 2.00$
Step 4: Calculate the degrees of freedom (df)
$df = (n_{♀} - 1) + (n_{♂} - 1) = 48$
Step 5: Determine the critical value for t using a t-distribution table (degrees of freedom vs. critical t-ratio).
n.b. Use the 0.05 column for a 2-tailed test.
Step 6 (final): Compare t calculated with t critical.
Calculated t = 2.00; critical t (df = 48, 0.05 level, 2-tailed) = 2.01.
Therefore, t calculated < t critical, so the effect is not significant (n.s.).
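Steps 1 to 6 can be sketched in Python as follows. This is a minimal illustration that works from raw scores rather than the summary table; the data are hypothetical (not the lecture's), the function name is made up, and the critical value is passed in from a t-table (2.306 is the tabled 2-tailed 0.05 value for df = 8). With equal group sizes, √(SEM₁² + SEM₂²) gives the same result as the pooled-variance Student's t-test.

```python
import math
import statistics


def independent_t_test(group_1, group_2, t_critical):
    """Steps 1-6 of the independent t-test, following the slides."""
    n1, n2 = len(group_1), len(group_2)
    # Step 1: standard error of each mean, SEM = SD / sqrt(n)
    sem1 = statistics.stdev(group_1) / math.sqrt(n1)
    sem2 = statistics.stdev(group_2) / math.sqrt(n2)
    # Step 2: standard error of the difference in means
    sem_diff = math.sqrt(sem1**2 + sem2**2)
    # Step 3: t statistic
    t = (statistics.mean(group_1) - statistics.mean(group_2)) / sem_diff
    # Step 4: degrees of freedom
    df = (n1 - 1) + (n2 - 1)
    # Steps 5-6: compare |t| with the tabled critical value (the sign is ignored)
    significant = abs(t) > t_critical
    return t, df, significant


# Hypothetical sustained-contraction times (s); NOT the data from the slides.
females = [10, 12, 14, 12, 12]
males = [9, 10, 11, 10, 10]
t, df, significant = independent_t_test(females, males, t_critical=2.306)
print(f"t = {t:.2f}, df = {df}, significant = {significant}")
# t = 2.83, df = 8, significant = True
```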
Interpretation: P > 0.05, so reject HA and accept H0.
Conclusion: There is no significant difference in the DV between males and females.
Evaluation: The wealth of available literature indicates that females can sustain isometric contractions for longer than males. This implies that the findings of the present study may represent a Type 2 error.
Possible solution: increase n.
Independent t-test: SPSS Output (swim data from SPSS session 8)
Calculated t (ignore the sign) > critical t for df = 18 (2.101), so P < 0.05.
Repeated Measures Designs
As shown earlier, a repeated measures design implies that the data in each data set can be paired or correlated with one another. An independent t-test is therefore inappropriate for such data; instead, a paired t-test should be used…
Advantages of using Paired Data
Data from independent samples are heavily influenced by variance between subjects, i.e. such data would have a large SD (variance) in an independent t-test simply because some subjects performed better than others. HOWEVER…
…using the same participants on two occasions allows us to pair up the data…
…now we can remove between-subject variance from subsequent analysis…
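A small numerical sketch of why pairing helps: the same hypothetical week 1 / week 2 scores are analysed twice, once ignoring the pairing (independent t) and once using only the within-subject differences (paired t, computed here as mean D divided by its standard error). The data and function names are illustrative assumptions; the point is that the large between-subject spread swamps the independent test but cancels out of the paired one.

```python
import math
import statistics


def independent_t(a, b):
    """Ignores the pairing: t = (mean_b - mean_a) / sqrt(SEM_a^2 + SEM_b^2)."""
    sem_diff = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / sem_diff


def paired_t(a, b):
    """Uses the pairing: analyses only the within-subject differences D = b - a."""
    d = [y - x for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


# Hypothetical scores for 5 subjects: a large spread between subjects,
# but every subject improves by a similar amount from week 1 to week 2.
week_1 = [10, 20, 30, 40, 50]
week_2 = [12, 23, 31, 43, 52]
print(f"independent t = {independent_t(week_1, week_2):.2f}")  # 0.22
print(f"paired t      = {paired_t(week_1, week_2):.2f}")       # 5.88
```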
Paired t-test: Calculation
Steps 1 & 2: Complete this table (columns: Subject | Week 1 | Week 2 | Diff (D) | Diff² (D²)), then total the last two columns: ∑D = 31; ∑D² = 137.
Step 3: Calculate the t statistic
$t = \dfrac{\sum D}{\sqrt{\dfrac{n\sum D^2 - \left(\sum D\right)^2}{n - 1}}}$
Step 3 (worked): $t = \dfrac{31}{\sqrt{\dfrac{8 \times 137 - 31^2}{7}}} = 7.06$
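The Step 3 formula translates directly into code. This sketch (the function name is hypothetical) reproduces the worked value from n = 8, ∑D = 31 and ∑D² = 137. Algebraically the formula is the same as mean D divided by its standard error, so it also agrees with a paired t computed from raw differences.

```python
import math


def paired_t_from_sums(sum_d, sum_d_squared, n):
    """t = sum(D) / sqrt((n * sum(D^2) - (sum D)^2) / (n - 1))."""
    return sum_d / math.sqrt((n * sum_d_squared - sum_d**2) / (n - 1))


# Values from the worked example: n = 8, sum(D) = 31, sum(D^2) = 137
print(round(paired_t_from_sums(31, 137, 8), 2))  # 7.06
```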
Steps 4 & 5: Calculate the degrees of freedom (df = n − 1 = 7) and use a t-distribution table to find t critical at the 0.05 level (2.365) and the 0.01 level (3.499).
Step 6 (final): Compare t calculated with t critical.
Calculated t = 7.06; critical t (df = 7, 0.05 level) = 2.365.
Therefore, t calculated > t critical, so the effect is significant.
Table: Mean, SD and n for week 1 and week 2.
Interpretation: P < 0.05, so reject H0 and accept HA.
Conclusion: There is a significant difference in the DV between week 1 and week 2.
Paired t-test: SPSS Output (push-up data from lecture 3)
Calculated t (ignore the sign) > critical t for df = 7 at both the 0.05 (2.365) and 0.01 (3.499) levels, so P < 0.01.
Parametric versus Non-Parametric
Both the t-tests just shown are parametric tests. These test for differences in the mean, so the mean must be an accurate descriptor of the data.
Figure: normal vs. non-normal distributions (is the mean an accurate descriptor?).
Figure: Sustained Isometric Torque (seconds), Mean A vs. Mean B.
With a normal distribution the mean is appropriate, so a t-test can be used.
Figure: Sustained Isometric Torque (seconds), Mean A vs. Mean B.
With a NON-normal distribution the mean is INappropriate, and a t-test risks a Type 2 error.
…assumptions of parametric analyses:
All means and paired differences are normally distributed (ND); this is the main consideration.
The sample is acquired from N through random sampling.
Data must be of at least the interval level of measurement (LOM).
Data must be continuous.
…but see Norman (2010), Adv. Health Sci. Educ.
Non-Parametric Tests
These tests use the median and do not assume anything about the distribution, i.e. they are 'distribution free'. Mathematically, the value is ignored (i.e. the magnitudes of the differences are not compared); instead, the data are analysed simply according to rank.
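Since these tests work on ranks rather than values, the core operation is ranking. Below is a minimal sketch of one common convention (smallest value gets rank 1; tied values share the mean of the ranks they occupy); the function name is hypothetical.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # mean of 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks


print(average_ranks([3, 1, 4, 1, 5]))  # [3.0, 1.5, 4.0, 1.5, 5.0]
```

Note that the magnitudes (e.g. 5 vs. 50) would give identical ranks, which is exactly why these tests ignore value.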
Non-Parametric Tests
Independent Measures: Mann-Whitney Test
Repeated Measures: Wilcoxon Test
e.g. exam grades (ordinal) from 14 students in 2 separate schools (7 per school).
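Mann-Whitney U: Calculation
Step 1: Rank all the data from both groups in one series (lowest grade = rank 1; tied grades share the mean of their ranks), then total the ranks for each school.
School A student | Grade | Rank | School B student | Grade | Rank
J. S. | B- | 9 | T. J. | D | 4
L. D. | B- | 9 | M. M. | C+ | 6.5
H. L. | A+ | 14 | K. S. | C+ | 6.5
M. J. | D- | 3 | P. S. | B- | 9
T. M. | B+ | 11 | R. M. | E | 2
T. S. | A- | 12.5 | P. W. | C- | 5
P. H. | F | 1 | A. F. | A- | 12.5
Median: School A = B-; School B = C+. ∑RA = 59.5; ∑RB = 45.5.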
Step 2: Calculate two versions of the U statistic using:
$U_1 = n_A n_B + \dfrac{n_A(n_A + 1)}{2} - \sum R_A$
AND…
$U_2 = n_A n_B + \dfrac{n_B(n_B + 1)}{2} - \sum R_B$
…OR, to save time, calculate U1 and then U2 as follows:
$U_2 = n_A n_B - U_1$
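Steps 1 and 2 can be checked with a short sketch. Because grades are ordinal, they must first be mapped onto an ordered scale; the `GRADE_SCALE` list below is an assumed ordering (F lowest, A+ highest), and only the relative order matters for ranking. The function names are hypothetical. Applied to the school data, it reproduces the rank sums and both U statistics.

```python
# Assumed ordering of the grade scale, lowest to highest (only the order matters).
GRADE_SCALE = ["F", "E", "D-", "D", "D+", "C-", "C", "C+",
               "B-", "B", "B+", "A-", "A", "A+"]


def average_ranks(values):
    """Rank from smallest (rank 1) to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (sum of ranks A, sum of ranks B, U1, U2) using the slide formulas."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(group_a + group_b)  # Step 1: one combined series
    sum_r_a, sum_r_b = sum(ranks[:n_a]), sum(ranks[n_a:])
    u1 = n_a * n_b + n_a * (n_a + 1) / 2 - sum_r_a  # Step 2
    u2 = n_a * n_b - u1                             # time-saving shortcut
    return sum_r_a, sum_r_b, u1, u2


school_a = ["B-", "B-", "A+", "D-", "B+", "A-", "F"]
school_b = ["D", "C+", "C+", "B-", "E", "C-", "A-"]
codes_a = [GRADE_SCALE.index(g) for g in school_a]
codes_b = [GRADE_SCALE.index(g) for g in school_b]

result = mann_whitney_u(codes_a, codes_b)
print(result)  # (59.5, 45.5, 17.5, 31.5)
print("U =", min(result[2], result[3]))  # Step 3: U = 17.5
```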
Step 3 (final): Select the smaller of the two U statistics (U1 = 17.5; U2 = 31.5), then consult a table of critical values for the Mann-Whitney test (nA = nB = 7: critical U = 8 at the 0.05 level, 2-tailed). Calculated U must be less than or equal to critical U to conclude a significant difference.
Conclusion: 17.5 > 8, so Median A = Median B (no significant difference).
Mann-Whitney U: SPSS Output
Calculated U (the lower value) = 17.5 > critical U = 8, so P > 0.05 (n.s.).
Non-Parametric Tests
Repeated Measures: Wilcoxon Test
e.g. one group tested pre-test and post-test, with the data assumed to be non-normal.
Wilcoxon Signed Ranks: Calculation
Step 1: Rank all the differences in one series (ignoring signs), then total the positive and the negative signed ranks separately.
Table: for each athlete (J. S., L. D., H. L., M. J., T. M., T. S., P. H.): pre-training OBLA (kph), post-training OBLA (kph), difference, rank and signed rank, with medians and ∑signed ranks.
Step 2: The smaller of the two T values (T+ = 18; T− = 10) is our test statistic, so T = 10. Now consult a table of critical values for the Wilcoxon test (n = 7: critical T = 2 at the 0.05 level, 2-tailed). Calculated T must be less than or equal to critical T to conclude a significant difference.
Conclusion: 10 > 2, so Median A = Median B (no significant difference).
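A minimal sketch of Steps 1 and 2 for the Wilcoxon signed ranks test. The pre/post OBLA values below are hypothetical (the lecture's table values are not reproduced here), differences are rounded to 2 d.p. to avoid floating-point noise, and zero differences are dropped, which is a common convention rather than something stated on the slide. A useful check is that T+ + T− always equals n(n + 1)/2.

```python
def average_ranks(values):
    """Rank from smallest (rank 1) to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2
        i = j + 1
    return ranks


def wilcoxon_t(pre, post):
    """Return (T+, T-, T), where T is the smaller of the two signed-rank totals."""
    diffs = [round(b - a, 2) for a, b in zip(pre, post)]
    diffs = [d for d in diffs if d != 0]  # drop zero differences (common convention)
    ranks = average_ranks([abs(d) for d in diffs])  # Step 1: rank ignoring signs
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus, min(t_plus, t_minus)  # Step 2: smaller total is T


# Hypothetical OBLA (kph) for 7 athletes; NOT the data from the slides.
pre = [12.0, 13.5, 11.0, 14.2, 12.8, 13.0, 11.5]
post = [12.6, 13.1, 11.9, 14.0, 13.5, 12.9, 12.3]
print(wilcoxon_t(pre, post))  # (22.0, 6.0, 6.0)
```

Here T = 6 would still exceed the critical T of 2 for n = 7, so this hypothetical result would also be n.s.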
Wilcoxon Signed Ranks: SPSS Output
Calculated T = 10 > critical T = 2, so P > 0.05 (n.s.).
So which stats test should you use?
Q1. What is the level of measurement (LOM): nominal, ordinal or interval/ratio?
Q2. Are the data normally distributed (ND)? Yes or no?
Q3. Are the data paired or independent?
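The three questions can be encoded as a small decision helper. This is a hypothetical sketch covering only the four tests in this lecture: interval/ratio data that are ND go to a t-test, while ordinal or non-normal data go to the rank-based tests. Nominal data return `None` because the appropriate tests are outside the scope of these slides.

```python
def choose_test(level_of_measurement, normally_distributed, paired):
    """Hypothetical helper that walks through Q1-Q3.

    level_of_measurement: "nominal", "ordinal" or "interval/ratio" (Q1)
    normally_distributed: answer to Q2 (only matters for interval/ratio data)
    paired: answer to Q3 (True = repeated measures, False = independent measures)
    """
    lom = level_of_measurement.lower()
    if lom == "nominal":
        return None  # nominal data need tests not covered in this lecture
    if lom not in ("ordinal", "interval/ratio"):
        raise ValueError(f"unknown level of measurement: {level_of_measurement!r}")
    if lom == "interval/ratio" and normally_distributed:
        # Parametric route: compare means.
        return "paired t-test" if paired else "independent t-test"
    # Non-parametric route (ordinal, or non-normal interval/ratio): compare ranks.
    return "Wilcoxon signed ranks test" if paired else "Mann-Whitney U test"


print(choose_test("ordinal", normally_distributed=False, paired=False))  # Mann-Whitney U test
```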
Why do we use Hypothesis Testing?
It is easy (i.e. data in, P value out).
It provides the 'Illusion of Scientific Objectivity'.
Everybody else does it.
Problems with Hypothesis Testing?
P < 0.05 is an arbitrary probability (why not P < 0.06?).
The size of the effect is not expressed.
The variability of this effect is not expressed.
Overall, hypothesis testing ignores 'judgement'.