Statistics in Science Sample Size Determination for Efficient use of Resources PGRM 3.5
Four determining factors: A to D
A. Variability of experimental material. Expressed as either (i) the standard deviation (SD) of the response or (ii) the CV (= 100 × SD/Mean). The CV for biological responses is often in the range 10-30%.
B. Size of difference expected (d). Based on (i) knowledge of similar work, (ii) knowledge of the science, or (iii) an economically important difference.
Estimating the SD & CV (for A)
From analysis of similar data: SD is estimated by √MSE.
From the literature: SEM (SE of mean) = √(MSE/r), so SD = SEM × √r.
Example: Chowdhury and Rosario (1994) J. Agric. Sci. Camb 122, a randomised block design with 5 blocks (r = 5). From the reported SEM for dry matter yield, SD is estimated by SEM × √5 = 0.460. Mean ≈ 5, so CV ≈ 100 × 0.460/5 = 9.2%.
Estimating the SD & CV (contd)
Example: Wayne et al. (1999) J. Ecol 87, replication = 6. An SED is reported for reproductive weight per stand. Recall that SED = √2 × SEM, so SD = √(r/2) × SED. SD is estimated by SED × √(6/2) = 0.51. Mean ≈ 1.2, so CV ≈ 100 × 0.51/1.2 = 42.5%.
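The two conversions above (SD from a published SEM or SED, then CV as a percentage of the mean) can be sketched in Python. This is a minimal illustration: the function names are assumptions, not part of the source, and the only numbers used are the SD and mean values quoted in the two examples.

```python
import math


def sd_from_sem(sem: float, r: int) -> float:
    """SD from a published SEM: SEM = sqrt(MSE/r), so SD = SEM * sqrt(r)."""
    return sem * math.sqrt(r)


def sd_from_sed(sed: float, r: int) -> float:
    """SD from a published SED: SED = sqrt(2) * SEM, so SD = sqrt(r/2) * SED."""
    return sed * math.sqrt(r / 2)


def cv_percent(sd: float, mean: float) -> float:
    """Coefficient of variation expressed as a percentage of the mean."""
    return 100 * sd / mean


# CVs for the two worked examples, using the SD and mean quoted in the text.
print(round(cv_percent(0.460, 5), 1))   # 9.2  (r = 5 example)
print(round(cv_percent(0.51, 1.2), 1))  # 42.5 (r = 6 example)
```

Keeping the SEM and SED conversions as separate functions mirrors the two kinds of standard error that published tables tend to report.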
More determining factors: C & D
Criteria for rejecting the null hypothesis:
C. Significance level = probability of rejecting the null hypothesis when it is true (i.e. concluding there is a difference when there is not). Recall: rejecting when p < 0.05 gives significance level 0.05. Typical levels include 0.05 and 0.01.
D. Power = probability of concluding there is a difference when there is one of size d. Typical levels include 0.95.
Calculation of replicates per treatment
Fixing significance at 0.05 and power at 80%, the replication per treatment required to detect a d% difference is:
r = 16(CV/d)²
Example: CV = 15%, d = 10%, so r = 16 × (15/10)² = 36.
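The replication rule can be sketched in Python as follows. The source does not derive the constant 16; the sketch assumes the common explanation that it rounds 2(z₀.₉₇₅ + z₀.₈₀)² from a two-sided normal approximation. The function names and the approximate power check are illustrative assumptions, not the author's method.

```python
import math
from statistics import NormalDist


def rule_constant(alpha: float = 0.05, power: float = 0.80) -> float:
    """2 * (z_{1-alpha/2} + z_{power})**2 under a normal approximation.

    Assumed origin of the constant 16 in r = 16 * (CV/d)**2.
    """
    z = NormalDist()
    return 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2


def replicates_per_treatment(cv: float, d: float, constant: float = 16.0) -> int:
    """r = constant * (CV/d)**2, rounded up; cv and d are both % of the mean."""
    return math.ceil(constant * (cv / d) ** 2)


def approx_power(cv: float, d: float, r: int, alpha: float = 0.05) -> float:
    """Approximate power for comparing two treatment means (normal approximation,
    ignoring the negligible probability of rejecting in the wrong direction)."""
    z = NormalDist()
    sed_percent = cv * math.sqrt(2 / r)  # SED as % of the mean
    return z.cdf(d / sed_percent - z.inv_cdf(1 - alpha / 2))


print(round(rule_constant(), 1))           # 15.7, rounded up to 16 in the rule
print(replicates_per_treatment(15, 10))    # 36
print(round(approx_power(15, 10, 36), 2))  # 0.81
```

Rounding the constant up to 16 makes the rule slightly conservative, which is why the approximate power for the worked example comes out a little above 80%.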
Review of Resource Use
To see how precise an experiment actually was, the formula above can be rewritten as d = 4 CV/√r, which gives d = 4 SEM% (≈ 2.83 SED%), where SEM and SED are expressed as a % of the overall mean. Example follows:
Example: Review of resource use
Suppose an experiment with two treatments reports a mean for each of treatments 1 and 2 together with their SEM. The grand mean is 12.3 and the SEM as a percentage of that is 8.9%. The formula says that a real underlying difference between treatments of size 4 × 8.9% = 35.6% would have about an 80% chance of being detected at the 5% level in this experiment.
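The review calculation can be sketched the same way. The function names are illustrative assumptions; the inputs are the grand mean (12.3) and SEM% (8.9) quoted in the example, and the conversion to the units of the response is an added convenience rather than something stated in the source.

```python
import math


def detectable_difference_percent(sem_percent: float) -> float:
    """d = 4 * SEM%: difference (as % of the mean) with about an 80% chance
    of being detected at the 5% level."""
    return 4 * sem_percent


def detectable_from_sed_percent(sed_percent: float) -> float:
    """Same quantity from SED%: SEM% = SED%/sqrt(2), so d = (4/sqrt(2)) * SED%."""
    return 4 / math.sqrt(2) * sed_percent


grand_mean = 12.3
sem_percent = 8.9

d_percent = detectable_difference_percent(sem_percent)
d_absolute = d_percent / 100 * grand_mean  # same difference in the units of the response

print(round(d_percent, 1))   # 35.6
print(round(d_absolute, 2))  # 4.38
```

Expressed in the units of the response, the experiment could reliably detect only a difference of roughly 4.4 on a grand mean of 12.3.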