Central Locations and Variability
RAISHEINE JOYCE DALMACIO, MS BIO I
Concept and Importance: Central tendency
- A single value that represents the whole population or a sample with respect to a particular characteristic.
- A value or characteristic that falls in or near the middle of the data.
- Originated from the concept of the “average man.”
- Several ways have been developed to measure the central representative value(s).
[Table: columns Station, LHC, LSC, DC, DCA, Other Algae, Abiotic, Other Biota, OTA, OTD, OTL; rows Mainland and Island]
Measures of Central Tendency
Mean
- Arithmetic: for data in a numerical (arithmetic) series. Used when the data are not skewed (no extreme outliers) and the individual data points are not dependent on each other.
- Geometric: for data in a geometric series. Used when the data are interrelated, for example, when discussing returns on investment or interest rates. All values must be greater than zero.
- Harmonic: the reciprocal of the arithmetic mean of the reciprocals of the values. Used for a large population where the majority of the values are distributed uniformly but there are a few outliers with significantly higher values. All values must be greater than zero.
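A minimal sketch of the three means, assuming plain Python lists of numbers. The helper names are hypothetical. The geometric and harmonic versions reject zero or negative values, which is why a spreadsheet reports #NUM! for those means whenever a data set contains a zero.

```python
import math

# Hypothetical helpers illustrating the three means described above.


def arithmetic_mean(xs):
    """Sum of the values divided by how many there are."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """n-th root of the product, computed through logarithms to avoid overflow."""
    if any(x <= 0 for x in xs):
        # a spreadsheet GEOMEAN returns #NUM! in this situation
        raise ValueError("geometric mean is undefined for zero or negative values")
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if any(x <= 0 for x in xs):
        # a spreadsheet HARMEAN returns #NUM! in this situation
        raise ValueError("harmonic mean is undefined for zero or negative values")
    return len(xs) / sum(1 / x for x in xs)


if __name__ == "__main__":
    data = [2, 8]                  # hypothetical values
    print(arithmetic_mean(data))   # 5.0
    print(geometric_mean(data))    # about 4.0
    print(harmonic_mean(data))     # 3.2
    try:
        geometric_mean([0, 12.5, 30])
    except ValueError as err:
        print("error:", err)
```

For positive data the three always satisfy harmonic ≤ geometric ≤ arithmetic, which matches the surviving values in the table below (each harmonic mean is smaller than the corresponding geometric mean).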
[Table: Mean (Arithmetic), Mean (Geometric), and Mean (Harmonic) per category for Mainland and Island stations. Most geometric and harmonic means are #NUM! because the data contain zero values; the surviving values are Mainland geometric means 17.05 and 1.94 with harmonic means 14.44 and 0.73, and Island geometric mean 9.50 with harmonic mean 4.78.]
Median and Mode
After sorting a data set in order (as an array), the value that falls in the middle is called the median; when the number of values is even, the median is the average of the two middle values. The mode is the value that appears most frequently in a given set of data.
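A short sketch of both measures, assuming a small hypothetical data set of four values (two of them zero). `median` is a hypothetical helper written out to show the sort-then-pick-the-middle step; `statistics.multimode` from the standard library returns every most-frequent value.

```python
from statistics import multimode


def median(xs):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


if __name__ == "__main__":
    data = [30, 0, 12.5, 0]   # hypothetical: four values, two of them zero
    print(median(data))       # 6.25
    print(multimode(data))    # [0]
```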
[Table: Median per category (LHC, LSC, DC, DCA, Other Algae, Abiotic, Other Biota, OTA, OTD, OTL) for Mainland and Island stations]
Conclusion
Based on the results for the arithmetic, geometric, and harmonic means and the median, the arithmetic mean is the best measure of central tendency to use here. The multiple zero values make the geometric and harmonic means undefined (#NUM!), and with only 4 values in each data set the median is not a reliable measure. The arithmetic means of each station are compared by category:
[Chart: arithmetic means by category (LHC, LSC, DC, DCA, Other Algae, Abiotic, OTA, OTD, OTL) for Mainland and Island]
Measures of Variability
Variability describes how the observations are either scattered widely or clustered around the central location. It is the basis for comparison, without which the definition of statistics is incomplete, and it is measured using various parameters.
- Range: the difference between the largest and the smallest observations in a set of data.
- Interquartile range (IQR): the difference between the third and the first quartiles (Q3 − Q1).
- Mean deviation: measures dispersion more comprehensively by considering all the deviations of the observations from the central location; it is the average of the absolute deviations, usually from the mean.
- Variance and standard deviation: the variance is the average of the squared deviations from the mean (for a sample, the sum of squared deviations is divided by n − 1). Its positive square root, the standard deviation (SD), is used for presentation because it expresses the variation around a particular mean in the same units as the data.
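A minimal sketch computing all five measures for one data set, assuming a hypothetical sample. Quartiles have several conventions; `method="inclusive"` is assumed here because it matches the spreadsheet QUARTILE.INC function, and other conventions can give a slightly different IQR. The variance and SD use the sample (n − 1) divisor.

```python
import statistics


def spread(xs):
    """Range, IQR, mean deviation, sample variance, and SD (hypothetical helper)."""
    mean = statistics.fmean(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "range": max(xs) - min(xs),
        "iqr": q3 - q1,
        # average absolute deviation from the mean
        "mean_deviation": sum(abs(x - mean) for x in xs) / len(xs),
        "variance": statistics.variance(xs),  # sample variance, divisor n - 1
        "sd": statistics.stdev(xs),           # positive square root of the variance
    }


if __name__ == "__main__":
    print(spread([2, 4, 4, 4, 5, 5, 7, 9]))  # hypothetical data
```

For this sample the range is 7, the IQR is 1.5, the mean deviation is 1.5, and the variance is 32/7 (about 4.57). Use `statistics.pvariance` and `statistics.pstdev` instead if the data are the whole population.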
[Table: Range, Mean Deviation, Interquartile Range, Variance, and Standard Deviation per category (LHC, LSC, DC, DCA, Other Algae, Abiotic, Other Biota, OTA, OTD, OTL) for Mainland and Island stations]
[Chart: Comparison of Standard Deviations by category (LHC, LSC, DC, DCA, Other Algae, Abiotic, OTA, OTD, OTL) for Mainland and Island]
Basics of Hypothesis Formulation and Testing
- Null hypothesis (H0) assumes that there is no difference between the new and the old ones.
- Alternative hypothesis (HA) assumes that the new idea is better or true and goes against the traditional belief.
- In statistical procedure, the null hypothesis is tested, not the alternative hypothesis.
- Significance level (α): the probability, fixed in advance, of rejecting H0 when the observed difference actually arose by chance or random error; it is usually set at 0.05, and H0 is rejected when the P-value of the test is smaller than α.
- Confidence level, limits, and interval: when concluding that any hypothesis is true or false, there is a certain level of confidence. In most biological research, a confidence level of 95% is considered sufficient. For a given level of confidence, any mean has two confidence limits: the lower limit (LL) and the upper limit (UL). The range between the two limits is called the confidence interval (CI).
- Statistical and biological significance
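The confidence limits of a mean can be sketched as mean ± t × SD/√n. This is a minimal version assuming a hypothetical sample; `confidence_interval` is a hypothetical helper, and the critical t-value is passed in because it comes from a t-table (2.365 is the two-tailed 95% value for df = 7).

```python
import math
import statistics


def confidence_interval(xs, t_crit):
    """Return (LL, UL) for the mean: mean +/- t_crit * SD / sqrt(n).

    t_crit is the two-tailed critical t-value for n - 1 degrees of freedom,
    looked up from a t-table for the chosen confidence level.
    """
    n = len(xs)
    mean = statistics.fmean(xs)
    se = statistics.stdev(xs) / math.sqrt(n)  # standard error of the mean
    half_width = t_crit * se
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]           # hypothetical sample, n = 8, df = 7
    ll, ul = confidence_interval(data, t_crit=2.365)
    print(round(ll, 2), round(ul, 2))         # 3.21 6.79
```

A wider interval means less precision; raising the confidence level to 99% uses a larger critical t and widens the interval further.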
Errors in hypothesis testing
Selection of statistical tools
Test of goodness-of-fit
Test for normal distribution (normality test): the normality test is the gateway test that determines whether we should choose a parametric or a nonparametric test. If the collected data are normally distributed, parametric tests are used for hypothesis testing; if they are not, we have to either normalize them using data transformation methods or use nonparametric tests instead. Usually, the chi-square (χ²) test or the Kolmogorov–Smirnov (K–S) test is used to determine whether the data set is normally distributed.
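A rough sketch of the K–S idea using only the standard library: the statistic D is the largest gap between the empirical cumulative distribution of the data and a normal CDF. `ks_statistic` and `looks_normal` are hypothetical helpers. Two assumptions to note: the normal curve's mean and SD are estimated from the same data, and 1.36/√n is only the large-sample critical value at α = 0.05, so this is an approximate check rather than an exact test.

```python
import math
from statistics import NormalDist


def ks_statistic(xs, dist):
    """K-S statistic D: largest gap between the empirical CDF of xs and dist.cdf."""
    s = sorted(xs)
    n = len(s)
    d = 0.0
    for i, x in enumerate(s, start=1):
        f = dist.cdf(x)
        # the empirical CDF jumps from (i - 1)/n to i/n at x; check both sides
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def looks_normal(xs):
    """Approximate K-S normality check at alpha = 0.05 (hypothetical helper).

    Returns (D, True) when D does not exceed the large-sample critical value.
    """
    d = ks_statistic(xs, NormalDist.from_samples(xs))
    return d, d <= 1.36 / math.sqrt(len(xs))
```

Because the mean and SD are estimated from the sample, the ordinary K–S critical values make the test conservative; the Lilliefors variant of the test corrects for this.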
Application
We are given the percentages of live hard corals (LHC) found at 12 stations each for the mainland and the islands. We are to compare the amount of live hard corals in the two categories and draw a conclusion. First, we establish our null and alternative hypotheses:
H0: “There is no significant difference between the amount of live hard corals in the mainland and the islands.”
HA: “There is a significant difference between the amount of live hard corals in the mainland and the islands.”
Here is the raw data for the experiment:
[Table: Station and LHC (%) for the Mainland stations; Station and LHC (%) for the Island stations]
Application (continued)
[Table: Degrees of Freedom; Probability, p]
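The slides do not show which test produced the degrees-of-freedom and probability values. As one possible sketch, assuming the LHC percentages are roughly normal with similar variances in both groups, a pooled (Student's) two-sample t-test compares the mainland and island means. `pooled_t_test` is a hypothetical helper and the example data are made up.

```python
import math
import statistics


def pooled_t_test(a, b):
    """Student's two-sample t statistic (equal variances assumed) and its df."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # pooled variance: df-weighted average of the two sample variances
    sp2 = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (statistics.fmean(a) - statistics.fmean(b)) / se
    return t, df


if __name__ == "__main__":
    t, df = pooled_t_test([1, 2, 3], [4, 5, 6])  # hypothetical data
    print(round(t, 3), df)                        # -3.674 4
```

With 12 stations per group, df = 12 + 12 − 2 = 22, and the tabulated two-tailed critical t at P = 0.05 is about 2.074; H0 would be rejected if |t| exceeded that value. If the normality test fails, a nonparametric alternative is needed instead.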
Experimental Designs and Analysis of Variance
ANOVA
When a hypothesis is tested by partitioning the total variation into its sources and comparing the resulting variances, the method is called analysis of variance (ANOVA). More specifically, the effect of a factor is considered significant if the variance among treatments is sufficiently higher than the variance among replicates (the error variance).
Completely Randomized Design (CRD)
Used to study the effects of one factor, i.e. a treatment or fixed factor, while keeping the others constant; therefore, it is often called a single-factor experiment. All of the experimental units should be uniform, and the selected treatments are randomly assigned to the experimental units. Randomization can be done using a lottery system, random numbers or a random-number table, or any other method. Before randomizing, we need to determine the required total number of experimental units (n). If there are t different treatments of a single factor and each treatment is replicated r times, then:
Total experimental units (n) = t × r
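Random assignment for a CRD can be sketched by writing each treatment label r times (n = t × r labels in total) and shuffling them over the numbered units, a computer version of the lottery method. `crd_layout` is a hypothetical helper; the optional seed only makes a layout reproducible.

```python
import random


def crd_layout(treatments, r, seed=None):
    """Randomly assign t treatments, each replicated r times, to n = t * r units.

    Returns a list where position k holds the treatment for experimental unit k + 1.
    """
    labels = [trt for trt in treatments for _ in range(r)]  # n = t * r labels
    rng = random.Random(seed)
    rng.shuffle(labels)  # put the labels into a random order across the units
    return labels


if __name__ == "__main__":
    layout = crd_layout(["A", "B", "C"], r=4, seed=1)  # hypothetical: 3 treatments x 4 reps
    for unit, trt in enumerate(layout, start=1):
        print(f"unit {unit}: {trt}")
```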
Treatment combinations or experimental design for CRD.
The following equation represents the mathematical model for CRD:
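A common way to write the single-factor CRD model is shown below; the symbols are chosen here for illustration. Each observation is the overall mean plus the effect of its treatment plus random error, with the errors usually assumed independent and normally distributed with mean 0 and constant variance σ².

```latex
Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, \dots, t; \quad j = 1, \dots, r
```

Here Y_ij is the observation on the jth replicate of the ith treatment, μ is the overall mean, τ_i is the effect of the ith treatment, and ε_ij is the random (experimental) error.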
ANOVA table for CRD.
1. Group the data by treatments and calculate the treatment totals (T), grand total (G), grand mean, and coefficient of variation (CV).
2. Using the number of treatments (t) and the number of replications (r), determine the df for each source of variation.
3. Construct an outline/table of ANOVA as shown.
4. Using Xᵢ to represent the measurement of the ith plot, Tᵢ as the total of the ith treatment, and n as the total number of experimental plots (i.e. n = rt), calculate the correction factor (CF) and the various sums of squares (SS).
5. Calculate the mean square (MS) for each source of variation by dividing each SS by its corresponding df.
6. Calculate the F-value (after R.A. Fisher) for testing the significance of the treatment difference, i.e. the mean square of treatment divided by the mean square of error (F = MST/MSE).
7. Enter all of the computed values in the ANOVA table.
8. Obtain the tabular F-values using f₁ = treatment df = t − 1 and f₂ = error df = t(r − 1), and compare them with the computed F to reach a conclusion.
Basis of Conclusions to be Made.
Parametric test: One-way ANOVA
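The steps above can be sketched for a CRD with equal replication. The formulas are the usual textbook shortcuts, assumed here since the slide's own formulas are not shown: CF = G²/n, Total SS = ΣX² − CF, Treatment SS = ΣT²/r − CF, and Error SS = Total SS − Treatment SS. CV is computed last because it needs MSE (CV = √MSE / grand mean × 100). `crd_anova` is a hypothetical helper.

```python
def crd_anova(groups):
    """One-way ANOVA for a CRD with equal replication (hypothetical helper).

    groups: list of lists, one inner list of r observations per treatment.
    """
    t = len(groups)
    r = len(groups[0])
    if any(len(g) != r for g in groups):
        raise ValueError("this sketch assumes every treatment has r replicates")
    n = t * r
    totals = [sum(g) for g in groups]               # treatment totals T_i
    G = sum(totals)                                 # grand total
    cf = G ** 2 / n                                 # correction factor
    total_ss = sum(x ** 2 for g in groups for x in g) - cf
    trt_ss = sum(T ** 2 for T in totals) / r - cf
    err_ss = total_ss - trt_ss
    df_trt, df_err = t - 1, t * (r - 1)
    mst, mse = trt_ss / df_trt, err_ss / df_err     # mean squares
    cv = (mse ** 0.5) / (G / n) * 100               # CV (%) = sqrt(MSE) / grand mean * 100
    return {"df": (df_trt, df_err), "SS": (trt_ss, err_ss, total_ss),
            "MS": (mst, mse), "F": mst / mse, "CV": cv}


if __name__ == "__main__":
    # hypothetical data: 3 treatments x 3 replicates
    print(crd_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

For this example F = 27 with df (2, 6) and CV = 20%. The tabular F at the 5% level for (2, 6) df is about 5.14, so the computed F exceeds it and the treatment difference would be judged significant.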
Nonparametric test: Kruskal-Wallis test (H-test)
Nonparametric tests are similar to the parametric tests used for ANOVA, but they use ranks rather than the original data for analysis; therefore, they are also called “ANOVA by ranks.” When the samples do not come from normally distributed populations or the variances are heterogeneous, ranks are assigned to the observations for analysis. As with the parametric ANOVA, the Kruskal-Wallis test only determines whether a factor has an effect; it does not compare the treatments with one another. A nonparametric method has also been developed for the purpose of multiple comparisons.
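A minimal sketch of the H statistic, assuming the common formula H = 12 / (N(N + 1)) × Σ(Rᵢ²/nᵢ) − 3(N + 1), where N is the total number of observations and Rᵢ is the rank sum of group i. Tied values receive the average of their ranks; the tie-correction factor is omitted for brevity. Both helper names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic from pooled ranks (no tie correction)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    N = len(pooled)
    h_sum, start = 0.0, 0
    for g in groups:
        R = sum(ranks[start:start + len(g)])   # rank sum of this group
        h_sum += R ** 2 / len(g)
        start += len(g)
    return 12 / (N * (N + 1)) * h_sum - 3 * (N + 1)


if __name__ == "__main__":
    print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # hypothetical data, about 7.2
```

H is usually compared with the χ² critical value for t − 1 df (5.991 for df = 2 at α = 0.05); for very small groups, exact tables are preferred.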
Randomized Complete Block Design The randomized complete block design (RCBD) is probably the most widely used design because, in reality, it is difficult to find identical or uniform experimental units in the field of aquaculture, especially in outdoor ponds. Some ponds are closer to, or separated by, canals, roads, shade, etc. Even when using cages, some of them are closer to dikes, whereas others can be far away. Similarly, a few rows of indoor tanks can be in a darker area, whereas others can be in brighter areas.
These factors can have large effects on response variables, but their effects can neither be avoided nor minimized to negligible levels. In such cases, the only option is to separate their effects by blocking when designing the experiment. Experimental units that are thought to be uniform are grouped into one block. Blocking reduces the experimental (random) error by separating out the variation due to blocks, thereby increasing the chance that treatment effects will be found significant. However, care should be taken while designing the experiment: all of the treatments have to be included in each block.
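Randomization in an RCBD differs from a CRD: every block receives every treatment once, and the order of the treatments is randomized separately within each block. A minimal sketch, assuming hypothetical treatment and block labels; `rcbd_layout` is a hypothetical helper.

```python
import random


def rcbd_layout(treatments, blocks, seed=None):
    """Randomize the order of all treatments separately within each block.

    Every block receives every treatment exactly once (a 'complete' block).
    """
    rng = random.Random(seed)
    layout = {}
    for b in blocks:
        order = list(treatments)
        rng.shuffle(order)          # independent randomization inside this block
        layout[b] = order
    return layout


if __name__ == "__main__":
    # hypothetical: 4 treatments in 3 blocks of ponds grouped by location
    plan = rcbd_layout(["A", "B", "C", "D"], ["near dike", "middle", "near road"], seed=2)
    for block, order in plan.items():
        print(block, order)
```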