Central Locations and Variability
RAISHEINE JOYCE DALMACIO, MS BIO I

Concept and Importance: Central tendency
A single value that represents a whole population or sample with respect to a particular characteristic: a value that falls in or near the middle of the data. The idea originated from the concept of the “average man”. A number of ways have been developed to measure this central, representative value.

[Table: cover by category (LHC, LSC, DC, DCA, Other Algae, Abiotic, Other Biota, OTA, OTD, OTL) for Mainland and Island stations; values not recoverable]

Measures of Central Tendency

Mean
Arithmetic mean: for data in a numerical (arithmetic) series. Used when the data are not skewed (no extreme outliers) and the individual data points are not dependent on each other.
Geometric mean: for data in a geometric series. Used when the data are inter-related, for example when discussing returns on investment or interest rates. Requires no zero values.
Harmonic mean: the reciprocal of the arithmetic mean of the reciprocals. Used for a large population in which most values are distributed uniformly but a few outliers have significantly higher values. Requires no zero values.
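The three means can be sketched in Python as follows. This is a minimal illustration, assuming a plain list of numbers as input; the function names are hypothetical. Mirroring the #NUM! entries in the results table, the geometric and harmonic versions here refuse zero or negative values instead of returning a number.

```python
import math

def arithmetic_mean(values):
    # Sum of the observations divided by their count.
    return sum(values) / len(values)

def geometric_mean(values):
    # nth root of the product, computed through logarithms.
    # This sketch rejects x <= 0, where the log (and the mean) is undefined.
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean requires strictly positive values")
    return math.exp(sum(math.log(x) for x in values) / len(values))

def harmonic_mean(values):
    # Reciprocal of the arithmetic mean of the reciprocals.
    # This sketch rejects x <= 0 (1/0 is undefined).
    if any(x <= 0 for x in values):
        raise ValueError("harmonic mean requires strictly positive values")
    return len(values) / sum(1 / x for x in values)

if __name__ == "__main__":
    data = [2, 8]  # hypothetical values, not the survey data
    print(arithmetic_mean(data), geometric_mean(data), harmonic_mean(data))
```

For any set of positive values, harmonic mean ≤ geometric mean ≤ arithmetic mean, with equality only when all values are the same; for the hypothetical [2, 8] the three are 3.2, 4 and 5.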

[Table: arithmetic, geometric, and harmonic means of each category for Mainland and Island stations. Only fragments survived extraction: Mainland geometric means 17.05 and 1.94, harmonic means 14.44 and 0.73; Island geometric mean 9.50, harmonic mean 4.78. Most other geometric and harmonic cells show the spreadsheet error #NUM!]

Median and Mode
After a data set is sorted into an ordered array, the value that falls exactly in the middle is called the median; with an even number of observations, it is the average of the two middle values. The mode is the value that appears most frequently in a data set.
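A short sketch of both definitions, assuming a list of numbers as input (the helper names are hypothetical). It makes the even-sized case explicit, which matters here because each data set has only four values, so the median is the average of the two middle ones.

```python
from collections import Counter

def median(values):
    # Sort the data into an ordered array, then take the middle value,
    # or the average of the two middle values when n is even.
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    # Return every value that occurs with the highest frequency;
    # a data set can have more than one mode.
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

if __name__ == "__main__":
    sample = [0, 0, 3, 7]  # hypothetical four-value data set with repeated zeros
    print(median(sample), modes(sample))
```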

[Table: median of each category for Mainland and Island stations; values not recoverable]

Conclusion
Based on the results for the arithmetic, geometric, and harmonic means and the median, the best measure of central tendency here is the arithmetic mean. The multiple zero values leave the geometric and harmonic means undefined (#NUM!), and with only four values in each data set, the median is a rather unreliable measure.
Comparison of the arithmetic means of each station:
[Chart: arithmetic means by category (LHC, LSC, DC, DCA, Other Algae, Abiotic, OTA, OTD, OTL) for Mainland and Island]

Measures of Variability

Variability describes how the observations are either scattered widely or clustered around the central location. It provides the basis for comparison, without which a statistical description is incomplete, and it is measured using various parameters.

Range: the difference between the largest and the smallest observations in a data set.
Interquartile range: the difference between the third and the first quartiles.
Mean deviation: measures dispersion more comprehensively by averaging the absolute deviations of all observations from the central location.
Variance and standard deviation: the variance is the average of the squared deviations from the mean. Its positive square root, the standard deviation (SD), is used for presentation to express the variation around a particular mean.
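The five measures can be computed together as sketched below; the function name is hypothetical. Two assumptions are worth flagging: quartile conventions differ between programs, and this sketch uses Python's `statistics.quantiles` with `method="inclusive"`; it also divides by n − 1 (the sample variance usually reported for sample data) rather than by n (the population variance).

```python
import math
import statistics

def variability(values):
    """Return range, IQR, mean deviation, sample variance and SD for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    data_range = max(values) - min(values)
    # Quartiles: one of several conventions (assumed here: "inclusive").
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Mean (absolute) deviation around the arithmetic mean.
    mean_dev = sum(abs(x - mean) for x in values) / n
    # Sample variance: sum of squared deviations divided by n - 1.
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return {
        "range": data_range,
        "iqr": q3 - q1,
        "mean_deviation": mean_dev,
        "variance": variance,
        "sd": math.sqrt(variance),  # positive square root of the variance
    }

if __name__ == "__main__":
    print(variability([2, 4, 4, 4, 5, 5, 7, 9]))  # hypothetical values
```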

[Table: range, mean deviation, interquartile range, variance, and standard deviation of each category for Mainland and Island stations; values not recoverable]

Comparison of Standard Deviations
[Chart: standard deviations by category (LHC, LSC, DC, DCA, Other Algae, Abiotic, OTA, OTD, OTL) for Mainland and Island]

Basics of Hypothesis Formulation and Testing

Null hypothesis (H0): assumes that there is no difference between the new and the old.
Alternative hypothesis (HA): assumes that the new idea is better or true, going against the traditional belief.
In the statistical procedure, the null hypothesis is tested, not the alternative hypothesis. [Diagram]
Significance level (α): the probability of wrongly rejecting H0, that is, of attributing to a real effect a difference that actually arose by chance or random error. The P-value from a test is compared against it.
Confidence level, limits, and interval: when concluding that a hypothesis is true or false, there is a certain level of confidence. In most biological research, a 95% confidence level (α = 0.05) is considered sufficient. For a given level of confidence, any mean has two confidence limits: the lower limit (LL) and the upper limit (UL). The range between the two limits is called the confidence interval (CI).
Statistical and biological significance
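The confidence limits of a mean can be sketched as mean ± t × (s / √n). This is a minimal illustration with a hypothetical function name; because Python's standard library has no t distribution, the critical t value is passed in, assumed to be read from a t table for df = n − 1 at the chosen confidence level.

```python
import math
import statistics

def confidence_interval(values, t_critical):
    # Two-sided CI for a mean: mean +/- t * (s / sqrt(n)).
    # t_critical must come from a t table for df = n - 1.
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    margin = t_critical * se
    return mean - margin, mean + margin  # (LL, UL)

if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, n = 8, df = 7
    print(confidence_interval(data, 2.365))  # t for 95% confidence, df = 7
```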

Errors in hypothesis testing
Selection of statistical tools

Test of goodness-of-fit
Test for normal distribution (normality test): the normality test is the gateway test that determines whether to choose a parametric or a nonparametric test. If the collected data are normally distributed, parametric tests are used for hypothesis testing; if they are not, we must either normalize them with data transformation methods or use nonparametric tests instead. The chi-square (χ²) goodness-of-fit test or the Kolmogorov–Smirnov (K–S) test is commonly used to determine whether a data set is normally distributed.
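The K–S statistic D is the largest gap between the empirical cumulative distribution of the sample and the normal cumulative distribution. The sketch below computes D only, with a hypothetical function name; the resulting D would then be compared with a tabulated critical value. If μ and σ are not supplied, they are estimated from the sample; note that the classical K–S table assumes they are known in advance, and the Lilliefors version of the test is the usual correction when they are estimated from the same data.

```python
import statistics

def ks_statistic(values, mu=None, sigma=None):
    """Kolmogorov-Smirnov D between the sample's empirical CDF and a normal CDF."""
    data = sorted(values)
    n = len(data)
    # Estimate the normal parameters from the sample if they are not given.
    if mu is None:
        mu = statistics.mean(data)
    if sigma is None:
        sigma = statistics.stdev(data)
    dist = statistics.NormalDist(mu, sigma)
    d = 0.0
    for i, x in enumerate(data, start=1):
        f = dist.cdf(x)
        # The empirical CDF steps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```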

Application
We are given the percentages of live hard coral (LHC) found at 12 stations each for the mainland and the islands. We are to compare the amount of live hard coral in the two categories and draw a conclusion. First, we establish our null and alternative hypotheses:
H0: “There is no significant difference between the amount of live hard coral in the mainland and the islands.”
HA: “There is a significant difference between the amount of live hard coral in the mainland and the islands.”
Here is the raw data for the experiment:
[Table: LHC (%) by station for Mainland and for Island; values not recoverable]

Application (continued)

[Table: critical values by degrees of freedom and probability, p; values not recoverable]
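The slides do not show which test statistic was used for this comparison. A table indexed by degrees of freedom and p is consistent with a t table, so one plausible choice, assumed here, is Student's two-sample t-test with pooled variance (df = 12 + 12 − 2 = 22). The function name and the data in the usage block are hypothetical; the pooled version assumes equal variances in the two groups, and Welch's t-test is the usual alternative when that is doubtful.

```python
import math
import statistics

def pooled_t_test(group_a, group_b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(group_a), len(group_b)
    m1, m2 = statistics.mean(group_a), statistics.mean(group_b)
    v1, v2 = statistics.variance(group_a), statistics.variance(group_b)
    df = n1 + n2 - 2
    # Pooled variance: df-weighted average of the two sample variances.
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

if __name__ == "__main__":
    # Hypothetical LHC percentages for 12 stations each (NOT the survey data).
    mainland = list(range(10, 22))
    island = list(range(15, 27))
    t, df = pooled_t_test(mainland, island)
    t_table = 2.074  # two-tailed, alpha = 0.05, df = 22, from a t table
    print(t, df, "reject H0" if abs(t) > t_table else "fail to reject H0")
```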

Experimental Designs and Analysis of Variance

ANOVA
When a hypothesis is tested by partitioning the total variation and comparing the resulting variances, the method is called analysis of variance (ANOVA). More specifically, the effect of a factor is considered significant if the variance among treatments is sufficiently higher than the variance among the replicates (the error variance).

Completely Randomized Design (CRD)
Used to study the effects of one factor (the treatment, or fixed factor) while keeping others constant; it is therefore often called a single-factor experiment. All of the experimental units should be uniform, and the treatments are assigned to the experimental units at random, using a lottery system, random numbers or a random number table, or any other randomization method. Before randomizing, determine the required total number of experimental units (n). If there are t different treatments of a single factor and each treatment is replicated r times, then:
Total experimental units: n = t × r
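The randomization step can be sketched as a computer "lottery": list every treatment r times, shuffle the n = t × r labels, and assign them to units 1 to n in order. The function name and treatment labels are hypothetical; the optional seed only makes a layout reproducible.

```python
import random

def crd_layout(treatments, replications, seed=None):
    """Randomly assign t treatments, each replicated r times, to n = t * r units."""
    rng = random.Random(seed)
    # Each treatment label appears r times.
    labels = [trt for trt in treatments for _ in range(replications)]
    rng.shuffle(labels)  # the equivalent of drawing lots
    # Map unit numbers 1..n to their randomly assigned treatment.
    return {unit: trt for unit, trt in enumerate(labels, start=1)}

if __name__ == "__main__":
    print(crd_layout(["A", "B", "C"], 4, seed=1))  # t = 3, r = 4, n = 12
```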

Treatment combinations or experimental design for CRD.

The following equation represents the mathematical model for CRD:
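The slide's own equation did not survive extraction. The standard textbook form of the one-factor CRD model, with notation assumed here, is:

$$
Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, \dots, t; \quad j = 1, \dots, r
$$

where $Y_{ij}$ is the observation on the $j$th replicate of the $i$th treatment, $\mu$ is the overall mean, $\tau_i$ is the effect of the $i$th treatment, and $\varepsilon_{ij}$ is the random error, assumed independent and normally distributed with mean 0 and constant variance $\sigma^2$.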

ANOVA table for CRD
1. Group the data by treatment and calculate the treatment totals (T), grand total (G), grand mean, and coefficient of variation (CV).
2. Using the number of treatments (t) and the number of replications (r), determine the degrees of freedom (df) for each source of variation.
3. Construct an outline of the ANOVA table as shown.
4. Using X_i for the measurement on the ith plot, T_i for the total of the ith treatment, and n for the total number of experimental plots (n = rt), calculate the correction factor (CF) and the various sums of squares (SS).
5. Calculate the mean square (MS) for each source of variation by dividing each SS by its corresponding df.
6. Calculate the F-value (after R.A. Fisher) for testing the significance of the treatment difference: the treatment mean square divided by the error mean square (F = MST/MSE).
7. Enter all of the computed values in the ANOVA table.
8. Obtain the tabular F-values using f1 = treatment df = t − 1 and f2 = error df = t(r − 1), and compare them with the computed F to reach a conclusion.
Basis of Conclusions to be Made
Parametric test: One-way ANOVA
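Steps 1 to 6 can be sketched with the usual correction-factor formulas: CF = G²/n, Total SS = ΣX_i² − CF, Treatment SS = ΣT_i²/r − CF, Error SS = Total SS − Treatment SS, and CV = √MSE / grand mean × 100. The function name is hypothetical, and the sketch assumes equal replication (every treatment has r observations).

```python
def crd_anova(groups):
    """One-way ANOVA for a CRD with equal replication (correction-factor method).

    groups: one list of r observations per treatment.
    """
    t = len(groups)
    r = len(groups[0])
    if any(len(g) != r for g in groups):
        raise ValueError("this sketch assumes equal replication")
    n = t * r
    all_obs = [x for g in groups for x in g]
    grand_total = sum(all_obs)                                 # G
    grand_mean = grand_total / n
    cf = grand_total ** 2 / n                                  # CF = G^2 / n
    total_ss = sum(x ** 2 for x in all_obs) - cf               # sum of X_i^2 - CF
    treatment_ss = sum(sum(g) ** 2 for g in groups) / r - cf   # sum of T_i^2 / r - CF
    error_ss = total_ss - treatment_ss
    df_treatment, df_error = t - 1, t * (r - 1)
    ms_treatment = treatment_ss / df_treatment
    ms_error = error_ss / df_error
    return {
        "df": (df_treatment, df_error, n - 1),
        "ss": (treatment_ss, error_ss, total_ss),
        "ms": (ms_treatment, ms_error),
        "F": ms_treatment / ms_error,                          # F = MST / MSE
        "cv_percent": ms_error ** 0.5 / grand_mean * 100,
    }

if __name__ == "__main__":
    print(crd_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # hypothetical data
```

For this hypothetical data, F = 27 with (2, 6) df; a standard F table gives 5.14 at α = 0.05, so the treatment effect would be judged significant.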

Nonparametric test: Kruskal-Wallis test (H-test)
Nonparametric tests are analogous to parametric ANOVA, but they use ranks rather than the original data for analysis; they are therefore also called “ANOVA by ranks.” When the samples do not come from normally distributed data or the variances are heterogeneous, ranks are assigned to the observations for analysis. As with the parametric test, the Kruskal-Wallis test only determines whether a factor has an effect; it does not compare the individual groups with each other. Nonparametric methods have also been developed for the purpose of multiple comparisons.
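The H statistic can be sketched from the ranks alone: pool all observations, rank them (tied values share the average of their ranks), and compute H = 12 / (N(N + 1)) × Σ(R_i² / n_i) − 3(N + 1), divided by the usual tie correction 1 − Σ(t³ − t)/(N³ − N). The function names are hypothetical. H is then compared with a χ² critical value for (number of groups − 1) df; for very small groups such as the toy example in the test, exact tables are normally used instead of the χ² approximation.

```python
def average_ranks(values):
    # Rank all observations together (1-based); ties share the mean of their ranks.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_i for this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    ties = sum(c ** 3 - c for c in counts.values())
    return h / (1 - ties / (n ** 3 - n))
```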

Randomized Complete Block Design
The randomized complete block design (RCBD) is probably the most widely used design because, in reality, it is difficult to find identical or uniform experimental units in aquaculture, especially in outdoor ponds. Some ponds are close to, or separated by, canals, roads, shade, and so on. Even when cages are used, some are close to dikes whereas others are far away. Similarly, a few rows of indoor tanks may be in a darker area whereas others are in brighter areas.

Randomized Complete Block Design
These factors can have large effects on response variables, but their effects can neither be avoided nor minimized to negligible levels. In such cases, the only option is to separate their effects by blocking when designing the experiment. Experimental units that are thought to be uniform are grouped into one block. Blocking reduces the random (experimental) error by separating out the variation due to blocks, thereby increasing the chance that treatment effects are found significant. However, care should be taken when designing the experiment: all of the treatments have to be included in each block.
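The key RCBD rule, every treatment in every block with randomization done separately within each block, can be sketched as follows. The function name, treatment labels, and number of blocks are hypothetical; a block might be, for example, a group of ponds at a similar distance from a canal.

```python
import random

def rcbd_layout(treatments, n_blocks, seed=None):
    """Each block receives every treatment once, in an independently randomized order."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)  # all treatments must appear in each block
        rng.shuffle(order)        # separate randomization within this block
        layout.append(order)
    return layout

if __name__ == "__main__":
    for block, order in enumerate(rcbd_layout(["A", "B", "C", "D"], 3, seed=7), start=1):
        print("Block", block, order)
```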

Randomized Complete Block Design