Steps in Statistical Testing:
1) State the null hypothesis (Ho) and the alternative hypothesis (Ha).
2) Choose an acceptable and appropriate level of significance (alpha) and sample size (n) for your particular study design.
3) Determine the appropriate statistical technique and corresponding test statistic.
4) Collect the data and compute the value of the test statistic.
5) Calculate the number of degrees of freedom for the data set.
6) Compare the value of the test statistic with the critical values in a statistical table for the appropriate distribution, using the correct degrees of freedom.
7) Make a statistical decision and express it in terms of the problem under study.
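As a minimal sketch of these seven steps in code, here is a one-sample t test in SciPy; the sample values, the hypothesized mean, and alpha are all hypothetical, chosen only to make the steps concrete:

```python
# A minimal sketch of the seven steps, using a one-sample t test.
from scipy import stats

# 1) Ho: population mean = 120; Ha: population mean != 120
mu0 = 120
# 2) alpha = 0.05, n = 10
alpha = 0.05
sample = [118, 122, 125, 119, 130, 127, 121, 124, 116, 129]  # hypothetical data
# 3) technique: one-sample t test; test statistic: t
# 4) compute the test statistic
t_stat, p_value = stats.ttest_1samp(sample, mu0)
# 5) degrees of freedom: n - 1
df = len(sample) - 1
# 6-7) compare and decide (via the p-value rather than a printed table)
decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.3f}: {decision}")
```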
P values
What is "statistical significance" (p-value)? The statistical significance of a result is the probability that the observed relationship (e.g., between variables) or difference (e.g., between means) in a sample occurred by pure chance ("luck of the draw"), and that in the population from which the sample was drawn, no such relationship or difference exists. Using less technical terms, one could say that the statistical significance of a result tells us something about the degree to which the result is "true" (in the sense of being "representative of the population"). Specifically, the p-value represents the probability of error involved in accepting our observed result as valid, that is, as "representative of the population." For example, a p-value of 0.05 (i.e., 1/20) indicates that there is a 5% probability that the relation between the variables found in our sample is a "fluke."
Why sample size matters
If there are very few observations, then there are also correspondingly few possible combinations of the values of the variables, and thus the probability of obtaining by chance a combination of those values indicative of a strong relation is relatively high. Consider the following illustration. If we are interested in two variables (Gender: male/female; white cell count (WCC): high/low) and there are only four subjects in our sample (two males and two females), then the probability that we will find, purely by chance, a 100% relation between the two variables can be as high as one in eight. Specifically, there is a one-in-eight chance that both males will have a high WCC and both females a low WCC, or vice versa. Now consider the probability of obtaining such a perfect match by chance if our sample consisted of 100 subjects; the probability of obtaining such an outcome by chance would be practically zero.
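The one-in-eight figure can be checked by brute-force enumeration, under the assumption that each subject is independently equally likely to have a high or low WCC:

```python
# Enumerate all WCC assignments for two males and two females and count the
# "perfect relation" configurations.
from itertools import product

perfect = 0
total = 0
for wcc in product(["high", "low"], repeat=4):  # (M1, M2, F1, F2)
    total += 1
    males, females = wcc[:2], wcc[2:]
    # perfect relation: both males high and both females low, or vice versa
    if (males == ("high", "high") and females == ("low", "low")) or \
       (males == ("low", "low") and females == ("high", "high")):
        perfect += 1
print(perfect, "/", total)  # 2 / 16 = 1/8
```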
Sample size example
Example: "Baby boys to baby girls ratio." Consider the following example from research on statistical reasoning (Nisbett et al., 1987). There are two hospitals: in the first, 120 babies are born every day; in the other, only 12. On average, the ratio of baby boys to baby girls born every day in each hospital is 50/50. However, one day, in one of those hospitals twice as many baby girls were born as baby boys. In which hospital was this more likely to happen? The answer is obvious to a statistician but, as research shows, not so obvious to a lay person: it is much more likely to happen in the small hospital. The reason is that, technically speaking, the probability of a random deviation of a particular size (from the population mean) decreases as the sample size increases.
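A quick binomial calculation makes this concrete. The sketch below computes, for each hospital, the chance of a day with at least twice as many girls as boys (girls making up at least two-thirds of births), assuming each birth is independently a girl with probability 0.5:

```python
# Probability of a girls:boys ratio of at least 2:1 in a day's births.
from scipy.stats import binom

for n in (12, 120):
    k = -(-2 * n // 3)           # smallest integer >= 2n/3 (girls needed)
    p = binom.sf(k - 1, n, 0.5)  # P(girls >= k)
    print(f"n = {n:3d}: P(girls >= {k}) = {p:.4f}")
# The 12-birth hospital sees such a lopsided day far more often
# (roughly 0.19 vs. about 0.0002 for the 120-birth hospital).
```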
What are degrees of freedom?
The number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary. Estimates of statistical parameters can be based upon different amounts of information or data. The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom. A data set contains a number of observations, say, n. They constitute n individual pieces of information. These pieces of information can be used to estimate either parameters or variability. In general, each item being estimated costs one degree of freedom. The remaining degrees of freedom are used to estimate variability. All we have to do is count properly. A single sample: there are n observations and one parameter (the mean) to be estimated, leaving n - 1 degrees of freedom for estimating variability. Two samples: there are n1 + n2 observations and two means to be estimated, leaving n1 + n2 - 2 degrees of freedom for estimating variability. One-way ANOVA with g groups: there are n1 + ... + ng observations and g means to be estimated, leaving n1 + ... + ng - g degrees of freedom for estimating variability. This accounts for the denominator degrees of freedom of the F statistic.
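A small illustration of the "each estimated parameter costs one degree of freedom" rule, using hypothetical numbers: once the mean has been estimated from the data, only n - 1 deviations are free to vary (they must sum to zero), which is why the sample variance divides by n - 1:

```python
import numpy as np

x = np.array([4.0, 7.0, 6.0, 5.0, 8.0])  # hypothetical observations
n = len(x)
mean = x.mean()
deviations = x - mean
print(deviations.sum())                   # 0.0: the last deviation is not free
print(x.var(ddof=1))                      # sample variance, divisor n - 1
print(((x - mean) ** 2).sum() / (n - 1))  # same value, computed by hand
```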
Roadmap for next 5 weeks
Type of data collected, by goal of the study:

Goal of the study | From normal distribution | Rank, score, or non-normal distribution | Binomial (2 outcomes)
----------------- | ------------------------ | --------------------------------------- | ---------------------
Describe one population | Mean, std dev | Median, interquartile range | Proportion
Compare pop. to hypothetical value | One-sample t test | Wilcoxon test | Chi-square or binomial test
Compare 2 unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test or chi-square*
Compare >2 unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test
Compare >2 matched groups | Repeated-measures ANOVA | Friedman test |
Association b/w 2 variables | Pearson correlation | Spearman correlation |
Predict value from a measured value | Simple linear regression | Nonparametric regression |
Predict value from several measured values | Multiple linear regression | |
Nonparametric Tests
Categorical data:
- Contingency table: examine the relationship between 2 categorical variables
- Chi-square test
- McNemar's test: analysis of paired categorical variables (disagreement or change)
- Mantel-Haenszel test: relationship between 2 variables, accounting for the influence of a 3rd variable

Nonparametric alternatives to parametric tests:
- Mann-Whitney U or Kolmogorov-Smirnov (KS): compare 2 independent groups; alternative to a t test
- Wilcoxon sign test: alternative to a 1-sample t test
- Kruskal-Wallis: compare 2 or more independent groups; alternative to ANOVA
- Friedman test: compare 2 or more related groups; alternative to repeated-measures ANOVA
- Spearman's rho: association between 2 variables; alternative to Pearson's r
Basic Methodology
Many nonparametric procedures are based on ranked data. Data are ranked by ordering them from lowest to highest and assigning them, in order, the integer values from 1 to the sample size. Ties are resolved by assigning tied values the mean of the ranks they would have received if there were no ties, e.g., 117, 119, 119, 125, 128 becomes 1, 2.5, 2.5, 4, 5. (If the two 119s were not tied, they would have been assigned the ranks 2 and 3. The mean of 2 and 3 is 2.5.)
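SciPy's rankdata implements exactly this midrank rule (method="average" is its default), so the example above can be reproduced directly:

```python
# Ranking with midranks for ties.
from scipy.stats import rankdata

values = [117, 119, 119, 125, 128]
print(rankdata(values))  # [1.  2.5 2.5 4.  5. ]
```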
Contingency Table
- Analyze the association between 2 categorical variables
- R possible responses x C possible categories
- Uses counts (frequencies) in a crosstab table and performs a chi-square test
- The chi-square statistic compares observed counts with the counts expected if there were no association
- Categories may be combined before analysis to increase counts
- Ho: there is no association between the 2 variables
- Ha: the 2 variables are associated
How is Pearsonian chi-square calculated for tabular data? Calculation of this form of chi-square proceeds in two main steps:
Computing the expected frequencies. For each cell in the table, the expected frequency must be calculated. The expected proportion for a given column is the column total divided by n, the sample size; the expected proportion for a given row is the row total divided by n. The expected frequency for a cell is the column proportion times the row proportion times n. This formula reduces to the expected frequency for a given cell equaling its row total times its column total, divided by n.
Applying the chi-square formula. Let O be the observed value of each cell in the table and E the expected value calculated in the previous step. For each cell, subtract E from O, square the result, and divide by E. Do this for every cell and sum all the results; the sum is the chi-square value for the table. If Yates' correction for continuity is to be applied, due to cell counts below 5, the calculation is the same except that for each cell an additional 0.5 is subtracted from the absolute difference |O - E| prior to squaring and dividing by E. This reduces the size of the calculated chi-square value, making a finding of significance less likely -- a penalty deemed appropriate for tables with low counts in some cells.
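The two steps translate directly into a few lines of NumPy; the counts below are hypothetical, and the expected-frequency line is the row total x column total / n reduction described above:

```python
# Expected frequencies from the marginals, then the chi-square sum.
import numpy as np

observed = np.array([[20, 30],
                     [25, 25]], dtype=float)  # hypothetical 2 x 2 counts
n = observed.sum()
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / n        # row total x column total / n
chi_square = ((observed - expected) ** 2 / expected).sum()
print(expected)
print(chi_square)
```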
Contingency Table
General notation for a 2 x 2 contingency table:

Variable   | Data type 1 | Data type 2 | Totals
---------- | ----------- | ----------- | ------
Category 1 | a           | b           | a + b
Category 2 | c           | d           | c + d
Total      | a + c       | b + d       | a + b + c + d = N

For a 2 x 2 contingency table the chi-square statistic is calculated by the formula:

$$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}$$
Contingency Table Results
Table 2. Number of animals that survived a treatment.

            | Dead | Alive | Total
----------- | ---- | ----- | -----
Treated     | 36   | 14    | 50
Not treated | 30   | 25    | 55
Total       | 66   | 39    | 105

Applying the formula above we get:

Chi square = 105[(36)(25) - (14)(30)]^2 / (50)(55)(39)(66) = 3.418

We now have our chi-square statistic (x2 = 3.418), our predetermined alpha level of significance (0.05), and our degrees of freedom (df = 1). Entering the chi-square distribution table with 1 degree of freedom and reading along the row, we find our value of x2 (3.418) lies between 2.706 (P = 0.10) and 3.841 (P = 0.05). The corresponding probability is 0.05 < P < 0.10. This is above the conventionally accepted significance level of 0.05 or 5%, so the null hypothesis that the two distributions are the same cannot be rejected.
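The worked example can be checked in SciPy; passing correction=False reproduces the uncorrected formula used above:

```python
# Verify the treated/not-treated example.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[36, 14],
                  [30, 25]])
chi2, p, df, expected = chi2_contingency(table, correction=False)
print(round(chi2, 3), round(p, 4), df)  # 3.418, p about 0.064, df = 1
```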
McNemar's Test
- Paired values: before and after
- Outcomes:
  + before and + after
  + before and - after
  - before and - after
  - before and + after
- Ho: the probability of a species having + before and - after is equal to the probability of a species having - before and + after
- Ha: the probability of a species having + before and - after is not equal to the probability of a species having - before and + after
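A sketch of McNemar's test using statsmodels, assuming a 2 x 2 table of paired before/after counts; the numbers are hypothetical. Note that only the two discordant cells (+/- and -/+) drive the test:

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

#                 after +   after -
table = np.array([[30,       12],    # before +
                  [ 5,       20]])   # before -
result = mcnemar(table, exact=True)  # exact binomial test on the 12 vs 5 split
print(result.statistic, result.pvalue)
```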
Mantel-Haenszel Comparison
- Often used to pool results from multiple contingency tables
- Detects the influence of a third variable on 2 dichotomous variables
- Forestry example: Ho: controlling for forest (i.e., within forests), there is no relationship between dbh (diameter at breast height) and tree age; Ha: controlling for forest, there is a relationship between dbh and tree age
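One way to run this pooled analysis is with statsmodels' StratifiedTable, which takes one 2 x 2 table per stratum (here, per forest); this is a sketch under that assumption, and all counts (high/low dbh vs. young/old trees) are hypothetical:

```python
import numpy as np
from statsmodels.stats.contingency_tables import StratifiedTable

tables = [np.array([[12, 5], [6, 14]]),   # forest 1
          np.array([[ 9, 4], [5, 11]]),   # forest 2
          np.array([[15, 7], [8, 16]])]   # forest 3
st = StratifiedTable(tables)
print(st.oddsratio_pooled)         # Mantel-Haenszel pooled odds ratio
print(st.test_null_odds().pvalue)  # CMH test of no association within strata
```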
Mann-Whitney U and Kruskal-Wallis
- The 2 samples must be independent
- Advantages over parametric t tests or ANOVA:
  - Sample sizes are small and normality is questionable
  - Data contain extreme outliers
  - Data are ordinal
- Mann-Whitney U -- Ho: the groups have the same distribution; Ha: the groups do not have the same distribution
- Kruskal-Wallis -- Ho: there are no differences in the distributions of the groups; Ha: there are differences in the distributions of the groups
Mann-Whitney and Kruskal-Wallis
To calculate the value of the Mann-Whitney U test, we use the following formula:

$$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$

where U = Mann-Whitney U statistic, n1 = sample size one, n2 = sample size two, and R1 = sum of the ranks of sample one.
1. Arrange the data of both samples in a single series in ascending order.
2. Assign ranks to them in ascending order; in the case of a repeated value, assign ranks by averaging the rank positions.
3. Once this is complete, the ranks of the different samples are separated and summed as R1, R2, R3, etc.
4. To calculate the value of the Kruskal-Wallis test, apply the following formula:

$$H = \frac{12}{N(N + 1)} \sum_i \frac{R_i^2}{n_i} - 3(N + 1)$$

where H = Kruskal-Wallis statistic, N = total number of observations in all samples, n_i = number of observations in sample i, and R_i = sum of the ranks of sample i.
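Both formulas can be applied by hand on small hypothetical samples and checked against SciPy. One caveat: SciPy's mannwhitneyu reports U for the first sample computed as R1 - n1(n1 + 1)/2, which is the complement (n1*n2 minus) of the form given above; the classic tabled statistic is the smaller of the two:

```python
import numpy as np
from scipy.stats import rankdata, mannwhitneyu, kruskal

# Mann-Whitney U on two hypothetical samples (no ties).
a = [3.1, 4.2, 5.5]
b = [2.8, 3.9, 6.0, 6.3]
ranks = rankdata(a + b)               # ranks in the combined series
r1 = ranks[:len(a)].sum()             # R1 = 11
n1, n2 = len(a), len(b)
u = n1 * n2 + n1 * (n1 + 1) / 2 - r1  # formula above: U = 7
print(u, n1 * n2 - u)                 # 7.0 and its complement 5.0
print(mannwhitneyu(a, b).statistic)   # 5.0 (SciPy's complementary form)

# Kruskal-Wallis H on three hypothetical groups (no ties).
groups = [[6.4, 6.8, 7.2], [6.1, 7.5, 8.0], [5.9, 6.3, 7.9]]
all_ranks = rankdata(np.concatenate(groups))
N = len(all_ranks)
sums, start = [], 0
for g in groups:                      # split the combined ranks back out
    sums.append(all_ranks[start:start + len(g)].sum())
    start += len(g)
h = 12 / (N * (N + 1)) * sum(r**2 / len(g) for r, g in zip(sums, groups)) - 3 * (N + 1)
print(h)                              # 0.8
print(kruskal(*groups).statistic)     # 0.8 (identical when there are no ties)
```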
KW Example: table of raw measures and ranked measures for groups A, B, and C, the combined ranking, and each group's sum of ranks and average of ranks (worked in class; see also the sketch above).
Sign Test and Wilcoxon Signed-Rank
- Paired values: before and after; stimulus and response
- Sign test: counts the number of positive and negative differences
- Wilcoxon signed-rank: ranks the absolute values of the differences
- Ho: the probability of a + difference is equal to the probability of a - difference
- Ha: the probability of a + difference is not equal to the probability of a - difference
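A sketch of both paired tests on hypothetical before/after data: the sign test reduces to an exact binomial test on the count of positive differences, and the signed-rank test comes straight from SciPy:

```python
import numpy as np
from scipy.stats import binomtest, wilcoxon

before = np.array([12, 15, 9, 14, 11, 16, 13, 10])
after  = np.array([14, 18, 9, 17, 12, 15, 16, 13])
diff = after - before
diff = diff[diff != 0]               # zero differences carry no sign information
n_pos = int((diff > 0).sum())
print(binomtest(n_pos, len(diff), 0.5).pvalue)  # sign test
print(wilcoxon(before, after).pvalue)           # Wilcoxon signed-rank test
```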
Spearman's Rho
- Measures the strength of an increasing or decreasing relationship between 2 variables
- Ranges from -1 to 1
- Advantages over Pearson's r:
  - The ranking procedure minimizes the influence of outliers
  - Handles ordinal data (categories or ranges)
  - Useful when the sample size is small or the normality of one of the variables is questionable
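A short sketch contrasting Spearman's rho with Pearson's r on hypothetical data containing one extreme outlier; ranking dampens the outlier's influence:

```python
from scipy.stats import spearmanr, pearsonr

x = [1, 2, 3, 4, 5, 6]
y = [2.0, 2.9, 4.1, 5.0, 6.2, 40.0]  # last point is an outlier
rho, p_rho = spearmanr(x, y)          # monotone, so rho = 1.0
r, p_r = pearsonr(x, y)               # outlier pulls r away from 1
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.3f})")
print(f"Pearson  r   = {r:.3f} (p = {p_r:.3f})")
```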