Statistical Analysis
Statistics
- Descriptive
  - Describes the data
  - Mean
  - Median
  - Mode
- Inferential
  - Allows prediction from the sample to the population in general
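A minimal sketch of the descriptive measures listed above, using Python's standard statistics module (the scores are made up for illustration):

```python
import statistics

# Hypothetical sample data for illustration
scores = [4, 5, 5, 6, 7, 7, 7, 9]

print("mean:  ", statistics.mean(scores))    # arithmetic average: 6.25
print("median:", statistics.median(scores))  # middle value: 6.5
print("mode:  ", statistics.mode(scores))    # most frequent value: 7
```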
Normal distribution
Standard deviation
- Defined as the square root of the variance.
- A measure of the dispersion of the data.
- The 68-95-99.7 rule gives the share of the data within σ, 2σ, and 3σ of the mean.
- Denoted by the letter σ (lower case sigma).
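In symbols, for a population of N values with mean μ:

\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}

For a sample, the denominator is usually N - 1 rather than N.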
Reporting descriptive statistics
Box plots
p values
- The probability that results at least as extreme as those observed would occur by chance, assuming the null hypothesis is true.
- Typically must be less than .1 or .05 to be considered significant.
- Must always be reported as part of the data.
Reporting the statistics
Tests
- T-test
- ANOVA
- Regression
- Correlation
- Non-parametric tests
T-test
- Compares two different sets of values
- Assumes a normal distribution
- Different forms if the variances of the samples are different
- Different forms for independent or dependent samples (whether the two samples' data can be paired up)
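A minimal sketch of the forms just listed, using scipy.stats; the two samples are made up for illustration:

```python
from scipy import stats

# Hypothetical measurements from two conditions
a = [5.1, 4.9, 6.2, 5.8, 5.5, 6.0]
b = [6.8, 7.1, 6.5, 7.4, 6.9, 7.2]

# Independent samples, equal variances assumed
t, p = stats.ttest_ind(a, b)

# Independent samples, unequal variances (Welch's t-test)
t_w, p_w = stats.ttest_ind(a, b, equal_var=False)

# Dependent (paired) samples: the i-th values of a and b belong together
t_p, p_p = stats.ttest_rel(a, b)

print(p, p_w, p_p)
```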
ANOVA
- Analysis of variance: compares the observed variance between the different conditions in the experiment
- Assumes a normal distribution, and also assumes the treatment only affects the mean and not the variance
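A minimal sketch of a one-way ANOVA with scipy.stats, on three made-up treatment groups:

```python
from scipy import stats

# Hypothetical scores from three treatment groups
g1 = [12, 14, 11, 13, 15]
g2 = [16, 18, 17, 15, 19]
g3 = [13, 12, 14, 13, 12]

# One-way ANOVA: tests whether the group means differ
f, p = stats.f_oneway(g1, g2, g3)
print(f"F = {f:.2f}, p = {p:.4f}")
```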
Correlation
- Degree of fit between actual scores for a dependent variable and the predicted values based on a regression
- Measures the degree of relationship
- Correlation coefficients range from -1.00 to +1.00. A value of -1.00 represents a perfect negative correlation, while a value of +1.00 represents a perfect positive correlation. A value of 0.00 represents a lack of correlation.
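A minimal sketch with scipy.stats, on made-up paired data; pearsonr returns the coefficient r in [-1, +1] along with a p value:

```python
from scipy import stats

# Hypothetical paired observations
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 7]

r, p = stats.pearsonr(x, y)
print(f"r = {r:.2f}, p = {p:.4f}")
```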
Correlation
- The fitted line is called the regression line or least squares line, because it is determined such that the sum of the squared distances of all the data points from the line is the lowest possible.
Regression
- Prediction of the dependent variable value based on one or more independent variables
- Measures the type of relationship between multiple values
- Gives the percent of the variance accounted for by each element
Regression
- But the world is complex and, in most cases, we are interested in comparisons that can't be captured adequately using just two variables. Accordingly, analogues of the methods we've discussed so far have been developed to analyze relations between suites of variables. Because these suites are composed of multiple variables, as opposed to pairs of variables, this family of methods is useful for 'multiple variable' or 'multivariate' analysis.
Regression
- Performing a regression on the previous data gives the fitted line and the percent of variance accounted for.
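A sketch of how such a simple linear regression might be run with scipy.stats; the data here is made up to stand in for the slide's example, whose original numbers and output are not available:

```python
from scipy import stats

# Hypothetical (x, y) data standing in for the slide's example
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

result = stats.linregress(x, y)
print(f"y = {result.slope:.2f}x + {result.intercept:.2f}")
print(f"r^2 = {result.rvalue**2:.3f}")  # share of variance accounted for
```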
Non-parametric tests
- Don't assume a normal distribution
- Can be used with ordinal or nominal data
- Weaker tests, but fewer restrictions
- Chi-square test (see the sketch below)
- Mann-Whitney U test
- Wilcoxon signed-rank test
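Of the tests listed above, the chi-square test is not expanded on in the following slides; a minimal sketch with scipy.stats, on a made-up 2x2 contingency table of nominal counts:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: rows = condition, cols = outcome
table = [[30, 10],
         [18, 22]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
```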
Mann-Whitney U test
- Non-parametric test for assessing whether the medians of 2 samples are the same
- For independent data
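A minimal sketch with scipy.stats, on made-up ordinal ratings from two independent groups:

```python
from scipy.stats import mannwhitneyu

# Hypothetical independent samples of ordinal ratings
a = [3, 4, 2, 5, 4, 3]
b = [5, 6, 5, 7, 6, 5]

u, p = mannwhitneyu(a, b)
print(f"U = {u}, p = {p:.4f}")
```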
Wilcoxon signed-rank test
- Used for related samples
- No assumption of a normal distribution
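A minimal sketch with scipy.stats, on made-up paired before/after scores for the same subjects:

```python
from scipy.stats import wilcoxon

# Hypothetical paired (before, after) scores for the same subjects
before = [10, 12, 9, 11, 14, 13, 10, 12]
after  = [12, 14, 10, 13, 15, 14, 12, 13]

# Signed-rank test on the paired differences
w, p = wilcoxon(before, after)
print(f"W = {w}, p = {p:.4f}")
```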
Confidence intervals
- How sure are we that we have enough people in the sample?
- Methods of calculating either
  - how big the sample should be
  - how much confidence you can place in an existing sample
Confidence intervals
- Since there are no comparable studies, estimating the standard deviation was difficult. We used the values obtained by Cardinal & Siedler (1995) in their study of readability of healthcare material: sd = 12 for low groups and sd = 10 for high groups. They also saw a difference of 14 percent in total score between groups. Thus, the numbers we used for the power analysis were: control mean = 53, sd = 12, and experimental group mean = 67, sd = 10. For a significance level of .05 and a power of .9, this gives a sample size of 12 in each cell of the test design.
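A sketch of how this power analysis might be reproduced; statsmodels is an assumption here, since the slide does not name a tool, and the exact n depends on choices such as one- versus two-sided testing:

```python
import math
from statsmodels.stats.power import TTestIndPower

# Numbers from the slide: means 53 vs 67, sd = 12 and 10
mean_c, sd_c = 53, 12
mean_e, sd_e = 67, 10

# Cohen's d using the pooled standard deviation
pooled_sd = math.sqrt((sd_c**2 + sd_e**2) / 2)
d = (mean_e - mean_c) / pooled_sd  # about 1.27

# Sample size per group for alpha = .05 and power = .9
# (two-sided by default)
n = TTestIndPower().solve_power(effect_size=d, alpha=0.05, power=0.9)
print(f"d = {d:.2f}, n per cell = {n:.1f}")
```

With these inputs the default two-sided calculation comes out near 14 per cell; the slide's 12 likely reflects slightly different assumptions, such as a one-sided test.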
Outliers
- Data that appears not to belong to the set. We want to remove it, but there are no real standards for deciding whether a point is genuine or an error.
- For example, if one is calculating the average temperature of 10 objects in a room, and most are between 20-25 °C but an oven is at 350 °C, the median of the data may be 23 while the mean temperature will be about 55.
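The arithmetic of the example, checked with Python's statistics module; the individual temperatures are made up to match the slide's description, and show the median resisting the outlier while the mean does not:

```python
import statistics

# Nine objects at room temperature plus one oven, in degrees C
temps = [21, 22, 22, 23, 23, 23, 24, 24, 25, 350]

print("mean:  ", statistics.mean(temps))    # 55.7, pulled up by the oven
print("median:", statistics.median(temps))  # 23.0, unaffected
```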
- Significant digits
- Writing up the statistics in an article
End