Histograms and Distributions
Experiment: Do athletes have faster reflexes than non-athletes?
Questions:
- You go out and first collect the reaction times of 25 non-athletes.
Calculate the mean…
Non-athletes: reaction time in milliseconds (ms)
Calculate the mean score…
Athletes: reaction time in milliseconds (ms)
Compare the means: athletes vs. non-athletes
Make a histogram to display the data…
Non-athletes: reaction times in milliseconds (ms), arranged from low to high
Histogram = a plot of frequency
Non-athletes, sample size: 25
Athletes, sample size: 25
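A histogram is just a count of how many values land in each bin. The following is a minimal sketch in Python, assuming a bin width of 20 ms and a handful of made-up reaction times. Neither the bin width nor the numbers come from the student's data, and `histogram_counts` is a hypothetical helper name.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall in each bin of the given width.

    Each bin is labeled by its lower edge, e.g. 240 means 240-259 for width 20.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical reaction times in ms (illustration only, not the class data).
reaction_times_ms = [243, 251, 268, 274, 279, 301]

for lower_edge, count in histogram_counts(reaction_times_ms, 20).items():
    # One '#' per student in the bin gives a quick text histogram.
    print(f"{lower_edge}-{lower_edge + 19} ms: {'#' * count}")
```

Narrower bins show more detail but leave more bins empty when the sample is small, which is one reason a 25-person histogram can look ragged.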
Compare the histograms of non-athletes to athletes:
[Histograms: reaction time (ms) vs. number of students (frequency) for athletes and non-athletes, with the mean of each group]
Q: Is there really a difference between these two groups?
The student decided to collect more data (a larger sample size), which is really the only option at this point…
Sample size: 73 non-athletes, 77 athletes
[Histogram: reaction time (ms) vs. number of students (frequency) for athletes and non-athletes, with the mean of each group]
Comparison of histograms with small vs. large sample size:
- Small sample: 25 in each group (N = 50)
- Large sample: 73 non-athletes, 77 athletes (N = 150)
[Histograms: reaction time (ms) vs. number of students (frequency) for athletes and non-athletes, with the mean of each group, one panel per sample size]
Let's go back to the small sample size data…
[Histogram: reaction time (ms) vs. number of students (frequency) for athletes and non-athletes, with the mean of each group]
How can we determine if there is a significant difference between these two groups?
Standard deviation (sigma)
Normal or Gaussian distribution
First one needs to determine the standard deviation, which is basically a measure of the width of the histogram. For example, if the standard deviation of the non-athletes is determined to be 30 ms, then it is assumed that 68.2% of the data will fall within +/- 30 ms of the mean (between mean - 30 ms and mean + 30 ms).
Would you prefer your standard deviation to be larger or smaller in value?
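The 68.2% figure comes from the normal distribution: the fraction of values within k standard deviations of the mean is erf(k / sqrt(2)). The sketch below checks that number and builds the one-SD range. The mean of 300 ms is a hypothetical value chosen for illustration, and both function names are made up for this example.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

def one_sd_range(mean, sd):
    """Interval expected to hold roughly 68% of normally distributed data."""
    return (mean - sd, mean + sd)

# Hypothetical mean of 300 ms; the SD of 30 ms matches the example in the text.
low, high = one_sd_range(300, 30)
print(f"About {fraction_within(1):.1%} of values should fall between {low} and {high} ms")
```

This only holds if the data are roughly normal, which is hard to judge from 25 values.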
How do we determine the standard deviation (sigma) of the data?
1. Find the distance between each value and the mean. This tells you how far each value is from the mean and begins to show you the width of your distribution.
2. Square all the differences.
3. Sum all the squares.
4. Divide the sum by the number of scores minus one (variance).
5. Take the square root of the variance: 31.6 (standard deviation).
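Standard deviation formula (what we just did):
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
- the square root of the sum of the squared deviations from the mean, divided by the number of scores minus one
Here $x_i$ is each score, $\bar{x}$ is the mean, and $n$ is the number of scores.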
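The five steps can be sketched in Python as follows. The reaction times are hypothetical (the original data table is not reproduced here), and `sample_sd_steps` is a made-up name. The last line cross-checks the result against the standard library's `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

def sample_sd_steps(values):
    """Follow the five steps: deviations, squares, sum, divide by n - 1, square root."""
    mean = sum(values) / len(values)
    deviations = [v - mean for v in values]        # step 1: distance from the mean
    squares = [d ** 2 for d in deviations]         # step 2: square the differences
    sum_of_squares = sum(squares)                  # step 3: sum the squares
    variance = sum_of_squares / (len(values) - 1)  # step 4: divide by n - 1
    sd = math.sqrt(variance)                       # step 5: square root
    return variance, sd

# Hypothetical reaction times in ms (illustration only).
data = [250, 280, 300, 310, 360]
variance, sd = sample_sd_steps(data)
print(f"variance = {variance:.1f}, SD = {sd:.1f} ms")

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(sd, statistics.stdev(data))
```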
Standard deviation:
- Non-athletes: SD (σ) = 31.6
- Athletes: SD (σ) = 30.6
Are these groups statistically different from each other?
T-Test
The t-test assesses whether the means of two groups are statistically different from each other.
$$t = \frac{\bar{x}_{\text{non-athletes}} - \bar{x}_{\text{athletes}}}{SE_{\text{diff}}}$$
where $SE_{\text{diff}}$ = standard error of the difference
Therefore, the t-value is related to how different the means are and how broad your data are. A high t-value (a large difference in means relative to the spread) is what you hope for…
Calculate the t-score
t =
Degrees of freedom (df) is the number of people in both groups combined, minus 2:
df = 25 + 25 - 2 = 48
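As a rough sketch of the calculation, the code below divides the difference of the means by the standard error of the difference. It assumes the standard error is sqrt(variance_a / n_a + variance_b / n_b); the slides do not spell out this form, but with equal group sizes it gives the same number as the pooled version. The SDs and sample sizes come from the slides, while the means of 300 ms and 280 ms are hypothetical, as is the name `t_score`.

```python
import math

def t_score(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, standard error of the difference, degrees of freedom)."""
    # Assumed form: SE = sqrt(var_a / n_a + var_b / n_b), where variance = SD squared.
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    t = (mean_a - mean_b) / se_diff
    df = n_a + n_b - 2  # people in both groups minus 2
    return t, se_diff, df

# SDs (31.6, 30.6) and sample sizes (25, 25) are from the slides;
# the means 300 and 280 ms are hypothetical placeholders.
t, se_diff, df = t_score(300, 31.6, 25, 280, 30.6, 25)
print(f"SE of difference = {se_diff:.2f} ms, t = {t:.2f}, df = {df}")
```

The same difference in means gives a larger t when the SDs are smaller or the groups are bigger, which is why collecting more data helps.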
The null hypothesis vs. the hypothesis
1. The hypothesis: Athletes will have a quicker reaction time than non-athletes.
2. The null hypothesis: The null hypothesis always states that there is no relationship between the two groups. Here, it states that there is no difference in reaction time between athletes and non-athletes.
The p-value
1. The p-value is a number between 0 and 1.
2. It is the probability (hence the p-value) of seeing a difference at least as large as the one in our data if the null hypothesis were true, that is, if there were really no difference between the groups.
3. Therefore, the smaller the p-value, the harder it is to explain our data by chance alone, and the stronger the evidence that there is a difference between the two groups.
4. In order for the data to support the hypothesis, must the p-value be high or low?
The p-value should be low (< 0.05), which says that there would be less than a 5% chance of seeing a difference this large if there were no real difference between the two groups. A result that unlikely is taken as evidence that there is a difference.
Statistical Significance
When the p-value is less than 0.05, we say that the data are statistically significant, and there may be a real difference between the two groups.
Be warned that just because p is less than 0.05 between two groups doesn't mean that there is actually a difference. For example, if we find p < 0.05 for the reaction time experiment, it doesn't mean that there is a definite difference between athletes and non-athletes. It only means that there is a difference in our data. Our data might be flawed, there might not be enough data yet (sample size too small), we might have measured the data improperly, the sampling might not have been random, or the experiment might have been garbage, etc.
Doubt is the greatest tool of any scientist (person).
How is the p-value determined?
The p-value is found by using a standard t-table in combination with the t-value and the degrees of freedom previously determined.
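A t-table is a lookup of areas under the t distribution. As a rough alternative to the table (not the method used in the slides), the sketch below computes a two-tailed p-value by numerically integrating the Student's t density with Simpson's rule, using only the standard library. The function names and the `steps` parameter are made up for this example, and it assumes a two-tailed test.

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: area under the t density beyond +|t| and -|t|.

    Integrates the density from 0 to |t| with Simpson's rule (steps must be even),
    then uses symmetry: p = 1 - 2 * area(0..|t|).
    """
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return max(0.0, 1 - 2 * area_0_to_t)

# Example: a hypothetical t-value of 2.27 with df = 48.
print(f"p = {two_tailed_p(2.27, 48):.3f}")
```

If a statistics package is available, its built-in t-test function is the more reliable choice; this version is only meant to show where the numbers in the table come from.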
Now you try it:
1. On Edmodo you will find data collected by Tom and Ileana regarding one's ability to estimate the length of a line or the number of spots on a screen.
2. The questions were accompanied by a survey that asked for the subject's grade level, ethnicity, participation in sports, and honors vs. regents level.
3. They wanted to know if any of these differences would correlate with the subjects' ability to estimate.
How should we analyze this data?
1. Begin by choosing a grouping (independent) variable, like grade for example. Since the t-test can only compare two groups at a time and there are four grades, we need to perform all the possible combinations (there was apparently only one 9th grader, so the sample size is too low to look at this grade):
- 10th vs 11th
- 10th vs 12th
- 11th vs 12th
We would also want to know if the mean of each group is significantly different from the actual value:
- Actual value vs 10th
- Actual value vs 11th
- Actual value vs 12th
This needs to be done twice, once for the line estimation and once for the dots estimation!
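Listing the comparisons by hand gets error-prone once there are more groups. A small sketch with `itertools.combinations` generates the same list. The labels and the helper name `planned_comparisons` are illustrative and not part of the Edmodo data set.

```python
from itertools import combinations

grades = ["10th", "11th", "12th"]  # 9th grade is left out: only one student
tasks = ["line estimation", "dots estimation"]

def planned_comparisons(groups):
    """List every group-vs-group pair, then every actual-value-vs-group pair."""
    between_groups = list(combinations(groups, 2))
    against_actual = [("actual value", g) for g in groups]
    return between_groups + against_actual

for task in tasks:
    for first, second in planned_comparisons(grades):
        print(f"{task}: {first} vs {second}")
```

With three grades this gives six comparisons per task, so twelve t-tests in total across the two estimation tasks.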
These are the tables you need to fill out:

Grade    Mean    SD    Variance
10th
11th
12th

Grades            Difference of means    Variability of groups    T-score    P-value
10th vs actual
11th vs actual
12th vs actual
10th vs 11th
10th vs 12th
11th vs 12th

Write a conclusion based on your analysis. Remember, just because p < 0.05, it doesn't necessarily mean your hypothesis is supported!