Statistical Analysis - Mean(Average), Median, Mode, Range - Standard Deviation - T-test/ANOVA - Correlation - Chi Test - Percent Change.


1 Statistical Analysis - Mean(Average), Median, Mode, Range - Standard Deviation - T-test/ANOVA - Correlation - Chi Test - Percent Change

2 Reasons for using statistics
Since we usually can’t measure the whole population, we take a sample to represent it. Statistical analysis allows scientists to evaluate the accuracy and precision of the data.

3 An investigation of shell length variation in a mollusc species
A marine gastropod (Thersites bipartita) has been sampled from two different locations. Sample A: shells found in full marine conditions. Sample B: shells found in brackish water conditions. Experimental design: sample size = 10 shells; the length of each shell was measured as shown. The data obtained from the two locations will be used to illustrate the statistical calculations required.

4 Analysis of Gastropod Data
The height of the shells was measured with a ruler. Units: mm, with an uncertainty (error) of ± 0.5 mm. All measuring devices have an uncertainty, which reflects the precision of the measurement. The number of significant digits must be consistent: there should be no variation in the precision of the raw data.

5 To estimate uncertainty, take the smallest unit of the measuring instrument and divide it by two.
For example, a stopwatch measures time in hundredths of a second. If you measure a time of 10.04 seconds for somebody to run the 100 m sprint, then the uncertainty is ± 0.005 seconds, or more clearly written: the measured time is (10.04 ± 0.005) seconds.
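The half-smallest-unit rule is simple enough to write as a one-line Python function. This is a minimal sketch. The name `reading_uncertainty` is our own, and the function assumes you already know the smallest unit the instrument can display.

```python
def reading_uncertainty(smallest_unit: float) -> float:
    """Estimate the uncertainty of a reading as half the instrument's smallest unit."""
    return smallest_unit / 2

# Stopwatch reading to hundredths of a second
print(reading_uncertainty(0.01))   # 0.005
# Ruler marked in whole millimetres
print(reading_uncertainty(1.0))    # 0.5
```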

6 Mean (Average) Mean or average = sum of the values divided by the number of values. Example: for the individual data points 2, 4, 4, 4, 6, 7, 8, 8, 5, 9, 10, the total = 67 and there are 11 data points, so the mean = 67/11 = 6.09.
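As a quick check of the arithmetic, here is a small Python sketch of the mean. The `mean` helper is our own illustration; Python's standard `statistics.mean` does the same job. The data are the 11 values listed on slide 7, which total 67.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

data = [2, 4, 4, 4, 6, 7, 8, 8, 5, 9, 10]
print(sum(data), len(data))    # 67 11
print(round(mean(data), 2))    # 6.09
```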

7 Median, Mode and Range Data: 2, 4, 4, 4, 6, 7, 8, 8, 5, 9, 10
Median: the middle value when the data are placed in rank order; a good measure of central tendency for skewed distributions. Mode: the most common data value; a good measure for qualitative or bimodal distributions. Range: the difference between the largest and smallest data values, which gives a crude indication of the spread of the data; here, range = 10 - 2 = 8.
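The same 11 values can be ranked and summarized with Python's standard `statistics` module. This sketch uses those values as its input. It shows that the median is the 6th ranked value (6), the mode is 4 and the range is 8. Note that the values must be put in rank order before the middle one is picked out.

```python
import statistics

data = [2, 4, 4, 4, 6, 7, 8, 8, 5, 9, 10]

ranked = sorted(data)             # rank order: [2, 4, 4, 4, 5, 6, 7, 8, 8, 9, 10]
median = statistics.median(data)  # middle (6th of 11) ranked value
mode = statistics.mode(data)      # most common value
data_range = max(data) - min(data)

print(ranked)
print(median, mode, data_range)   # 6 4 8
```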

8 Mean with the full data range
The data can be represented on a graph that might show the mean and the full range of data. Marine population: mean = 30.7, range = 23-43. Brackish population: mean = 41.3, range = 32-51.

9 Error bars are a graphical representation of the variability of data.
Biological systems are subject to a genetic program and to environmental variation. Consequently, when we collect a set of data for a given variable, it shows variation. When displaying data in graphical formats, we can show this variation using error bars. Repeated measurements and multiple readings improve the reliability of data. We will use the standard deviation for error bars.

10 Standard Deviation Measure of the spread of data around the mean.
It can be used either as a measure of variation within a data set or as a measure of the precision of a measurement.
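Below is a Python sketch of the sample standard deviation, written out step by step and checked against the standard library's `statistics.stdev`. It divides by n - 1 because these are samples rather than whole populations. Reusing the 11 values from slide 7 is our choice for illustration; they are not data from the mollusc study.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation s, dividing by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

data = [2, 4, 4, 4, 6, 7, 8, 8, 5, 9, 10]
print(round(sample_sd(data), 2))         # 2.51
print(round(statistics.stdev(data), 2))  # same value from the standard library
```

For a whole population, rather than a sample, `statistics.pstdev` divides by n instead.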

11 It is assumed that there is a normal distribution of values around the mean and that the data is not skewed to either end.

12 Standard Deviation The standard deviation calculated is a measure of the spread of the data values around the mean. Population 1*: mean = 31.4, standard deviation (s) = 5.7. Population 2*: mean = 41.6, standard deviation (s) = 4.3. *Note: these are a different set of samples of shell lengths.

13 Graphing the mean and the standard deviation.
One way to represent our data is to draw a graph that includes error bars of the standard deviation. Here each sample is shown as the mean ± 1 standard deviation. The error bars for shell length in these two populations do not overlap. The question being considered is: is there a significant difference between the two samples from different locations, or are the differences in the two samples just due to chance selection?

14 Graphing Mean with STD DEV as Error Bars
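Figure 1: Mean length of mollusc shell in the different types of water. Error bars represent one standard deviation. For normally distributed data, about 68% of values fall within one standard deviation of the mean. This graph therefore compares the middle 68% of each sample, and it begins to show that the two samples look different.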
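The 68% figure comes from the normal distribution assumed on slide 11. This short sketch uses Python's `statistics.NormalDist` (Python 3.8 or later) to show how much of a normal distribution lies within 1 and within 2 standard deviations of the mean.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1
within_1_sd = z.cdf(1) - z.cdf(-1)
within_2_sd = z.cdf(2) - z.cdf(-2)
print(f"{within_1_sd:.1%}")  # 68.3%
print(f"{within_2_sd:.1%}")  # 95.4%
```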

15 What does a small standard deviation mean?
The standard deviation is useful for comparing the means and the spread of data between two or more samples. A small standard deviation indicates that the data are clustered closely around the mean value (narrow variation). Conversely, a large standard deviation indicates a wider spread around the mean (wider variation).

16 Practice The average leaf length of one plant is 3.5 cm with a standard deviation of 1.0 cm. What does this indicate? A. 95% of all leaves fall within the range 3.0 to 4.0 cm B. 68% of all leaves fall within the range 2.5 to 4.5 cm C. 68% of all leaves fall within the range 3.0 to 4.0 cm D. 95% of all leaves fall within the range 2.5 to 4.5 cm (Total 1 mark)

17 In the introduction to this topic, we considered the sampling of the same species of mollusc from two different locations. We have already calculated the means and the standard deviations for these samples (note: the standard deviation is for the sample, not the population). Population 1: mean = 31.4, standard deviation (s) = 5.7. Population 2: mean = 41.3, standard deviation (s) = 4.3. The question we are considering is: is there a significant difference between these two populations, or is any difference between the two samples just because of random sampling differences?

18 The t-test Another common form of data analysis is to compare two sets of data to see if they are the same or different. For example, are the mollusc shells from the two locations significantly different? If the means of the two sets are very different, then it is easy to decide. Often, however, the means are quite close, and it is difficult to judge whether the two sets are the same or significantly different. To compare two sets of data, use the t-test. It tells you the probability (P) of getting a difference at least as large as the one observed if the two sets were really the same. The assumption that the two sets are the same is called the null hypothesis (H0).
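To see where the t value comes from, here is a minimal Python sketch of Student's two-sample t statistic calculated from summary statistics. It assumes equal variances (a pooled standard deviation). The inputs are the means and standard deviations from slide 17, with n = 10 shells per sample, and the function name `pooled_t` is hypothetical. Python's standard library has no t-distribution, so the sketch stops at t and df; P would come from a t-table or Excel. If SciPy is available, `scipy.stats.ttest_ind_from_stats` reports P directly.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic assuming equal variances (pooled SD)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean2 - mean1) / standard_error
    df = n1 + n2 - 2
    return t, df

# Summary statistics for the two mollusc samples (slide 17), n = 10 shells each
t, df = pooled_t(31.4, 5.7, 10, 41.3, 4.3, 10)
print(round(t, 2), df)   # 4.38 18
```

A t of about 4.38 with df = 18 is well above 2.101, the two-tailed critical value of t at P = 0.05 for 18 degrees of freedom.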

19 Hypothesis Tests
H0, the null hypothesis: the status quo; nothing out of the ordinary. Two means are equal, or there is no association between two variables. H1, the alternative hypothesis: something IS going on. Two means are different, or there is an association between two variables.

20 (Using our example) Null hypothesis H0:
There is no significant difference between the length of the shells of the two samples, except as caused by chance selection of data. Alternative hypothesis H1: there is a significant difference between the length of the shells in sample A and sample B.

21 The t-test (cont.) The t-test is used to determine if there is a significant difference between two means. The higher the probability, the more likely it is that the two sets are the same and that any differences are just due to random chance. The lower the probability, the more likely it is that the two sets are significantly different and that the differences are real. Where do you draw the line between these two conclusions?

22 In biology the critical probability is usually taken as 0.05 (or 5%).
This may seem very low, but it reflects the fact that biology experiments are expected to produce quite similar results. If P > 0.05, then the two sets are the same (ACCEPT the null hypothesis, H0). If P < 0.05, then the two sets are different (REJECT the null hypothesis and support the alternative, H1). For the t-test to work, the number of repeats should be as large as possible (certainly > 10), and the data should be normally distributed. Accepting the alternative means you are 95% confident that the data you collected are being influenced by the independent variable you're investigating.

23 t-test using Excel (the TTEST function: =TTEST(array1, array2, tails, type)) For the examples you'll use in biology, tails is always 2, and type can be: 1 = paired; 2 = two samples, equal variance; 3 = two samples, unequal variance.
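Excel's TTEST returns a P value, but the three types differ in how they calculate t. This Python sketch computes only the t statistic for each type so that the difference is visible. The function names and the two small data lists are hypothetical. Type 3 is written as Welch's t, the usual unequal-variance form.

```python
import math
import statistics

def paired_t(first, second):
    """Type 1 (paired): t from the differences within each pair."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

def pooled_t(x, y):
    """Type 2 (equal variance): Student's t with a pooled variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(y) - statistics.mean(x)) / se

def welch_t(x, y):
    """Type 3 (unequal variance): Welch's t; each sample keeps its own variance."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(y) - statistics.mean(x)) / se

# Hypothetical measurements
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 7]
print(round(paired_t(x, y), 2))   # 3.5
print(round(pooled_t(x, y), 2))   # 1.3
print(round(welch_t(x, y), 2))    # 1.3
```

With equal sample sizes, types 2 and 3 give the same t. They still use different degrees of freedom, so their P values differ. Pairing the data (type 1) removes the variation between pairs, which is why its t is much larger here.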

24 Conclusion: The mean mollusc shell lengths are different, and the t-test shows that there is only a tiny 0.03% probability (P = 0.0003) of a difference this large arising by chance. The shell length is therefore significantly different in the two locations.

25 Writing the Conclusions
1. State the null hypothesis and the alternative hypothesis (based on research?).
2. Set the critical P level at P = 0.05 (5%).
3. Write the decision rule: if P > 5%, then the two sets are the same (i.e. accept the null hypothesis); if P < 5%, then the two sets are different (i.e. reject the null hypothesis).
4. Write a summary statement based on the decision: the null hypothesis is rejected since the calculated P value is less than 0.05 (two-tailed test).
5. Write a statement of results in standard English: there is a significant difference between the length of the shells in sample A and sample B.
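Steps 2 to 5 can be captured in a small helper that turns a P value into the summary statement and the plain-English result. This is only a sketch. `t_test_conclusion` is a hypothetical name, and the wording is copied from the mollusc example. A P value of exactly 0.05 is treated as not significant, because the rule above only rejects when P < 5%.

```python
def t_test_conclusion(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule (reject H0 only when P < alpha) and phrase the result."""
    if not 0 <= p_value <= 1:
        raise ValueError("a P value must lie between 0 and 1")
    if p_value < alpha:
        decision = f"The null hypothesis is rejected since P = {p_value} (< {alpha}; two-tailed test)."
        result = ("There is a significant difference between the length of the shells "
                  "in sample A and sample B.")
    else:
        decision = f"The null hypothesis is accepted since P = {p_value} (>= {alpha}; two-tailed test)."
        result = ("There is no significant difference between the length of the shells "
                  "in sample A and sample B.")
    return decision + " " + result

# 0.03% from slide 24 written as a proportion
print(t_test_conclusion(0.0003))
```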

26 Practice The t-test is used to test the statistical significance of a difference. What is that difference? A. Between observed and expected results B. Between the means of two samples C. Between the standard deviation of two samples D. Between the size of two samples (Total 1 mark)

27 The t-test using a t-table
Let's compare the heights of men and women in the United States. The null hypothesis in this case is: women and men in the United States are equally tall, on average.

28 To test the hypothesis, we gather data from 10 men and 10 women, chosen randomly (N = 20). The data are shown graphically. We can see that the heights of men and women overlap broadly, although the tallest individuals are men and the shortest are women.

29 For this example, the calculation of t gives a value of 2.791.
Now we can consult a table of critical values of t (a portion of the table is shown). For a two-sample t-test, the degrees of freedom are calculated as n - 2, where n represents the total number of values in both samples.

30 To determine the degrees of freedom:
To determine the degrees of freedom, total the data points in both populations, then use df = n - 2 (n = 10 samples for women + 10 samples for men = 20): df = n - 2 = 20 - 2 = 18. The degrees of freedom for our example is 18. If we scroll across the line for 18 degrees of freedom, we find that our observed value of t (2.791) lies between the critical values of 2.552 and 2.878 (two-tailed P = 0.02 and P = 0.01). It is therefore above the critical value of 2.101 for P = 0.05.
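The degrees-of-freedom step and the table look-up can be sketched in Python as below. The small dictionary holds standard two-tailed critical values of t at P = 0.05 for a few degrees of freedom. It is an excerpt typed in for illustration, not a full table, and `compare_t` is a hypothetical helper.

```python
# Excerpt of a t-table: two-tailed critical values of t at P = 0.05
CRITICAL_T_P05 = {16: 2.120, 18: 2.101, 20: 2.086}

def compare_t(t_value, n_total):
    """Return (df, critical value, decision) for a two-sample t-test."""
    df = n_total - 2                 # df = n - 2, where n = values in both samples
    critical = CRITICAL_T_P05[df]    # raises KeyError if df is not in this short excerpt
    if abs(t_value) > critical:
        return df, critical, "reject H0"
    return df, critical, "accept H0"

print(compare_t(2.791, 20))   # (18, 2.101, 'reject H0')
```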

31 Accept the null if the t-value is less than the critical value
Reject the null hypothesis if your t-value is greater than the critical value. Because the t-value for our test is greater than the critical value, we reject the null hypothesis: there is a difference between women's and men's heights, and we infer that men are taller than women. With this statistical test, we are able to make inferences about a whole population based on a small sub-sample. That is power!

32 Practice The mean heights of students on the basketball and volleyball squads are measured. There are 12 players on each squad. t=1.8. Which of the following is the most valid conclusion? (1 point) A) Accept the null hypothesis. There is no significant difference. B) Reject the null hypothesis. There is no significant difference. C) Accept the null hypothesis. There is a significant difference. D) Reject the null hypothesis. There is a significant difference.

33 The single factor Analysis of Variance (ANOVA)
Used to determine if there is a significant difference between more than two means. The same rules as for the t-test apply (null hypothesis, alternative hypothesis, P value). Example: mollusc shell length (mm) from three different locations. Mean: marine 30.8, brackish 42.5, fresh 45.5.
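Excel reports the ANOVA result for you, but the core calculation is short. This Python sketch computes the single-factor F ratio (between-group mean square divided by within-group mean square) and its degrees of freedom. The three groups are made-up numbers, since the raw shell data on this slide did not survive. The F value would still need to be compared with a critical value of F.

```python
def one_way_anova_f(*groups):
    """Single-factor ANOVA: F = between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Spread of the group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Spread of each value around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical groups
print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6)
```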

34 Adding ANOVA to your computer (the Excel Analysis ToolPak add-in; you only need to do this once)

36 Single Factor ANOVA: scroll up to find "single factor", input ALL the raw data, and set Alpha = 0.05.

37 Correlation The existence of a correlation does not establish that there is a causal relationship between two variables. When analyzing an experiment, you are very often looking for an association between variables. This can be a correlation, to see if two variables vary together, or a relation, to see how one variable affects another. One test is the Pearson correlation coefficient (r). It ranges from +1 (perfect positive correlation) through 0 (no correlation) to -1 (perfect negative correlation).

38 Pearson correlation (r)
Use it when the data are continuous and normally distributed. In Excel, r is calculated using the formula =CORREL(X range, Y range). It is usual to draw a scatter graph of the data whenever a correlation is being investigated.
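For readers who want to see what =CORREL calculates, this is a minimal Python sketch of the Pearson coefficient. The `pearson_r` name and the example lists are our own. For the same paired data, it should give the same value as CORREL.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r between paired x and y values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation of x and y
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0 (perfect negative)
```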

39 Causative: use linear regression
If one variable is thought to affect the other, linear regression fits a straight line to the data and gives the slope (m) and intercept (c) in the equation y = mx + c.
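The slope m and intercept c of the least-squares line can be computed directly, as in this Python sketch. `linear_fit` is a hypothetical helper, and the data points are made up to lie exactly on y = 2x + 1. Excel's SLOPE and INTERCEPT functions should return the same m and c for the same data.

```python
def linear_fit(xs, ys):
    """Least-squares straight line y = m*x + c; returns (m, c)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    m = sxy / sxx        # slope
    c = my - m * mx      # the line passes through the point (mean x, mean y)
    return m, c

print(linear_fit([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```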

40 Correlations (figure panels: positive correlation, no correlation, negative correlation)

41 Causation: correlation does not imply causation.
It is important to realize that showing that a correlation exists between two sets of data does not necessarily mean that there is a causal effect between the two variables. In other words, it doesn’t always mean there is a logical connection between cause and effect. Here are some unusual examples of correlation without causation. Ice cream sales and the number of shark attacks on swimmers are correlated. Skirt lengths and stock prices are highly correlated (as stock prices go up, skirt lengths get shorter). The number of cavities in elementary school children and vocabulary size have a strong positive correlation. In none of these cases does one factor directly cause the other. The pattern is either a coincidence of the data or driven by a third factor. For example, hot weather increases both ice cream sales and the number of people swimming in the sea, and older children have both more cavities and larger vocabularies. Therefore, correlation doesn’t PROVE causation, but it suggests that the relationship needs further investigation!

42 Practice What does the following scatter graph show?
A. No correlation between these variables B. Strong positive correlation between these variables C. Strong negative correlation between these variables D. Weak negative correlation between these variables (Total 1 mark)

43 Chi-Squared Test (X2) A statistical tool used to determine how far the data you observe deviate from what you expect to observe.
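The chi-squared statistic adds up (O - E)^2 / E over every category, where O is an observed count and E is the expected count. This Python sketch applies it to a hypothetical test of a theory (40 offspring, with a 1:1 ratio expected). It then compares the result with 3.841, the standard critical value for df = 1 at P = 0.05.

```python
def chi_square(observed, expected):
    """Chi-squared statistic: sum of (O - E)**2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

CRITICAL_X2_DF1_P05 = 3.841   # standard table value for df = 1, P = 0.05

observed = [18, 22]           # hypothetical counts
expected = [20, 20]           # theory: 1:1 ratio of 40
x2 = chi_square(observed, expected)
df = len(observed) - 1        # df = number of outcomes - 1 for a theory
print(x2, df)                 # 0.4 1
print("reject H0" if x2 > CRITICAL_X2_DF1_P05 else "accept H0")  # accept H0
```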

45 Chi-square statistic: to find the probability value (P) associated with the obtained chi-square statistic:
a. Calculate the degrees of freedom (df): df = (number of rows - 1) * (number of columns - 1) for an association; df = (number of outcomes - 1) for a theory.
b. Use a table of CRITICAL VALUES for the chi-square test to find the P value.
Example (species, frequency): cattails only, 6; seaweed only, 8; both species, 11; neither species, 5. This chart will be 2 rows and 2 columns (you will see this in 4.1), so df for association = (2 - 1) * (2 - 1) = 1 * 1 = 1.

46 Let’s assume we calculate X2 to be 0.031 for an association
P > 0.05: no significant association. P < 0.05: significant association. X2 = 0.031 (previously calculated), df = 1. The critical value of X2 for df = 1 at P = 0.05 is 3.84, and 0.031 is far below it, so P > 0.05. There is no significant association between the two species.

47 Chi-Squared Testing Compare the Chi-squared value with the Critical Value Null Hypothesis (H0) : If the X2 < CV, then ACCEPT the Null Hypothesis (There is NO Association between the variables) i.e. The two species are distributed independently Alternative Hypothesis (H1): If the X2 > CV, then REJECT the Null Hypothesis (There is a significant Association between the variables)…aka ACCEPT the Alternative Hypothesis i.e. The two species are associated (either positively so they tend to occur together or negatively so they tend to occur apart)

48 Calculating CHITEST in Excel
Comparing observed counts to a theory: =CHITEST(observed range, expected range) returns the P value.

51 Testing for an Association between Groups of Counts
Expected count for each cell = (row total * column total) / grand total
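Putting the expected-count formula and the degrees of freedom from slide 45 together, this Python sketch tests for an association in a 2 x 2 table. How the four cattail/seaweed counts are arranged into rows and columns (shown in the comment) is our assumption about the chart's layout. The result, about 0.032, is close to the 0.031 assumed on slide 46.

```python
def expected_counts(table):
    """Expected count for each cell = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_association(table):
    """Return (X2, df) for a contingency table of observed counts."""
    expected = expected_counts(table)
    x2 = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(table, expected)
             for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)   # (rows - 1) * (columns - 1)
    return x2, df

# Assumed layout of the slide 45 counts:
#                     seaweed present   seaweed absent
# cattails present    11 (both)         6 (cattails only)
# cattails absent      8 (seaweed only) 5 (neither)
table = [[11, 6], [8, 5]]
x2, df = chi_square_association(table)
print(round(x2, 3), df)   # 0.032 1
```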

56 Percent Change: $\text{Percent change} = \dfrac{\text{New value} - \text{Original value}}{\text{Original value}} \times 100$

58 Percent Change: $\text{Percent change} = \dfrac{\text{New value} - \text{Original value}}{\text{Original value}} \times 100$
Example: You have 10 amoebas in a petri dish. Three days later you have 25. What is the % change in the number of amoebas in the petri dish? Step 1: Subtract the original from the new value 25 – 10 = 15 Step 2: Divide the change in value by the original value 15/10 = 1.5 Step 3: Multiply step 2 value by 100 to get the % change 1.5 * 100 = 150% change (increase)
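The three steps collapse into one Python expression. `percent_change` is a hypothetical helper. It refuses an original value of 0, where percent change is undefined, and a negative result means a decrease.

```python
def percent_change(original, new):
    """Percent change = (new - original) / original * 100."""
    if original == 0:
        raise ValueError("percent change is undefined when the original value is 0")
    return (new - original) / original * 100

print(percent_change(10, 25))  # 150.0  (the amoeba example: an increase)
print(percent_change(20, 10))  # -50.0  (a decrease)
```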

