Statistical Analysis - Mean(Average), Median, Mode, Range - Standard Deviation - T-test/ANOVA - Correlation - Chi Test - Percent Change.

Presentation transcript:

Reasons for using statistics: Since we can't measure the whole population, we take a sample to represent the population. Statistical analysis allows scientists to evaluate the accuracy and precision of data.

An investigation of shell length variation in a mollusc species. A marine gastropod (Thersites bipartita) has been sampled from two different locations:
Sample A: shells found in full marine conditions
Sample B: shells found in brackish water conditions
Experimental design: sample size = 10 shells; the length of each shell was measured as shown.
The data obtained from the two locations will be used to illustrate the statistical calculations required.

Analysis of Gastropod Data: The height of each shell was measured with a ruler. Units: mm, with an uncertainty of ± 0.5 mm. All measuring devices have an uncertainty, which reflects the precision of the measurement. There should be no variation in the precision of raw data: the number of significant digits must be consistent.

To estimate uncertainty take the smallest unit of the measuring instrument and divide it by two. For example, a stopwatch measures time in hundredths of a second. If you measure a time of 10.04 seconds for somebody to run the 100m sprint, then the uncertainty is ± 0.005 seconds, or more clearly written: the measured time is (10.04 ± 0.005) seconds.
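The half-smallest-division rule can be sketched in a few lines of Python. The function name `reading_uncertainty` is hypothetical and used here for illustration only.

```python
def reading_uncertainty(smallest_division):
    """Estimate a reading's uncertainty as half the instrument's smallest unit."""
    return smallest_division / 2

# Stopwatch that reads to hundredths of a second
stopwatch = reading_uncertainty(0.01)
# Ruler marked in whole millimetres
ruler = reading_uncertainty(1)

print(f"sprint time: (10.04 ± {stopwatch}) s")
print(f"shell length uncertainty: ± {ruler} mm")
```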

Mean (Average): Mean or average = sum of values divided by the number of values. Example: the individual data points are 4, 4, 8, 8, 2, 5, 7, 4, 6, 9, 10. Total = 4+4+8+8+2+5+7+4+6+9+10 = 67. There are 11 data points. Calculating the mean: 67/11 = 6.09

Median, Mode and Range (data in rank order: 2, 4, 4, 4, 5, 6, 7, 8, 8, 9, 10)
Median: the middle value in rank order, here 6. A good measure of central tendency for skewed distributions.
Mode: the most common data value, here 4. A good measure for qualitative data or bimodal distributions.
Range: the difference between the largest and smallest data values, here 10 - 2 = 8. Gives a crude indication of the spread of the data.
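As a quick check of these four measures, here is a minimal sketch using Python's standard `statistics` module on the same 11 data points. The variable names are illustrative only.

```python
import statistics

data = [4, 4, 8, 8, 2, 5, 7, 4, 6, 9, 10]

mean = statistics.mean(data)        # sum of values / number of values
median = statistics.median(data)    # middle value once the data are ranked
mode = statistics.mode(data)        # most common value
data_range = max(data) - min(data)  # largest minus smallest

print(round(mean, 2), median, mode, data_range)  # 6.09 6 4 8
```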

Mean with the full data range The data can be represented on a graph that might show the mean and the full range of data. Marine population: mean= 30.7 Range = 23-43 Brackish population: mean = 41.3 Range = 32-51

Error bars are a graphical representation of the variability of data. Biological systems are subject to a genetic program and environmental variation. Consequently, when we collect a set of data for a given variable, it shows variation. When displaying data in graphical formats we can show the variation using error bars. Repeated measurements and multiple readings improve the reliability of data. We will use the standard deviation for error bars.

Standard Deviation: A measure of the spread of data around the mean. It can be used either as a measure of variation within a data set or of the precision of a repeated measurement.

It is assumed that there is a normal distribution of values around the mean and that the data is not skewed to either end.

Standard Deviation: The standard deviation calculated is a measure of the spread of the data values around the mean. Population 1*: mean = 31.4, standard deviation (s) = 5.7. Population 2*: mean = 41.6, standard deviation (s) = 4.3. *Note: these are a different set of samples of shell lengths.

Graphing the mean and the standard deviation. One way to represent our data is to draw a graph that includes error bars of the standard deviation. Here each sample has the mean ± 1 standard deviation. The ± 1 standard deviation intervals for shell length do not overlap between these two populations. The question being considered is: Is there a significant difference between the two samples from different locations, or are the differences in the two samples just due to chance selection?

Graphing Mean with STD DEV as Error Bars. Figure 1: Mean length of mollusc shell in the different types of water. Error bars represent one standard deviation. For normally distributed data, the mean ± 1 standard deviation covers about 68% of the values, so this graph begins to show that the two populations look different.

What does a small standard deviation mean? What does a large standard deviation mean? The standard deviation is useful for comparing the means and the spread of data between two or more samples. A small standard deviation indicates that the data are clustered closely around the mean value (narrow variation). Conversely, a large standard deviation indicates a wider spread around the mean (wider variation).
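To make the idea of spread concrete, the sketch below computes the sample standard deviation for the 11 example values used on the mean slide (not the shell data, which are not listed in full here) and counts how many values fall within the mean ± 1 standard deviation. Variable names are illustrative.

```python
import statistics

data = [4, 4, 8, 8, 2, 5, 7, 4, 6, 9, 10]  # example values from the mean slide

mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)

low, high = mean - sd, mean + sd
inside = [x for x in data if low <= x <= high]

print(f"mean = {mean:.2f}, s = {sd:.2f}")
print(f"mean ± 1 s spans {low:.2f} to {high:.2f}")
print(f"{len(inside)} of {len(data)} values fall inside that interval")
```

With only 11 values the share inside the interval will not be exactly 68%; that figure applies to large, normally distributed data sets.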

Practice The average leaf length of one plant is 3.5 cm with a standard deviation of 1.0 cm. What does this indicate? A. 95% of all leaves fall within the ranges of 3.0 to 4.0 cm B. 68% of all leaves fall within the ranges of 2.5 to 4.5 cm C. 68% of all leaves fall within the ranges of 3.0 to 4.0 cm D. 95% of all leaves fall within the ranges of 2.5 to 4.5 cm (Total 1 mark)

In the introduction to this topic we considered the sampling of the same species of mollusc from two different locations. We have already calculated the means and the standard deviations for these samples. (Note: the standard deviation is for the sample, not the population.) Population 1: mean = 31.4, standard deviation (s) = 5.7. Population 2: mean = 41.3, standard deviation (s) = 4.3. The question we are considering is: Is there a significant difference between these two populations, or is any difference between the two samples just because of random sampling differences?

The t-test: Another common form of data analysis is to compare two sets of data to see if they are the same or different. For example, are the mollusc shells from the two locations significantly different? If the means of the two sets are very different, then it is easy to decide, but often the means are quite close and it is difficult to judge whether the two sets are the same or are significantly different. To compare two sets of data use the t-test, which tells you the probability (P) of seeing a difference at least as large as the one observed if the two sets are basically the same. The assumption that the two sets are the same is called the null hypothesis (H0).

Hypothesis Tests
H0, the null hypothesis: the status quo; nothing out of the ordinary. Two means are equal, or there is no association between two variables.
H1, the alternative hypothesis: something IS going on. Two means are different, or there is an association between two variables.

(Using our example) Null hypothesis H0: There is no significant difference between the length of the shells of the two samples except as caused by chance selection of data. OR Alternative hypothesis H1: There is a significant difference between the length of the shells in sample A and sample B.

The t-test (cont.): Used to determine if there is a significant difference between two means. The higher the probability, the more plausible it is that the two sets are the same and that any differences are just due to random chance. The lower the probability, the more likely it is that the two sets are significantly different and that the differences are real. Where do you draw the line between these two conclusions?

In biology the critical probability is usually taken as 0.05 (or 5%). This may seem very low, but it reflects the fact that biology experiments are expected to produce quite varied results.
If P > 0.05, there is no significant difference between the two sets (ACCEPT the null hypothesis, H0; strictly speaking, fail to reject it).
If P < 0.05, the two sets are significantly different (REJECT the null hypothesis and support the alternative, H1).
For the t-test to work, the number of repeats should be as large as possible, and certainly > 10, and the data should follow a normal distribution.
Accepting the alternative at this level is usually described as being 95% confident that the data you collected are being influenced by the independent variable you're investigating.

t-test using Excel: For the examples you'll use in biology, tails is always 2, and type can be: 1 = paired; 2 = two samples, equal variance; 3 = two samples, unequal variance.

Conclusion: The mean mollusc shell lengths are different, and the t-test shows that there is only a tiny 0.03% probability (P = 0.0003) of a difference this large arising by chance, so the shell length is significantly different in the two locations.

Writing the Conclusions
1. State the null hypothesis and the alternative hypothesis (based on the research question).
2. Set the critical P level at P = 0.05 (5%).
3. Write the decision rule: If P > 5% then the two sets are the same (i.e. accept the null hypothesis). If P < 5% then the two sets are different (i.e. reject the null hypothesis).
4. Write a summary statement based on the decision, e.g. the null hypothesis is rejected since calculated P = 0.0003 (< 0.05; two-tailed test).
5. Write a statement of results in standard English, e.g. there is a significant difference between the length of the shells in sample A and sample B.

Practice The t-test is used to test the statistical significance of a difference. What is that difference? A. Between observed and expected results B. Between the means of two samples C. Between the standard deviation of two samples D. Between the size of two samples (Total 1 mark)

The t-test using a t-table Let's compare the heights of men and women in the United States. The null hypothesis in this case is: women and men in the United States are equally tall, on average.

To test the hypothesis, we gather data from 10 men and 10 women, chosen randomly. The data are shown graphically. We can see that the heights of men and women overlap broadly, although the tallest individuals are men and the shortest are women. N = 20

For this example, the calculation of t gives a value of 2.791. Now we can consult a table of critical values of t (http://davidmlane.com/hyperstat/t_table.html). For the t-test, the degrees of freedom is calculated as n - 2, where n represents the total number of values.

To determine the degrees of freedom: count the total data points in both populations, then use df = n - 2 (n = 10 samples for women + 10 samples for men = 20), so df = 20 - 2 = 18. The degrees of freedom for our example is 18. If we scroll across the line for 18 degrees of freedom, we find that our observed value of t (2.791) lies between the critical values of 2.101 (P = 0.05) and 2.878 (P = 0.01). A t-value below the P = 0.05 critical value means no significant difference between the means; a t-value above it means a significant difference between the means.

Accept the null hypothesis if the t-value is less than the critical value. Reject the null hypothesis if your t-value is greater than the critical value. Because the t-value for our test (2.791) is greater than the critical value (2.101), we reject the null hypothesis: there is a difference between women's height and men's height, and we infer that men are taller than women on average. With this statistical test, we are able to make inferences about a whole population based on a small sub-sample. That is power!
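The table-lookup procedure can be sketched in Python as follows. This is a minimal illustration, assuming an equal-variance two-sample t-test (Excel's type 2). The heights are made-up values, NOT the data behind t = 2.791, and the function name `pooled_t` is hypothetical.

```python
import math
import statistics

def pooled_t(sample_a, sample_b):
    """Two-sample t statistic assuming equal variances; returns (t, df)."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2  # total number of values minus 2
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err
    return t, df

# Made-up heights in cm, for illustration only
men = [170, 172, 174, 176, 178, 180, 182, 184, 186, 188]
women = [160, 162, 164, 166, 168, 170, 172, 174, 176, 178]

t, df = pooled_t(men, women)

# Critical value for df = 18 at P = 0.05 (two-tailed), as read from the t-table
CRITICAL_T_DF18 = 2.101
reject_null = abs(t) > CRITICAL_T_DF18

print(f"t = {t:.3f}, df = {df}")
print("reject H0" if reject_null else "accept H0")
```

The critical value is hard-coded for df = 18 only; a different sample size needs the matching row of the table.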

Practice The mean heights of students on the basketball and volleyball squads are measured. There are 12 players on each squad. t=1.8. Which of the following is the most valid conclusion? (1 point) A) Accept the null hypothesis. There is no significant difference. B) Reject the null hypothesis. There is no significant difference. C) Accept the null hypothesis. There is a significant difference. D) Reject the null hypothesis. There is a significant difference.

The single factor Analysis of Variance (ANOVA): Used to determine if there is a significant difference between more than two means. The same rules as for the t-test apply (null hypothesis, alternative hypothesis, P value).
Mollusc shell length (mm) from three different locations
# Marine Brackish Fresh
1 43 51 54
2 36 49 52
3 34 47 50
4 33 46
5 44
6 30
7 28 35 38
8 24 37 40
9 23
10 41
Mean 30.8 42.5 45.5
Std Dev 6.3 6.1

Adding ANOVA to your computer (you only need to do this once)

Single Factor ANOVA: Scroll up to find single factor. Input ALL raw data. Alpha = 0.05.
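For readers without Excel, the F statistic behind a single factor ANOVA can be sketched by hand in Python. The data below are only rows 1, 2, 3, 7 and 8 of the shell table (the rows that list a value for all three locations), so the result is illustrative and will not match an analysis of the full data set. The function name `one_way_anova_f` is hypothetical, and the critical F value is taken from a standard F table rather than from the slides.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single factor ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each value around its own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Subset of the shell lengths (mm): rows 1, 2, 3, 7 and 8 only
marine = [43, 36, 34, 28, 24]
brackish = [51, 49, 47, 35, 37]
fresh = [54, 52, 50, 38, 40]

f_value, df_b, df_w = one_way_anova_f([marine, brackish, fresh])

# Critical F for (2, 12) degrees of freedom at alpha = 0.05, from a standard F table
CRITICAL_F_2_12 = 3.89
print(f"F = {f_value:.2f} with df = ({df_b}, {df_w})")
print("reject H0" if f_value > CRITICAL_F_2_12 else "accept H0")
```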

Correlation: The existence of a correlation does not establish that there is a causal relationship between two variables. When analyzing an experiment you are very often looking for an association between variables. This can be a correlation, to see if two variables vary together, or a relation, to see how one variable affects another. One test is the Pearson correlation coefficient (r), which ranges from +1 (perfect positive correlation) through 0 (no correlation) to -1 (perfect negative correlation).

Pearson correlation (r): Data are continuous and normally distributed. In Excel, r is calculated using the formula =CORREL(X range, Y range). It is usual to draw a scatter graph of the data whenever a correlation is being investigated.
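As a rough equivalent of the Excel formula, the Pearson coefficient can be computed directly from its definition. The data points below are made up for illustration, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up paired measurements
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 3))  # 0.775, a fairly strong positive correlation
```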

Causative: Use linear regression. It fits a straight line to the data and gives the slope and intercept, m and c, in the equation y = mx + c.
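A least-squares fit of y = mx + c can be sketched as below. The data are made up for illustration, and `fit_line` is a hypothetical helper name rather than anything from the slides.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept c for y = m*x + c."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, c

# Made-up paired measurements
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

m, c = fit_line(x, y)
print(f"y = {m:.1f}x + {c:.1f}")  # y = 0.6x + 2.2
```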

Correlations (three scatter graphs): Positive Correlation, No Correlation, Negative Correlation

Causation: Correlation does not imply causation. It is important to realize that showing that a correlation exists between two sets of data does not necessarily mean that there is a causal effect between the two variables. In other words, it doesn't always mean there is a logical connection between cause and effect. Here are some unusual examples of correlation without causation:
Ice cream sales and the number of shark attacks on swimmers are correlated.
Skirt lengths and stock prices are highly correlated (as stock prices go up, skirt lengths get shorter).
The number of cavities in elementary school children and vocabulary size have a strong positive correlation.
Clearly there is no direct causal link between the factors involved; the correlation is simply a coincidence of the data or the result of a shared underlying factor. Therefore, correlation doesn't PROVE causation, but suggests it needs further investigation!

Practice What does the following scatter graph show? A. No correlation between these variables B. Strong positive correlation between these variables C. Strong negative correlation between these variables D. Weak negative correlation between these variables (Total 1 mark)

Chi-Squared Test (χ²): A statistical tool used to determine how far the data you observe deviate from what you expect to observe.

Chi-square statistic: To find the probability value (P) associated with the obtained chi-square statistic:
a. Calculate the degrees of freedom (df): df = (# rows - 1) * (# columns - 1) for an association; df = (# of outcomes - 1) for a theory.
b. Use a table of CRITICAL VALUES for the chi-square test to find the P value.
Species: Frequency
Cattails only: 6
Seaweed only: 8
Both species: 11
Neither species: 5
This chart will be 2 rows and 2 columns (you will see this in 4.1), so df for association = (2-1) * (2-1) = 1 * 1 = 1

Let's assume we calculate χ² to be 0.031 for an association, with df = 1 as previously calculated. P > 0.05: no significant association between the variables. P < 0.05: significant association between the variables. For df = 1 the critical value at P = 0.05 is 3.84; since χ² = 0.031 is far below this, P > 0.05 and there is no significant association between the two species.

Chi-Squared Testing: Compare the chi-squared value with the critical value (CV).
Null hypothesis (H0): If χ² < CV, then ACCEPT the null hypothesis (there is NO association between the variables), i.e. the two species are distributed independently.
Alternative hypothesis (H1): If χ² > CV, then REJECT the null hypothesis and ACCEPT the alternative hypothesis (there is a significant association between the variables), i.e. the two species are associated (either positively, so they tend to occur together, or negatively, so they tend to occur apart).

Calculating CHITEST in Excel: Comparing Observed Counts to a Theory
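Comparing observed counts to a theory can also be sketched without Excel. The example below assumes a made-up genetics cross with a theoretical 3:1 ratio; the counts, the ratio, and the function name `chi_squared` are all illustrative assumptions, not taken from the slides. The critical value 3.84 is the standard table value for df = 1 at P = 0.05.

```python
def chi_squared(observed, expected):
    """Chi-squared statistic: sum of (O - E)^2 / E over all outcomes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up counts for two outcomes, tested against a 3:1 theory
observed = [290, 110]
total = sum(observed)
expected = [total * 3 / 4, total * 1 / 4]

chi2 = chi_squared(observed, expected)
df = len(observed) - 1  # number of outcomes minus 1

CRITICAL_VALUE_DF1 = 3.84  # P = 0.05, df = 1
print(f"chi-squared = {chi2:.2f}, df = {df}")
print("reject H0" if chi2 > CRITICAL_VALUE_DF1 else "accept H0")
```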

Testing for an Association between Groups of Counts: for each cell, Expected = (row total * column total) / grand total
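Applied to the cattail and seaweed counts given earlier, the expected-value formula can be sketched as follows. Arranging the four frequencies as a 2 x 2 present/absent table is an assumption about how the chart is laid out.

```python
# Observed counts, assumed layout:
# rows = cattails present / absent, columns = seaweed present / absent
observed = [
    [11, 6],  # both species, cattails only
    [8, 5],   # seaweed only, neither species
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected = (row total * column total) / grand total, for each cell
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"chi-squared = {chi2:.4f}, df = {df}")
```

The result comes out at about 0.03, in line with the value quoted for this example.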

Percent Change: Percent Change = (New Value - Original Value) / Original Value * 100

Percent Change = (New Value - Original Value) / Original Value * 100
Example: You have 10 amoebas in a petri dish. Three days later you have 25. What is the % change in the number of amoebas in the petri dish?
Step 1: Subtract the original from the new value: 25 - 10 = 15
Step 2: Divide the change in value by the original value: 15/10 = 1.5
Step 3: Multiply the step 2 value by 100 to get the % change: 1.5 * 100 = 150% change (increase)
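The three steps collapse into a one-line function. The name `percent_change` is hypothetical, used here for illustration.

```python
def percent_change(original, new):
    """Percent change from the original value to the new value."""
    return (new - original) / original * 100

# 10 amoebas grow to 25 over three days
print(percent_change(10, 25))  # 150.0
```

A negative result indicates a decrease.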