Beak of the Finch Natural Selection Statistical Analysis.


Key Concepts and Learning Objectives
Evolution by way of natural selection can only occur if heritable traits vary among individuals in a population.
In a given environment, individuals with one form of a trait may be able to better exploit some aspects of the environment than individuals with other forms of the trait can.
Natural selection involves the differential survival and reproduction of individuals with different heritable traits.
Evolution occurs when inherited traits in a population change over successive generations.
Graphing allows scientists to more readily identify patterns and trends in data, including in ecological and population data.
Statistical tools provide a way to quantify variability in biological data and describe the degree of uncertainty in the results obtained using these data.

Part A: Calculating Descriptive Statistics
1. What is a “mean”?
2. What is the “standard deviation” and what does it tell us?
3. What is the “standard error of the mean” and the “confidence interval”?
4. What is a “t-test” and what does the “t-test” tell us?
5. What is “Chi-square” and what does it tell us?
6. How do scientists determine if their data are statistically significant?

The “mean”
In order to analyze a trait in an entire population, scientists measure a small subset of individuals (a sample) from that population. Ideally, the measurements from this sample will be approximately “normally distributed.” The mean is the sum of the measurements divided by the number of measurements.

The Mean
Finch #          1     2      3     4     5
Beak Depth (mm)  7.1   11.2   9.3   9.0   8.5
$$\bar{x} = \frac{7.1 + 11.2 + 9.3 + 9.0 + 8.5}{5} = \frac{45.1}{5} = 9.02\ \text{mm}$$
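The arithmetic for the mean can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative and not part of the original slides.

```python
# Beak depths (mm) of the five finches listed on the slide.
beak_depths_mm = [7.1, 11.2, 9.3, 9.0, 8.5]

# Mean = sum of the measurements divided by the number of measurements.
mean_depth = sum(beak_depths_mm) / len(beak_depths_mm)

print(f"mean beak depth = {mean_depth:.2f} mm")
```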

Measures of Variability
Variability describes the extent to which numbers in a data set diverge from the central tendency. It is a measure of how “spread out” the data are. The most common measures of variability are range, variance and standard deviation.

The Sample Standard Deviation (s)
The sample standard deviation (s) is, roughly speaking, the typical deviation between each measurement in the sample and the sample mean (x̄). The sample standard deviation is an estimate of the standard deviation in the larger population.
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
n = # of measurements in the sample
x_i = individual measurement
x̄ = sample mean

The Sample Standard Deviation (s)
Finch #          1     2      3     4     5
Beak Depth (mm)  7.1   11.2   9.3   9.0   8.5
$$s = \sqrt{\frac{(7.1-9.02)^2 + (11.2-9.02)^2 + (9.3-9.02)^2 + (9.0-9.02)^2 + (8.5-9.02)^2}{5-1}}$$
$$s = \sqrt{\frac{3.69 + 4.75 + 0.078 + 0.0004 + 0.27}{4}} = \sqrt{\frac{8.79}{4}} = \sqrt{2.197} = 1.48$$
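The same calculation can be followed step by step in Python. This is a minimal sketch with illustrative names; it divides by n - 1, as in the worked example for the sample standard deviation.

```python
import math

# Beak depths (mm) of the five finches listed on the slide.
beak_depths_mm = [7.1, 11.2, 9.3, 9.0, 8.5]

n = len(beak_depths_mm)
mean_depth = sum(beak_depths_mm) / n

# Squared deviation of each measurement from the sample mean.
squared_devs = [(x - mean_depth) ** 2 for x in beak_depths_mm]

# Sample variance divides by (n - 1), not n.
sample_variance = sum(squared_devs) / (n - 1)

# The sample standard deviation is the square root of the sample variance.
s = math.sqrt(sample_variance)

print(f"sum of squared deviations = {sum(squared_devs):.2f}")
print(f"sample variance = {sample_variance:.3f}")
print(f"sample standard deviation = {s:.2f} mm")
```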

What does the standard deviation mean for normally distributed data?
For normally distributed data, about 68% of the measurements will fall within 1 SD of the mean (34.1% on each side of the mean), and about 95% of the measurements will fall within 2 SD of the mean (47.5% on each side of the mean).
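These percentages can be reproduced from the normal distribution itself. The sketch below assumes an ideal normal distribution and uses the error function from Python's standard library; the helper name `fraction_within` is made up for illustration.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k SD of its mean."""
    return math.erf(k / math.sqrt(2))

# 1 SD, 1.96 SD (the exact 95% cutoff) and 2 SD (the usual rounding).
for k in (1, 1.96, 2):
    print(f"within {k} SD: {fraction_within(k) * 100:.1f}%")
```

The 1.96 SD cutoff covers 95% of the distribution; rounding to 2 SD covers slightly more.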

Graphing the Mean and SD
Finch #   Beak depth (mm)
1         7.1
2         11.2
3         9.3
4         9
5         8.5
Mean      9.02
SD        1.482228053
[Figure: bar graph of the mean beak depth with an error bar of +/- 1 SD, spanning 7.54 mm to 10.5 mm.]

Measures of confidence: Standard Error of the Mean and 95% Confidence Interval
The standard deviation (s) represents how “spread out” the measurements in a sample are from the sample mean (x̄). However, the sample mean is not necessarily identical to the population mean. The expected size of the error between the sample mean and the population mean is represented by the “standard error of the mean” (SEM). Unlike the standard deviation, which describes only the spread of the sample around the sample mean, the SEM estimates how close the sample mean is likely to be to the population mean.
$$\mathrm{SEM} = \frac{s}{\sqrt{n}}$$
s = standard deviation
n = # of measurements in the sample
For the SEM, the larger your sample size (n), the smaller your SEM (smaller error bars). The smaller your SEM, the closer your sample mean is likely to be to the population mean.

Example: Calculating & Graphing SEM
n = 5: $\mathrm{SEM} = \frac{1.48}{\sqrt{5}} = 0.66$ (larger error bar, from 8.36 mm to 9.68 mm)
n = 20: $\mathrm{SEM} = \frac{1.44}{\sqrt{20}} = 0.32$ (smaller error bar, from 8.87 mm to 9.51 mm)
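The effect of sample size on the SEM can be sketched in Python using the two standard deviations and sample sizes from this example. The function name `standard_error` is illustrative only.

```python
import math

def standard_error(s, n):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return s / math.sqrt(n)

# Values taken from the example: s = 1.48 with n = 5, s = 1.44 with n = 20.
sem_small_sample = standard_error(1.48, 5)
sem_large_sample = standard_error(1.44, 20)

print(f"n = 5:  SEM = {sem_small_sample:.2f}")
print(f"n = 20: SEM = {sem_large_sample:.2f}")
```

Even though the two standard deviations are almost the same, the larger sample roughly halves the SEM.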

Measures of confidence: Standard Error of the Mean and 95% Confidence Interval
The 95% confidence interval (95% CI) is related to the SEM: there is roughly a 68% chance that the population mean falls within +/- 1 SEM of your sample mean and roughly a 95% chance that it falls within +/- 2 SEM of your sample mean. The equation for the 95% CI is:
$$95\%\ \mathrm{CI} = \frac{1.96\,s}{\sqrt{n}}$$
which is typically rounded to
$$95\%\ \mathrm{CI} = \frac{2s}{\sqrt{n}}$$
The 95% CI is essentially the SEM x 2.
[Figure: normal curve centered on the sample mean; 34.1% of the area lies on each side between the mean and 1 SEM, and 47.5% on each side between the mean and 2 SEM.]

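Calculating & Graphing SEM vs. 95% CI error bars
Finch #   Body Mass (g)
1         15.7
2         12.3
3         14.4
4         17.1
5         15.5
6         13.8
7         11.6
8         14.2
9         12.7
10        14.9
Mean      14.22
SD        1.69
$$\mathrm{SEM} = \frac{s}{\sqrt{n}} = \frac{1.69}{\sqrt{10}} = 0.53$$
$$95\%\ \mathrm{CI} = \frac{2s}{\sqrt{n}} = \frac{2(1.69)}{\sqrt{10}} = 1.07$$
SEM error bar: 13.69 g to 14.75 g. 95% CI error bar: 13.15 g to 15.29 g.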
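The body mass figures can be recomputed from the raw measurements. This sketch uses Python's standard `statistics` module and the rounded "SEM x 2" rule for the 95% CI; variable names are illustrative.

```python
import math
import statistics

# Body mass (g) of the ten finches listed on the slide.
body_mass_g = [15.7, 12.3, 14.4, 17.1, 15.5, 13.8, 11.6, 14.2, 12.7, 14.9]

n = len(body_mass_g)
mean_mass = statistics.mean(body_mass_g)
s = statistics.stdev(body_mass_g)   # sample SD, divides by (n - 1)

sem = s / math.sqrt(n)              # standard error of the mean
ci95 = 2 * sem                      # 95% CI half-width, rounded rule (2 x SEM)

print(f"mean = {mean_mass:.2f} g, SD = {s:.2f} g")
print(f"SEM error bar:    {mean_mass - sem:.2f} g to {mean_mass + sem:.2f} g")
print(f"95% CI error bar: {mean_mass - ci95:.2f} g to {mean_mass + ci95:.2f} g")
```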

Inferential Statistics: What it means for data to be “statistically significant” Scientists want to determine if Royal Blue Tang fish (delicious prey for carnivorous sharks) swim faster when surrounded by sharks compared to being in a shark-free environment. They measured and graphed the velocities of six fish in both a shark-free and shark-filled tank.

What it means for data to be “statistically significant”
Velocities of Blue-Tang fish (mph)
          Shark-free tank   Shark-filled tank
          20.2              35.5
          17.8              40.2
          19.3              38.8
          22.3              42.5
          25.6              60.8
          24.7              47.9
Mean      21.65             44.28
SD        3.09              9.09
SEM       1.26              3.71
95% CI    2.52              7.42
[Figure: two bar graphs of the mean velocity in each tank, one with error bars of +/- SEM and one with 95% CI error bars.]

What it means for data to be “statistically significant”
Based on the graph and error bars (SEM or 95% CI), would you conclude there’s a difference in how fast the fish swim between the shark and shark-free groups? Would you conclude this difference is “statistically significant”? Why?
Since the error bars in each graph do not overlap, one would presume that the difference in the velocities between the two groups is statistically significant, and that the fish in the shark-filled tank do indeed swim faster than those in a shark-free environment. To test formally whether there is a statistically significant difference between the two groups, a statistical test, referred to as the Student’s t-test, can be performed.

Inferential statistics: The Null hypothesis (H0), α (Alpha) level and Student’s t-Test
Statistical tests evaluate the null hypothesis (H0). The null hypothesis states that there is no difference between the two population means (μ1 = μ2), so any difference between the two sample means (x̄1 and x̄2) is due to chance alone. In our Blue-Tang fish example, the null hypothesis would state that the mean velocity of fish in the shark-free tank equals the mean velocity of fish in the shark-infested tank. Any difference between the measured mean velocities would be purely by chance, and not statistically significant. The alternative hypothesis (H1) states that there is a real difference between the two means.
When performing a statistical test to decide whether to reject H0, a significance level, or alpha (α) level, is set beforehand. The α level is the probability that you’ll reject the null hypothesis when it’s actually true. Scientists usually set the α level at 0.05, meaning they accept a 5% risk of concluding that a difference exists when it does not.

Inferential statistics: The Null hypothesis (H0), α (Alpha) level and Student’s t-Test
The Student’s t-test is the statistical test used to decide whether to reject the null hypothesis (μ1 = μ2). It tells us whether the means of two data sets are significantly different. To perform the Student’s t-test, first calculate tobs:
$$t_{obs} = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
x̄ = sample mean
s² = sample variance (the standard deviation, squared)
n = # of values in the sample

Inferential statistics: Student’s t-Test continued…
Calculate the tobs for the Blue-Tang velocity data (shark-free tank: mean 21.65 mph, SD 3.09, n = 6; shark-filled tank: mean 44.28 mph, SD 9.09, n = 6):
$$t_{obs} = \frac{|21.65 - 44.28|}{\sqrt{\frac{3.09^2}{6} + \frac{9.09^2}{6}}} = \frac{|-22.63|}{\sqrt{1.59 + 13.77}} = \frac{22.63}{\sqrt{15.36}} = 5.77$$
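The tobs calculation can be reproduced from the summary statistics. The sketch below deliberately uses the rounded means and standard deviations from the slide so that the result matches the worked arithmetic; computing from the raw velocities gives a value that differs slightly in the third decimal place. The function name `t_observed` is illustrative.

```python
import math

def t_observed(mean1, sd1, n1, mean2, sd2, n2):
    """Absolute difference of the means divided by the combined standard error."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return abs(mean1 - mean2) / standard_error

# Rounded summary statistics for the two tanks, as given on the slide.
t_obs = t_observed(21.65, 3.09, 6, 44.28, 9.09, 6)

print(f"t_obs = {t_obs:.2f}")
```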

Inferential statistics: Student’s t-Test continued…
Once the tobs has been calculated, compare it with the tcritical value (from a t-table) for the appropriate α-level and degrees of freedom (df).
The significance level (α) we are using here is 0.05.
Degrees of freedom = (n1 + n2) - 2. For our fish study, df = (6 + 6) - 2 = 10.
If tobs > tcrit, then H0 is rejected. If tobs < tcrit, then you cannot reject the null hypothesis (i.e., the difference between the two groups could be due to chance).
tcrit for our study = 2.23
tobs = 5.77
Since tobs is greater than tcrit, we can reject the null hypothesis at the α = 0.05 level and say that the difference in the mean velocities of the Blue-Tang fish is statistically significant!
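The decision rule can be written out as a short Python sketch. The critical value 2.23 is taken from the slide (it comes from a t-table for α = 0.05 and df = 10) and is not computed here; the function name `t_test_decision` is made up for illustration.

```python
def t_test_decision(t_obs, t_crit):
    """Apply the decision rule: reject H0 only if t_obs exceeds t_crit."""
    return "reject H0" if t_obs > t_crit else "fail to reject H0"

n1, n2 = 6, 6                 # fish measured in each tank
df = (n1 + n2) - 2            # degrees of freedom for the two-sample test

t_obs = 5.77                  # observed t from the worked example
t_crit = 2.23                 # table value for alpha = 0.05, df = 10 (from the slide)

decision = t_test_decision(t_obs, t_crit)
print(f"df = {df}, t_obs = {t_obs}, t_crit = {t_crit}: {decision}")
```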

Analyzing frequencies: The Chi-Square Test
The examples and analyses done so far have used samples with quantitative numerical measurements (i.e., velocities, beak size, body mass, etc.). But what about when you want to analyze the chance of something occurring (i.e., heads vs. tails in a coin toss, frequency of a particular genotype in offspring, link between disease exposure and death, etc.)? The Chi-Square Test allows us to determine whether a frequency pattern in a sample is statistically significant, or could have occurred simply by chance. The following equation calculates the Chi-Square test statistic:
$$\chi^2 = \sum \frac{(o - e)^2}{e}$$
χ² = Chi-Square value
o = observed values
e = expected values
Performing the Chi-Square test is similar to the Student’s t-test in that the objective is to decide whether to reject the null hypothesis (H0). The null hypothesis (H0) states that any difference between the observed frequency and the expected frequency is purely by chance, and not statistically significant.

Analyzing frequencies: The Chi-Square Test
Steps to performing a Chi-Square Analysis:
1. Establish what the H0 is for your study.
2. Determine the degrees of freedom (n - 1). In this case, n is the number of categories (traits/characteristics) you’re observing.
3. Calculate the Chi-Square (χ²) value from the observed and expected data.
4. Determine if the χ² value is greater than or less than the critical value.
5. From this assessment, reject or fail to reject the null hypothesis.

Analyzing frequencies : The Chi-Square Test Example: When predicting the birth ratio of boys vs. girls, one would expect the chance of each to be 50%:50%. This is your expected frequency. However, something unusual has been reported in the town of Newton, MA where there is a mysteriously higher frequency of boys vs. girls. Residents are suspicious of a possible environmental factor that could be leading to these skewed birth outcomes. You conduct a study to determine if this observation is true, or pure chance. After collecting data from 50 births in Newton, you observe that 34 were boys and 16 were girls. Is the difference between the observed frequency and expected frequency statistically significant, or is this outcome likely due to chance?

Analyzing frequencies: The Chi-Square Test
1. Define H0: Any difference between the expected frequency of 1:1 (boy:girl) births and the observed frequency is purely due to chance and not statistically significant.
2. Chi-Square value calculations:
Gender   Observed (o)   Expected (e)   (o-e)   (o-e)²   (o-e)²/e
Boys     34             25             9       81       3.24
Girls    16             25             -9      81       3.24
χ² = 3.24 + 3.24 = 6.48
3. Determine the degrees of freedom (df): df = # of categories minus 1. Here we have two categories (boys and girls), so df = 2 - 1 = 1.
4. Use a table of Chi-Square critical values to determine whether your observed data could plausibly have occurred simply by chance, or whether there is a statistically significant difference between the observed and expected values. The p-value is the probability of seeing a difference at least as large as the one observed if H0 were true. The α-level is the cutoff chosen in advance, usually 0.05; if the p-value is below α, the result is called statistically significant.
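The Chi-Square calculation for the birth data can be sketched in Python. The expected counts assume the 1:1 ratio stated in the example; the dictionary layout and names are illustrative.

```python
# Observed births in the example: 34 boys and 16 girls out of 50.
observed = {"boys": 34, "girls": 16}

# Expected counts under H0: births split evenly across the categories (1:1).
total_births = sum(observed.values())
expected = {gender: total_births / len(observed) for gender in observed}

# Chi-square statistic: sum over categories of (o - e)^2 / e.
chi_square = sum(
    (observed[g] - expected[g]) ** 2 / expected[g] for g in observed
)

# Degrees of freedom: number of categories minus 1.
df = len(observed) - 1

print(f"expected = {expected}")
print(f"chi-square = {chi_square:.2f}, df = {df}")
```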

Analyzing frequencies: The Chi-Square Test
If the χ² value > the critical value for your significance level (usually set to p = 0.05) then you can reject H0 and conclude that the difference between observed and expected data is statistically significant. In our example, χ² = 6.48 and the critical value for df = 1 (p = 0.05) is 3.841. Because χ² > critical value, we can reject the null hypothesis that the difference in frequency of boy vs. girl births is due to chance. A difference this large would be expected less than 5% of the time if births in Newton really followed a 1:1 ratio, which suggests there may be some other explanation for why so many more boys than girls were born in the sample. Furthermore, because 6.48 lies between the critical values for p = 0.025 and p = 0.01, the p-value is between 0.01 and 0.025, so the excess of male newborns is statistically significant even at the stricter 0.025 level.
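The comparison against the critical value, and the p-value itself, can be checked with the standard library alone. This sketch relies on an identity that holds only for df = 1: the upper-tail probability of a Chi-Square value x equals erfc(sqrt(x / 2)). That identity is an addition for illustration and is not part of the original slides; for other degrees of freedom a table or a statistics package would be needed.

```python
import math

chi_square = 6.48        # test statistic from the birth example
critical_value = 3.841   # table value for df = 1, p = 0.05 (from the slide)

# Decision at the 0.05 level: reject H0 if the statistic exceeds the critical value.
reject_h0 = chi_square > critical_value

# For df = 1 only: p-value = P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"reject H0 at 0.05: {reject_h0}")
print(f"p-value = {p_value:.4f}")
```

The computed p-value falls between 0.01 and 0.025, in line with reading the statistic off the critical values table.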