PCB 3043L - General Ecology Data Analysis.

OUTLINE
Organizing an ecological study
Basic sampling terminology
Statistical analysis of data
  Why use statistics?
  Describing data
    Measures of central tendency
    Measures of spread
    Normal distributions
Using Excel
  Producing tables
  Producing graphs
Analyzing data
  Statistical tests
    T-Tests
    ANOVA
    Regression

Organizing an ecological study
  What is the aim of the study?
  What is the main question being asked?
  What are your hypotheses?
  Collect data
  Summarize data in tables
  Present data graphically
  Statistically test your hypotheses
  Analyze the statistical results
  Present a conclusion to the proposed question

http://assets.cambridge.org/052166/005X/sample/052166005Xwsc00.pdf

Basic sampling terminology Variables Populations Samples Parameters Statistics

What is a variable?
Variable: any defined characteristic that varies from one biological entity to another.
Examples: plant height, bird weight, human eye color, number of tree species.
If an individual is selected randomly from a population, it may display a particular height, weight, etc. If several individuals are selected, their characteristics may be very similar or very different.

What is a population?
Population: the entire collection of measurements of a variable of interest.
Example: if we are interested in the heights of pine trees in Everglades National Park (plant height is our variable), then our population would consist of all the pine trees in Everglades National Park.

What is a sample?
Sample: a smaller group or subset of the population that is measured and used to estimate the distribution of the variable within the true population.
Example: the heights of 100 pine trees in Everglades National Park may be used to estimate the heights of trees within the entire population (which actually consists of thousands of trees).
How do you think you can increase the accuracy of your study?

What is a parameter? Parameter: any calculated measure used to describe or characterize a population Example: the average height of pine trees in Everglades National Park

What is a statistic? Statistic: an estimate of any population parameter Example: the average height of a sample of 100 pine trees in Everglades National Park

Why use statistics? It is not always possible to obtain measures and calculate parameters of variables for the entire population of interest. Statistics allow us to estimate these values for the entire population based on multiple, random samples of the variable of interest. The larger the number of samples, the closer the estimated measure is to the true population measure. Statistics also allow us to efficiently compare populations to determine differences among them. Statistics allow us to determine relationships between variables.

Statistical analysis of data
Heights of pine trees at 2 sites in Everglades National Park
[Table with columns Site 1 and Site 2; values as extracted: 5 4 7 2 3 8 6]
Measures of central tendency
Measures of dispersion and variability

Measures of central tendency
Where is the center of the distribution?
mean (x̄ or μ): the arithmetic mean, i.e. the sum of all values divided by the number of values
median: the value in the middle of the ordered data set
mode: the most commonly occurring value
Example data set: 1, 2, 2, 2, 3, 5, 6, 7, 8, 9, 10
Mean = (1 + 2 + 2 + 2 + 3 + 5 + 6 + 7 + 8 + 9 + 10)/11 = 55/11 = 5
Median of 1, 2, 2, 2, 3, 5, 6, 7, 8, 9, 10 = 5 (the middle value)
Median of 1, 2, 2, 2, 3, 5, 6, 7, 8, 9, 10, 11 = (5 + 6)/2 = 5.5 (with an even number of values, the average of the two middle values)
Mode of 1, 2, 2, 2, 3, 5, 6, 7, 8, 9, 10 = 2
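The slides do these calculations in Excel. As a cross-check, the same three measures can be sketched with Python's standard `statistics` module. The variable names are illustrative only and do not come from the slides.

```python
from statistics import mean, median, mode

# Example data set from the slide
data = [1, 2, 2, 2, 3, 5, 6, 7, 8, 9, 10]

data_mean = mean(data)             # sum of the values divided by their count
data_median = median(data)         # middle value of the ordered data (odd count)
even_median = median(data + [11])  # even count: average of the two middle values
data_mode = mode(data)             # most commonly occurring value

print("mean:", data_mean)
print("median (11 values):", data_median)
print("median (12 values):", even_median)
print("mode:", data_mode)
```

The printed values should match the worked example: mean 5, medians 5 and 5.5, mode 2.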

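Measures of dispersion and variability
How widely are the data distributed?
range: largest value minus smallest value
variance (s² for a sample, σ² for a population): the average squared deviation from the mean; for a sample, s² = Σ(xᵢ − x̄)² / (n − 1)
standard deviation (s or σ): the square root of the variance
Small spread = more clustered data; large spread = more widely scattered data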

Same mean, but the variances are different

Measures of dispersion and variability
Example data set: 0, 1, 3, 3, 5, 5, 5, 7, 7, 9, 10
Variance = 9.8, Standard Deviation = 3.13, Range = 10
Example data set: 0, 10, 30, 30, 50, 50, 50, 70, 70, 90, 100
Variance = 980, Standard Deviation = 31.30, Range = 100
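The figures above can be reproduced with a short sketch using Python's `statistics` module. It assumes the sample formulas (dividing by n − 1), which is what the slide's values of 9.8 and 980 correspond to. The helper name `describe_spread` is hypothetical.

```python
from statistics import stdev, variance

small = [0, 1, 3, 3, 5, 5, 5, 7, 7, 9, 10]
large = [10 * value for value in small]  # same shape, ten times the spread


def describe_spread(values):
    """Return sample variance, sample standard deviation and range."""
    return {
        "variance": variance(values),        # divides by n - 1
        "sd": stdev(values),                 # square root of the sample variance
        "range": max(values) - min(values),  # largest minus smallest value
    }


for name, values in (("small", small), ("large", large)):
    spread = describe_spread(values)
    print(name, round(spread["variance"], 2), round(spread["sd"], 2), spread["range"])
```

Both data sets have the same relative pattern. Multiplying every value by 10 multiplies the standard deviation and range by 10 and the variance by 100.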

Normal distribution of data
A data set in which most values are around the mean, with fewer observations towards the extremes of the range of values.
The distribution is symmetrical about the mean. It is also called a Gaussian distribution.

Proportions of a Normal Distribution
A normal population of 1000 body weights: μ = 70 kg, σ = 10 kg
500 weights are > 70 kg
500 weights are < 70 kg

Proportions of a Normal Distribution
How many bears have a weight > 80 kg?
μ = 70 kg, σ = 10 kg, X = 80 kg
We use an equation to tell us how many standard deviations from the mean the X value is located:
Z = (X − μ) / σ = (80 − 70) / 10 = 1
We then use a special table to tell us what proportion of a normal distribution lies beyond this Z value.
This proportion is equal to the probability of drawing at random a measurement (X) greater than 80 kg.

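Z table
Look for the Z value on the table (1.0).
Find the associated P value (0.1587).
The P value states there is a 15.87% (0.1587 × 100) chance that a bear selected from the population of 1000 bears measured will have a weight greater than 80 kg. In other words, about 159 of the 1000 bears (0.1587 × 1000) are expected to weigh more than 80 kg.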
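The table lookup can be checked numerically. This is a minimal sketch that assumes Python 3.8 or later, where `statistics.NormalDist` is available. It computes the Z value and the proportion of a standard normal distribution lying above it.

```python
from statistics import NormalDist

mu, sigma, x = 70, 10, 80  # values from the slide (kg)

z = (x - mu) / sigma                # number of standard deviations above the mean
p_above = 1 - NormalDist().cdf(z)   # proportion of a standard normal beyond z
expected_count = p_above * 1000     # expected number out of 1000 bears

print(f"Z = {z}, P(X > {x} kg) = {p_above:.4f}, about {expected_count:.0f} bears")
```

`NormalDist()` with no arguments is the standard normal distribution (mean 0, standard deviation 1), which is the distribution a Z table tabulates.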

Probability distribution tables
There are multiple probability tables for different types of statistical tests, e.g. Z-table, t-table, χ²-table.
Each allows you to associate a “critical value” with a “P value”.
This P value is used to determine the significance of statistical results.

Using Excel
Program used to:
  Organize data
  Produce tables
  Perform calculations
  Make graphs
  Perform statistical tests

Organizing data in tables Allows you to arrange data in a format that is best for analysis The following are the steps you would use:

Performing calculations
Allows you to perform several calculations:
  Sum, Average, Variance, Standard deviation
  Basic subtraction, addition, multiplication
  More complex formulas

Making graphs
Bar Charts
  Adding error bars
Frequency Histograms
  Using the ‘countif’ function
Scatter Plots
  Adding trendlines

Analyzing Data in Excel
Statistical tests can be done to determine:
  Whether or not there is a significant difference between two data sets (Student’s t-test)
  Whether or not there is a significant difference between more than two data sets (ANOVA)
  Whether or not there is a significant relationship between two variables (Regression analysis)

Analyzing Data in Excel
The following steps must be followed:
1. Choose an appropriate statistical test
2. State H0 and HA
3. Run test to produce Test Statistic
4. Examine P-value
5. Decide whether to reject or fail to reject H0

Analyzing Data in Excel
Normally, you would have to calculate the test statistic and look up its P value on a table. All tests done in Excel provide the P value for you.
This P value is used to determine the significance of statistical results. It must be compared to an α value.
The α value is usually 0.05 or less (e.g. 0.01). With α = 0.05, we reject the null hypothesis only when a result at least as extreme as ours would occur less than 5% of the time if the null hypothesis were true.
The lower the α value, the more certain we are about rejecting the null hypothesis.
The first thing you must do is select which statistical test you want to perform.

t-Tests Used to compare the means of two populations and answer the question: Is there a significant difference between the two populations? Example: Is there a significant difference between the average height of pine trees from 2 sites in Everglades National Park? You cannot use this test to compare two different types of data (e.g. water depth data and soil depth data). It can only compare two sets of data based on the same data type (e.g. water depth data from two different sites) The two data sets that are being compared must be presented in the same units. (e.g. you can compare two sets of data if both are recorded in days. You cannot compare data recorded in units of days with data recorded in units of months)

Your Null Hypothesis is always: There is no significant difference between the two compared populations (μ1 = μ2)
Your Alternative Hypothesis is always: There is a difference between the two compared populations (μ1 ≠ μ2)

t-Tests
When you run the test, look for the p-value.
If p > 0.05 then fail to reject your Null Hypothesis and state that “there is no significant difference between the two compared populations”.
If p < 0.05 then reject your Null Hypothesis and state that “there is a significant difference between the two compared populations”.

t-Tests
Our results show P = 0.09903.
Therefore P > 0.05. (This means that, if the null hypothesis were true, a difference at least as large as the one we observed would occur in more than 5% of samples.)
So we must fail to reject the Null Hypothesis and state that “there is no significant difference between the two compared populations”.
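The slides run the t-test in Excel and do not say which variant is used. The sketch below is only an illustration. It assumes a two-sample, two-tailed test with equal variances (a pooled t-test), and the tree heights are made-up values, not the course data. It uses only the standard library and obtains the P value by numerically integrating the t distribution's density. All function names are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed P value: 1 minus twice the area between 0 and |t| (Simpson's rule)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)


def pooled_t_test(a, b):
    """Equal-variance two-sample t-test; returns (t, degrees of freedom, P)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df, two_tailed_p(t, df)


# Hypothetical heights (m) of pine trees at two sites, for illustration only
site_1 = [5, 4, 7, 6, 5]
site_2 = [3, 4, 2, 5, 3]

t_stat, df, p_value = pooled_t_test(site_1, site_2)
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, df = {df}, P = {p_value:.4f}: {decision}")
```

Both samples must measure the same variable in the same units, as the t-test slide requires. For these invented heights the P value falls below 0.05, so the sketch reports "reject H0".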

ANOVA
Used to compare the means of more than two populations and answer the question: Is there a significant difference between the populations?
Example: Is there a significant difference between the average height of pine trees from 4 sites in Everglades National Park?
For comparing a particular feature of two or more populations, use a Single Factor ANOVA.
For comparing a particular feature of two or more populations, subdivided into two groups, use a Two Factor ANOVA.

Your Null Hypothesis is always: There is no significant difference between the compared populations (μ1 = μ2 = μ3 = μ4 …)
Your Alternative Hypothesis is always: There is a difference between the compared populations (at least one of the means μ1, μ2, μ3, μ4 … differs from the others)

ANOVA
When you run the test, look for the p-value.
If p > 0.05 then fail to reject your Null Hypothesis and state that “there is no significant difference between the compared populations”.
If p < 0.05 then reject your Null Hypothesis and state that “there is a significant difference between at least two of the compared populations”.

ANOVA
Our results show P = 0.002197.
Therefore P < 0.05. (This means that, if the null hypothesis were true, differences at least as large as the ones we observed would occur in less than 5% of samples.)
So we must reject the Null Hypothesis and state that “there is a significant difference between at least two of the compared populations”.

ANOVA
Remember: the ANOVA result will only tell you that either
  none of the data sets are significantly different from each other, OR
  at least two of the data sets being compared are significantly different.
If there is a significant difference between at least two data sets, it will not tell you which two.
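To show what a single factor ANOVA computes, here is a minimal standard-library sketch. The three groups of heights are invented for illustration, and the function names are hypothetical. The F statistic is the ratio of between-group to within-group variance. The P value is the upper-tail area of the F distribution, obtained here by numerical integration, which assumes more than 2 within-group degrees of freedom.

```python
import math
from statistics import mean


def f_upper_tail(f_value, df_between, df_within, steps=2000):
    """P(F >= f_value), via P = I_x(df_within/2, df_between/2),
    x = df_within / (df_within + df_between * f_value).
    The beta density is integrated from 0 to x with Simpson's rule."""
    if f_value <= 0:
        return 1.0
    a, b = df_within / 2, df_between / 2
    x = df_within / (df_within + df_between * f_value)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(t):
        if t <= 0:
            return 0.0  # valid because a > 1 is assumed (df_within > 2)
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_beta)

    h = x / steps  # steps must be even for Simpson's rule
    total = density(0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return min(1.0, total * h / 3)


def one_way_anova(groups):
    """Return (F, df_between, df_within, P) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within, f_upper_tail(f_value, df_between, df_within)


# Hypothetical heights (m) of pine trees at three sites, for illustration only
sites = [[5, 4, 7, 6, 5], [3, 4, 2, 5, 3], [6, 7, 8, 7, 7]]
f_value, df_b, df_w, p_value = one_way_anova(sites)
print(f"F = {f_value:.2f}, df = ({df_b}, {df_w}), P = {p_value:.5f}")
```

A small P value here says only that at least two of the site means differ. As noted above, it does not identify which ones.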

Two-way ANOVA Used to compare the means of more than two populations that are subdivided into two or more groups and answer the question: Is there a significant difference between the populations? Example: Is there a significant difference between the average height of pine trees from 4 sites in Everglades National Park, during the wet and dry season?

Two-way ANOVA
When you run the test, look for the interaction p-value.
If p > 0.05 then fail to reject your Null Hypothesis and state that “there is no significant difference between the compared populations”.
If p < 0.05 then reject your Null Hypothesis and state that “there is a significant difference between at least two of the compared populations”.

Two-way ANOVA
Our results show P = 0.2888.
Therefore P > 0.05. (This means that, if the null hypothesis were true, differences at least as large as the ones we observed would occur in more than 5% of samples.)
So we must fail to reject the Null Hypothesis and state that “there is no significant difference between the compared populations”.

Regression analysis
Used to determine whether or not there is a linear relationship between two variables and answer the question: Is there a significant linear relationship between two variables?
Example: Is there a significant relationship between the average height of pine trees and soil depth in Everglades National Park?
It basically creates an equation (or line) that best predicts Y values based on X values.
You cannot use this test to compare populations. It only compares variables.
You are looking at two different variables, e.g. water depth (cm) and plant abundance (no. of individuals), so the data sets do not have to be presented in the same units.

Your Null Hypothesis is always: There is no significant linear relationship between the two variables
Your Alternative Hypothesis is always: There is a significant linear relationship between the two variables

R squared: how well “y” can be predicted by “x”, i.e. how strong the linear relationship is between the two variables.
The closer R squared is to 0, the less well the regression line fits the data.
The closer R squared is to 1, the better the regression line fits the data.
Example: R squared value of 0.04. The regression line does not fit the data well. Many of the points lie far from the line, so there is no clearly defined linear relationship between the two variables, and “x” cannot be used to predict “y”.
Example: R squared value of 0.94. The regression line fits the data well. The points all lie fairly close to the line, so there is a defined linear relationship between the two variables, and “x” can be used to predict “y”.

Regression analysis
When you run the test, look for the Significance F or Sample p-value.
If p > 0.05 then fail to reject your Null Hypothesis and state that “There is no significant linear relationship between the two variables”.
If p < 0.05 then reject your Null Hypothesis and state that “There is a significant linear relationship between the two variables”.

Regression analysis
Our results show Significance F or Sample p-value = 1.65E-08 = 0.0000000165.
Therefore P < 0.05. (This means that, if the null hypothesis were true, a linear relationship at least as strong as the one we observed would occur in less than 5% of samples.)
So we must reject the Null Hypothesis and state that “There is a significant linear relationship between the two variables”.
Next look at the R squared value.
Our results show R squared = 0.975.
Therefore the line fits the data well, and “x” can be used to predict “y”.

Ecological study
  What is the aim of the study?
  What is the main question being asked?
  What are your hypotheses?
  Collect data
  Summarize data in tables
  Present data graphically
  Statistically test your hypotheses
  Analyze the statistical results
  Present a conclusion to the proposed question

Aim: To determine whether or not there are changes in heights of pine trees with distance from the edge of a forest trail in Everglades National Park.
Hypotheses:
H0: There is no significant relationship between distance from the edge of the trail and pine tree height
HA: There is a significant relationship between distance from the edge of the trail and pine tree height
Results:
Average tree height of pine trees along transect from forest trail to interior forest at ENP

Distance from trail (m)    Plant heights (m)
0                          2.1
5                          2.7
10                         2.9
15                         3.1
20                         3.4
25                         3.7
30                         3.8
35                         4.5
40                         4.6
45                         4.8
50                         5.6

SUM                        41.2
AVERAGE                    3.74
STANDARD DEVIATION         1.04

P = 1.65E-08. Since P < 0.05, reject H0. Therefore, there is a significant relationship between distance from the edge of the trail and pine tree height.
R Square = 0.97, so there is a strong positive linear relationship between distance from the trail and plant height.
Discussion/Conclusion: The gap created by the trail may be adversely affecting pine trees, such that they are shorter near the trail and become taller with distance from the trail.
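The regression in this example can be reproduced with a short least-squares sketch. It assumes the first distance in the table is 0 m: the extracted slide lists eleven heights but only the distances 5 to 50, so the 0 is an inference from the even 5 m spacing. The function name `least_squares` is hypothetical, and the sketch computes only the fitted line and R squared, not the P value.

```python
from statistics import mean

# Distances (m) from the trail; the leading 0 is an assumption (see above)
distance_m = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
# Average plant heights (m) from the results table
height_m = [2.1, 2.7, 2.9, 3.1, 3.4, 3.7, 3.8, 4.5, 4.6, 4.8, 5.6]


def least_squares(x, y):
    """Fit y = intercept + slope * x; return (slope, intercept, R squared)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)  # squared correlation coefficient
    return slope, intercept, r_squared


slope, intercept, r_squared = least_squares(distance_m, height_m)
print(f"height = {intercept:.2f} + {slope:.4f} * distance")
print(f"R squared = {r_squared:.3f}")
```

The R squared from this sketch should agree with the value of about 0.975 reported in the results. The positive slope matches the conclusion that trees become taller with distance from the trail.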

Assignment: Worksheet 1
Three questions:
  T-test
  Single factor ANOVA and Two-way ANOVA
  Regression analysis