Data Analysis
A Few Necessary Terms
Categorical Variable: Discrete groups, such as Type of Reach (Riffle, Run, Pool).
Continuous Variable: Measurements along a continuum, such as Flow Velocity.
What type of variable would “Mottled Sculpin/m²” be?
What type of variable is “Substrate Type”?
What type of variable is “% of bank that is undercut”?
A Few Necessary Terms
Explanatory Variable: Independent variable. Plotted on the x-axis. The variable you use as a predictor.
Response Variable: Dependent variable. Plotted on the y-axis. The variable that is hypothesized to depend on, or be predicted by, the explanatory variable.
Statistical Tests: Appropriate Use
For our data, the response variable will always be continuous.
T-test: A categorical explanatory variable with 2 options.
ANOVA: A categorical explanatory variable with >2 options.
Regression: A continuous explanatory variable.
Statistical Tests
Hypothesis Testing: In statistics, we are always testing a Null Hypothesis (Ho) against an alternative hypothesis (Ha).
Test Statistic: A single number calculated from the sample data (for example, t or F) that is used to find the p-value.
p-value: The probability of observing our data, or more extreme data, assuming the null hypothesis is correct.
Statistical Significance: We reject the null hypothesis if the p-value is below a set value, usually 0.05.
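To make the p-value definition concrete, here is a minimal sketch that turns a t statistic into a two-sided p-value by numerically integrating Student's t distribution. The function names, the example statistic (t = 3.0) and the degrees of freedom (8) are assumptions chosen for illustration, not values from our data. In practice a statistics package or Excel reports this number directly.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|) if the null hypothesis is true.

    Uses Simpson's rule to integrate the density over [-|t|, |t|]
    and subtracts that central area from 1. `steps` must be even.
    """
    a = abs(t_stat)
    if a == 0:
        return 1.0
    h = 2 * a / steps
    total = t_pdf(-a, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(-a + i * h, df)
    central_area = total * h / 3
    return 1 - central_area

ALPHA = 0.05                          # the usual significance cutoff
p = two_sided_p_value(3.0, df=8)      # hypothetical test statistic
reject_null = p < ALPHA
print(f"p = {p:.4f}, reject Ho: {reject_null}")
```

A larger test statistic leaves less area in the tails, so the p-value shrinks and the null hypothesis becomes harder to defend.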
Student’s T-Test
Tests the statistical significance of the difference between means from two independent samples.
Compares the mean of a continuous response variable between the 2 groups of a categorical variable.
[Figure: Mottled Sculpin/m² (y-axis) at two sites, Cross Plains and Salmo Pond]
Precautions and Limitations
Meet Assumptions:
Observations come from data with a normal distribution (check with a histogram)
Samples are independent
Equal variance is assumed (check with a boxplot)
No other sample biases
Interpreting the p-value
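The calculation behind the equal-variance t-test can be sketched in a few lines. The densities below are made-up numbers for the two sites, and the function name is hypothetical; the point is only to show how the difference in means is scaled by the pooled standard error.

```python
import math
from statistics import mean, variance

# Hypothetical sculpin densities (fish per square meter), made up for illustration.
cross_plains = [2.0, 4.0, 6.0]
salmo_pond = [1.0, 2.0, 3.0]

def pooled_t_statistic(a, b):
    """Student's t statistic for two independent samples, assuming equal variance."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err, df

t_stat, df = pooled_t_statistic(cross_plains, salmo_pond)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

With these made-up numbers t is about 1.55 on 4 degrees of freedom, which is below the tabled two-sided 0.05 critical value of 2.776, so we would not reject the null hypothesis even though the sample means differ.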
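Analysis of Variance (ANOVA)
Tests the statistical significance of the difference between means from two or more independent samples (we use it when there are more than two groups).
[Figure: Mottled Sculpin/m² (y-axis) by reach type (Riffle, Pool, Run), with the Grand Mean marked]
ANOVA website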
Precautions and Limitations
Meet Assumptions:
Observations come from data with a normal distribution
Samples are independent
Equal variance is assumed
No other sample biases
Interpreting the p-value
A significant ANOVA only says that at least one mean differs; pairwise T-tests follow to find which groups differ.
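As a rough illustration of what ANOVA computes, the sketch below compares the variation between group means (measured against the grand mean) to the variation within groups. The reach-type densities are invented for the example and the function name is hypothetical.

```python
from statistics import mean

# Hypothetical sculpin densities (fish per square meter) by reach type.
groups = {
    "Riffle": [4.0, 5.0, 6.0],
    "Pool": [1.0, 2.0, 3.0],
    "Run": [7.0, 8.0, 9.0],
}

def one_way_anova_f(samples):
    """Return the F statistic and its two degrees of freedom for one-way ANOVA."""
    all_values = [v for s in samples for v in s]
    grand_mean = mean(all_values)
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-group variation: how far each observation sits from its own group mean.
    ss_within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

f_stat, df_between, df_within = one_way_anova_f(list(groups.values()))
print(f"F = {f_stat:.1f} on {df_between} and {df_within} degrees of freedom")
```

A large F means the group means are spread far apart relative to the scatter inside each group. The F statistic alone does not say which reach types differ, which is why pairwise t-tests come next.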
Simple Linear Regression
What is it? The least squares line
When is it appropriate to use?
Assumptions?
What does the p-value mean? The R-squared value?
How to do it in Excel
Simple Linear Regression
Tests the statistical significance of a relationship between two continuous variables: the explanatory variable and the response variable.
Precautions and Limitations
Meet Assumptions:
Residuals follow a normal distribution
Samples are independent
Equal variance is assumed across the range of the explanatory variable
Relationship is linear
No other sample biases
Interpret the p-value and R-squared value.
Residual Plots
Residuals are the vertical distances from the observed points to the best-fit line (observed value minus model value).
Residuals from the best-fit line always sum to zero.
Regression chooses the best-fit line to minimize the sum of squared residuals. It is called the Least Squares Line.
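The least squares line and its residuals can be computed by hand, as in this minimal sketch. The x and y values are made up (think of x as flow velocity and y as sculpin density), and the helper names are hypothetical.

```python
from statistics import mean

# Hypothetical data: x = explanatory variable, y = response variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def least_squares_line(xs, ys):
    """Slope and intercept of the line that minimizes the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to a given line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(xs, ys))

slope, intercept = least_squares_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
best_sse = sum_squared_residuals(x, y, slope, intercept)
# Any other line, such as one with a slightly steeper slope, fits worse.
other_sse = sum_squared_residuals(x, y, slope + 0.1, intercept)

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"sum of residuals = {sum(residuals):.6f}")
print(f"best-fit SSE = {best_sse:.2f}, steeper line SSE = {other_sse:.2f}")
```

The residuals cancel out to zero (up to rounding), and nudging the slope away from the least squares value increases the sum of squared residuals.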
Residuals
Residual vs. Fitted Value Plots
[Figure labels: Observed Values (Points), Model Values (Line)]
Residual Plots Can Help Test Assumptions
“Normal” scatter: no pattern, so no sign of a violated assumption
Curve: the relationship is not linear
Fan shape: unequal variance
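To see how a curve shows up in the residuals, this sketch fits a straight line to data that are deliberately curved (y = x squared, an invented example) and lists each residual next to its fitted value. The helper name is hypothetical, and a real check would plot these pairs rather than print them.

```python
from statistics import mean

# Made-up curved data: y is x squared, so a straight line is the wrong model.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [xi ** 2 for xi in x]

def least_squares_line(xs, ys):
    """Slope and intercept of the least squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar

slope, intercept = least_squares_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Text version of a residual vs. fitted value plot.
for fi, ri in zip(fitted, residuals):
    side = "above line" if ri > 0 else "below line"
    print(f"fitted = {fi:6.1f}   residual = {ri:5.1f}   ({side})")
```

The residuals are positive at both ends and negative in the middle. That U-shaped pattern is the "Curve" case: the residuals still sum to zero, but they are not randomly scattered.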
Have we violated any assumptions?
R-Squared and P-value
High R-Squared, low p-value: significant relationship
Low R-Squared, low p-value: significant relationship
High R-Squared, high p-value: NO significant relationship
Low R-Squared, high p-value: no significant relationship
The p-value indicates how strong the evidence is that a relationship exists between the two variables (that the slope differs from zero).
R-Squared indicates how much of the variance in the response variable is explained by the explanatory variable. You can think of this as a measure of predictability.
If R-Squared is low, other variables likely play a role.
If R-Squared is high, it DOES NOT by itself INDICATE A SIGNIFICANT RELATIONSHIP! Always check the p-value.
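The difference between R-squared and significance can be checked on a small made-up data set. This sketch computes R-squared as the squared correlation (which holds for simple linear regression) and then the t statistic for the slope. The data, the function name and the use of a tabled critical value (3.182 for 3 degrees of freedom at the two-sided 0.05 level) are assumptions for illustration.

```python
import math
from statistics import mean

# Hypothetical data with only five observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def r_squared(xs, ys):
    """Share of the variance in ys explained by a straight-line fit on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    s_yy = sum((yi - y_bar) ** 2 for yi in ys)
    return s_xy ** 2 / (s_xx * s_yy)

n = len(x)
r2 = r_squared(x, y)
# Magnitude of the t statistic for the slope, on n - 2 degrees of freedom.
t_stat = math.sqrt(r2 * (n - 2) / (1 - r2))

T_CRITICAL_DF3 = 3.182  # two-sided 0.05 critical value for 3 df, from a t table
significant = t_stat > T_CRITICAL_DF3

print(f"R-squared = {r2:.2f}, t = {t_stat:.3f}, significant: {significant}")
```

Here the line explains 60% of the variance, yet with only five points the test statistic falls short of the critical value, so the relationship is not significant at 0.05. Sample size matters as much as how tightly the points follow the line.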