You have data! What’s next? Data Analysis, Your Research Questions, and Proposal Writing Zoo 511 Spring 2014.

Part 1: Research Questions

Write down more than two things you thought were interesting or engaging during the field trip (a species, a habitat feature, a relationship, etc.). You can phrase these as questions, but you don’t have to yet.

What makes a good question?

Your questions should be specific and answerable.
USEFUL: Does sculpin CPUE (catch per unit effort) differ among geomorphic units? Is brown trout density related to flow velocity?
NOT SO USEFUL: In what kind of stream are brown trout most likely to be found? What habitat do fish prefer?

[Figures: sculpin CPUE in pools, runs, and riffles; brown trout/m² vs. current velocity (m/s)]
…and statistically testable: Does sculpin CPUE differ among geomorphic units? Is brown trout density related to flow velocity?

Part 2: Statistics
How do we find the answer to our question?

Why use statistics? Are there more green sunfish in pools or runs? [Figure: green sunfish in runs vs. pools]
Statistics help us find patterns in the face of variation and draw inferences beyond our sample sites. Statistics help us tell our story; they are not the story in themselves!

Statistics Vocab (take notes on your worksheet)
Categorical variable: discrete groups, such as type of reach (riffle, run, pool).
Continuous variable: measurements along a continuum, such as flow velocity.
What type of variable is “mottled sculpin/m²”? What type of variable is “substrate type”?

Explanatory/predictor variable: the independent variable, plotted on the x-axis; the variable you use to predict another variable.
Response variable: the dependent variable, plotted on the y-axis; the variable hypothesized to depend on, or be predicted by, the explanatory variable.

Mean: the average of a set of observations (the expected value of a random variable); if the data are normally distributed, it is also the most likely value.
Variance: a measure of how far the observed values differ from the expected value (the mean). The standard deviation is the square root of the variance.
Normal distribution: a symmetrical probability distribution described by a mean and a variance; an assumption of many standard statistical tests.
[Figure: normal curves N(μ₁, σ₁), N(μ₁, σ₂), and N(μ₂, σ₂)]
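A quick way to see these definitions in action is Python's built-in `statistics` module. This is a minimal sketch; the sculpin densities are invented for illustration, and `statistics.variance` and `statistics.stdev` compute the sample versions (dividing by n - 1).

```python
import math
import statistics

# Invented sculpin densities (fish per m^2) from five sample sites.
densities = [1.2, 0.8, 1.5, 1.1, 0.9]

mean = statistics.mean(densities)          # the average
variance = statistics.variance(densities)  # sample variance (n - 1 denominator)
sd = statistics.stdev(densities)           # sample standard deviation

# Standard deviation is the square root of variance.
assert math.isclose(sd, math.sqrt(variance))

print(f"mean = {mean:.3f}, variance = {variance:.3f}, SD = {sd:.3f}")
```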

Hypothesis testing: in statistics, we test a null hypothesis (H₀) against an alternative hypothesis (Hₐ).
p-value: the probability of observing our data, or more extreme data, assuming the null hypothesis is true.
Statistical significance: we reject the null hypothesis if the p-value is below a significance level (α) set in advance, conventionally 0.05.
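To make the p-value definition concrete, here is a minimal sketch of a permutation test, a technique not covered on these slides. If the null hypothesis (no difference between groups) is true, the group labels are interchangeable, so shuffling them shows how often a difference at least as large as the observed one arises by chance. The function name `permutation_p_value` and the sunfish counts are hypothetical.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=1):
    """Two-sided permutation test for a difference in means.

    Under H0 the labels are interchangeable: shuffle them and count how often
    the shuffled difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Invented green sunfish counts per site.
pools = [7, 9, 6, 8, 10]
runs = [3, 5, 4, 2, 4]
print(permutation_p_value(pools, runs))
```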

What test do you need? For our data, the response variable will probably be continuous.
T-test: a categorical explanatory variable with only 2 options.
ANOVA: a categorical explanatory variable with more than 2 options.
Regression: a continuous explanatory variable.
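The decision rule on this slide can be written as a small function. This is a hypothetical helper (`choose_test` is not from any library), assuming, as the slide does, that the response variable is continuous.

```python
from typing import Optional

def choose_test(explanatory: str, n_groups: Optional[int] = None) -> str:
    """Pick a test for a continuous response, following the slide's rule."""
    if explanatory == "continuous":
        return "linear regression"
    if explanatory == "categorical":
        if n_groups is None or n_groups < 2:
            raise ValueError("a categorical explanatory variable needs at least 2 groups")
        # 2 groups -> t-test; more than 2 groups -> ANOVA
        return "t-test" if n_groups == 2 else "ANOVA"
    raise ValueError(f"unknown variable type: {explanatory!r}")

print(choose_test("categorical", 2))  # e.g., two sites
print(choose_test("categorical", 3))  # e.g., riffle, run, pool
print(choose_test("continuous"))      # e.g., flow velocity
```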

Student’s t-test: tests the statistical significance of the difference between the means of two independent samples. Null hypothesis: no difference between means.

[Figure: mottled sculpin/m² at Cross Plains vs. Salmo Pond; p = 0.09]
Compares the means of two groups defined by a categorical variable.
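For reference, Student's t statistic pools the two sample variances, which is why the test assumes equal variance. This minimal sketch computes the statistic and its degrees of freedom with the standard library only. Turning t into a p-value needs the t distribution, which a package such as SciPy provides (`scipy.stats.ttest_ind` with `equal_var=True` runs this same pooled test). The function name and the sculpin densities are hypothetical.

```python
import math
import statistics

def students_t(sample_a, sample_b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes independent samples, equal variances.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))  # standard error of the difference
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se
    return t, df

# Invented mottled sculpin densities (fish/m^2) at two sites.
cross_plains = [0.9, 1.4, 1.1, 1.6]
salmo_pond = [0.6, 0.8, 1.0, 0.7]
t, df = students_t(cross_plains, salmo_pond)
print(f"t = {t:.2f} with {df} degrees of freedom")
```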

Analysis of Variance (ANOVA): tests the statistical significance of the differences among the means of two or more independent groups. Null hypothesis: no difference between means.
[Figure: mottled sculpin/m² in riffles, pools, and runs; p = 0.03]
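ANOVA compares the variation between group means to the variation within groups. Here is a minimal sketch of the one-way F statistic. The p-value comes from the F distribution with (k - 1, N - k) degrees of freedom, for example via `scipy.stats.f_oneway`. The function name and densities are hypothetical.

```python
import statistics

def one_way_anova_f(*groups):
    """One-way ANOVA F = between-group mean square / within-group mean square.

    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.mean([x for g in groups for x in g])
    group_means = [statistics.mean(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means))
    # Variation of observations around their own group mean
    ss_within = sum((x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Invented mottled sculpin densities (fish/m^2) by reach type.
riffle = [1.8, 2.1, 1.6, 2.4]
pool = [0.7, 1.0, 0.9, 1.2]
run = [1.1, 1.3, 1.0, 1.5]
print(one_way_anova_f(riffle, pool, run))
```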

Precautions and Limitations: Meet Assumptions
Samples are independent.
Equal variance is assumed (this assumption can be relaxed). [Figure: variance not equal, sculpin density in pools vs. sculpin density in runs]
Observations come from data with a normal distribution (check with a histogram).
No other sample biases.
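To eyeball normality and equal variance without a plotting package, a crude text histogram plus the group variances can be enough for a first look. This is a hypothetical helper, not a formal test, and the densities are invented.

```python
import statistics

def text_histogram(values, n_bins=5):
    """Crude text histogram (one row of '*' per bin) for eyeballing normality."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to width 1 if all values are equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    return [f"{lo + i * width:6.2f} | {'*' * c}" for i, c in enumerate(counts)]

# Invented sculpin densities (fish/m^2) in pools and runs.
pools = [0.4, 0.9, 1.1, 1.3, 1.6, 2.4]
runs = [0.8, 0.9, 1.0, 1.0, 1.1, 1.2]

for row in text_histogram(pools):
    print(row)
# Very different variances suggest the equal-variance assumption may not hold.
print("variance, pools:", round(statistics.variance(pools), 3))
print("variance, runs:", round(statistics.variance(runs), 3))
```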

Simple Linear Regression: analyzes the relationship between two continuous variables, a predictor and a response. Null hypothesis: there is no relationship (slope = 0).

[Figure: residuals around the least-squares line (regression line: y = mx + b)]

Residuals: residuals are the vertical distances from the observed points to the best-fit line (observed y minus predicted y). For a least-squares line with an intercept, the residuals always sum to zero. Regression chooses the best-fit line that minimizes the sum of squared residuals; this is called the least-squares line.
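The least-squares slope and intercept have a closed form: m = Sxy / Sxx and b = ȳ - m·x̄. This minimal sketch fits the line, computes the residuals, and shows that they sum to (numerically) zero. The function names are illustrative, and the velocity and trout numbers are invented.

```python
def least_squares(xs, ys):
    """Fit y = m*x + b by ordinary least squares. Returns (m, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b

def residuals(xs, ys, m, b):
    """Observed y minus the y predicted by the line."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]

# Invented data: current velocity (m/s) vs. brown trout per m^2.
velocity = [0.1, 0.2, 0.3, 0.4, 0.5]
trout = [0.05, 0.09, 0.14, 0.16, 0.22]
m, b = least_squares(velocity, trout)
res = residuals(velocity, trout, m, b)
print(f"y = {m:.3f}x + {b:.3f}")
print("sum of residuals:", round(sum(res), 10))  # zero up to floating-point rounding
print("sum of squared residuals:", sum(r * r for r in res))
```

Nudging the slope or intercept away from the fitted values can only increase the sum of squared residuals, which is what "least squares" means.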

Precautions and Limitations: Meet Assumptions
The relationship is linear (not exponential, quadratic, etc.).
X is measured without error.
Y values are measured independently.
Residuals are normally distributed.

Have we violated any assumptions?

Residual Plots Can Help Test Assumptions [Figure: three plots of residuals around 0: “normal” scatter; fan shape (unequal variance); curve (nonlinearity)]

If assumptions are violated: try transforming the data (log or square-root transformation). Most of these tests are robust to violations of the normality and equal-variance assumptions, so only be concerned if obvious problems exist. Diagnostics (residual plots, histograms) should NOT be reported in your paper; stating that assumptions were tested is sufficient.
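A log transformation can turn an exponential relationship into a linear one, since log(a·e^(bx)) = log(a) + bx. This sketch uses noise-free, invented data so the effect is easy to see. It also shows the curved residual pattern (positive at the ends, negative in the middle) that a straight-line fit to the raw data leaves behind. The helper `least_squares` is illustrative, not from a library.

```python
import math

def least_squares(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar

# Invented, noise-free exponential data: y = 2 * e^(0.5 x).
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * math.exp(0.5 * x) for x in xs]

# Straight line on the raw data: residuals follow a curve (+ at the ends, - in the middle).
m_raw, b_raw = least_squares(xs, ys)
raw_res = [y - (m_raw * x + b_raw) for x, y in zip(xs, ys)]
print("raw residuals:", [round(r, 2) for r in raw_res])

# Log-transform the response: log(y) = log(2) + 0.5 x is exactly linear.
log_ys = [math.log(y) for y in ys]
m, b = least_squares(xs, log_ys)
print(f"slope = {m:.3f}, back-transformed intercept = {math.exp(b):.3f}")
```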

Precautions and Limitations: after meeting the assumptions (linear relationship, X measured without error, Y values measured independently, normally distributed residuals), interpret the p-value and R-squared value.

[Figure: residuals]

P-value: the probability of observing your data (or more extreme data) if no relationship existed. It measures the strength of the evidence for a relationship, not the strength of the relationship itself: it tells you whether your slope (i.e., the relationship) is non-zero (i.e., real).
R-squared: the proportion of the variance in the response variable that is explained by the explanatory variable. It does not indicate significance.
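Both quantities can be computed from the same fit: R² = 1 - SS_res/SS_tot, and the test of slope = 0 uses t = m / SE(m) with n - 2 degrees of freedom. Here is a minimal sketch with a hypothetical function name and invented data. A library routine such as `scipy.stats.linregress` also reports the slope's p-value and r (square it to get R²).

```python
import math

def regression_summary(xs, ys):
    """OLS fit y = m*x + b, plus R-squared and the t statistic for H0: slope = 0.

    Returns (m, b, r_squared, t). Compare t to a t distribution with n - 2 df.
    Assumes the points do not lie exactly on a line (otherwise SE(m) = 0).
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = sum((y - y_bar) ** 2 for y in ys)                    # total variation
    r_squared = 1 - ss_res / ss_tot
    se_m = math.sqrt(ss_res / (n - 2) / sxx)  # standard error of the slope
    return m, b, r_squared, m / se_m

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(regression_summary(xs, ys))
# Same points duplicated: R-squared is unchanged, but t (and so significance) grows.
print(regression_summary(xs * 2, ys * 2))
```

Duplicating the same points leaves R² unchanged but raises t. This is why a weak relationship can still be significant with many observations. Duplicating data is only an arithmetic illustration, not a valid way to gain sample size.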

R-Squared and P-value (example plots):
High R-squared, low p-value: significant relationship.
Low R-squared, low p-value: significant relationship.
High R-squared, high p-value: NO significant relationship.
Low R-squared, high p-value: no significant relationship.

We just talked about: types of variables; three statistical tests (t-test, ANOVA, linear regression); when to use these tests; how to interpret the test statistics; and how to be sure you’re meeting the assumptions of the tests.

Part 3: Proposal

Writing a Proposal: What is the function of a proposal? To get money.

Writing a Proposal: What information should go in a proposal?
Research goals/objectives/hypotheses/questions
Why does this matter? (Rationale)
Procedure/methods
Future directions/implications
Budget/cost analysis
Expected results

Other data you can use:
Previous years’ data on the website: all of the same information was collected from the same place, around the same time of year. Replication!
USGS:
Background info: from the Upper Sugar River Watershed Association.
Think about these data sources as you generate your questions.