Who will you trust? Field technicians? Software programmers? Statisticians? Instructors? GIS technicians? Other researchers? Yourself?
Regression (Correlation) Modeling: Creates a model in N-dimensional “hyper-space” defined by: covariates; response variables; mathematics used to create the model; statistics used to optimize parameters; options for model evaluation; predictor variables.
Multiple Linear Regression
Linear Regression: 2 Predictors (figure: Mathworks.com)
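A minimal sketch of a two-predictor linear regression, assuming an ordinary least-squares fit solved through the normal equations. The data values and the function names (solve_linear_system, fit_two_predictors) are made up for illustration and are not from the slides.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Hypothetical, noise-free data generated from y = 2 + 0.5*x1 - 1.5*x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.5]
y = [2 + 0.5 * a - 1.5 * b for a, b in zip(x1, x2)]
b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(b0, b1, b2)
```

With two predictors the fitted model is a plane in three dimensions; each added predictor adds one more dimension to the “hyper-space”.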
Non-Linear Regression
Regression Methods: Continuous regression: linear regression, generalized linear models (GLMs), generalized additive models (GAMs). Categorical regression (trees): regression trees, classification and regression trees (CART). Machine learning: maximum entropy (Maxent), NPMR, HEMI, BRTs, etc.
Brown Shrimp Size Add graph from work
Terminology: Plant uses: measured value and response variable; explanatory variable. I prefer: response variable (I’ll use “measured value” to identify measured values in field data); covariate: explanatory variable used to build the model; predictor: explanatory variable used to predict. The covariate and the predictor will be different in cases like predicting the effects of future climate change.
Douglas Fir Habitat Model (figure: habitat quality vs. precipitation in mm; axis labels 1 and 1000)
Predictor Model Prediction
Model Selection and Parameter Estimation Field Data Covariate Predictor Model Prediction
Model Selection and Parameter Estimation Field or Sample Data Covariate Predictor Model Model Validation Prediction
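The workflow in this diagram can be sketched roughly as follows, assuming a simple one-covariate linear model, a held-out half of the samples for validation, and RMSE as the validation measure. All data values and function names here are hypothetical.

```python
import math


def fit_line(x, y):
    """Parameter estimation: least-squares intercept and slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def predict(params, x):
    """Apply the fitted model to predictor values."""
    intercept, slope = params
    return [intercept + slope * xi for xi in x]


def rmse(observed, predicted):
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


# Hypothetical field samples: (precipitation covariate in mm, measured height in m).
samples = [(400, 12.1), (550, 15.8), (700, 20.2), (850, 23.9),
           (1000, 28.1), (1150, 31.7), (1300, 36.2), (1450, 39.8)]
train, held_out = samples[::2], samples[1::2]

# Build the model from the covariates of the training samples.
params = fit_line([p for p, _ in train], [h for _, h in train])

# Validate against samples the model has not seen.
validation_rmse = rmse([h for _, h in held_out],
                       predict(params, [p for p, _ in held_out]))

# Predict with predictor values that differ from the covariates (a new scenario).
future_precip = [450, 900, 1200]
future_heights = predict(params, future_precip)
print(params, validation_rmse, future_heights)
```

The covariates (training precipitation) and the predictors (future_precip) pass through the same model but need not hold the same values.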
Douglas-Fir sample data Create the Model Model “Parameters” Precip Extract Prediction To Points Text File Attributes To Raster
Data: Response variable: from the field data (sample data). Covariates: from the field or remotely sensed. Predictors: typically remotely sensed; sample as covariates for training; can be different for predicting to new scenarios.
Response Variable: What is the spatial uncertainty? Temporal uncertainty? Measurement uncertainty? Will it answer your question?
Covariate Variables: What is the spatial uncertainty? Temporal uncertainty? Measurement uncertainty? How well does the collection time of the covariates match the field data? Do they co-vary with the phenomena? Do the covariates “correlate”?
Types of uncertainty: accuracy (bias); precision (repeatability); reliability (consistency of a set of measurements); resolution (fineness of detail); logical consistency (adherence to structural rules, attributes, and relationships); completeness. Plant uses “accuracy” a little differently.
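One way to separate accuracy (bias) from precision (repeatability) is to take repeated measurements at a point whose true value is known. The sketch below assumes that setup; the readings and the reference value are hypothetical.

```python
import statistics


def accuracy_and_precision(measurements, reference_value):
    """Return (bias, spread) for repeated measurements of a known value.

    bias   = mean measurement minus the reference value (accuracy)
    spread = sample standard deviation of the measurements (precision)
    """
    bias = statistics.mean(measurements) - reference_value
    spread = statistics.stdev(measurements)
    return bias, spread


# Hypothetical repeated elevation readings (m) at a benchmark known to be 100.0 m.
readings = [102.1, 101.9, 102.0, 102.2, 101.8]
bias, spread = accuracy_and_precision(readings, reference_value=100.0)
print(bias, spread)
```

These readings are precise (they agree closely with one another) but not accurate (they all sit about 2 m above the benchmark).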
Types of Errors: Gross errors: transcription, sinks in DEMs. Random errors: estimated using probability theory. Systematic errors: “drift” in instruments, dropped lines in Landsat. All of these types of uncertainty can be compensated for in some cases.
Gross Errors: Lat/Lon: reversed; 0, names, dates, etc. Dates: extended in databases. Measurements: inconsistent units, inconsistent protocols. What can you expect from a field team?
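Some gross coordinate errors can be screened for automatically. The sketch below assumes decimal-degree coordinates and checks only for zero coordinates, out-of-range values, and a likely lat/lon swap; the function name and flag wording are made up for illustration.

```python
def flag_gross_errors(lat, lon):
    """Return a list of suspected gross errors for one coordinate pair."""
    flags = []
    if lat == 0 and lon == 0:
        flags.append("zero coordinates")
    if not -90 <= lat <= 90:
        flags.append("latitude out of range")
        # If swapping the two values would give a valid pair, suspect a reversal.
        if -90 <= lon <= 90 and -180 <= lat <= 180:
            flags.append("possibly reversed lat/lon")
    if not -180 <= lon <= 180:
        flags.append("longitude out of range")
    return flags


print(flag_gross_errors(40.5, -105.1))
print(flag_gross_errors(0, 0))
print(flag_gross_errors(-105.1, 40.5))
```

A swap can only be detected this way when the swapped latitude falls outside the range -90 to 90; other reversals still need a map check.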
Occurrences of Polar Bears From The Global Biodiversity Information Facility (www.gbif.org, 2011)
Systematic Errors: Landsat scan-line error
Response Variable Qualification: Tools: maps (various resolutions); examine the data values (how many digits? repeating patterns, gross errors?); “documentation”. Measurements: occurrences? Binary: histogram. Categorical: histogram. Continuous: histogram.
What’s the Impact on Models?
Significant Digits How many digits to represent 1 meter? Geographic: Lat/Lon? UTM: Eastings/Northings?
Significant Digits: Geographic: 1 digit = 1 degree; 1 degree ~ 110 km; 0.00001 degree ~ 1.1 meters. UTM: 1 digit = 1 meter.
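The arithmetic behind these figures can be sketched as below, assuming the rough value of 110 km per degree used here. That value applies to latitude; east-west distance per degree of longitude shrinks with the cosine of the latitude, which the optional argument accounts for. The function name is hypothetical.

```python
import math

METERS_PER_DEGREE = 110_000  # rough figure of ~110 km per degree


def ground_resolution_m(decimal_places, latitude_deg=None):
    """Approximate ground distance represented by the last decimal place.

    With latitude_deg given, return the east-west distance for longitude,
    which is scaled by cos(latitude).
    """
    meters = METERS_PER_DEGREE * 10 ** -decimal_places
    if latitude_deg is not None:
        meters *= math.cos(math.radians(latitude_deg))
    return meters


for places in range(7):
    print(places, ground_resolution_m(places))
```

Coordinates stored with only two or three decimal places therefore carry positional uncertainty of roughly a kilometer or a hundred meters, whatever the accuracy of the original GPS fix.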
Covariate Qualification: maps; documentation; examine the data (how many digits? integer or floating point? repeating patterns?); histograms.
CONUS Annual Precip.
Covariate Uncertainty
Min Temp of Coldest Month: after applying a filter to the raster
Histograms hist(Temp,breaks=400)
Covariate Correlation: correlation plots; Pearson product-moment correlation coefficient; Spearman’s rho (nonparametric correlation coefficient).
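A minimal sketch of the two coefficients, written out by hand so the difference is visible: Spearman's rho is the Pearson coefficient computed on ranks (ties get their average rank). The covariate values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical covariates with a monotonic but non-linear relationship.
precip = [200, 400, 600, 800, 1000]
other = [p ** 2 / 1e4 for p in precip]
r = pearson(precip, other)
rho = spearman(precip, other)
print(r, rho)
```

Because the relationship is monotonic, rho is 1 while r is slightly below 1, so the two coefficients can disagree about how strongly two covariates are related.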
Correlation plots
California Correlations
California Predictors
Response vs. Covariates: For occurrences: histogram covariates at occurrences vs. overall covariates. For binary data: histogram covariates for each value. For categorical data: histogram covariates for each value, or scatter plots. For continuous data: scatter plots.
Covariate Occurrence Histograms: Precipitation with Douglas-Fir Occurrences
Douglas Fir Model in HEMI 2: Histograms are scaled to go from 0 to 1 (all values). Green: histogram of precipitation for all of California. Red: histogram of Douglas-Fir occurrences.
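The comparison of a covariate's overall histogram with its histogram at occurrences can be sketched as below. This is not HEMI's implementation; it assumes equal-width bins and scales each histogram by its own peak so both run from 0 to 1. The precipitation values are hypothetical.

```python
def scaled_histogram(values, low, high, bins):
    """Equal-width histogram over [low, high], scaled so the tallest bin is 1."""
    counts = [0] * bins
    width = (high - low) / bins
    for v in values:
        if low <= v <= high:
            # A value equal to `high` goes in the last bin.
            idx = min(int((v - low) / width), bins - 1)
            counts[idx] += 1
    peak = max(counts)
    return [c / peak if peak else 0.0 for c in counts]


# Hypothetical precipitation (mm): the whole study area vs. occurrence points.
all_precip = [100, 150, 220, 300, 340, 410, 480, 520,
              610, 700, 760, 850, 930, 990]
occurrence_precip = [520, 610, 700, 760, 850, 930, 990]

all_hist = scaled_histogram(all_precip, 0, 1000, bins=5)
occurrence_hist = scaled_histogram(occurrence_precip, 0, 1000, bins=5)
print(all_hist)
print(occurrence_hist)
```

Bins where the occurrence histogram is high relative to the overall histogram suggest conditions the species favors; bins that are empty for occurrences but populated overall suggest conditions it avoids or that were not sampled.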
Doug-Fir Height vs. Precip.
Douglas Fir Height: after gridding to coarse grid cells
Terrestrial Predictors: Elevation: slope, aspect, absolute aspect. Distance to: roads, streams (streamline). Climate: precip, temp. Soil type. RS: Landsat, MODIS, NDVI, etc.
Marine Predictors: temp; DO2; salinity; depth; rugosity (roughness); current (at depths); wind.
More Complicated: associated species; trophic levels; temporal; cyclical.
Predictor Layers: means, mins, maxes; range of values; heterogeneity; spatial layers: distance to…; topography: elevation, slope, aspect.
Field Data and Predictors: Keep as close to field measurements as possible. Clean and aggregate data as needed. Document as you go. Estimate overall uncertainty. Answer the question: what spatial, temporal, and measurement scales are appropriate to model at, given the data?
Temporal Issues: Divide data into months, seasons, years, decades. Keep the time periods consistent between predictors and response. Extract predictors as close to sample locations and dates as possible. Use the “best” predictor layers.
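Matching each sample to the predictor layer closest in time can be sketched as follows, assuming each layer is labeled with a single date. The layer dates, the sample date, and the function name are hypothetical.

```python
from datetime import date


def nearest_layer(sample_date, layer_dates):
    """Return (layer_date, gap_in_days) for the layer closest to the sample date."""
    best = min(layer_dates, key=lambda d: abs((d - sample_date).days))
    return best, abs((best - sample_date).days)


# Hypothetical quarterly predictor layers and one field sample date.
layers = [date(2010, 1, 1), date(2010, 4, 1), date(2010, 7, 1), date(2010, 10, 1)]
chosen, gap_days = nearest_layer(date(2010, 5, 20), layers)
print(chosen, gap_days)
```

Recording the gap alongside the extracted value gives a simple measure of temporal uncertainty for each sample.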
Additional Slides
Dimensions of uncertainty: space; time; attribute; scale; relationships.
Basic Tools: Histograms: what is the distribution of occurrences of values (range and shape)? Scattergrams: what is the relationship between response and predictor variables, and between predictor variables? QQ plots: are the residuals normally distributed?
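The points behind a normal QQ plot can be computed as in the sketch below. It assumes the plotting position (i - 0.5) / n, which is one common convention among several, and standardizes the residuals so that normally distributed residuals fall near the 1:1 line. The residual values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Return (theoretical quantile, standardized residual) pairs, sorted."""
    n = len(residuals)
    mu, sigma = mean(residuals), stdev(residuals)
    standard_normal = NormalDist()
    points = []
    for i, r in enumerate(sorted(residuals), start=1):
        p = (i - 0.5) / n  # assumed plotting position
        points.append((standard_normal.inv_cdf(p), (r - mu) / sigma))
    return points


# Hypothetical model residuals.
residuals = [-1.2, -0.4, 0.1, 0.5, 1.0]
points = qq_points(residuals)
for theoretical, observed in points:
    print(round(theoretical, 3), round(observed, 3))
```

Points that bend away from the 1:1 line at the ends indicate heavier or lighter tails than a normal distribution.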
Types of Data: “God does not play dice” (Einstein). “The end of certainty” (Prigogine, 1977 Nobel Prize). What remains is: quantifiable probability with uncertainty.
Uncertainty Factors: inherent uncertainty in the world; limitation of human cognition; limitation of measurement; uncertainty in processing and analysis.