Robert Plant != Richard Plant
[Workflow diagram: field data (response, coordinates) and remotely sensed data are qualified and prepped into sample data (response, covariates) and covariate predictors; the sample data may be randomly split into training and test sets (which may be the same data). The model is built from the training data, validated against the test data, and used to predict, producing a predictive map, uncertainty maps, and summary statistics. Randomness enters the inputs, and the process is repeated over and over.]
Cross-Validation
Split the data into a training set (to build the model) and a test set (to validate it)
Leave-p-out cross-validation
– Validate on p samples, train on the remainder
– Repeated for all combinations of p
Non-exhaustive cross-validation
– Leave-p-out cross-validation run on only a subset of the possible combinations
– Randomly splitting into 30% test and 70% training is common
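The deck gives no code, so here is a minimal sketch of a random 70/30 split in Python with numpy; the language, the synthetic data, and the least-squares "model" are illustrative assumptions, not part of the original slides.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
X = rng.normal(size=(n, 3))                               # hypothetical covariates
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=n)   # hypothetical response

idx = rng.permutation(n)                  # random split, 30% test / 70% training
n_test = int(0.3 * n)
test_idx, train_idx = idx[:n_test], idx[n_test:]

beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)  # build model
resid = y[test_idx] - X[test_idx] @ beta                            # validate
print("test RMSE:", np.sqrt(np.mean(resid ** 2)))
```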
K-fold Cross Validation
[Diagram: the data are divided into 10 folds, numbered 1–10; each fold serves once as the test set while the remaining nine folds are used for training.]
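A minimal 10-fold loop, again in numpy with synthetic data and an ordinary least-squares model standing in for whatever model is actually being built:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                         # hypothetical covariates
y = X @ np.array([2.0, -1.0]) + rng.normal(size=100)

k = 10
folds = np.array_split(rng.permutation(len(y)), k)    # 10 disjoint index sets
rmse = []
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
    resid = y[test_idx] - X[test_idx] @ beta
    rmse.append(np.sqrt(np.mean(resid ** 2)))
print("mean test RMSE over 10 folds:", np.mean(rmse))
```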
Bootstrapping
Drawing N samples from the sample data (with replacement)
Building the model
Repeating the process over and over
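A minimal bootstrap sketch (numpy assumed; here "the model" is just a sample mean, standing in for any fitted model):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.gamma(shape=2.0, scale=3.0, size=500)   # hypothetical sample data

boot_stats = []
for _ in range(2000):                              # repeat over and over
    # N draws from the sample data, with replacement
    resample = rng.choice(data, size=len(data), replace=True)
    boot_stats.append(resample.mean())             # build "the model" on the resample

print("estimate:", np.mean(boot_stats), "std. error:", np.std(boot_stats))
```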
Random Forest
N samples drawn from the data with replacement
Repeated to create many trees
– A "random forest"
Predictions are aggregated across all of the trees (majority vote or average)
Bootstrap aggregation or "bagging"
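If scikit-learn is available (an assumption; the deck names no software for this), bagged trees are one call, and the bootstrap yields an out-of-bag accuracy for free:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)  # toy data

# Each tree is grown on a bootstrap sample of the data (bagging);
# predictions are aggregated by majority vote across the forest.
rf = RandomForestClassifier(n_estimators=500, bootstrap=True, oob_score=True,
                            random_state=0)
rf.fit(X, y)
print("out-of-bag accuracy:", rf.oob_score_)
```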
Boosting
Can a set of weak learners create a single strong learner? (Wikipedia)
– Lots of "simple" trees used to create a really complex tree
"Convex potential boosters cannot withstand random classification noise."
– 2008, Philip M. Long (at Google) and Rocco A. Servedio (Columbia University)
Boosted Regression Trees
BRTs combine thousands of trees to reduce deviance from the data
Currently popular
More on this later
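A hedged sketch of a boosted tree ensemble, using scikit-learn's GradientBoostingRegressor as a stand-in for BRTs (ecological BRT work often uses R's gbm/dismo packages; this Python version is only illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# Many shallow trees are fit in sequence, each one reducing the
# remaining deviance left by the trees before it.
brt = GradientBoostingRegressor(n_estimators=1000, learning_rate=0.01,
                                max_depth=3, random_state=0)
brt.fit(X, y)
print("training R^2:", brt.score(X, y))
```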
Sensitivity Testing
Injecting small amounts of "noise" into our data to see the effect on the model parameters (Plant)
The same approach can be used to model the impact of uncertainty on our model outputs and to make uncertainty maps
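A minimal noise-injection sketch (numpy assumed; the noise level, data, and least-squares model are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))                            # hypothetical covariates
y = X @ np.array([1.0, -0.5]) + rng.normal(scale=0.5, size=200)

betas = []
for _ in range(1000):
    X_noisy = X + rng.normal(scale=0.05, size=X.shape)   # inject a little noise
    b, *_ = np.linalg.lstsq(X_noisy, y, rcond=None)      # refit the model
    betas.append(b)

# The spread of the refit parameters measures sensitivity to the noise.
print("spread of fitted parameters:", np.std(betas, axis=0))
```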
Jackknifing
Trying all combinations of covariates: the model is refit with each covariate (or subset of covariates) left out in turn to gauge each covariate's contribution
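An all-subsets sketch of this idea (numpy assumed; the data and the residual-sum-of-squares fit measure are illustrative assumptions):

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 3))                        # three candidate covariates
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=150)

for k in range(1, X.shape[1] + 1):
    for cols in combinations(range(X.shape[1]), k):  # every covariate subset
        Xs = X[:, cols]
        b, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = np.sum((y - Xs @ b) ** 2)              # fit of this combination
        print(cols, round(float(rss), 1))
```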
Extrapolation vs. Prediction
Modeling: creating a model that allows us to estimate values between our data points
Extrapolation: using existing data to estimate values outside the range of our data
[Diagram: a curve from the model, with "Prediction" inside the data range and "Extrapolation" beyond it.]
Building Models
Selecting the method
Selecting the predictors ("model selection")
Optimizing the coefficients/parameters of the model
Response Drives Method
Occurrences: Maxent, HEMI
Binary: GLM with a logistic link
Categorical: classification tree
Counts: GLM with Poisson
Continuous:
– Linear regression for linear relationships
– GLM with Gamma for distances
– GAM for others
Responses can be converted between types when required and appropriate
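As a rough illustration of matching the GLM family to the response type, here is a statsmodels sketch (the library, data, and coefficients are assumptions; the deck itself names no software for this step):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(200, 1)))       # intercept + one covariate

# Binary response -> GLM with a logistic link (Binomial family)
y_bin = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 1])))
logistic = sm.GLM(y_bin, X, family=sm.families.Binomial()).fit()

# Count response -> GLM with a Poisson family
y_cnt = rng.poisson(np.exp(0.5 + 0.8 * X[:, 1]))
poisson = sm.GLM(y_cnt, X, family=sm.families.Poisson()).fit()

print(logistic.params, poisson.params)
```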
Occurrences to:
Binary:
– Create a count data set as below
– Use the field calculator to convert values >0 to 1
Count:
– Take one of your predictor variable rasters and convert it to a polygon mesh
– Add an attribute that counts the number of occurrences in each polygon
Continuous:
– Convert your point data set to a density raster, then convert the raster to points
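The slides describe these conversions in GIS terms; as a rough numerical analogue, this numpy sketch tallies occurrence points into a grid (a raster-like mesh standing in for the polygon mesh), then thresholds the counts to binary. Coordinates and grid size are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
lon = rng.uniform(-120.0, -110.0, size=500)   # hypothetical occurrence coordinates
lat = rng.uniform(35.0, 45.0, size=500)

# Occurrences -> counts: tally the points falling in each grid cell
counts, xedges, yedges = np.histogram2d(lon, lat, bins=20)

# Counts -> binary: any cell with at least one occurrence becomes a presence
binary = (counts > 0).astype(int)
print(counts.sum(), binary.sum())
```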
Binary (presence/absence) to:
Occurrences:
– Remove values that are zero (absences)
Count:
– Convert one predictor variable to a polygon mesh
– Add an attribute that sums the counts in each polygon
Continuous:
– To create a density of presences:
Remove zero values
Convert the point data to a density raster
– To create a mean value:
Convert one predictor variable to a polygon mesh
Add an attribute that counts the number of presence points in each polygon
Add an attribute that counts the number of absences in each polygon
Find the mean of the presence count and the absence count
Count to:
Occurrence:
– Remove any values with a count of 0
Binary:
– Add a column and set it to 0 where the count is 0 and 1 where the count is greater than 0
Continuous:
– Convert the points to a raster
Continuous to:
Occurrence:
– Keep only values above a threshold (this may be height > 0, or another reasonable threshold)
Binary:
– Select a threshold; values below it become 0 and those above become 1
Count:
– Direct conversion only makes sense if there is a direct relationship
– Otherwise, count the points whose attribute exceeds a chosen value
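A minimal thresholding sketch of the first two conversions (numpy assumed; the data and the threshold of 5.0 are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)
height = rng.gamma(2.0, 4.0, size=1000)        # hypothetical continuous response

threshold = 5.0                                # a "reasonable threshold"
binary = (height > threshold).astype(int)      # continuous -> binary
occurrences = height[height > threshold]       # continuous -> occurrences
print(binary.sum(), occurrences.size)
```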
[The modeling workflow diagram from the start of the deck is shown again: field data and remotely sensed covariates are prepped, split into training and test data, and used to build, validate, and apply the model, producing the predictive map, uncertainty maps, and statistics.]
Model Selection
Need a method to select the "best" set of predictors
– Really, to select the best method, predictors, and coefficients (parameters)
Should be a balance between fitting the data and simplicity
– R² only considers fit to the data (but linear regression is pretty simple)
Simplicity
"Everything should be made as simple as possible, but not simpler." – Albert Einstein
[Photo: "Albert Einstein Head" by Oren Jack Turner, Princeton; licensed through Wikipedia]
Parsimony
"…too few parameters and the model will be so unrealistic as to make prediction unreliable, but too many parameters and the model will be so specific to the particular data set so to make prediction unreliable."
– Edwards, A. W. F. (2001). Occam's bonus. pp. 128–139 in Zellner, A., Keuzenkamp, H. A., and McAleer, M. (eds.), Simplicity, Inference and Modelling. Cambridge University Press, Cambridge, UK.
Parsimony (Anderson)
Under-fitting: part of the model structure is left out and ends up included in the residuals
Over-fitting: residual variation is included as if it were structural
Parsimony: the balance between the two
Akaike Information Criterion (AIC)
AIC = 2K − 2 ln(L)
K = number of estimated parameters in the model
L = maximized value of the likelihood function for the estimated model
AIC
Has only a relative meaning; smaller is "better"
A balance between complexity:
– Over-fitting, or modeling the errors
– Too many parameters
and bias:
– Under-fitting, or the model missing part of the phenomenon we are trying to model
– Too few parameters
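A worked comparison using statsmodels (an assumption; data and models are synthetic). The smaller model drops two noise covariates, so its AIC should usually come out lower:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
X = sm.add_constant(rng.normal(size=(200, 3)))
y = X @ np.array([1.0, 2.0, 0.0, 0.0]) + rng.normal(size=200)  # last two are noise

full = sm.OLS(y, X).fit()          # intercept + all three covariates
small = sm.OLS(y, X[:, :2]).fit()  # intercept + the one real covariate
print("full AIC:", full.aic, "small AIC:", small.aic)  # smaller is "better"

# AIC by hand, 2K - 2 ln(L), using statsmodels' parameter count for K
K = full.df_model + 1
print(2 * K - 2 * full.llf)
```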
Likelihood
The likelihood L(parameters | data) is the probability of the observed data given the parameter values
-2 Times Log Likelihood
−2 ln(L) is the goodness-of-fit term that appears in AIC
p(x) for a fair coin
Heads: 0.5, Tails: 0.5
What happens as we flip a "fair" coin?
p(x) for an unfair coin
Heads: 0.8, Tails: 0.2
What happens as we flip this unfair coin?
p(x) for a coin with two heads
Heads: 1.0, Tails: 0.0
What happens as we flip this two-headed coin?
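A worked likelihood comparison for these three coins (numpy assumed; the flip sequence is hypothetical). Given data that contain any tails, the two-headed coin has likelihood zero, so 0.999 stands in for p = 1.0 to avoid log(0):

```python
import numpy as np

flips = np.array([1, 0, 1, 1, 0, 1, 1, 1])     # 1 = heads (hypothetical flips)

def log_likelihood(p_heads, flips):
    # log L(p | data) = sum over flips of log p(x_i | p)
    return np.sum(np.where(flips == 1, np.log(p_heads), np.log(1.0 - p_heads)))

for p in (0.5, 0.8, 0.999):
    print(p, log_likelihood(p, flips))
```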
Does likelihood from p(x) work?
If the likelihood is the probability of the data given the parameters, and a response function provides the probability of a piece of data (i.e., the probability that this is suitable habitat), we can use the probability that a specific occurrence is suitable as p(x|parameters).
Thus the likelihood of a habitat model (disregarding bias) can be computed as:
L(ParameterValues|Data) = p(Data1|ParameterValues) × p(Data2|ParameterValues) × …
This does not work directly: the highest likelihood goes to a model that is 1.0 everywhere, so the model must be divided by its area so that the area under the model equals 1.0.
Remember: this only works when comparing models on the same dataset!
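A minimal sketch of that normalization step (numpy assumed; the suitability surface and occurrence cells are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(8)
suitability = rng.uniform(0.1, 1.0, size=(50, 50))   # hypothetical habitat model

# A model of 1.0 everywhere would otherwise win, so rescale the surface
# to sum to 1.0 over the study area before computing the likelihood.
p = suitability / suitability.sum()

rows = rng.integers(0, 50, size=30)                  # hypothetical occurrence cells
cols = rng.integers(0, 50, size=30)
log_lik = np.sum(np.log(p[rows, cols]))
print("log-likelihood:", log_lik)
```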
Akaike…
AICc
AIC corrected for small sample sizes:
AICc = AIC + 2K(K + 1) / (n − K − 1)
where n is the sample size
BIC
Bayesian Information Criterion
Adds n (the number of samples) to the penalty: BIC = K ln(n) − 2 ln(L)