Introduction to Species Distribution Models

Species Distribution Modeling (SDM)
Yacine Kouba (PhD), Pyrenean Institute of Ecology

Basic concepts ECOLOGICAL NICHE
What drives species distributions?
All species have tolerance limits for environmental factors beyond which individuals cannot survive, grow, or reproduce (Von Liebig, 1840).

Basic concepts TOLERANCE RANGE
[Figure: tolerance curve along an environmental gradient (temperature, low to high), marking the lower limit of tolerance, the range of optimum, and the upper limit of tolerance.]
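The tolerance range can be illustrated with a small sketch that places a value of an environmental gradient relative to a species' limits. The function name, the zone labels "stress" and "intolerance", and the temperature limits used here are assumptions made for illustration; they do not come from the slides.

```python
def tolerance_zone(value, lower_limit, upper_limit, optimum_low, optimum_high):
    """Classify a gradient value relative to a species' tolerance range.

    Zone labels are illustrative assumptions, not terms from the slides.
    """
    if value < lower_limit or value > upper_limit:
        return "intolerance"  # beyond the lower or upper limit of tolerance
    if optimum_low <= value <= optimum_high:
        return "optimum"      # inside the range of optimum
    return "stress"           # tolerated, but outside the optimum


# Hypothetical species: tolerates 5-35 degrees C, optimum 15-25 degrees C.
for temperature in (2, 8, 20, 30, 40):
    print(temperature, tolerance_zone(temperature, 5, 35, 15, 25))
```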

Basic concepts LIMITING FACTORS
The geographical range of a species is not always limited by the presence of barriers that prevent its spread. It is often limited by a particular factor in the environment that limits its ability to survive, grow, or reproduce. These are limiting factors.

Basic concepts LIMITING FACTORS
Limiting factors include ABIOTIC (physical) factors and BIOTIC factors.
Abiotic factors
• Light availability / day length, seasonality
• Moisture / water availability
• Temperature, diurnal temperature range
• Elevation or depth, pressure
• Salinity
• Wind
• pH, CO2, O2, availability of N, P, K, S, Mg, etc.

Basic concepts LIMITING FACTORS
Limiting factors include ABIOTIC (physical) factors and BIOTIC factors.
Biotic factors
• Competition
• Predation
• Parasitism
• Disease
• Pollinators, dispersal agents
• Suitable food availability
# An organism may have a wide range of tolerance to some factors and a narrow range to others.

Basic concepts ECOLOGICAL NICHE
Species A has a narrow niche breadth and is a specialist. Species B has a wide niche breadth and is a generalist (Miller & Spoolman 2009).

Basic concepts ECOLOGICAL NICHE What is it in detail? (Peterson et al. 2011)

Basic concepts ECOLOGICAL NICHE
What is it in detail?
Fundamental (theoretical) niche: the full spectrum of environmental factors that can potentially be utilized by an organism.
Realized (actual) niche: the subset of the fundamental niche that the organism can actually utilize, restricted by:
- dispersal limitations
- competitors, predators
- realized environment (existing conditions)

Type of SDM
What are spatial predictions of species distributions?
# Interpolating biological survey data in space
# Quantitative predictive models of species-environment relationships
- Species abundance
- Probability a species is present
- Potential species distribution
- Habitat suitability
- Realized niche
In sum: SDMs are empirical models relating field observations to environmental predictor variables based on statistically or theoretically derived methods (Guisan & Zimmermann 2000).

SDM-overview
Species distribution = f(environmental data)
[Figure: species occurrence and environmental data are combined to give the prediction from the model.]
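The relationship "species distribution = f(environmental data)" can be sketched with a one-predictor logistic regression fitted by gradient ascent. This is only one possible choice of f, picked here for brevity; the function names, the learning rate, and the standardized temperature values are hypothetical and not taken from the slides.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit P(presence) = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            grad0 += residual
            grad1 += residual * x
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1


def predict(b0, b1, x):
    """Predicted probability of presence at environmental value x."""
    return sigmoid(b0 + b1 * x)


# Hypothetical survey: standardized temperature and presence (1) / absence (0).
temperature = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
presence = [0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(temperature, presence)
for x in (-2.0, 0.0, 2.0):
    print(x, round(predict(b0, b1, x), 3))
```

Applying `predict` to every cell of an environmental raster would give the "prediction from model" map; in practice a statistics package would be used rather than a hand-written fit.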

SDM-overview
[Figure: rainfall zone suitability and altitude zone suitability maps, species observations, and a habitat model based on suitable rainfall and altitude zones in combination with species observations.]

SDM-overview
[Figure: ecological niche modeling. Occurrences in geographic space are used to build a model of the niche in ecological dimensions (environmental space; axes: temperature, precipitation), which is projected back onto geography to give the native range prediction and the invaded range prediction.]

BASICS OF SPECIES DISTRIBUTION MODELING
Empirical model (algorithm) and data model (Peterson et al. 2007)

The data model Species data
Sources
1) Personal collection: occurrence localities can be obtained during field surveys by an individual or small group of researchers.
2) Large surveys: distribution information may be available from surveys undertaken by a large number of people.
3) Museum collections: occurrence localities can be obtained from collections in natural history museums.
4) Online resources: distributional data from a variety of sources are increasingly being made available over the internet. For example, the Global Biodiversity Information Facility is collating available datasets from a diversity of sources and making the information available online via a searchable web portal.

The data model Species data
# Biogeographical scale
• Point observations, lots of them, opportunistic
• Scale of analysis: km
• Many to one
# Ecological scale
• Scale of data collection: m²
• Probability sample designs
• Scale of analysis: m²
• One to one
Does scale matter?

The data model Species data #Biogeographical scale

The data model Species data #Ecological scale

The data model Environmental data
Direct vs indirect variables. Direct variables are those directly related to physiological aspects of the species studied.
[Figure, from Franklin (2009). Labels: geology, topography, climate; temperature regime, soil, precipitation, evapotranspiration, radiation regime; water, light (PAR), heat sum, mineral nutrients; fundamental niche; biotic interactions, disturbance, dispersal; species actual distribution.]

The data model Environmental data
• Percent slope
• Mean annual precipitation
• Mean July maximum
• Mean January minimum
• Topographic Moisture Index
• Soil order
• Summer radiation
• Winter radiation

The data model Environmental data

The data model Environmental data
* The relevance of an environmental variable changes depending on the working scale.

The empirical model
[Figure: the four numbered modeling steps (1 to 4), from Guisan and Zimmermann (2000).]

The empirical model “A complete step-evaluation”
Step 1: Conceptual formulation
• Does the model address the problem?
• Does it describe the true relationship?
• Is the model consistent with ecological theory?

The empirical model “A complete step-evaluation”
Step 2: Statistical formulation
1. Do we have enough and/or representative data?
2. Do we have the appropriate predictors?
3. Is the model suitable for the sampling design used?
4. Did we select a method correctly?

The empirical model “A complete step-evaluation”
Step 3: Model calibration
1. Cross-validation:
- leave-one-out
- k-fold cross-validation
2. Bootstrapping
3. Model averaging (Bayesian methods)
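The k-fold procedure listed above can be sketched as a generator of training and test index sets. The function name and the seed are assumptions made for illustration; setting `k` equal to the number of records gives leave-one-out cross-validation.

```python
import random


def k_fold_indices(n_records, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)       # shuffle records reproducibly
    folds = [indices[i::k] for i in range(k)]  # k roughly equal-sized folds
    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


# Example: 10 occurrence records split into 5 folds.
for train, held_out in k_fold_indices(10, 5):
    print("train:", sorted(train), "test:", sorted(held_out))
```

Each record is held out exactly once, so every prediction used for evaluation comes from a model that did not see that record during fitting.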

The empirical model “A complete step-evaluation”
Step 4: Model evaluation
What are the measures of prediction accuracy?
[Diagram labels: theory, species data, environmental data, model, prediction, explanation.]

The empirical model “A complete step-evaluation”
### Measures of prediction accuracy
• Threshold-dependent: test statistics derived from the “confusion matrix”
• Threshold-independent: AUC

The empirical model “A complete step-evaluation”
Threshold-dependent
[Figure: predictive map (probabilities of species occurrence).]

The empirical model “A complete step-evaluation”
Threshold-dependent
The species records constitute the test data. Frequencies of each type of outcome are commonly entered into a confusion matrix.

The empirical model “A complete step-evaluation”
Confusion (error) matrix

                     Observed presence   Observed absence
Predicted presence   TP                  FP
Predicted absence    FN                  TN
SUM                  TP + FN             TN + FP            n

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
% Omission error = 1 - Sensitivity
% Commission error = 1 - Specificity

The empirical model “A complete step-evaluation”
Confusion (error) matrix
% Correct classification = (TP + TN) / Total
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
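The threshold-dependent measures above can be computed with a short sketch. The function names, the 0.5 threshold, and the test records are hypothetical; only the formulas come from the slides.

```python
def confusion_matrix(observed, predicted_prob, threshold=0.5):
    """Count TP, FP, FN, TN after converting probabilities to presence/absence."""
    tp = fp = fn = tn = 0
    for obs, prob in zip(observed, predicted_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and obs == 1:
            tp += 1
        elif predicted == 1 and obs == 0:
            fp += 1
        elif predicted == 0 and obs == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy_measures(tp, fp, fn, tn):
    """Threshold-dependent measures, using the formulas from the slide."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "omission_error": 1 - sensitivity,
        "commission_error": 1 - specificity,
        "correct_classification": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical test data: 4 observed presences and 6 observed absences.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_prob = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.3, 0.2]

tp, fp, fn, tn = confusion_matrix(observed, predicted_prob, threshold=0.5)
measures = accuracy_measures(tp, fp, fn, tn)
print(tp, fp, fn, tn)
print(measures)
```

Changing `threshold` changes every count in the matrix, which is why these statistics are called threshold-dependent.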

The empirical model “A complete step-evaluation”
AUC
# Area under the curve (AUC) of the receiver operating characteristic (ROC) plot
- AUC compares a probabilistic prediction with a binary outcome.
- It is the probability that a randomly chosen true presence receives a higher predicted value than a randomly chosen true absence.
- Useful range: 0.5 (no better than random) to 1.0 (perfect discrimination); values below 0.5 indicate predictions worse than random.
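The pairwise interpretation of AUC can be sketched directly: compare every presence with every absence and count how often the presence is ranked higher, with ties counted as one half. The function name and the test records are hypothetical.

```python
def auc(observed, predicted_prob):
    """AUC as the share of presence/absence pairs ranked correctly (ties count 0.5)."""
    presences = [p for o, p in zip(observed, predicted_prob) if o == 1]
    absences = [p for o, p in zip(observed, predicted_prob) if o == 0]
    score = 0.0
    for pres in presences:
        for absn in absences:
            if pres > absn:
                score += 1.0
            elif pres == absn:
                score += 0.5
    return score / (len(presences) * len(absences))


# Hypothetical test data: 4 observed presences and 6 observed absences.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_prob = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.3, 0.2]

auc_value = auc(observed, predicted_prob)
print(round(auc_value, 3))
```

No threshold appears anywhere in the calculation, which is what makes AUC threshold-independent. The double loop is quadratic in the number of records; for large test sets a rank-based formula is the usual choice.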

The empirical model Types of algorithms
#1. Profile methods: only consider 'presence' data
• BIOCLIM: a classic 'climate-envelope model'
• Domain algorithm (Carpenter et al. 1993)
• Mahalanobis distance (Mahalanobis, 1936)
• Ecological-Niche Factor Analysis (ENFA; Hirzel et al., 2002)
• Genetic Algorithm for Rule-set Production (GARP; Stockwell, 1999)
#2. Regression models: presence and absence (background) data
• Generalized Linear Models (GLMs)
• Generalized Additive Models (GAMs)
• Bayesian models
#3. Machine learning methods: presence and absence (background) data
• MaxEnt (Maximum Entropy; Phillips et al., 2004, 2006)
• Boosted Regression Trees (Friedman, 2001)
• Random Forest (Prasad et al. 2006)
• Support Vector Machines (SVMs; Vapnik, 1998)
• Artificial Neural Network (ANN; Olden et al. 2008)
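The idea behind a presence-only climate envelope can be sketched as a rectangular box in environmental space. This is a simplified illustration that uses the minimum and maximum of each variable at the presence sites; it is not the BIOCLIM implementation, which works with percentiles of the presence values. All names and values below are hypothetical.

```python
def fit_envelope(presence_records):
    """Return {variable: (min, max)} over the environmental values at presence sites."""
    variables = presence_records[0].keys()
    return {
        var: (min(rec[var] for rec in presence_records),
              max(rec[var] for rec in presence_records))
        for var in variables
    }


def inside_envelope(envelope, site):
    """A site is predicted suitable only if every variable falls within its range."""
    return all(low <= site[var] <= high for var, (low, high) in envelope.items())


# Hypothetical presence records (temperature in degrees C, precipitation in mm).
presence_records = [
    {"temperature": 12.0, "precipitation": 800.0},
    {"temperature": 15.5, "precipitation": 650.0},
    {"temperature": 18.0, "precipitation": 920.0},
]
envelope = fit_envelope(presence_records)

print(envelope)
print(inside_envelope(envelope, {"temperature": 14.0, "precipitation": 700.0}))
print(inside_envelope(envelope, {"temperature": 22.0, "precipitation": 700.0}))
```

No absence or background data enter the fit, which is the defining feature of the profile methods in group 1.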

The empirical model Types of algorithms

Main shortcomings in SDM
## Multicollinearity
## Non-linearity
## Spatial Autocorrelation (SAC)

Main shortcomings in SDM
## Multicollinearity (between predictor variables)
[Figure: two panels contrasting modest collinearity and considerable collinearity.]

Main shortcomings in SDM ## Multicollinearity (between predictor variables)
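One simple screen for collinearity between predictor variables is the pairwise Pearson correlation. The sketch below flags pairs whose absolute correlation exceeds a cutoff; the 0.7 cutoff is a commonly used rule of thumb and an assumption here, as are the function names and the predictor values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def flag_collinear_pairs(predictors, cutoff=0.7):
    """Return (name_a, name_b, r) for every predictor pair with |r| above the cutoff."""
    names = sorted(predictors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(predictors[a], predictors[b])
            if abs(r) > cutoff:
                flagged.append((a, b, r))
    return flagged


# Hypothetical predictor values sampled at six sites.
predictors = {
    "elevation": [100, 400, 800, 1200, 1600, 2000],
    "mean_temp": [18.0, 16.5, 14.0, 11.5, 9.0, 7.0],
    "precipitation": [620, 480, 700, 510, 650, 540],
}
flagged = flag_collinear_pairs(predictors)
print(flagged)
```

In this made-up data set elevation and mean temperature are almost perfectly negatively correlated, so only one of the two would normally be kept as a predictor.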

Main shortcomings in SDM
## Non-linearity
• The relationship between a predictor and the response variable is assumed to be linear when it is not.

Main shortcomings in SDM
## Spatial Autocorrelation (SAC)
# Most statistics are based on the assumption that the values of observations in each sample are independent of one another.
# Ecology and Biogeography -> Spatial statistics
In spatial statistics, the assumption of independence of the observations can be violated because of the presence of spatial autocorrelation.
So, what is spatial autocorrelation? In its simplest definition, pairs of subjects that are close to each other are more likely to have similar values, and pairs of subjects far apart from each other are more likely to have less similar values.
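A common way to quantify spatial autocorrelation is Moran's I, which the slides do not name; it is used here as an illustrative assumption. The sketch uses binary weights (1 for pairs of sites within `max_distance`, 0 otherwise), and the coordinates and values are hypothetical. It requires Python 3.8 or later for `math.dist`.

```python
import math


def morans_i(coords, values, max_distance):
    """Moran's I with binary weights: w_ij = 1 if sites i and j are within max_distance."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = 0.0
    cross_products = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_distance:
                weight_sum += 1.0
                cross_products += dev[i] * dev[j]
    # Assumes at least one neighbouring pair and non-constant values.
    return (n / weight_sum) * (cross_products / sum(d * d for d in dev))


# Eight sites along a transect, one unit apart.
coords = [(float(i), 0.0) for i in range(8)]
clustered = [1, 1, 1, 1, 0, 0, 0, 0]    # presences grouped together
alternating = [1, 0, 1, 0, 1, 0, 1, 0]  # presences and absences interleaved

i_clustered = morans_i(coords, clustered, max_distance=1.0)
i_alternating = morans_i(coords, alternating, max_distance=1.0)
print(round(i_clustered, 3), round(i_alternating, 3))
```

Positive values indicate that nearby sites tend to have similar values (the clustered case), while negative values indicate that neighbours tend to differ (the alternating case).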

Main shortcomings in SDM
## Spatial Autocorrelation (SAC)
[Figure: map legend showing presence of species and absence of species.]

SDM-NEW CHALLENGES
1. Improvement of methods for modeling presence-only data.
2. Improvement of methods for model selection and evaluation.
3. Accounting for biotic interactions, and assessing model uncertainty.

