Introduction to Niche Modeling


1 Introduction to Niche Modeling
- A small bit of theory re: niches
- How niche modeling works
- G-space and E-space
- How it came to be
- Uses in ecology and evolution:
  - present, past and future modeling of species distributions
  - predicting disease spread
  - predicting invasive species spread
  - niche conservation
Note: some material on niche modeling pedagogy has been drawn from internet sources, so thanks to Arthur Chapman, Town Peterson, Enrique Martinez-Meyer and others.

2 Niche Distinctions
Grinnellian
- Spatially explicit
- Focus on non-interactive requirements for populations to thrive
- Measurable from distribution
Eltonian
- Focus on community impacts and biotic interactions, i.e. species' functional roles
Hutchinsonian
- Also focuses on non-interactive requirements
- Defined the Fundamental Niche: mostly what we think of as environmental variables
- Defined the Realized Niche: subset of the Fundamental Niche + biotic interactions

3 Chthamalus and Balanus
Two barnacle species, Chthamalus and Balanus, in the intertidal.
- Balanus cannot stand exposure to air: similar fundamental and realized niche.
- Chthamalus cannot compete with Balanus, but if Balanus is removed it can survive lower in the intertidal: different fundamental and realized niche.
You can get a sense of the difference between fundamental and realized niche in this slide. You are looking at two species of barnacles that occur in the intertidal. One species occurs in the lower intertidal and the other in the high intertidal. Balanus has a physical limiting factor: it cannot be exposed to much air. It is therefore restricted to the lower intertidal. Chthamalus is not limited in its distribution by any physical limiting factor; it could occur in the lower intertidal, but it doesn't show up there because it is outcompeted by Balanus. Therefore its realized niche is smaller than its fundamental (potential) niche.

4 HOW CAN WE RECONSTRUCT THE FUNDAMENTAL NICHE?
(We can start by looking at where a species occurs.) Poecile gambeli: Mountain chickadee. Dots are occurrences of Poecile gambeli across its range.

5 How Can We Model the Fundamental Niche?
Figure: occurrence points on the current distribution (Geographic Space) are linked by ecological niche modeling to a model of the niche in ecological dimensions (Ecological Space; axes: temperature, precipitation).
We can take occurrence points in geographic space and ask what environmental conditions are like where those occurrences are located. We can then generate a graph of something like temperature and precipitation, delimiting where the species falls in that space.
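The move from geographic space to ecological space can be sketched in a few lines. This is only an illustration, assuming two small made-up rasters held as NumPy arrays and occurrence points already expressed as row/column pixel indices; the names `temperature`, `precipitation`, `occurrences` and `to_environmental_space` are hypothetical and not from any niche modeling package.

```python
import numpy as np

# Hypothetical 4x5 environmental rasters (rows and columns are pixels in G-space).
temperature = np.array([
    [ 2.0,  3.0,  4.0,  5.0,  6.0],
    [ 6.0,  7.0,  8.0,  9.0, 10.0],
    [10.0, 11.0, 12.0, 13.0, 14.0],
    [14.0, 15.0, 16.0, 17.0, 18.0],
])
precipitation = np.array([
    [900.0, 850.0, 800.0, 750.0, 700.0],
    [800.0, 750.0, 700.0, 650.0, 600.0],
    [700.0, 650.0, 600.0, 550.0, 500.0],
    [600.0, 550.0, 500.0, 450.0, 400.0],
])

# Hypothetical occurrence points as (row, col) pixel indices in geographic space.
occurrences = [(1, 1), (1, 2), (2, 2), (2, 3)]


def to_environmental_space(points, layers):
    """Return one row per occurrence and one column per environmental layer."""
    rows = np.array([r for r, _ in points])
    cols = np.array([c for _, c in points])
    return np.column_stack([layer[rows, cols] for layer in layers])


e_points = to_environmental_space(occurrences, [temperature, precipitation])

# Delimit where the species falls in E-space: the observed range on each axis.
e_min = e_points.min(axis=0)
e_max = e_points.max(axis=0)
print("temperature range:", e_min[0], "to", e_max[0])
print("precipitation range:", e_min[1], "to", e_max[1])
```

With real data the pixel indices would come from converting latitude/longitude to raster coordinates, which is left out here.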

6 From Peterson and Soberon
Figure: occurrence points on the current distribution (Geographic Space) are linked by ecological niche modeling to a model of the niche in ecological dimensions (Ecological Space; axes: temperature, precipitation). The model gives the current range prediction and, through projection back onto climate landscapes at the Last Glacial Maximum, the Last Glacial Maximum prediction.

7 SOME TERMINOLOGY
Geographic Space
- G = the geographic space, typically composed of 2-D pixels
- Ga, Gp = the abiotically suitable area (potential distribution)
- Gb = the biotically suitable area
- Gm = the area accessible through dispersal
- Gi = the invadable distributional area
- Go = the occupied distributional area
- Gdata = the set of observations (presences and, if existing, true absences)
Environmental Space
- E = the environmental space of environmental variables
- Ea = the scenopoetic fundamental niche
- Ei = the invadable niche space
- Eo = the occupied niche space
- Ep = the biotically reduced niche

8 Example Mapping Between Geographic Space and Environmental Space
Why is this not occupied?
Note: this area is occupied but not sampled (because you are omniscient in this example; work with me).
Figure labels: Ea, Eo. Go is shown as gray shading, and Ga is “white”.

9 General species’ distribution modeling approach
Notice that the diagram uses the same hypothetical case as in the previous slide. A modelling technique (e.g. GARP, Maxent) is used to characterize the species’ niche in environmental space by relating observed occurrence localities to a suite of environmental variables. Notice that, in environmental space, the model may not identify either the species’ occupied niche or fundamental niche; rather, the model identifies only that part of the niche defined by the observed records. When projected back into geographical space, the model will identify parts of the actual distribution and potential distribution. For example, the model projection labeled 1 identifies the known distributional area. Projected area 2 identifies part of the actual distribution that is currently unknown; however, a portion of the actual distribution is not predicted because the observed occurrence records do not identify the full extent of the occupied niche (i.e. there is incomplete sampling; see area D on the previous slide). Similarly, modeled area 3 identifies an area of potential distribution that is not inhabited (the full extent of the potential distribution is not identified because the observed occurrence records do not identify the full extent of the fundamental niche due to, for example, incomplete sampling, biotic interactions, or constraints on species dispersal; see areas D and E on the previous slide). Modified from NCEP module Species distribution modeling for conservation educators and practitioners.

10 Key factors determining the degree to which observed localities can be used to estimate the niche or distribution
Equilibrium: A species is said to be at equilibrium with current environmental conditions if it occurs in all suitable areas, whilst being absent from all unsuitable areas. What causes disequilibrium? (More or less BIOLOGICAL.)
Sampling adequacy: The extent to which the observed occurrence records provide a sample of the environmental space. The importance of this cannot be overestimated. How could you possibly know? (More or less EFFORT/LOGISTIC, based on human liabilities, not biology.)
Regarding equilibrium: For example, the degree of equilibrium for different groups of organisms in Europe varies considerably, with more dispersive species (e.g. birds) relatively closer to equilibrium than less dispersive species (e.g. reptiles) (Araújo & Pearson 2005).
Regarding sampling adequacy: In some instances, very few occurrence records may be available, perhaps due to limited survey effort. In such cases, the available records are unlikely to provide a sample of available environments that is sufficient to enable the full range of conditions occupied by the species to be identified. In other cases, surveys may have provided extensive occurrence records that give a fairly accurate picture of the environments inhabited by a species in a particular region (e.g. European plant data).
Each of these factors (equilibrium and sampling adequacy) should be carefully considered to ensure appropriate use of a species’ distribution model.
Modified from NCEP module Species distribution modeling for conservation educators and practitioners.

11 The Ideal Scenario: at equilibrium and good sampling
In the ideal case, the species of interest would be at equilibrium and we would have complete sampling of the environment. In such a case the actual and potential distributions would be identical and we would expect to model both accurately. However, how useful is the model under these circumstances? Modified from NCEP module Species distribution modeling for conservation educators and practitioners.

12 Suppose high equilibrium but poor sampling (in both geographical and environmental space)
In this case, the model identified part of the species’ actual distribution that is unknown. GREAT for predicting new areas to survey! Modified from NCEP module Species distribution modeling for conservation educators and practitioners.

13 Suppose high equilibrium and poor sampling in geographical space, but good sampling in environmental space
Note that there is not necessarily a direct relationship between sampling adequacy in geographical space and in environmental space. It is quite possible that poor sampling in geographical space could still result in good sampling in environmental space. For example, if a geographic area that has not been sampled has environmental conditions that are similar to those in an area that has been sampled, then sampling adequacy in environmental space will not be affected. Modified from NCEP module Species distribution modeling for conservation educators and practitioners.

14 Suppose low equilibrium but good sampling
Figure labels: Potential Distribution; Fundamental Niche.
In this case the model identified an area of potential distribution that is environmentally similar to where the species has been observed, but which is not inhabited. This type of prediction may be very useful for identifying sites suitable for the reintroduction of an endangered species, identifying areas where the species may become invasive, or guiding field surveys toward the discovery of unknown species that are closely related. Distribution models may therefore prove useful even in cases where species’ equilibrium is low. Modified from NCEP module Species distribution modeling for conservation educators and practitioners.

15

16 Circle A represents the area where abiotic conditions are right for a species to occur (Ga)
- Circle B represents the area where lack of competition and disease, and the occurrence of mutualists, allow populations to grow.
- Circle M is the area within which individuals and populations are capable of moving due to lack of dispersal barriers.
- Go is the occupied area.
- Gi is the invadable area.
Note: niche modeling pulls occurrences from the intersection of A, B and M (the occupied area, Go).

17 From Soberon and Peterson, 2005, Biodiversity Informatics
- Circle A represents the area where abiotic conditions are right for a species to occur (= Fundamental niche, Ea).
- Circle B represents the area where lack of competition and disease, and the occurrence of mutualists, allow populations to grow.
- Circle M is the area within which individuals and populations are capable of moving due to lack of dispersal barriers.
- The intersection of A and B is the biotically reduced niche (Ep).
- The intersection of A, M and B is the occupied niche space (Eo).
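The circle diagram can be read as set operations on grid cells. The sketch below assumes a made-up strip of eight cells with Boolean masks for A, B and M. Treating the invadable area as the part of A and B that lies outside M is an assumption about how the diagram is read, not something stated on the slide.

```python
import numpy as np

# Hypothetical strip of 8 grid cells; True marks cells inside each circle.
A = np.array([1, 1, 1, 1, 1, 0, 0, 0], dtype=bool)  # abiotic conditions suitable
B = np.array([0, 1, 1, 1, 1, 1, 0, 0], dtype=bool)  # biotic conditions suitable
M = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=bool)  # reachable by dispersal

biotically_reduced = A & B   # intersection of A and B
occupied = A & B & M         # intersection of A, B and M
invadable = A & B & ~M       # assumed reading: suitable but not reachable

occupied_cells = np.flatnonzero(occupied).tolist()
invadable_cells = np.flatnonzero(invadable).tolist()
print("occupied cells:", occupied_cells)
print("invadable cells:", invadable_cells)
```

Occurrence records can only come from the occupied cells, which is why a model fitted to them may describe less than the full area where conditions are suitable.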

18 SOME POSSIBLE OUTCOMES
- Best case: weak, diffuse biotic interactions and a lack of dispersal barriers create general overlap.
- No dispersal barriers, but the area of “correct” biotic interactions differs from the area of correct abiotic conditions: an estimate of the fundamental niche (FN) using occurrence data should be carefully examined.
- Dispersal limitations: the FN (and potential distribution) will be much larger than the actual distribution.
From Soberon and Peterson, 2005, Biodiversity Informatics

19 What abiotic factors determine fundamental niche?
- The answer is complicated (but important).
- Species have physiological tolerances, migration limitations and evolutionary forces that limit adaptation.
- A starting point for physiology may be traits.
- A starting point for abiotic factors is often climate.
- Climate variables often also correlate with other variables (elevation, land cover).

20 “Easy” In Theory --- But how does it work in practice?
Spatial ecological modeling approaches were developed in the 1990s, but they have their origins in ongoing innovations from the 1970s forward. A bit of history…

21 How do we in practice model the “scenopoetic” ecological niche?
And how do we determine a species’ distribution (actual and potential), and what is the difference?

22 Around 1990 three things happened
I. Large databases of species presences (mainly computerized scientific collections) began to become accessible in significant numbers

23 II. GIS… Geographical Information Systems technology became widely accessible to ecologists and biogeographers

24 III. Worldwide Environmental Data Layers
- Remote sensing data
- Land cover/land type
- Vegetation
- Terrain
- Ocean SST, chlorophyll
- Slope, aspect, flow rate, hydrology data
- Climatology databases: Worldclim (what we’ll use in this class)
- Models of worldwide past and future climates (IPCC)
- All other ancillary data layers (roads, human population density, etc.)

25 Which leads to an NCEAS Working Group
Title: Choosing (and making available) the right environmental layers for modeling how the environment controls the distribution and abundance of organisms Aim: To generate co-registered environmental data layers at 1km resolution representing climate, vegetation/landcover, hydrology/topography, marine.

26 A TOUGH GIG (Actually this meeting was a lot of work!)

27 WORLDWIDE MEAN ANNUAL TEMPERATURES (GREEN=cold, RED=hot)
Panels: NOW; LGM (based on General Circulation Models).
We have amazing resources now for looking at worldwide climate. These maps show reconstructions of climate across the world, now and at 18,000 years before present, based on general circulation models. The red colors are warm, and the green colors cold. If you are interested in general circulation models and how they are constructed, talk to me after class. Note that at the LGM it was much colder in north temperate regions than now.

28 WORLDWIDE MEAN ANNUAL TEMPERATURES (GREEN=cold, RED=hot)
Panels: NOW, North America; Double CO2, 2100 CE, North America (CCM models).
The upper panel shows climate conditions now in North America. The lower panel shows predicted temperatures if we double CO2 concentrations in the atmosphere over the next one hundred years. Note further warming, especially in the north.

29 Inputs into a niche model
- A stack of environmental data layers (precipitation, temperature, elevation, soils)
- A set of occurrence records representing presences

30 NICHE AND DISTRIBUTION MODELING
Inputs: species presence records and environmental data layers. CAN WE PREDICT NICHE AND DISTRIBUTION FROM SUCH DATA? (Answer: maybe!) From Maxent presentation by Pearson

31 The outcome of a niche model is:
a prediction of suitable habitats for that taxon (based on the input data). The suitability output can be a yes/no or a probability function.
Panel B: input data points in black and suitable habitat in the western US for Neotoma cinerea.
Panel D: close-up of suitable/unsuitable areas in the Great Basin of western North America.
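The difference between the two kinds of output can be shown with a small sketch: a continuous suitability surface is turned into a yes/no map by applying a cutoff. The grid values and the threshold of 0.5 are made up for illustration; in practice the threshold would be chosen from model validation results.

```python
import numpy as np

# Hypothetical continuous suitability scores (0 to 1) for a 3x4 grid of cells.
suitability = np.array([
    [0.05, 0.20, 0.55, 0.90],
    [0.10, 0.45, 0.70, 0.85],
    [0.02, 0.15, 0.30, 0.60],
])

# Hypothetical cutoff separating "suitable" from "unsuitable".
threshold = 0.5

binary_map = suitability >= threshold   # True = suitable, False = unsuitable
n_suitable = int(binary_map.sum())
print(binary_map.astype(int))
print("suitable cells:", n_suitable, "of", binary_map.size)
```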

32 PART 1 : Idealized Workflow for building and validating a species distribution model:
- Acquire species occurrence data (e.g. fieldwork, museum voucher specimens, observations, surveys, etc.)
- Map/vet the species’ distribution data, especially if coordinates are from third-party sources (e.g. removing geographic and environmental outliers)
- Collate GIS database of environmental layers (e.g. temperature, precipitation, soil type)
- Process environmental layers to generate predictor variables important in defining species’ distributions (e.g. maximum daily temperature, frost days, soil water balance) and convert to appropriate formats
- Apply modeling algorithm (e.g. Bioclim, Maxent, artificial neural network, general linear model, boosted regression tree)
- Model calibration (select suitable parameters, test importance of alternative predictor variables)

33 PART 2 : Idealized Workflow for building and validating a species distribution model:
- Create map of current modeled distribution
- Test model performance through additional fieldwork or a statistical approach (e.g. AUC, Kappa or null model comparisons)
- Model the species’ distribution in a different region (e.g. for an invasive species) or for a different time period (e.g. under a future climate scenario)
- If possible, test the model against observed data, such as occurrence records in an invaded region, or distribution shifts over recent decades
The steps shown here follow those on the previous slide and are for testing the model and making a prediction once the model has been calibrated. The first step is to test how well the model is able to predict the known distribution, and the second step is to make a prediction (e.g. into a new region or for a different climate scenario) and, if possible, test if the predictions agree with observed data. Modified from NCEP module Species distribution modeling for conservation educators and practitioners.
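As one example of a statistical test, AUC can be computed directly from model scores. The sketch below uses the rank interpretation of AUC: the chance that a randomly chosen test presence receives a higher score than a randomly chosen absence (or background point), with ties counted as half. The scores are made up, and `auc` is a hypothetical helper, not a function from any modeling package.

```python
def auc(presence_scores, absence_scores):
    """Probability that a random presence outscores a random absence (ties count half)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical model scores at held-out presence and absence sites.
test_presences = [0.9, 0.8, 0.6, 0.4]
test_absences = [0.7, 0.3, 0.2, 0.1]

model_auc = auc(test_presences, test_absences)
print("AUC:", model_auc)
```

An AUC of 0.5 corresponds to a model that ranks sites no better than chance, and 1.0 to a model that ranks every presence above every absence.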

34 A stopping point

35 Adapted from a presentation by Enrique Martinez-Meyer and others
SOME ISSUES WITH MODELING
Determining species distributions, given that:
- Most occurrence data available for the vast majority of species are presence-only.
- Sampling effort across most species’ distributional ranges is uneven and eco-geographically biased.
- We do not know which environmental variables are relevant for each species.

36 Modeling Niches
All niche modeling approaches model the function approximating the true relationship between the environment (i.e., the niche) and species’ geographic occurrences/distribution.

37 Modeling Niches P2
All want to estimate a function f = μ(Gdata, E), that is, the result of applying an algorithm μ to the data Gdata, given an environmental space E, to estimate G (the distribution).
Different algorithms have different data requirements:
- True presence-only
- Presence-absence
- Presence-background (background can be any sample from within the environment)
- Presence-pseudoabsence (a pseudoabsence cannot be where a species is known to occur)
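The difference between background and pseudoabsence samples is easiest to see in how they are drawn. A minimal sketch, assuming a made-up 10x10 grid and four made-up presence cells; the function names are hypothetical.

```python
import random


def sample_background(all_cells, n, rng):
    """Presence-background: any cell in the study area may be drawn, presences included."""
    return rng.sample(all_cells, n)


def sample_pseudoabsences(all_cells, presence_cells, n, rng):
    """Presence-pseudoabsence: draw only from cells with no known occurrence."""
    presence_set = set(presence_cells)
    candidates = [cell for cell in all_cells if cell not in presence_set]
    return rng.sample(candidates, n)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
all_cells = [(r, c) for r in range(10) for c in range(10)]  # hypothetical 10x10 grid
presence_cells = [(2, 3), (2, 4), (3, 3), (5, 7)]           # hypothetical presences

background = sample_background(all_cells, 20, rng)
pseudoabsences = sample_pseudoabsences(all_cells, presence_cells, 20, rng)
print(len(background), "background cells;", len(pseudoabsences), "pseudoabsence cells")
```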

38 Algorithms Applied to the Problem
Method(s) | Model/software name | Species data type
Climatic envelope | BIOCLIM | Presence-only
Gower Metric | DOMAIN | Presence-only
Ecological Niche Factor Analysis (ENFA) | BIOMAPPER | Presence/background
Maximum Entropy | MAXENT | Presence/background
Genetic algorithm | GARP | Presence/pseudo-absence
Regression: Generalized linear model (GLM) and Generalized additive model (GAM) | GRASP | Presence/absence
Artificial Neural Network (ANN) | SPECIES | Presence/absence
Classification and regression trees (CART), GLM, GAM and ANN | BIOMOD | Presence/absence
Boosted decision trees | (implemented in R) | Presence/absence
Multivariate adaptive regression splines (MARS) | (implemented in R) | Presence/absence
From Richard Pearson et al. 2006

39 Niche Modeling Has Problems PT 2: tradeoffs w/algorithms
- Many algorithms do not handle asymmetric data (e.g. GLM, GAM).
- Many don’t handle interaction effects (e.g. BioClim).
- Some of them do not handle nominal environmental variables (e.g. soil classes) [e.g. BioClim, ENFA].
- Many stochastic algorithms present different solutions even under identical parameterization and input data (e.g. GARP).
- We do not know the ‘real’ distribution of species, so we do not know when models are making mistakes and when they are filling knowledge gaps.

40 Modeling Approaches
- Presence-only (bioclimatic envelopes or Mahalanobis distance): points inside the envelope are suitable, or suitability falls with the distance of points from the mean values (farther away equals less suitable).
- Presence-absence (GAMs, GLMs, MARS, CARTs): use a link function or a set of logical statements describing the multivariate relationship between the mean of the response variable and the predictor variables. Note: best for determining the occupied distribution (not the potential distribution).
- Presence-background (Maxent): finds the probability distribution that is most spread out, or closest to uniform, subject to constraints given by the observed occurrence records and the environmental conditions across the study area. All regression techniques work with background as well.
- Presence-pseudoabsence (GARP): rule-set predictions.
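The distance-based flavor of the presence-only approach can be sketched with NumPy. The presence records below are made up (temperature and precipitation), and `mahalanobis` is a hypothetical helper; the only idea illustrated is that cells farther from the mean of the presence records, measured in units that account for the spread and correlation of the variables, are treated as less suitable.

```python
import numpy as np

# Hypothetical presence records in E-space: temperature (C), precipitation (mm).
presences = np.array([
    [10.0, 500.0],
    [12.0, 560.0],
    [14.0, 610.0],
    [11.0, 540.0],
    [13.0, 600.0],
])

center = presences.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(presences, rowvar=False))


def mahalanobis(point):
    """Distance of one environmental point from the mean of the presence records."""
    diff = np.asarray(point, dtype=float) - center
    return float(np.sqrt(diff @ cov_inv @ diff))


d_center = mahalanobis(center)          # the mean itself
d_near = mahalanobis([12.0, 560.0])     # conditions close to the mean
d_far = mahalanobis([20.0, 300.0])      # hot and dry, far from the records
print("near:", round(d_near, 3), "far:", round(d_far, 3))
```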

41 Example of Presence-Only Envelope Approach - BioClim
- Heuristic-based model
- Works with presence-only data
- Simple to use
- 35-dimensional hypercube in climate space (19 in Diva-GIS)
- Tends to over-predict
- Works with a small number of records
- Will work in batch mode
- Can’t make quantitative predictions or provide confidence levels
- Used for predicting potential distributions
- Versions incorporated into Diva-GIS

42 BioClim Type Modeling
The dot-dash line square is the BioClim fit of the data (for two dimensions). It defines the range of values occupied by a species on each axis, across all environmental variables. Anything in this box might be considered “suitable”. From Peterson et al. ms. Ecological Niches and Geographic Distributions: A Modeling Perspective
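The box fit is simple enough to write out. This sketch uses made-up presence records in two dimensions and takes the full minimum-to-maximum range on each axis as the envelope; actual BIOCLIM implementations may use percentile limits instead, which is not shown here. The function names are hypothetical.

```python
import numpy as np

# Hypothetical presence records in E-space: temperature (C), precipitation (mm).
presences = np.array([
    [10.0, 500.0],
    [12.0, 560.0],
    [14.0, 610.0],
    [11.0, 540.0],
    [13.0, 600.0],
])


def fit_envelope(records):
    """Return the per-variable minimum and maximum of the presence records."""
    return records.min(axis=0), records.max(axis=0)


def in_envelope(cells, lower, upper):
    """True where a cell lies within the observed range on every axis."""
    return np.all((cells >= lower) & (cells <= upper), axis=1)


lower, upper = fit_envelope(presences)

# Hypothetical candidate cells to classify.
cells = np.array([
    [12.5, 580.0],   # inside the box on both axes
    [15.0, 580.0],   # too warm
    [12.0, 450.0],   # too dry
])
suitable = in_envelope(cells, lower, upper)
print(suitable.tolist())
```

Because a cell only has to fall inside the range on each axis separately, combinations of conditions that the species never experiences together still count as suitable, which is one reason this kind of model tends to over-predict.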

43 Presence-Background Modeling
- No known absences
- How, then, to distinguish false absences from true absences?
- Solution (of sorts): compare presences against the background, where the background is the set of grid cells used in modeling
- Note: these background points include the input true presences
- Question: what does this mean for model validation?

44 Modeling with Maxent
- Assume presence records come from some unknown probability distribution called p.
- How do we estimate the probability function over a set of grid cells, G?
- What is the probability that any one grid cell, g, is suitable for a species?

45 Modeling in Maxent
We can join the presence records for a taxon to the underlying environmental variables and determine means, SDs, etc. in terms of experienced climate.
Table: Temperature profiles for Acacia orites. Columns: annual; minimum coolest month; maximum warmest month; range; coolest quarter; warmest quarter; wettest quarter; driest quarter.
Mean | 17.2 | 6.2 | 26.1 | 19.9 | 12.3 | 21.3 | 20.0 | 13.8
95%-ile | 19.6 | 9.2 | 29.0 | 23.0 | 15.2 | 23.7 | 23.4 | 17.6
The remaining rows are incomplete, so their surviving values cannot be assigned to columns:
S.D.: 1.8, 2.0, 1.6, 2.1, 3.6
Min: 12.1, 0.2, 23.9, 18.1, 5.8, 18.3, 10.6
5%-ile: (no values)
25%-ile: 16.4, 6.1, 24.6, 18.5, 11.8, 20.2, 2.5
75%-ile: 7.2, 12.8, 2.8, 14.8
max: 29.4, 25.4, 23.8, 23.6

46 Modeling with Maxent
- Each grid cell has a set of “features” defined by the environment.
- Features can be the raw environment or some more complex function of those environmental variables (linear, quadratic, logistic).
- Grid cells with presences can be summed to determine means and SDs across all environmental variables in order to estimate p.
- Means of the probability distribution match the observed means.
- Find the flattest function (one that maximizes entropy).

47 Modeling with Maxent
Maxent is an iterative approach:
- Starts with a fully uniform distribution over all grid cells.
- Conducts an optimization routine to maximize “gain”.
- Gain is a likelihood statistic that maximizes the probability of the presences given the input data and in relation to the background data.
- Gain will asymptote (maximizing fit), leading to the final probability distribution.
- The distribution becomes the basis for the fitted predictor variable coefficients.
- These coefficients are used to assess probability of presence.
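The iterative idea can be illustrated with a toy version. This is not the Maxent software's algorithm: it is a sketch assuming random made-up grid cells, two linear features, no regularization, and plain gradient ascent on the average log-likelihood of the presences (which plays the role of gain here). It starts from the uniform distribution and stops when the feature means under the fitted distribution match the means observed at the presences.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical study area: 400 grid cells, two environmental features scaled to 0-1.
features = rng.random((400, 2))

# Hypothetical presences: 20 of the cells where feature 0 is above 0.6.
presence_idx = np.flatnonzero(features[:, 0] > 0.6)[:20]
observed_means = features[presence_idx].mean(axis=0)

weights = np.zeros(2)    # zero weights give the uniform starting distribution
learning_rate = 2.0      # assumed step size for plain gradient ascent


def distribution(w):
    """Distribution over grid cells with p(g) proportional to exp(w . f(g))."""
    scores = features @ w
    scores = scores - scores.max()   # numerical stability
    p = np.exp(scores)
    return p / p.sum()


for _ in range(3000):
    p = distribution(weights)
    model_means = p @ features
    # Gradient of the average log-likelihood of the presences.
    weights += learning_rate * (observed_means - model_means)

p = distribution(weights)
model_means = p @ features
print("observed means:", observed_means.round(3))
print("model means:   ", model_means.round(3))
```

Among all distributions over the grid cells whose feature means equal the observed means, the exponential form used here is the one with maximum entropy, which is the sense in which the fitted distribution is the "flattest" one consistent with the data.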

48 Maxent
- Maxent is run by first selecting a set of input environmental data layers in a common GIS format (gridded .ASC files).
- Next, select a set of species occurrence locations defined by lat/lon.
- It is important to subset the data into training and testing sets. Training data build the model; testing data are used for validation.
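The training/testing split can be done before the records ever reach the modeling software. A minimal sketch, assuming made-up lat/lon records and a hypothetical helper `split_occurrences`; the 25% test fraction is an arbitrary choice for illustration.

```python
import random


def split_occurrences(records, test_fraction, seed):
    """Shuffle occurrence records and split them into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical occurrence records as (latitude, longitude) pairs.
records = [(40.0 + 0.1 * i, -110.0 - 0.1 * i) for i in range(20)]

training, testing = split_occurrences(records, test_fraction=0.25, seed=1)
print(len(training), "training records;", len(testing), "testing records")
```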

49 More on Maxent
- Maximum spread = maximizing the log likelihood of the data associated with the presence sites, minus a penalty term (think AIC).
- The penalty term is basically a weighting based on how much information the environmental data add to the model.
- The best weighting is discovered through a sequential updating algorithm run for a specified number of iterations (you can change this parameter).

50 More on Maxent
- The Maxent regularization parameter determines the “penalty function”: smaller values tend to overfit models (typically leading to smaller geographic distributions) and larger values do the opposite.
- You can choose cumulative versus logistic outputs. Logistic is interpreted as probability of presence (i.e. what you most often want).
- Definitely create response curves.
- What about features?

51 More on Maxent
What are features?
- The environmental layers are used to produce "features", which constrain the probability distribution that is being computed.
- The available feature types are linear, quadratic, product, threshold and hinge/discrete.
- Some features give Maxent a lot of latitude in deriving response variables.
- You can choose to include different types of features.
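To make the feature types concrete, here are simplified versions of each transformation applied to made-up values. These are illustrative forms only; the Maxent software's exact definitions and scaling may differ, and the knot value of 10 is arbitrary.

```python
import numpy as np


def linear(x):
    """The raw variable."""
    return x


def quadratic(x):
    """The square of the variable."""
    return x ** 2


def product(x, y):
    """The product of two variables (an interaction)."""
    return x * y


def threshold(x, knot):
    """1 where the variable exceeds the knot, 0 elsewhere."""
    return (x > knot).astype(float)


def hinge(x, knot):
    """0 up to the knot, then increasing linearly."""
    return np.maximum(0.0, x - knot)


temperature = np.array([0.0, 5.0, 10.0, 15.0, 20.0])    # hypothetical values
precipitation = np.array([2.0, 2.0, 1.0, 1.0, 0.0])     # hypothetical values

print("quadratic:", quadratic(temperature).tolist())
print("product:  ", product(temperature, precipitation).tolist())
print("threshold:", threshold(temperature, 10.0).tolist())
print("hinge:    ", hinge(temperature, 10.0).tolist())
```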

52 More on Maxent: What does a Maxent run produce?
- An HTML file showing run outputs
- A grid file importable into a GIS
- CSV files containing omission and prediction details
Focus on the HTML file, which contains:
- A picture of the map
- A table of different thresholds *
- A model validation statistical summary *
- An explanation of the importance of variables
- Response curves
* We’ll discuss model validation tomorrow.

