1
Principles: Range edges
What determines the edge of a geographic range? Populations do not simply cease to exist at the edge of their geographic distribution; rather, they taper off gradually. The edge of a geographic range is therefore defined by a change in local population dynamics, where net gains in the population fall below the net losses. These population-level changes are brought about by changes in abiotic factors (physical barriers, climatic factors, absence of essential resources) and biotic factors (the impact of competitors, predators or parasites), and by genetic mechanisms that prevent the species from becoming more widespread. Abiotic and biotic factors are only limiting because the species has not evolved the morphological, physiological or ecological means to overcome them. That is, most species have the potential to invade adjacent areas given minor adaptation, and it is this genetic drift at the edges of populations that drives evolution, since large-scale transformation within optimal conditions is unlikely.
2
Principles: Response curves
A response curve is a plot of species presence against variation in some environmental variable, and is an integral part of species distribution modelling. Most models assume that the response is Gaussian (bell-shaped), but in reality it is seldom Gaussian and may take on a variety of shapes. In complex communities especially, response curves often exhibit truncated forms, because biotic interactions reduce the realised niche below its potential. The ability of the chosen model to represent this response curve is critical to model performance.
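As a minimal illustrative sketch (not from the original slides), the bell-shaped assumption can be mimicked by fitting a logistic regression with a quadratic term to invented presence/absence records along a single environmental gradient; all data and names here are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical presence/absence records along one environmental gradient
# (e.g. mean annual temperature); purely illustrative data.
rng = np.random.default_rng(0)
temp = rng.uniform(5, 30, 500)                       # environmental variable
p_true = np.exp(-((temp - 18) ** 2) / (2 * 4 ** 2))  # invented Gaussian "true" response
presence = rng.binomial(1, p_true)

# A logistic regression with a quadratic term gives a Gaussian-shaped
# response on the probability scale -- the usual parametric assumption.
X = np.column_stack([temp, temp ** 2])
model = LogisticRegression(max_iter=1000).fit(X, presence)

# Predicted response curve across the gradient.
grid = np.linspace(5, 30, 100)
curve = model.predict_proba(np.column_stack([grid, grid ** 2]))[:, 1]
print("estimated optimum:", grid[np.argmax(curve)])  # should be near 18
```

A truncated or skewed response would not be captured well by this parametric form, which is exactly the model-performance concern raised above.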
3
Specifics: Niche-based modelling
[Flow diagram: species distribution + environmental variables → model calibration → model evaluation, using either an independent evaluation dataset or a 70/30% random calibration/evaluation split → final model used to project current and future distributions.] How does niche-based modelling work? Firstly, the series of points at which a species has been observed (the species distribution) is projected onto a series of environmental layers, to obtain a dataset of environmental variables at the known locations of the species. If an independent evaluation dataset is available, the model can be calibrated on the full dataset; if not, the data need to be divided into two sets, one for calibrating the model and a second (smaller) one for evaluating its accuracy, typically by a 70/30% random split. The model is then calibrated, and its accuracy is evaluated by determining how well it performs on the evaluation dataset, by means of a confusion matrix. If the model does not perform suitably well it must be refined; otherwise it can be used to project current and future distributions, as sketched below.
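The following is a hedged sketch of the calibration/evaluation split described above, with invented data standing in for the table of environmental values at observed locations; it is not any particular package's workflow.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical table of environmental variables extracted at observed
# locations (rows = locations, columns = variables) and presence/absence.
rng = np.random.default_rng(1)
env_at_locations = rng.normal(size=(500, 4))
presence = rng.binomial(1, 1 / (1 + np.exp(-env_at_locations[:, 0])))

# No independent evaluation dataset is available, so make a 70/30% random
# calibration/evaluation split, as described above.
X_cal, X_eval, y_cal, y_eval = train_test_split(
    env_at_locations, presence, test_size=0.3, random_state=1)

# Calibrate the model on the 70% sample ...
sdm = LogisticRegression(max_iter=1000).fit(X_cal, y_cal)

# ... then evaluate it on the held-back 30% sample before using it to
# project current and future distributions.
print("evaluation accuracy:", round(sdm.score(X_eval, y_eval), 2))
```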
4
Niche-based modelling – assumptions
There are a number of assumptions implicit in any model:
Environmental factors drive species distribution – at least, the factors used are integral to the process.
Species are in equilibrium with their environment – if the ecosystem is transitional, current distributions may not correspond to the optimum conditions for a species but may be driven primarily by interspecific interactions.
Limiting variables – are they really limiting?
Coincidence with climate or climate shift – the ability of a species to move in response to climatic change is assumed to be considerable.
Evidence for species dying or not reproducing due to climate – for climate change models it is essential that the species is able to move and will not simply die out.
Collinearity of variables – variables may not vary together in future (for instance, rainfall may decrease while temperature increases), in which case the species response may not be as expected; a simple collinearity check is sketched after this list.
Assumption of assembly rules (niche assembly vs dispersal assembly) – it is assumed that the current distribution of a species is guided more by suitability of the current niche than by the historical and evolutionary dispersal of the species.
Static vs dynamic approaches (data snapshot or time-series response) – is it more appropriate to model a given period in the present or future and assume a constant trend between these periods, or to model a number of intermediate steps and assess the response to the changing conditions?
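To make the collinearity point concrete, here is a minimal, hypothetical check of pairwise correlation among candidate climate predictors before they enter a model; the column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical climate table for the study region; one row per grid cell.
climate = pd.DataFrame({
    "mean_annual_temp":   [16.2, 17.8, 15.1, 18.9, 14.6],
    "mean_annual_precip": [820, 610, 940, 560, 1010],
    "min_temp_july":      [2.1, 3.5, 1.2, 4.0, 0.8],
})

# Strongly correlated predictors (|r| close to 1) carry largely the same
# information, so their individual effects cannot be separated reliably,
# and their future trajectories may diverge from the historical pattern.
print(climate.corr().round(2))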
5
Cautionary note on modelling in general
Risk of all models: GIGO (garbage in, garbage out). Need to understand assumptions, explicit and implicit. Models are an abstraction of reality, meant to improve our understanding of core processes. There are several things to be borne in mind whatever the model used. Most important of all is the GIGO principle: the results of a model can never be better than the data used to construct and run it, and if the basic input data are incorrect the error can be multiplied many times over. All assumptions of the model, both implicit and explicit, must be understood and must be scientifically defensible. Finally, models are an abstraction and simplification of reality; their accuracy and assumptions depend on the scale of the model, so although a model will improve our understanding of the processes involved, its outputs should not be treated as gospel truth.
6
Variables and their selection
Species only select their habitats in the broadest sense (Heglund 2002); patterns of species distribution are the cumulative result of a large number of fine-scale decisions and events through which a species optimises resource acquisition at a very local scale. The more accurately these fine-scale resources, and the species' access to them, can be quantified, the better a model should perform, all else being equal. Predictions at broad scales can therefore use broad, averaged environmental variables, largely associated with the fundamental niche, whereas finer-scale predictions must concern themselves with the variables that determine the realised niche (Pearson & Dawson 2003).
7
Environmental Variables
Commonly used variables include precipitation (MAP, Psummer, Pwinter), temperature (MAT, Tmin, Tmax, Tmin06) and soil (pH, texture, organic C, fertility). Indirect measures of a variable, e.g. slope, aspect and altitude, are best avoided because they are a challenge to project into the future; solar radiation and wind are also difficult variables. Environmental variables are often direct variables, such as precipitation (typically mean annual rainfall and seasonal rainfall averages) and temperature (mean annual temperature, long-term averaged maxima and minima, and even specific measures such as the mean minimum temperature in June). Soil conditions include pH, texture, organic carbon and fertility, but these measures are often difficult to obtain at an appropriate scale, since there can be considerable variation even within an area. In general it is a good idea to avoid indirect measures of a variable, which is a challenge since much of a country is not monitored and many such measures are not easily taken; features such as slope and altitude do, however, allow some degree of projection into the future, since the lapse rate (the extent to which temperature changes with increasing altitude) may closely parallel changing climatic conditions. Solar radiation and wind, which are important for plant growth responses and dispersal, are both particularly challenging to measure accurately.
8
Derived Variables
Growing degree days (e.g. base 5°C); PET – Thornthwaite, Priestly-Taylor, Linacre; water balance – crudely defined as MAP – PET; favourable soil moisture days – modelled using e.g. ACRU or WATBUG; Palmer Drought Stress Index – PDSI program. Derived variables are calculated from the primary variables by a variety of functions and, whilst not as directly accurate, may also provide insight into the distribution of a species. Growing degree days accumulate, over the year, the amount by which daily temperature exceeds a base temperature (e.g. 5°C) required for plant growth. Potential evapotranspiration (PET) is calculated using one of several formulae, such as those of Thornthwaite, Priestly-Taylor or Linacre. From this the water balance can be derived, crudely defined as mean annual precipitation minus PET, and hence the number of soil moisture days favourable for plant growth can be modelled using e.g. ACRU or WATBUG; finally, the Palmer drought stress index is calculated using the PDSI program. Each of these variables represents a further step of abstraction through which potential errors may be compounded, so it is worth bearing in mind that derived variables are necessarily less accurate than the primary environmental variables. Two of these calculations are sketched below.
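A minimal sketch of two of these derived variables, assuming daily mean temperatures and annual rainfall/PET totals are already available; the inputs are invented for illustration, and PET is simply assumed rather than computed from one of the named formulae.

```python
import numpy as np

# Hypothetical daily mean temperatures (deg C) for one year at one site.
rng = np.random.default_rng(2)
daily_tmean = 18 + 8 * np.sin(np.linspace(0, 2 * np.pi, 365)) + rng.normal(0, 2, 365)

# Growing degree days, base 5 deg C: sum of daily exceedances above the base.
base = 5.0
gdd = np.clip(daily_tmean - base, 0, None).sum()

# Crude annual water balance: mean annual precipitation minus PET
# (PET here is a placeholder value, not derived via Thornthwaite etc.).
map_mm = 820.0   # hypothetical mean annual precipitation
pet_mm = 1050.0  # hypothetical annual potential evapotranspiration
water_balance = map_mm - pet_mm

print(f"GDD (base 5C): {gdd:.0f}, water balance: {water_balance:.0f} mm")
```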
9
BIOCLIM Environmental Variables
The BIOCLIM set of derived environmental variables: Annual Mean Temperature; Mean Monthly Temperature; Isothermality; Temperature Seasonality; Max Temperature of Warmest Month; Min Temperature of Coldest Month; Temperature Annual Range; Mean Temperature of Wettest Quarter; Mean Temperature of Driest Quarter; Mean Temperature of Warmest Quarter; Mean Temperature of Coldest Quarter; Annual Precipitation; Precipitation of Wettest Month; Precipitation of Driest Month; Precipitation Seasonality; Precipitation of Wettest Quarter; Precipitation of Driest Quarter; Precipitation of Warmest Quarter; Precipitation of Coldest Quarter. A few of these are derived from monthly climate data as sketched below.
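As an illustrative sketch (not from the slides), a handful of these BIOCLIM-style variables can be derived from monthly climate summaries roughly as follows; the monthly values are invented, monthly means stand in for monthly extremes, and quarters are treated as rolling three-month windows.

```python
import numpy as np

# Hypothetical monthly mean temperature (deg C) and precipitation (mm) for one cell.
tmean = np.array([22, 23, 21, 18, 15, 12, 11, 13, 16, 18, 20, 21], dtype=float)
prec  = np.array([90, 80, 70, 40, 25, 15, 10, 15, 30, 55, 70, 85], dtype=float)

annual_mean_temp     = tmean.mean()   # Annual Mean Temperature
max_temp_warmest_mon = tmean.max()    # using monthly means as a stand-in for monthly maxima
min_temp_coldest_mon = tmean.min()
temp_annual_range    = max_temp_warmest_mon - min_temp_coldest_mon
annual_precip        = prec.sum()     # Annual Precipitation
precip_wettest_month = prec.max()
precip_driest_month  = prec.min()

# Quarter-based variables: here each quarter is a rolling 3-month window.
quarters = [np.arange(i, i + 3) % 12 for i in range(12)]
q_prec = np.array([prec[q].sum() for q in quarters])
q_temp = np.array([tmean[q].mean() for q in quarters])
mean_temp_wettest_quarter = q_temp[q_prec.argmax()]
precip_warmest_quarter    = q_prec[q_temp.argmax()]

print(annual_mean_temp, temp_annual_range, annual_precip,
      mean_temp_wettest_quarter, precip_warmest_quarter)
```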
10
Species distribution datasets
Species distribution data are the backbone of any modelling process, and whilst the optimal source of data is fieldwork, there are a number of different sources and datasets that provide suitable data, depending on the species being modelled: museum/herbarium data (e.g. Precis, Sabonet), survey atlas data (e.g. the Protea Atlas), expert atlases (e.g. Birds of Africa) and field data (e.g. the Ackdat or TSP databases). Presence/absence data: it is rare to obtain comprehensive absence data for a species, because unless it is specifically being searched for within an area, many species are easily overlooked; ideally, though, we should include absence data when constructing models, as it improves model accuracy considerably. Georeference accuracy (e.g. GPS vs QDS): many older datasets have considerably poorer reference accuracy and are consequently less suitable for fine-scale mapping. Taxonomy affects numbers: as the taxonomic definitions of species change, the number of records for a given species will differ according to the date of collection and the current taxonomic understanding; for this reason it is important to periodically update the taxonomic definitions of older museum records where possible. [Figure: data sources and their typical scales – fieldwork, survey atlas, expert atlas, herbarium and museum specimens, with locality types ranging from presence/absence to presence-only and resolutions from 1–1000 m up to 1–5 degrees.]
11
Different types of models
A quick outline of the different model types available for species distribution modelling at present: bioclimatic envelope, e.g. Bioclim; domain models; ordinary regression, e.g. included in Arc-SDM; generalised additive models (GAM), e.g. GRASP; generalised linear models (GLM), e.g. included in Biomod; ordination (e.g. CCA), e.g. ENFA; classification and regression trees (CART), e.g. included in Biomod; genetic algorithms, e.g. GARP; artificial neural networks, e.g. SPECIES; Bayesian methods, e.g. WinBUGS.
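As a hedged illustration of just one entry in this list, a presence/absence GLM with a binomial error family might be fitted as follows; the data and coefficients are invented, and this is a generic sketch rather than the Biomod implementation.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical presence/absence records with two climate predictors.
rng = np.random.default_rng(3)
n = 300
temp = rng.uniform(10, 30, n)
rain = rng.uniform(300, 1200, n)
true_logit = -8 + 0.5 * temp - 0.015 * temp ** 2 + 0.004 * rain  # invented signal
presence = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

# Binomial GLM with a quadratic temperature term, allowing a unimodal
# response along the temperature gradient.
X = sm.add_constant(np.column_stack([temp, temp ** 2, rain]))
glm = sm.GLM(presence, X, family=sm.families.Binomial()).fit()
print(glm.params)
print("AIC:", round(glm.aic, 1))
```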
12
How do we choose a model type?
There are a burgeoning number of models to choose from these days, and more are constantly being added to the stock collection, so how does one go about deciding which is the most appropriate model to use? [Example output maps: BIOCLIM produces a Boolean prediction (0 or 1), whereas DOMAIN shows suitability as a relative value, giving a probability.]
13
Principles
What question do you want to answer? In some cases this will lead directly to a specific model type. The data guide decisions as well: what environmental data do you have access to, what are their resolution and extent, and are they categorical or continuous? Scale also matters: Thuiller et al. (2003) showed that GAMs perform consistently across scales because of their ability to model complex response curves, although certain other models may work better at a specific scale. Moreover, different variables are important at different scales – a variable that is highly relevant at a fine scale may have little overall effect at a broader scale, and vice versa, so the guiding factors are scale-dependent (Pearson & Dawson 2003). For a good example of an informed modelling solution see Gibson et al. (2004); summaries of studies comparing different models are given in Segurado & Araujo (2005) and Thuiller et al. (2003).
14
Model calibration and evaluation
Once you have decided on a model type, you need a methodology to select the best model from a suite of potential models, each with a different combination of the selected environmental variables. This methodology is designed to test the predictive accuracy of a model, which depends on both the suitability of the model and the usefulness of the environmental variables used. Each of the detailed model selection methods appraises the candidate models on some or all of their fit, complexity, and sample size. With stepwise selection of variables, note that the order in which variables enter may affect the accuracy of a GLM, but does not do so in most other models, including GAMs (Johnson & Omland 2004, Rushton et al. 2004).
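One common way to compare candidate models that trade fit off against complexity is an information criterion such as AIC; the sketch below is a generic illustration with invented data and variable names, not the specific procedure described in the references above.

```python
import numpy as np
import statsmodels.api as sm
from itertools import combinations

# Hypothetical predictor table (columns: temp, rain, soil pH) and presences.
rng = np.random.default_rng(4)
env = rng.normal(size=(400, 3))
presence = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * env[:, 0] - 0.5 * env[:, 1]))))

names = ["temp", "rain", "soil_ph"]
best = None
# Fit a binomial GLM for every non-empty subset of variables and keep the
# lowest-AIC model (goodness of fit penalised by complexity).
for k in range(1, len(names) + 1):
    for subset in combinations(range(len(names)), k):
        X = sm.add_constant(env[:, subset])
        fit = sm.GLM(presence, X, family=sm.families.Binomial()).fit()
        if best is None or fit.aic < best[0]:
            best = (fit.aic, [names[i] for i in subset])

print("best AIC:", round(best[0], 1), "with variables:", best[1])
```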
15
Models and their selection - BioClimatic Envelope
[Diagram: environmental variables and the species distribution combined into frequency histograms per value class.] This is a simple description of the bioclimatic envelope method of species distribution modelling. The database of environmental values at known species locations is compiled as usual, and the model then determines the range of these characteristics within which the species is observed, by simply selecting the outer values. These are used to construct a ruleset, which is then applied to the full set of environmental characteristics in order to extrapolate the hypothetical distribution. This is tested against the evaluation dataset, and the process is repeated until the model achieves a basic minimum level of accuracy. An example ruleset: IF Tann = [23,29] °C AND Tmin06 = [5,12] °C AND Rann = [609,1420] AND Soils = [1,4,5,8] THEN SP = PRESENT.
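A minimal sketch of this envelope logic, assuming a table of environmental values at presence locations and a set of candidate cells; all arrays and values are hypothetical.

```python
import numpy as np

# Hypothetical environmental values at known presence locations
# (columns: annual temperature, June minimum temperature, annual rainfall).
presence_env = np.array([
    [24.1, 6.0, 700.0],
    [27.5, 9.5, 1100.0],
    [23.4, 5.2, 650.0],
    [28.9, 11.8, 1390.0],
])

# The "envelope" is simply the outer values observed for each variable.
lower = presence_env.min(axis=0)
upper = presence_env.max(axis=0)

# Apply the ruleset to every candidate cell: predicted present only if
# all variables fall inside the envelope.
candidate_cells = np.array([
    [25.0, 7.0, 800.0],    # inside the envelope
    [31.0, 13.0, 1500.0],  # outside on all three variables
])
predicted_present = np.all((candidate_cells >= lower) & (candidate_cells <= upper), axis=1)
print(predicted_present)  # [ True False]
```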
16
How good are the predictions?
(Fielding & Bell 1997, Guisan and Zimmerman 2000.) The evaluation process within the model, and the final assessment of accuracy, necessarily entail some process by which the predictions are appraised. The output data are typically probability values (likelihoods of observing the species at a given site), whilst the observed data are presence/absence records. These cannot be compared directly, so a threshold probability must be derived or set, above which the species is counted as predicted present. Using this threshold value, a misclassification matrix (or confusion matrix) of predicted versus actual presence and absence can be built up, as shown in the table on the slide and as sketched below.
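As a minimal sketch (with invented predictions and observations), converting probabilities into a misclassification matrix at a chosen threshold looks like this:

```python
import numpy as np

# Hypothetical model output (probabilities) and observed presence/absence.
prob = np.array([0.91, 0.15, 0.62, 0.40, 0.78, 0.05, 0.55, 0.30])
obs  = np.array([1, 0, 1, 1, 1, 0, 0, 0])

threshold = 0.5
pred = (prob >= threshold).astype(int)

# Misclassification (confusion) matrix cells.
TP = int(np.sum((pred == 1) & (obs == 1)))  # true positives
TN = int(np.sum((pred == 0) & (obs == 0)))  # true negatives
FP = int(np.sum((pred == 1) & (obs == 0)))  # false positives
FN = int(np.sum((pred == 0) & (obs == 1)))  # false negatives
print(f"TP={TP} FP={FP}\nFN={FN} TN={TN}")
```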
17
Kappa statistic
The Kappa statistic is based on the misclassification matrix and takes the possibility of chance agreement into account: Ko = (TN + TP) / n, Ke = [(TN + FN) x (TN + FP) + (FP + TP) x (FN + TP)] / n², and K = [Ko – Ke] / [1 – Ke]. Kappa is estimated for a range of threshold values, and the threshold giving the best Kappa is kept as a reasonable predictor of species presence. Kappa scales between 0 and 1: values above 0.7 indicate a good model, 0.4–0.7 a fair fit, and below 0.4 a poor fit (Thuiller 2004, pers. comm.).
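Continuing the illustrative confusion-matrix sketch above, the slide's formulas translate directly into code; the counts are hypothetical.

```python
# Counts from a hypothetical misclassification matrix.
TP, TN, FP, FN = 3, 3, 1, 1
n = TP + TN + FP + FN

# Observed agreement and chance agreement, following the formulas above.
Ko = (TN + TP) / n
Ke = ((TN + FN) * (TN + FP) + (FP + TP) * (FN + TP)) / n ** 2
kappa = (Ko - Ke) / (1 - Ke)
print(round(kappa, 2))  # 0.5 with these counts: a fair fit by the rule of thumb
```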
18
Receiver operating characteristic analysis (ROC)
Sensitivity = TP/(FN+TP) (the true positive fraction); specificity = TN/(FP+TN) (the true negative fraction). [Plot: ROC curve of sensitivity against 1 – specificity.] A second method of assessing model accuracy is receiver operating characteristic (ROC) analysis. Sensitivity (the extent to which the model correctly predicts species presence) is plotted against 1 – specificity for a range of thresholds, and the area under the curve (AUC) is then calculated. The assessment value again lies between 0 and 1: above 0.8 is good, 0.6–0.8 fair, 0.5 no better than random, and below 0.6 a poor fit.
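Using the same hypothetical probabilities and observations as in the earlier sketch, the ROC curve and its AUC can be computed by sweeping the threshold and applying the trapezoidal rule; library routines such as scikit-learn's roc_auc_score would do the same job.

```python
import numpy as np

# Same hypothetical probabilities and observations as in the earlier sketch.
prob = np.array([0.91, 0.15, 0.62, 0.40, 0.78, 0.05, 0.55, 0.30])
obs  = np.array([1, 0, 1, 1, 1, 0, 0, 0])

# Sweep thresholds from high to low, collecting the true positive fraction
# (sensitivity) and the false positive fraction (1 - specificity).
tpr, fpr = [0.0], [0.0]
for t in np.sort(np.unique(prob))[::-1]:
    pred = prob >= t
    tpr.append(np.sum(pred & (obs == 1)) / np.sum(obs == 1))
    fpr.append(np.sum(pred & (obs == 0)) / np.sum(obs == 0))
tpr.append(1.0)
fpr.append(1.0)

# Area under the ROC curve by the trapezoidal rule.
tpr, fpr = np.array(tpr), np.array(fpr)
auc = float(np.sum(np.diff(fpr) * (tpr[:-1] + tpr[1:]) / 2))
print(round(auc, 2))  # values above 0.8 would be rated "good" on the slide's scale
```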
19
How good are the predictions?
Some factors should always be borne in mind when modelling species' environmental niches. It is important to have both training and testing data sets for evaluating predictions; if no independent dataset exists, the data should be randomly divided into these sets in a 70:30 ratio. Comparison across models, and across variables within the same model, is essential. The number of explanatory variables should not be too great, because some factors may then obscure the action of others, but there should be enough variables to reasonably explain the species distribution. Model development and improvement is an iterative process, repeated until the best combination of model and variables is achieved. Where possible, delineating the predictive ability of the individual predictor variables is very useful, as it gives insight into the factors most important for the species distribution (Lobo et al. 2002). Evaluating the model output against historical data, where such records are available, provides a useful further test of accuracy, especially in a highly transformed landscape (Hilbert et al. 2004). For further reading on the use of modelled data in conservation planning, see Hannah et al., Cabeza et al. (2004) and Loiselle et al. (2003).