Modeling Species Distribution with MaxEnt


Modeling Species Distribution with MaxEnt Bryce Maxell, Acting Director, Montana Natural Heritage Program & Scott Story, Nongame Data Manager, Montana Fish, Wildlife and Parks

Agenda - Wednesday
8-9 Introduction to MaxEnt
9:05-10 Reptile and Amphibian Model Examples
10:05-11 Installation and Walkthrough of MaxEnt
11:05-12 Preparation of Data
12-1 Lunch
1-1:55 Thresholds & Model Validation
2-3 Using models in your DSS
3-5 Hands-on Session
Tomorrow
8-11 Hands-on, Data Prep, Questions & Discussion

About to start again folks on the phone.

Installing and Running MaxEnt INSTALLATION

Download & Install
http://www.cs.princeton.edu/~schapire/maxent/
Current MaxEnt Version = 3.3.3e
Requires Java Version 1.4 or later
  Type java -version at command prompt
  http://www.java.com
Extract the .zip file to a very simple directory
  No spaces, no strange characters, short
  C:\maxent
Three files are installed
  Maxent.bat
  Maxent.jar
  Readme.txt
Download the tutorial Word document

Check Java Version

Set PATH and customize .bat file
My Computer > Properties > Advanced > Environment Variables > System Variables > PATH > Edit
Add to end of the PATH: ;c:\maxent
Change the maxent.bat file
  Change the extension to .txt so that you can edit it with Notepad
  Change line reading java -mx512m -jar maxent.jar to…
  java -mx512m -jar c:\maxent\maxent.jar
  Change the extension back to .bat
Note that changing the 512 to another number allocates more memory
  512 MB = 0.5 GB
  1024 MB = 1 GB
  1536 MB = 1.5 GB
  2048 MB = 2 GB
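The memory figures on this slide map directly onto the -mx flag, which takes megabytes. As a minimal sketch (the function name and its defaults are illustrative and not part of MaxEnt), the launch command could be assembled in Python like this:

```python
def maxent_command(memory_gb=0.5, jar_path=r"c:\maxent\maxent.jar"):
    """Build the java command from the slide; -mx is given in megabytes."""
    memory_mb = int(memory_gb * 1024)  # 0.5 GB -> 512, 1 GB -> 1024, ...
    return ["java", f"-mx{memory_mb}m", "-jar", jar_path]


# Show the command that would allocate 1.5 GB to MaxEnt.
print(" ".join(maxent_command(1.5)))
```

The list form can be handed to subprocess.run if you prefer to launch MaxEnt from a script rather than from the edited .bat file.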

Running MaxEnt Basic modeling run

Required Inputs
Species presence localities (“samples”) file
Environmental feature layers
Output directory
Note that coordinate systems other than Lat/Long are permitted, but the samples file coordinate system must match that of the environmental feature layers
We will talk more about preparing feature datasets in the next discussion

MaxEnt – Main Screen

Supply presence localities

The file can have a .txt or a .csv extension
The file should have a header
Multiple species are permitted
The x coordinate must come before the y coordinate
Note that duplicate points will be dropped (duplicates are those that fall in the same grid cell)
Keep track of the points that you use in a database if you want (you might want to preserve a unique identifier from your point observation database)
You will get warnings for any points that fall outside of any of the input feature layers
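To preview how many points survive the duplicate rule before running MaxEnt, the grid-cell test can be sketched in Python. This is only an illustration under stated assumptions: the species names, coordinates, grid origin, cell size, and the header names species,x,y are all made up for the example, and MaxEnt performs its own duplicate removal regardless.

```python
import csv
import io
import math


def drop_cell_duplicates(points, xll, yll, cellsize):
    """Keep the first (species, x, y) record per species and grid cell."""
    seen = set()
    kept = []
    for species, x, y in points:
        col = math.floor((x - xll) / cellsize)
        row = math.floor((y - yll) / cellsize)
        key = (species, col, row)
        if key not in seen:
            seen.add(key)
            kept.append((species, x, y))
    return kept


def samples_csv(points):
    """Write a samples file as text: header first, x before y."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["species", "x", "y"])
    writer.writerows(points)
    return buffer.getvalue()


# Hypothetical observations; the second point shares a cell with the first.
points = [
    ("species_a", 10.2, 20.1),
    ("species_a", 10.8, 20.9),
    ("species_a", 11.5, 20.1),
    ("species_b", 10.3, 20.2),
]
kept = drop_cell_duplicates(points, xll=10.0, yll=20.0, cellsize=1.0)
print(samples_csv(kept))
```

Carrying a unique identifier alongside each tuple would make it easy to trace which database records were actually used.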

Supply folder containing environmental feature layers

Change variable types as necessary Supply an output directory

Ready to Run

What MaxEnt Does
Reads through each layer to
  Determine type
  Create .mxe file for each layer in maxent.cache
Extracts the random background and sample data
  You will get warnings about points that are “missing some environmental data”
Calculates the gain until a threshold is reached
Creates the output grids for each species (this takes the longest)
Creates the thumbnail .png images

Time Required
Ten feature layers (3 categorical)
46 million pixels
2 Species
Intel Core 2 Quad CPU (2.83 GHz)
4.00 GB RAM
Windows 7 32-bit Operating System
512 MB of memory specified
Without maxent.cache = 38 minutes
With maxent.cache = 24 minutes

Running MaxEnt Examining output

Output
plots folder
logfile
maxentResults.csv
For each species
  .asc
  .html
  .lambdas
  _omission.csv
  _sampleAverages.csv
  _samplePredictions.csv
maxentResults.csv contains all of the variable importance and threshold information, one row per species
The .html file has pointers to a variety of plots
  omission
  receiver operating characteristic (ROC) curve
  table of threshold values
  pictures of the model
  analysis of variable contribution
  raw data outputs and control parameters

Logfile Timestamp Version of MaxEnt Samples file name Warnings Command line to repeat Species Layers Layertypes Directories for: samples file, layers, output Number of samples Maximum gain

Gain
Closely related to deviance, a measure of goodness of fit (GOF) in GAM and GLM
Starts at zero and heads toward an asymptote
  MaxEnt is trying to come up with the best fit
Average log probability of presence samples minus a constant
Gain indicates how closely the model is concentrated around presence samples
exp(gain) = average likelihood of presence samples relative to that of a random background pixel

Gain Examples
McCown’s Longspur
  Resulting gain: 2.275
  Average likelihood for presence points = 9.728
Olive-sided Flycatcher
  Resulting gain: 1.297
  Average likelihood for presence points = 3.658
Average likelihood of the presence sample is X times higher than that of a background pixel
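The likelihood figures on this slide are simply exp(gain). A short Python check, assuming the two gains pair with the species in the order they appear on the slide:

```python
import math


def likelihood_ratio(gain):
    """Average presence likelihood relative to a background pixel."""
    return math.exp(gain)


# Gains as reported on the slide; the species pairing is inferred from slide order.
gains = {"McCown's Longspur": 2.275, "Olive-sided Flycatcher": 1.297}

for species, gain in gains.items():
    print(f"{species}: exp({gain}) = {likelihood_ratio(gain):.3f}")
```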

Html
Analysis of omission/commission
Receiver Operating Characteristic (ROC) Curve (AUC calculated)
Preset Thresholds
Pictures of the Model
Analysis of Variable Contributions
Raw Outputs

Omission Rate vs. Cumulative Threshold

Receiver Operating Characteristic (ROC) Curve

Sample Predictions File
Coordinates for all points
Test or Training
Predicted values
  Raw
  Cumulative
  Logistic
Use this file to calculate deviance
Use samples procedure in ArcMap to extract the ones and zeros (above threshold or not)
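The ones and zeros can also be derived outside ArcMap by thresholding the logistic column. The sketch below is hedged: the column headers are an assumption about how _samplePredictions.csv is laid out (check them against your own file), the three rows are invented, and the threshold value is only a placeholder.

```python
import csv
import io

# Assumed header layout and made-up rows, for illustration only.
SAMPLE = """X,Y,Test or train,Raw prediction,Cumulative prediction,Logistic prediction
-110.5,46.2,train,0.00012,35.1,0.62
-111.0,47.0,test,0.00001,4.2,0.15
-109.8,45.9,test,0.00020,52.7,0.71
"""


def classify(csv_text, threshold, column="Logistic prediction"):
    """Return (test/train flag, predicted value, 1 or 0) for each point."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        value = float(record[column])
        above = 1 if value >= threshold else 0
        rows.append((record["Test or train"], value, above))
    return rows


for flag, value, above in classify(SAMPLE, threshold=0.351):
    print(flag, value, above)
```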

Sample Predictions File

Logistic Output
High probability of suitable conditions
Low predicted probability of suitable conditions
White dots = training (1059 points or 75%)
Purple dots = test (352 points or 25%)

Viewing Data in ArcMap Build Raster Attribute Table (Categorical) .vat.dbf Build Histograms (Classified) .aux Build Pyramids .rrd .xml For species output grids Convert ASCII to Raster (Output Data Type = FLOATING) Output as .bil (Band interleaved by line)

Running MaxEnt MORE Advanced parameters

Running MaxEnt Replicate runs

Running MaxEnt BATCH MODE

Preparation of Data Scott Story

Required Inputs
Species presence localities (“samples”) file
Environmental feature layers
Output directory
Note that coordinate systems other than Lat/Long are permitted, but the samples file coordinate system must match that of the environmental feature layers
We will talk more about preparing feature datasets in the next discussion

Getting Feature Data Ready
Same projection (coordinate system, units, datum)
Same resolution
Same extent
ESRI ASCII format
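Whether two ESRI ASCII grids share resolution and extent can be checked from their header lines before MaxEnt ever sees them. The following is a minimal sketch, assuming the common six-line header (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value); the function names and the tiny example headers are hypothetical, and a header does not record the projection, so that still has to be confirmed separately.

```python
# Header keys that must agree; grids using xllcenter/yllcenter instead of
# xllcorner/yllcorner would be reported as mismatched by this sketch.
KEYS = ("ncols", "nrows", "xllcorner", "yllcorner", "cellsize")


def read_header(grid_text):
    """Parse the leading 'name value' lines of an ESRI ASCII grid."""
    header = {}
    for line in grid_text.splitlines()[:6]:
        parts = line.split()
        if len(parts) != 2 or not parts[0][0].isalpha():
            break  # reached the data rows
        header[parts[0].lower()] = float(parts[1])
    return header


def mismatches(header_a, header_b, tolerance=1e-9):
    """List the header keys that are missing or differ between two grids."""
    bad = []
    for key in KEYS:
        if key not in header_a or key not in header_b:
            bad.append(key)
        elif abs(header_a[key] - header_b[key]) > tolerance:
            bad.append(key)
    return bad


# Hypothetical headers for two small grids.
LAND_COVER = "ncols 4\nnrows 3\nxllcorner 100.0\nyllcorner 200.0\ncellsize 30\nNODATA_value 255\n"
PRECIP = "ncols 4\nnrows 3\nxllcorner 100.0\nyllcorner 200.0\ncellsize 60\nNODATA_value -9999\n"

print(mismatches(read_header(LAND_COVER), read_header(PRECIP)))
```

For real layers, pass the first few lines of each .asc file to read_header instead of the inline strings.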

Two Raster Datasets
Land cover
  Source = Montana Natural Heritage Program
  Type = IMAGINE Image
  Cell size = 30 meters
  Columns & Rows = 33005, 24008
  Spatial Reference = Montana State Plane (NAD83)
  Pixel Type = Unsigned Integer (8-bit)
Precipitation
  Source = PRISM Climate Center
  Type = ASCII grid
  Cell size = 0.0083333333
  Columns & Rows = 7025, 3105
  Spatial Reference = undefined (see metadata)
  Pixel Type = Signed Integer (32-bit)

Two Raster Datasets Land cover Precipitation

Making Rasters Match
Define coordinate systems for both
Set some environment variables
  Tools > Options > Geoprocessing Tab > Environments
  General Settings: Extent and Snap Raster
  Raster Analysis Settings: Cell Size, Mask
Project Raster
  Select target raster to match for output cell size

Precipitation Reprojected & Resampled
Same exact extent
Same exact number of rows & columns
Same exact cell size
Real test is…does Maxent throw any errors?
In this case…it worked!
Getting all your data layers squared away will take some time!

Deriving New Raster Data - Ruggedness

Types of Environmental Features
Continuous (Quantitative)
  Interval-scale (interval data, order, linear scale)
  Ordinal variables (scale unknown-transformed?, rank clear)
  Ratio-scale (interval data, ordered, not on linear scale, e.g. temp on F or C scale)
Categorical (Qualitative)
  Nominal (e.g. gender)
  Ordinal (has order, e.g. low to great)
  Dummy variables from quantitative (classes)
Name the ASCII files with CONT or CAT prefix

Preparing Point Data
Create a separate file for each species, or combine all of them (or groups of them) into one file
Probably want to retain a unique identifier
May want to set up scripts in ArcGIS to extract presence data
Might also want more control of how background data is selected
Let’s look at an example script - ExtractModelInputData.py

Other “Feature” Layers
Masks
  Useful if you want to train a model using only a subset of the region
  mask.asc contains a constant value (1, for example) in the area of interest and no-data values everywhere else
Bias
  Assumption that species occurrence data are unbiased
  Need a good understanding of the spatial pattern of sampling
  Values should indicate relative sampling effort
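A mask grid of the kind described here is easy to generate by hand. The sketch below writes ESRI ASCII text with 1 inside the area of interest and a no-data value elsewhere; the function name, the -9999 no-data value, and the 3 x 4 example grid are assumptions for illustration, and the header values must be copied from your real feature layers so the mask lines up with them.

```python
def mask_ascii(inside, xll, yll, cellsize, nodata=-9999):
    """Return ESRI ASCII grid text: 1 where inside is True, nodata elsewhere."""
    nrows = len(inside)
    ncols = len(inside[0])
    lines = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {xll}",
        f"yllcorner {yll}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    for row in inside:  # rows run from the top of the grid to the bottom
        lines.append(" ".join("1" if cell else str(nodata) for cell in row))
    return "\n".join(lines) + "\n"


# Hypothetical 3 x 4 area of interest.
area_of_interest = [
    [False, True, True, False],
    [True, True, True, True],
    [False, False, True, False],
]
mask_text = mask_ascii(area_of_interest, xll=100.0, yll=200.0, cellsize=30)
print(mask_text)
```

Writing mask_text to mask.asc in the layers folder gives a file in the form the slide describes.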

Representing the output THRESHOLDS

Logistic Output (Ranges 0-1)

Reclassify with ArcGIS

Preset MaxEnt Thresholds
Description | Cumulative Threshold | Logistic Threshold | Fractional Predicted Area | Training Omission Rate | Test Omission Rate
Fixed Cumulative Value 1 | 1 | 0.043 | 0.344 | 0.002 | 0.000
Fixed Cumulative Value 5 | 5 | 0.172 | 0.255 | 0.020 |
Fixed Cumulative Value 10 | 10 | 0.260 | 0.210 | 0.044 | 0.082
Minimum Training Presence | 0.699 | 0.029 | 0.365 | |
10 Percentile Training Presence | 17.522 | 0.351 | 0.167 | 0.099 | 0.151
Equal Training Sensitivity & Specificity | 21.989 | 0.393 | 0.149 | 0.148 | 0.205
Maximum Training Sensitivity Plus Specificity | 9.201 | 0.248 | 0.216 | 0.035 | 0.065
Equal Test Sensitivity & Specificity | 18.603 | 0.361 | 0.162 | 0.106 |
Maximum Test Sensitivity Plus Specificity | 7.729 | 0.225 | 0.228 | |
Balance Training Omission, Predicted Area, & Threshold Value | 1.054 | 0.047 | 0.342 | |
Equate Entropy of Thresholded & Original Distributions | 5.465 | 0.182 | 0.250 | 0.021 | 0.026
Purple rows/columns are only calculated if test data are specified
A column of p-values is also present at far right
  One-sided p-value for test of null hypothesis that test points are predicted no better than by a random prediction with the same fractional predicted area
The “balance” threshold minimizes 6 * training omission rate + 0.04 * cumulative threshold + 1.6 * fractional predicted area
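The “balance” objective can be evaluated for any threshold whose three inputs are known. As a rough sketch, the Python below scores three of the preset rows from this slide, reading each row's numbers in the slide's column order; the function name is illustrative. MaxEnt minimizes this quantity over all candidate thresholds, so comparing a few preset rows only shows how the terms trade off.

```python
def balance_score(training_omission, cumulative_threshold, fractional_area):
    """Objective minimized by the 'balance' threshold, as stated on the slide."""
    return (6 * training_omission
            + 0.04 * cumulative_threshold
            + 1.6 * fractional_area)


# (training omission rate, cumulative threshold, fractional predicted area)
rows = {
    "Fixed Cumulative Value 1": (0.002, 1, 0.344),
    "Fixed Cumulative Value 10": (0.044, 10, 0.210),
    "Equate Entropy of Thresholded & Original Distributions": (0.021, 5.465, 0.250),
}

for name, values in rows.items():
    print(f"{name}: {balance_score(*values):.4f}")
```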

Thresholds – Ends of Spectrum
Balance Training Omission, Predicted Area, & Threshold Value
Equal Training Sensitivity & Specificity

Model Validation MODEL VALIDATION

Validation Metrics
Receiver Operating Characteristic (ROC) Curve: obtained by plotting, for each threshold, the proportion of true positives against the proportion of false positives
Area Under Curve (AUC): the area under the ROC curve described above
Deviance: -2 times the log probability of the test data
Absolute Validation Index: the proportion of presence evaluation points falling above the threshold or within the GAP predicted distribution
Point Biserial Correlation: the correlation between a model’s predictions and presence/absence in test data (regarded as a 0/1 variable)
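Three of these metrics are simple enough to compute by hand from test predictions. The sketch below is a plain-Python illustration, not MaxEnt's own code: the function names and the example scores are made up, AUC is computed through the equivalent pairwise-ranking form (the share of presence/absence pairs in which the presence point scores higher, ties counted as half), and "above the threshold" is taken to include equality.

```python
import math


def auc(presence_scores, absence_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


def absolute_validation_index(presence_scores, threshold):
    """Proportion of presence evaluation points at or above the threshold."""
    above = sum(1 for p in presence_scores if p >= threshold)
    return above / len(presence_scores)


def point_biserial(scores, labels):
    """Pearson correlation between predictions and a 0/1 presence variable."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_l = sum(labels) / n
    cov = sum((s - mean_s) * (l - mean_l) for s, l in zip(scores, labels))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_l = sum((l - mean_l) ** 2 for l in labels)
    return cov / math.sqrt(var_s * var_l)


# Made-up test predictions.
presence = [0.9, 0.8, 0.4]
absence = [0.5, 0.3, 0.2, 0.1]

print("AUC:", auc(presence, absence))
print("AVI:", absolute_validation_index(presence, threshold=0.5))
print("r_pb:", point_biserial(presence + absence, [1, 1, 1, 0, 0, 0, 0]))
```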

_samplePredictions.csv

Discussion Point

Topics Left Data Prep Output Thresholds Validation Batch Replicates