Ryan DiGaudio Modified from Catherine Jarnevich, Sunil Kumar, Paul Evangelista, Jeff Morisette, Tom Stohlgren Maxent Overview.

Slides:



Advertisements
Similar presentations
Applications of one-class classification
Advertisements

A Climate-based Interpretation of Limber Pine Management Scenarios in Rocky Mountain National Park Contributors: Bill Monahan, Tammy Cook, Jeff Connor,
Chapter 16 Inferential Statistics
Brief introduction on Logistic Regression
Data Mining Methodology 1. Why have a Methodology  Don’t want to learn things that aren’t true May not represent any underlying reality ○ Spurious correlation.
Maximum Entropy spatial modeling with imperfect data.
Lecture 22: Evaluation April 24, 2010.
Sensitivity Analysis for Observational Comparative Effectiveness Research Prepared for: Agency for Healthcare Research and Quality (AHRQ)
Maxent interface.
BHS Methods in Behavioral Sciences I April 25, 2003 Chapter 6 (Ray) The Logic of Hypothesis Testing.
Statistical Methods Chichang Jou Tamkang University.
A COMPARISON OF APPROACHES FOR VERIFYING SOUTHWEST REGIONAL GAP VERTEBRATE-HABITAT DISTRIBUTION MODELS J. Judson Wynne, Charles A. Drost and Kathryn A.
PSY 307 – Statistics for the Behavioral Sciences
Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology.
1 Validation and Verification of Simulation Models.
Performance Evaluation in Computer Vision Kyungnam Kim Computer Vision Lab, University of Maryland, College Park.
Lecture 10 Comparison and Evaluation of Alternative System Designs.
Maxent Implements “Maximum Entropy” modeling –Entropy = randomness –Maximizes randomness by removing patterns –The pattern is the response Website with.
Educational Research by John W. Creswell. Copyright © 2002 by Pearson Education. All rights reserved. Slide 1 Chapter 8 Analyzing and Interpreting Quantitative.
The Bell Shaped Curve By the definition of the bell shaped curve, we expect to find certain percentages of the population between the standard deviations.
Decision analysis and Risk Management course in Kuopio
Decision Tree Models in Data Mining
Today Evaluation Measures Accuracy Significance Testing
ANCOVA Lecture 9 Andrew Ainsworth. What is ANCOVA?
Confidence Intervals and Hypothesis Testing - II
Hypothesis Testing.
BASIC STATISTICS: AN OXYMORON? (With a little EPI thrown in…) URVASHI VAID MD, MS AUG 2012.
CHAPTER 4 Research in Psychology: Methods & Design
Assessment of Model Development Techniques and Evaluation Methods for Binary Classification in the Credit Industry DSI Conference Jennifer Lewis Priestley.
Ecological Niche Factor Analysis-ENFA
StAR web server tutorial for ROC Analysis. ROC Analysis ROC Analysis: This module allows the user to input data for several classifiers to be tested.
9th International Symposium on Wild Boar and others Suids, Hannover 2012 Factors influencing wild boar presence in agricultural landscape: a habitat suitability.
Statistical Decision Making. Almost all problems in statistics can be formulated as a problem of making a decision. That is given some data observed from.
Instructor Resource Chapter 5 Copyright © Scott B. Patten, Permission granted for classroom use with Epidemiology for Canadian Students: Principles,
● Final exam Wednesday, 6/10, 11:30-2:30. ● Bring your own blue books ● Closed book. Calculators and 2-page cheat sheet allowed. No cell phone/computer.
1 Lecture 19: Hypothesis Tests Devore, Ch Topics I.Statistical Hypotheses (pl!) –Null and Alternative Hypotheses –Testing statistics and rejection.
EVIDENCE ABOUT DIAGNOSTIC TESTS Min H. Huang, PT, PhD, NCS.
Candidate KBA Identification: Modeling Techniques for Field Survey Prioritization Species Distribution Modeling: approximation of species ecological niche.
Museum and Institute of Zoology PAS Warsaw Magdalena Żytomska Berlin, 6th September 2007.
MGT-491 QUANTITATIVE ANALYSIS AND RESEARCH FOR MANAGEMENT OSMAN BIN SAIF Session 16.
Ensembles. Ensemble Methods l Construct a set of classifiers from training data l Predict class label of previously unseen records by aggregating predictions.
1 Snow depth distribution Neumann et al. (2006). 2.
Evaluating Impacts of MSP Grants Hilary Rhodes, PhD Ellen Bobronnikov February 22, 2010 Common Issues and Recommendations.
Jennifer Lewis Priestley Presentation of “Assessment of Evaluation Methods for Prediction and Classification of Consumer Risk in the Credit Industry” co-authored.
Enrique Martínez-Meyer
Chapter 10 Verification and Validation of Simulation Models
Academic Research Academic Research Dr Kishor Bhanushali M
HYPOTHESIS TESTING Null Hypothesis and Research Hypothesis ?
Evaluating Impacts of MSP Grants Ellen Bobronnikov Hilary Rhodes January 11, 2010 Common Issues and Recommendations.
C82MST Statistical Methods 2 - Lecture 1 1 Overview of Course Lecturers Dr Peter Bibby Prof Eamonn Ferguson Course Part I - Anova and related methods (Semester.
BOT / GEOG / GEOL 4111 / Field data collection Visiting and characterizing representative sites Used for classification (training data), information.
Module 1: Measurements & Error Analysis Measurement usually takes one of the following forms especially in industries: Physical dimension of an object.
Ryan DiGaudio Modified from Catherine Jarnevich, Sunil Kumar, Paul Evangelista, Jeff Morisette, Tom Stohlgren Maxent Overview.
Machine Learning 5. Parametric Methods.
HYPOTHESIS TESTING Null Hypothesis and Research Hypothesis ?
1 Chapter 5 – Density estimation based on distances The distance measures were originally developed as an alternative to quadrat sampling for estimating.
Testing Predictive Performance of Ecological Niche Models A. Townsend Peterson, STOLEN FROM Richard Pearson.
+ Unit 6: Comparing Two Populations or Groups Section 10.2 Comparing Two Means.
Chapter 13 Understanding research results: statistical inference.
BHS Methods in Behavioral Sciences I May 9, 2003 Chapter 6 and 7 (Ray) Control: The Keystone of the Experimental Method.
 1 Species Richness 5.19 UF Community-level Studies Many community-level studies collect occupancy-type data (species lists). Imperfect detection.
Downloading the MAXENT Software
Use of Maxent for predictive habitat mapping of CWC in the Bari canyon Bargain Annaëlle Foglini Federica, Bonaldo Davide, Pairaud Ivane & Fabri Marie-Claire.
Chapter 13 Simple Linear Regression
CHAPTER 4 Research in Psychology: Methods & Design
MODELING THE CURRENT AND FUTURE DISTRIBUTIONS OF
Modelling alien invasives using the GARP system
Invasive Species.
Invasive Species.
More on Maxent Env. Variable importance:
Presentation transcript:

Ryan DiGaudio Modified from Catherine Jarnevich, Sunil Kumar, Paul Evangelista, Jeff Morisette, Tom Stohlgren Maxent Overview

Objectives By the end of this section, you should be able to: Describe, broadly, Maxent modeling and some examples of its application Give some examples of why you would and would not use Maxent State the assumptions and limitations of Maxent Describe the inputs and outputs of Maxent Articulate what the mapped predictions from Maxent mean Evaluate model predictive performance Ask the most important questions when handed a Maxent model Construct and run a “customized” Maxent model

Why Maxent for this Workshop Don’t have time for all methods Heavily used in the literature and well known Consistently performs as well or better than other methods in comparisons One of the more approachable methods September 2013Best Practices for Systematic Conservation Planning 3

Background: The model Maxent is a general purpose method that makes predictions from incomplete information Machine learning method Used in many fields Generative approach Estimates an unknown probability by finding the probability distribution of maximum entropy given some constraints

Background: Specifically for SDM Developed by Steven Phillips at AT&T Labs- Research Many applications Often termed ‘Presence-only’, but closer to resource availability Provides a model of habitat suitability

Why Use Maxent Best suited for exploratory or opportunistically collected data Presence only data Maximize the utility of publically available museum and herbaria data Small number of occurrences Poor or absent information on sampling methods Not confident in absences Continually performs well in comparison with other methods

Reasons to Not Use Maxent No amount of statistical modeling can compensate for poor definition of the problem (Austin, 2002) Have robust sampling method with confidence presence and absence data Somewhat of a black box compared to other methods Statistics are still new and underdeveloped More familiar with other SDM modeling method September 2013Best Practices for Systematic Conservation Planning 7 ?

Applications Find correlates to species distributions What might be important drivers to a species distribution Journal of Zoology Volume 279, Issue 1, pages 27-35, 13 MAY 2009 DOI: /j x Volume 279, Issue 1,

Applications Map/predict current distributions Conservation area delineation Identify new, undocumented populations New populations of endangered species Identify population for early detection and rapid response Heikkinen et al. 2007

Applications Predict to new time or locations Evaluate how projected changes in climate will impact distributions Identify the potential distribution of an invasive species Diversity and Distributions Volume 13, Issue 4, pages , 1 JUN 2007 DOI: /j x

Assumptions and Limitations Individuals have been sampled randomly across landscape Sampling unit of size equal to the grain size of available environmental data Environmental variables represent all factors that constrain the geographical distribution of the species Species is in equilibrium with its environment Background locations are representative of the available environment and address bias

Assumptions and Limitations Occurrences represent the intended response of interest

Assumptions and Limitations Limitations Cannot estimate prevalence with presence-only Two species can have Same range Same distribution of occurrences Ones is rarer Sample selection bias (more on this later) Background locations are treated as pseudo-absences for model evaluation CPAW NPS

The inputs Presence locations Background locations Random or user provided Environmental information (for presence and background)

The inputs ASCII grids of area of interest (optional) Sampling effort (optional) Test points (optional) Projection ASCII grids (optional)

Maxent Graphical User Interface September 2013Best Practices for Systematic Conservation Planning 16

Features Linear Quadratic Product September 2013Best Practices for Systematic Conservation Planning 17 Threshold Hinge Categorical Product of two variables (X1*X2)Binary classification

Settings Features September 2013Best Practices for Systematic Conservation Planning 18 Sample sizeFeature type > 80All 15 – 79Linear, quadratic, hinge 10 – 14Linear, quadratic < 10Linear

Settings Output format Logistic 0-1 values Most common – Index of habitat suitability Raw Relative occurrence rate Cumulative Sum of all raw values less than or equal to the raw value at that location scaled to be between September 2013Best Practices for Systematic Conservation Planning 19

Settings MESS analysis Testing Regularization Replicates September 2013Best Practices for Systematic Conservation Planning 20

More Settings Extrapolate Clamping Bias file September 2013Best Practices for Systematic Conservation Planning 21

Outputs Html file modelResults.csv AUC evaluation Many different threshold values Variable importance Continuous probability prediction map.asc file with no projection defined MESS map (more on this later)

Omission rate Omission Type II = False Negative = alternative is true = Failing to raise alarm when there is a wolf Comission Type I error rate = False positive = null is true = crying wolf with no wolf September 2013Best Practices for Systematic Conservation Planning 23

Receiver-operating characteristic (ROC) Area Under the Curve (AUC) Sensitivity True positive rate (% of correctly classified positives) Specificity True negative rate (% of correctly classified negatives) Range 0 – 1 September 2013Best Practices for Systematic Conservation Planning 24 Fractional predicted area True Positive Rate

Picture of Predicted Model September 2013Best Practices for Systematic Conservation Planning 25 Standard deviation Logistic mean map

Response Curves Two types September 2013Best Practices for Systematic Conservation Planning 26 Variable is varied while all other variables held at the mean – look at changes in logistic prediction Model is run with only one variable – better for interpretations if there are correlations

Variable Contributions September 2013Best Practices for Systematic Conservation Planning 27 Percent contribution Calculated during model development from changes in the gain Permutation importance Each variable's values are changed at the training and background locations and the model re-evaluated Large change in metrics = high contribution in model

Jackknife How variables contribute to model performance September 2013Best Practices for Systematic Conservation Planning 28

Model documentation September 2013Best Practices for Systematic Conservation Planning29

Results csv Contains much of the information displayed in the html file September 2013Best Practices for Systematic Conservation Planning30

September 2013Best Practices for Systematic Conservation Planning31

Phillips et al, 2009, Ecological Applications 19: Target background – accounting for bias Sensitive to sampling bias Same bias in background  cancels out Random background Targeted background Sampling effort

Target backgound September 2013Best Practices for Systematic Conservation Planning33

Regularization September 2013Best Practices for Systematic Conservation Planning34 Regularization = 0.25 Regularization = 1.o Regularization = 3.o AUC =.98 AUC =.92 AUC =.88

Regularization September 2013Best Practices for Systematic Conservation Planning35 Regularization = 0.25 Regularization = 1.o Regularization = 2.o

Multivariate environmental similarity surface (MESS) Maps September 2013Best Practices for Systematic Conservation Planning36 Novel Environments Conditions that are outside the environmental range used to develop the model M0del predictions into these locations should be interpreted with caution

Clamping September 2013Best Practices for Systematic Conservation Planning37 Clamping on Clamping off

Thresholds September 2013Best Practices for Systematic Conservation Planning 38 Freeman, E. A., & Moisen, G. G. (2008). A comparison of the performance of threshold criteria for binary classification in terms of predicted prevalence and kappa. Ecological Modelling, 217(1), 48-58

September 2013Best Practices for Systematic Conservation Planning39 Maximizing sensitivity plus specificity 10% minimum training presence

Checking for Overfitting Test and Train AUC difference >0.05 may be overfit Response curves ecologically plausible? Prediction patchy? September 2013Best Practices for Systematic Conservation Planning 40

Evaluating Maxent Models and Predictions Ecological plausibility Threshold independent metrics E.g., AUC Threshold dependent metrics Many (Kappa, True Skill Statistic), but problems with threshold selection and background points used Independent data set Where are the novel areas Checks for overfitting

Questions to Ask/ Best practices What is the source of the occurrence data? How did you define the background? Why? Where there any signs of overfitting? Where are the novel areas? If threshold used, ask why that threshold was chosen Merow et al. (2013) provide a great guideline on the important decision that need to be made for a Maxent model September 2013Best Practices for Systematic Conservation Planning 42

September 2013Best Practices for Systematic Conservation Planning 43

September 2013Best Practices for Systematic Conservation Planning 44