Testing Predictive Performance of Ecological Niche Models
A. Townsend Peterson, STOLEN FROM Richard Pearson

Niche Model Validation
Diverse challenges …
– Not a single loss function or optimality criterion
– Different uses demand different criteria
– In particular, relative weights applied to omission and commission errors in evaluating models
  Nakamura: “which way is relevant to adopt is not a mathematical question, but rather a question for the user”
– Asymmetric loss functions

Where do I get testing data????

Model calibration and evaluation strategies: resubstitution (after Araújo et al., Gl. Ch. Biol.)
[Diagram: 100% of all available data used for both calibration and evaluation; the model is then projected to the same region, a different region, a different time, or a different resolution]

Model calibration and evaluation strategies: independent validation (after Araújo et al., Gl. Ch. Biol.)
[Diagram: 100% of all available data used for calibration; evaluation against independent data; projection to the same region, a different region, a different time, or a different resolution]

Model calibration and evaluation strategies: data splitting (after Araújo et al., Gl. Ch. Biol.)
[Diagram: available data split into calibration and test subsets (70% / 30% in the diagram); the model is calibrated on one subset, evaluated on the other, and projected to the same region, a different region, a different time, or a different resolution]
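As a concrete illustration (mine, not from the slides), a data-splitting evaluation might look like the following Python sketch; the simulated data, the 70/30 proportions, and the scikit-learn logistic regression standing in for a niche model are all illustrative assumptions.

```python
# Hypothetical sketch: split occurrence data into calibration and evaluation subsets.
# X = environmental predictors at sites, y = 1 (presence) / 0 (absence or background).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression  # stand-in for a niche model

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                                            # fake environmental layers
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# 70% calibration / 30% evaluation, stratified so both subsets keep the same prevalence
X_cal, X_test, y_cal, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

model = LogisticRegression().fit(X_cal, y_cal)
p_test = model.predict_proba(X_test)[:, 1]    # continuous suitability scores on held-out data
print("first few evaluation-set scores:", p_test[:5])
```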

Types of Error

The four types of results that are possible when testing a distribution model (see Pearson NCEP module 2007)

Presence-absence confusion matrix

                                Predicted present      Predicted absent
Recorded present                a (true positive)      c (false negative)
Recorded (or assumed) absent    b (false positive)     d (true negative)
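A minimal Python sketch of how the four cells might be tallied from binarized predictions; the arrays and the 0.5 threshold are illustrative assumptions, not from the slides.

```python
# Hypothetical sketch: tally a, b, c, d from observed (0/1) and predicted-present (0/1) arrays.
import numpy as np

observed = np.array([1, 1, 1, 0, 0, 0, 1, 0])                    # recorded (or assumed) presence/absence
suitability = np.array([0.9, 0.7, 0.2, 0.4, 0.1, 0.6, 0.8, 0.3])  # continuous model output
predicted = (suitability >= 0.5).astype(int)                      # binarized at an arbitrary threshold

a = np.sum((observed == 1) & (predicted == 1))   # true positives
b = np.sum((observed == 0) & (predicted == 1))   # false positives
c = np.sum((observed == 1) & (predicted == 0))   # false negatives
d = np.sum((observed == 0) & (predicted == 0))   # true negatives
print("a, b, c, d =", a, b, c, d)
```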

Thresholding

Selecting a decision threshold (p/a data) (Liu et al., Ecography 29)

Selecting a decision threshold (p/a data)

Omission (proportion of presences predicted absent): c/(a + c)
Commission (proportion of absences predicted present): b/(b + d)
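To make the threshold dependence concrete, here is a hedged sketch (continuing the illustrative arrays assumed above) that sweeps candidate thresholds and tracks omission and commission at each; the criterion used, maximizing sensitivity plus specificity, is one option discussed in the thresholding literature and is chosen here purely as an illustration.

```python
# Hypothetical sketch: how omission and commission change with the decision threshold.
import numpy as np

observed = np.array([1, 1, 1, 0, 0, 0, 1, 0])
suitability = np.array([0.9, 0.7, 0.2, 0.4, 0.1, 0.6, 0.8, 0.3])

best_thr, best_score = None, -np.inf
for thr in np.linspace(0.05, 0.95, 19):
    pred = (suitability >= thr).astype(int)
    a = np.sum((observed == 1) & (pred == 1))
    b = np.sum((observed == 0) & (pred == 1))
    c = np.sum((observed == 1) & (pred == 0))
    d = np.sum((observed == 0) & (pred == 0))
    omission = c / (a + c)                       # presences predicted absent
    commission = b / (b + d)                     # absences predicted present
    score = (1 - omission) + (1 - commission)    # sensitivity + specificity
    if score > best_score:
        best_thr, best_score = thr, score
print("illustrative threshold maximizing sensitivity + specificity:", round(best_thr, 2))
```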

Selecting a decision threshold (p-o data) [figure shows LPT and T10 thresholds]
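Assuming LPT and T10 here denote a lowest presence threshold and a 10th-percentile presence threshold (a common reading, but my assumption), they could be computed from the predicted suitability at the presence records like this:

```python
# Hypothetical sketch: presence-only thresholds from suitability values at presence localities.
import numpy as np

suitability_at_presences = np.array([0.91, 0.74, 0.55, 0.63, 0.88, 0.47, 0.70])  # illustrative values

lpt = suitability_at_presences.min()               # lowest presence threshold: omits no presence record
t10 = np.percentile(suitability_at_presences, 10)  # 10th-percentile threshold: tolerates ~10% omission
print("LPT:", lpt, " T10:", round(t10, 3))
```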

Threshold-dependent Tests (= loss functions)

The four types of results that are possible when testing a distribution model (see Pearson NCEP module 2007)

Presence-absence test statistics (confusion matrix as above)
Proportion (%) correctly predicted (or ‘accuracy’, or ‘correct classification rate’): (a + d)/(a + b + c + d)

Presence-absence test statistics (confusion matrix as above)
Cohen’s Kappa: agreement between predictions and records, corrected for the agreement expected by chance
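The Kappa formula on this slide did not survive extraction; as a sketch, the standard definition of Cohen’s Kappa applied to the a/b/c/d cells above, with purely illustrative counts:

```python
# Hypothetical sketch: accuracy and Cohen's Kappa from the 2x2 confusion matrix cells.
a, b, c, d = 4, 1, 1, 4          # illustrative counts, not from the slides
n = a + b + c + d

accuracy = (a + d) / n                                          # proportion correctly predicted
p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2     # agreement expected by chance
kappa = (accuracy - p_expected) / (1 - p_expected)
print("accuracy:", accuracy, " kappa:", round(kappa, 3))
```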

Presence-only test statistics (confusion matrix as above)
Proportion of observed presences correctly predicted (or ‘sensitivity’, or ‘true positive fraction’): a/(a + c)

Presence-only test statistics (confusion matrix as above)
Proportion of observed presences correctly predicted (or ‘sensitivity’, or ‘true positive fraction’): a/(a + c)
Proportion of observed presences incorrectly predicted (or ‘omission rate’, or ‘false negative fraction’): c/(a + c)
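For presence-only evaluation these reduce to the fraction of test presence records falling inside versus outside the predicted-present area; a minimal sketch under assumed inputs:

```python
# Hypothetical sketch: sensitivity and omission rate from test presence records only.
import numpy as np

# 1 = test presence record falls in the predicted-present area (illustrative values)
predicted_present_at_presences = np.array([1, 1, 0, 1, 1, 1, 0])

sensitivity = predicted_present_at_presences.mean()   # a / (a + c)
omission_rate = 1.0 - sensitivity                     # c / (a + c)
print("sensitivity:", round(sensitivity, 3), " omission rate:", round(omission_rate, 3))
```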

Presence-only test statistics: testing for statistical significance
Example: leaf-tailed gecko (Uroplatus sikorae), two predictions compared
– Success rate: 4 of 7; proportion predicted present and binomial p not recovered in the transcript
– Success rate: 6 of 7; proportion predicted present not recovered; binomial p = 0.008
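The binomial p-value here asks how likely it would be, by chance alone, for this many test presences to fall in the predicted-present area, given the proportion of the area predicted present. A sketch using scipy (an assumed dependency); the 20% area fraction is an illustrative value, not the one from the slide:

```python
# Hypothetical sketch: one-sided binomial test for a presence-only prediction.
from scipy.stats import binomtest

successes = 6          # test presences falling in the predicted-present area
n_test = 7             # total test presences
area_fraction = 0.20   # assumed proportion of the study area predicted present (illustrative)

result = binomtest(successes, n_test, p=area_fraction, alternative="greater")
print("binomial p =", round(result.pvalue, 4))
```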

Absence-only test statistics (confusion matrix as above)
Proportion of observed (or assumed) absences correctly predicted (or ‘specificity’, or ‘true negative fraction’): d/(b + d)

Absence-only test statistics (confusion matrix as above)
Proportion of observed (or assumed) absences correctly predicted (or ‘specificity’, or ‘true negative fraction’): d/(b + d)
Proportion of observed (or assumed) absences incorrectly predicted (or ‘commission rate’, or ‘false positive fraction’): b/(b + d)
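Mirroring the presence-only case, these depend only on the recorded (or assumed) absence records; a minimal sketch under assumed inputs:

```python
# Hypothetical sketch: specificity and commission rate from (assumed) absence records only.
import numpy as np

# 1 = absence record falls in the predicted-present area (illustrative values)
predicted_present_at_absences = np.array([0, 1, 0, 0, 1, 0, 0, 0])

commission_rate = predicted_present_at_absences.mean()   # b / (b + d)
specificity = 1.0 - commission_rate                      # d / (b + d)
print("specificity:", round(specificity, 3), " commission rate:", round(commission_rate, 3))
```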

AUC: a threshold-independent test statistic (confusion matrix as above)
sensitivity = a/(a + c)  (= 1 – omission rate)
specificity = d/(b + d)  (so 1 – specificity = b/(b + d), the fraction of absences predicted present)

Threshold-independent assessment: the Receiver Operating Characteristic (ROC) curve
[Figure, panels A, B, C: histograms of predicted probability of occurrence (frequency) for the set of ‘presences’ and the set of ‘absences’, and the resulting ROC curve plotting sensitivity against 1 – specificity; a “check out” link on the slide is truncated in the transcript]
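As a closing illustration (mine, not from the slides), the ROC curve and its AUC can be computed from continuous suitability scores without choosing any threshold; scikit-learn is an assumed dependency, and the data are illustrative:

```python
# Hypothetical sketch: threshold-independent evaluation with a ROC curve and AUC.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

observed = np.array([1, 1, 1, 1, 0, 0, 0, 0])                      # presences and (assumed) absences
suitability = np.array([0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1])   # continuous model output

fpr, tpr, thresholds = roc_curve(observed, suitability)   # fpr = 1 - specificity, tpr = sensitivity
auc = roc_auc_score(observed, suitability)
print("AUC =", round(auc, 3))   # 0.5 = no better than random ranking; 1.0 = perfect ranking
```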