Museum and Institute of Zoology PAS Warsaw Magdalena Żytomska Berlin, 6th September 2007.

Slides:



Advertisements
Similar presentations
DECISION TREES. Decision trees  One possible representation for hypotheses.
Advertisements

ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: The Linear Prediction Model The Autocorrelation Method Levinson and Durbin.
Data preprocessing before classification In Kennedy et al.: “Solving data mining problems”
Self Organization of a Massive Document Collection
Chapter 7 – Classification and Regression Trees
Chapter 7 – Classification and Regression Trees
Hidden Variables, the EM Algorithm, and Mixtures of Gaussians Computer Vision CS 143, Brown James Hays 02/22/11 Many slides from Derek Hoiem.
Hidden Variables, the EM Algorithm, and Mixtures of Gaussians Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 03/15/12.
Bioinformatics GIS Applications Anatoly Petrov.
Maxent interface.
Feature Selection for Regression Problems
1 Learning Entity Specific Models Stefan Niculescu Carnegie Mellon University November, 2003.
Basic Data Mining Techniques Chapter Decision Trees.
Lecture 9 Hidden Markov Models BioE 480 Sept 21, 2004.
Mutual Information for Image Registration and Feature Selection
Analysis of Simulation Input.. Simulation Machine n Simulation can be considered as an Engine with input and output as follows: Simulation Engine Input.
October 14, 2010Neural Networks Lecture 12: Backpropagation Examples 1 Example I: Predicting the Weather We decide (or experimentally determine) to use.
Modeling Effects of Anthropogenic Impact and Climate in the Distribution of Threatened and Endangered Species in Florida Background Protection of natural.
Optimization of thermal processes2007/2008 Optimization of thermal processes Maciej Marek Czestochowa University of Technology Institute of Thermal Machinery.
Methods in Medical Image Analysis Statistics of Pattern Recognition: Classification and Clustering Some content provided by Milos Hauskrecht, University.
CHAPTER 12 ADVANCED INTELLIGENT SYSTEMS © 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition, Turban, Aronson, and Liang.
Weed mapping tools and practical approaches – a review Prague February 2014 Weed mapping tools and practical approaches – a review Prague February 2014.
Single-Factor Experimental Designs
University of Ottawa - Bio 4118 – Applied Biostatistics © Antoine Morin and Scott Findlay 08/10/ :23 PM 1 Some basic statistical concepts, statistics.
Chapter 9 – Classification and Regression Trees
Basic Statistics for Engineers. Collection, presentation, interpretation and decision making. Prof. Dudley S. Finch.
Ryan DiGaudio Modified from Catherine Jarnevich, Sunil Kumar, Paul Evangelista, Jeff Morisette, Tom Stohlgren Maxent Overview.
Module 1: Statistical Issues in Micro simulation Paul Sousa.
Comparison of Bayesian Neural Networks with TMVA classifiers Richa Sharma, Vipin Bhatnagar Panjab University, Chandigarh India-CMS March, 2009 Meeting,
Candidate KBA Identification: Modeling Techniques for Field Survey Prioritization Species Distribution Modeling: approximation of species ecological niche.
GEOSTATISICAL ANALYSIS Course: Special Topics in Remote Sensing & GIS Mirza Muhammad Waqar Contact: EXT:2257.
Role of Spatial Database in Biodiversity Conservation Planning Sham Davande, GIS Expert Arid Communities Technologies, Bhuj 11 September, 2015.
Today Ensemble Methods. Recap of the course. Classifier Fusion
Selecting Input Probability Distribution. Simulation Machine Simulation can be considered as an Engine with input and output as follows: Simulation Engine.
CROSS-VALIDATION AND MODEL SELECTION Many Slides are from: Dr. Thomas Jensen -Expedia.com and Prof. Olga Veksler - CS Learning and Computer Vision.
Enrique Martínez-Meyer
Model Selection and Validation. Model-Building Process 1. Data collection and preparation 2. Reduction of explanatory or predictor variables (for exploratory.
Simulation is the process of studying the behavior of a real system by using a model that replicates the system under different scenarios. A simulation.
Statistics What is statistics? Where are statistics used?
IMPORTANCE OF STATISTICS MR.CHITHRAVEL.V ASST.PROFESSOR ACN.
1 1 Slide Simulation Professor Ahmadi. 2 2 Slide Simulation Chapter Outline n Computer Simulation n Simulation Modeling n Random Variables and Pseudo-Random.
Random Forests Ujjwol Subedi. Introduction What is Random Tree? ◦ Is a tree constructed randomly from a set of possible trees having K random features.
Ryan DiGaudio Modified from Catherine Jarnevich, Sunil Kumar, Paul Evangelista, Jeff Morisette, Tom Stohlgren Maxent Overview.
Mapping distributions of marine organisms using environmental niche modelling - AquaMaps K. Kaschner, J. Ready, S. Kullander, R. Froese and many more….INCOFISH,
Tutorial I: Missing Value Analysis
September 28, 2000 Improved Simultaneous Data Reconciliation, Bias Detection and Identification Using Mixed Integer Optimization Methods Presented by:
MATHS CORE AND OPTIONAL ASSESSMENT STANDARDS Found in the Subject Assessment Guidelines (SAG) Dated January 2008.
Hidden Variables, the EM Algorithm, and Mixtures of Gaussians Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 02/22/11.
Intro. ANN & Fuzzy Systems Lecture 16. Classification (II): Practical Considerations.
Computacion Inteligente Least-Square Methods for System Identification.
Chapter 11 – Neural Nets © Galit Shmueli and Peter Bruce 2010 Data Mining for Business Intelligence Shmueli, Patel & Bruce.
Data Mining: Concepts and Techniques1 Prediction Prediction vs. classification Classification predicts categorical class label Prediction predicts continuous-valued.
Computer Simulation Henry C. Co Technology and Operations Management,
INTRODUCTION TO STATISTICS
Chapter 7. Classification and Prediction
Table 1. Advantages and Disadvantages of Traditional DM/ML Methods
Introduction to Summary Statistics
Introduction to Summary Statistics
Chapter 10 Verification and Validation of Simulation Models
Morgan Bruns1, Chris Paredis1, and Scott Ferson2
Introduction to Summary Statistics
Introduction to Summary Statistics
Neural Networks and Their Application in the Fields of Coporate Finance By Eric Séverin Hanna Viinikainen.
Introduction to Summary Statistics
Introduction to Summary Statistics
Introduction to Summary Statistics
On Maxent Jorge Soberon University of Kansas.
Multi-Information Based GCPs Selection Method
Introduction to Summary Statistics
More on Maxent Env. Variable importance:
Presentation transcript:

Museum and Institute of Zoology PAS Warsaw Magdalena Żytomska Berlin, 6th September 2007

MIIZ participation in EDIT Predictive distribution modelling (BGBM) Organise an eConference on predictive modelling techniques (BGBM 0.5 PM/MIIZ 0.5 PM) Trial maintenance and user documentation (MIIZ /HNHM ) Compile test data (MIIZ 4 PM/HNHM 4.5 PM) Software installation and testing (MIIZ 3.5 PM) Bug reports (MIIZ 2 PM/HNHM 4.5 PM) Deliverables D5.4.3 eConference on possibilities and drawbacks of modelling techniques (Month 19) (BGBM /MIIZ). D5.4.5 Predictive distribution modelling report (Month 21) (BGBM /MIIZ) D Test data report (Month 25) (MIIZ/HNHM)

Predictive distribution modelling Input data:  sample record data, point vector layer  background (environmental data), raster data (usualy in ASCII format)

Input data If environmental layers come from different sources, they have to meet specific conditions: the same spatial resolution, same coordinates of each pixel (initial pixel), accuracy of preserving spatial resolution or coordinates must be the same, each layer must contain equal number of pixels.

Choosing background data Histogram analysis as a method of selecting a subset of environmetal layers which have most significant impact on species distribution. Histogram shows the frequency distribution of values of each climatic parameter throughout the species’ known range:  Normal or skewed distribution - parameter may be an important influence on species distribution,  Truncated distribution – parameter is irrelevant, species could tolerate other values of this parameter that were not included in the species climatic envelope,  No pattern distribution - parameter does not appear to influence the species distribution

Predictive modelling algorithms. BIOCLIM Is based on a boxcar environmental envelope algorithm. The minimum and maximum values for each environmental predictor define the multidimensional environmental box where the element is known to occur. Study area sites that hale environmental conditions within the boundaries of this multidimensional box are predicted as potential sites of occupancy.

Predictive modelling algorithms. BIOCLIM Advantages:  uses presence occurrence data only,  suitable when the number of known records is extremely low (BioClim is particularly useful modeling system for use with threatened species),  model easily interpreted and represented as a predicted distribution map.

Predictive modelling algorithms. BIOCLIM Disadvantages:  a tendency to over-predict,  does not address potential correlations and interactions among environmental variables,  gives equal weight to all environmental predictors,  sensitive to outliers and sampling bias,  cannot use categorical data,  no procedure for variable selection,

Predictive modelling algorithms. MAXENT- maximum entropy approach MaxEnt estimates the most uniform distribution (maximum entropy) of the occurrence points across the study area given the constraint that the expected value of each environmental predictor variable under this estimated distribution matches its empirical average (average values for the set occurrence data). The program starts with an uniform probability distribution and iteratively altering one weight at a time to maximize the likelihood to reach the optimum probability distribution. The algorithm is guaranteed to converge and therefore the outputs are deterministic.

Predictive modelling algorithms. MAXENT Advantages:  uses presence occurrence data only,  probability distribution mathematically defined therefore model formulation relatively transparent,  can consider interactions between environmental variables,  provides ability to consider polynomial transformations of the environmental predictors,  potential to investigate the influence each environmental predictor has on the elements distribution,  relatively easy to run, stand alone software,  seems to perform relatively well with small sample sizes of occurrence data.

Predictive modelling algorithms. MAXENT Disadvantages:  no procedure for variable selection,  extremely computer intensive,  limited experiments investigating potential weaknesses when dealing with biased sampling,

MaxENT output map

Predictive modelling algorithms. Desktop GARP Genetic Algorithm for Rule-set Prediction (GARP) uses several predictive modelling algorithms. A GARP run begins by dividing the element occurrence data set into two subsets: training and test dataset.The first rule is generated by applying one of the four algorithms and evaluating the omission and commission errors. In the next iteration it resamples the occurrence data again applying another algorithm to create another rule. The model is then evaluated and changes in prediction accuracy will determine whether to incorporate or disregard the rule from the rule set. This process is repeated until it cannot create a better model or it has reached the maximum number of iterations set by the user

Predictive modelling algorithms. Desktop GARP: Advantages:  uses presence occurrence data only,  relatively easy to run, stand alone software,  theoretically GARP should perform better than individual implementations of the algorithms it employs, since it searches for and applies only the most appropriate ‘rules’.

Predictive modelling algorithms. Desktop GARP Disadvantages:  not easily interpreted, a black box,  prediction maps not deterministic, outputs will be different between GARP runs even when using the same occurrence data,  generates pseudo-absences and does not allow one to substitute collected absence data,  tendency for commission errors,  no procedure for variable selection.

Desktop GARP output map