Geo479/579: Geostatistics Ch15. Cross Validation.


Why is Cross Validation Useful?
• Cross validation (CV) lets us compare estimated and true values using only the information available in the sample data set.
• CV can help us choose between different weighting procedures, search strategies, variogram models, or estimation methods.

Why is Cross Validation Useful? (cont.)
• In practice, CV results are often used simply to compare the distributions of the estimation errors (residuals) from different estimation procedures and to choose the procedure that performs better.
• A careful study of the spatial distribution of the cross-validated residuals (estimated minus true values) can show where an estimation procedure may run into trouble.

Cross Validation Method
• The sample value at a particular location is temporarily removed from the sample data set.

Cross Validation Method (cont.)
• The value at that location is then estimated using the remaining samples.
• Once the estimate is calculated, we can compare it to the true sample value that was removed from the sample data set.
• This procedure is repeated for all available samples.
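The remove, estimate, compare, repeat loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the estimator is a simple inverse-distance-weighted average standing in for whatever estimation method is being tested (it is not kriging), and the names `idw_estimate`, `leave_one_out`, and `toy_samples` are hypothetical.

```python
import math


def idw_estimate(target, samples, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y, value) samples.

    Stand-in estimator for illustration only; any other estimator with the
    same (target, samples) signature could be plugged in instead.
    """
    num = 0.0
    den = 0.0
    for x, y, v in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # coincident sample: use its value directly
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def leave_one_out(samples, estimator=idw_estimate):
    """Return one residual (estimate minus true value) per sample."""
    residuals = []
    for i, (x, y, true_value) in enumerate(samples):
        # Temporarily remove sample i, then estimate at its location.
        remaining = samples[:i] + samples[i + 1:]
        estimate = estimator((x, y), remaining)
        residuals.append(estimate - true_value)
    return residuals


# Hypothetical toy data: (x, y, value)
toy_samples = [
    (0.0, 0.0, 10.0),
    (1.0, 0.0, 12.0),
    (0.0, 1.0, 14.0),
    (1.0, 1.0, 20.0),
]
cv_residuals = leave_one_out(toy_samples)
```

Passing the estimator as an argument keeps the loop unchanged when comparing different estimation methods on the same sample set.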

CV as a Quantitative Tool
• Table 15.2 shows that ordinary kriging performs better: its estimation errors have a mean closer to 0 and less spread.
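A comparison of this kind rests on summary statistics of the residuals. The sketch below computes a few common ones; the function name `summarize_residuals` and the two residual lists are hypothetical and are not the values in Table 15.2.

```python
import statistics


def summarize_residuals(residuals):
    """Summary statistics for cross validation residuals (estimate minus true)."""
    return {
        "mean": statistics.fmean(residuals),                  # bias: closer to 0 is better
        "std": statistics.pstdev(residuals),                  # spread: smaller is better
        "mae": statistics.fmean(abs(r) for r in residuals),   # mean absolute error
        "mse": statistics.fmean(r * r for r in residuals),    # mean squared error
    }


# Hypothetical residuals from two estimation methods at the same sample locations
residuals_method_a = [12.0, -30.0, 45.0, -8.0, 21.0]
residuals_method_b = [5.0, -14.0, 18.0, -6.0, 2.0]

summary_a = summarize_residuals(residuals_method_a)
summary_b = summarize_residuals(residuals_method_b)
```

With these made-up numbers, method B has both a mean closer to 0 and a smaller spread, which is the pattern the slide describes for ordinary kriging.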

CV as a Quantitative Tool (cont.)
• Smoothing effect

CV as a Quantitative Tool (cont.)
• One factor that limits the conclusions that can legitimately be drawn from a cross validation exercise is the recurring problem of clustering.
• If the original sample data set is spatially clustered, then so are the cross-validated residuals. Some conclusions drawn from them may therefore apply to the entire map area, while others may not.

CV as a Qualitative Tool
• Figure 15.4 shows a map of the ordinary kriging residuals from the cross validation study. A “+” symbol indicates overestimation and a “-” symbol indicates underestimation.
• We prefer the residuals to be conditionally unbiased with respect to location. On this type of display, we hope to see the “+” and “-” symbols well mixed.

Type 1 and Type 2 Samples
• These are the two values of an indicator variable, T. The variable is explained on pp. 4-6, and its statistical and spatial distribution is displayed on pp. 73-75.

CV as a Qualitative Tool (cont.)
• In Figure 15.4 there is a fairly large patch of positive residuals around 110E, 180N.
• Most of the samples in this area are type 1 samples (type 1: T = 1; type 2: T = 2), so we need to consider how the ordinary kriging approach performs for the other type 1 samples.
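A patch of same-sign residuals like the one described above can also be checked numerically. The sketch below maps each residual to a “+” or “-” symbol and reports the share of positive residuals within a chosen radius of a location. The function names, the 10-unit radius, and the toy coordinates and residuals are all hypothetical; none of them come from Figure 15.4.

```python
import math


def sign_symbol(residual):
    """Map a residual (estimate minus true) to the symbol used on a residual map."""
    if residual > 0:
        return "+"  # overestimation
    if residual < 0:
        return "-"  # underestimation
    return "0"      # exact estimate


def positive_fraction_near(center, radius, coords, residuals):
    """Fraction of residuals within `radius` of `center` that are positive.

    Returns None when no sample falls inside the search circle.
    """
    nearby = [
        r
        for (x, y), r in zip(coords, residuals)
        if math.hypot(x - center[0], y - center[1]) <= radius
    ]
    if not nearby:
        return None
    return sum(1 for r in nearby if r > 0) / len(nearby)


# Hypothetical sample locations (easting, northing) and residuals
toy_coords = [(108.0, 178.0), (112.0, 182.0), (111.0, 179.0), (200.0, 50.0)]
toy_residuals = [35.0, 12.0, -4.0, -20.0]

symbols = [sign_symbol(r) for r in toy_residuals]
patch_fraction = positive_fraction_near((110.0, 180.0), 10.0, toy_coords, toy_residuals)
```

A fraction near 0.5 suggests well-mixed signs in that neighbourhood; a value near 1 or 0 flags a local patch of systematic over- or underestimation.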

CV as a Qualitative Tool (cont.)
• We focus on type 1 samples because of this specific goal. To improve the estimation, we expand the search radius from 25 m to 30 m. The residuals improve, as shown in Figure 15.6.
• CV can also be frustrating, since it often reveals problems that do not have straightforward solutions.

CV as a Goal-Oriented Tool
• Imagine that the Walker Lake data set represents an ore deposit, and suppose that the economic cutoff is 300 ppm: material with a grade greater than 300 ppm is classified as ore, and material with a grade less than 300 ppm is classified as waste.
• Figure 15.7: There are two types of misclassification, false negative errors (ore classified as waste) and false positive errors (waste classified as ore).
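Counting the two kinds of misclassification from cross validation pairs is straightforward. The sketch below uses the 300 ppm cutoff from the slide; the function name `classify_errors` and the toy values are hypothetical, and treating a grade of exactly 300 ppm as waste is an assumption, since the slide does not say how that case is handled.

```python
def classify_errors(true_values, estimates, cutoff=300.0):
    """Count correct and incorrect ore/waste classifications.

    Material is treated as ore when its grade is strictly greater than
    `cutoff`; a grade exactly at the cutoff counts as waste (an assumption).
    """
    counts = {
        "correct_ore": 0,
        "correct_waste": 0,
        "false_positive": 0,  # true waste estimated as ore
        "false_negative": 0,  # true ore estimated as waste
    }
    for true_value, estimate in zip(true_values, estimates):
        is_ore = true_value > cutoff
        estimated_ore = estimate > cutoff
        if is_ore and estimated_ore:
            counts["correct_ore"] += 1
        elif not is_ore and not estimated_ore:
            counts["correct_waste"] += 1
        elif estimated_ore:
            counts["false_positive"] += 1
        else:
            counts["false_negative"] += 1
    return counts


# Hypothetical true and cross-validated grades in ppm
toy_true = [350.0, 320.0, 250.0, 280.0, 310.0]
toy_estimates = [340.0, 290.0, 310.0, 260.0, 305.0]
error_counts = classify_errors(toy_true, toy_estimates)
```

Only the side of the cutoff matters here: the second sample counts as a false negative with a 30 ppm error, while the 20 ppm error on the fourth sample causes no misclassification at all.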

CV as a Goal-Oriented Tool (cont.)
• For applications in which misclassification has important consequences, minimizing misclassification may be a much more relevant criterion than the various statistical criteria.
• The magnitude of an estimation error is less important than whether it causes a misclassification.

Limitations of Cross Validation
• CV can generate pairs of true and estimated values only at sample locations.
• Clustering in the sample data set carries over to the residuals.
• In practice, the residuals may be representative of only certain regions or particular ranges of values.

Limitations of Cross Validation (cont.)
• The clustering problem can be overcome either by calculating a declustered mean of the residuals or by performing CV at a selected subset of locations that is representative of the entire study area.
• If very close samples will not be available in the actual estimation, it makes little sense to include them in CV.
• The problem areas identified by cross validation may warrant additional sampling, especially where estimation errors have major consequences.
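One way to compute a declustered mean of the residuals is cell declustering, sketched below. The slide does not say which declustering method is intended, so the choice of cell declustering is an assumption, as are the function names, the 10-unit cell size, and the toy data.

```python
import math
from collections import Counter


def cell_declustering_weights(coords, cell_size):
    """Weights that give every occupied grid cell the same total influence.

    Each sample gets weight 1 / (samples in its cell * number of occupied
    cells), so the weights sum to 1 and clustered samples are down-weighted.
    """
    cells = [
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in coords
    ]
    samples_per_cell = Counter(cells)
    n_occupied = len(samples_per_cell)
    return [1.0 / (samples_per_cell[c] * n_occupied) for c in cells]


def declustered_mean(values, weights):
    """Weighted mean of `values` using declustering `weights`."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical data: three clustered samples and one isolated sample
toy_coords = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (15.0, 15.0)]
toy_residuals = [6.0, 6.0, 6.0, -2.0]

weights = cell_declustering_weights(toy_coords, cell_size=10.0)
naive_mean = sum(toy_residuals) / len(toy_residuals)
weighted_mean = declustered_mean(toy_residuals, weights)
```

In this toy case the three clustered samples share one cell, so together they count as much as the single isolated sample, and the mean residual drops from 4.0 (naive) to 2.0 (declustered). The result depends on the cell size, which has to be chosen for the data at hand.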