A JMP® SCRIPT FOR GEOSTATISTICAL CLUSTER ANALYSIS OF MIXED DATA SETS WITH SPATIAL INFORMATION Steffen Brammer.



OBJECTIVE
Example 1: Find a location for your luxury car sales outlet
Data set with samples across the city at specific locations
Establish the mean for each suburb to identify affluent clientele
Figure labels: Location of outlet; No customers? Why?; Reality; Affluent suburb

Example 2: How much pesticide do you need to get rid of the bugs?
Data set with samples at specific locations
Establish the mean for each field
Strong bug population only in trees along roads -> remove tree lines from your statistics

Example 3: Image processing
Interpolation algorithm
Figure labels: Available pixels; Completed image; Reality

Example 4: Mine geology
High-grade gold mineralisation in quartz vein stockwork (pink lines)
Figure labels: Sample data; Domains

MIXED DATA
Single data set with two (or more) underlying populations that are independent of each other: these need to be separated into sub-sets ('domains') before any statistical analysis
Challenge: allocate samples within the range of overlap to the correct domain

DECOMPOSITION OF DATA
By statistical decomposition: various methods and algorithms are available
In a spatial framework (that is, samples are not randomly distributed, but a spatial relationship exists between samples), not only the value of a sample must be taken into account, but also its location
By manual creation of polygons to separate the various domains
By geostatistical decomposition, e.g. geostatistical cluster analysis
Romary, T., Rivoirard, J. et al. 2012. Domaining by Clustering Multivariate Geostatistical Data. In: Abrahamsen et al. (eds) Geostatistics Oslo 2012, pp. 455-466. Springer, Dordrecht.
Conventional geostatistical methods struggle or fail when the clusters are intertwined with irregular, discontinuous or complex geometries
New concept developed and applied using JMP®
Assumption 1: The distributions of the underlying populations are known; the outcome after decomposition must honour these distributions
Assumption 2: Populations occur in clusters with a certain degree of connectivity between their samples
Brammer, S. 2015. Domaining of long-tailed bimodal data-sets with statistical methods. In: The Danie Krige Geostatistical Conference. SAIMM, Johannesburg, pp. 281-286.
Brammer, S. 2015. A self-guiding domaining tool for long-tailed bi-modal data sets. In: Proceedings of the 17th annual conference of the International Association for Mathematical Geosciences. Sept 5-13, 2015, Freiberg (Saxony), Germany.

CONCEPT & METHODOLOGY
1st step
Establish statistical moments of underlying sample populations*: mean, spread, number of samples (a)
Build target histogram of expected outcome (b)
*assuming both populations are approx. normally distributed
Figure (a): Original sample data with small outlier population
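The first step can be illustrated with a minimal Python sketch (the talk itself uses a JMP script, which is not reproduced here). It assumes a normally distributed population, as stated on the slide, and turns a mean, a spread and a sample count into expected counts per histogram bin. The function names, the bin edges and the numbers in the example are illustrative assumptions, not values from the presentation.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def target_histogram(mean, sd, n_samples, bin_edges):
    """Expected number of samples per bin for a normal population.

    The counts are rounded, so they may not add up exactly to n_samples.
    """
    counts = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        probability = normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)
        counts.append(round(n_samples * probability))
    return counts


# Hypothetical outlier population: mean 10, spread 2, 200 samples.
edges = [float(e) for e in range(4, 17)]  # unit-wide bins from 4 to 16
target = target_histogram(mean=10.0, sd=2.0, n_samples=200, bin_edges=edges)
```

The resulting list acts as a set of bin capacities: a later search may only accept a sample if the bin its value falls into is not yet full.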

CONCEPT & METHODOLOGY (cont.)
2nd step
Build a continuous search path through sample grid
Pick random seed within upper domain
Follow progressively adjacent samples as long as they fit into the target histogram
Figure labels: Sample grid; Domains (reality), red dots = outlier domain; Sample grid (detail); Seed; Sample

CONCEPT & METHODOLOGY (cont.)
Search path stops when no sample in the neighbourhood fits into the target histogram
Outside the high-grade zone: lower tail of target histogram is filled up
Once the search is interrupted, repeat the search from a new random seed
Repeat the procedure until all samples potentially belonging to the upper domain are investigated
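The search path described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the author's script: samples are `(x, y, value)` tuples, the neighbourhood is a plain circle of radius `radius`, and among several fitting neighbours the nearest one is taken. The slides do not say how ties between candidates are resolved, so that rule, like all names here, is hypothetical.

```python
import math
from bisect import bisect_right


def bin_of(value, edges):
    """Index of the histogram bin containing value, or None if outside."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1


def follow_path(samples, seed, edges, target, filled, radius, visited):
    """Walk from the seed to adjacent samples while they fit the target.

    The caller must pass a seed whose value fits into the target histogram.
    `filled` and `visited` are updated in place so that a later call with a
    new seed continues from the current state.
    """
    path = []
    current = seed
    while current is not None:
        cx, cy, value = samples[current]
        filled[bin_of(value, edges)] += 1  # count the accepted sample
        visited.add(current)
        path.append(current)
        candidates = []
        for i, (x, y, v) in enumerate(samples):
            if i in visited:
                continue
            b = bin_of(v, edges)
            if b is None or filled[b] >= target[b]:
                continue  # value does not fit into the target histogram
            distance = math.hypot(x - cx, y - cy)
            if distance <= radius:
                candidates.append((distance, i))
        # Move to the nearest fitting neighbour, or stop if there is none.
        current = min(candidates)[1] if candidates else None
    return path


# Hypothetical one-dimensional line of samples with a high-value cluster.
values = [1.0, 1.0, 8.0, 9.0, 8.0, 1.0, 1.0, 9.0]
samples = [(float(i), 0.0, v) for i, v in enumerate(values)]
edges = [7.0, 8.0, 9.0, 10.0]
target = [0, 2, 2]
filled = [0, 0, 0]
visited = set()
path = follow_path(samples, 3, edges, target, filled, 2.0, visited)
```

In this toy case the path covers the three clustered high values and stops there; the isolated high value at the end of the line is out of reach and would only be picked up from a new seed.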

SCRIPT 1: ESTIMATION OF STATISTICAL MOMENTS
Original sample data with small outlier population
Input dialog 1 – estimated parameters (a)
Input dialog 2 – iteration parameters (b)

Calculate statistical moments for various distribution scenarios (a)
Fit distribution and assess goodness-of-fit (b)
Record critical parameters (c)

Iterate through all possible combinations in nested loops (d)
Rank output values by goodness-of-fit tests and choose best option as final result (e)
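A minimal Python sketch of such a nested-loop search is given below. The slides do not name the goodness-of-fit tests used, so the Kolmogorov-Smirnov distance here is a stand-in chosen for illustration; the two-component normal mixture, the candidate parameter lists and all function names are likewise assumptions.

```python
from statistics import NormalDist


def ks_distance(data, cdf):
    """Largest gap between the empirical CDF of data and a model CDF."""
    xs = sorted(data)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        gap = max(gap, abs(f - i / n), abs((i + 1) / n - f))
    return gap


def rank_scenarios(data, low_means, high_means, high_sds, fractions, low_sd):
    """Try every parameter combination and rank by goodness of fit."""
    results = []
    for m_lo in low_means:
        for m_hi in high_means:
            for s_hi in high_sds:
                for frac in fractions:
                    low = NormalDist(m_lo, low_sd)
                    high = NormalDist(m_hi, s_hi)

                    def mixture_cdf(x, low=low, high=high, frac=frac):
                        return (1.0 - frac) * low.cdf(x) + frac * high.cdf(x)

                    score = ks_distance(data, mixture_cdf)
                    results.append((score, m_lo, m_hi, s_hi, frac))
    results.sort()  # smallest distance (best fit) first
    return results


# Hypothetical data: 80 background samples and 20 outlier samples placed
# at evenly spaced quantiles so that the example is deterministic.
data = [NormalDist(2.0, 1.0).inv_cdf((i + 0.5) / 80) for i in range(80)]
data += [NormalDist(10.0, 1.5).inv_cdf((i + 0.5) / 20) for i in range(20)]

ranking = rank_scenarios(
    data,
    low_means=[1.0, 2.0, 3.0],
    high_means=[8.0, 10.0, 12.0],
    high_sds=[1.0, 1.5, 2.0],
    fractions=[0.1, 0.2, 0.3],
    low_sd=1.0,
)
best = ranking[0]
```

The first entry of `ranking` holds the best-fitting scenario; its mean, spread and outlier fraction are the kind of parameters that would be carried into the second script.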

SCRIPT 2: SEARCH PATH THROUGH SAMPLE GRID
Input dialog 1 – assign columns (a)
Input dialog 2 – statistical moments, as established by Script 1 (b)
Input dialog 3 – search parameters (c)

1. Set up target histogram for outlier population (a)
2. Set up rotation matrix for oriented search (b)
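An oriented search can be sketched in Python as below. The slides do not show the matrix or say whether the search is two- or three-dimensional, so this example assumes a two-dimensional elliptical neighbourhood whose major axis points `angle_deg` degrees counter-clockwise from the x axis. All names and numbers are illustrative.

```python
import math


def rotation_matrix(angle_deg):
    """Matrix that rotates map coordinates into the search-ellipse frame."""
    a = math.radians(angle_deg)
    return [[math.cos(a), math.sin(a)],
            [-math.sin(a), math.cos(a)]]


def in_search_ellipse(centre, point, angle_deg, major, minor):
    """True if point lies inside the oriented ellipse around centre."""
    dx = point[0] - centre[0]
    dy = point[1] - centre[1]
    r = rotation_matrix(angle_deg)
    along = r[0][0] * dx + r[0][1] * dy   # offset along the major axis
    across = r[1][0] * dx + r[1][1] * dy  # offset across the major axis
    return (along / major) ** 2 + (across / minor) ** 2 <= 1.0


# A neighbour lying along a 45 degree trend is accepted, while one at the
# same distance across the trend is rejected.
along_trend = in_search_ellipse((0.0, 0.0), (2.0, 2.0), 45.0, 3.0, 1.0)
across_trend = in_search_ellipse((0.0, 0.0), (2.0, -2.0), 45.0, 3.0, 1.0)
```

Rotating the offsets first keeps the neighbourhood test itself a plain axis-aligned ellipse check, whatever the search orientation.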

3. Select random seed sample from outlier population (c)
4. Select all samples within specified neighbourhood (d)

5. Select sample within specified neighbourhood that fits into target histogram (e)
6. Increase the number of samples of the respective histogram bin (e)
7. Go to selected sample and continue search at new location (e)
8. Continue search as long as the criteria of the target histogram are satisfied, then choose a new seed sample of the next cluster and repeat the search until the whole grid is investigated (f)

Post-processing to clean up results (g)
Repeat whole procedure several times to conduct cluster analysis with a variety of different seed samples and different search orientations (as the result of a single run depends on the random sequence of seed samples)

Results are given as probabilities for each sample to belong to the outlier population
Select specified number of samples with highest probabilities for final result
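One way to turn repeated runs into such probabilities is sketched below in Python. It assumes that each run returns the indices of the samples it assigned to the outlier population and that the probability is simply the share of runs selecting a sample. The slides do not state how the script computes it, so this rule and the function names are assumptions.

```python
def membership_probabilities(runs, n_samples):
    """Share of runs in which each sample was assigned to the outliers."""
    counts = [0] * n_samples
    for selected in runs:
        for i in set(selected):  # count each sample at most once per run
            counts[i] += 1
    return [c / len(runs) for c in counts]


def top_samples(probabilities, n_keep):
    """Indices of the n_keep samples with the highest probabilities."""
    order = sorted(range(len(probabilities)),
                   key=lambda i: (-probabilities[i], i))
    return sorted(order[:n_keep])


# Four hypothetical runs over a grid of eight samples.
runs = [[2, 3, 4], [3, 4], [3, 4, 7], [2, 3]]
probabilities = membership_probabilities(runs, n_samples=8)
final = top_samples(probabilities, n_keep=3)
```

Ties are broken by sample index here only to keep the example deterministic.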

FINAL RESULT
Figure panels: Reality; Result; Result after 10 runs; Result after 25 runs

How to do this without JMP®? No idea!
Thank you!