Indian Institute of Technology Bombay
Bayesian Probabilistic or Weights of Evidence Model for Mineral Prospectivity Mapping

Indian Institute of Technology Bombay
Probabilistic model (Weights of Evidence)

What is needed for the WofE calculations?
- A training point layer, i.e. known mineral deposits
- One or more predictor maps in raster format

Indian Institute of Technology Bombay
PROBABILISTIC MODELS (Weights of Evidence or WofE)

Four steps:
1. Convert multiclass maps to binary maps
2. Calculate the prior probability
3. Calculate the weights of evidence (conditional probabilities) for each predictor map
4. Combine the weights

Indian Institute of Technology Bombay
Calculation of Prior Probability

The prior probability is the probability of occurrence of the targeted mineral deposit type when no other geological information about the area is available or considered.

[Figure: a 10 km x 10 km study area (S) containing 10 target deposits (D), divided into 1 km unit cells]

Assuming:
1. Unit cell size = 1 sq km
2. Each deposit occupies 1 unit cell

Total study area = Area(S) = 10 km x 10 km = 100 sq km = 100 unit cells
Area where deposits are present = Area(D) = 10 unit cells
Prior probability of occurrence of deposits = P{D} = Area(D)/Area(S) = 10/100 = 0.1
Prior odds of occurrence of deposits = P{D}/(1 - P{D}) = 0.1/0.9 = 0.11
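
A minimal Python sketch of this step, using the unit-cell counts from the slide (the variable names are mine):

```python
# Prior probability and prior odds from unit-cell counts.
n_total_cells = 100    # 10 km x 10 km study area, 1 sq km unit cells
n_deposit_cells = 10   # one unit cell per known deposit

p_prior = n_deposit_cells / n_total_cells   # P{D} = 0.1
o_prior = p_prior / (1.0 - p_prior)         # prior odds ~ 0.11
print(f"P(D) = {p_prior:.2f}, O(D) = {o_prior:.2f}")
```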

Indian Institute of Technology Bombay
Convert multiclass maps into binary maps

Define a threshold value and use the threshold to reclassify the multiclass map into a binary map.

[Figure: a multiclass map and the binary map derived from it]

Indian Institute of Technology Bombay
Convert multiclass maps into binary maps

How do we define the threshold? Use the distance at which there is maximum spatial association as the threshold!

Indian Institute of Technology Bombay
Convert multiclass maps into binary maps

Spatial association: the spatial correlation of deposit locations with a geological feature.

[Figure: a 10 km x 10 km study area (S), divided into 1 km unit cells, with polygons A, B, C and D and gold deposits (D)]

Indian Institute of Technology Bombay
Convert multiclass maps into binary maps

Which polygon has the highest spatial association with D? More importantly, does any polygon have a positive spatial association with D?

- What is the expected distribution of deposits in each polygon, assuming that they were randomly distributed?
- What is the observed distribution of deposits in each polygon?

Positive spatial association: more deposits in a polygon than you would expect if the deposits were randomly distributed.
If observed >> expected: positive association
If observed = expected: no association
If observed << expected: negative association

Indian Institute of Technology Bombay
Convert multiclass maps into binary maps

OBSERVED DISTRIBUTION
Area(A) = n(A) = 25; n(D|A) = 2
Area(B) = n(B) = 21; n(D|B) = 2
Area(C) = n(C) = 7; n(D|C) = 2
Area(D) = n(D) = 47; n(D|D) = 4 (here D is the polygon, not the deposit set)
Area(S) = n(S) = 100; n(D) = 10 deposits

Indian Institute of Technology Bombay
Convert multiclass maps into binary maps

EXPECTED DISTRIBUTION
Expected number of deposits in A = (Area(A)/Area(S)) x total number of deposits

Area(A) = n(A) = 25; expected n(D|A) = 2.5
Area(B) = n(B) = 21; expected n(D|B) = 2.1
Area(C) = n(C) = 7; expected n(D|C) = 0.7
Area(D) = n(D) = 47; expected n(D|D) = 4.7
(Area(S) = n(S) = 100; n(D) = 10)

Indian Institute of Technology Bombay
Convert multiclass maps into binary maps

Polygon | Area (cells) | Expected deposits | Observed deposits
A       | 25           | 2.5               | 2
B       | 21           | 2.1               | 2
C       | 7            | 0.7               | 2
D       | 47           | 4.7               | 4
(n(S) = 100 cells; n(D) = 10 deposits)

Only C has a positive association! So A, B and D are classified as 0; C is classified as 1.

Another way of calculating the spatial association:
= observed proportion of deposits / expected proportion of deposits
= proportion of deposits in the polygon / proportion of the area of the polygon
= [n(D|A)/n(D)] / [n(A)/n(S)]
Positive if this ratio > 1
Nil if this ratio = 1
Negative if this ratio < 1
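
The same comparison in a few lines of Python (the counts are from the slide; the variable names are mine):

```python
# Observed vs. expected deposits per polygon, and the association ratio.
n_S, n_D_total = 100, 10   # study-area cells and total deposits

# polygon -> (area in unit cells, observed deposits)
polygons = {"A": (25, 2), "B": (21, 2), "C": (7, 2), "D": (47, 4)}

for name, (area, observed) in polygons.items():
    expected = (area / n_S) * n_D_total               # random-distribution baseline
    ratio = (observed / n_D_total) / (area / n_S)     # observed/expected proportions
    label = "positive" if ratio > 1 else ("nil" if ratio == 1 else "negative")
    print(f"{name}: expected = {expected:.1f}, observed = {observed}, "
          f"ratio = {ratio:.2f} -> {label}")
# Only polygon C has ratio > 1 (about 2.86), so C is reclassified to 1.
```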

Indian Institute of Technology Bombay
Convert multiclass maps into binary maps - Line features

[Figure: a 10 km x 10 km study area (S), divided into 1 km unit cells, with a linear feature L (fault) and gold deposits (D)]

Indian Institute of Technology Bombay
Convert multiclass maps into binary maps - Line features

[Table: distance from the fault; no. of pixels; no. of deposits; ratio (observed to expected)]

Indian Institute of Technology Bombay
Convert multiclass maps into binary maps - Line features

Calculate the observed vs. expected distribution of deposits for cumulative distances from the fault.

[Table: distance from the fault; no. of pixels; cumulative no. of pixels; no. of deposits; cumulative no. of deposits; ratio (observed to expected)]

Ratio > 1: positive association (reclassified into 1)
Ratio =< 1: negative or no association (reclassified into 0)
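
A hedged Python sketch of the cumulative-distance procedure. The buffer counts below are hypothetical placeholders (the slide's table values are not in the transcript); only n(S) = 100, n(D) = 10, and the procedure itself come from the slides:

```python
# Cumulative-distance test for a line feature (fault) with made-up counts.
n_S, n_D = 100, 10

# (cumulative distance in km, cumulative pixels, cumulative deposits) - hypothetical
cumulative = [(1, 18, 4), (2, 35, 7), (3, 55, 8), (4, 78, 9), (5, 100, 10)]

ratios = []
for dist, cum_pixels, cum_deposits in cumulative:
    # observed proportion of deposits / expected proportion (area proportion)
    ratio = (cum_deposits / n_D) / (cum_pixels / n_S)
    ratios.append((dist, ratio))
    print(f"<= {dist} km: observed/expected = {ratio:.2f}")

# Threshold = distance of maximum spatial association; pixels within it
# are reclassified to 1, the rest to 0.
threshold_km = max(ratios, key=lambda r: r[1])[0]
print(f"Binary threshold: {threshold_km} km")
```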

Indian Institute of Technology Bombay
Calculation of Weights of Evidence

Weights of evidence ~ quantified spatial associations of deposits with geological features.

[Figure: a 10 km x 10 km study area (S), divided into 1 km unit cells, with target deposits and geological features B1 and B2]

Objective: to estimate the probability of occurrence of D in each unit cell of the study area.
Approach: use BAYES' THEOREM to update the prior probability of the occurrence of a mineral deposit to a posterior probability, based on the conditional probabilities (or weights of evidence) of the geological features.

Indian Institute of Technology Bombay
Calculation of Weights of Evidence

Bayes' theorem (D = deposit, B = geological feature; ~B denotes the absence of B):

Posterior probability of D given the presence of B:
P{D|B} = P{D & B}/P{B} = P{D} x P{B|D}/P{B}

Posterior probability of D given the absence of B:
P{D|~B} = P{D & ~B}/P{~B} = P{D} x P{~B|D}/P{~B}

THE BAYES EQUATION ESTIMATES THE PROBABILITY OF A DEPOSIT GIVEN THE GEOLOGICAL FEATURE (THE INFERENCE) FROM THE PROBABILITY OF THE FEATURE GIVEN THE DEPOSITS (THE OBSERVATION).

Indian Institute of Technology Bombay
EXERCISE

It has been observed that, on average, 100 gold deposits occur in a 10,000 sq km area of a specific geological setting. In such areas, 80% of the deposits occur in ultramafic (UM) rocks; however, 9.6% of the barren areas also occur in ultramafic rocks. You are exploring a 1 sq km area of an Archaean geological province with ultramafic rocks. What is the probability that the area will contain a gold deposit? Assume that a gold deposit occupies a 1 sq km area.

P(D|UM) = P(D) x [P(UM|D) / P(UM)]
P(D) = n(D)/n(S)
P(UM|D) = n(UM & D)/n(D)
P(UM) = n(UM)/n(S)

n(S) = 10,000
n(D) = 100
n(UM & D) = 80% of 100 = 80
n(UM) = 80 + 9.6% of (10,000 - 100) = 80 + 950.4 = 1030.4

P(D|UM) = (100/10,000) x [(80/100)/(1030.4/10,000)] = 0.0776
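
A worked Python version of this exercise (all figures come from the slide; the variable names are mine):

```python
# Bayes' theorem applied to the ultramafic (UM) exercise.
n_S = 10_000                          # study area, sq km (= unit cells)
n_D = 100                             # gold deposits
n_UM_D = 0.80 * n_D                   # deposit cells on ultramafics: 80
n_UM = n_UM_D + 0.096 * (n_S - n_D)   # 80 + 950.4 = 1030.4 UM cells

p_D = n_D / n_S                   # P(D) = 0.01
p_UM_given_D = n_UM_D / n_D       # P(UM|D) = 0.8
p_UM = n_UM / n_S                 # P(UM) = 0.10304

p_D_given_UM = p_D * p_UM_given_D / p_UM   # Bayes' theorem
print(f"P(D|UM) = {p_D_given_UM:.4f}")     # ~0.0776
```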

Indian Institute of Technology Bombay
Calculation of Weights of Evidence

Using the odds (O = P/(1 - P)) formulation (~D denotes a barren, i.e. non-deposit, cell):

Odds of D given the presence of B:
O{D|B} = O{D} x P{B|D}/P{B|~D}

Odds of D given the absence of B:
O{D|~B} = O{D} x P{~B|D}/P{~B|~D}

Taking logs on both sides:

Log_e(O{D|B}) = Log_e(O{D}) + log_e[P{B|D}/P{B|~D}]
where log_e[P{B|D}/P{B|~D}] is the positive weight of evidence (W+)

Log_e(O{D|~B}) = Log_e(O{D}) + log_e[P{~B|D}/P{~B|~D}]
where log_e[P{~B|D}/P{~B|~D}] is the negative weight of evidence (W-)

Indian Institute of Technology Bombay
Calculation of Contrast

The contrast (C) measures the net strength of the spatial association between the geological feature and the mineral deposits:

Contrast = W+ - W-

Positive contrast: net positive spatial association
Negative contrast: net negative spatial association
Zero contrast: no spatial association

The contrast can be used to test spatial associations.

Indian Institute of Technology Bombay
Calculation of Probability

Probabilities are estimated as area (or unit-cell) proportions:

Total number of cells in the study area: n(S)
Total number of cells occupied by deposits (D): n(D)
Total number of cells occupied by the feature (B): n(B)
Total number of cells occupied by both feature and deposit: n(B & D)

P{D} = n(D)/n(S)
P{B|D} = n(B & D)/n(D)
P{B|~D} = n(B & ~D)/n(~D)

Indian Institute of Technology Bombay
Calculation of Weights of Evidence

Basic quantities for estimating the weights of evidence:
Total number of cells in the study area: n(S)
Total number of cells occupied by deposits (D): n(D)
Total number of cells occupied by the feature (B): n(B)
Total number of cells occupied by both feature and deposit: n(B & D)

Derived quantities for estimating the weights of evidence:
Total number of cells not occupied by D: n(~D) = n(S) - n(D)
Total number of cells not occupied by B: n(~B) = n(S) - n(B)
Total number of cells occupied by B but not D: n(B & ~D) = n(B) - n(B & D)
Total number of cells occupied by D but not B: n(~B & D) = n(D) - n(B & D)
Total number of cells occupied by neither B nor D: n(~B & ~D) = n(S) - n(B) - n(D) + n(B & D)

Probabilities are estimated as area (or unit-cell) proportions:

W+ = log_e[P{B|D}/P{B|~D}], where P{B|D} = n(B & D)/n(D) and P{B|~D} = n(B & ~D)/n(~D)
W- = log_e[P{~B|D}/P{~B|~D}], where P{~B|D} = n(~B & D)/n(D) and P{~B|~D} = n(~B & ~D)/n(~D)

Indian Institute of Technology Bombay
Exercise

Unit cell size = 1 sq km, and each deposit occupies 1 unit cell.

n(S) = 100
n(D) = 10
n(B1) = 16
n(B2) = 25
n(B1 & D) = 4
n(B2 & D) = 3

Calculate the weights of evidence (W+ and W-) and the contrast values for B1 and B2, using:

W+ = log_e[P{B|D}/P{B|~D}]
W- = log_e[P{~B|D}/P{~B|~D}]

where
P{B|D} = n(B & D)/n(D)
P{B|~D} = [n(B) - n(B & D)] / [n(S) - n(D)]
P{~B|D} = [n(D) - n(B & D)] / n(D)
P{~B|~D} = [n(S) - n(B) - n(D) + n(B & D)] / [n(S) - n(D)]
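
A minimal Python sketch of these formulas, applied to the exercise counts (the function and variable names are mine; the arithmetic follows the definitions above):

```python
# Compute W+, W- and Contrast from unit-cell counts, then apply the
# function to the exercise values for B1 and B2.
from math import log

def weights_of_evidence(n_S, n_D, n_B, n_BD):
    """Return (W+, W-, Contrast) for one binary predictor map."""
    n_not_D = n_S - n_D
    p_B_given_D = n_BD / n_D                                  # P{B|D}
    p_B_given_not_D = (n_B - n_BD) / n_not_D                  # P{B|~D}
    p_not_B_given_D = (n_D - n_BD) / n_D                      # P{~B|D}
    p_not_B_given_not_D = (n_S - n_B - n_D + n_BD) / n_not_D  # P{~B|~D}
    w_plus = log(p_B_given_D / p_B_given_not_D)
    w_minus = log(p_not_B_given_D / p_not_B_given_not_D)
    return w_plus, w_minus, w_plus - w_minus

for name, n_B, n_BD in [("B1", 16, 4), ("B2", 25, 3)]:
    w_plus, w_minus, contrast = weights_of_evidence(100, 10, n_B, n_BD)
    print(f"{name}: W+ = {w_plus:.4f}, W- = {w_minus:.4f}, C = {contrast:.4f}")
# Expected output:
# B1: W+ = 1.0986, W- = -0.3677, C = 1.4663
# B2: W+ = 0.2048, W- = -0.0764, C = 0.2812
```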

Indian Institute of Technology Bombay
Combining Weights of Evidence: Posterior Probability

For a single feature B:
Log_e(O{D|B}) = Log_e(O{D}) + W+_B
Log_e(O{D|~B}) = Log_e(O{D}) + W-_B

Assuming conditional independence of the geological features B1 and B2, the posterior probability of D given B1 and B2 can be estimated using:

Log_e(O{D|B1, B2}) = Log_e(O{D}) + W+_B1 + W+_B2 (probability of D given the presence of B1 and B2)
Log_e(O{D|~B1, B2}) = Log_e(O{D}) + W-_B1 + W+_B2 (probability of D given the absence of B1 and presence of B2)
Log_e(O{D|B1, ~B2}) = Log_e(O{D}) + W+_B1 + W-_B2 (probability of D given the presence of B1 and absence of B2)
Log_e(O{D|~B1, ~B2}) = Log_e(O{D}) + W-_B1 + W-_B2 (probability of D given the absence of B1 and B2)

Or, in general, for n geological features:

Log_e(O{D|B1, B2, ..., Bn}) = Log_e(O{D}) + sum over i = 1..n of W+/-_Bi

The sign of W is positive or negative depending on whether the feature is present or absent. The odds are converted back to a posterior probability using the relation P = O/(1 + O).
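
A short Python sketch of this combination step (a minimal illustration, not a full GIS workflow; the weight values plugged in below are the ones computed for the exercise above):

```python
# Combine prior odds with W+ / W- terms under conditional independence,
# then convert the posterior odds back to a probability.
from math import log, exp

def posterior_probability(prior_odds, layers):
    """layers: iterable of (w_plus, w_minus, feature_present) tuples."""
    log_odds = log(prior_odds)
    for w_plus, w_minus, present in layers:
        log_odds += w_plus if present else w_minus
    odds = exp(log_odds)
    return odds / (1.0 + odds)   # P = O / (1 + O)

# B1 and B2 weights from the exercise; both features present:
layers = [(1.0986, -0.3677, True), (0.2048, -0.0764, True)]
print(f"P(D | B1, B2) = {posterior_probability(0.11, layers):.4f}")  # ~0.2883
```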

Indian Institute of Technology Bombay
Combining Weights of Evidence: Posterior Probability

Log_e(O{D|B1, B2, ..., Bn}) = Log_e(O{D}) + sum over i = 1..n of W+/-_Bi

Calculating the posterior probability (or odds) requires:
- the prior probability (or odds) of occurrence of deposits in the study area, and
- the weights of evidence of all geological features, i.e.
  W+ = log_e[P{B|D}/P{B|~D}] and W- = log_e[P{~B|D}/P{~B|~D}]

Indian Institute of Technology Bombay
Combining Weights of Evidence: Posterior Probability

Log_e(O{D|B1, B2}) = Log_e(O{D}) + W+/-_B1 + W+/-_B2
Prior probability = 0.10; prior odds = 0.11
Log_e(O{D}) = Log_e(0.11) = -2.2073

Calculate the posterior probability given:
1. Presence of B1 and B2
2. Presence of B1 and absence of B2
3. Absence of B1 and presence of B2
4. Absence of both B1 and B2

Indian Institute of Technology Bombay
Combining Weights of Evidence: Posterior Probability

Log_e(O{D}) = Log_e(0.11) = -2.2073
Weights from the exercise: W+_B1 = 1.0986, W-_B1 = -0.3677, W+_B2 = 0.2048, W-_B2 = -0.0764

For the areas where both B1 and B2 are present:
Log_e(O{D|B1, B2}) = -2.2073 + W+_B1 + W+_B2 = -0.9039
O{D|B1, B2} = e^(-0.9039) = 0.4050
P = O/(1 + O) = 0.4050/1.4050 = 0.2883

For the areas where B1 is present but B2 is absent:
Log_e(O{D|B1, ~B2}) = -2.2073 + W+_B1 + W-_B2 = -1.1851
O{D|B1, ~B2} = e^(-1.1851) = 0.3058
P = O/(1 + O) = 0.3058/1.3058 = 0.2342

For the areas where B1 is absent but B2 is present:
Log_e(O{D|~B1, B2}) = -2.2073 + W-_B1 + W+_B2 = -2.3702
O{D|~B1, B2} = e^(-2.3702) = 0.0934
P = O/(1 + O) = 0.0934/1.0934 = 0.0854

For the areas where both B1 and B2 are absent:
Log_e(O{D|~B1, ~B2}) = -2.2073 + W-_B1 + W-_B2 = -2.6514
O{D|~B1, ~B2} = e^(-2.6514) = 0.0705
P = O/(1 + O) = 0.0705/1.0705 = 0.0659

The posterior probabilities, mapped over the study area, give the prospectivity map.
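
As a quick consistency check, a few lines of Python reproducing the odds-to-probability conversions above (the odds values are the ones just worked out):

```python
# Verify P = O/(1+O) for the four posterior odds on this slide.
cases = {
    "B1 present, B2 present": 0.4050,
    "B1 present, B2 absent":  0.3058,
    "B1 absent,  B2 present": 0.0934,
    "B1 absent,  B2 absent":  0.0705,
}
for case, odds in cases.items():
    print(f"{case}: P = {odds / (1 + odds):.4f}")
# 0.2883, 0.2342, 0.0854, 0.0659
```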