Analysis of uncertain data: Evaluation of given hypotheses; Selection of probes for information gathering. Anatole Gershman, Eugene Fink, Bin Fu, and Jaime G. Carbonell.

Presentation transcript:

Analysis of uncertain data: Evaluation of given hypotheses; Selection of probes for information gathering. Anatole Gershman, Eugene Fink, Bin Fu, and Jaime G. Carbonell

Example. The analyst has to distinguish between two hypotheses: (1) Retires; (2) Joins Vikings.

Example. Observations: (1) According to many rumors, quarterback Brett Favre has closed on the purchase of a home in Eden Prairie, MN, where the Minnesota Vikings' team facility is located. (2) Without the tearful public ceremony that accompanied his retirement announcement from the Green Bay Packers just 11 months ago, quarterback Brett Favre has told the New York Jets he is retiring. (3) Minnesota coach Brad Childress was jilted at the altar Tuesday afternoon by Brett Favre, who told him he was not going to play for the Vikings in 2009.

Example. Observation distributions: "Without the tearful public ceremony that accompanied his retirement announcement from the Green Bay Packers just 11 months ago, quarterback Brett Favre has told the New York Jets he is retiring." P(says retire | Retires) = 0.9; P(says retire | Joins Vikings) = 0.6. Bayesian induction: P(Retires | says retire) = P(Retires) ∙ P(says retire | Retires) / (P(Retires) ∙ P(says retire | Retires) + P(Joins Vikings) ∙ P(says retire | Joins Vikings)) = 0.5.
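As a check on this update, here is a minimal Python sketch. The priors (0.4 for Retires, 0.6 for Joins Vikings) are an assumption chosen to be consistent with the 0.5 posterior on the slide; the transcript itself does not state them.

```python
# Minimal sketch of the Bayesian update on this slide.
# Priors are assumed (0.4 Retires, 0.6 Joins Vikings); likelihoods are from the slide.
priors = {"Retires": 0.4, "Joins Vikings": 0.6}
p_says_retire = {"Retires": 0.9, "Joins Vikings": 0.6}

# P(says retire) = sum over hypotheses of P(H) * P(says retire | H)
evidence = sum(priors[h] * p_says_retire[h] for h in priors)
posterior_retires = priors["Retires"] * p_says_retire["Retires"] / evidence
print(round(posterior_retires, 3))  # 0.5, matching the value on the slide
```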

General problem. We have to distinguish among n mutually exclusive hypotheses, denoted H1, H2, …, Hn. For every hypothesis, we know its prior; thus, we have an array of n priors: P(H1), P(H2), …, P(Hn) (0.6 and 0.4 in the two-hypothesis example).

General problem. We base the analysis on m observable features, denoted OBS1, OBS2, …, OBSm. Each observation is a variable that takes one of several discrete values. For every observation OBSa, we know the number of its possible values, num[a]; thus, we have an array num[1..m] with the number of values for each observation (in the example, OBS1 has num[1] = 2 values: "I will RETIRE!" and "I won't RETIRE!"). For every hypothesis, we know the related probability distribution of each observation: P(o_a,j | H_i) represents the probabilities of the possible values of OBSa. We also know a specific value of each observation, val[1..m].

General problem. We have to evaluate the posterior probabilities of the n given hypotheses, denoted Post(H1), Post(H2), …, Post(Hn).
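To make the setup concrete, here is a small Python sketch that computes these posteriors from the priors and the distribution of a single observation. The names echo the slides' notation; the numbers in the usage example repeat the Favre example above.

```python
def posteriors_single_obs(priors, obs_dist, observed_value):
    """Posterior Post(H_i) for each given hypothesis, from one observation.

    priors[i]      -- P(H_i)
    obs_dist[i][j] -- P(OBS_a = j | H_i) for the chosen observation OBS_a
    observed_value -- index j of the value actually observed (val[a] on the slides)
    """
    weights = [p * dist[observed_value] for p, dist in zip(priors, obs_dist)]
    total = sum(weights)  # P(val): overall likelihood of the observed value
    return [w / total for w in weights]

# Favre example: hypotheses (Retires, Joins Vikings), observation "says retire"
post = posteriors_single_obs([0.4, 0.6], [[0.9, 0.1], [0.6, 0.4]], 0)
print(post)  # [0.5, 0.5]
```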

Extension #1. We add an extra "something else" hypothesis, H0 ("surprise"), with its own small prior (0.05 in the example), covering the case that none of the given hypotheses is correct.

Extension #1. After discovering val, the posterior probability of H0 is Post(H0) = P(H0) ∙ P(val | H0) / P(val) = P(H0) ∙ P(val | H0) / (P(H0) ∙ P(val | H0) + likelihood(val)), where likelihood(val) = P(H1) ∙ P(val | H1) + … + P(Hn) ∙ P(val | Hn). Bad news: we do not know P(val | H0). Good news: Post(H0) monotonically depends on P(val | H0); thus, if we obtain lower and upper bounds for P(val | H0), we also get bounds for Post(H0).

Plausibility principle. Unlikely events normally do not happen; thus, if we have observed val, then its likelihood must not be too small. Plausibility threshold: we use a global constant plaus, which must be between 0.0 and 1.0. If we have observed val, we assume that P(val) ≥ plaus / num, where num is the number of possible values of the observation.

Plausibility principle. We use this assumption to obtain bounds for P(val | H0): Lower: (plaus / num − likelihood(val)) / P(H0); Upper: 1.0. We substitute these bounds into the dependency of Post(H0) on P(val | H0), thus obtaining bounds for Post(H0): Lower: 1.0 − likelihood(val) ∙ num / plaus; Upper: P(H0) / (P(H0) + likelihood(val)). We have derived bounds for the probability that none of the given hypotheses is correct.
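A short sketch of these bounds, using the slides' names (plaus, num, likelihood). The numeric inputs in the usage line are illustrative only, and the clipping of the lower bound at 0.0 is an added safeguard for the case where the slide's formula goes negative.

```python
def post_h0_bounds(prior_h0, likelihood_val, num_values, plaus):
    """Lower and upper bounds on Post(H_0) from the plausibility principle.

    likelihood_val -- sum of P(H_i) * P(val | H_i) over the given hypotheses
    num_values     -- num[a], the number of possible values of the observation
    plaus          -- global plausibility threshold, between 0.0 and 1.0
    """
    lower = max(0.0, 1.0 - likelihood_val * num_values / plaus)  # clip added here
    upper = prior_h0 / (prior_h0 + likelihood_val)
    return lower, upper

# Illustrative numbers: P(H_0) = 0.05, likelihood(val) = 0.3, num = 2, plaus = 0.1
print(post_h0_bounds(0.05, 0.3, 2, 0.1))  # (0.0, 0.142...)
```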

Extension #2. Multiple observations: which one(s) to use? Independence assumption: usually does not work. Bayesian analysis with their joint distribution: difficult to get. Instead, we identify the highest-utility observation and do not use the other observations to corroborate it.

Extension #2. Utility function: which observation is "better"?

Extension #2. Utility function: negated Shannon entropy.

Extension #2 Utility Function KL-divergence

Extension #2. Utility function: a user-defined function.
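The slides name these utility functions without giving formulas. The sketch below shows one plausible reading, scoring an observation by the posterior it produces: negated Shannon entropy (more concentrated posteriors score higher), KL divergence of the posterior from the prior (larger belief shifts score higher), and an arbitrary user-defined function.

```python
from math import log2

def neg_entropy(posterior):
    """Negated Shannon entropy: higher when the posterior is more concentrated."""
    return sum(p * log2(p) for p in posterior if p > 0)

def kl_divergence(posterior, prior):
    """KL divergence D(posterior || prior): how far the observation moved our beliefs."""
    return sum(p * log2(p / q) for p, q in zip(posterior, prior) if p > 0)

def mass_on_h1(posterior):
    """Example of a user-defined utility: probability of the hypothesis of interest."""
    return posterior[0]
```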

Analysis of uncertain data: Evaluation of given hypotheses; Selection of probes for information gathering. Anatole Gershman, Eugene Fink, Bin Fu, and Jaime G. Carbonell

Example. The analyst has to distinguish between two hypotheses: (1) Retires; (2) Joins Vikings.

Example. Probe: execute an external action and observe its response, to gather more information (in the example, ask Favre directly and observe his answer, "I will RETIRE!").

Example. Probe attributes: probe cost; observation probability; gain (utility function).

Probe selection. single-obs-gain(probe_j, OBS_a) = visible[i, a, j] ∙ (likelihood(1) ∙ probe-gain(1) + … + likelihood(num[a]) ∙ probe-gain(num[a])) + (1.0 − visible[i, a, j]) ∙ cost[j]. gain(probe_j) = max(single-obs-gain(probe_j, OBS_1), …, single-obs-gain(probe_j, OBS_m)). Inputs: probe cost, observation probability, utility function.
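Here is a Python sketch of this gain computation, keeping the slide's names (visible, cost, single-obs-gain, gain). How the per-value likelihoods and probe-gains are computed is left abstract, since the slide does not spell them out; treat this as one reading of the formula rather than the authors' implementation.

```python
def single_obs_gain(visible_aj, likelihoods, probe_gains, cost_j):
    """Expected gain of probe j through one observation OBS_a, per the slide's formula.

    visible_aj  -- probability that probe j actually reveals OBS_a
    likelihoods -- likelihood(1..num[a]): likelihood of each possible observed value
    probe_gains -- probe-gain(1..num[a]): utility gain if that value is revealed
    cost_j      -- cost[j], the term applied when the probe reveals nothing
    """
    expected = sum(l * g for l, g in zip(likelihoods, probe_gains))
    return visible_aj * expected + (1.0 - visible_aj) * cost_j

def gain(single_obs_gains_over_all_obs):
    """Overall gain of a probe: its best single-observation gain, as on the slide."""
    return max(single_obs_gains_over_all_obs)
```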

Experiment. Task: evaluating hypotheses H1, H2, H3, H4. Baseline with no probes: accuracy of distinguishing between H1 and the other hypotheses.

Experiment. Task: evaluating hypotheses H1, H2, H3, H4. Probe selection to distinguish between H1 and the other hypotheses.

Experiment. Task: evaluating hypotheses H1, H2, H3, H4. Probe selection to distinguish among all four hypotheses.

Summary. Use Bayesian inference to distinguish among mutually exclusive hypotheses (the H0 "surprise" hypothesis; multiple observations). Use probes to gather more information for better analysis (cost, utility function, observation probability, ...).

Thank you