Dealing With Uncertainty: P(X|E)
Probability theory, the foundation of statistics
Chapter 13

History
Games of chance: 300 BC.
1565: first formalizations.
1654: Fermat & Pascal, conditional probability.
Reverend Bayes: 1750s.
1933: Kolmogorov's axiomatic approach.
Objectivists vs. subjectivists (frequentists vs. Bayesians):
– Frequentists build one model.
– Bayesians use all possible models, weighted by priors.

Concerns
Future: what is the likelihood that a student will earn a PhD?
Current: what is the likelihood that a person has cancer? What is the most likely diagnosis?
Past: what is the likelihood that Marilyn Monroe committed suicide?
Combining evidence and non-evidence.
Always: representation & inference.

Basic Idea
Attach degrees of belief to propositions.
Theorem (de Finetti): probability theory is the only coherent way to do this.
– If someone assigns degrees of belief differently, you can play a betting game against them and win their money.
Unlike logic, probability theory is non-monotonic: additional evidence can lower or raise belief in a proposition.

Random Variable
Informal: a variable whose values belong to a known set of values, the domain.
Math: a non-negative function on a domain (called the sample space) whose values sum to 1.
Boolean RV: John has a cavity.
– cavity domain = {true, false}
Discrete RV: weather condition.
– wc domain = {snowy, rainy, cloudy, sunny}
Continuous RV: John's height.
– john's height domain = {positive real numbers}

Cross-Product RV
If X is an RV with values x1, ..., xn and Y is an RV with values y1, ..., ym, then Z = X x Y is an RV with n*m values.
This will be very useful!
This does not mean P(X,Y) = P(X)*P(Y).
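A minimal sketch of the cross-product construction, using the slide's own example domains (the variable names are mine):

```python
from itertools import product

weather = ["snowy", "rainy", "cloudy", "sunny"]  # domain of X, n = 4 values
cavity = [True, False]                           # domain of Y, m = 2 values

# Z = X x Y ranges over all n*m pairs of values
joint_domain = list(product(weather, cavity))
print(len(joint_domain))  # 8
```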

Discrete Probability
If a discrete RV X has values v1, ..., vn, then a probability distribution for X is a non-negative real-valued function p such that sum p(vi) = 1.
Example: Prob(fair coin comes up heads k times in 10 tosses), for k = 0, 1, ..., 10.
In math we pretend p is known; via statistics we try to estimate it.
Choosing the RVs is a modelling/representation problem.
Standard probability models, such as the uniform and binomial, allow data completion and analytic results. Otherwise, resort to empirical estimates.
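A sketch of the coin example under the binomial model C(n,k) * p^k * (1-p)^(n-k) (the function name is mine):

```python
from math import comb

def binomial_pmf(k, n, p=0.5):
    # Probability of exactly k heads in n tosses of a coin with heads-probability p
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# P(fair coin comes up heads k times in 10 tosses), k = 0..10; these sum to 1
for k in range(11):
    print(k, binomial_pmf(k, 10))
```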

Continuous Probability
If RV X has values in R, then a probability distribution for X is a non-negative real-valued function p such that the integral of p over R is 1 (called a probability density function).
Standard distributions are uniform, normal (Gaussian), and beta. (The Poisson, often listed alongside these, is discrete: it assigns probability to counts 0, 1, 2, ...)
May resort to empirical estimates if the density can't be computed analytically.
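As a quick numerical sanity check of the density definition, here is a sketch that integrates the standard normal density over (effectively) all of R; the grid width is an arbitrary choice:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Density of the normal (Gaussian) distribution
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Crude Riemann sum over [-10, 10]; the tails beyond contribute almost nothing
dx = 0.001
total = sum(normal_pdf(-10 + i * dx) * dx for i in range(20_000))
print(total)  # ~1.0
```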

Joint Probability: full knowledge
If X and Y are discrete RVs, then the probability distribution for X x Y is called the joint probability distribution.
Let x be in the domain of X and y in the domain of Y. If P(X=x, Y=y) = P(X=x)*P(Y=y) for every x and y, then X and Y are independent.
Standard shorthand: P(X,Y) = P(X)*P(Y), which means exactly the statement above.
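A minimal sketch of the independence test, storing a joint distribution as a dict keyed by (x, y) pairs (the numbers are made up so that independence happens to hold):

```python
joint = {("sunny", True): 0.08, ("sunny", False): 0.72,
         ("rainy", True): 0.02, ("rainy", False): 0.18}

# Marginals P(X=x) and P(Y=y), obtained by summing the joint
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p
    py[y] = py.get(y, 0.0) + p

# X and Y are independent iff P(X=x, Y=y) = P(X=x)*P(Y=y) for every x, y
independent = all(abs(p - px[x] * py[y]) < 1e-12 for (x, y), p in joint.items())
print(independent)  # True for these numbers: e.g. 0.08 = 0.8 * 0.1
```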

Marginalization
Given the joint probability for X and Y, you can compute everything.
Joint probability to individual probabilities:
– P(X=x) = sum over all y of P(X=x and Y=y), written as sum P(X=x, Y=y).
Conditioning is similar:
– P(X=x) = sum over all y of P(X=x | Y=y) * P(Y=y).
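A sketch of summing out Y on the same dict representation as above (function name is mine); the conditioning variant is the same sum, since P(X=x|Y=y)*P(Y=y) is just P(X=x, Y=y):

```python
joint = {("sunny", True): 0.08, ("sunny", False): 0.72,
         ("rainy", True): 0.02, ("rainy", False): 0.18}

def marginal_x(joint):
    # P(X=x) = sum over y of P(X=x, Y=y)
    px = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    return px

print(marginal_x(joint))  # {'sunny': 0.8, 'rainy': 0.2}
```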

Conditional Probability
P(X=x | Y=y) = P(X=x, Y=y) / P(Y=y).
The joint yields the conditional.
Shorthand: P(X|Y) = P(X,Y)/P(Y).
Product rule: P(X,Y) = P(X|Y) * P(Y).
Bayes rule:
– P(X|Y) = P(Y|X) * P(X) / P(Y).
Remember the abbreviations.
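A sketch checking the definition and Bayes rule numerically on the same toy joint (values made up):

```python
joint = {("sunny", True): 0.08, ("sunny", False): 0.72,
         ("rainy", True): 0.02, ("rainy", False): 0.18}

def p_x(x):
    return sum(p for (xx, _), p in joint.items() if xx == x)

def p_y(y):
    return sum(p for (_, yy), p in joint.items() if yy == y)

def cond_x_given_y(x, y):
    # P(X=x | Y=y) = P(X=x, Y=y) / P(Y=y)
    return joint[(x, y)] / p_y(y)

def cond_y_given_x(y, x):
    return joint[(x, y)] / p_x(x)

lhs = cond_x_given_y("sunny", True)
rhs = cond_y_given_x(True, "sunny") * p_x("sunny") / p_y(True)  # Bayes rule
print(lhs, rhs)  # equal: both are 0.8
```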

Consequences
P(X|Y,Z) = P(Y,Z|X) * P(X) / P(Y,Z).
Proof: treat Y,Z as a new product RV U; then P(X|U) = P(U|X)*P(X)/P(U) by Bayes rule.
P(X1,X2,X3) = P(X3|X1,X2) * P(X1,X2) = P(X3|X1,X2) * P(X2|X1) * P(X1),
or P(X1,X2,X3) = P(X1) * P(X2|X1) * P(X3|X1,X2).
Note: these equations make no assumptions!
The last equation is called the chain (or product) rule.
You can pick any ordering of the variables.
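A sketch that checks the chain rule numerically on a randomly generated joint over three binary variables (the setup and helper names are mine):

```python
import itertools
import random

# A random but valid joint distribution over (X1, X2, X3)
random.seed(0)
vals = list(itertools.product([0, 1], repeat=3))
weights = [random.random() for _ in vals]
total = sum(weights)
joint = {v: w / total for v, w in zip(vals, weights)}

def p(**fix):
    # Marginal probability of a partial assignment, e.g. p(x1=0, x2=1)
    idx = {"x1": 0, "x2": 1, "x3": 2}
    return sum(pr for v, pr in joint.items()
               if all(v[idx[k]] == val for k, val in fix.items()))

lhs = p(x1=0, x2=1, x3=1)
rhs = (p(x1=0)
       * p(x1=0, x2=1) / p(x1=0)                 # P(X2|X1)
       * p(x1=0, x2=1, x3=1) / p(x1=0, x2=1))    # P(X3|X1,X2)
print(abs(lhs - rhs) < 1e-12)  # True: the chain rule holds with no assumptions
```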

Bayes Rule Example
Meningitis causes a stiff neck half the time:
– P(s|m) = 0.5
Prior probability of meningitis:
– P(m) = 1/50,000
Prior probability of a stiff neck:
– P(s) = 1/20
Does the patient have meningitis?
– P(m|s) = P(s|m)*P(m)/P(s) = 0.5 * (1/50,000) / (1/20) = 1/5,000 = 0.0002
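The same computation in a few lines, with the numbers straight from the slide:

```python
p_s_given_m = 0.5        # P(stiff neck | meningitis)
p_m = 1 / 50_000         # prior P(meningitis)
p_s = 1 / 20             # prior P(stiff neck)

p_m_given_s = p_s_given_m * p_m / p_s  # Bayes rule
print(p_m_given_s)       # 0.0002, i.e. 1 in 5,000
```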

Bayes Rule: multiple symptoms
Given symptoms s1, s2, ..., sn, estimate the probability of disease D:
P(D|s1,s2,...,sn) = P(D,s1,...,sn) / P(s1,s2,...,sn).
If each symptom is Boolean, this needs tables of size 2^n.
Example: the breast cancer data has 73 features per patient, and 2^73 is too big. Approximate!

Idiot or Naïve Bayes
Goal: argmax over all diseases D of P(D | s1, ..., sn)
= argmax P(s1,...,sn|D) * P(D) / P(s1,...,sn)
= argmax P(s1,...,sn|D) * P(D) (why? the denominator does not depend on D)
~ argmax P(s1|D) * P(s2|D) * ... * P(sn|D) * P(D).
Assumes the symptoms are conditionally independent given the disease, so there is enough data to estimate each factor.
It is not necessary to get the probabilities exactly right: only their order matters.
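A minimal naive Bayes scorer, assuming the priors P(D) and the per-symptom conditionals P(si|D) have already been estimated; the diseases, symptoms, and numbers below are made up for illustration:

```python
priors = {"flu": 0.1, "cold": 0.9}
cond = {"flu":  {"fever": 0.9, "cough": 0.8},
        "cold": {"fever": 0.2, "cough": 0.7}}

def naive_bayes(symptoms):
    # argmax over D of P(D) * prod_i P(s_i | D); the evidence term is dropped
    scores = {}
    for d, prior in priors.items():
        score = prior
        for s in symptoms:
            score *= cond[d][s]
        scores[d] = score
    return max(scores, key=scores.get), scores

print(naive_bayes(["fever", "cough"]))  # ('cold', {'flu': 0.072, 'cold': 0.126})
```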

Bayes Rule and Markov Models
Recall P(X1, X2, ..., Xn) = P(X1) * P(X2|X1) * ... * P(Xn|X1,X2,...,Xn-1).
If X1, X2, ... are values at time points 1, 2, ..., and Xn depends only on the k previous times, then this is a Markov model of order k.
MM0: independent of time
– P(X1,...,Xn) = P(X1) * P(X2) * ... * P(Xn)

Markov Models
MM1: depends only on the previous time
– P(X1,...,Xn) = P(X1) * P(X2|X1) * ... * P(Xn|Xn-1).
May also be used for approximating probabilities; much simpler to estimate.
MM2: depends on the previous two times
– P(X1,X2,...,Xn) = P(X1,X2) * P(X3|X1,X2) * P(X4|X2,X3) * ... * P(Xn|Xn-2,Xn-1).

Common DNA application
Goal: P(gataag) = ?
MM0: P(g)*P(a)*P(t)*P(a)*P(a)*P(g).
MM1: P(g)*P(a|g)*P(t|a)*P(a|t)*P(a|a)*P(g|a).
MM2: P(ga)*P(t|ga)*P(a|at)*P(a|ta)*P(g|aa).
Note: the lower the order, the less data and computation time the approximation requires.
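A sketch of the MM0 and MM1 computations; the letter and transition probabilities are invented (in practice they would be estimated from DNA data), and only the transitions needed for this sequence are listed:

```python
p1 = {"a": 0.3, "c": 0.2, "g": 0.2, "t": 0.3}        # P(x); sums to 1
p2 = {("g", "a"): 0.4, ("a", "t"): 0.2, ("t", "a"): 0.4,
      ("a", "a"): 0.3, ("a", "g"): 0.2}              # P(next | prev); partial table

def mm0(seq):
    prob = 1.0
    for ch in seq:
        prob *= p1[ch]          # letters independent of position and history
    return prob

def mm1(seq):
    prob = p1[seq[0]]
    for prev, ch in zip(seq, seq[1:]):
        prob *= p2[(prev, ch)]  # each letter depends only on the previous one
    return prob

print(mm0("gataag"), mm1("gataag"))
```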