Knowledge Representation and Reasoning
University "Politehnica" of Bucharest, Department of Computer Science
Fall 2010, Adina Magda Florea
curs.cs.pub.ro, Master of Science in Artificial Intelligence

2 Lecture 11: Uncertain representation of knowledge
Lecture outline
- Uncertain knowledge
- Belief networks
- Bayesian prediction

3 1. Uncertain knowledge
- Probability theory has two main interpretations:
  - Statistical: a measure of the proportion of individuals (the long-run frequency of a set of events). The probability of a bird flying is the proportion of birds that fly out of the set of all birds.
  - Personal, subjective, or Bayesian: an agent's measure of belief in some proposition, based on the agent's knowledge. The probability of a bird flying is the agent's measure of belief in the flying ability of an individual, based on the knowledge that the individual is a bird.
- The Bayesian view can be seen as a measure over all the worlds that are possible given the agent's knowledge about a particular situation (in each possible world, the bird either flies or it does not).

4 Bayesian probability
- Both views obey the same calculus; here we adopt the second (Bayesian) view.
- We assume uncertainty is epistemological (pertaining to the agent's knowledge about the world) rather than ontological (how the world actually is).
Semantics of (prior) probability
- Interpretations are given in terms of possible worlds.
- They specify not only the truth of formulas, but also how likely each possible world is to be the real world.
- Modal logics: possible worlds + an accessibility relation.
- Probabilities: possible worlds + a measure on the possible worlds.

5 Semantics of probability
- A possible world is an assignment of exactly one value to every random variable.
- Let W be the set of all possible worlds. If w ∈ W and f is a formula, "f is true in w" (w ⊨ f) is defined inductively on the structure of f:
  w ⊨ x=v iff w assigns the value v to x
  w ⊨ ¬f iff w ⊭ f
  w ⊨ f ∧ g iff w ⊨ f and w ⊨ g
  w ⊨ f ∨ g iff w ⊨ f or w ⊨ g
- Associated with each possible world w is a measure p(w). When there are only a finite number of worlds:
  p(w) ≥ 0 for all w ∈ W
  Σ_{w∈W} p(w) = 1

6 Semantics of probability
- The probability of a formula f is the sum of the measures of the possible worlds in which f is true:
  P(f) = Σ_{w⊨f} p(w)
Semantics of conditional probability
- A formula e representing the conjunction of all the agent's observations of the world is called evidence.
- The measure of belief in a formula h based on the formula e is called the conditional probability of h given e, written P(h|e).
- The evidence e rules out all possible worlds that are incompatible with e.

7 Semantics of probability
- The evidence e induces a new measure p_e over the possible worlds: all worlds in which e is false receive measure 0, and the remaining worlds are renormalized so that their measures sum to 1:
  p_e(w) = p(w)/P(e) if w ⊨ e
  p_e(w) = 0 if w ⊭ e
  P(h|e) = Σ_{w⊨h} p_e(w) = (Σ_{w⊨h∧e} p(w)) / P(e) = P(h ∧ e) / P(e)
- We assume P(e) > 0. If P(e) = 0, then e is false in all possible worlds and thus can never be observed.
- Chain rule:
  P(f1 ∧ … ∧ fn) = P(f1) · P(f2|f1) · … · P(fn|f1 ∧ … ∧ fn−1)
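
To make the possible-worlds semantics concrete, here is a minimal Python sketch (the four-world bird example and its numbers are our own illustration, not from the slides): P(f) sums the measures of the worlds in which f is true, and conditioning renormalizes after ruling out the worlds incompatible with the evidence.

```python
# Measures of the four possible worlds over (Bird, Flies); illustrative numbers.
p = {
    (True, True): 0.45,   # a bird that flies
    (True, False): 0.05,  # a bird that does not fly
    (False, True): 0.10,  # a non-bird that flies
    (False, False): 0.40, # a non-bird that does not fly
}

def prob(f):
    """P(f): sum of the measures of the worlds in which formula f is true."""
    return sum(m for world, m in p.items() if f(*world))

def cond(h, e):
    """P(h|e) = P(h and e) / P(e); worlds incompatible with e get measure 0."""
    pe = prob(e)
    assert pe > 0, "evidence false in all possible worlds cannot be observed"
    return prob(lambda *w: h(*w) and e(*w)) / pe

# The agent's belief that an individual flies, given that it is a bird:
print(cond(h=lambda bird, flies: flies, e=lambda bird, flies: bird))  # 0.9
```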

8 Bayes' theorem
- Given the current belief in a proposition H based on evidence K, P(H|K), we observe E:
  P(H|E ∧ K) = P(E|H ∧ K) · P(H|K) / P(E|K)
- If the background knowledge K is left implicit:
  P(H|E) = P(E|H) · P(H) / P(E)
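
A quick numeric illustration of the implicit-K form (the numbers are our own, not from the slides):

```python
def posterior(p_h, p_e_given_h, p_e_given_not_h):
    """P(H|E) = P(E|H) P(H) / P(E), with P(E) expanded by total probability."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
    return p_e_given_h * p_h / p_e

# Illustrative numbers: P(H) = 0.01, P(E|H) = 0.9, P(E|~H) = 0.05
print(posterior(0.01, 0.9, 0.05))  # ~0.154: strong evidence, but a low prior
```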

9 Independence assumptions
- Independence: knowledge of the truth of one proposition does not affect the belief in another.
- A random variable X is independent of a random variable Y given a random variable Z if, for all values ai, bj, ck of the respective variables:
  P(X=ai | Y=bj ∧ Z=ck) = P(X=ai | Z=ck)
- Knowledge of Y's value does not affect the belief in the value of X, given the value of Z.
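
The definition can be checked mechanically on a small joint distribution. In the sketch below (entirely our own toy example), the joint is built as P(z)·P(x|z)·P(y|z), so X must come out independent of Y given Z:

```python
from itertools import product

# A joint built so that X and Y are independent given Z (illustrative numbers).
pz = {0: 0.5, 1: 0.5}
px_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}   # P(X=x | Z=z)
py_z = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}   # P(Y=y | Z=z)

joint = {(x, y, z): pz[z] * px_z[z][x] * py_z[z][y]
         for x, y, z in product([0, 1], repeat=3)}

def p(pred):
    return sum(m for w, m in joint.items() if pred(*w))

# Check P(X=x | Y=y, Z=z) == P(X=x | Z=z) for all value combinations.
for x, y, z in product([0, 1], repeat=3):
    lhs = p(lambda X, Y, Z: X == x and Y == y and Z == z) / \
          p(lambda X, Y, Z: Y == y and Z == z)
    rhs = p(lambda X, Y, Z: X == x and Z == z) / p(lambda X, Y, Z: Z == z)
    assert abs(lhs - rhs) < 1e-12
print("X is independent of Y given Z in this joint")
```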

10 2. Belief networks
- A BN (Belief Network or Bayesian Network) is a graphical representation of conditional independence.
- It is represented as a Directed Acyclic Graph (DAG):
  - The nodes represent random variables.
  - The edges represent direct dependence among the variables.
- X → Y: X has a direct influence on Y (represents a statistical dependence).
- X = Parent(Y) if X → Y.
- X = Ancestor(Y) if there is a directed path from X to Y (X → … → Y).
- Z = Descendant(Y) if Z = Y or there is a directed path from Y to Z (Y → … → Z).

11 BN
- The independence assumption embedded in a BN: each random variable is independent of its non-descendants given its parents.
- If Y1,…,Yn are the parents of X:
  P(X=a | Y1=v1 ∧ … ∧ Yn=vn ∧ R) = P(X=a | Y1=v1 ∧ … ∧ Yn=vn)
  whenever R does not involve descendants of X (X itself included).
- The number of probabilities that must be specified for each variable is exponential in the number of parents of that variable.
- A BN contains a set of conditional probability tables (CPTs) giving P(X=a | Y1=v1 ∧ … ∧ Yn=vn).

12 BN
- A BN therefore defines a Joint Probability Distribution (JPD) over the variables in the network.
- A value of the JPD can be computed as:
  P(X1=x1 ∧ … ∧ Xn=xn) = Π_{i=1..n} P(Xi=xi | parents(Xi))
  where parents(Xi) stands for the specific values taken by Parents(Xi).
- By the chain rule:
  P(X1=x1 ∧ … ∧ Xn=xn) = P(x1,…,xn) = P(xn | xn−1,…,x1) · P(xn−1,…,x1) = … = Π_{i=1..n} P(xi | xi−1,…,x1)
- Order of the variables in the BN:
  P(Xi | Xi−1,…,X1) = P(Xi | Parents(Xi)) provided that Parents(Xi) ⊆ {Xi−1,…,X1}

13 BN
- P(Xi | Xi−1,…,X1) = P(Xi | Parents(Xi)) provided that Parents(Xi) ⊆ {Xi−1,…,X1}
- A BN is a correct representation of the domain provided that each node is conditionally independent of its predecessors, given its parents.

14 Example: the fire-alarm belief network
Variables: Tampering, Fire, Alarm, Smoke, Leaving, Report
Structure: Tampering → Alarm, Fire → Alarm, Fire → Smoke, Alarm → Leaving, Leaving → Report

P(T) = 0.02      P(F) = 0.01

F T | P(A)       A | P(L)      F | P(S)      L | P(R)
T T | 0.5        T | 0.88      T | 0.9       T | 0.75
T F | 0.99       F | 0.001     F | 0.01      F | 0.01
F T | 0.85
F F | 0.0001

Instead of computing the joint distribution of all the variables by the chain rule,
P(T,F,A,S,L,R) = P(T)·P(F|T)·P(S|F,T)·P(A|S,F,T)·P(L|A,S,F,T)·P(R|L,A,S,F,T),
the BN defines a unique JPD in factored form:
P(T,F,A,S,L,R) = P(T)·P(F)·P(A|T,F)·P(S|F)·P(L|A)·P(R|L)
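
Because the network has only six Boolean variables, the factored JPD can be enumerated exhaustively. Below is a minimal Python sketch (the query helper and variable encoding are our own illustration, not from the slides); it implements the factored product above and reproduces the posteriors quoted on the following slides.

```python
from itertools import product

# P(Alarm = true | Fire, Tampering), from the table above.
P_A = {(True, True): 0.5, (True, False): 0.99,
       (False, True): 0.85, (False, False): 0.0001}

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(t, f, a, s, l, r):
    """Factored JPD: P(T)·P(F)·P(A|T,F)·P(S|F)·P(L|A)·P(R|L)."""
    return (bern(0.02, t) * bern(0.01, f) * bern(P_A[(f, t)], a) *
            bern(0.9 if f else 0.01, s) * bern(0.88 if a else 0.001, l) *
            bern(0.75 if l else 0.01, r))

def query(target, evidence):
    """P(target = true | evidence), by enumerating all 2^6 possible worlds."""
    num = den = 0.0
    for world in product([True, False], repeat=6):
        w = dict(zip("TFASLR", world))
        if all(w[v] == val for v, val in evidence.items()):
            pw = joint(*world)
            den += pw
            if w[target]:
                num += pw
    return num / den

print(query("R", {}))                       # prior P(Report)    ~ 0.028
print(query("F", {"R": True}))              # P(Fire | Report)   ~ 0.2305
print(query("L", {"S": True}))              # P(Leaving | Smoke) ~ 0.42, the query
                                            # worked out by hand on the next slide
print(query("T", {"R": True, "S": False}))  # P(Tampering | Report, ~Smoke) ~ 0.501
```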

15 Inferences
- The probability of a variable given its non-descendants can be computed using the "reasoning by cases" rule:
  P(L|S) = P(L|A,S)·P(A|S) + P(L|¬A,S)·(1 − P(A|S)) = P(L|A)·P(A|S) + P(L|¬A)·(1 − P(A|S))
  P(A|S) = P(A|F,T)·P(F,T|S) + P(A|F,¬T)·P(F,¬T|S) + P(A|¬F,T)·P(¬F,T|S) + P(A|¬F,¬T)·P(¬F,¬T|S)
- The right-hand factor of each product can be computed using the multiplicative rule:
  P(F,T|S) = P(F|T,S)·P(T|S) = P(F|T,S)·P(T)
- To compute P(F|T,S) we cannot use the independence assumption, because S is a descendant of F; we can use Bayes' rule instead:
  P(F|T,S) = P(S|F,T)·P(F|T) / P(S|T) = P(S|F)·P(F) / P(S|T)

16 Inferences
- The prior probabilities (with no evidence) of each variable are:
  P(Tampering) = 0.02, P(Fire) = 0.01, P(Report) = 0.028, P(Smoke) = 0.0189
- Observing the Report gives:
  P(Tampering|Report) = 0.399, P(Fire|Report) = 0.2305, P(Smoke|Report) = 0.215
- The probabilities of both Tampering and Fire are increased by the Report.
- Because Fire is more likely, so is the probability of Smoke.

17 Inferences
- Suppose instead that Smoke is observed:
  P(Tampering|Smoke) = 0.02, P(Fire|Smoke) = 0.476, P(Report|Smoke) = 0.320
- Note that the probability of Tampering is not affected by observing Smoke; however, the probabilities of Report and Fire are increased.
- Suppose that both Report and Smoke are observed:
  P(Tampering|Report, Smoke) = 0.0284, P(Fire|Report, Smoke) = 0.964
- Thus, observing both makes Fire more likely.
- However, in the context of Report, the presence of Smoke makes Tampering less likely.

18 Inferences
- Suppose instead that there is a Report but no Smoke:
  P(Tampering|Report, ¬Smoke) = 0.501, P(Fire|Report, ¬Smoke) = 0.0294
- In the context of Report, Fire becomes much less likely, so the probability of Tampering increases to explain the Report.

19 Determining posterior distributions
- The problem: computing conditional probabilities given the evidence.
- Estimating posterior probabilities in a BN to within an absolute error (of less than 0.5) is NP-hard.
- There are 3 main approaches:
(1) Exploit the structure of the network
- Clique tree propagation: the network is transformed into a tree whose nodes are labeled with sets of variables; reasoning is performed by passing messages between the nodes of the tree.
- Time complexity is linear in the number of nodes of the tree; however, since the tree is in fact a polytree, its size may be exponential in the size of the belief network.

20 Determining posterior distributions
(2) Search-based approaches
- Enumerate some of the possible worlds and estimate posterior probabilities from the worlds generated.
(3) Stochastic simulation
- Random cases are generated according to the probability distribution; by treating these cases as a set of samples, one can estimate the marginal distribution of any combination of variables.
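
As an illustration of approach (3), here is a minimal rejection-sampling sketch on the fire-alarm network from the example above (the helper names are ours; a real implementation would use a smarter scheme such as likelihood weighting):

```python
import random

P_A = {(True, True): 0.5, (True, False): 0.99,
       (False, True): 0.85, (False, False): 0.0001}

def sample_world(rng):
    """Sample one possible world in topological order (ancestral sampling)."""
    t = rng.random() < 0.02
    f = rng.random() < 0.01
    a = rng.random() < P_A[(f, t)]
    s = rng.random() < (0.9 if f else 0.01)
    l = rng.random() < (0.88 if a else 0.001)
    r = rng.random() < (0.75 if l else 0.01)
    return {"T": t, "F": f, "A": a, "S": s, "L": l, "R": r}

def rejection_sample(target, evidence, n=200_000, seed=0):
    """Estimate P(target | evidence): discard samples inconsistent with the
    evidence, then count how often the target is true among the rest."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n):
        w = sample_world(rng)
        if all(w[v] == val for v, val in evidence.items()):
            kept += 1
            hits += w[target]
    return hits / kept if kept else float("nan")

print(rejection_sample("F", {"S": True}))  # approaches P(Fire|Smoke) ~ 0.476
```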

21 A structured approach
- Based on the fact that a BN specifies a factorization of the JPD.
- A factor is a representation of a function from a tuple of random variables to a number:
  f(X1,…,Xn) denotes a factor f on the variables X1,…,Xn.
- If f(X1,…,Xn) is a factor and each vi is an element of the domain of Xi, then f(X1=v1,…,Xn=vn) is a number: the value of f when each Xi takes the value vi.

22 A structured approach
- The product of two factors f1 and f2 is a factor on the union of their variables:
  (f1 × f2)(X1,…,Xi, Y1,…,Yj, Z1,…,Zk) = f1(X1,…,Xi, Y1,…,Yj) × f2(Y1,…,Yj, Z1,…,Zk)
- Given a factor f(X1,…,Xi), one can sum out a variable, say X1; the result is a factor on X2,…,Xi:
  (Σ_{X1} f)(X2,…,Xi) = f(X1=v1,…,Xi) + … + f(X1=vk,…,Xi)
  where v1,…,vk are the values in the domain of X1.
- A conditional probability distribution can be seen as a factor:
  f(X=u, Y1=v1,…,Yj=vj) = P(X=u | Y1=v1 ∧ … ∧ Yj=vj)

23 A structured approach
- The BN inference problem (computing the posterior distribution of a variable given some evidence) can be reduced to the problem of computing the probabilities of conjunctions.
- Given the evidence Y1=v1, …, Yj=vj and the query variable Z:
  P(Z | v1,…,vj) = P(Z, v1,…,vj) / P(v1,…,vj) = P(Z, v1,…,vj) / Σ_z P(z, v1,…,vj)
- => compute the factor P(Z, v1,…,vj) and normalize.

24 A structured approach
- Let X1,…,Xn be the variables of the BN.
- To compute the factor P(Z, v1,…,vj) we must sum out the other variables from the JPD.
- Let Z1,…,Zk be an enumeration of the other variables of the BN:
  {Z1,…,Zk} = {X1,…,Xn} − {Z} − {Y1,…,Yj}
- The factor is computed by summing out the Zi; the order in which they are summed out is called an elimination ordering.
- P(Z, Y1=v1,…,Yj=vj) = Σ_{Zk} … Σ_{Z1} P(X1,…,Xn)|_{Y1=v1,…,Yj=vj}

25 A structured approach
- P(Z, Y1=v1,…,Yj=vj) = Σ_{Zk} … Σ_{Z1} P(X1,…,Xn)|_{Y1=v1,…,Yj=vj}
- There is one possible world for each assignment of a value to every variable.
- The JPD P(X1,…,Xn) gives the probability (measure) of each possible world.
- The approach selects the worlds that have the observed values for the Y's and sums over the possible worlds with the same value for Z => this is in fact the definition of conditional probability.

26 A structured approach
- By the chain rule for conjunctions of probabilities and the definition of a BN:
  P(X1,…,Xn) = P(X1|Parents(X1)) × … × P(Xn|Parents(Xn))
- The BN inference problem is now reduced to the problem of summing out a set of variables from a product of factors.
- To compute the posterior distribution of a query variable given observations (see the sketch after the next slide):
  1. Construct the JPD in terms of a product of factors.
  2. Set the observed variables to their observed values.
  3. Sum out each of the other variables (the Z1,…,Zk).
  4. Multiply the remaining factors and normalize.

27 A structured approach
- To sum out a variable Z from a product f1 × … × fk of factors:
- First partition the factors into those that do not contain Z, say f1,…,fi, and those that do, say fi+1,…,fk; then:
  Σ_Z f1 × … × fk = f1 × … × fi × (Σ_Z fi+1 × … × fk)
- Then explicitly construct a representation (as a multidimensional array, a tree, or a set of rules) of the rightmost factor.
- The size of a factor is exponential in the number of its variables.
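
The whole procedure fits in a few dozen lines. The following is a compact sketch of variable elimination under the factor representation just described (the factor encoding, function names, and the two-node demo network are our own illustration):

```python
# A factor is (variables, table): a tuple of variable names and a dict
# mapping value tuples to numbers.

def multiply(f1, f2):
    """Product of two factors: a factor on the union of their variables."""
    v1, t1 = f1
    v2, t2 = f2
    out = v1 + tuple(x for x in v2 if x not in v1)
    table = {}
    for a1, p1 in t1.items():
        asg = dict(zip(v1, a1))
        for a2, p2 in t2.items():
            # keep only assignments that agree on the shared variables
            if all(asg.get(x, val) == val for x, val in zip(v2, a2)):
                full = {**asg, **dict(zip(v2, a2))}
                table[tuple(full[x] for x in out)] = p1 * p2
    return (out, table)

def sum_out(f, var):
    """Sum a variable out of a factor (marginalization)."""
    v, t = f
    i = v.index(var)
    table = {}
    for asg, p in t.items():
        key = asg[:i] + asg[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return (v[:i] + v[i + 1:], table)

def restrict(f, var, value):
    """Set an observed variable to its observed value."""
    v, t = f
    if var not in v:
        return f
    i = v.index(var)
    return (v[:i] + v[i + 1:],
            {asg[:i] + asg[i + 1:]: p for asg, p in t.items() if asg[i] == value})

def eliminate(factors, evidence, elim_order):
    """Restrict to the evidence, sum out each Zi (partitioning the factors as
    on the slide above), multiply what remains, and normalize."""
    for var, val in evidence.items():
        factors = [restrict(f, var, val) for f in factors]
    for z in elim_order:
        with_z = [f for f in factors if z in f[0]]
        if not with_z:
            continue
        prod = with_z[0]
        for f in with_z[1:]:
            prod = multiply(prod, f)
        factors = [f for f in factors if z not in f[0]] + [sum_out(prod, z)]
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    total = sum(result[1].values())
    return (result[0], {asg: p / total for asg, p in result[1].items()})

# Two-node demo (Fire -> Smoke, parameters as in the fire-alarm example):
f_F = (("F",), {(True,): 0.01, (False,): 0.99})
f_S = (("S", "F"), {(True, True): 0.9, (False, True): 0.1,
                    (True, False): 0.01, (False, False): 0.99})
print(eliminate([f_F, f_S], evidence={"S": True}, elim_order=[]))
# -> (('F',), {(True,): ~0.476, (False,): ~0.524}), i.e. P(Fire|Smoke)
```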

28 3. Bayesian prediction
- 5 bags of candies:
  h1: 100% cherry
  h2: 75% cherry, 25% lime
  h3: 50% cherry, 50% lime
  h4: 25% cherry, 75% lime
  h5: 100% lime
- H (the set of hypotheses): the type of bag, with values h1…h5.
- Collect evidence (random variables): d1, d2, …, with possible values cherry or lime.
- Goal: predict the flavour of the next candy.

29 Bayesian prediction
- Let D be the data, with observed value d. The probability of each hypothesis, by Bayes' rule, is:
  P(hi|d) = α P(d|hi) P(hi)    (1)
- The prediction for an unknown quantity X is:
  P(X|d) = Σi P(X|hi) P(hi|d)    (2)
- Key elements: the prior probabilities P(hi) and the probability of the evidence under each hypothesis, P(d|hi). Assuming the observations are independent:
  P(d|hi) = Πj P(dj|hi)    (3)
- We assume the prior probabilities:
  P(h1) = 0.1, P(h2) = 0.2, P(h3) = 0.4, P(h4) = 0.2, P(h5) = 0.1

30 Bayesian prediction
Priors: P(h1) = 0.1, P(h2) = 0.2, P(h3) = 0.4, P(h4) = 0.2, P(h5) = 0.1
Bags: h1: 100% cherry; h2: 75% cherry, 25% lime; h3: 50% cherry, 50% lime; h4: 25% cherry, 75% lime; h5: 100% lime

After observing one lime candy, using equation (1), P(hi|d) = α P(d|hi) P(hi):
P(lime) = 0.1·0 + 0.2·0.25 + 0.4·0.5 + 0.2·0.75 + 0.1·1 = 0.5
α = 1/0.5 = 2
P(h1|lime) = α·P(lime|h1)·P(h1) = 2·(0·0.1) = 0
P(h2|lime) = α·P(lime|h2)·P(h2) = 2·(0.25·0.2) = 0.1
P(h3|lime) = α·P(lime|h3)·P(h3) = 2·(0.5·0.4) = 0.4
P(h4|lime) = α·P(lime|h4)·P(h4) = 2·(0.75·0.2) = 0.3
P(h5|lime) = α·P(lime|h5)·P(h5) = 2·(1·0.1) = 0.2

31 Bayesian prediction
After observing two lime candies, using equations (1) and (3):
P(lime, lime) = 0.1·0² + 0.2·0.25² + 0.4·0.5² + 0.2·0.75² + 0.1·1² = 0.325
α = 1/0.325 ≈ 3.08
P(h1|lime,lime) = α·(0·0)·0.1 = 0
P(h2|lime,lime) = α·(0.25·0.25)·0.2 ≈ 0.038
P(h3|lime,lime) = α·(0.5·0.5)·0.4 ≈ 0.308
P(h4|lime,lime) = α·(0.75·0.75)·0.2 ≈ 0.346
P(h5|lime,lime) = α·(1·1)·0.1 ≈ 0.308

32 [Figure: evolution of the posterior probabilities P(hi|d1,…,d10) from equation (1), plotted as successive lime candies are observed]

33 Bayesian prediction
Prediction of the flavour of the next candy after one lime, using equation (2), P(X|d) = Σi P(X|hi) P(hi|d):
P(d2=lime|d1=lime) = P(d2|h1)·P(h1|d1) + P(d2|h2)·P(h2|d1) + P(d2|h3)·P(h3|d1) + P(d2|h4)·P(h4|d1) + P(d2|h5)·P(h5|d1)
= 0·0 + 0.25·0.1 + 0.5·0.4 + 0.75·0.3 + 1·0.2 = 0.65
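
The whole example fits in a few lines of Python (a sketch with our own helper names, reproducing equations (1)-(3) and the numbers above):

```python
# Hypotheses: probability that a candy from each bag is lime, and the priors.
lime = {"h1": 0.0, "h2": 0.25, "h3": 0.5, "h4": 0.75, "h5": 1.0}
prior = {"h1": 0.1, "h2": 0.2, "h3": 0.4, "h4": 0.2, "h5": 0.1}

def posterior(observations):
    """Equations (1) and (3): P(hi|d) = alpha * P(hi) * prod_j P(dj|hi)."""
    unnorm = {}
    for h, p in prior.items():
        for d in observations:
            p *= lime[h] if d == "lime" else 1.0 - lime[h]
        unnorm[h] = p
    alpha = 1.0 / sum(unnorm.values())
    return {h: alpha * p for h, p in unnorm.items()}

def predict_lime(observations):
    """Equation (2): P(next = lime | d) = sum_i P(lime|hi) P(hi|d)."""
    post = posterior(observations)
    return sum(lime[h] * post[h] for h in prior)

print(posterior(["lime"]))             # {h1: 0.0, h2: 0.1, h3: 0.4, h4: 0.3, h5: 0.2}
print(predict_lime(["lime"]))          # 0.65, as computed above
print(posterior(["lime"] * 10)["h5"])  # ~0.90: h5 comes to dominate (cf. the plot)
```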

34 Remarks
- The true hypothesis will eventually dominate the prediction.
- Problems arise if the hypothesis space is large.
- Approximation: base the prediction on the most probable hypothesis alone.
- MAP learning (maximum a posteriori):
  P(X|d) ≈ P(X|h_MAP)
- In the example, h_MAP = h5 after 3 observations, so the lime prediction is 1.0.
- As more data is collected, the MAP and Bayesian predictions tend to become closer.