1 MCMC Estimation of Conditional Probabilities in Probabilistic Programming Languages
Bogdan Moldovan, Ingo Thon, Jesse Davis, and Luc de Raedt
Department of Computer Science, Katholieke Universiteit Leuven, Belgium

2 Outline
The paper goal
Markov chain Monte Carlo (MCMC) methods
Challenges
Solution
Testing
Conclusion

3 What are we going to see?
ProbLog
Markov chain Monte Carlo
Estimating conditional probabilities
Sampling from AND/OR trees

4 A simple weather model
The matrix P represents the weather model in which a sunny day is 90% likely to be followed by another sunny day, and a rainy day is 50% likely to be followed by another rainy day.
Predicting the weather: the weather on day 1 is known to be sunny. This is represented by a vector in which the "sunny" entry is 100% and the "rainy" entry is 0%. The weather on day 2 can be predicted by (1 0)*P, the weather on day 3 by (1 0)*P^2 or, equivalently, (0.9 0.1)*P, and so on.
Steady state of the weather: in this example, predictions for the weather on more distant days are increasingly inaccurate and tend towards a steady state vector. This vector represents the probabilities of sunny and rainy weather on all days, and is independent of the initial weather.
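A small numerical sketch of this example (not from the slides; Python with NumPy, the transition matrix reconstructed from the percentages above):

import numpy as np

# Transition matrix reconstructed from the text:
# sunny -> sunny 0.9, sunny -> rainy 0.1; rainy -> sunny 0.5, rainy -> rainy 0.5
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

x = np.array([1.0, 0.0])  # day 1 is known to be sunny: (sunny, rainy)

for day in range(2, 8):
    x = x @ P  # distribution over (sunny, rainy) for the next day
    print(f"day {day}: sunny={x[0]:.3f}, rainy={x[1]:.3f}")

The printed distributions converge towards the steady state vector (5/6, 1/6), regardless of the initial weather.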

5 A simple weather model
Predicting the weather: the distribution of the (n+1)-th random variable, given all the variables that preceded it, equals its distribution given only the n-th variable.
Steady state of the weather

6 Markov Chain
A process satisfies the Markov property if one can make predictions for the future of the process based solely on its present state, just as well as one could knowing the process's full history. Conditional on the present state of the system, its future and past are independent. Intuitively, a Markov model describes a "memoryless" system.

7 Monte Carlo methods
After placing random points, the estimate for π is within 0.07% of the actual value.
The essential idea is using randomness to solve problems that might be deterministic in principle. Monte Carlo methods vary, but tend to follow a particular pattern:
Define a domain of possible inputs to the algorithm.
Generate inputs randomly from that domain using a certain probability distribution over it.
Perform a deterministic computation on these inputs.
Aggregate the statistics of all the results and present them.
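As an illustration of this pattern (not part of the slides), a minimal Python sketch that estimates π by sampling points uniformly in the unit square and counting the fraction that falls inside the quarter circle:

import random

def estimate_pi(n_samples: int = 100_000) -> float:
    """Estimate pi: the fraction of uniform points inside the quarter circle is pi/4."""
    inside = 0
    for _ in range(n_samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    return 4.0 * inside / n_samples

print(estimate_pi())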

8 Markov chain Monte Carlo (MCMC) methods
A class of algorithms for sampling from a probability distribution, based on constructing a Markov chain that has the desired distribution as its equilibrium distribution. After a number of steps, the state of the chain is used as a sample of the desired distribution. The quality of the sample improves as a function of the number of steps.
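A minimal Metropolis-Hastings sketch in Python (an illustration of generic MCMC, not the paper's algorithm): a random-walk chain whose equilibrium distribution is proportional to an unnormalised target density; the standard-normal target is an arbitrary assumption.

import math
import random

def target(x: float) -> float:
    """Unnormalised density of the desired distribution (standard normal, chosen for illustration)."""
    return math.exp(-0.5 * x * x)

def metropolis_hastings(n_steps: int = 10_000, step: float = 1.0):
    x = 0.0
    samples = []
    for _ in range(n_steps):
        candidate = x + random.uniform(-step, step)  # symmetric random-walk proposal
        if random.random() < min(1.0, target(candidate) / target(x)):
            x = candidate  # accept and move to the candidate state
        samples.append(x)  # sample quality improves with the number of steps
    return samples

samples = metropolis_hastings()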

9 Our Goal
Markov chain Monte Carlo
Estimating conditional probabilities
Sampling from AND/OR trees

10 Why Proposal distribution?
P(q | e) = P(q ∧ e) / P(e)

11 Why AND/OR trees?
It is much easier to calculate just the possible worlds where the evidence holds. Because it is hard to compute everything we want and then discard all the worlds in which the evidence does not hold, we use these trees.

12 ProbLog program example
A ProbLog program consists of:
a set of definite clauses
probabilistic facts pi :: ci, where pi is the probability and ci is the fact

13 AND/OR Trees
Node legend: Root – evidence, AND, OR, Empty Clause

14 AND/OR Trees
The solution tree corresponding to the partial possible world {earthquake = true, hears_alarm(mary) = true}.

15 Challenges
Designing a proposal distribution that, as often as possible, constructs states that agree with e, as only these are relevant for estimating P(q|e).
Two partial possible worlds can overlap, i.e. lead to the same full possible world, which may then be overcounted.

16 Partial worlds overlapping
e = {alarm=true}
Partial world 1: {burglary=true}, extended to {burglary=true, earthquake=false} and {burglary=true, earthquake=true}
Partial world 2: {earthquake=true}, extended to {burglary=false, earthquake=true} and {burglary=true, earthquake=true}
Both partial worlds lead to the full possible world {burglary=true, earthquake=true}, so it can be overcounted.

17 OK, what to do?
An MCMC approach tailored to computing the conditional probability of a ProbLog query.
Given the previous solution tree, our proposal distribution builds a new one to propose as the candidate next state.
We adapt ideas from the Karp and Luby algorithm to identify overlapping worlds.

18 ProbLog program example
Atomic choice = a ground probabilistic fact p :: f.
Total choice = the resulting set of facts when we have included a fact for every random variable.
Partial choice = the resulting set of facts when we have not included a fact for every random variable.
For a uniform distribution, X will be sampled from the discrete uniform distribution and f(x) will be included as a fact, where x is the sampled value for X. Poisson distributions are treated similarly.
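A hedged Python sketch of how such distributional facts might be sampled (the helper names are illustrative, not ProbLog's API):

import random
import numpy as np

def sample_uniform_fact(name, values):
    """Sample X from a discrete uniform distribution and include the fact name(x)."""
    x = random.choice(list(values))
    return f"{name}({x})"

def sample_poisson_fact(name, lam):
    """Sample X from a Poisson distribution and include the fact name(x)."""
    x = np.random.poisson(lam)
    return f"{name}({x})"

print(sample_uniform_fact("die", range(1, 7)))  # e.g. die(4)
print(sample_poisson_fact("arrivals", 3.0))     # e.g. arrivals(2)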

19 ProbLog program example
The probability of the total choice T1 = {burglary, earthquake, hears_alarm(john), hears_alarm(mary)} is 0.05 × 0.99 × 0.7 × 0.4.

20 ProbLog program example
Ps(q) = P({w | q is true in the possible world w})
In our example, the probability of alarm is the sum of the probabilities of the 2^4 possible worlds in which alarm is true.
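A worked Python sketch of this semantics (the fact probabilities below are illustrative assumptions, not taken from the slides): enumerate all 2^4 total choices and sum the probabilities of the worlds in which the query holds.

from itertools import product

# Illustrative fact probabilities (assumed for this sketch)
facts = {"burglary": 0.05, "earthquake": 0.01,
         "hears_alarm(john)": 0.7, "hears_alarm(mary)": 0.4}

def success_probability(query):
    """Ps(q): sum of the probabilities of all total choices (possible worlds) where q is true."""
    total = 0.0
    names = list(facts)
    for values in product([True, False], repeat=len(names)):  # the 2^4 possible worlds
        world = dict(zip(names, values))
        p = 1.0
        for name, value in world.items():
            p *= facts[name] if value else 1.0 - facts[name]
        if query(world):
            total += p
    return total

# alarm :- burglary.  alarm :- earthquake.
alarm = lambda world: world["burglary"] or world["earthquake"]
print(success_probability(alarm))  # 0.0595 = 1 - (1 - 0.05) * (1 - 0.01)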

21 MCMC Algorithm Overview
MCMC uses rejection sampling.
Input: a ProbLog program T, a set of observed (evidence) atoms e, and a query q.
Output: P(q|e).
The estimate of P(q|e) is obtained by dividing the number of partial possible worlds where e is true and q is entailed by the number of partial possible worlds where e is true.
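A schematic Python sketch of this estimator (illustrative only, not the paper's implementation; sample_world and entails are hypothetical helpers): sample possible worlds, keep those consistent with the evidence, and divide the counts.

def estimate_conditional(sample_world, entails, q, e, n_samples=100_000):
    """Estimate P(q | e) by counting sampled worlds in which e holds (and q is entailed)."""
    n_evidence = 0
    n_query = 0
    for _ in range(n_samples):
        world = sample_world()     # draw a (partial) possible world
        if not entails(world, e):  # only worlds where the evidence holds are counted
            continue
        n_evidence += 1
        if entails(world, q):
            n_query += 1
    return n_query / n_evidence if n_evidence else float("nan")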

22 Proposing a new state
Intuition: small changes in the solution tree are more likely to lead to another solution than a big jump, so we probabilistically favour reusing parts of the current proof for e.
We follow the same branch at an OR node as in the previous state with probability P1.
We make the same atomic choice as in the previous state with probability P2.

23 Proposing a new state
P1 and P2 are user-defined parameters: higher values favour reuse between consecutive solution trees, while lower values encourage faster exploration of the solution space.
Our algorithm is similar to the standard MCMC algorithm in [1]. Until a stop criterion is met, each iteration proposes a candidate state, which is checked for overlap with previously seen states. If there is no overlap, or it can be resolved, we calculate the acceptance probability and advance to the next state accordingly.
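A rough Python sketch of the reuse idea (hypothetical helper names; the real proposal walks the AND/OR tree and, when not reusing, samples according to the program's probabilities rather than uniformly):

import random

def propose_branch(previous_branch, branches, p1):
    """At an OR node, reuse the branch taken in the previous state with probability P1."""
    if previous_branch in branches and random.random() < p1:
        return previous_branch
    return random.choice(branches)  # simplification: uniform choice instead of the real distribution

def propose_choice(previous_value, values, p2):
    """At an atomic choice, reuse the value chosen in the previous state with probability P2."""
    if previous_value in values and random.random() < p2:
        return previous_value
    return random.choice(values)    # simplification: uniform choice instead of the fact's probability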

24 AND/OR Trees
A solution tree S in the AND/OR tree pTree(e) is a subtree such that:
1) e is the root of S.
2) the children of all AND nodes that are in S are also in S.
3) all OR nodes that are in S have exactly one child that is also in S.
4) all the leaves are empty clauses.
A solution tree is consistent with regard to random variables and atomic choices (e.g., it will not contain two contradictory atomic choices a and a′, and there cannot be two different values assigned to the same discrete distribution atom).

25 AND/OR trees
Node legend: AND, Atomic Choice, OR, Empty Clause

26 Handling overlapping partial worlds
We use the idea from the Karp and Luby algorithm: each possible world is assigned to exactly one of its explanations.
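A sketch of that assignment in Python (an assumed representation, not the paper's data structures): fix an order over the explanations and assign each full world to the first explanation it is consistent with; a sampled world is then counted only for its canonical explanation.

def canonical_explanation(world, explanations):
    """Return the index of the first explanation (in a fixed order) that the world satisfies."""
    for index, explanation in enumerate(explanations):
        if all(world.get(atom) == value for atom, value in explanation.items()):
            return index
    return None

def counts_for(world, explanation_index, explanations):
    """Count the sample only if it came from the world's canonical explanation, avoiding overcounting."""
    return canonical_explanation(world, explanations) == explanation_index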

27 Handling overlapping partial worlds
e = {alarm=true}
Partial world 1: {burglary=true}, extended to {burglary=true, earthquake=false} and {burglary=true, earthquake=true}
Partial world 2: {earthquake=true}, extended to {burglary=false, earthquake=true} and {burglary=true, earthquake=true}
The overlapping full world {burglary=true, earthquake=true} is assigned to exactly one of the two partial worlds.

28 Computing the acceptance probability
The Markov chain advances by accepting a candidate state x∗ with the acceptance probability given below, and otherwise remains in the same state (x(i+1) = x(i)).
P(·) = probability of a state
Q(·|·) = probability of transitioning from one state to another
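The formula itself is presumably the standard Metropolis-Hastings acceptance probability; a reconstruction in terms of the P(·) and Q(·|·) defined above:

\alpha\left(x^{(i)} \rightarrow x^{*}\right) = \min\left(1,\ \frac{P(x^{*})\, Q\left(x^{(i)} \mid x^{*}\right)}{P\left(x^{(i)}\right)\, Q\left(x^{*} \mid x^{(i)}\right)}\right)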

29 Computing the acceptance probability
Psi is the probability that fact ci takes on value vi.
Q(x∗|x(i)) is the product of the probabilities of all the choices made while building the new state from the old state, since each individual choice is independent of the others.
Q(x(i)|x∗) is computed by the same algorithm as Q(x∗|x(i)), with the parameters reversed.
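In code form this amounts to multiplying the probabilities of the individual choices recorded while building the candidate state (a trivial sketch; the list of choice probabilities is assumed to be collected by the proposal):

import math

def transition_probability(choice_probabilities):
    """Q(x* | x(i)): product of the probabilities of all choices made while building
    the new state from the old one; each choice is independent of the others."""
    return math.prod(choice_probabilities)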

30 Testing: Comparison with other inference engines.
Domain: Hamming codes (a family of linear error-correcting codes containing data and parity bits).
We predict the values of certain bits given other bits as evidence.
We run all sampling algorithms for 100,000 samples.

31 Testing: Comparison with other inference engines.

32 Other test results
The new MCMC approach provides support for Poisson and uniform distributions.
The new MCMC approach outperforms existing ProbLog inference techniques on the considered tasks.

33 Conclusion
An MCMC approach for estimating the conditional probability of a query given evidence in ProbLog.
Our proposal distribution proposes candidate states by sampling solution trees from an AND/OR tree.
Potential overlap between partial worlds is handled by employing ideas from the Karp and Luby algorithm.
We provide support for Poisson and uniform distributions.
We outperform existing ProbLog inference techniques on the considered tasks.

34 Questions?

35 Basic concepts of logic programming
Term = a variable, a constant, or F(t1, ..., tn), where F is a function applied on n terms.
Atom = pred(t1, ..., tn), where pred is a predicate pred/n of arity n and each ti is a term.
Definite clause = h ← b1, ..., bn; h is true whenever all bi are true.
Substitution θ = {X1 = t1, ..., Xn = tn} maps each variable Xi to a term ti.
aθ: each occurrence of Xi in a is replaced with ti.
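A tiny Python sketch of applying a substitution (it handles only variable and constant terms, not nested function terms; the tuple representation is an assumption for illustration):

def apply_substitution(atom, theta):
    """Compute a-theta: replace each occurrence of a variable Xi in the atom's terms with ti."""
    predicate, terms = atom
    return (predicate, tuple(theta.get(term, term) for term in terms))

# pred(X, Y, c) with theta = {X: a, Y: b}  ->  pred(a, b, c)
print(apply_substitution(("pred", ("X", "Y", "c")), {"X": "a", "Y": "b"}))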

