1
Probabilistic Inference in PRISM
Taisuke Sato, Tokyo Institute of Technology
2
Problem: model-specific learning algorithms

(Figure: each model, Model 1 ... Model n, requires its own learning algorithm, EM 1 ... EM n, chosen among EM/VB/MCMC.)

- Statistical machine learning is a labor-intensive process: {modeling, learning, evaluation}* of trial and error
- Pains of deriving and implementing model-specific learning algorithms and model-specific probabilistic inference
3
Our solution

- Develop a high-level modeling language that offers universal learning and inference methods applicable to every model
- The user concentrates on modeling; the rest (learning and inference) is taken care of by the system

(Figure: Model 1 ... Model n are all written in the one modeling language, which connects them to the shared EM/VB/MCMC routines.)
4
PRISM (http://sato-www.cs.titech.ac.jp/prism/)

- A logic-based high-level modeling language
- Its generic inference/learning methods subsume standard algorithms such as forward-backward (FB) for HMMs and belief propagation (BP) for Bayesian networks

(Figure: probabilistic models such as Bayesian networks, HMMs, PCFGs and new models are fed to the PRISM system, which applies the learning methods EM/MAP, VT, VB, VBVT and MCMC to them.)
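To make the HMM claim concrete, here is a minimal HMM written in the same modeling style as the blood type program on the later slides. It is a sketch, not taken from the slides: the state names s0/s1, the output symbols a/b and the predicates hmm/2 and hmm/3 are assumptions made for illustration. Running PRISM's generic EM on such a program amounts to the Baum-Welch (forward-backward) procedure.

values(init,   [s0,s1]).      % choice of the initial state
values(tr(_),  [s0,s1]).      % choice of the next state, one switch per current state
values(out(_), [a,b]).        % choice of the emitted symbol, one switch per state

hmm(N,Obs) :-                 % Obs is a length-N output sequence
    msw(init,S),
    hmm(N,S,Obs).
hmm(0,_,[]).
hmm(N,S,[O|Os]) :-
    N > 0,
    msw(out(S),O),            % emit a symbol from state S
    msw(tr(S),Next),          % move to the next state
    N1 is N-1,
    hmm(N1,Next,Os).

A query such as learn([hmm(4,[a,b,b,a]),hmm(4,[b,b,a,a])]) (with made-up data) would then estimate the switch parameters exactly as learn/1 does for the blood type program below.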
5
Basic ideas

- Semantics: program = Turing machine + probabilistic choice + Dirichlet prior; its denotation is a probability measure over possible worlds
- Propositionalized probability computation (PPC): programs are written at the predicate logic level, while probability computation is carried out at the propositional logic level
- Dynamic programming for PPC: proof search generates a directed graph (explanation graph), and probabilities are computed from bottom to top in the graph
- Discriminative use: generatively define a model by a PRISM program and discriminatively use it for better prediction performance
6
ABO blood type program

values(abo,[a,b,o],[0.5,0.2,0.3]).    % probabilistic primitives: msw(abo,a) is true with prob. 0.5

btype(X):- gtype(Gf,Gm), pg_table(X,[Gf,Gm]).

pg_table(X,GT):-
    ( (X=a;X=b), (GT=[X,o];GT=[o,X];GT=[X,X])
    ; X=o,  GT=[o,o]
    ; X=ab, (GT=[a,b];GT=[b,a]) ).

gtype(Gf,Gm):- msw(abo,Gf), msw(abo,Gm).    % simulate gene inheritance from father (left) and mother (right)
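Because the program is generative, it can also be run forward. A small usage sketch with PRISM's sampling built-in sample/1; the answer shown is just one possible random draw:

| ?- sample(btype(X))
X = a ?                       % one phenotype drawn with the current parameters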
7
Propositionalized probability computation

Explanation graph for btype(a), which explains how btype(a) is proved through the probabilistic choices made by msw atoms (probabilities shown for θ_a=0.5, θ_b=0.2, θ_o=0.3):

btype(a)   <=> gtype(a,a) v gtype(a,o) v gtype(o,a)    0.25 + 0.15 + 0.15 = 0.55
gtype(a,a) <=> msw(abo,a) & msw(abo,a)                  0.5 * 0.5 = 0.25
gtype(a,o) <=> msw(abo,a) & msw(abo,o)                  0.5 * 0.3 = 0.15
gtype(o,a) <=> msw(abo,o) & msw(abo,a)                  0.3 * 0.5 = 0.15

- Sum-product computation of probabilities in a bottom-up manner, using the probabilities assigned to msw atoms
- The explanation graph is acyclic, so dynamic programming (DP) is possible
- PPC + DP subsumes forward-backward, belief propagation and inside-outside computation
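The bottom-up sum-product computation can be spelled out in plain Prolog. The following is a hand-written sketch over the explanation graph above, not PRISM's implementation; in particular, it omits the tabling/memoization that makes the real computation linear in the size of the graph, and the predicates prob_msw/2, expl/2 and node_prob/2 are made up for this illustration.

% probabilities of the msw atoms, from values(abo,[a,b,o],[0.5,0.2,0.3])
prob_msw(msw(abo,a),0.5).
prob_msw(msw(abo,b),0.2).
prob_msw(msw(abo,o),0.3).

% explanation graph: each defined node is a disjunction (list) of conjunctions (lists)
expl(btype(a),   [[gtype(a,a)],[gtype(a,o)],[gtype(o,a)]]).
expl(gtype(a,a), [[msw(abo,a),msw(abo,a)]]).
expl(gtype(a,o), [[msw(abo,a),msw(abo,o)]]).
expl(gtype(o,a), [[msw(abo,o),msw(abo,a)]]).

node_prob(A,P) :- prob_msw(A,P), !.                 % leaf: an msw atom
node_prob(A,P) :- expl(A,Disj), sum_prob(Disj,P).   % inner node: sum over its explanations

sum_prob([],0.0).
sum_prob([Conj|Rest],P) :-
    conj_prob(Conj,P1), sum_prob(Rest,P2), P is P1+P2.

conj_prob([],1.0).
conj_prob([B|Bs],P) :-
    node_prob(B,P1), conj_prob(Bs,P2), P is P1*P2.

% ?- node_prob(btype(a),P).   gives P = 0.55, as in the graph above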
8
Learning

- A program defines a joint distribution P(x,y|θ), where x is hidden and y is observed, e.g. P(msw(abo,a),..., btype(a),... | θ_a, θ_b, θ_o) with θ_a + θ_b + θ_o = 1
- Learning from observed data y:
  - MLE/MAP: maximize P(y|θ)
  - VT: maximize P(x*,y|θ), where x* = argmax_x P(x,y|θ)
- From a Bayesian point of view, a program (together with its Dirichlet prior, hyperparameters α) defines ∫ P(x,y|θ,α) dθ. We wish to compute
  - the predictive distribution ∫ P(x|y,θ,α) P(θ|y,α) dθ
  - the marginal likelihood P(y|α) = Σ_x ∫ P(x,y|θ,α) dθ
- Both need approximation:
  - Variational Bayes: VB, VB-VT
  - MCMC: Metropolis-Hastings
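To make this concrete for the blood type program: the complete-data likelihood factorizes into counts of msw outcomes, which is what PRISM's generic routines exploit. This is a sketch; the count notation σ_v is introduced here and does not appear in the slides.

P(x,y|θ) = θ_a^σ_a · θ_b^σ_b · θ_o^σ_o,   where σ_v = number of occurrences of msw(abo,v) in (x,y)

The M-step of EM therefore re-estimates each parameter from expected counts gathered on the explanation graphs,

θ_v := E[σ_v | y,θ] / (E[σ_a | y,θ] + E[σ_b | y,θ] + E[σ_o | y,θ]),

while the Bayesian quantities integrate the same product against the Dirichlet prior Dir(θ|α); the remaining sum over the hidden x is what makes them hard in general and calls for VB or MCMC.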
9
Sample session 1 - Explanation graph and probability computation

| ?- prism(blood)
loading::blood.psm.out

| ?- show_sw
Switch gene: unfixed_p: a (p: 0.500000000) b (p: 0.200000000) o (p: 0.300000000)

| ?- probf(btype(a))                 % probf/1 is a built-in predicate
btype(a) <=> gtype(a,a) v gtype(a,o) v gtype(o,a)
gtype(a,a) <=> msw(gene,a) & msw(gene,a)
gtype(a,o) <=> msw(gene,a) & msw(gene,o)
gtype(o,a) <=> msw(gene,o) & msw(gene,a)

| ?- prob(btype(a),P)
P = 0.55
10
Sample session 2 - MLE and Viterbi inference

| ?- D=[btype(a),btype(a),btype(ab),btype(o)], learn(D)
Exporting switch information to the EM routine... done
#em-iters: 0(4) (Converged: -4.965121886)
Statistics on learning:
  Graph size: 18
  Number of switches: 1
  Number of switch instances: 3
  Number of iterations: 4
  Final log likelihood: -4.965121886

| ?- prob(btype(a),P)
P = 0.598211

| ?- viterbif(btype(a))
btype(a) <= gtype(a,a)
gtype(a,a) <= msw(gene,a) & msw(gene,a)
11
Sample session 3 - Bayes inference by MCMC

| ?- D=[btype(a),btype(a),btype(ab),btype(o)],
     marg_mcmc_full(D,[burn_in(1000),end(10000),skip(5)],[VFE,ELM]),
     marg_exact(D,LogM)
VFE = -5.54836
ELM = -5.48608
LogM = -5.48578

| ?- D=[btype(a),btype(a),btype(ab),btype(o)],
     predict_mcmc_full(D,[btype(a)],[[_,E,_]]),
     print_graph(E,[lr('<=')])
btype(a) <= gtype(a,a)
gtype(a,a) <= msw(gene,a) & msw(gene,a)
12
Summary

PRISM = Probabilistic Prolog for statistical machine learning
- Forward sampling
- Exact probability computation
- Parameter learning: MLE/MAP, VT
- Bayesian inference: VB, VBVT, MCMC
- Viterbi inference
- Model score: BIC, Cheeseman-Stutz, VFE
- Smoothing

Current version: 2.1