1
Probabilistic Inference in PRISM
Taisuke Sato, Tokyo Institute of Technology
2
Problem: model-specific learning algorithms

(Figure: each model, Model 1 ... Model n, requires its own learning algorithm, EM 1 ... EM n, chosen among EM/VB/MCMC.)

- Statistical machine learning is a labor-intensive process: {modeling, learning, evaluation}* of trial and error
- Pains of deriving and implementing model-specific learning algorithms and model-specific probabilistic inference
3
Our solution

- Develop a high-level modeling language that offers universal learning and inference methods applicable to every model
- The user concentrates on modeling; the rest (learning and inference) is taken care of by the system

(Figure: Model 1 ... Model n are all written in the one modeling language, which connects them to the shared EM/VB/MCMC routines.)
4
PRISM (http://sato-www.cs.titech.ac.jp/prism/)

- A logic-based high-level modeling language
- Its generic inference/learning methods subsume standard algorithms such as forward-backward (FB) for HMMs and belief propagation (BP) for Bayesian networks

(Figure: probabilistic models such as Bayesian networks, HMMs, PCFGs and new models are fed to the PRISM system, which applies the learning methods EM/MAP, VT, VB, VBVT and MCMC to them.)
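To make the HMM claim concrete, here is a minimal HMM written in the same modeling style as the blood type program on the later slides. It is a sketch, not taken from the slides: the state names s0/s1, the output symbols a/b and the predicates hmm/2 and hmm/3 are assumptions made for illustration. Running PRISM's generic EM on such a program amounts to the Baum-Welch (forward-backward) procedure.

values(init,   [s0,s1]).      % choice of the initial state
values(tr(_),  [s0,s1]).      % choice of the next state, one switch per current state
values(out(_), [a,b]).        % choice of the emitted symbol, one switch per state

hmm(N,Obs) :-                 % Obs is a length-N output sequence
    msw(init,S),
    hmm(N,S,Obs).
hmm(0,_,[]).
hmm(N,S,[O|Os]) :-
    N > 0,
    msw(out(S),O),            % emit a symbol from state S
    msw(tr(S),Next),          % move to the next state
    N1 is N-1,
    hmm(N1,Next,Os).

A query such as learn([hmm(4,[a,b,b,a]),hmm(4,[b,b,a,a])]) (with made-up data) would then estimate the switch parameters exactly as learn/1 does for the blood type program below.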
5
Basic ideas

- Semantics: program = Turing machine + probabilistic choice + Dirichlet prior; its denotation is a probability measure over possible worlds
- Propositionalized probability computation (PPC): programs are written at the predicate logic level, while probability computation is carried out at the propositional logic level
- Dynamic programming for PPC: proof search generates a directed graph (explanation graph), and probabilities are computed from bottom to top in the graph
- Discriminative use: generatively define a model by a PRISM program and discriminatively use it for better prediction performance
6
ABO blood type program

values(abo,[a,b,o],[0.5,0.2,0.3]).    % probabilistic primitives: msw(abo,a) is true with prob. 0.5

btype(X):- gtype(Gf,Gm), pg_table(X,[Gf,Gm]).

pg_table(X,GT):-
    ( (X=a;X=b), (GT=[X,o];GT=[o,X];GT=[X,X])
    ; X=o,  GT=[o,o]
    ; X=ab, (GT=[a,b];GT=[b,a]) ).

gtype(Gf,Gm):- msw(abo,Gf), msw(abo,Gm).    % simulate gene inheritance from father (left) and mother (right)
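Because the program is generative, it can also be run forward. A small usage sketch with PRISM's sampling built-in sample/1; the answer shown is just one possible random draw:

| ?- sample(btype(X))
X = a ?                       % one phenotype drawn with the current parameters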
7
Propositionalized probability computation

Explanation graph for btype(a), which explains how btype(a) is proved through the probabilistic choices made by msw atoms (probabilities shown for θ_a=0.5, θ_b=0.2, θ_o=0.3):

btype(a)   <=> gtype(a,a) v gtype(a,o) v gtype(o,a)    0.25 + 0.15 + 0.15 = 0.55
gtype(a,a) <=> msw(abo,a) & msw(abo,a)                  0.5 * 0.5 = 0.25
gtype(a,o) <=> msw(abo,a) & msw(abo,o)                  0.5 * 0.3 = 0.15
gtype(o,a) <=> msw(abo,o) & msw(abo,a)                  0.3 * 0.5 = 0.15

- Sum-product computation of probabilities in a bottom-up manner, using the probabilities assigned to msw atoms
- The explanation graph is acyclic, so dynamic programming (DP) is possible
- PPC + DP subsumes forward-backward, belief propagation and inside-outside computation
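The bottom-up sum-product computation can be spelled out in plain Prolog. The following is a hand-written sketch over the explanation graph above, not PRISM's implementation; in particular, it omits the tabling/memoization that makes the real computation linear in the size of the graph, and the predicates prob_msw/2, expl/2 and node_prob/2 are made up for this illustration.

% probabilities of the msw atoms, from values(abo,[a,b,o],[0.5,0.2,0.3])
prob_msw(msw(abo,a),0.5).
prob_msw(msw(abo,b),0.2).
prob_msw(msw(abo,o),0.3).

% explanation graph: each defined node is a disjunction (list) of conjunctions (lists)
expl(btype(a),   [[gtype(a,a)],[gtype(a,o)],[gtype(o,a)]]).
expl(gtype(a,a), [[msw(abo,a),msw(abo,a)]]).
expl(gtype(a,o), [[msw(abo,a),msw(abo,o)]]).
expl(gtype(o,a), [[msw(abo,o),msw(abo,a)]]).

node_prob(A,P) :- prob_msw(A,P), !.                 % leaf: an msw atom
node_prob(A,P) :- expl(A,Disj), sum_prob(Disj,P).   % inner node: sum over its explanations

sum_prob([],0.0).
sum_prob([Conj|Rest],P) :-
    conj_prob(Conj,P1), sum_prob(Rest,P2), P is P1+P2.

conj_prob([],1.0).
conj_prob([B|Bs],P) :-
    node_prob(B,P1), conj_prob(Bs,P2), P is P1*P2.

% ?- node_prob(btype(a),P).   gives P = 0.55, as in the graph above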
8
Learning

- A program defines a joint distribution P(x,y|θ), where x is hidden and y is observed, e.g. P(msw(abo,a),..., btype(a),... | θ_a, θ_b, θ_o) with θ_a + θ_b + θ_o = 1
- Learning from observed data y:
  - MLE/MAP: maximize P(y|θ)
  - VT: maximize P(x*,y|θ), where x* = argmax_x P(x,y|θ)
- From a Bayesian point of view, a program (together with its Dirichlet prior, hyperparameters α) defines ∫ P(x,y|θ,α) dθ. We wish to compute
  - the predictive distribution ∫ P(x|y,θ,α) P(θ|y,α) dθ
  - the marginal likelihood P(y|α) = Σ_x ∫ P(x,y|θ,α) dθ
- Both need approximation:
  - Variational Bayes: VB, VB-VT
  - MCMC: Metropolis-Hastings
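To make this concrete for the blood type program: the complete-data likelihood factorizes into counts of msw outcomes, which is what PRISM's generic routines exploit. This is a sketch; the count notation σ_v is introduced here and does not appear in the slides.

P(x,y|θ) = θ_a^σ_a · θ_b^σ_b · θ_o^σ_o,   where σ_v = number of occurrences of msw(abo,v) in (x,y)

The M-step of EM therefore re-estimates each parameter from expected counts gathered on the explanation graphs,

θ_v := E[σ_v | y,θ] / (E[σ_a | y,θ] + E[σ_b | y,θ] + E[σ_o | y,θ]),

while the Bayesian quantities integrate the same product against the Dirichlet prior Dir(θ|α); the remaining sum over the hidden x is what makes them hard in general and calls for VB or MCMC.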
9
Sample session 1 - Explanation graph and probability computation

| ?- prism(blood)
loading::blood.psm.out

| ?- show_sw
Switch gene: unfixed_p: a (p: 0.500000000) b (p: 0.200000000) o (p: 0.300000000)

| ?- probf(btype(a))                 % probf/1 is a built-in predicate
btype(a) <=> gtype(a,a) v gtype(a,o) v gtype(o,a)
gtype(a,a) <=> msw(gene,a) & msw(gene,a)
gtype(a,o) <=> msw(gene,a) & msw(gene,o)
gtype(o,a) <=> msw(gene,o) & msw(gene,a)

| ?- prob(btype(a),P)
P = 0.55
10
Sample session 2 - MLE and Viterbi inference

| ?- D=[btype(a),btype(a),btype(ab),btype(o)], learn(D)
Exporting switch information to the EM routine... done
#em-iters: 0(4) (Converged: -4.965121886)
Statistics on learning:
  Graph size: 18
  Number of switches: 1
  Number of switch instances: 3
  Number of iterations: 4
  Final log likelihood: -4.965121886

| ?- prob(btype(a),P)
P = 0.598211

| ?- viterbif(btype(a))
btype(a) <= gtype(a,a)
gtype(a,a) <= msw(gene,a) & msw(gene,a)
11
Sample session 3 - Bayes inference by MCMC

| ?- D=[btype(a),btype(a),btype(ab),btype(o)],
     marg_mcmc_full(D,[burn_in(1000),end(10000),skip(5)],[VFE,ELM]),
     marg_exact(D,LogM)
VFE = -5.54836
ELM = -5.48608
LogM = -5.48578

| ?- D=[btype(a),btype(a),btype(ab),btype(o)],
     predict_mcmc_full(D,[btype(a)],[[_,E,_]]),
     print_graph(E,[lr('<=')])
btype(a) <= gtype(a,a)
gtype(a,a) <= msw(gene,a) & msw(gene,a)
12
Summary

PRISM = Probabilistic Prolog for statistical machine learning
- Forward sampling
- Exact probability computation
- Parameter learning: MLE/MAP, VT
- Bayesian inference: VB, VBVT, MCMC
- Viterbi inference
- Model score: BIC, Cheeseman-Stutz, VFE
- Smoothing

Current version: 2.1