Published by Reginald Kennedy. Modified over 9 years ago.
1
Computing & Information Sciences, Kansas State University
University of Illinois at Urbana-Champaign
DSSI-MIAS: Data Sciences Summer Institute, Multimodal Information Access and Synthesis
Learning and Reasoning with Graphical Models of Probability for the Identity Uncertainty Problem
William H. Hsu
Tuesday, 05 Jun 2007
Laboratory for Knowledge Discovery in Databases, Kansas State University
http://www.kddresearch.org/KSU/CIS/DSSI-MIAS-SRL-20070605.ppt
2
Part 3 of 8: PRMs, MCMC, IDU Overview
Probabilistic Relational Models (PRMs)
– First-order representations
– Semantics
– Logic and probability
– Representation: bridge between learning, reasoning (cf. Koller 2001)
Markov Chain Monte Carlo (MCMC) Methods
– Local versus global search
– MCMC approach defined
Identity Uncertainty (IDU) Problem
– Definition
– Example: citation matching
Relevance to Named Entity Recognition and Resolution
3
Bayesian Learning: Synopsis
Components of Bayes’s Theorem: Prior and Conditional Probabilities
P(h): Prior Probability of (Correctness of) Hypothesis h
– Uniform priors: no background knowledge
– Background knowledge can skew priors away from ~ Uniform(H)
P(h | D): Probability of h Given Training Data D
$P(h \wedge D)$: Joint Probability of h and D
P(D): Probability of D
– Expresses distribution D: P(D) ~ D
– To compute: marginalize joint probabilities
P(D | h): Probability of D Given h
– Probability of observing D given that h is correct (“generative” model)
– P(D | h) = 1 if h consistent with D (i.e., $\forall x_i .\ h(x_i) = c(x_i)$), otherwise 0
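The components above can be traced on a toy concept-learning problem. The following is a minimal sketch, not part of the original slides: the threshold hypothesis space, the two labeled examples, and all names are assumptions chosen for illustration. It uses the 0/1 likelihood for consistent hypotheses and obtains P(D) by marginalizing the joint.

```python
from fractions import Fraction

# Hypothetical hypothesis space H: thresholds t, with h_t(x) = 1 if x >= t else 0.
hypotheses = {t: (lambda x, t=t: int(x >= t)) for t in range(5)}

# Made-up training data D as (x, c(x)) pairs.
data = [(1, 0), (3, 1)]

# Uniform prior P(h) = 1/|H|: no background knowledge.
prior = {t: Fraction(1, len(hypotheses)) for t in hypotheses}


def likelihood(h, data):
    """P(D | h) = 1 if h agrees with every labeled example, otherwise 0."""
    return 1 if all(h(x) == c for x, c in data) else 0


# Joint P(h, D) = P(D | h) P(h); P(D) comes from marginalizing over h.
joint = {t: likelihood(h, data) * prior[t] for t, h in hypotheses.items()}
p_data = sum(joint.values())

# Bayes's theorem: P(h | D) = P(D | h) P(h) / P(D).
posterior = {t: joint[t] / p_data for t in hypotheses}

for t in sorted(posterior):
    print(f"t={t}: P(h|D) = {posterior[t]}")
```

With a uniform prior and a 0/1 likelihood, the posterior is simply uniform over the hypotheses that are consistent with the data.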
4
Review: MAP and ML Hypotheses
Bayes’s Theorem
  $P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}$
MAP Hypothesis
– Maximum a posteriori hypothesis, $h_{MAP}$
  $h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h)\, P(h)$
– Caveat: maximizing P(h | D) versus combining h values may not be best
ML Hypothesis
– Maximum likelihood hypothesis, $h_{ML}$
  $h_{ML} = \arg\max_{h \in H} P(D \mid h)$
– Sufficient for computing MAP when priors P(h) are uniformly distributed
– Hard to estimate P(h | D) in this case
– Solution approach: encode knowledge about H in P(h) - explicit bias
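A small numeric sketch can show when the MAP and ML hypotheses differ and when they coincide. Everything here is an assumption for illustration: three candidate coin biases as the hypothesis space, a prior that favors a fair coin, and a made-up sample of five flips.

```python
# Hypothetical hypothesis space: three candidate values of P(heads).
hypotheses = [0.3, 0.5, 0.9]

# Non-uniform prior encoding background knowledge that fair coins are common.
prior = {0.3: 0.05, 0.5: 0.90, 0.9: 0.05}

# Made-up data D: 4 heads and 1 tail from independent flips.
heads, tails = 4, 1


def likelihood(theta):
    """P(D | h) for independent flips with P(heads) = theta."""
    return theta ** heads * (1 - theta) ** tails


# ML hypothesis: maximize P(D | h) alone.
h_ml = max(hypotheses, key=likelihood)

# MAP hypothesis: maximize P(D | h) P(h); P(D) is constant in h and can be dropped.
h_map = max(hypotheses, key=lambda th: likelihood(th) * prior[th])

# With a uniform prior the extra factor is constant, so MAP reduces to ML.
uniform = 1 / len(hypotheses)
h_map_uniform = max(hypotheses, key=lambda th: likelihood(th) * uniform)

print("h_ML =", h_ml)
print("h_MAP (informative prior) =", h_map)
print("h_MAP (uniform prior) =", h_map_uniform)
```

Here the informative prior overrides the small sample, which is the "explicit bias" the slide refers to; under the uniform prior the two estimates agree.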
5
Maximum Likelihood Estimation (MLE): Review
ML Hypothesis
– Maximum likelihood hypothesis, $h_{ML}$
– Uniform priors: posterior P(h | D) hard to estimate - why?
– Recall: belief revision given evidence (data)
– “No knowledge” means we need more evidence
– Consequence: more computational work to search H
ML Estimation (MLE): Finding $h_{ML}$ for Unknown Concepts
– Recall: log likelihood (log prob value) used; it is monotonic in the likelihood, so both have the same maximizer
– In practice, estimate descriptive statistics of P(D | h) to approximate $h_{ML}$
– e.g., $\mu_{ML}$: ML estimator for unknown mean (P(D) ~ Normal) is the sample mean
[Figure: distributions over hypotheses, with panels P(h), P(h | D1), and P(h | D1, D2)]
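The claim that the ML estimator of a Normal mean is the sample mean can be checked with a short derivation. This is a sketch under stated assumptions that are not in the slides: the data are m independent samples $x_1, \dots, x_m$ from a Normal distribution with unknown mean $\mu$ and known variance $\sigma^2$.

```latex
\begin{align*}
\ln P(D \mid \mu)
  &= \sum_{i=1}^{m} \ln \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right) \\
  &= -\frac{m}{2} \ln(2\pi\sigma^2)
     - \frac{1}{2\sigma^2} \sum_{i=1}^{m} (x_i - \mu)^2 \\
\frac{\partial}{\partial \mu} \ln P(D \mid \mu)
  &= \frac{1}{\sigma^2} \sum_{i=1}^{m} (x_i - \mu) = 0
  \quad\Longrightarrow\quad
  \mu_{ML} = \frac{1}{m} \sum_{i=1}^{m} x_i
\end{align*}
```

The second derivative is $-m/\sigma^2 < 0$, so the stationary point is a maximum. Because the logarithm is increasing, the same $\mu_{ML}$ maximizes the likelihood itself.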
6
Markov Chain Monte Carlo Example [1]: Face Recognition
Matsui et al. (2004)
7
What is BNT?
BNT is an open-source collection of Matlab functions for inference and learning of (directed) graphical models
Started in Summer 1997 (DEC CRL), development continued while at UCB
Over 100,000 hits and about 30,000 downloads since May 2000
About 43,000 lines of code (of which 8,000 are comments)
From Murphy (2003)
8
Why yet another BN toolbox?
In 1997, there were very few BN programs, and all failed to satisfy the following desiderata:
– Must support real-valued (vector) data
– Must support learning (params and struct)
– Must support time series
– Must support exact and approximate inference
– Must separate API from UI
– Must support MRFs as well as BNs
– Must be possible to add new models and algorithms
– Preferably free
– Preferably open-source
– Preferably easy to read/modify
– Preferably fast
BNT meets all these criteria except for the last
From Murphy (2003)
9
Why Matlab?
Pros
– Excellent interactive development environment
– Excellent numerical algorithms (e.g., SVD)
– Excellent data visualization
– Many other toolboxes, e.g., netlab
– Code is high-level and easy to read (e.g., Kalman filter in 5 lines of code)
– Matlab is the lingua franca of engineers and NIPS
Cons
– Slow
– Commercial license is expensive
– Poor support for complex data structures
Other languages considered in hindsight
– Lush, R, Ocaml, Numpy, Lisp, Java
From Murphy (2003)
10
BNT’s class structure
Models: bnet, mnet, DBN, factor graph, influence (decision) diagram
CPDs: Gaussian, tabular, softmax, etc.
Potentials: discrete, Gaussian, mixed
Inference engines
– Exact: junction tree, variable elimination
– Approximate: (loopy) belief propagation, sampling
Learning engines
– Parameters: EM, (conjugate gradient)
– Structure: MCMC over graphs, K2
From Murphy (2003)
11
1. Making the graph
[Figure: graph over nodes X, Q, Y]
    X = 1; Q = 2; Y = 3;
    dag = zeros(3,3);
    dag(X, [Q Y]) = 1;
    dag(Q, Y) = 1;
Graphs are (sparse) adjacency matrices
GUI would be useful for creating complex graphs
Repetitive graph structure (e.g., chains, grids) is best created using a script (as above)
From Murphy (2003)
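The adjacency-matrix convention, and the point that repetitive structure is easiest to build with a script, can be sketched outside Matlab as follows. This is plain Python rather than BNT code; the helper names `empty_dag`, `chain_dag`, and `parents` are hypothetical, and indices are 0-based here whereas the Matlab snippet is 1-based.

```python
def empty_dag(n):
    """n-by-n adjacency matrix with no edges; dag[i][j] = 1 means an edge i -> j."""
    return [[0] * n for _ in range(n)]


def chain_dag(n):
    """Adjacency matrix of the chain 0 -> 1 -> ... -> n-1, built by a loop."""
    dag = empty_dag(n)
    for i in range(n - 1):
        dag[i][i + 1] = 1
    return dag


def parents(dag, j):
    """Parents of node j are the rows with a 1 in column j."""
    return [i for i in range(len(dag)) if dag[i][j]]


# The three-node example from the slide: X -> Q, X -> Y, Q -> Y.
X, Q, Y = 0, 1, 2
dag = empty_dag(3)
dag[X][Q] = dag[X][Y] = 1
dag[Q][Y] = 1

# A repetitive structure generated by script instead of by hand.
chain = chain_dag(4)

print(dag)
print(parents(dag, Y))
print(chain)
```

Storing edges as matrix entries makes parent and child lookups simple column and row scans, which is one reason the representation suits scripted construction.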
12
2. Making the model
[Figure: graph over nodes X, Q, Y]
    node_sizes = [1 2 1];
    dnodes = [2];
    bnet = mk_bnet(dag, node_sizes, ...
        'discrete', dnodes);
X is always observed input, hence only one effective value
Q is a hidden binary node
Y is a hidden scalar node
bnet is a struct, but should be an object
mk_bnet has many optional arguments, passed as string/value pairs
From Murphy (2003)
13
3. Specifying the parameters
[Figure: graph over nodes X, Q, Y]
    bnet.CPD{X} = root_CPD(bnet, X);
    bnet.CPD{Q} = softmax_CPD(bnet, Q);
    bnet.CPD{Y} = gaussian_CPD(bnet, Y);
CPDs are objects which support various methods such as
– Convert_from_CPD_to_potential
– Maximize_params_given_expected_suff_stats
Each CPD is created with random parameters
Each CPD constructor has many optional arguments
From Murphy (2003)
14
4. Training the model
[Figure: graph over nodes X, Q, Y]
    load data -ascii;
    ncases = size(data, 1);
    cases = cell(3, ncases);
    observed = [X Y];
    cases(observed, :) = num2cell(data');
Training data is stored in cell arrays (slow!), to allow for variable-sized nodes and missing values
cases{i,t} = value of node i in case t
    engine = jtree_inf_engine(bnet, observed);
Any inference engine could be used for this trivial model
    bnet2 = learn_params_em(engine, cases);
We use EM since the Q nodes are hidden during training
learn_params_em is a function, but should be an object
From Murphy (2003)
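The node-by-case layout of the training data can be illustrated without Matlab cell arrays. The sketch below is an assumption-laden analogue in plain Python, not BNT code: the data rows are made up, indices are 0-based, and `None` stands in for the empty cell that marks a hidden or missing value.

```python
# Made-up data: one row per case, columns are the observed nodes (X, Y).
data = [
    [0.1, 1.2],
    [0.5, 0.7],
    [0.9, 2.3],
]

X, Q, Y = 0, 1, 2          # node indices (0-based in this sketch)
observed = [X, Y]          # Q is hidden during training
n_nodes, n_cases = 3, len(data)

# cases[i][t] = value of node i in case t; None marks a hidden or missing value.
cases = [[None] * n_cases for _ in range(n_nodes)]
for t, row in enumerate(data):
    for node, value in zip(observed, row):
        cases[node][t] = value

print(cases[X])
print(cases[Q])
print(cases[Y])
```

Because each entry is an arbitrary object rather than a number in a dense matrix, the same layout can hold vector-valued nodes and leave gaps for hidden variables, at the cost of slower access.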
15
Before training
From Murphy (2003)
16
After training
From Murphy (2003)
17
5. Inference/prediction
[Figure: graph over nodes X, Q, Y]
    engine = jtree_inf_engine(bnet2);
    evidence = cell(1,3);
    evidence{X} = 0.68; % Q and Y are hidden
    engine = enter_evidence(engine, evidence);
    m = marginal_nodes(engine, Y);
    m.mu % E[Y|X]
    m.Sigma % Cov[Y|X]
From Murphy (2003)
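For this three-node model, the quantities E[Y|X] and Cov[Y|X] can also be written out by hand, which may help in reading the inference result. The sketch below is not BNT code and makes several assumptions: scalar X and Y, a softmax gate P(Q = q | x) with made-up coefficients, one linear-Gaussian expert per value of Q, and a hypothetical function name `predict`. It sums over the hidden node Q and applies the law of total variance.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def predict(x, gate, experts):
    """Return (E[Y|x], Var[Y|x]) by marginalizing over the hidden node Q.

    gate:    list of (a, c) so that P(Q=q | x) is softmax of a*x + c
    experts: list of (w, b, var) so that Y | x, Q=q ~ Normal(w*x + b, var)
    """
    probs = softmax([a * x + c for a, c in gate])
    means = [w * x + b for w, b, _ in experts]
    mean = sum(p * m for p, m in zip(probs, means))
    # Law of total variance: E[Y^2 | x] - (E[Y | x])^2.
    second_moment = sum(
        p * (var + m * m) for p, m, (_, _, var) in zip(probs, means, experts)
    )
    return mean, second_moment - mean ** 2


# Made-up parameters for a binary Q; a trained model would supply its own.
gate = [(2.0, -1.0), (-2.0, 1.0)]
experts = [(1.5, 0.0, 0.1), (-0.5, 1.0, 0.2)]

mean, var = predict(0.68, gate, experts)
print("E[Y|X=0.68] =", mean)
print("Var[Y|X=0.68] =", var)
```

The predictive distribution of Y is a mixture of Gaussians, so the mean and variance summarize it but do not capture possible multimodality.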
18
Other kinds of models that BNT supports
– Classification/regression: linear regression, logistic regression, cluster weighted regression, hierarchical mixtures of experts, naïve Bayes
– Dimensionality reduction: probabilistic PCA, factor analysis, probabilistic ICA
– Density estimation: mixtures of Gaussians
– State-space models: LDS, switching LDS, tree-structured AR models
– HMM variants: input-output HMM, factorial HMM, coupled HMM, DBNs
– Probabilistic expert systems: QMR, Alarm, etc.
– Limited-memory influence diagrams (LIMID)
– Undirected graphical models (MRFs)
From Murphy (2003)
19
Summary of BNT
– Provides many different kinds of models/CPDs (lego brick philosophy)
– Provides many inference algorithms, with different speed/accuracy/generality tradeoffs (to be chosen by user)
– Provides several learning algorithms (parameters and structure)
– Source code is easy to read and extend
From Murphy (2003)
20
Problems with BNT
– It is slow
– It has little support for undirected models
– Models are not bona fide objects
– Learning engines are not objects
– It does not support online inference/learning
– It does not support Bayesian estimation
– It has no GUI
– It has no file parser
– It is more complex than necessary
From Murphy (2003)