1
CS498-EA Reasoning in AI Lecture #20
Instructor: Eyal Amir
Fall Semester 2009
Who is in this class?
Some slides in this set were adapted from Eran Segal
2
Summary of last time: Inference
We presented the variable elimination (VE) algorithm
Specifically, VE for finding the marginal P(Xi) over one variable Xi from X1,…,Xn
Order the variables so that one variable Xj is eliminated at a time:
(a) Move unneeded terms (those not involving Xj) outside the summation over Xj
(b) Create a new potential function fXj(·) over the other variables appearing in the terms of the summation in (a)
Works for both BNs and MFs (Markov fields); a minimal code sketch of the two steps follows below
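A minimal sketch of the two steps above, assuming a toy chain A → B → C with binary variables; the CPT values and the dictionary-based factor representation are invented for illustration, not part of the lecture.

```python
from itertools import product

# Toy chain A -> B -> C with binary variables; CPT numbers are invented for illustration.
# A factor maps an assignment tuple (ordered by its "vars") to a number.
f_A  = {"vars": ("A",),     "table": {(0,): 0.6, (1,): 0.4}}
f_BA = {"vars": ("A", "B"), "table": {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}}
f_CB = {"vars": ("B", "C"), "table": {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6}}

def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    vars_ = tuple(dict.fromkeys(f["vars"] + g["vars"]))
    table = {}
    for assignment in product((0, 1), repeat=len(vars_)):
        a = dict(zip(vars_, assignment))
        table[assignment] = (f["table"][tuple(a[v] for v in f["vars"])] *
                             g["table"][tuple(a[v] for v in g["vars"])])
    return {"vars": vars_, "table": table}

def sum_out(f, var):
    """Step (b): eliminate `var` by summing it out, creating a new potential over the rest."""
    keep = tuple(v for v in f["vars"] if v != var)
    idx = f["vars"].index(var)
    table = {}
    for assignment, value in f["table"].items():
        key = tuple(x for i, x in enumerate(assignment) if i != idx)
        table[key] = table.get(key, 0.0) + value
    return {"vars": keep, "table": table}

# Marginal P(C): only factors mentioning the eliminated variable enter each product (step (a)).
f_B = sum_out(multiply(f_A, f_BA), "A")   # new potential over B after eliminating A
f_C = sum_out(multiply(f_B, f_CB), "B")   # new potential over C after eliminating B
print(f_C["table"])                       # approximately {(0,): 0.65, (1,): 0.35}; sums to 1
```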
3
Exact Inference
Treewidth methods:
Variable elimination
Clique tree algorithm
Treewidth
4
Today: Learning in Graphical Models
Parameter Estimation
Maximum Likelihood
Complete Observations
Naïve Bayes
Not-so-Naïve Bayes
5
Learning Introduction
So far, we assumed that the networks were given
Where do the networks come from?
Knowledge engineering with the aid of experts
Automated construction of networks: learn from examples or instances
6
Parameter estimation
Maximum likelihood estimation: maximize ∏i P(bi, ci, ei; w0, w1), where the product runs over the data instances i
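A hedged sketch of maximizing such a product numerically. The model below (toy_model) is a hypothetical stand-in, since the slide does not specify the form of P(b, c, e; w0, w1); the grid search is only for illustration, as one would normally maximize the log-likelihood analytically or by gradient methods.

```python
import math

def log_likelihood(data, model, w0, w1):
    """Sum of log P(b_i, c_i, e_i; w0, w1): the log of the product being maximized."""
    return sum(math.log(model(b, c, e, w0, w1)) for (b, c, e) in data)

def mle_grid(data, model, steps=99):
    """Brute-force the (w0, w1) pair maximizing the likelihood over a coarse grid."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(((w0, w1) for w0 in grid for w1 in grid),
               key=lambda w: log_likelihood(data, model, *w))

# Hypothetical model, purely for illustration (NOT the one in the lecture):
# B ~ Bernoulli(w0); given B, both C and E are ~ Bernoulli(w1 if B=1 else 0.1).
def toy_model(b, c, e, w0, w1):
    bern = lambda x, p: p if x else 1 - p
    p_child = w1 if b else 0.1
    return bern(b, w0) * bern(c, p_child) * bern(e, p_child)

data = [(1, 1, 0), (0, 0, 0), (1, 1, 1), (1, 0, 1)]
print(mle_grid(data, toy_model))
```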
7
Learning Introduction
Input: dataset of instances D = {d[1],...,d[m]}
Output: Bayesian network
Measures of success:
How close is the learned network to the original distribution?
Use distance measures between distributions
Often hard because we do not have the true underlying distribution
Instead, evaluate performance by how well the network predicts new, unseen examples ("test data"); see the sketch after this list
Classification accuracy
How close is the structure of the network to the true one?
Use a distance metric between structures
Hard because we do not know the true structure
Instead, ask whether the independencies learned hold in test data
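A minimal sketch of the "predict unseen examples" criterion above: average log-likelihood of held-out instances under the learned network. The `prob` callable is a placeholder for whatever probability the learned model assigns to a complete instance.

```python
import math

def avg_test_log_likelihood(test_data, prob):
    """Average log P(instance) over held-out data; higher means better generalization.
    `prob` maps an instance to its probability under the learned network (placeholder)."""
    return sum(math.log(prob(d)) for d in test_data) / len(test_data)

# Toy illustration with a "learned" distribution over one binary variable (made-up numbers).
learned = {0: 0.3, 1: 0.7}
test = [1, 1, 0, 1]
print(avg_test_log_likelihood(test, lambda d: learned[d]))
```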
8
Prior Knowledge
Prespecified structure and variables: learn only CPDs
Prespecified variables: learn network structure and CPDs
Hidden variables: learn hidden variables, structure, and CPDs
Complete/incomplete data
Missing data
Unobserved variables
9
Learning Bayesian Networks
[Figure: the Inducer takes Data and Prior information and outputs a Bayesian network over X1, X2, Y]
P(Y|X1,X2):
X1  X2   | y0    y1
x10 x20  | 1     0
x10 x21  | 0.2   0.8
x11 x20  | 0.1   0.9
x11 x21  | 0.02  0.98
10
Known Structure, Complete Data
Goal: Parameter estimation
Data does not contain missing values
[Figure: the Inducer takes the initial network structure over X1, X2, Y plus the complete input data, and outputs the CPD P(Y|X1,X2) shown on the previous slide]
Input data (complete):
X1  X2  Y
x10 x21 y0
x11 x20 y1
A counting-based sketch of this estimation follows below
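A minimal sketch of maximum-likelihood parameter estimation in this setting, assuming binary variables and the counting estimate for each CPD entry; the dataset below is made up for illustration.

```python
from collections import Counter

# Complete data: every instance fully specifies (x1, x2, y).  Values are invented.
data = [(0, 1, 0), (1, 0, 1), (0, 1, 1), (0, 0, 1), (1, 1, 1), (0, 1, 1)]

joint  = Counter((x1, x2, y) for x1, x2, y in data)   # counts M[x1, x2, y]
parent = Counter((x1, x2)    for x1, x2, _ in data)   # counts M[x1, x2]

def cpd(x1, x2, y):
    """Maximum-likelihood estimate P(Y=y | X1=x1, X2=x2) = M[x1,x2,y] / M[x1,x2]."""
    if parent[(x1, x2)] == 0:
        return None                       # parent configuration never observed in the data
    return joint[(x1, x2, y)] / parent[(x1, x2)]

for x1 in (0, 1):
    for x2 in (0, 1):
        print(f"P(Y=1 | X1={x1}, X2={x2}) = {cpd(x1, x2, 1)}")
```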
11
Unknown Structure, Complete Data
Goal: Structure learning & parameter estimation
Data does not contain missing values
[Figure: the Inducer takes the variables and the same complete input data, and outputs both the network structure over X1, X2, Y and the CPD P(Y|X1,X2)]
12
Known Structure, Incomplete Data
Goal: Parameter estimation
Data contains missing values
[Figure: same setup with a known structure, but the input data now has missing entries (shown as "?"); the Inducer still outputs the CPD P(Y|X1,X2)]
13
Unknown Structure, Incomplete Data
Goal: Structure learning & parameter estimation
Data contains missing values
[Figure: same setup, with missing entries ("?") in the input data; the Inducer outputs both the structure and the CPD P(Y|X1,X2)]
14
Parameter Estimation
Input:
Network structure
Choice of parametric family for each CPD P(Xi|Pa(Xi))
Goal: Learn CPD parameters
Two main approaches (a small contrasting sketch follows below):
Maximum likelihood estimation
Bayesian approaches
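A small sketch contrasting the two approaches on a single Bernoulli parameter: the maximum-likelihood estimate is the observed fraction, while a Bayesian estimate with a Beta(alpha, beta) prior returns a smoothed posterior mean. The prior values are illustrative, not from the lecture.

```python
def mle(heads, tails):
    """Maximum-likelihood estimate of P(Head): the observed fraction of heads."""
    return heads / (heads + tails)

def bayes_posterior_mean(heads, tails, alpha=1.0, beta=1.0):
    """Posterior mean of P(Head) under a Beta(alpha, beta) prior; (1, 1) gives Laplace smoothing."""
    return (heads + alpha) / (heads + tails + alpha + beta)

print(mle(3, 2))                    # 0.6
print(bayes_posterior_mean(3, 2))   # ~0.571, pulled toward 0.5 by the uniform prior
```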
15
Biased Coin Toss Example
Coin can land in two positions: Head or Tail
Estimation task: given toss examples x[1],...,x[m], estimate P(H) = θ and P(T) = 1 − θ
Assumption: i.i.d. samples
Tosses are controlled by an (unknown) parameter θ
Tosses are sampled from the same distribution
Tosses are independent of each other
16
Biased Coin Toss Example
Goal: find θ ∈ [0,1] that predicts the data well
"Predicts the data well" = likelihood of the data given θ, L(D:θ)
Example: the probability of the sequence H,T,T,H,H is L(D:θ) = θ·(1−θ)·(1−θ)·θ·θ = θ^3 (1−θ)^2
[Plot: L(D:θ) as a function of θ over [0, 1]]
17
Maximum Likelihood Estimator
The maximum likelihood estimator is the parameter θ that maximizes L(D:θ)
In our example, θ = 0.6 maximizes the likelihood of the sequence H,T,T,H,H
[Plot: L(D:θ) for the sequence, peaking at θ = 0.6]
18
Maximum Likelihood Estimator
General case
Observations: MH heads and MT tails
Find θ maximizing the likelihood L(D:θ) = θ^MH (1−θ)^MT
Equivalent to maximizing the log-likelihood log L(D:θ) = MH log θ + MT log(1−θ)
Differentiating the log-likelihood and solving for θ, we get that the maximum likelihood parameter is:
θ = MH / (MH + MT)
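A minimal sketch of this result on the sequence from the earlier slides: count heads and tails, take MH / (MH + MT), and confirm with a brute-force search over θ.

```python
import math

tosses = ["H", "T", "T", "H", "H"]           # the example sequence from the earlier slides
MH, MT = tosses.count("H"), tosses.count("T")

theta_mle = MH / (MH + MT)                   # closed-form MLE: 3/5 = 0.6

def log_lik(theta):
    """log L(D:theta) = MH*log(theta) + MT*log(1 - theta)."""
    return MH * math.log(theta) + MT * math.log(1 - theta)

# Brute-force check that the closed form is indeed the maximizer.
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=log_lik)

print(theta_mle, theta_grid)                 # 0.6 0.6
```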
19
Sufficient Statistics
For computing the parameter θ of the coin toss example, we only needed MH and MT, since they fully determine the likelihood: MH and MT are sufficient statistics
20
Sufficient Statistics
Definition: a function s(D) from instances to a vector in ℝ^k is a sufficient statistic if, for any two datasets D and D' and any θ, s(D) = s(D') implies L(D:θ) = L(D':θ)
[Diagram: many different datasets map to the same statistics]
21
Sufficient Statistics for Multinomial
A sufficient statistic for a dataset D over a variable Y with k values is the tuple of counts <M1,...,Mk> such that Mi is the number of times that Y = yi in D
To obtain this sufficient statistic, define s(x) as a tuple of dimension k: if x has Y = yi, then s(x) = (0,...,0,1,0,...,0), with the single 1 in position i and zeros in positions 1,...,i−1 and i+1,...,k
Summing s(x[m]) over the instances of D gives <M1,...,Mk>
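A minimal sketch of the count vector and the per-instance indicator encoding described above; the variable values and dataset are made up for illustration.

```python
from collections import Counter

values = ["y1", "y2", "y3"]                       # the k possible values of Y
D = ["y1", "y3", "y1", "y2", "y1", "y3"]          # a made-up dataset over Y

def s_instance(x):
    """Indicator tuple of dimension k: a 1 in the position of the observed value, 0 elsewhere."""
    return tuple(1 if x == v else 0 for v in values)

def s_dataset(D):
    """Sufficient statistics <M1,...,Mk>: the counts of each value (= sum of indicator tuples)."""
    counts = Counter(D)
    return tuple(counts[v] for v in values)

print(s_instance("y2"))   # (0, 1, 0)
print(s_dataset(D))       # (3, 1, 2); the likelihood of D depends only on these counts
```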
22
Bias and Variance of Estimator
23
Next Time