Learning Probabilistic Graphical Models: Parameter Estimation
Maximum Likelihood Estimation
Biased Coin Example
$P$ is a Bernoulli distribution: $P(X=1) = \theta$, $P(X=0) = 1 - \theta$
Tosses are sampled IID from $P$:
Tosses are independent of each other
Tosses are sampled from the same distribution (identically distributed)
IID as a PGM
[Plate model: the parameter $\theta$ is a parent of $X[1], \ldots, X[M]$; the plate over $m$ represents replication across the data instances.]
Maximum Likelihood Estimation
Goal: find $\theta \in [0,1]$ that predicts $D$ well
Prediction quality = likelihood of $D$ given $\theta$; since tosses are IID, $L(D:\theta) = \prod_m P(x[m]:\theta)$
[Plot: the likelihood $L(D:\theta)$ as a function of $\theta \in [0,1]$.]
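To make the likelihood function concrete, here is a minimal Python sketch that evaluates the Bernoulli likelihood $L(D:\theta)$ on a grid of $\theta$ values; the function name `coin_likelihood` and the toss sequence are illustrative, not from the slides:

```python
import numpy as np

def coin_likelihood(data, theta):
    """Likelihood L(D:theta) of IID coin tosses (1 = heads, 0 = tails)."""
    data = np.asarray(data)
    # Product over instances: theta for each heads, (1 - theta) for each tails.
    return np.prod(np.where(data == 1, theta, 1 - theta))

tosses = [1, 0, 0, 1, 1]  # hypothetical sequence H T T H H
for theta in np.linspace(0.1, 0.9, 9):
    print(f"theta = {theta:.1f}  L(D:theta) = {coin_likelihood(tosses, theta):.5f}")
```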
Maximum Likelihood Estimator
Observations: $M_H$ heads and $M_T$ tails
Find $\theta$ maximizing the likelihood $L(D:\theta) = \theta^{M_H} (1-\theta)^{M_T}$
Equivalent to maximizing the log-likelihood $\ell(D:\theta) = M_H \log\theta + M_T \log(1-\theta)$
Differentiating the log-likelihood and solving for $\theta$: $\hat\theta = \frac{M_H}{M_H + M_T}$
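A short sketch of the resulting closed-form estimator (function name is ours):

```python
def coin_mle(data):
    """Closed-form Bernoulli MLE: theta_hat = M_H / (M_H + M_T)."""
    m_heads = sum(1 for x in data if x == 1)
    m_tails = len(data) - m_heads
    return m_heads / (m_heads + m_tails)

print(coin_mle([1, 0, 0, 1, 1]))  # 3 heads, 2 tails -> 0.6
```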
Sufficient Statistics
For computing $\hat\theta$ in the coin toss example, we only needed $M_H$ and $M_T$, since $M_H$ and $M_T$ are sufficient statistics
Sufficient Statistics
A function $s(D)$ is a sufficient statistic, mapping instances to a vector in $\mathbb{R}^k$, if for any two datasets $D$ and $D'$ and any $\theta$ we have: $s(D) = s(D') \implies L(D:\theta) = L(D':\theta)$
[Diagram: many datasets map to the same statistics; datasets with equal statistics are interchangeable for the likelihood.]
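A quick numerical check of this definition for the coin example, assuming the illustrative `coin_likelihood` from above: two different datasets with identical counts assign the same likelihood to every $\theta$.

```python
import numpy as np

def coin_likelihood(data, theta):
    return np.prod(np.where(np.asarray(data) == 1, theta, 1 - theta))

# Two different datasets with identical sufficient statistics <M_H, M_T> = <2, 2>.
d1 = [1, 1, 0, 0]
d2 = [0, 1, 0, 1]
for theta in np.linspace(0.1, 0.9, 5):
    assert np.isclose(coin_likelihood(d1, theta), coin_likelihood(d2, theta))
print("equal statistics => equal likelihood for every theta tested")
```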
Sufficient Statistic for Multinomial
For a dataset $D$ over variable $X$ with $k$ values, the sufficient statistics are counts $\langle M_1, \ldots, M_k \rangle$, where $M_i$ is the number of times that $X[m] = x^i$ in $D$
The per-instance sufficient statistic $s(x)$ is a tuple of dimension $k$: $s(x^i) = (0, \ldots, 0, 1, 0, \ldots, 0)$, with the single 1 in position $i$, so that $s(D) = \sum_m s(x[m]) = \langle M_1, \ldots, M_k \rangle$
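A minimal sketch of the one-hot encoding and the resulting count vector (the helper `one_hot` and the toy data are assumptions for illustration):

```python
import numpy as np

def one_hot(i, k):
    """Per-instance sufficient statistic s(x^i): a single 1 in position i."""
    s = np.zeros(k, dtype=int)
    s[i] = 1
    return s

data = [0, 2, 1, 2, 2]                     # instances taking values x^0, x^1, x^2 (k = 3)
counts = sum(one_hot(x, 3) for x in data)  # s(D) = sum_m s(x[m]) = <M_1, ..., M_k>
print(counts)                              # [1 1 3]
```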
Sufficient Statistic for Gaussian
Gaussian distribution: $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
Rewrite as $p(x) = \exp\left(-\frac{1}{2\sigma^2}\,x^2 + \frac{\mu}{\sigma^2}\,x - \frac{\mu^2}{2\sigma^2} - \log\left(\sqrt{2\pi}\,\sigma\right)\right)$, an exponential whose argument is linear in $x^2$, $x$, and $1$
Sufficient statistics for the Gaussian: $s(x) = \langle 1, x, x^2 \rangle$
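A sketch, on assumed toy data, of accumulating $\langle 1, x, x^2 \rangle$ over a dataset and recovering the mean and variance from the accumulated statistics alone:

```python
import numpy as np

data = np.array([1.2, -0.4, 0.8, 2.1, 0.3])  # toy sample, assumed Gaussian

# Accumulate s(x) = <1, x, x^2> over the dataset.
n, sum_x, sum_x2 = np.stack([np.ones_like(data), data, data**2]).sum(axis=1)

mu_hat = sum_x / n                 # mean recovered from the statistics
var_hat = sum_x2 / n - mu_hat**2   # E[x^2] - (E[x])^2
print(mu_hat, var_hat)
```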
Maximum Likelihood Estimation
MLE Principle: Choose $\theta$ to maximize $L(D:\theta)$
Multinomial MLE: $\hat\theta_i = \frac{M_i}{\sum_j M_j}$
Gaussian MLE: $\hat\mu = \frac{1}{M} \sum_m x[m]$, $\quad \hat\sigma = \sqrt{\frac{1}{M} \sum_m \left(x[m] - \hat\mu\right)^2}$
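Both closed forms translate directly into code; this sketch instantiates them on the toy inputs used above (function names are illustrative):

```python
import numpy as np

def multinomial_mle(counts):
    """theta_hat_i = M_i / sum_j M_j."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

def gaussian_mle(data):
    """mu_hat = sample mean; sigma_hat = root mean squared deviation (biased MLE)."""
    data = np.asarray(data, dtype=float)
    mu = data.mean()
    return mu, np.sqrt(np.mean((data - mu) ** 2))

print(multinomial_mle([1, 1, 3]))               # [0.2 0.2 0.6]
print(gaussian_mle([1.2, -0.4, 0.8, 2.1, 0.3]))
```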
Summary
Maximum likelihood estimation is a simple principle for parameter selection given $D$
The likelihood function is uniquely determined by sufficient statistics that summarize $D$
MLE has a closed-form solution for many parametric distributions