Slide 1
Margin Learning, Online Learning, and the Voted Perceptron
SPLODD ≈ AE* − 3, 2011 (* Autumnal Equinox)
Slide 2
Review
Computer science is full of equivalences:
– SQL ≈ relational algebra
– YFCL ≈ optimizing … on the training data
– gcc -O4 foo.c ≈ gcc foo.c
Also full of relationships between sets:
– Finding the smallest error-free decision tree >> 3-SAT
– DataLog >> relational algebra
– CFL >> Det FSMs = RegEx
Slide 3
Review
Bayes nets describe a (family of) joint distribution(s) over random variables:
– They are an operational description (a program) for how data can be generated (a minimal sketch follows this slide).
– They are a declarative description (a definition) of the joint distribution, from which we can derive algorithms for doing things other than generation.
There is a close connection between Naïve Bayes and loglinear models.
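The "program" reading of a Bayes net can be made concrete. Below is a minimal sketch, not from the slides, of the Naïve Bayes generative story (sample a class Y, then draw each word independently given Y); all parameter values and names are made up for illustration.

```python
import random

# Toy Naive Bayes net Y -> W1..Wn, read as a sampling program.
# All parameters below are made up for illustration.
P_Y = {"pos": 0.6, "neg": 0.4}                      # prior Pr(Y)
P_W_given_Y = {                                      # per-class word dists Pr(W|Y)
    "pos": {"good": 0.5, "bad": 0.1, "movie": 0.4},
    "neg": {"good": 0.1, "bad": 0.5, "movie": 0.4},
}

def sample_discrete(dist):
    """Draw one value from a {value: probability} table."""
    r, acc = random.random(), 0.0
    for value, p in dist.items():
        acc += p
        if r < acc:
            return value
    return value  # guard against float rounding

def generate_document(n_words=5):
    """Run the generative story: sample a class, then words i.i.d. given it."""
    y = sample_discrete(P_Y)
    words = [sample_discrete(P_W_given_Y[y]) for _ in range(n_words)]
    return y, words

print(generate_document())
```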
Slide 4
NB vs loglinear models
[Diagram: the spaces of loglinear, NB, and multinomial(?) classifiers, with points marking estimation choices such as SymDir(100), AbsDisc(0.01), and max CL(y|x) + G(0,1.0), labeled NB-JL, NB-CL, and NB-CL*.]
Slide 5
NB vs loglinear models
[Diagram: the same spaces, with the Naïve Bayes net Y → W_j drawn and regions annotated "Optimal if" the corresponding independence assumptions hold.]
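The close connection claimed on slide 3 can be shown directly: the NB log joint, log Pr(y) + Σ_j log Pr(w_j|y), is linear in indicator features, so an NB classifier is a loglinear model whose weights are log-probabilities. A sketch, reusing the toy parameters from the previous example:

```python
import math

# NB <-> loglinear correspondence on made-up parameters: the NB log joint
# is a linear score over indicator features with log-probability weights.
P_Y = {"pos": 0.6, "neg": 0.4}
P_W_given_Y = {
    "pos": {"good": 0.5, "bad": 0.1, "movie": 0.4},
    "neg": {"good": 0.1, "bad": 0.5, "movie": 0.4},
}

# One weight per class ("bias") and one per (class, word) pair.
weights = {(y, None): math.log(p) for y, p in P_Y.items()}
weights.update({(y, w): math.log(p)
                for y, dist in P_W_given_Y.items() for w, p in dist.items()})

def score(y, words):
    """Linear score w . f(x, y); equals the NB log joint probability."""
    return weights[(y, None)] + sum(weights[(y, w)] for w in words)

words = ["good", "movie"]
for y in P_Y:  # the loglinear score reproduces log Pr(y) + sum_w log Pr(w|y)
    nb = math.log(P_Y[y]) + sum(math.log(P_W_given_Y[y][w]) for w in words)
    assert abs(score(y, words) - nb) < 1e-12
print(max(P_Y, key=lambda y: score(y, words)))  # -> "pos"
```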
Slide 6
Similarly for sequences…
An HMM is a Bayes net:
– It implies a set of independence assumptions.
– ML parameter setting and Viterbi are optimal if these hold.
A CRF is a Markov field:
– It implies a set of independence assumptions.
– These, plus the goal of maximizing Pr(y|x), give us a learning algorithm.
You can construct features so that any HMM can be emulated by a CRF with those features (a sketch of this follows).
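A sketch of that last claim, with made-up HMM parameters: give the CRF one indicator feature per (previous tag, tag) transition and per (tag, word) emission, set each weight to the log of the corresponding HMM probability, and the CRF's linear score equals the HMM's log joint probability, so the two models rank taggings identically.

```python
import math

# Toy HMM parameters (made up for illustration).
start = {"B": 0.7, "O": 0.3}                       # Pr(y1)
trans = {("B", "I"): 0.6, ("B", "O"): 0.4,         # Pr(y_t | y_{t-1})
         ("I", "I"): 0.5, ("I", "O"): 0.5,
         ("O", "B"): 0.2, ("O", "O"): 0.8}
emit = {("B", "Cohen"): 0.9, ("B", "notes"): 0.1,  # Pr(x_t | y_t)
        ("I", "Cohen"): 0.5, ("I", "notes"): 0.5,
        ("O", "Cohen"): 0.1, ("O", "notes"): 0.9}

def hmm_log_joint(words, tags):
    lp = math.log(start[tags[0]]) + math.log(emit[(tags[0], words[0])])
    for t in range(1, len(words)):
        lp += math.log(trans[(tags[t - 1], tags[t])])
        lp += math.log(emit[(tags[t], words[t])])
    return lp

# CRF weights = log HMM parameters; the score is a sum of feature weights.
w_start = {y: math.log(p) for y, p in start.items()}
w_trans = {k: math.log(p) for k, p in trans.items()}
w_emit = {k: math.log(p) for k, p in emit.items()}

def crf_score(words, tags):
    s = w_start[tags[0]] + w_emit[(tags[0], words[0])]
    for t in range(1, len(words)):
        s += w_trans[(tags[t - 1], tags[t])] + w_emit[(tags[t], words[t])]
    return s

words, tags = ["Cohen", "notes"], ["B", "O"]
assert abs(hmm_log_joint(words, tags) - crf_score(words, tags)) < 1e-12
```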
Slide 7
In sequence space…
[Diagram: the analogous picture for sequences: the spaces of CRF/loglinear models, HMMs, and multinomial(?) models, with points for SymDir(100), AbsDisc(0.01), and max CL(y|x) + G(0,1.0), labeled JL, CL, and CL*.]
Slide 8
Review: CRFs/Markov Random Fields
Semantics of a Markov random field, shown for a chain Y1–Y2–…–Y7 over the sentence "When will prof Cohen post the notes":
What's independent: Pr(Y_i | all other Y's) = Pr(Y_i | Y_{i-1}, Y_{i+1})
Probability distribution: Pr(y_1, …, y_7) = (1/Z) ∏_i φ_i(y_i, y_{i+1})
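For a chain this small, the semantics can be checked by brute force. A sketch with a made-up potential table: enumerate all 3^7 label sequences, sum the products of edge potentials to get Z, and normalize.

```python
from itertools import product

# Chain MRF: Pr(y) = (1/Z) * prod_i phi(y_i, y_{i+1}).
# Potential values are made up for illustration.
LABELS = ["B", "I", "O"]
phi = {(a, b): 1.0 for a in LABELS for b in LABELS}  # default potential
phi[("B", "I")] = 5.0   # favor B -> I
phi[("O", "I")] = 0.1   # discourage O -> I
n = 7                    # Y1..Y7, one per word of the example sentence

def unnorm(y):
    """Product of edge potentials along the chain."""
    p = 1.0
    for i in range(n - 1):
        p *= phi[(y[i], y[i + 1])]
    return p

# Brute-force partition function: 3^7 = 2187 assignments, cheap to enumerate.
Z = sum(unnorm(y) for y in product(LABELS, repeat=n))

def prob(y):
    return unnorm(y) / Z

print(prob(("B", "I", "I", "I", "O", "O", "O")))
```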
Slide 9
Review: CRFs/Markov Random Fields
[Diagram: the tagging lattice: candidate labels B/I/O above each word of "When will prof Cohen post the notes".]
Slide 10
Review: CRFs/Markov Random Fields
[Diagram: a more general graph over Y1…Y7 for "When will prof Cohen post the notes", no longer a simple chain.]
What's independent: Pr(Y_i | all other Y's) = Pr(Y_i | neighbors of Y_i)
Probability distribution: Pr(y) = (1/Z) ∏_c φ_c(y_c), one potential per clique c of the graph
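The independence statement follows directly from the product form: in Pr(Y_i = y | y_{-i}), every potential that does not touch Y_i cancels between numerator and denominator, leaving only the cliques containing i, so only the neighbors (the Markov blanket) matter:

```latex
\Pr(Y_i = y \mid \mathbf{y}_{-i})
  = \frac{\prod_{c \,\ni\, i} \phi_c\!\left(y,\, \mathbf{y}_{c \setminus \{i\}}\right)}
         {\sum_{y'} \prod_{c \,\ni\, i} \phi_c\!\left(y',\, \mathbf{y}_{c \setminus \{i\}}\right)}
```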
Slide 11
Pseudo-likelihood and dependency networks
Any Markov field defines a (family of) probability distributions D:
– But there is no simple program for generation/sampling.
– We can use MCMC in the general case.
If you have, for each node i, P_D(X_i | Pa_i), that's a dependency net:
– Still no simple program for generation/sampling (but you can use Gibbs, as sketched below).
– You can learn these from data using YFCL.
– Equivalently: learning this maximizes pseudo-likelihood, just as HMM learning maximizes (real) likelihood on a sequence.
A weirdness: every MRF has an equivalent dependency net, but not every dependency net (set of local conditionals) has an equivalent MRF.
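The "can use Gibbs" point in miniature: a dependency net specifies exactly the local conditionals that Gibbs sampling needs, so we can sample by repeatedly resampling one variable at a time from its conditional. A sketch with hypothetical local conditionals for a three-node binary chain:

```python
import random

# Toy dependency net on a 3-node chain X1 - X2 - X3, binary variables.
def local_conditional(i, x):
    """Pr(X_i = 1 | the other variables); made-up tables for illustration."""
    if i == 0:
        return 0.8 if x[1] == 1 else 0.2       # X1 depends on X2
    if i == 1:
        return 0.9 if x[0] == x[2] else 0.3    # X2 depends on X1 and X3
    return 0.8 if x[1] == 1 else 0.2           # X3 depends on X2

def gibbs(n_steps=10000, burn_in=1000):
    """Gibbs sampling: resample one variable per step from its conditional."""
    x = [random.randint(0, 1) for _ in range(3)]
    samples = []
    for step in range(n_steps):
        i = step % 3                            # sweep the variables in turn
        x[i] = 1 if random.random() < local_conditional(i, x) else 0
        if step >= burn_in:
            samples.append(tuple(x))
    return samples

samples = gibbs()
print(sum(s == (1, 1, 1) for s in samples) / len(samples))
```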
Slide 12
And now for…