Survey on Semi-Supervised CRFs
Yusuke Miyao
Department of Computer Science, The University of Tokyo
Contents
1. Conditional Random Fields (CRFs)
2. Semi-Supervised Log-Linear Model
3. Semi-Supervised CRFs
4. Dynamic Programming for Semi-Supervised CRFs
1. Conditional Random Fields (CRFs)
Log-linear model: for a sentence x = ⟨x_1, …, x_T⟩ and a label sequence y = ⟨y_1, …, y_T⟩,
p(y | x) = (1 / Z(x)) exp( Σ_k λ_k f_k(x, y) )
- λ_k: parameter
- f_k: feature function
- Z(x) = Σ_y exp( Σ_k λ_k f_k(x, y) ): partition function
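The log-linear form above can be sketched for a single position with a handful of candidate labels. The feature vectors and weights below are illustrative assumptions, not the slides' actual features.

```python
import numpy as np

def log_linear_prob(lam, feats):
    """p(y|x) = exp(sum_k lam_k f_k(x, y)) / Z(x).
    feats[y] holds the feature vector f(x, y) for candidate label y."""
    scores = feats @ lam                # sum_k lam_k f_k(x, y) for each label
    scores -= scores.max()              # stabilize before exponentiation
    expscores = np.exp(scores)
    return expscores / expscores.sum()  # divide by the partition function Z(x)

# toy weights and per-label feature vectors (assumptions for illustration)
lam = np.array([1.0, -0.5])
feats = np.array([[1.0, 0.0],   # f(x, y=0)
                  [0.0, 1.0],   # f(x, y=1)
                  [1.0, 1.0]])  # f(x, y=2)
p = log_linear_prob(lam, feats)
```

The label whose features align best with the weights receives the highest probability, and the distribution sums to one by construction.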
Parameter Estimation (1/2)
Estimate parameters λ_k given labeled training data D = {(x_i, y_i)}.
Objective function: log-likelihood (+ regularizer):
L(λ) = Σ_i log p(y_i | x_i) − Σ_k λ_k² / (2σ²)
Parameter Estimation (2/2)
Gradient-based optimization is applied (CG, quasi-Newton, etc.):
∂L/∂λ_k = Σ_i [ f_k(x_i, y_i) − E_{p(y|x_i)}[ f_k(x_i, y) ] ] − λ_k / σ²
The second term inside the sum is the model expectation.
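The gradient is the empirical feature vector minus the model expectation. A minimal sketch for a single training example, with illustrative (assumed) features, checked against finite differences:

```python
import numpy as np

def loglik(lam, feats, gold):
    """log p(y_gold | x) for a log-linear model over candidate labels."""
    scores = feats @ lam
    m = scores.max()
    return scores[gold] - (m + np.log(np.exp(scores - m).sum()))

def grad_loglik(lam, feats, gold):
    """d log p(y_gold|x) / d lam_k = f_k(x, y_gold) - E_{p(y|x)}[f_k(x, y)]."""
    scores = feats @ lam
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return feats[gold] - p @ feats  # empirical features minus model expectation

# toy example (weights and features are assumptions)
lam = np.array([0.3, -0.2])
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
g = grad_loglik(lam, feats, gold=2)
```

The regularizer contribution −λ_k/σ² would simply be added to each component of `g`.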
Dynamic Programming for CRFs
Computing model expectations requires a summation over all label sequences y, which is exponential in the sentence length. Dynamic programming allows for efficient computation of model expectations.
[Figure: label lattice for the example sentence "His friend runs the company" with candidate tags (Noun, Det, …); x = words, y = labels]
Dynamic Programming for CRFs
Assumption (0-th order): each feature depends on the label at a single position t, f_k(x, y) = Σ_t f_k(x, y_t, t), so the model expectation reduces to per-position marginals p(y_t | x).
[Figure: the same lattice (tags Noun, Det, Verb, Adj), highlighting position t and its label y_t]
Forward/Backward Probability
α_t(y): total score of all label prefixes ending at position t with y_t = y; β_t(y): total score of all label suffixes from position t given y_t = y. Both are computed by dynamic programming:
α_t(y) = Σ_{y′} α_{t−1}(y′) ψ_t(y′, y),  β_t(y) = Σ_{y′} ψ_{t+1}(y, y′) β_{t+1}(y′)
where ψ_t is the local potential at position t, and the marginal is p(y_t = y | x) = α_t(y) β_t(y) / Z(x).
[Figure: lattice for "His friend runs the company" illustrating the forward and backward passes]
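The forward/backward recursions can be sketched on a toy chain. The unary and transition potentials here are random stand-ins (assumptions), not features from the slides; the marginals are verified against brute-force enumeration in testing.

```python
import numpy as np

def forward_backward(phi, psi):
    """phi: (T, Y) unary potentials; psi: (Y, Y) transition potentials.
    Returns the partition function Z and per-position marginals p(y_t = y | x)."""
    T, Y = phi.shape
    alpha = np.zeros((T, Y))
    beta = np.ones((T, Y))
    alpha[0] = phi[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ psi) * phi[t]    # forward recursion
    for t in range(T - 2, -1, -1):
        beta[t] = psi @ (phi[t + 1] * beta[t + 1])  # backward recursion
    Z = alpha[-1].sum()                             # partition function
    marginals = alpha * beta / Z                    # p(y_t = y | x)
    return Z, marginals

rng = np.random.default_rng(0)
phi = rng.uniform(0.5, 1.5, size=(4, 3))  # 4 positions, 3 labels (toy sizes)
psi = rng.uniform(0.5, 1.5, size=(3, 3))
Z, marginals = forward_backward(phi, psi)
```

This costs O(T·Y²) instead of the O(Yᵀ) brute-force sum.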
2. Semi-Supervised Log-Linear Model
Grandvalet and Bengio (2004). Given labeled data D_L = {(x_i, y_i)} and unlabeled data D_U = {z_j}.
Objective function: log-likelihood + negative entropy regularizer:
L(λ) = Σ_i log p(y_i | x_i) − γ Σ_j H(y | z_j),  where H(y | z) = −Σ_y p(y | z) log p(y | z)
Negative Entropy Regularizer
Maximizing the objective → minimizing the entropy of predictions on unlabeled data → class overlap is reduced, i.e., the targets are separated.
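The regularizer rewards confident (low-entropy) predictions on unlabeled data. A minimal illustration with toy distributions (the specific numbers are assumptions):

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_y p(y) log p(y)."""
    p = np.asarray(p, dtype=float)
    return -(p * np.log(p)).sum()

confident = entropy([0.97, 0.02, 0.01])  # peaked prediction: low entropy
uniform = entropy([1/3, 1/3, 1/3])       # maximally uncertain: entropy log(3)
```

Subtracting γ·H from the objective therefore pushes the decision boundary away from dense regions of unlabeled points, where predictions would otherwise be uncertain.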
Gradient of Entropy
For p(y | z) ∝ exp( Σ_k λ_k f_k(z, y) ), we have ∂p(y | z)/∂λ_k = p(y | z) ( f_k(z, y) − E_p[f_k] ). Therefore
∂H/∂λ_k = −Σ_y (1 + log p(y | z)) ∂p(y | z)/∂λ_k
        = −Σ_y p(y | z) ( f_k(z, y) − E_p[f_k] ) log p(y | z)   (the constant 1 vanishes since Σ_y ∂p/∂λ_k = 0)
        = −Cov_{p(y|z)}( f_k(z, y), log p(y | z) )
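The covariance form of the entropy gradient can be checked numerically. The feature matrix and weights below are illustrative assumptions; the analytic gradient is compared against central finite differences in testing.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def entropy_of(lam, F):
    """H = -sum_y p(y) log p(y) with p = softmax(F @ lam)."""
    p = softmax(F @ lam)
    return -(p * np.log(p)).sum()

def grad_entropy(lam, F):
    """dH/d(lam_k) = -Cov_p( f_k, log p ) = -(E[f_k log p] - E[f_k] E[log p])."""
    p = softmax(F @ lam)
    logp = np.log(p)
    Ef = p @ F                          # E[f_k] per feature
    Eflogp = (p * logp) @ F             # E[f_k log p] per feature
    return -(Eflogp - Ef * (p @ logp))  # -Cov(f_k, log p)

# toy weights and features (assumptions)
lam = np.array([0.5, -1.0, 0.2])
F = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.5],
              [1.0, 1.0, 0.0],
              [0.5, 0.0, 0.0]])
g = grad_entropy(lam, F)
```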
3. Semi-Supervised CRFs
Jiao et al. (2006). Given labeled data D_L = {(x_i, y_i)} and unlabeled data D_U = {z_j}.
Objective function: log-likelihood + negative entropy regularizer, as in the log-linear case, but y now ranges over label sequences:
L(λ) = Σ_i log p(y_i | x_i) − γ Σ_j H(y | z_j)
Application to NER
Gene and protein name identification. Data sets: A (labeled): 5,448 words; B (unlabeled): 5,210 words; C: words; D: words. Self-training did not yield any improvement.
[Table: precision (P), recall (R), and F-score (F) for configurations A & B, A & C, and A & D at varying γ]
Results
4. Dynamic Programming for Semi-Supervised CRFs
Mann and McCallum (2007). To obtain the entropy gradient, we have to compute sums of the form
Σ_{y_{−t}} p(y_{−t} · y_t | x) log p(y_{−t} · y_t | x)
where y_{−t} = ⟨y_1, …, y_{t−1}, y_{t+1}, …, y_T⟩ and y_{−t} · y_t denotes the full sequence with the t-th label fixed to y_t.
Example
Enumerate all y while fixing the t-th state to y_t. If we can compute Σ_{y_{−t}} p(y_{−t} · y_t | x) log p(y_{−t} · y_t | x) efficiently, we can compute the gradient.
[Figure: lattice for "His friend runs the company" (tags Noun, Det, Verb, Adj) with the t-th label fixed]
Decomposition of Entropy
In the following, we use the Markov property of linear-chain CRFs: conditioned on y_t, the prefix y_1 … y_{t−1} and the suffix y_{t+1} … y_T are independent. The sequence entropy then decomposes as
H(y | x) = Σ_{y_t} p(y_t | x) ( H^α(t, y_t) + H^β(t, y_t) − log p(y_t | x) )
where H^α(t, y_t) and H^β(t, y_t) are the entropies of the prefix and the suffix given y_t.
Subsequence Constrained Entropy
Subsequence constrained entropy: the entropy over all label sequences that agree with a fixed subsequence, e.g. H(y_{−t} | y_t, x) = −Σ_{y_{−t}} p(y_{−t} | y_t, x) log p(y_{−t} | y_t, x). It is computed from forward/backward probabilities.
[Figure: lattice for "His friend runs the company" with the label at position t fixed]
Forward/Backward Subsequence Constrained Entropy
H^α(t, y_t) = −Σ_{y_1, …, y_{t−1}} p(y_1 … y_{t−1} | y_t, x) log p(y_1 … y_{t−1} | y_t, x)  (prefix entropy)
H^β(t, y_t) is defined analogously over suffixes y_{t+1}, …, y_T.
[Figure: lattice split at position t into a forward part and a backward part]
Dynamic Computation of H^α
H^α can be computed incrementally from forward probabilities:
H^α(t+1, y_{t+1}) = Σ_{y_t} p(y_t | y_{t+1}, x) ( H^α(t, y_t) − log p(y_t | y_{t+1}, x) )
where p(y_t | y_{t+1}, x) = α_t(y_t) ψ_{t+1}(y_t, y_{t+1}) / α_{t+1}(y_{t+1}).
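The incremental H^α recursion can be sketched on a toy chain. The potentials are random stand-ins (assumptions, not the slides' features); the O(T·Y²) entropy is verified against brute-force enumeration over all label sequences in testing.

```python
import numpy as np

rng = np.random.default_rng(1)
T, Y = 4, 3                                # toy chain: 4 positions, 3 labels
phi = rng.uniform(0.5, 1.5, size=(T, Y))   # unary potentials
psi = rng.uniform(0.5, 1.5, size=(Y, Y))   # transition potentials

# forward pass
alpha = np.zeros((T, Y))
alpha[0] = phi[0]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ psi) * phi[t]
Z = alpha[-1].sum()

# H_alpha[t, y] = entropy of p(y_1..t-1 | y_t = y, x); empty prefix => 0 at t = 0
H_alpha = np.zeros((T, Y))
for t in range(1, T):
    # p(y_{t-1} | y_t, x), obtained from the forward recursion; columns sum to 1
    cond = alpha[t - 1][:, None] * psi * phi[t][None, :] / alpha[t][None, :]
    H_alpha[t] = (cond * (H_alpha[t - 1][:, None] - np.log(cond))).sum(axis=0)

# total sequence entropy H(y | x), assembled at the last position
p_last = alpha[-1] / Z
H_total = (p_last * (H_alpha[-1] - np.log(p_last))).sum()
```

Each step folds one more position into the running prefix entropy, so the full H(y | x) costs O(T·Y²) instead of O(Yᵀ).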
References
- Y. Grandvalet and Y. Bengio. 2004. Semi-supervised learning by entropy minimization. In NIPS.
- F. Jiao, S. Wang, C.-H. Lee, R. Greiner, and D. Schuurmans. 2006. Semi-supervised conditional random fields for improved sequence segmentation and labeling. In COLING/ACL.
- G. S. Mann and A. McCallum. 2007. Efficient computation of entropy gradient for semi-supervised conditional random fields. In NAACL-HLT.
- X. Zhu. Semi-supervised learning literature survey. Technical Report 1530, Computer Sciences, University of Wisconsin-Madison.