1
Survey on Semi-Supervised CRFs Yusuke Miyao Department of Computer Science The University of Tokyo
2
Contents 1.Conditional Random Fields (CRFs) 2.Semi-Supervised Log-Linear Model 3.Semi-Supervised CRFs 4.Dynamic Programming for Semi-Supervised CRFs
3
1. Conditional Random Fields (CRFs)
Log-linear model: for sentence x = \langle x_1, \ldots, x_T \rangle and label sequence y = \langle y_1, \ldots, y_T \rangle,
p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_k \lambda_k f_k(x, y) \Big)
– λ_k : parameter
– f_k : feature function
– Z(x) = \sum_{y'} \exp\big( \sum_k \lambda_k f_k(x, y') \big) : partition function
4
Parameter Estimation (1/2)
Estimate parameters λ_k given labeled training data D = \{ (x_i, y_i) \}
Objective function: log-likelihood (+ regularizer)
L(\lambda) = \sum_i \log p(y_i \mid x_i) - \sum_k \frac{\lambda_k^2}{2\sigma^2}
5
Parameter Estimation (2/2)
Gradient-based optimization is applied (CG, quasi-Newton, etc.)
\frac{\partial L}{\partial \lambda_k} = \sum_i \Big( f_k(x_i, y_i) - \sum_y p(y \mid x_i) f_k(x_i, y) \Big) - \frac{\lambda_k}{\sigma^2}
The second term inside the sum is the model expectation of f_k.
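The gradient formula — empirical feature value minus model expectation — can be sanity-checked on a toy log-linear model. Everything below (the four labels, the feature values, the weights) is made up for illustration; the analytic gradient is compared against a finite-difference estimate of the log-likelihood:

```python
import math

# Toy log-linear model p(y) ∝ exp(sum_k lam[k] * f[y][k]) over 4 labels,
# 3 features. Feature values f[y][k] and weights lam are invented.
f = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
lam = [0.3, -0.2, 0.1]
gold = 2  # the observed (labeled) outcome

def probs(lam):
    s = [math.exp(sum(l * fk for l, fk in zip(lam, f[y]))) for y in range(4)]
    Z = sum(s)  # partition function
    return [v / Z for v in s]

def loglik(lam):
    return math.log(probs(lam)[gold])

# Analytic gradient: empirical feature value minus model expectation.
p = probs(lam)
grad = [f[gold][k] - sum(p[y] * f[y][k] for y in range(4)) for k in range(3)]

# Finite-difference check of each component.
eps = 1e-6
for k in range(3):
    lp = lam[:]; lp[k] += eps
    lm = lam[:]; lm[k] -= eps
    num = (loglik(lp) - loglik(lm)) / (2 * eps)
    assert abs(num - grad[k]) < 1e-6
```

(The regularizer term is omitted here; it only subtracts λ_k/σ² from each component.)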
6
Dynamic Programming for CRFs
Computation of model expectations requires summation over y → exponential
Dynamic programming allows for efficient computation of model expectations
[Figure: example sentence x = "His friend runs the company" with a candidate label sequence y of part-of-speech tags]
7
Dynamic Programming for CRFs
Assumption (0-th order): f_k(x, y) = \sum_t f_k(y_t, x, t)
[Figure: lattice over x = "His friend runs the company" with candidate tags {Noun, Det, Verb, Adj} for y_t at each position t]
8
Forward/Backward Probability
α_t(y): total weight of all partial label sequences y_1, \ldots, y_t with y_t = y
β_t(y): total weight of all partial label sequences y_{t+1}, \ldots, y_T given y_t = y
Z = \sum_y \alpha_T(y), and the marginal p(y_t = y \mid x) = \alpha_t(y) \beta_t(y) / Z
Computed by dynamic programming
[Figure: lattice over "His friend runs the company" with tags {Noun, Det, Verb, Adj}]
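A minimal sketch of the forward/backward recursions on a toy linear chain (the potentials below are arbitrary made-up positive numbers, not features of a real model): α and β are filled in by dynamic programming, and the resulting partition function and marginals agree with brute-force enumeration over all label sequences:

```python
import itertools
import math

# Toy chain: T positions, S states, invented positive potentials.
# score(y) = emit[0][y_0] * prod_{t>=1} trans[y_{t-1}][y_t] * emit[t][y_t]
T, S = 4, 3
emit = [[1.0 + 0.1 * (t + 1) * (s + 1) for s in range(S)] for t in range(T)]
trans = [[1.0 + 0.2 * abs(a - b) for b in range(S)] for a in range(S)]

def score(y):
    v = emit[0][y[0]]
    for t in range(1, T):
        v *= trans[y[t - 1]][y[t]] * emit[t][y[t]]
    return v

# Forward: alpha[t][s] = total weight of prefixes y_1..y_t with y_t = s.
alpha = [[0.0] * S for _ in range(T)]
alpha[0] = emit[0][:]
for t in range(1, T):
    for s in range(S):
        alpha[t][s] = sum(alpha[t - 1][r] * trans[r][s] for r in range(S)) * emit[t][s]

# Backward: beta[t][s] = total weight of suffixes y_{t+1}..y_T given y_t = s.
beta = [[1.0] * S for _ in range(T)]
for t in range(T - 2, -1, -1):
    for s in range(S):
        beta[t][s] = sum(trans[s][r] * emit[t + 1][r] * beta[t + 1][r] for r in range(S))

Z = sum(alpha[T - 1])
marg = [[alpha[t][s] * beta[t][s] / Z for s in range(S)] for t in range(T)]

# Brute-force check over all S**T label sequences.
seqs = list(itertools.product(range(S), repeat=T))
Z_bf = sum(score(y) for y in seqs)
assert math.isclose(Z, Z_bf)
for t in range(T):
    for s in range(S):
        m = sum(score(y) for y in seqs if y[t] == s) / Z_bf
        assert math.isclose(marg[t][s], m)
```

The DP costs O(T·S²) instead of the O(S^T) of enumeration, which is what makes the model expectations tractable.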
9
2. Semi-Supervised Log-Linear Model
Grandvalet and Bengio (2004)
Given labeled data D_L = \{ (x_i, y_i) \} and unlabeled data D_U = \{ z_i \}
Objective function: log-likelihood + negative entropy regularizer
L(\lambda) = \sum_i \log p(y_i \mid x_i) + \gamma \sum_i \sum_y p(y \mid z_i) \log p(y \mid z_i)
10
Negative Entropy Regularizer
Maximizing the regularizer → minimizing class overlap = target classes are well separated
11
Gradient of Entropy (1/2)
For H(\lambda) = -\sum_y p(y \mid z) \log p(y \mid z), use
\frac{\partial p(y \mid z)}{\partial \lambda_k} = p(y \mid z) \Big( f_k(z, y) - \sum_{y'} p(y' \mid z) f_k(z, y') \Big)
12
Gradient of Entropy (2/2)
\frac{\partial H}{\partial \lambda_k} = -\sum_y p(y \mid z) \Big( f_k(z, y) - \sum_{y'} p(y' \mid z) f_k(z, y') \Big) \log p(y \mid z)
i.e. the negative covariance of f_k and \log p(y \mid z) under the model.
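This entropy-gradient formula can also be checked numerically on a toy log-linear model (the feature values and weights below are invented for illustration), again against a finite-difference estimate:

```python
import math

# Toy log-linear model over 3 labels, 2 features; f[y][k] and lam invented.
f = [[1.0, 0.0], [0.0, 2.0], [1.5, 1.0]]
lam = [0.4, -0.3]

def probs(lam):
    s = [math.exp(sum(l * fk for l, fk in zip(lam, f[y]))) for y in range(3)]
    Z = sum(s)
    return [v / Z for v in s]

def entropy(lam):
    p = probs(lam)
    return -sum(v * math.log(v) for v in p)

# Analytic gradient: dH/dlam_k = - sum_y p(y) (f_k(y) - E[f_k]) log p(y)
p = probs(lam)
Ef = [sum(p[y] * f[y][k] for y in range(3)) for k in range(2)]
grad = [-sum(p[y] * (f[y][k] - Ef[k]) * math.log(p[y]) for y in range(3))
        for k in range(2)]

# Finite-difference check of each component.
eps = 1e-6
for k in range(2):
    lp = lam[:]; lp[k] += eps
    lm = lam[:]; lm[k] -= eps
    num = (entropy(lp) - entropy(lm)) / (2 * eps)
    assert abs(num - grad[k]) < 1e-6
```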
13
3. Semi-Supervised CRFs
Jiao et al. (2006)
Given labeled data D_L = \{ (x_i, y_i) \} and unlabeled data D_U = \{ z_i \}
Objective function: log-likelihood + negative entropy regularizer, with p(y \mid x) a CRF over label sequences
14
Application to NER
Gene and protein identification
A (labeled): 5448 words; B (unlabeled): 5210 words; C: 10208 words; D: 25145 words
Self-training did not yield any improvement

γ      A & B            A & C            A & D
       P    R    F      P    R    F      P    R    F
0      0.80 0.36 0.50   0.77 0.29 0.43   0.74 0.30 0.43
0.1    0.82 0.40 0.54   0.79 0.32 0.46   0.74 0.31 0.44
0.5    0.82 0.40 0.54   0.79 0.33 0.46   0.74 0.31 0.44
1      0.82 0.40 0.54   0.77 0.34 0.47   0.73 0.33 0.45
5      0.84 0.45 0.59   0.78 0.38 0.51   0.72 0.36 0.48
10     0.78 0.46 0.58   0.66 0.38 0.48   0.66 0.38 0.47
15
Results
16
4. Dynamic Programming for Semi-Supervised CRFs
Mann and McCallum (2007)
We have to compute \sum_{y_{-t}} p(y_{-t} \cdot y_t \mid x) \log p(y_{-t} \cdot y_t \mid x),
where y_{-t} = \langle y_1, \ldots, y_{t-1}, y_{t+1}, \ldots, y_T \rangle and y_{-t} \cdot y_t = \langle y_1, \ldots, y_{t-1}, y_t, y_{t+1}, \ldots, y_T \rangle
17
Example
Enumerate all y while fixing the t-th state to y_t
If we can compute this sum efficiently, we can compute the gradient
[Figure: lattice over "His friend runs the company" with tags {Noun, Det, Verb, Adj}, one position fixed]
18
Decomposition of Entropy
In the following, we use conditional probabilities p(\cdot \mid y_t) computed on the lattice, and decompose the constrained entropy into a forward part H^\alpha and a backward part H^\beta
19
Subsequence Constrained Entropy
Computed from forward/backward probabilities
Subsequence constrained entropy: H(y_{-t} \mid y_t) = -\sum_{y_{-t}} p(y_{-t} \mid y_t, x) \log p(y_{-t} \mid y_t, x)
[Figure: lattice over "His friend runs the company" with tags {Noun, Det, Verb, Adj}]
20
Forward/Backward Subsequence Constrained Entropy
H^\alpha(t, y_t): entropy of the prefix y_1, \ldots, y_{t-1} given y_t
H^\beta(t, y_t): entropy of the suffix y_{t+1}, \ldots, y_T given y_t
[Figure: lattice over "His friend runs the company" with tags {Noun, Det, Verb, Adj}]
21
Dynamic Computation of H^α
H^α can be computed incrementally:
H^\alpha(t, y_t) = \sum_{y_{t-1}} p(y_{t-1} \mid y_t) \Big( H^\alpha(t-1, y_{t-1}) - \log p(y_{t-1} \mid y_t) \Big)
where p(y_{t-1} \mid y_t) is computed from forward/backward probabilities
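A sketch of the incremental H^α computation on a toy chain (the potentials below are invented; H^β is omitted and the total entropy is instead assembled from H^α at the last position). The DP result is checked against the entropy obtained by enumerating all label sequences:

```python
import itertools
import math

# Toy chain with invented positive potentials.
T, S = 4, 3
emit = [[1.0 + 0.3 * (t + s) for s in range(S)] for t in range(T)]
trans = [[1.0 + 0.5 * ((a + b) % S) for b in range(S)] for a in range(S)]

def score(y):
    v = emit[0][y[0]]
    for t in range(1, T):
        v *= trans[y[t - 1]][y[t]] * emit[t][y[t]]
    return v

# Forward pass: alpha[t][s] = total weight of prefixes ending in state s.
alpha = [[0.0] * S for _ in range(T)]
alpha[0] = emit[0][:]
for t in range(1, T):
    for s in range(S):
        alpha[t][s] = sum(alpha[t - 1][r] * trans[r][s] for r in range(S)) * emit[t][s]

# H[t][s] = entropy of the prefix y_1..y_{t-1} given y_t = s,
# via H_alpha(t, s) = sum_r p(r|s) (H_alpha(t-1, r) - log p(r|s)).
H = [[0.0] * S for _ in range(T)]
for t in range(1, T):
    for s in range(S):
        for r in range(S):
            # conditional probability of the previous state r given current s
            pr = alpha[t - 1][r] * trans[r][s] * emit[t][s] / alpha[t][s]
            H[t][s] += pr * (H[t - 1][r] - math.log(pr))

# Total sequence entropy from the last-position posterior and H_alpha.
Z = sum(alpha[T - 1])
H_total = sum((alpha[T - 1][s] / Z) * (H[T - 1][s] - math.log(alpha[T - 1][s] / Z))
              for s in range(S))

# Brute-force check: H = -sum_y p(y) log p(y) over all S**T sequences.
H_bf = 0.0
for y in itertools.product(range(S), repeat=T):
    p = score(y) / Z
    H_bf -= p * math.log(p)
assert math.isclose(H_total, H_bf, rel_tol=1e-9)
```

As with forward/backward, the recursion costs O(T·S²), versus O(S^T) for direct enumeration of the entropy.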
22
References
Y. Grandvalet and Y. Bengio. 2004. Semi-supervised learning by entropy minimization. In NIPS 2004.
F. Jiao, S. Wang, C.-H. Lee, R. Greiner, and D. Schuurmans. 2006. Semi-supervised conditional random fields for improved sequence segmentation and labeling. In COLING/ACL 2006.
G. S. Mann and A. McCallum. 2007. Efficient computation of entropy gradient for semi-supervised conditional random fields. In NAACL-HLT 2007.
X. Zhu. 2005. Semi-supervised learning literature survey. Technical Report 1530, Computer Sciences, University of Wisconsin-Madison.