
1 Survey on Semi-Supervised CRFs
Yusuke Miyao, Department of Computer Science, The University of Tokyo

2 Contents
1. Conditional Random Fields (CRFs)
2. Semi-Supervised Log-Linear Model
3. Semi-Supervised CRFs
4. Dynamic Programming for Semi-Supervised CRFs

3 1. Conditional Random Fields (CRFs)
Log-linear model: for a sentence x = <x_1, ..., x_T> and a label sequence y = <y_1, ..., y_T>,
p(y|x) = (1/Z(x)) exp( Σ_k λ_k f_k(x, y) )
– λ_k : parameter
– f_k : feature function
– Z : partition function
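As a concrete illustration of the model above, here is a minimal Python sketch (not from the original slides; the feature functions, weights, and label set are hypothetical) that scores a label sequence and normalizes by a brute-force partition function Z(x).

```python
# Minimal sketch of the log-linear model p(y|x) = exp(sum_k lambda_k f_k(x,y)) / Z(x).
# Features, weights, and label set below are hypothetical, chosen only for illustration.
import itertools
import math

LABELS = ["Noun", "Det", "Verb", "Adj"]

def score(x, y, weights, features):
    # Unnormalized log-score: sum_k lambda_k * f_k(x, y)
    return sum(w * f(x, y) for w, f in zip(weights, features))

def prob(x, y, weights, features):
    # p(y|x); the partition function Z(x) is computed by brute-force enumeration,
    # which is exponential in len(x) and only feasible for toy inputs.
    z = sum(math.exp(score(x, yp, weights, features))
            for yp in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(x, y, weights, features)) / z

# Hypothetical features: "the" labeled as Det; any word ending in "s" labeled as Verb.
features = [
    lambda x, y: sum(1.0 for xi, yi in zip(x, y) if xi.lower() == "the" and yi == "Det"),
    lambda x, y: sum(1.0 for xi, yi in zip(x, y) if xi.endswith("s") and yi == "Verb"),
]
weights = [1.5, 0.8]

x = ["His", "friend", "runs", "the", "company"]
y = ["Noun", "Noun", "Verb", "Det", "Noun"]
print(prob(x, y, weights, features))
```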

4 Parameter Estimation (1/2)
Estimate the parameters λ_k, given labeled training data D = {<x_i, y_i>}.
Objective function: log-likelihood (+ regularizer)
L(λ) = Σ_i log p(y_i | x_i) (+ regularizer)

5 Parameter Estimation (2/2)
Gradient-based optimization is applied (CG, quasi-Newton, etc.):
∂L/∂λ_k = Σ_i [ f_k(x_i, y_i) − Σ_y p(y|x_i) f_k(x_i, y) ]
The second term is the model expectation of the feature.
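The gradient form can be illustrated with the same kind of toy setup: each component is the empirical feature count minus the model expectation. A brute-force sketch with hypothetical features (not the slides' implementation):

```python
# Sketch of the log-likelihood gradient for a log-linear model:
#   dL/d(lambda_k) = sum_i [ f_k(x_i, y_i) - E_{p(y|x_i)}[ f_k(x_i, y) ] ]
# Everything here is brute force over a toy label set, for illustration only.
import itertools
import math

LABELS = ["Noun", "Det", "Verb", "Adj"]

def log_score(x, y, weights, features):
    return sum(w * f(x, y) for w, f in zip(weights, features))

def expectations(x, weights, features):
    # E_{p(y|x)}[f_k(x, y)] for every k, by enumerating all label sequences.
    ys = list(itertools.product(LABELS, repeat=len(x)))
    unnorm = [math.exp(log_score(x, y, weights, features)) for y in ys]
    z = sum(unnorm)
    return [sum(u / z * f(x, y) for u, y in zip(unnorm, ys)) for f in features]

def gradient(data, weights, features):
    grad = [0.0] * len(features)
    for x, y in data:
        exp_f = expectations(x, weights, features)
        for k, f in enumerate(features):
            grad[k] += f(x, y) - exp_f[k]  # empirical count minus model expectation
    return grad

# Toy usage with one hypothetical feature (first word labeled Noun):
features = [lambda x, y: 1.0 if y[0] == "Noun" else 0.0]
data = [(["His", "friend", "runs"], ["Noun", "Noun", "Verb"])]
print(gradient(data, [0.0], features))  # positive: the weight should increase
```

The result of `gradient` can be handed to any gradient-based optimizer, which is exactly the setting described on this slide.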

6 Dynamic Programming for CRFs
Computing the model expectations requires a summation over all label sequences y, which is exponential in the sentence length.
Dynamic programming allows the model expectations to be computed efficiently.
[Figure: example sentence x = "His friend runs the company" with its label sequence y]

7 Dynamic Programming for CRFs
Assumption (0-th order): each feature decomposes over positions, i.e., f_k(x, y) = Σ_t f_k(x, y_t, t)
[Figure: label lattice for "His friend runs the company" with candidate labels Noun, Det, Verb, Adj at each position t and label variable y_t]

8 Forward/Backward Probability
The forward and backward probabilities are computed by dynamic programming.
[Figure: forward/backward computation on the label lattice for "His friend runs the company" with candidate labels Noun, Det, Verb, Adj]
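A minimal sketch of the standard forward-backward recursion for a linear chain, written with generic per-position potentials `psi` and transition potentials `trans` (these names are assumptions; with the 0-th order features assumed on slide 7, the transition scores can simply be set to a constant). It returns the per-position marginals p(y_t = s | x) and the partition function Z(x), which are what the model expectations need.

```python
# Forward-backward over a linear chain with unnormalized potentials.
#   psi[t][s]   : score of label s at position t (given the observation x)
#   trans[r][s] : score of the transition r -> s
# For real sentence lengths this should be done in log space to avoid underflow;
# plain products are used here for readability on toy inputs.
def forward_backward(psi, trans):
    T = len(psi)
    labels = list(psi[0].keys())

    # alpha[t][s]: total score of all label prefixes ending with s at position t
    alpha = [dict(psi[0])]
    for t in range(1, T):
        alpha.append({s: psi[t][s] * sum(alpha[t - 1][r] * trans[r][s] for r in labels)
                      for s in labels})

    # beta[t][s]: total score of all label suffixes following s at position t
    beta = [dict.fromkeys(labels, 1.0) for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = {s: sum(trans[s][r] * psi[t + 1][r] * beta[t + 1][r] for r in labels)
                   for s in labels}

    z = sum(alpha[T - 1][s] for s in labels)  # partition function Z(x)
    marginals = [{s: alpha[t][s] * beta[t][s] / z for s in labels} for t in range(T)]
    return marginals, z

# Toy usage: two labels, three positions, uniform transitions.
labels = ["Noun", "Verb"]
psi = [{"Noun": 2.0, "Verb": 1.0}, {"Noun": 1.0, "Verb": 3.0}, {"Noun": 1.0, "Verb": 1.0}]
trans = {r: {s: 1.0 for s in labels} for r in labels}
print(forward_backward(psi, trans)[0])
```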

9 2. Semi-Supervised Log-Linear Model
Grandvalet and Bengio (2004)
Given labeled data D_L = {<x_i, y_i>} and unlabeled data D_U = {z_i}
Objective function: log-likelihood + negative entropy regularizer
Σ_i log p(y_i | x_i) − γ Σ_j H(p(y | z_j))
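A minimal sketch of this objective, assuming a hypothetical `prob(x)` that returns the predictive distribution p(y|x); the γ-weighted combination follows the "log-likelihood + negative entropy regularizer" description above.

```python
# Entropy-regularized semi-supervised objective (to be maximized):
#   sum over labeled (x, y) of log p(y|x)  -  gamma * sum over unlabeled z of H(p(.|z))
# Maximizing the second term (the negative entropy) pushes the model toward
# confident predictions on the unlabeled data.
import math

def entropy(dist):
    # H(p) = -sum_y p(y) log p(y)
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)

def objective(labeled, unlabeled, prob, gamma):
    # prob(x) -> dict mapping each label y to p(y|x)   (hypothetical interface)
    log_lik = sum(math.log(prob(x)[y]) for x, y in labeled)
    ent = sum(entropy(prob(z)) for z in unlabeled)
    return log_lik - gamma * ent

# Toy usage with a fixed (hypothetical) predictive distribution:
prob = lambda x: {"pos": 0.9, "neg": 0.1} if "good" in x else {"pos": 0.5, "neg": 0.5}
print(objective([("good movie", "pos")], ["so so movie"], prob, gamma=0.5))
```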

10 Negative Entropy Regularizer
Maximizing the negative entropy → class overlaps are minimized = the targets are separated

11 Gradient of Entropy (1/2)

12 Gradient of Entropy (2/2)
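For any log-linear model the entropy gradient can be written as ∂H/∂λ_k = −Cov_p(f_k, log p); this is standard exponential-family calculus and may differ in form from the derivation on these two slides. The sketch below checks the identity against a finite-difference approximation on a hypothetical toy distribution.

```python
# Gradient of the entropy H(lambda) = -sum_y p(y) log p(y) of a log-linear model
#   p(y) = exp(sum_k lambda_k f_k(y)) / Z.
# For such models dH/d(lambda_k) = -Cov_p(f_k(y), log p(y)); the code verifies this
# numerically. The outcome space and features are made up for illustration.
import math

OUTCOMES = ["A", "B", "C"]
FEATURES = [lambda y: 1.0 if y == "A" else 0.0,
            lambda y: 2.0 if y == "B" else 0.5]

def distribution(weights):
    scores = {y: math.exp(sum(w * f(y) for w, f in zip(weights, FEATURES)))
              for y in OUTCOMES}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

def entropy(weights):
    p = distribution(weights)
    return -sum(pi * math.log(pi) for pi in p.values())

def entropy_gradient(weights, k):
    p = distribution(weights)
    e_f = sum(p[y] * FEATURES[k](y) for y in OUTCOMES)
    e_logp = sum(p[y] * math.log(p[y]) for y in OUTCOMES)
    e_f_logp = sum(p[y] * FEATURES[k](y) * math.log(p[y]) for y in OUTCOMES)
    return -(e_f_logp - e_f * e_logp)  # -Cov(f_k, log p)

weights, k, eps = [0.3, -0.7], 0, 1e-6
numeric = (entropy([weights[0] + eps, weights[1]])
           - entropy([weights[0] - eps, weights[1]])) / (2 * eps)
print(entropy_gradient(weights, k), numeric)  # the two values should agree
```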

13 3. Semi-Supervised CRFs
Jiao et al. (2006)
Given labeled data D_L = {<x_i, y_i>} and unlabeled data D_U = {z_i}
Objective function: log-likelihood + negative entropy regularizer (now over label sequences)
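One useful identity behind this sequence-level objective (stated here as a side note, not taken from the slides) is that the entropy of a log-linear p(y|x) equals log Z(x) minus the expected score, so the entropy itself is computable from the same quantities that ordinary CRF training already produces. A brute-force check on a toy chain with a hypothetical feature:

```python
# For a log-linear p(y|x) = exp(sum_k lambda_k f_k(x,y)) / Z(x):
#   log p(y|x) = sum_k lambda_k f_k(x,y) - log Z(x), hence
#   H(y|x) = -E[log p(y|x)] = log Z(x) - sum_k lambda_k E[f_k(x,y)].
# The sketch checks this identity by brute-force enumeration on a toy chain.
import itertools
import math

LABELS = ["Noun", "Verb"]
FEATURES = [lambda x, y: sum(1.0 for xi, yi in zip(x, y)
                             if xi.endswith("s") and yi == "Verb")]
WEIGHTS = [1.2]

def log_score(x, y):
    return sum(w * f(x, y) for w, f in zip(WEIGHTS, FEATURES))

x = ["He", "runs", "fast"]
ys = list(itertools.product(LABELS, repeat=len(x)))
z = sum(math.exp(log_score(x, y)) for y in ys)
p = {y: math.exp(log_score(x, y)) / z for y in ys}

entropy_direct = -sum(pi * math.log(pi) for pi in p.values())
expected_f = [sum(p[y] * f(x, y) for y in ys) for f in FEATURES]
entropy_identity = math.log(z) - sum(w * e for w, e in zip(WEIGHTS, expected_f))
print(entropy_direct, entropy_identity)  # the two numbers coincide
```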

14 Application to NER
Gene and protein identification. Data sets: A (labeled): 5448 words, B (unlabeled): 5210 words, C: 10208 words, D: 25145 words. Self-training did not yield any improvement.

 γ      A & B             A & C             A & D
        P     R     F     P     R     F     P     R     F
 0      0.80  0.36  0.50  0.77  0.29  0.43  0.74  0.30  0.43
 0.1    0.82  0.40  0.54  0.79  0.32  0.46  0.74  0.31  0.44
 0.5    0.82  0.40  0.54  0.79  0.33  0.46  0.74  0.31  0.44
 1      0.82  0.40  0.54  0.77  0.34  0.47  0.73  0.33  0.45
 5      0.84  0.45  0.59  0.78  0.38  0.51  0.72  0.36  0.48
 10     0.78  0.46  0.58  0.66  0.38  0.48  0.66  0.38  0.47

15 Results

16 4. Dynamic Programming for Semi-Supervised CRFs
Mann and McCallum (2007)
We have to compute sums of the form Σ_{y_-t} p(y_-t · y_t | x) log p(y_-t · y_t | x),
where y_-t = <y_1, ..., y_{t-1}, y_{t+1}, ..., y_T> and y_-t · y_t = <y_1, ..., y_{t-1}, y_t, y_{t+1}, ..., y_T>

17 Example
Enumerate all y while fixing the t-th state to y_t.
If we can compute this sum efficiently, we can compute the gradient (a brute-force version is sketched below).
[Figure: label lattice for "His friend runs the company" with candidate labels Noun, Det, Verb, Adj and the t-th label fixed]
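A brute-force sketch of the sum referred to above, with hypothetical potentials: fix the label at position t and accumulate p(y|x) log p(y|x) over all completions of the remaining positions. The dynamic program on the following slides obtains the same values without enumeration.

```python
# For a fixed position t and fixed label y_t, enumerate every full sequence
# y_{-t} . y_t and accumulate p(y|x) * log p(y|x).  This is exponential; the
# subsequence-constrained entropies on the next slides give the same numbers
# in polynomial time.  The scoring function is a toy stand-in.
import itertools
import math

LABELS = ["Noun", "Det", "Verb"]

def log_score(x, y):
    # Hypothetical scoring function: reward Verb on words ending in "s".
    return sum(1.2 for xi, yi in zip(x, y) if xi.endswith("s") and yi == "Verb")

def constrained_sum(x, t, label):
    all_ys = list(itertools.product(LABELS, repeat=len(x)))
    z = sum(math.exp(log_score(x, y)) for y in all_ys)
    return sum((math.exp(log_score(x, y)) / z) * math.log(math.exp(log_score(x, y)) / z)
               for y in all_ys if y[t] == label)

x = ["His", "friend", "runs", "the", "company"]
print(constrained_sum(x, t=2, label="Verb"))
```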

18 Decomposition of Entropy
In the following, we use the forward/backward subsequence-constrained entropies H^α and H^β defined on the next slides.

19 Subsequence Constrained Entropy
The subsequence-constrained entropy is computed from the forward/backward probabilities.
[Figure: subsequence-constrained entropy on the label lattice for "His friend runs the company" with candidate labels Noun, Det, Verb, Adj]

20 Forward/Backward Subsequence Constrained Entropy
[Figure: forward and backward subsequence-constrained entropies on the label lattice for "His friend runs the company" with candidate labels Noun, Det, Verb, Adj]

21 Dynamic Computation of H^α
H^α can be computed incrementally from the forward-backward probabilities.
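A sketch of the incremental computation, under the assumed reading H^α_t(s) = H(y_1, ..., y_{t-1} | y_t = s, x); the recursion combines the chain rule for entropy with the Markov property of the chain and reuses the same `psi`/`trans` potentials as the forward-backward sketch above. This is a reconstruction of the idea, not the authors' code.

```python
# Incremental computation of the forward subsequence-constrained entropy
#   H_alpha[t][s] = H(y_1, ..., y_{t-1} | y_t = s, x)
# via the chain rule for entropy:
#   H_alpha[t][s] = sum_r p(y_{t-1}=r | y_t=s, x) * (H_alpha[t-1][r] - log p(y_{t-1}=r | y_t=s, x))
# with p(y_{t-1}=r | y_t=s, x) = alpha[t-1][r] * trans[r][s] * psi[t][s] / alpha[t][s].
import math

def forward_subsequence_entropy(psi, trans):
    T = len(psi)
    labels = list(psi[0].keys())

    # Standard forward scores.
    alpha = [dict(psi[0])]
    for t in range(1, T):
        alpha.append({s: psi[t][s] * sum(alpha[t - 1][r] * trans[r][s] for r in labels)
                      for s in labels})

    h_alpha = [dict.fromkeys(labels, 0.0)]  # no prefix before position 0
    for t in range(1, T):
        row = {}
        for s in labels:
            total = 0.0
            for r in labels:
                # Conditional probability of the previous label given the current one.
                p = alpha[t - 1][r] * trans[r][s] * psi[t][s] / alpha[t][s]
                if p > 0.0:
                    total += p * (h_alpha[t - 1][r] - math.log(p))
            row[s] = total
        h_alpha.append(row)
    return h_alpha

# Toy usage with two labels and uniform transitions.
labels = ["Noun", "Verb"]
psi = [{"Noun": 2.0, "Verb": 1.0}, {"Noun": 1.0, "Verb": 3.0}, {"Noun": 1.0, "Verb": 1.0}]
trans = {r: {s: 1.0 for s in labels} for r in labels}
print(forward_subsequence_entropy(psi, trans))
```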

22 References
Y. Grandvalet and Y. Bengio. 2004. Semi-supervised learning by entropy minimization. In NIPS 2004.
F. Jiao, S. Wang, C.-H. Lee, R. Greiner, and D. Schuurmans. 2006. Semi-supervised conditional random fields for improved sequence segmentation and labeling. In COLING/ACL 2006.
G. S. Mann and A. McCallum. 2007. Efficient computation of entropy gradient for semi-supervised conditional random fields. In NAACL-HLT 2007.
X. Zhu. 2005. Semi-supervised learning literature survey. Technical Report 1530, Computer Sciences, University of Wisconsin-Madison.

