Sample Selection for Statistical Parsing


1 Sample Selection for Statistical Parsing
Rebecca Hwa University of Pittsburgh Presentation by: Andreas Lundberg

2 Overview
Background on statistical parsing
Selection algorithms
Prepositional-phrase attachment
Evaluation functions
Experiments and results
Parsing
Conclusions

3 Statistical Parsing
The standard pipeline:
A large corpus of text is compiled, for example the Wall Street Journal (WSJ) Treebank.
Humans annotate the text as training examples.
The parser is trained on the examples.
Problem: annotating text requires many man-hours of human effort. We want to minimize this bottleneck.
Approach: sample selection, a variant of active learning. Select for annotation the examples with the highest Training Utility Value (TUV).

4 Selection Algorithms
Committee-based: multiple learners with different hypotheses. Examples that lead to the most disagreement have the highest Training Utility Value (TUV).
Single learner: one learner with one hypothesis. TUV is estimated by a combination of predictive criteria.

5 Sample Selection Learning Algorithm
U is a set of unlabeled candidates; L is a set of labeled training examples; C is the current hypothesis.
Initialize: C ← Train(L)
Repeat
  N ← Select(n, U, C, f)   // N gets the n highest-ranked candidates of U
  U ← U − N
  L ← L ∪ Label(N)         // slow, since Label() is performed by humans
  C ← Train(L)             // retrain on the enlarged training set
Until (C is good enough) or (U = ∅) or (cutoff reached)
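The loop above can be sketched in Python. This is a minimal illustration, not the paper's code: train(), label(), and the evaluation function f are placeholder callables supplied by the caller.

```python
# A sketch of the sample-selection loop. All names here are
# illustrative stand-ins, not taken from the paper's implementation.

def select(n, unlabeled, hypothesis, f):
    """Return the n candidates with the highest training utility value."""
    ranked = sorted(unlabeled, key=lambda u: f(u, hypothesis), reverse=True)
    return ranked[:n]

def sample_selection(unlabeled, labeled, train, label, f, n=5, rounds=10):
    hypothesis = train(labeled)
    for _ in range(rounds):                 # `rounds` plays the role of the cutoff
        if not unlabeled:                   # U = nil
            break
        batch = select(n, unlabeled, hypothesis, f)
        unlabeled = [u for u in unlabeled if u not in batch]   # U <- U - N
        labeled = labeled + [label(u) for u in batch]          # human annotation
        hypothesis = train(labeled)         # retrain on the enlarged set
    return hypothesis, labeled
```

With f scoring each candidate directly, the highest-utility candidates are consumed first, a few per round, which is the point: annotation effort goes where the ranking says it pays off most.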

6 Prepositional-Phrase Attachment
Learning PP-attachment is traditionally used to gain insight into the parsing problem.
Is the prepositional phrase modifying (attached to) the verb or the preceding noun?
I washed the shirt [with soap]. (attached to the verb)
I bought the shirt [with pockets]. (attached to the noun)
Collins-Brooks model: (verb, object noun, preposition, prep. noun, attachment)
"I wrote a book in three days", attach-verb => (wrote, book, in, days, verb)
Characteristic tuples: the subsets that include the preposition.

7 Prepositional-Phrase Attachment (continued)
Count(t) = occurrence frequency of characteristic tuple t in the training set.
CountNP(t) = occurrence frequency of t in the training set where the preposition attaches to the noun.

8 Collins-Brooks PP-attachment
subroutine Train(L)
  foreach ex ∈ L do
    extract (v, n, p, n2, a) from ex
    foreach tuple ∈ {(v,n,p,n2), (v,p,n2), (n,p,n2), (v,n,p), (v,p), (n,p), (p,n2), (p)} do
      Count(tuple) ← Count(tuple) + 1
      if a = noun then CountNP(tuple) ← CountNP(tuple) + 1

9 Collins-Brooks (continued)
subroutine Test(U)
  foreach u ∈ U do
    extract (v, n, p, n2) from u
    if Count(v,n,p,n2) > 0 then
      prob ← CountNP(v,n,p,n2) / Count(v,n,p,n2)
    elsif Count(v,p,n2) + Count(n,p,n2) + Count(v,n,p) > 0 then
      prob ← (CountNP(v,p,n2) + CountNP(n,p,n2) + CountNP(v,n,p)) / (Count(v,p,n2) + Count(n,p,n2) + Count(v,n,p))
    elsif Count(v,p) + Count(n,p) + Count(p,n2) > 0 then
      prob ← (CountNP(v,p) + CountNP(n,p) + CountNP(p,n2)) / (Count(v,p) + Count(n,p) + Count(p,n2))
    elsif Count(p) > 0 then
      prob ← CountNP(p) / Count(p)
    else
      prob ← 1
    if prob ≥ 0.5 then output noun else output verb
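The Train/Test pseudocode on the last two slides translates almost directly into Python. This is a runnable sketch of the backed-off model; tuple grouping by back-off level follows the pseudocode, while the variable names are mine.

```python
# Collins-Brooks backed-off PP-attachment model, following the
# Train/Test pseudocode above. count/count_np hold Count and CountNP.
from collections import Counter

count, count_np = Counter(), Counter()

def tuples(v, n, p, n2):
    # Characteristic tuples (all subsets containing the preposition),
    # grouped by back-off level: 4-tuple, 3-tuples, 2-tuples, 1-tuple.
    return [[(v, n, p, n2)],
            [(v, p, n2), (n, p, n2), (v, n, p)],
            [(v, p), (n, p), (p, n2)],
            [(p,)]]

def train(examples):
    for v, n, p, n2, attach in examples:
        for level in tuples(v, n, p, n2):
            for t in level:
                count[t] += 1
                if attach == "noun":
                    count_np[t] += 1

def classify(v, n, p, n2):
    # Back off to coarser tuples until some level has been observed.
    for level in tuples(v, n, p, n2):
        total = sum(count[t] for t in level)
        if total > 0:
            prob = sum(count_np[t] for t in level) / total
            break
    else:
        prob = 1.0          # nothing observed at all: default to noun
    return "noun" if prob >= 0.5 else "verb"
```

For example, after train([("wrote", "book", "in", "days", "verb")]), classifying (wrote, book, in, days) matches the full tuple with prob 0 and outputs verb, while a completely unseen preposition falls through to the prob = 1 default and outputs noun.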

10 Evaluation Functions using Prior Knowledge of the Problem Space
f_novel(u,C) = Σ_{t ∈ Tuples(u)} [ 1 if Count(t) = 0, 0 otherwise ]
Data that has never been encountered before is probably useful.
f_backoff(u,C) = the number of tuples that would be useless if u were included in the training set.
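f_novel is just a count of unseen characteristic tuples, and a one-line sketch makes that concrete. Here `counts` and `tuples_of` are stand-ins for the model's statistics and the tuple extractor from the Collins-Brooks slides.

```python
# Sketch of f_novel: how many of a candidate's characteristic tuples
# have never been seen in training. `counts` maps tuple -> frequency;
# `tuples_of` extracts the candidate's characteristic tuples.

def f_novel(u, counts, tuples_of):
    return sum(1 for t in tuples_of(u) if counts.get(t, 0) == 0)
```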

11 Evaluation Functions using the Performance of the Hypothesis
f_err(u,C) = the hypothesis's estimate of the likelihood that it will misclassify u.
f_unc(u,C) = uncertainty across all classes.
In the case of binary classification (verb or noun), f_unc is equivalent to f_err.

12 Evaluation Functions using the Parameters of the Hypothesis
f_conf(u,C) = ¼ Σ_{l = 1..4} | conf_int( p_l(u,C), n_l(u,C) ) |
p_l(u,C) is the probability that model C attaches u to the noun at back-off level l; n_l(u,C) is the number of training examples on which this classification is based.
conf_int(p,n) is the confidence interval, at some chosen confidence level, around the expected value p estimated from n trials.
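A small sketch of f_conf, under one loud assumption: the slide does not fix a particular interval, so this uses a 95% normal-approximation (Wald) interval for a binomial proportion. The width |conf_int| is what gets averaged: wide intervals mean few supporting examples, hence high training utility.

```python
# Sketch of f_conf with an assumed 95% Wald interval; the paper's
# exact interval construction is not specified on this slide.
import math

def conf_int_width(p, n, z=1.96):
    if n == 0:
        return 1.0              # no supporting data: maximally uncertain
    return 2 * z * math.sqrt(p * (1 - p) / n)

def f_conf(levels):
    """levels: list of (p_l, n_l) pairs, one per back-off level."""
    return sum(conf_int_width(p, n) for p, n in levels) / len(levels)
```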

13 Hybrid Evaluation Function
f_area(u,C) = ¼ Σ_{l = 1..4} area( p_l(u,C), n_l(u,C) )
area(p,n) computes the area under a Gaussian function with mean 0.5 and standard deviation 0.1, bounded by conf_int(p,n).
Can be viewed as a product of f_conf and f_unc.
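The Gaussian-area idea can be sketched with the error function. Two assumptions to flag: the interval is again a 95% Wald interval, and the normal CDF is built from math.erf; the slide only fixes the Gaussian's mean (0.5) and standard deviation (0.1).

```python
# Sketch of f_area: the area under a Gaussian (mean 0.5, sd 0.1)
# between the confidence-interval bounds. Interval choice (95% Wald)
# is my assumption, not the paper's.
import math

def normal_cdf(x, mu=0.5, sigma=0.1):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def area(p, n, z=1.96):
    if n == 0:
        return normal_cdf(1.0) - normal_cdf(0.0)    # whole [0, 1] range
    half = z * math.sqrt(p * (1 - p) / n)
    return normal_cdf(p + half) - normal_cdf(p - half)

def f_area(levels):
    """levels: list of (p_l, n_l) pairs, one per back-off level."""
    return sum(area(p, n) for p, n in levels) / len(levels)
```

The hybrid behavior falls out directly: the score is large only when the interval is wide (few examples, the f_conf ingredient) and centered near 0.5 (an uncertain decision, the f_unc ingredient), since the Gaussian's mass sits around 0.5.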

14 Results – 83% performance

15 Results (continued)
The hybrid approach performs best: f_area required 47% fewer examples to achieve the highest performance level of 83.8%.
Knowledge about the problem space helps in selecting early examples, but not later ones.
Effective evaluation functions use the model's current hypothesis.
Does this apply to parsing in general, or only to prepositional-phrase attachment?

16 Evaluation Functions using Prior Knowledge of the Problem Space
f_lex(w,G) = ( Σ_{w_i, w_j ∈ w} new(w_i, w_j) · coocc(w_i, w_j) ) / length(w)
f_len(w,G) = length(w)
w is an unlabeled sentence candidate; G is the current parsing model.
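These two sentence-level scores can be sketched as follows. Caveat: new() and coocc() are stand-ins of my own (new marking word pairs unseen in training, coocc scoring their co-occurrence); the slide does not define them, so treat them as assumptions.

```python
# Sketches of f_len and f_lex for ranking unlabeled sentences.
# `new` and `coocc` are caller-supplied stand-ins, not the paper's
# exact definitions.
from itertools import combinations

def f_len(words):
    return len(words)

def f_lex(words, new, coocc):
    total = sum(new(a, b) * coocc(a, b) for a, b in combinations(words, 2))
    return total / len(words)
```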

17 Evaluation Functions using the Performance of the Hypothesis
f_err(w,G) = 1 − P(v_max | w, G), where v_max = argmax_{v ∈ V} P(v | w, G)
f_unc(w,G) = TE(w,G) / lg(‖V‖)
TE(w,G) is the tree entropy: the expected number of bits needed to encode the distribution of possible parses for sentence w. ‖V‖ is the number of parses.
(No, I don't understand f_unc completely. :p)
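Given the distribution over a sentence's candidate parses, both scores reduce to a few lines. Here `probs` stands in for that parse distribution; obtaining it from an actual parser is outside this sketch.

```python
# Sketch of tree entropy, f_unc, and f_err over a parse distribution.
# `probs` is a list of parse probabilities summing to 1 (assumed input).
import math

def tree_entropy(probs):
    # Expected bits to encode the parse distribution.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def f_unc(probs):
    if len(probs) < 2:
        return 0.0                       # a single parse: no uncertainty
    return tree_entropy(probs) / math.log2(len(probs))

def f_err(probs):
    # 1 - P(v_max | w, G): chance the most likely parse is wrong.
    return 1 - max(probs)
```

Dividing by lg(‖V‖) normalizes f_unc to [0, 1], so sentences with many candidate parses are not automatically scored as more uncertain than short sentences with few parses.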

18 Evaluation Functions using the Parameters of the Hypothesis
For binary classification tasks such as PP-attachment, this approach works because the confidence interval is well defined.
Parsing, however, is made up of many multinomial classification decisions that must be combined into an overall confidence. The author leaves this open, and does not propose a hybrid function for parsing either.

19 Results – 80% performance Expectation-Maximization

20 Results – 88% performance History-based learner

21 Conclusions
Sample selection is useful in problems where raw data is cheap but annotating it is costly. This is the case for many supervised learning tasks in natural language processing.
Sample selection has been applied to:
Text categorization
Base noun phrase chunking
Part-of-speech tagging
Spelling confusion set disambiguation
Word sense disambiguation
And more, including semantic and syntactic parsing!

22 References
Ratnaparkhi, Adwait. "Statistical Models for Unsupervised Prepositional Phrase Attachment."

