Automatic Methods to Supplement Broad-Coverage Subcategorization Lexicons Michael Schiehlen, Kristina Spranger Institut für Maschinelle Sprachverarbeitung.


1 Automatic Methods to Supplement Broad-Coverage Subcategorization Lexicons Michael Schiehlen, Kristina Spranger Institut für Maschinelle Sprachverarbeitung Universität Stuttgart {mike,sprangka}@ims.uni-stuttgart.de

2 Overview of the Talk
IMS Stuttgart LREC'04 26th May 2004 © Michael Schiehlen, Kristina Spranger
three approaches to acquisition of subcategorization frames
method for evaluation, annotation guidelines
system overview
evaluation results
rules for inferring frames from stem verbs
disambiguation strategies for frame selection

3 Motivation
for broad coverage, both computational linguists and lexicographers need precise and detailed subcat info on infrequent words
we define infrequent words as words missing from a broad-coverage, detailed lexicon
task: get results as precise and detailed as possible for infrequent words, i.e. supplement the lexicon

4 Approaches to Subcat Acquisition
precision-focussed approach (Eckle-Kohler, 1999): produced a lexicon of 16,630 German verbs (EKL)
recall-oriented approach (Manning, 1993), (Briscoe and Carroll, 1997), (Schulte im Walde, 2002)
supplementation approach (our approach): supplements EKL

5 Acquisition of Subcat Frames

6 System Overview
[diagram; labels on the slide:]
corpus: 36.2 million tokens of newspaper text
components: cascaded finite-state parser, patternset extractor, patternset evaluator
lexicon: EKL
data: ambiguous subcategorization patterns for 3278 verbs (1845 hapax legomena); subcategorization patterns

7 Patternset Extractor: A Corpus Example
Er wedelte Schuhe mit dem Rasierpinsel ab.
He dusted off shoes with the shaving brush.
ab#wedeln  |nom,gen|nom,akk|nom,akk,PP/mit:D|
           |nom    |nom    |nom             | er (he)
           |gen    |akk    |akk             | Schuh (shoes)
           |adj    |adj    |PP/mit:Dat      | Rasierpinsel (shaving brush)
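A patternset such as the one for ab#wedeln can be read into a simple data structure. The sketch below is only an illustration: the function name parse_patternset and the choice of tuples for frames are assumptions, not part of the system described in the talk.

```python
def parse_patternset(line):
    """Split 'lemma |f1|f2|...|' into (lemma, list of frames).

    Each frame becomes a tuple of argument labels. The input layout
    is assumed from the example on the slide.
    """
    lemma, _, rest = line.strip().partition(" ")
    frames = [tuple(f.split(","))
              for f in rest.strip().strip("|").split("|") if f]
    return lemma, frames


lemma, frames = parse_patternset(
    "ab#wedeln |nom,gen|nom,akk|nom,akk,PP/mit:D|")
prefix, _, stem = lemma.partition("#")  # 'ab', 'wedeln'
print(lemma, frames)
```

Keeping frames as tuples makes them hashable, so they can later serve as dictionary keys when frame frequencies are counted.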

8 New Proposal for Evaluation
task: find correct subcat frame for each token
  – all proposed subcat frames can be traced back to specific corpus examples
we did not use large published dictionaries as test data:
  – subcat info not explicit
  – gaps (~12.7% of our verbs are new)
manual annotation (1333 examples)

9 Annotation Guidelines
semantically motivated:
  – frames with up to 4 arguments
  – in case of doubt we opted for complement status rather than adjunct status
  – same frame for alternations
inter-annotator agreement: κ-value 80.9%
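The slide does not say which κ statistic was computed. Assuming Cohen's κ for two annotators who each assign one frame label per example, agreement could be calculated roughly as follows. The function name and the toy labels are made up for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences.

    Raises ZeroDivisionError when chance agreement is 1, i.e. when
    both annotators always use the same single label.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the two marginal label proportions.
    expected = sum(count_a[l] * count_b[l] for l in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy annotations (invented, not the 1333 examples from the talk).
annotator_1 = ["nom,akk", "nom,akk", "nom", "nom"]
annotator_2 = ["nom,akk", "nom,akk", "nom", "nom,akk"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```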

10 Disambiguation Strategies for Patternsets
longest match: prefer longer over shorter frames, prefer reflexives and correlatives
global frame frequency: in whole corpus
  assumption: same distribution for all verbs
local frame frequency: in extracted patternsets
  assumption: special distribution for rare verbs
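The three strategies can be sketched as selection functions over a patternset. This is a simplified illustration under several assumptions: frames are tuples of argument labels, longest match is reduced to frame length (the preference for reflexives and correlatives is left out), local frequency counts each frame once per patternset, and all counts are invented.

```python
from collections import Counter


def longest_match(patternset):
    """Pick the frame with the most arguments (first one on ties)."""
    return max(patternset, key=len)


def most_frequent(patternset, freq):
    """Pick the frame with the highest count in a frequency table."""
    return max(patternset, key=lambda frame: freq[frame])


def local_frequencies(patternsets):
    """Count frames over the extracted patternsets of the rare verbs."""
    return Counter(frame for ps in patternsets for frame in ps)


patternset = [("nom", "gen"), ("nom", "akk"), ("nom", "akk", "PP/mit:D")]

# Invented global counts, standing in for whole-corpus frame frequencies.
global_freq = Counter({("nom", "akk"): 120,
                       ("nom", "akk", "PP/mit:D"): 15,
                       ("nom", "gen"): 2})

# Invented patternsets of other rare verbs, for the local strategy.
local_freq = local_frequencies([
    patternset,
    [("nom", "akk", "PP/mit:D"), ("nom",)],
    [("nom", "akk", "PP/mit:D")],
])

print(longest_match(patternset))
print(most_frequent(patternset, global_freq))
print(most_frequent(patternset, local_freq))
```

The two frequency strategies share one selection function and differ only in where the counts come from, which mirrors the two distribution assumptions on the slide.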

11 Inferring Frames for Prefix Verbs from Stem Verbs
subcat behaviour of prefix verbs ($v_p$) and their stem verbs is correlated (cf. Aldinger, PW-5, this afternoon)
extracted mapping rules for $v_p$ from EKL
$\max_{f_p \in v_p,\ f_s \in [\mathrm{stem}(v_p)]} P(f_p \mid f_s, \mathrm{prefix}(v_p))$
  – $v$ : set of frames for $v$ from parser
  – $[v]$ : set of frames for $v$ from EKL
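The slide gives the quantity to maximise but not how the probabilities are estimated. The sketch below assumes a relative-frequency estimate over all (stem frame, prefix frame) pairs found in the lexicon. The function names, the "prefix#stem" lemma convention taken from the ab#wedeln example, and the toy lexicon entries are all assumptions made for illustration and are not taken from EKL.

```python
from collections import Counter


def estimate_rules(ekl):
    """Estimate P(f_p | f_s, prefix) by relative frequency (assumed)."""
    pair_counts, context_counts = Counter(), Counter()
    for verb, frames in ekl.items():
        if "#" not in verb:
            continue  # not a prefix verb
        prefix, _, stem = verb.partition("#")
        for f_s in ekl.get(stem, ()):
            for f_p in frames:
                pair_counts[(prefix, f_s, f_p)] += 1
                context_counts[(prefix, f_s)] += 1
    return {key: count / context_counts[key[:2]]
            for key, count in pair_counts.items()}


def best_frame(verb, parser_frames, ekl, rules):
    """Pick the parser frame f_p with the highest rule probability."""
    prefix, _, stem = verb.partition("#")
    scored = [(rules.get((prefix, f_s, f_p), 0.0), f_p)
              for f_p in parser_frames for f_s in ekl.get(stem, ())]
    return max(scored, key=lambda s: s[0])[1] if scored else None


NA = ("nom", "akk")
NAP = ("nom", "akk", "PP/mit:D")

# Toy lexicon with invented entries.
ekl = {
    "wischen": [NA], "ab#wischen": [NA],
    "bürsten": [NA], "ab#bürsten": [NA, NAP],
    "wedeln": [NA],
}
rules = estimate_rules(ekl)
choice = best_frame("ab#wedeln", [("nom", "gen"), NA, NAP], ekl, rules)
print(rules[("ab", NA, NA)], choice)
```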

12 Conditions on Prefix Rules
three language-independent constraints on how prefix verbs inherit subcat frames
$A(f_p, f_s)$: all arguments of $f_s$ also in $f_p$
$B(f_p, f_s)$: $|\{ v : \mathrm{prefix}(v) = \mathrm{prefix}(v_p) \,\&\, f_p \in [v] \,\&\, f_s \in [\mathrm{stem}(v)] \}| \geq 2$
$C(f_p)$: $|\{ f_s' \in [\mathrm{stem}(v_p)] : A(f_p, f_s') \,\&\, B(f_p, f_s') \}| = 1$
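The three conditions translate fairly directly into predicates over a lexicon. The sketch below assumes that the comparison in B means "at least 2" supporting verbs, that frames are tuples of argument labels, and that lemmas follow the "prefix#stem" convention of the ab#wedeln example. The function names and the toy lexicon are invented for illustration.

```python
def cond_a(f_p, f_s):
    """A: every argument of the stem frame also occurs in the prefix frame.

    Uses set inclusion, so repeated arguments are not distinguished.
    """
    return set(f_s) <= set(f_p)


def cond_b(f_p, f_s, prefix, ekl, min_support=2):
    """B: enough lexicon verbs with this prefix show the same mapping."""
    support = 0
    for verb, frames in ekl.items():
        if "#" not in verb:
            continue
        p, _, stem = verb.partition("#")
        if p == prefix and f_p in frames and f_s in ekl.get(stem, ()):
            support += 1
    return support >= min_support  # threshold reading is an assumption


def cond_c(f_p, v_p, ekl):
    """C: exactly one stem frame of v_p satisfies both A and B."""
    prefix, _, stem = v_p.partition("#")
    matches = [f_s for f_s in ekl.get(stem, ())
               if cond_a(f_p, f_s) and cond_b(f_p, f_s, prefix, ekl)]
    return len(matches) == 1


NA = ("nom", "akk")
NAP = ("nom", "akk", "PP/mit:D")

# Toy lexicon with invented entries.
ekl = {
    "wischen": [NA], "ab#wischen": [NA],
    "bürsten": [NA], "ab#bürsten": [NA, NAP],
    "wedeln": [("nom",), NA],
}
print(cond_c(NA, "ab#wedeln", ekl), cond_c(NAP, "ab#wedeln", ekl))
```

In this toy lexicon the mapping from nom,akk to nom,akk is supported by two "ab" verbs and is the only stem frame of wedeln that passes A and B, so C holds for nom,akk but not for the longer frame.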

13 Evaluation Results

14 Evaluation Results for Hapax Legomena

15 Impact of Conditions on Prefix Rules

16 Conclusions
the easy things are done, now let's tackle the difficult problems in subcat acquisition
automatic methods yield reasonable results even in this scenario:
  – we used a parser (+11.45% F-Score)
  – and subcat mapping rules for prefix verbs (+3.61% F-Score)

