Automatic Methods to Supplement Broad-Coverage Subcategorization Lexicons Michael Schiehlen, Kristina Spranger Institut für Maschinelle Sprachverarbeitung.


Automatic Methods to Supplement Broad-Coverage Subcategorization Lexicons
Michael Schiehlen, Kristina Spranger
Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart

IMS Stuttgart, LREC'04, 26th May 2004 © Michael Schiehlen, Kristina Spranger

Overview of the Talk
- three approaches to acquisition of subcategorization frames
- method for evaluation, annotation guidelines
- system overview
- evaluation results
- rules for inferring frames from stem verbs
- disambiguation strategies for frame selection

Motivation
- for broad coverage, both computational linguists and lexicographers need precise and detailed subcat info for infrequent words
- we define infrequent words as words missing from a broad-coverage and detailed lexicon
- task: get results as precise and detailed as possible for infrequent words, i.e. supplement the lexicon

Approaches to Subcat Acquisition
- precision-focussed approach (Eckle-Kohler, 1999): produced a lexicon of 16,630 German verbs (EKL)
- recall-oriented approach (Manning, 1993), (Briscoe and Carroll, 1997), (Schulte im Walde, 2002)
- supplementation approach (our approach): supplements EKL

Acquisition of Subcat Frames

System Overview
- 36.2 million tokens of newspaper text
- cascaded finite-state parser
- patternset evaluator
- patternset extractor
- EKL
- subcategorization patterns
- ambiguous subcategorization patterns for 3278 verbs (1845 hapax legomena)

Patternset Extractor: A Corpus Example

Er wedelte Schuhe mit dem Rasierpinsel ab.
He dusted off shoes with the shaving brush.

ab#wedeln                       |nom,gen|nom,akk|nom,akk,PP/mit:D|
er (he)                         |nom|nom|nom|
Schuh (shoes)                   |gen|akk|akk|
Rasierpinsel (shaving brush)    |adj|adj|PP/mit:Dat|
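To make the patternset notation concrete, here is a minimal Python sketch that parses the strings from the corpus example. It assumes that the i-th alternative of each constituent belongs to the i-th candidate frame of the verb; this is a reading of the slide, not a documented format. The function names `parse_patternset` and `readings` are hypothetical.

```python
def parse_patternset(text):
    """Split '|a,b|c|' into a list of alternatives, each a tuple of labels."""
    return [tuple(alt.split(",")) for alt in text.strip("|").split("|")]


# Strings copied from the corpus example; the data layout is an assumption.
verb_frames = parse_patternset("|nom,gen|nom,akk|nom,akk,PP/mit:D|")
constituents = {
    "er": parse_patternset("|nom|nom|nom|"),
    "Schuh": parse_patternset("|gen|akk|akk|"),
    "Rasierpinsel": parse_patternset("|adj|adj|PP/mit:Dat|"),
}


def readings(verb_frames, constituents):
    """Pair the i-th candidate frame with the i-th label of every constituent
    (assumed column alignment)."""
    result = []
    for i, frame in enumerate(verb_frames):
        roles = {word: alts[i][0] for word, alts in constituents.items()}
        result.append((frame, roles))
    return result


for frame, roles in readings(verb_frames, constituents):
    print(",".join(frame), roles)
```

Each printed line is one competing analysis of the sentence; picking one of them is the disambiguation problem.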

New Proposal for Evaluation
- task: find correct subcat frame for each token
  – all proposed subcat frames can be traced back to specific corpus examples
- we did not use large published dictionaries as test data:
  – subcat info not explicit
  – gaps (~12.7% of our verbs are new)
- manual annotation (1333 examples)

Annotation Guidelines
- semantically motivated:
  – frames with up to 4 arguments
  – in case of doubt we opted for complement status rather than adjunct status
  – same frame for alternations
- inter-annotator agreement: κ-value 80.9%
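The slide does not say which κ statistic was computed or how many annotators took part. Assuming two annotators and Cohen's κ, an agreement figure of this kind could be computed as in the following sketch. The function name `cohen_kappa` and the toy labels are illustrative only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.
    Assumption: the slide's kappa is of this two-annotator kind."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[l] * counts_b[l] for l in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


# Toy annotations (hypothetical frames, not the authors' data).
annotator_1 = ["nom,akk", "nom,akk", "nom", "nom"]
annotator_2 = ["nom,akk", "nom,akk", "nom", "nom,akk"]
print(cohen_kappa(annotator_1, annotator_2))
```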

Disambiguation Strategies for Patternsets
- longest match: prefer longer over shorter frames, prefer reflexives and correlatives
- global frame frequency: in whole corpus
  – assumption: same distribution for all verbs
- local frame frequency: in extracted patternsets
  – assumption: special distribution for rare verbs
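The three strategies can be sketched in Python as selection functions over one ambiguous patternset. This is a simplified illustration, not the authors' implementation. Frames are tuples of argument labels, the preference for reflexives and correlatives is left out, and all counts are invented toy values.

```python
from collections import Counter


def longest_match(patternset):
    """Prefer the frame with the most arguments (first one wins on ties)."""
    return max(patternset, key=len)


def most_frequent(patternset, frame_counts):
    """Prefer the candidate frame with the highest count in frame_counts."""
    return max(patternset, key=lambda frame: frame_counts[frame])


def local_counts(patternsets):
    """Frame frequencies computed only over the extracted patternsets."""
    return Counter(frame for ps in patternsets for frame in ps)


patternset = [("nom", "gen"), ("nom", "akk"), ("nom", "akk", "PP/mit:D")]

# Hypothetical whole-corpus frame counts (global strategy).
global_frame_counts = Counter({
    ("nom", "akk"): 50,
    ("nom", "gen"): 2,
    ("nom", "akk", "PP/mit:D"): 5,
})

# Hypothetical patternsets extracted for rare verbs (local strategy).
extracted = [
    patternset,
    [("nom",), ("nom", "akk", "PP/mit:D")],
    [("nom", "akk", "PP/mit:D")],
]

print(longest_match(patternset))
print(most_frequent(patternset, global_frame_counts))
print(most_frequent(patternset, local_counts(extracted)))
```

With these toy numbers the global strategy picks the transitive frame, while the longest-match and local strategies pick the frame with the PP argument. The example shows that the strategies can disagree on the same patternset.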

Inferring Frames for Prefix Verbs from Stem Verbs
- subcat behaviour of prefix verbs ($v_p$) and their stem verbs is correlated (cf. Aldinger, PW-5, this afternoon)
- extracted mapping rules for $v_p$ from EKL: $\max P(f_p \mid f_s, \mathrm{prefix}(v_p))$ with $f_p \in \langle v_p \rangle$ and $f_s \in [\mathrm{stem}(v_p)]$
  – $\langle v \rangle$: set of frames for $v$ from parser
  – $[v]$: set of frames for $v$ from EKL
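The slide does not state how the rule probability is estimated. One possible reading, shown here purely as an assumption, is a relative frequency over lexicon entries: among verbs with a given prefix whose stem verb licenses frame f_s, the share that license frame f_p. The lexicon below is a toy stand-in (not taken from EKL), the stem frames for wedeln are invented, and the names `split_verb`, `rule_probabilities` and `best_frame` are hypothetical.

```python
from collections import Counter


def split_verb(verb):
    """'ab#wedeln' -> ('ab', 'wedeln'); verbs without '#' have no prefix."""
    prefix, sep, stem = verb.partition("#")
    return (prefix, stem) if sep else (None, verb)


def rule_probabilities(lexicon):
    """Relative-frequency estimate of P(f_p | f_s, prefix) (assumed estimator)."""
    pair_counts = Counter()     # (prefix, f_s, f_p)
    context_counts = Counter()  # (prefix, f_s)
    for verb, frames_p in lexicon.items():
        prefix, stem = split_verb(verb)
        if prefix is None or stem not in lexicon:
            continue
        for f_s in lexicon[stem]:
            context_counts[(prefix, f_s)] += 1
            for f_p in frames_p:
                pair_counts[(prefix, f_s, f_p)] += 1
    return {key: n / context_counts[key[:2]] for key, n in pair_counts.items()}


def best_frame(candidates, stem_frames, prefix, probs):
    """Pick the parser candidate f_p with the highest rule probability."""
    scored = [(probs.get((prefix, f_s, f_p), 0.0), f_p)
              for f_p in candidates for f_s in stem_frames]
    return max(scored)[1] if scored else None


# Toy lexicon with invented entries.
lexicon = {
    "wischen": {"nom,akk"},
    "ab#wischen": {"nom,akk", "nom,akk,PP/mit:D"},
    "waschen": {"nom,akk"},
    "ab#waschen": {"nom,akk"},
}
probs = rule_probabilities(lexicon)
candidates = {"nom,gen", "nom,akk", "nom,akk,PP/mit:D"}  # parser output for ab#wedeln
print(best_frame(candidates, {"nom,akk"}, "ab", probs))
```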

Conditions on Prefix Rules
three language-independent constraints on how prefix verbs inherit subcat frames
- $A(f_p, f_s)$: all arguments of $f_s$ also in $f_p$
- $B(f_p, f_s)$: $|\{ v : \mathrm{prefix}(v) = \mathrm{prefix}(v_p) \land f_p \in [v] \land f_s \in [\mathrm{stem}(v)] \}| \geq 2$
- $C(f_p)$: $|\{ f_s' \in [\mathrm{stem}(v_p)] : A(f_p, f_s') \land B(f_p, f_s') \}| = 1$
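The three conditions can be written as predicates over a lexicon. The sketch below rests on stated assumptions: frames are sets of argument labels, the threshold in B is read as "at least 2", and the lexicon is an invented toy stand-in for EKL. The names `cond_a`, `cond_b`, `cond_c` and `split_verb` are hypothetical.

```python
def split_verb(verb):
    """'ab#wedeln' -> ('ab', 'wedeln'); verbs without '#' have no prefix."""
    prefix, sep, stem = verb.partition("#")
    return (prefix, stem) if sep else (None, verb)


def frame(text):
    """'nom,akk' -> frozenset({'nom', 'akk'})."""
    return frozenset(text.split(","))


def cond_a(f_p, f_s):
    """A: every argument of the stem frame also occurs in the prefix frame."""
    return f_s <= f_p


def cond_b(f_p, f_s, prefix, lexicon, min_support=2):
    """B: at least min_support lexicon verbs with this prefix show the pair
    (f_p on the prefix verb, f_s on its stem). Threshold 2 is an assumed reading."""
    support = 0
    for verb, frames in lexicon.items():
        verb_prefix, stem = split_verb(verb)
        if verb_prefix == prefix and f_p in frames and f_s in lexicon.get(stem, ()):
            support += 1
    return support >= min_support


def cond_c(f_p, v_p, lexicon):
    """C: exactly one stem frame of v_p satisfies both A and B for f_p."""
    prefix, stem = split_verb(v_p)
    matches = [f_s for f_s in lexicon.get(stem, ())
               if cond_a(f_p, f_s) and cond_b(f_p, f_s, prefix, lexicon)]
    return len(matches) == 1


# Toy lexicon with invented entries; ab#wedeln itself is the unknown verb.
lexicon = {
    "wischen": {frame("nom,akk")},
    "ab#wischen": {frame("nom,akk"), frame("nom,akk,PP/mit:D")},
    "waschen": {frame("nom,akk")},
    "ab#waschen": {frame("nom,akk")},
    "wedeln": {frame("nom"), frame("nom,akk")},
}
print(cond_c(frame("nom,akk"), "ab#wedeln", lexicon))
print(cond_c(frame("nom,akk,PP/mit:D"), "ab#wedeln", lexicon))
```

In this toy setting the transitive frame passes C because exactly one stem frame supports it. The frame with the PP argument fails because only one lexicon verb exhibits the pair, so B is not met.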

Evaluation Results

Evaluation Results for Hapax Legomena

Impact of Conditions on Prefix Rules

Conclusions
- the easy things are done, now let's tackle the difficult problems in subcat acquisition
- automatic methods yield reasonable results even in this scenario:
  – we used a parser (+11.45% F-Score)
  – and subcat mapping rules for prefix verbs (+3.61% F-Score)