A Cascaded Finite-State Parser for German Michael Schiehlen Institut für Maschinelle Sprachverarbeitung Universität Stuttgart


A Cascaded Finite-State Parser for German
Michael Schiehlen
Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart
EACL 2003, Budapest, April 17th, 2003

IMS Stuttgart, EACL 2003, April 17th, 2003 © Michael Schiehlen

Dependency-Based Evaluation
• every word either depends on another word (the head) or is independent
• parsing seen as a classification task (Lin:95)
• measured in (labelled) precision and recall: assign to every word
  – a pair <head, grammatical role>
  – or a marker TOP (for independent words)
• unlabelled precision and recall: neglect the grammatical role
  – only assign <head> and TOP
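The evaluation scheme on this slide can be sketched as follows. This is a minimal illustration, not the author's evaluation script: the dictionary representation, the function names `strip_role` and `precision_recall`, and the toy sentence are assumptions made for the example. Words missing from the prediction count as not assigned, so they lower recall but not precision.

```python
TOP = "TOP"  # marker for independent words

def strip_role(analysis):
    """Drop the grammatical role, keeping only the head (or TOP)."""
    return analysis if analysis == TOP else analysis[0]

def precision_recall(gold, predicted, labelled=True):
    """gold / predicted map a word position to TOP or a (head, role) pair.

    Hypothetical representation: a word absent from `predicted`
    is treated as not assigned.
    """
    key = (lambda a: a) if labelled else strip_role
    correct = sum(
        1 for pos, analysis in predicted.items()
        if pos in gold and key(analysis) == key(gold[pos])
    )
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Toy data for "Udo kennt eine Frau" (illustrative only).
gold = {0: (1, "NPnom"), 1: TOP, 2: (3, "SPR"), 3: (1, "NPakk")}
pred = {0: (1, "NPakk"), 1: TOP, 2: (3, "SPR")}  # word 3 left unassigned

print(precision_recall(gold, pred, labelled=True))
print(precision_recall(gold, pred, labelled=False))
```

In the toy data, word 0 has the right head but the wrong role, so it only counts as correct in the unlabelled measure.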

Dependency Structure (Details)
• PPs:
  – headed by internal arguments (NP), not by Prep
• coordination:
  – multi-headed constituent: every conjunct is a head
  – conjunction only linked to final conjunct
• verb complex (auxiliary verbs + full verb):
  – abstraction over verb complexes
  – all attachments into verb complex are correct (Lin:95)

Test Environment
• tokenized version of NEGRA tree bank
• ca. 340,000 tokens in 19,547 sentences
• investigated effect of POS tagging quality
  I: ideal tags from tree bank
  L: lexicon tags from tagger trained on tree bank
  T: tagger tags as determined by tagger trained on independent corpus

Baseline: Tagging Approach
• determine dependency tuples directly
• used TreeTagger (Schmid:94) on tag trigrams
• three approaches to encode head
  – exact position of head: pos_head
  – distance of head from dependent: pos_head - pos_dep
  – nth-tag method (Lin:95): e.g. <<<N (third noun left)
      • category of head,
      • direction in which to find head from token,
      • number of words with same category between token and head
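The distance and nth-tag encodings can be sketched as below. The slide only shows the leftward form `<<<N`; using `>` for heads to the right, the function names, and the coarse category labels in the toy sequence are assumptions made for this illustration.

```python
def encode_distance(dep, head):
    """Distance encoding: pos_head - pos_dep."""
    return head - dep

def encode_nth_tag(cats, dep, head):
    """nth-tag encoding, e.g. '<<<N' = third N to the left of the token.

    `cats` holds one category per word. '>' for rightward heads is an
    assumption; the slide only shows the leftward marker.
    """
    cat = cats[head]
    if head < dep:
        marker, between = "<", cats[head + 1:dep]
    else:
        marker, between = ">", cats[dep + 1:head]
    # One marker for the head itself plus one per intervening word
    # of the same category.
    n = 1 + sum(1 for c in between if c == cat)
    return marker * n + cat

def decode_nth_tag(cats, dep, code):
    """Invert encode_nth_tag; returns None when no head is found."""
    marker = code[0]
    n = len(code) - len(code.lstrip(marker))
    cat = code[n:]
    step = -1 if marker == "<" else 1
    i = dep + step
    while 0 <= i < len(cats):
        if cats[i] == cat:
            n -= 1
            if n == 0:
                return i
        i += step
    return None

cats = ["N", "V", "N", "N", "ADV"]  # toy category sequence
code = encode_nth_tag(cats, dep=4, head=0)
print(code, decode_nth_tag(cats, 4, code))
```

Returning `None` from the decoder corresponds to the case where a predicted code points at no word, so the token stays unassigned.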

Tagging Approach (contd.)
• hybrid method:
  – choose between nth-tag and distance result on the basis of POS tag
  – build decision list greedily so as to optimize F-value in training set (using 10-fold cross-validation)
• all results achieved by 10-fold cross-validation
• if no head is found, token counts as not assigned (=> precision usually higher than recall)
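One possible reading of the greedy decision-list construction is sketched here. The slide does not give the procedure in detail, so the record format, the names `f_value` and `build_decision_list`, the default method, and the stopping rule are all assumptions, and cross-validation is left out for brevity. Each record stores, per method, whether its result was correct (`True`/`False`) or whether no head was found (`None`).

```python
def f_value(records, choice, default):
    """F-value when each POS tag uses the method given in `choice`
    (falling back to `default`). Unassigned tokens hurt recall only."""
    correct = assigned = 0
    for pos, outcomes in records:
        result = outcomes[choice.get(pos, default)]
        if result is not None:
            assigned += 1
            correct += int(result)
    if correct == 0:
        return 0.0
    precision = correct / assigned
    recall = correct / len(records)
    return 2 * precision * recall / (precision + recall)

def build_decision_list(records, default="nth"):
    """Greedily switch POS tags to the other method while F improves."""
    other = "distance" if default == "nth" else "nth"
    tags = sorted({pos for pos, _ in records})
    choice = {}
    best = f_value(records, choice, default)
    while True:
        candidates = [
            (f_value(records, {**choice, tag: other}, default), tag)
            for tag in tags if tag not in choice
        ]
        if not candidates:
            break
        score, tag = max(candidates)
        if score <= best:
            break
        choice[tag] = other
        best = score
    return choice, best

# Hypothetical training outcomes, keyed by the token's POS tag.
records = [
    ("NN", {"nth": True, "distance": False}),
    ("NN", {"nth": True, "distance": True}),
    ("ART", {"nth": False, "distance": True}),
    ("ART", {"nth": None, "distance": True}),
]
print(build_decision_list(records))
```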

Results for Tagging Approach

Overview of Finite-State Parser

Recognition Phase
• consists of cascaded deterministic transducers (like Abney:97)
• noun chunker also recognizes nested noun phrases (`full noun chunks')
• inflectional information checked on-line
• clause chunker recognizes complete clauses, not simplex clauses (Abney:97)

Example Output of Noun Chunker

Example Output of Clause Chunker

Rule Interpretation
• inserts
  – syntactic structure (AdjP, coordinated VP or Prep)
  – grammatical roles (13 different roles)
• recognition grammar generated from interpretation grammar by removing semicolon symbols, e.g.
    det ;SPR ( ;[ADJP ( adv ;ADJ )* adja ;HD ;]ADJP )* nn ;HD FINAL:NP
• nondeterministic transducer (like Abney:97)
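Deriving the recognition rule from the interpretation rule shown above can be sketched in a few lines. The sketch assumes that rule symbols are separated by whitespace and that every output symbol starts with a semicolon; the function name `to_recognition_rule` is made up for the illustration.

```python
def to_recognition_rule(interpretation_rule):
    """Drop all semicolon symbols (inserted structure and roles),
    keeping only the symbols that are matched against the input."""
    return " ".join(
        symbol for symbol in interpretation_rule.split()
        if not symbol.startswith(";")
    )

rule = "det ;SPR ( ;[ADJP ( adv ;ADJ )* adja ;HD ;]ADJP )* nn ;HD FINAL:NP"
print(to_recognition_rule(rule))
# det ( ( adv )* adja )* nn FINAL:NP
```

The recognition rule only checks the category sequence; the semicolon symbols reappear in the interpretation phase, where they insert brackets such as `[ADJP` and roles such as `SPR` or `HD` into the output.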

Example Output of Rule Interpreter

Subcat Frame Recognition
• deterministic transducer to find lexically given subcategorization frames
• fine-grained distinction of complements (61 additional roles), partially disambiguates between adjuncts and complements
• if no corresponding frame is found, unspecified role (CMP, ACMP) remains
  – only correct in half-labelled precision and recall
• several frames can be encoded at once

Example Output of Frame Recognizer

Conversion into Dependency Tuples
• explicit representation of ambiguities (subcat roles and attachment) with context variables
• measuring performance of parsers with underspecified output (Riezler et al.:02)
  – lower bound: random disambiguation
  – upper bound: ideal disambiguation
• also heuristic disambiguation: choose
  – highest attachment and
  – most frequent subcat frame
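The two bounds can be illustrated with a simplified sketch. It assumes that each word carries a list of alternative analyses, that random disambiguation picks among them uniformly, and that the lower bound is the expected share of correct words; the context variables that tie choices together across words are ignored, and the heuristic disambiguation is not modelled. The gold analyses in the toy data are assumptions chosen for the example, as is the function name `bounds`.

```python
from fractions import Fraction

def bounds(gold, candidates):
    """Return (lower, upper) accuracy for underspecified output.

    upper: a word is correct if the gold analysis is among its options.
    lower: expected correctness when one option is picked uniformly
           at random (a simplifying assumption).
    """
    lower = Fraction(0)
    upper = 0
    for word, options in candidates.items():
        hits = options.count(gold[word])
        if hits:
            upper += 1
            lower += Fraction(hits, len(options))
    total = len(candidates)
    return lower / total, Fraction(upper, total)

# Ambiguous words from the example sentence; gold values are assumed.
candidates = {
    "Udo/0": [(1, "NPnom"), (1, "NPakk")],
    "kennt/1": ["TOP"],
    "Frau/5": [(1, "NPakk"), (1, "NPnom")],
    "Rio/7": [(1, "ADJ"), (5, "ADJ")],
}
gold = {
    "Udo/0": (1, "NPnom"),
    "kennt/1": "TOP",
    "Frau/5": (1, "NPakk"),
    "Rio/7": (5, "ADJ"),
}
print(bounds(gold, candidates))
```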

Example Output: Dependency Tuples
  Udo/0    kennt/1   [1a]:NPnom, [1b]:NPakk
  kennt/1  TOP
  eine/2   Frau/5    SPR
  sehr/3   nette/4   ADJ
  nette/4  Frau/5    ADJ
  Frau/5   kennt/1   [1a]:NPakk, [1b]:NPnom
  aus/6    Rio/7     MRK
  Rio/7    kennt/1   ADJ [1A0]
           Frau/5    ADJ [1A1]
  ./8      TOP

Results for Finite-State Parser

Conclusion
• two approaches to partial parsing: tagger, finite-state parser
• hybrid model of nth-tag tagging and finite-state achieves % on I-tags (gain of 4.8% in lower and 1% in upper bound)
• some constructions not yet handled in parser
  – attachment of extraposed relative clauses and noun-complement clauses
  – distribution of constituents in the middle field under VP coordination