Construct State Modification in the Arabic Treebank

Construct State Modification in the Arabic Treebank
Ryan Gabbard and Seth Kulick
University of Pennsylvania
ACL 6/18/08

Outline
- Construct State (iDAfa إضافة) in Arabic
  - What it is
  - The problem of attachment within an iDAfa
- A Machine Learning Approach
  - Definition, Features, Results
- Conclusion and Future Work

Construct State (iDAfa)
- 2+ words grouped tightly together
- Like an English compound or possessive: a NOUN with an NP complement (recursive)

(NP $awAriE streets
    (NP madiyn+ap city
        (NP luwnog byt$ Long Beach)))

شوارع مدينة لونغ بيتش

Construct State (iDAfa) (NP $awAriE streets (NP (NP madiyn+ap city (NP luwnog byt$ Long Beach)) (PP fiy in (NP wilAy+ap state (NP kAliyfuwrniyA))))) شوارع هدينة لونغ بيتش في ولاية كاليفورنيا (Multiple) Modification at any level Modifiers stacked up at end No clear pattern of attachment level ACL 6/18/08

Restriction on PP attachment in PTB
- Multiple PP modifiers at the same level

Allowed:
(NP (NP …)
    (PP …)
    (PP …))

Not Allowed:
(NP (NP (NP …)
        (PP …))
    (PP …))

- Parser can learn that PPs attach to "base" (non-recursive) NPs (Collins, 99); see the check sketched below
- Not true for ATB, because of the iDAfa
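As a concrete illustration of the "base NP" notion, here is a tiny hypothetical check over the same (label, children) representation used in the sketch above; "base" is taken in the Collins (1999) sense of an NP with no NP among its direct children.

```python
def is_base_np(tree):
    """True if this node is an NP with no NP among its direct children
    (a non-recursive, Collins-style 'base' NP)."""
    label, children = tree
    return label == "NP" and not any(
        isinstance(c, tuple) and c[0] == "NP" for c in children)

# The allowed PTB shape: both PPs hang off a base NP.
allowed = ("NP", [("NP", ["streets"]),
                  ("PP", ["of", ("NP", ["Long", "Beach"])]),
                  ("PP", ["in", ("NP", ["California"])])])
print(is_base_np(allowed[1][0]))   # True  -- the NP the PPs attach to
print(is_base_np(allowed))         # False -- the recursive NP above it
```

In the ATB example on the next slide, the PP attaches to an NP that fails this test, so this shortcut is not available.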

Modification of non-base NPs

(NP $awAriE streets
    (NP (NP madiyn+ap city
            (NP luwnog byt$ Long Beach))
        (PP fiy in
            (NP wilAy+ap state
                (NP kAliyfuwrniyA)))))

(NP (NP streets)
    (PP of
        (NP (NP the city)
            (PP of (NP Long Beach))
            (PP in (NP (NP the state)
                       (PP of California))))))

Problem Summary and Approach
- PP and ADJP attachment is harder in ATB
  - Cannot rely on the base NP constraint
  - PP attachment to a non-base NP is nearly non-existent in PTB, yet is the 16th most frequent dependency in ATB
  - PP attachment is worse for ATB (Kulick, Gabbard, Marcus, 2006)
- Treat attachment within the iDAfa as a problem independent of the parser

The Task as a Machine Learning Problem
- Definition
  - Instances are attachments
  - Extract iDAfas and their modifiers from the corpus
  - Labels are the level to attach at
  - Constraint: no attachments crossing levels
- Technique
  - MaxEnt model to label attachments
  - Dynamic programming to enforce the constraint (sketched below)
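A minimal sketch of how the "label, then enforce the constraint" step could look. The per-(modifier, level) scores are assumed to come from the MaxEnt model (here they are just numbers), and the non-crossing constraint is modeled as "a later modifier may never attach deeper than an earlier one", an assumption based on how modifiers stack up at the end of the iDAfa; the paper's exact formulation may differ.

```python
def best_attachments(scores):
    """Choose an attachment level for each modifier (left to right) so that
    the total score is maximal and no two attachments cross.

    scores[i][l] is the classifier's score (e.g. a log-probability) for
    attaching modifier i at level l, where level 0 is the outermost NP of
    the iDAfa and larger levels are deeper."""
    n_mods, n_levels = len(scores), len(scores[0])
    NEG_INF = float("-inf")

    # best[i][l]: best total score for modifiers 0..i with modifier i at level l
    best = [[NEG_INF] * n_levels for _ in range(n_mods)]
    back = [[0] * n_levels for _ in range(n_mods)]
    best[0] = list(scores[0])

    for i in range(1, n_mods):
        for l in range(n_levels):
            # the previous modifier must sit at the same level or deeper
            prev = max(range(l, n_levels), key=lambda k: best[i - 1][k])
            best[i][l] = best[i - 1][prev] + scores[i][l]
            back[i][l] = prev

    # Backtrack from the best final level to recover the whole assignment.
    level = max(range(n_levels), key=lambda k: best[-1][k])
    levels = [level]
    for i in range(n_mods - 1, 0, -1):
        level = back[i][level]
        levels.append(level)
    return list(reversed(levels))

# Two modifiers, three candidate levels: the first prefers level 1,
# the second prefers level 0, and that pair does not cross.
print(best_attachments([[0.1, 2.0, 0.3],
                        [1.5, 0.2, 0.1]]))   # [1, 0]
```

The MaxEnt component itself could be trained with any off-the-shelf maximum-entropy / logistic-regression package; only its scores enter this step.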

Machine Learning Features
- Baseline: only the level of attachment
- Non-baseline features (assembly sketched below)
  - AttSym: POS tag or nonterminal label of the modifier
  - Lex: (noun being modified, head word of the modifier)
  - TotDepth: (baseline ^ total depth of the iDAfa ^ AttSym)
  - Simple GenAgr: (AttSym ^ gender suffixes of the words corresponding to Lex)
  - Full GenAgr: Simple GenAgr, additionally with number suffixes
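A rough sketch of how these features might be assembled as string-valued indicators for a MaxEnt toolkit. Every dictionary key on `instance` below (mod_label, mod_head, nouns, depth, and the gender/number suffixes) is a made-up placeholder for whatever the real extraction provides, and "^" is only used to spell out feature conjunctions.

```python
def attachment_features(instance, level):
    """Indicator features for one (modifier, candidate level) pair.
    All dictionary keys are hypothetical stand-ins for the information
    extracted from the treebank for each instance."""
    feats = []

    base = "level=%d" % level                      # Baseline: level of attachment
    feats.append(base)

    att_sym = "AttSym=%s" % instance["mod_label"]  # POS tag / nonterminal of the modifier
    feats.append(att_sym)

    noun = instance["nouns"][level]                # the noun being modified at this level
    feats.append("Lex=%s^%s" % (noun, instance["mod_head"]))

    # TotDepth: baseline ^ total depth of the iDAfa ^ AttSym
    feats.append("%s^depth=%d^%s" % (base, instance["depth"], att_sym))

    # Simple GenAgr: AttSym ^ gender suffixes of the two words in Lex
    simple = "%s^gen=%s_%s" % (att_sym,
                               instance["noun_gender"][level],
                               instance["mod_gender"])
    feats.append(simple)

    # Full GenAgr: the same conjunction, extended with number suffixes
    feats.append("%s^num=%s_%s" % (simple,
                                   instance["noun_number"][level],
                                   instance["mod_number"]))
    return feats

# Toy instance, with invented values, for the PP in the running example.
toy = {"mod_label": "PP", "mod_head": "fiy", "nouns": ["$awAriE", "madiyn+ap"],
       "depth": 2, "noun_gender": ["masc", "fem"], "mod_gender": "-",
       "noun_number": ["pl", "sg"], "mod_number": "-"}
print(attachment_features(toy, 1))
```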

Machine Learning Results

Features                    Accuracy
Base                        39.7
Base+AttSym                 76.1
Base+Lex                    58.4
Base+Lex+AttSym             79.9
Base+Lex+AttSym+TotDepth    78.7
Base+Lex+AttSym+GenAgr      79.3

Future Work
- For the ML problem in this talk
  - More feature investigation
  - Improved analysis of subclasses of iDAfas
- In the context of a real system
  - Analysis of iDAfa and attachment accuracy in current parsing
  - Get the attachment problem out of the parser
  - Use the current work as a module after parsing