Introduction to treebanks Session 1: 7/08/2011 1.

Slides:



Advertisements
Similar presentations
School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING Chunking: Shallow Parsing Eric Atwell, Language Research Group.
Advertisements

Layering Semantics (Putting meaning into trees) Treebank Workshop Martha Palmer April 26, 2007.
June 6, 20073rd PIRE Meeting1 Tectogrammatical Representation of English in Prague Czech-English Dependency Treebank Lucie Mladová Silvie Cinková, Kristýna.
Hindi Syntax Annotating Dependency, Lexical Predicate-Argument Structure, and Phrase Structure Martha Palmer (University of Colorado, USA) Rajesh Bhatt.
Overview of the Hindi-Urdu Treebank Fei Xia University of Washington 7/23/2011.
Language Data Resources Treebanks. A treebank is a … database of syntactic trees corpus annotated with morphological and syntactic information segmented,
Semantic Role Labeling Abdul-Lateef Yussiff
Recognizing Implicit Discourse Relations in the Penn Discourse Treebank Ziheng Lin, Min-Yen Kan, and Hwee Tou Ng Department of Computer Science National.
Towards Parsing Unrestricted Text into PropBank Predicate- Argument Structures ACL4 Project NCLT Seminar Presentation, 7th June 2006 Conor Cafferkey.
Annotating language data Tomaž Erjavec Institut für Informationsverarbeitung Geisteswissenschaftliche Fakultät Karl-Franzens-Universität Graz Tomaž Erjavec.
In Search of a More Probable Parse: Experiments with DOP* and the Penn Chinese Treebank Aaron Meyers Linguistics 490 Winter 2009.
LING 581: Advanced Computational Linguistics Lecture Notes January 19th.
< Translator Team > 25+ Languages, …and growing!.
Towards an NLP `module’ The role of an utterance-level interface.
LING 581: Advanced Computational Linguistics Lecture Notes March 9th.
The Hindi-Urdu Treebank Lecture 7: 7/29/ Multi-representational, Multi-layered treebank Traditional approach: – Syntactic treebank: PS or DS, but.
PCFG Parsing, Evaluation, & Improvements Ling 571 Deep Processing Techniques for NLP January 24, 2011.
Albert Gatt LIN3022 Natural Language Processing Lecture 8.
1 Annotation Guidelines for the Penn Discourse Treebank Part B Eleni Miltsakaki, Rashmi Prasad, Aravind Joshi, Bonnie Webber.
DS-to-PS conversion Fei Xia University of Washington July 29,
LING 364: Introduction to Formal Semantics Lecture 4 January 24th.
Recovering empty categories. Penn Treebank The Penn Treebank Project annotates naturally occurring text for linguistic structure. It produces skeletal.
Introduction to Syntax, with Part-of-Speech Tagging Owen Rambow September 17 & 19.
1 Deryle Lonsdale, Jeremiah McGhee, Nathan Glenn, and Tory Anderson.
Workshop on Treebanks, Rochester NY, April 26, 2007 The Penn Treebank: Lessons Learned and Current Methodology Ann Bies Linguistic Data Consortium, University.
Probabilistic Parsing Ling 571 Fei Xia Week 5: 10/25-10/27/05.
LING/C SC/PSYC 438/538 Lecture 27 Sandiway Fong. Administrivia 2 nd Reminder – 538 Presentations – Send me your choices if you haven’t already.
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
Probabilistic Parsing Reading: Chap 14, Jurafsky & Martin This slide set was adapted from J. Martin, U. Colorado Instructor: Paul Tarau, based on Rada.
A Survey of NLP Toolkits Jing Jiang Mar 8, /08/20072 Outline WordNet Statistics-based phrases POS taggers Parsers Chunkers (syntax-based phrases)
LING 581: Advanced Computational Linguistics Lecture Notes February 12th.
10/12/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 10 Giuseppe Carenini.
The Prague (Czech-)English Dependency Treebank Jan Hajič Charles University in Prague Computer Science School Institute of Formal and Applied Linguistics.
AQUAINT Workshop – June 2003 Improved Semantic Role Parsing Kadri Hacioglu, Sameer Pradhan, Valerie Krugler, Steven Bethard, Ashley Thornton, Wayne Ward,
A Cascaded Finite-State Parser for German Michael Schiehlen Institut für Maschinelle Sprachverarbeitung Universität Stuttgart
Annotation for Hindi PropBank. Outline Introduction to the project Basic linguistic concepts – Verb & Argument – Making information explicit – Null arguments.
Conversion of Penn Treebank Data to Text. Penn TreeBank Project “A Bank of Linguistic Trees” (as of 11/1992) University of Pennsylvania, LINC Laboratory.
Tokenization & POS-Tagging
CPSC 503 Computational Linguistics
LING 001 Introduction to Linguistics Spring 2010 Syntactic parsing Part-Of-Speech tagging Apr. 5 Computational linguistics.
Supertagging CMSC Natural Language Processing January 31, 2006.
Automatic Grammar Induction and Parsing Free Text - Eric Brill Thur. POSTECH Dept. of Computer Science 심 준 혁.
CPSC 422, Lecture 27Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 27 Nov, 16, 2015.
1 Introduction to Computational Linguistics Eleni Miltsakaki AUTH Spring 2006-Lecture 2.
Handling Unlike Coordinated Phrases in TAG by Mixing Syntactic Category and Grammatical Function Carlos A. Prolo Faculdade de Informática – PUCRS CELSUL,
NLP. Parsing ( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) ) (,,) (ADJP (NP (CD 61) (NNS years) ) (JJ old) ) (,,) ) (VP (MD will) (VP (VB join) (NP (DT.
LING/C SC 581: Advanced Computational Linguistics Lecture Notes Feb 3 rd.
LING/C SC 581: Advanced Computational Linguistics Lecture Notes Feb 17 th.
Lec. 10.  In this section we explain which constituents of a sentence are minimally required, and why. We first provide an informal discussion and then.
LING 581: Advanced Computational Linguistics Lecture Notes March 2nd.
Natural Language Processing Vasile Rus
English Proposition Bank: Status Report
PRESENTED BY: PEAR A BHUIYAN
Parsing in Multiple Languages
Semantic Parsing for Question Answering
Oracle Supplier Management Solution Product Availability
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 27
LING/C SC/PSYC 438/538 Lecture 20 Sandiway Fong.
LING/C SC 581: Advanced Computational Linguistics
LING 581: Advanced Computational Linguistics
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 27
Constraining Chart Parsing with Partial Tree Bracketing

LING/C SC/PSYC 438/538 Lecture 23 Sandiway Fong.
COUNTRIES NATIONALITIES LANGUAGES.
CS224N Section 3: Corpora, etc.
CS224N Section 3: Project,Corpora
Progress report on Semantic Role Labeling

Owen Rambow 6 Minutes.
Presentation transcript:

Introduction to treebanks Session 1: 7/08/2011 1

Outline Types of treebanks – (Syntactic) Treebank – PropBank – Discourse Treebank The English Penn Treebank Why do we need treebanks? Hw1 2

(Syntactic) Treebank Sentences annotated with syntactic structure (dependency structure or phrase structure) 1960s: Brown Corpus Early 1990s: The English Penn Treebank Late 1990s: Prague Dependency Treebank 1990s – now: Arabic, Chinese, Dutch, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Italian, Japanese, Korean, Latin, Norwegian, Polish, Spanish, Turkish, etc. 3

An example John loves Mary. (S (NP (NNP John)) (VP (VBP loves) (NP (NNP Mary))) (..)) S NP VP./. John/NNP loves/VBPNP Mary/NNP loves/VBP John/NNPMary/NNP./. 4

PropBank Sentences annotated with predicate argument structure Ex: John loves Mary – “loves” is the predicate – “John” is Arg0 (“Agent”) – “Mary” is Arg1 (“Theme”) 2000s: The English PropBank, followed by the PropBanks for Chinese, Arabic, Hindi/Urdu, etc. 5

Discourse Treebank : The English Discourse Treebank The city’s Campaign Finance Board has refused to pay Mr. Dinkins $95,142 in matching funds because his campaign records are incomplete. Motorola is fighting back against junk mail. So much of the stuff poured into its Austin, Texas, offices that its mail rooms there simply stopped delivering it. Implicit = so Now, thousands of mailers, catalogs and sales pitches go straight into the trash. 6

Multi-representational, multi-layered treebank 2010-: Multi-representational, multi-layer Treebank for Hindi/Urdu The treebank includes both PS, DS, and PB. S NP VP./. John/NNP loves/VBPNP Mary/NNP loves/VBP John/NNPMary/NNP./. “loves” is predicate. “John” is Arg0. “Mary” is Arg1. 7

Outline Types of treebanks The English Penn Treebank Why do we need treebanks? Hw1 8

The English Penn Treebank (PTB) Developed at UPenn in early 1990s Most commonly used treebank in the CL field Data: – WSJ: 1-million words from 1987 to 1989 – Others: Brown Corpus, ATIS, etc. Release: – 1992: version 1 – 1995: version 2 – 1999: version 3 9

An example 10

The PTB Tagset Syntactic labels: e.g., NP, VP Function tags: e.g., -SBJ, -LOC Empty categories (ECs): e.g., *T* (for A-bar movement) Sub-categories for ECs: e.g., 0 (zero complementizers), NP* (PRO, A-movement) 11

Passive 12

Clausal Complementation 13

Raising 14

Wh-Relative Clauses 15

Contact Relatives 16

Indirect Questions 17

Punctuation 18

FinancialSpeak 19

Lists 1 20

Lists 2 21

Outline Types of treebanks The English Penn Treebank Why do we need treebanks? Hw1 22

Why do we need treebanks? Computational Linguistics: (Session 6-7) – To build and evaluate NLP tools (e.g., word segmenters, part-of-speech taggers, parsers, semantic role labelers) – This leads to significant progress of the CL field Theoretic linguistics: (Session 2 and 5-6) – Annotation guidelines are like a grammar book, with more detail and coverage – As a discovery tool – One can test linguistic theories and collect statistics by searching treebanks. 23

CL example: Parsing S => NP VP. NP => NNP VP => VBP NP NNP => John NNP => Mary VBP => loves. =>. S NP VP./. John/NNP loves/VBPNP Mary/NNP Input: John loves Mary. Output: 24

Ambiguity PP attachment: John bought the book in the store S S => NP VP NP => PN VP => V NP VP => VP PP NP => NP PP PP => P NP NP VP John/NNP bought/VBPNP the book PP in the store VP NP John/NNP bought/VBP NP the book PP in the store VP S 25

Labeled f-score S NP VP John/NNP bought/VBP NP the book PP in the store VP NP John/NNP bought/VBP NP the book PP in the store VP S 1 2 3,45,6, ,4 5,6,7 (1, 7, S) (1, 1, NP) (2, 7, VP) (3, 7, NP) (3, 4, NP) (5, 7, PP) (6, 7, NP) (1, 7, S) (1, 1, NP) (2, 7, VP) (2, 4, VP) (3, 4, NP) (5, 7, PP) (6, 7, NP) Prec=6/7, recall=6/7, f-score=6/7 sys output:gold standard: 26

Parsing evaluation Use the English Penn Treebank – Section 2-18 for training – Section 23 for final testing – Section 0-1, 22, and 24 for development Evaluation: – precision, recall, f-score – Best f-score: around 91% 27

Outline Types of treebanks The English Penn Treebank Why do we need treebanks? Hw1 28

Hw1: required part Required reading: Chapters 1 and 2 of the PTB guidelines Assignment: – pick a specific phenomenon handled by the PTB, – discuss the PTB treatment of this phenomenon, and – explain whether you concur with the treatment or not. If you do not, outline how you would have represented it differently. 29