Treebanks and Parsing. Jan Hajič, Institute of Formal and Applied Linguistics. Workshop on Treebanking, NAACL-HLT 2007, Rochester, April 26, 2007.

Presentation transcript:

Treebanks and Parsing
Jan Hajič
Institute of Formal and Applied Linguistics
School of Computer Science
Faculty of Mathematics and Physics
Charles University, Prague
Czech Republic
Workshop on Treebanking, NAACL-HLT 2007, Rochester, April 26, 2007

Questions Covered
  Q1 What do you care … building a parser?
  Q2 What works, what doesn’t?
  Q3 What info is useful, what not?
  Q4 How does grammar writing interact with treebank building (TB)?
  Q5 Methodological lessons learned from TB?
  Q6 (Dis)advantages of pre-parsing for TB?
  Q7 Phrase-structure vs. dependency?

Q1 What do we really care about… building a parser
What will its output be used for:
  Deep (semantic structure) parsing
  Translation
  Question answering
  … etc.
Conversion of annotation into “features”
  Locality good (with today’s parsers)
Accuracy
Size, speed, … (the practical things)

Q3 What info is useful
Hard to say
MST (McDonald), Collins, Charniak surface syntax parsers (Czech):
  No function tags used
  Reduced tagset (1100 -> 43)
    -> Hand-made reduction worked best! (POS, case if possible)
  Lemmatization, word forms used
  Empty categories, co-indexation not used (not present)
  Adjunct/argument distinction not used
  Subcat frames not used (not present)
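
The slide does not spell out the hand-made reduction beyond "POS, case if possible". As a minimal sketch of that idea, the Python below assumes 15-position positional tags in which position 1 is the part of speech and position 5 is the case, with "-" meaning "not applicable". The function name reduce_tag and the sample tags are illustrative, and the resulting tagset is not the 43-tag set used in the experiments.

```python
def reduce_tag(tag: str) -> str:
    """Collapse a positional tag to POS, plus case when the tag marks one."""
    pos = tag[0]
    # Assumption: position 5 (index 4) holds case; "-" means "not applicable".
    case = tag[4] if len(tag) > 4 else "-"
    return pos if case == "-" else pos + case


# Illustrative full tags: noun, adjective, verb, preposition.
tags = ["NNFS1-----A----", "AAFS4----1A----", "VB-S---3P-AA---", "RR--6----------"]
reduced = [reduce_tag(t) for t in tags]
print(reduced)
```

Words that carry no case (verbs here) fall back to the bare part of speech, which is one way to read "case if possible".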

Q5 Lessons learned
! For parsing only !
Separated surface and deep annotation is good
  Even then, Czech parsing lags behind English
1 million word treebank is far from enough…
  … for languages with rich inflection, that is
Need for tagset reduction
“Local” information helps
  Often can be extracted automatically from the annotated treebank
Lexicalized PS/dependency not much difference (so far)
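
The point that "local" information can often be read off the annotated treebank automatically might look like the following Python sketch. The slide names no feature set, so the sentence encoding, the reduced tags and the function local_features are all hypothetical; each word is described only by its own tag, its head's tag, the side on which the head lies, and the distance to the head.

```python
from collections import Counter

# Hypothetical toy sentence: (form, reduced tag, head position); head 0 is the root.
sentence = [
    ("Petr", "N1", 2),
    ("čte", "V", 0),
    ("knihu", "N4", 2),
]


def local_features(sent):
    """Return one (tag, head tag, head side, distance) tuple per word."""
    feats = []
    for i, (_form, tag, head) in enumerate(sent, start=1):
        if head == 0:
            feats.append((tag, "ROOT", "root", 0))
            continue
        head_tag = sent[head - 1][1]
        head_side = "left" if head < i else "right"
        feats.append((tag, head_tag, head_side, abs(head - i)))
    return feats


features = local_features(sentence)
counts = Counter(features)  # over a whole treebank these counts become statistics
print(features)
```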

Q6 (Dis)advantages of pre-parsing (surface)
Speed
  Up to 50% faster (100% increase in throughput)
  …therefore cheaper
Consistency better
Labeling
  Color codes for uncertainty of label assignment
Disadvantage: “strange” errors
  Can be checked for automatically with cross-checking
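
The two figures on the Speed line describe the same saving from two sides, assuming "50% faster" means that each unit takes 50% less annotation time. A short Python check of that arithmetic (the function name throughput_gain is illustrative):

```python
def throughput_gain(time_saved: float) -> float:
    """Relative throughput increase when each unit takes (1 - time_saved) of the old time."""
    return 1.0 / (1.0 - time_saved) - 1.0


gain = throughput_gain(0.5)
print(f"{gain:.0%}")
```

Halving the time per sentence doubles the number of sentences finished in the same time, which is the 100% increase quoted on the slide.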

Q7 Phrase structure vs. dependency
If …
  (phrase structure) has heads marked AND
  (dependency) has tags suitable for phrase labels and no non-projectivity
Then… essentially the same thing
Else... ?? determining heads; branching & labels; projectivization
Done on Czech:
  Collins parser 98, ACL ’99
  Dependency -> lexicalized PS (parsing) -> Dep.