Treebanks: Layering the Annotation. Jan Hajič, Institute of Formal and Applied Linguistics. Workshop on Treebanking, NAACL-HLT 2007, Rochester, April 26, 2007.

Presentation transcript:

Slide 1
April 26, 2007, Workshop on Treebanking, NAACL-HLT 2007, Rochester
Treebanks: Layering the Annotation
Jan Hajič
Institute of Formal and Applied Linguistics, School of Computer Science, Faculty of Mathematics and Physics
Charles University, Prague, Czech Republic

Slide 2: Layering the PDT
(5) stand-off layers:
  - Deep structure (t): syntax & semantics; dependency & non-dep. links
  - Surface structure (a): dependency, function
  - Morphology (m): lemma, tag (detailed)
  - Word (token) (w)
  - Audio/auto transcript (z-layer)
“PML” scheme (XML based)
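
As a rough illustration of what "stand-off layers" means in practice, the sketch below keeps each layer in its own XML fragment and lets the upper layer point at the lower one by ID. The element and attribute names (w-ref, m-ref, parent) and the English example sentence are assumptions made up for this sketch; this is not the actual PML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for illustration only; NOT the real PML schema.
W_LAYER = """<w-layer>
  <w id="w1">Treebanks</w>
  <w id="w2">grow</w>
</w-layer>"""

M_LAYER = """<m-layer>
  <m id="m1" w-ref="w1" lemma="treebank" tag="NNS"/>
  <m id="m2" w-ref="w2" lemma="grow" tag="VBP"/>
</m-layer>"""

A_LAYER = """<a-layer>
  <a id="a2" m-ref="m2" afun="Pred"/>
  <a id="a1" m-ref="m1" afun="Sb" parent="a2"/>
</a-layer>"""


def index_by_id(xml_text):
    """Parse one layer and index its direct children by their id attribute."""
    root = ET.fromstring(xml_text)
    return {el.get("id"): el for el in root}


words = index_by_id(W_LAYER)
morph = index_by_id(M_LAYER)
anodes = index_by_id(A_LAYER)


def surface_form(a_id):
    """Follow the references a-node -> m-node -> w-node and return the token text."""
    m_node = morph[anodes[a_id].get("m-ref")]
    return words[m_node.get("w-ref")].text


print(surface_form("a1"))
```

Because each layer only stores references downward, a lower layer can be corrected or re-tokenized without rewriting the files of the layers above it, as long as the IDs stay stable.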

Slide 3: The Links
  - Within t-layer
      - Co-reference links
          - Pronoun to antecedent (future: full coref chains)
          - Complement to 2nd governor, etc.
  - Lexicon links
      - Verbs, nouns, adjectives, adverbs to dictionary entry
          - Word sense disambiguated, valency/frame-based
  - t-layer to a-layer
      - Which a-node the t-node “comes from”
      - No restrictions (crossing, many-to-many, …)
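
The unrestricted t-layer to a-layer links can be pictured as a plain mapping from t-node IDs to lists of a-node IDs. The following is a minimal sketch with invented node IDs and invented link patterns; it only shows that one t-node may point to several a-nodes, to none, or to an a-node that another t-node also uses.

```python
from collections import defaultdict

# Invented example data: which a-nodes each t-node "comes from".
t_to_a = {
    "t1": ["a1"],
    "t2": ["a2", "a3"],  # one t-node built from two surface nodes
    "t3": [],            # a t-node with no surface counterpart
    "t4": ["a3"],        # shares a3 with t2, so the relation is many-to-many
}


def invert(links):
    """Build the reverse view: for each a-node, the t-nodes that refer to it."""
    inverse = defaultdict(list)
    for t_id, a_ids in links.items():
        for a_id in a_ids:
            inverse[a_id].append(t_id)
    return dict(inverse)


a_to_t = invert(t_to_a)
print(a_to_t["a3"])
```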

Slide 4: The Questions I
  - Did choices made in the underlying annotation influence “upper” layer choices?
      - Minimal or no influence, thanks to the stand-off annotation style and the many-to-many references/links it allows (XML IDs)
  - Added annotation (over surface syntax):
      - Node order (information structure), deep dependencies, 30+ node labels (time, modalities, semantic POS, number, pronoun classes, …), co-reference, valency dictionary (~ “frame files”) links (word sense annotation), “empty” nodes (args), …

Slide 5: The Questions II
  - Hard to circumvent syntactic choices?
      - Not really… (again, thanks to XML stand-off)
      - Only 1 label at surface syntactic level (function)
      - Dependency(-only): no problem (no need to refer to phrases; all are represented by subtrees)
  - …but there will be a problem with the t-layer
      - When referring from some “higher” (“logic”) layer:
          - (Probably) need to refer to labels (attributes)
      - Solution:
          - Add IDs to attributes (should be easy, in fact: XML ID…)
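
One way to read the "add IDs to attributes" idea is sketched below: each attribute of a t-node becomes an element with its own ID, so that a higher layer can point at a specific label rather than at the whole node. The markup, the "logic-layer" element, and the ID naming pattern are assumptions for illustration, not something the slides or PML specify.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: attributes promoted to elements that carry their own IDs.
T_LAYER = """<t-layer>
  <t id="t1">
    <attr id="t1.functor" name="functor">ACT</attr>
    <attr id="t1.sempos" name="sempos">n.denot</attr>
  </t>
</t-layer>"""

# Hypothetical higher layer that refers to one label, not to the node as a whole.
HIGHER_LAYER = """<logic-layer>
  <claim id="c1" about="t1.functor"/>
</logic-layer>"""

t_root = ET.fromstring(T_LAYER)
# Index every element that has an id, nodes and attributes alike.
targets = {el.get("id"): el for el in t_root.iter() if el.get("id")}


def resolve(ref):
    """Return (attribute name, value) for a reference into the t-layer."""
    el = targets[ref]
    return el.get("name"), el.text


claim = ET.fromstring(HIGHER_LAYER).find("claim")
print(resolve(claim.get("about")))
```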

Slide 6: The Questions III
  - Desirable characteristics … for adding layers
      - Stand-off annotation
      - Proper IDs for in-layer and between-layer reference
          - In advance, if possible, but usually can be added later
  - Quality Control !!
      - Easier with layers: cross-layer constraints
          - Invisible to annotators -> catch random errors
      - Links (between-layer type) can be pre-annotated
  - PS vs. dep.: impact on additional annotation
      - Not observed
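
A cross-layer constraint check of the kind mentioned under Quality Control could look roughly like the sketch below. The data layout and both rules (no dangling references; a non-generated t-node must link to at least one a-node) are invented examples of such constraints, not the checks actually used for the PDT.

```python
def check_cross_layer(t_nodes, a_ids):
    """Return a list of human-readable violations of two example constraints."""
    errors = []
    for t_id, node in t_nodes.items():
        # Rule 1 (example): every reference must resolve to an existing a-node.
        for a_ref in node["a_refs"]:
            if a_ref not in a_ids:
                errors.append(f"{t_id}: dangling reference to {a_ref}")
        # Rule 2 (example): only generated nodes may lack a surface counterpart.
        if not node["generated"] and not node["a_refs"]:
            errors.append(f"{t_id}: non-generated node has no a-layer link")
    return errors


# Invented sample data with two deliberate errors (t2 and t3).
a_ids = {"a1", "a2"}
t_nodes = {
    "t1": {"a_refs": ["a1"], "generated": False},
    "t2": {"a_refs": ["a9"], "generated": False},
    "t3": {"a_refs": [], "generated": False},
    "t4": {"a_refs": [], "generated": True},
}

for problem in check_cross_layer(t_nodes, a_ids):
    print(problem)
```

Checks like these run over the finished files, so annotators never see them while working, which is what lets them catch random slips rather than systematic habits.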