March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 1 PDT: Tectogrammatical Representation Jan Hajič Institute.

Slides:



Advertisements
Similar presentations
Lexical Functional Grammar : Grammar Formalisms Spring Term 2004.
Advertisements

Semantics (Representing Meaning)
Annotation of Grammatemes in the Prague Dependency Treebank 2.0 Magda Razímová Zdeněk Žabokrtský Institute of Formal and Applied Linguistics Charles University.
Leonid Iomdin Institute for Information Transmission Problems, Russian Academy of Sciences
June 6, 20073rd PIRE Meeting1 Tectogrammatical Representation of English in Prague Czech-English Dependency Treebank Lucie Mladová Silvie Cinková, Kristýna.
Issues in Building and Exploiting Latin Language Resources Marco Passarotti Università Cattolica del Sacro Cuore, Milan (Italy)
Statistical NLP: Lecture 3
Introduction to phrases & clauses
MORPHOLOGY - morphemes are the building blocks that make up words.
1 LIN 1310B Introduction to Linguistics Prof: Nikolay Slavkov TA: Qinghua Tang CLASS 18, March 13, 2007.
LING NLP 1 Introduction to Computational Linguistics Martha Palmer April 19, 2006.
1 Words and the Lexicon September 10th 2009 Lecture #3.
April 26, 2007Workshop on Treebanking, NAACL-HTL 2007 Rochester1 Treebanks: Layering the Annotation Jan Hajič Institute of Formal and Applied Linguistics.
1 Introduction to Computational Linguistics Eleni Miltsakaki AUTH Fall 2005-Lecture 2.
April 26, 2007Workshop on Treebanking, NAACL-HTL 2007 Rochester1 Treebanks: Language-specific Issues Czech Jan Hajič Institute of Formal and Applied Linguistics.
April 26, 2007Workshop on Treebanking, NAACL-HTL 2007 Rochester1 Treebanks and Parsing Jan Hajič Institute of Formal and Applied Linguistics School of.
Chapter 2 A rapid overview.
Building the Valency Lexicon of Arabic Verbs Viktor Bielický Otakar Smrž LREC 2008, Marrakech, Morocco.
323 Morphology The Structure of Words 1.1 What is Morphology? Morphology is the internal structure of words. V: walk, walk+s, walk+ed, walk+ing N: dog,
PDT 2.0 Prague Dependency Treebank 2.0 Zdeněk Žabokrtský Dept. of Formal and Applied Linguistics Charles University, Prague.
1 LIN 1310B Introduction to Linguistics Prof: Nikolay Slavkov TA: Qinghua Tang CLASS 14, Feb 27, 2007.
Chapter 4 Basics of English Grammar Business Communication Copyright 2010 South-Western Cengage Learning.
Leonid Iomdin Institute for Information Transmission Problems, Russian Academy of Sciences
PDT Grammatemes and Coreference in the PDT 2.0 Zdeněk Žabokrtský Institute of Formal and Applied Linguistics Charles University in Prague.
Prague Dependency Treebank(s) Workshop at LSA2011, Part II Jan Hajič, Zdeňka Urešová Institute of Formal and Applied Linguistics School of Computer Science.
Machine Translation using Tectogrammatics Zdeněk Žabokrtský IFAL, Charles University in Prague.
March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax.
IV. SYNTAX. 1.1 What is syntax? Syntax is the study of how sentences are structured, or in other words, it tries to state what words can be combined with.
1 Introduction to Natural Language Processing ( ) Linguistic Essentials: Syntax AI-lab
Morphological Meanings in the Prague Dependency Treebank Magda Razímová Zdeněk Žabokrtský Institute of Formal and Applied Linguistics Charles University,
Tree-based Machine Translation using syntax and semantics
April 17, 2007MT Marathon: Tree-based Translation1 Tree-based Translation with Tectogrammatical Representation Jan Hajič Institute of Formal and Applied.
The Prague (Czech-)English Dependency Treebank Jan Hajič Charles University in Prague Computer Science School Institute of Formal and Applied Linguistics.
Treebanks and MWEs (Part 1) Jan Hajič, Pavel Straňák, Jiří Mírovský Institute of Formal and Applied Linguistics & LINDAT/CLARIN School of Computer Science.
Jan Hajič Otakar Smrž Petr Zemánek Jan Šnaidauf Emanuel Beška Faculty of Mathematics and Physics Faculty of Philosophy and Arts Charles University in Prague.
1 LIN 1310B Introduction to Linguistics Prof: Nikolay Slavkov TA: Qinghua Tang CLASS 12, Feb 13, 2007.
Ideas for 100K Word Data Set for Human and Machine Learning Lori Levin Alon Lavie Jaime Carbonell Language Technologies Institute Carnegie Mellon University.
Resemblances between Meaning-Text Theory and Functional Generative Description Zdeněk Žabokrtský Institute of Formal and Applied Linguistics Charles University,
What you have learned and how you can use it : Grammars and Lexicons Parts I-III.
CPE 480 Natural Language Processing Lecture 4: Syntax Adapted from Owen Rambow’s slides for CSc Fall 2006.
1 / 5 Zdeněk Žabokrtský: Automatic Functor Assignment in the PDT Automatic Functor Assignment (AFA) in the Prague Dependency Treebank PDT : –a long term.
Jan 2004CSA3050: NLG21 CSA3050: Natural Language Generation 2 Surface Realisation Systemic Grammar Functional Unification Grammar see J&M Chapter 20.3.
Proper Nouns in Czech Corpora Magda Ševčíková Institute of Formal and Applied Linguistics Faculty of Mathematics and Physics.
PDT Grammatemes in the PDT 2.0 Zdeněk Žabokrtský Dept. of Formal and Applied Linguistics Charles University, Prague
Prague Dependency Treebank 1.0 Functional Generative Description.
nd PIRE project workshop1 Tectogrammatical Representation of English Silvie Cinková Lucie Mladová, Anja Nedoluzhko, Jiří Semecký, Jana Šindlerová,
March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Intro 1 The Prague Dependency Treebank (PDT) Introduction Jan Hajič Institute.
Annotation Procedure in Building the Prague Czech-English Dependency Treebank Marie Mikulová and Jan Štěpánek Institute of Formal and Applied Linguistics.
Syntactic Annotation of Slovene Corpora (SDT, JOS) Nina Ledinek ISJ ZRC SAZU
Machine Translation using Tectogrammatics Zdeněk Žabokrtský IFAL, Charles University in Prague.
Leonid Iomdin Institute for Information Transmission Problems, Russian Academy of Sciences
SYNTAX.
Building Sub-Corpora Suitable for Extraction of Lexico-Syntactic Information Ondřej Bojar, Institute of Formal and Applied Linguistics, ÚFAL.
Arabic Syntactic Trees Zdeněk Žabokrtský Otakar Smrž Center for Computational Linguistics Faculty of Mathematics and Physics Charles University in Prague.
Non-sentential utterances (NSU) in dialog Silvie Cinková (CU) Companions Semantic Representation and Dialog Interfacing Workshop Edinburgh, March 5, 2008.
TSD, Brno, Institute of Formal and Applied Linguistics, 1 Czech Verbs of Communication and the Extraction of.
Semantic annotation of a dialog corpus Silvie Cinková Institute of Formal and Applied Linguistics Charles University in Prague, Czech Republic COMPANIONS.
Coreference: Current and outlook Silvie Cinková (CU) Companions Semantic Representation and Dialog Interfacing Workshop Edinburgh, March 5, 2008.
Prague Czech-English Dependency Treebank 2.0 ufal.mff.cuni.cz/pcedt2.0 Silvie Cinková, Marie Mikulová, Jan Štěpánek & professors, annotators and programmers.
Learning to Generate Complex Morphology for Machine Translation Einat Minkov †, Kristina Toutanova* and Hisami Suzuki* *Microsoft Research † Carnegie Mellon.
Netgraph – a Tool for Searching in the Prague Dependency Treebank 2.0 Defence of the Doctoral Thesis, Prague, September 3 rd, 2008 Author: Mgr. Jiří Mírovský.
Lecture – VIII Monojit Choudhury RS, CSE, IIT Kharagpur
Lecture 3: Functional Phrases
Statistical NLP: Lecture 3
A Statistical Model for Parsing Czech
Chapter 4 Basics of English Grammar
Prague Dependency Treebank 2. 0 Zdeněk Žabokrtský Dept
Universal Dependencies
Introduction to Linguistics
Chapter 4 Basics of English Grammar
Presentation transcript:

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 1 PDT: Tectogrammatical Representation Jan Hajič Institute of Formal and Applied Linguistics School of Computer Science Faculty of Mathematics and Physics Charles University, Prague Czech Republic

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 2 Tectogrammatical Annotation (t-layer) Underlying (deep) syntax 4 sublayers (integrated): dependency structure, (detailed) functors valency annotation topic/focus and deep word order coreference (mostly grammatical only)‏ all the rest (“grammatemes”): detailed functors, underlying gender, number,... Total 39 attributes (vs. 5 at m-layer, 2 at a-layer)‏

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 3 Analytical vs. Tectogrammatical representation Underlying verb + tense Deep function Elided Actor in Prepositions out Another ellipsis... (TR: sublayer 1 only shown)‏

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 4 Layer 3: Tectogrammatical Underlying (deep) syntax 4 sublayers: dependency structure, (detailed) functors topic/focus and deep word order coreference (mostly grammatical only)‏ all the rest (grammatemes): detailed functors underlying gender, number,...

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 5 Example - TR Graphical visualization Simplified… He worked as an engineer and he liked the work.

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 6 Dependency Structure Similar to the surface (Analytical) layer......but: certain nodes deleted auxiliaries, non-autosemantic words, punctuation some nodes added based on word (mostly verb, noun) valency some ellipsis resolution detailed dependency relation labels (functors)

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 7 Tectogrammatical Functors “Actants”: ACT, PAT, EFF, ADDR, ORIG modify: verbs, nouns, adjectives cannot repeat in a clause, usually obligatory Free modifications (~ 50), semantically defined can repeat; optional, sometimes obligatory Ex.: LOC, DIR1,...; TWHEN, TTILL,...; RSTR; BEN, ATT, ACMP, INTT, MANN; MAT, APP; ID, DPHR,... Special Coordination, Rhematizers, Foreign phrases,... syntactic semantic

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 8 Tectogrammatical Example Analytical verb form:  he would be allowed to be enrolled Additional attributes (grammatemes): conditional + “allow” Collapsed

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 9 Tectogrammatical Example Predicate with copula (state)‏  you were fired

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 10 Tectogrammatical Example Passive construction (action)‏  (The) book has been translated [by Mr. X] Disappeared Added

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 11 Tectogrammatical Example Object  he gave Mary a book Obj goes into ACT, PAT, ADDR, EFF or ORIG based on governor’s valency frame

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 12 Relative clause (embedded)‏  the woman, who had a French accent, was very pretty Tectogrammatical Example

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 13 Tectogrammatical Example Incomplete phrases  Peter works well, but Paul badly Added

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 14 Layer 3: Tectogrammatical Underlying (deep) syntax 4 sublayers: dependency structure, (detailed) functors topic/focus and deep word order coreference (mostly grammatical only)‏ all the rest (grammatemes): detailed functors underlying gender, number,...

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 15 Deep Word Order, Topic/Focus Example: Baker bakes rolls. vs. Baker IC bakes rolls. Analytical dep. tree:

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 16 Deep Word Order Topic/Focus Deep word order: from “old” information to the “new” one (left-to- right) at every level (head included)‏ projectivity by definition (almost...)‏ i.e., partial level-based order -> total d.w.o. Topic/focus/contrastive topic attribute of every node (t, f, c)‏ restricted by d.w.o. and other constraints

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 17 Layer 3: Tectogrammatical Underlying (deep) syntax 4 sublayers: dependency structure, (detailed) functors topic/focus and deep word order coreference (mostly grammatical only)‏ all the rest (grammatemes): detailed functors underlying gender, number,...

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 18 Coreference (intro only: see Silvie’s part)‏ Grammatical (easy)‏ relative clauses which, who  Peter and Paul, who... control infinitival constructions  John promised to go... reflexive pronouns {him,her,thme}self(-ves)‏  Mary saw herself in... John go he home promise PRED ACT PAT ACT DIR3

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 19 Coreference Textual Ex.: Peter moved to Iowa after he finished his PhD.

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 20 Layer 3: Tectogrammatical Underlying (deep) syntax 4 sublayers: dependency structure, (detailed) functors topic/focus and deep word order coreference (mostly grammatical only)‏ all the rest (grammatemes): detailed functors underlying gender, number,...

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 21 “Grammatemes”‏ Detailed functors (subfunctors)‏ only for some functors: TWHEN: before/after LOC: next-to, behind, in-front-of,... also: ACMP, BEN, CPR, DIR1, DIR2, DIR3, EXT Lexical (underlying)‏ number (SG/PL), tense, modality, degree of comparison,... strictly only where necessary (agreement!)‏

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 22 Tectogrammatical attributes I node typing complex, coap, qcomplex, root, atom,... functor, subfunctor TWHEN: TWHEN.basic, TWHEN.before is_member, is_generated, is_parenthesis, is_dsp_root, is_state, quot_type,... grammatemes (16): aspect, degcmp, deontmod, sempos, tense, indeftype, politeness, person,...

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 23 Tectogrammatical attributes II topic/focus: tfa, deepord valency: t_lemma, val_frame.rf bookkeeping: id coref_gram.rf, coref_text.rf, compl.rf reference to TR node, type of coreference sentmod Linking to analytical layer a.lex.rf (“main” anal. node), a.aux.rf (others)

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 24 Fully Annotated Sentence He spends his days sketching passers-by, or trying to.

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 25 Definition of Valency Ability (“desire”) of words (verbs, nouns, adjectives) to combine themselves with other units of meaning Properties of valency: Specific for every word meaning (in general) leave: sb left sth for sb vs. sb left from somewhere same as in PropBank leave.02 vs. leave.01 Typically strongly correlates with surface form morphological case (~ ending), preposition+case,... Semantic constraints are very dangerous

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 26 Structure of Valency word (lemma) word sense group 1 valency frame:  slot 1 slot 2 slot 3 surface expression word sense group 2... PDT VALLEX (Cz), EngVallex (En) vyměnit (to replace) vyměnit 1 ACT PATEFF Nom. Acc.za+Acc. vyměnit 2...

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 27 PDT-VALLEX Entry dosáhnout: “to reach”, “to get [sb to do sth]” browser/user-formatted example:

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 28 Corpus Valency Lexicon Corpus: ENTRY: uzavřít vf 1 : ACT(.1) CPHR({smlouva}.4) ex: u. dohodu (close a contract) vf 2 : ACT(.1) PAT(.4) ex.: u. pokoj (close a room, house) Lexicon: Sentence 2035: Sentence 15345:Sentence 51042:

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 29 Valency & Form: Constraints Tree structure: (Sets of) Constraints: n1: lemma=uvažovat mode=active n2: case=Nom afun=Sb n3: lemma=o afun=AuxP n4: case=Loc afun=Obj n1 n2n3 n4

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 30 (General) Valency Lexicon Entries EntrySense # Frame # Valency Optionality Form alternatives 111ACT PAT {c i } {c i } 22ACT PAT LOC {c i } {c i } 3ACT PAT DIR3 {c i } 34ACT PAT {c i } {c i } 211ACT {c i } 22ACT INTT {c i } {c i } 311ACT PAT {c i } {c i } 22ACT PAT {c i } {c i }

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 31 Example: Valency & Form 1:2 relative clause to_say: ACT EFF lemma=say mode=active afun=AuxC lemma=that afun=Obj POS=verb afun=Sb case=Nom linear representation: EFF(that[.v])

March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Tectogrammatics 32 Valency and Text Generation Using valency for......getting the correct (lemma, tag) of verb arguments Example: starat_se PRED Martin ACT tygr PAT Martin starat V o tygr VALLEX entry: starat (se) ACT(.1) PAT(o.[.4]) se Martin se stará o tygry. “Martin takes care of tigers.” “to take care of” “tiger”