Grammatemes in the PDT 2.0
Zdeněk Žabokrtský, Dept. of Formal and Applied Linguistics, Charles University, Prague

What is a "grammateme"?
- The example sentences have the same t-lemmas, the same tree topology, and the same functors, but the original sentences are obviously not synonymous and must be distinguished at the t-layer (they must obtain different t-trees).
- The difference is in grammatemes: t-node attribute-value pairs representing morphological meanings (semantically indispensable morphological categories).
- Examples: number for nouns, tense for verbs, degree for adjectives, deontic/verb/sentence modality...
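The idea of grammatemes as attribute-value pairs on a t-node can be sketched in a few lines of Python. This is only an illustration, not the PDT data format: the class `TNode`, the helper `same_skeleton`, and the sample lemma and functor values are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TNode:
    """Hypothetical, minimal t-node: lemma, functor, and grammatemes."""
    t_lemma: str
    functor: str
    # Grammatemes as attribute-value pairs, e.g. {"number": "sg"}.
    grammatemes: Dict[str, str] = field(default_factory=dict)


def same_skeleton(a: TNode, b: TNode) -> bool:
    """True if two nodes agree in t-lemma and functor (grammatemes ignored)."""
    return a.t_lemma == b.t_lemma and a.functor == b.functor


# Two nodes that differ only in the grammateme "number".
dog_sg = TNode("pes", "ACT", {"number": "sg"})
dog_pl = TNode("pes", "ACT", {"number": "pl"})
```

Here `same_skeleton(dog_sg, dog_pl)` holds, while the two nodes still compare as unequal because their grammatemes differ, which is the distinction the slide describes.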

What is not a grammateme?
- Grammatemes are not just straightforward counterparts of surface morphological categories (as stored in m-layer tags).
- Some morphological categories are only imposed by grammar and thus are not semantically relevant:
  - gender, number, or case of an adjective in a noun group come from agreement with the noun (e.g. in Czech or German), not from semantics
  - similarly, person is not a grammateme of verbs, as it is only induced by subject-verb agreement
- On the surface, grammatemes can be expressed both inflectionally and analytically -> information about grammatemes can be distributed over more than one m-layer token:
  - comparative of adjectives in English (more interesting)
  - future tense of imperfectives in Czech (budu chodit... / I will go...)

Complete list of grammateme attributes used in PDT 2.0
1. gram/number - number of semantic nouns
2. gram/gender - gender of semantic nouns
3. gram/person - person of pronominal semantic nouns
4. gram/politeness - basic vs. polite/esteemed form, relevant for pronominal semantic nouns
5. gram/indeftype - type of indefiniteness of pro-forms
6. gram/numertype - type of numeric expression
7. gram/negation - negation of semantic nouns, adjectives, and adverbs (not of verbs)
8. gram/degcmp - degree of comparison of semantic adjectives and adverbs
9. gram/tense - tense of verbs
10. gram/aspect - aspect of verbs
11. gram/verbmod - basic verb modality (indicative, imperative, conditional)
12. gram/deontmod - deontic modality expressed by modal verbs
13. gram/dispmod - dispositional modality (specific for Czech)
14. gram/resultative - resultativeness of verbs
15. gram/iterativeness - iterativeness of verbs
16. sentmod - sentence modality (enunciative, exclamative, desiderative, imperative, interrogative)

Grammateme number
- values:
  - sg - singular
  - pl - plural
  - nr - not recognized
- m-layer/t-layer asymmetry:
  - pluralia tantum: jedny dveře / dvoje dveře (one door / two doors) - only the plural form exists at the m-layer, but sg/pl should be disambiguated at the t-layer
  - polite form: "Viděl jste to, Petře?" (Did you see it, Petr?) - a complex verb form containing an auxiliary verb in plural at the m-layer, but at the t-layer the grammateme number (filled in the reconstructed #PersPron node) is singular

Grammateme tense
- relative tense of verbs (with respect to the tense of the governing clause)
- values:
  - sim - simultaneous
  - ant - anterior
  - post - posterior
  - nil - absent (with infinitives)
  - nr - not recognized
- m-layer means for expressing tense=post in Czech:
  - inflection with perfectives (uvařím - I will cook)
  - auxiliary verb být with imperfectives (budu zpívat - I will sing)
  - prefix po-/pů- with a limited set of verbs (pojedu - I will go)

Grammateme indeftype (I)
- pro-form: a word used to replace or substitute other words, phrases, clauses...
  - pronouns (pro-nouns), pro-adjectives, pro-numerals, pro-adverbs
- There are many semantically significant analogies in the pro-form systems, but they are usually not explicitly distinguished in POS tag sets.
  - example of such parallelism: nobody/never/nowhere... vs. everybody/always/everywhere...
- The grammateme indeftype (type of indefiniteness) is dedicated to all indefinite pro-forms.
- To capture the parallelisms, each group of pro-forms is represented with a t_lemma identical to the relative form: někde->kde (somewhere->where), kdokoli->kdo (whoever->who), nikdy->kdy (never->when)
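The mapping from a pro-form to its shared t_lemma plus an indeftype value can be sketched as a lookup table. This is a rough illustration only: the table `PRO_FORMS`, the function `analyse_pro_form`, and the coarse labels "indefinite" and "negative" are assumptions for the example and are not the actual indeftype values used in PDT 2.0.

```python
from typing import Dict, Optional

# Hypothetical lookup: surface pro-form -> shared t_lemma and a coarse
# indefiniteness label. The three Czech forms come from the slide; the
# labels are simplified placeholders, not real PDT 2.0 indeftype values.
PRO_FORMS: Dict[str, Dict[str, str]] = {
    "někde": {"t_lemma": "kde", "indeftype": "indefinite"},    # somewhere
    "kdokoli": {"t_lemma": "kdo", "indeftype": "indefinite"},  # whoever
    "nikdy": {"t_lemma": "kdy", "indeftype": "negative"},      # never
}


def analyse_pro_form(form: str) -> Optional[Dict[str, str]]:
    """Return the t_lemma and indeftype for a known pro-form, else None."""
    return PRO_FORMS.get(form.lower())
```

With this representation, forms such as nikdy and a hypothetical entry for "sometime" would share the t_lemma kdy and differ only in indeftype, which is how the parallelism between pro-form groups becomes explicit.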

Grammateme indeftype (II)

Grammateme indeftype (III)
- Indefinite, negative, interrogative, and relative pronouns and other pro-forms are unproductive classes with (at least to a certain extent) transparent derivational relations in other languages as well.
- preliminary sketch of several English and German pronouns classified by indeftype

Typing of t-nodes
- Unlike t_lemmas and functors, grammateme attributes are not relevant for all t-nodes.
  - obviously, no tense for dog, no degree of comparison for (he) waits, etc.
- Crucial question: how to formally declare the presence/absence of a certain grammateme in a certain t-node? -> the need for node typing
- Our solution: a two-level hierarchy of node types
  - 1st level: 8 coarse-grained types of nodes
  - 2nd level: 19 more specific subtypes, corresponding to detailed semantic parts of speech

Two-level hierarchy of t-node types
- 1st level: attribute nodetype
- 2nd level: attribute sempos
- [Diagram: tectogrammatical node, branching into the nodetype values complex, atom, qcomplex, list, coap, dphr, fphr, root; below them the sempos groups semantic adjectives, semantic adverbs, semantic verbs]

First level of the hierarchy: attribute nodetype
- nodetype values: root | complex | qcomplex | list | atom | coap | dphr | fphr
- fully automatic annotation:
  - use of the tree structure -> root
  - use of t-attributes:
    - t-lemma -> qcomplex | list
    - functor -> atom | coap | dphr | fphr
  - otherwise -> complex
- Example: Levnější benzín na Východě, dražší na Západě (Cheaper gasoline in the East, more expensive one in the West)
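The decision order on this slide (tree position first, then t-lemma, then functor, otherwise complex) can be sketched as a small function. The slide does not say which t-lemmas or functors trigger which value, so the sets below are illustrative assumptions, as is the function name `assign_nodetype`; the real rule tables would come from the PDT 2.0 documentation.

```python
# Illustrative trigger sets; these are assumptions, not taken from the slides.
QCOMPLEX_LEMMAS = {"#Gen", "#Cor"}
LIST_LEMMAS = {"#Idph", "#Forn"}
ATOM_FUNCTORS = {"RHEM", "PREC"}
COAP_FUNCTORS = {"CONJ", "APPS"}


def assign_nodetype(t_lemma: str, functor: str, is_root: bool = False) -> str:
    """Assign a nodetype following the order given on the slide."""
    # 1. Tree structure decides the technical root.
    if is_root:
        return "root"
    # 2. The t-lemma decides qcomplex and list nodes.
    if t_lemma in QCOMPLEX_LEMMAS:
        return "qcomplex"
    if t_lemma in LIST_LEMMAS:
        return "list"
    # 3. The functor decides atom, coap, dphr, and fphr nodes.
    if functor in ATOM_FUNCTORS:
        return "atom"
    if functor in COAP_FUNCTORS:
        return "coap"
    if functor == "DPHR":
        return "dphr"
    if functor == "FPHR":
        return "fphr"
    # 4. Everything else is a complex node.
    return "complex"
```

Under these assumptions, a node such as benzín with an ordinary functor falls through every test and ends up as complex, which is the only nodetype that then receives a sempos value.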

Second level of the hierarchy: attribute sempos
- sempos is relevant only for nodetype=complex t-nodes
- 19 values of the attribute sempos: n.... | adj.... | adv.... | v....
- fully automatic annotation, using:
  - m-tag
  - t-lemma
  - other t-attributes
- the sempos value delimits the set of relevant grammatemes
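How a sempos value delimits the set of relevant grammatemes can be sketched as a lookup keyed on the coarse class (the part of the sempos value before the first dot). This is a simplification under stated assumptions: the table follows the attribute descriptions in the grammateme list, but person, politeness, indeftype, and numertype depend on finer subtypes and are left out, and the names `GRAMMATEMES_BY_CLASS` and `relevant_grammatemes` as well as the sample sempos strings are hypothetical.

```python
from typing import Dict, List

# Coarse semantic part of speech -> grammatemes relevant for it.
# Simplified: grammatemes tied to finer subtypes are omitted here.
GRAMMATEMES_BY_CLASS: Dict[str, List[str]] = {
    "n": ["number", "gender", "negation"],
    "adj": ["negation", "degcmp"],
    "adv": ["negation", "degcmp"],
    "v": ["tense", "aspect", "verbmod", "deontmod",
          "dispmod", "resultative", "iterativeness"],
}


def relevant_grammatemes(sempos: str) -> List[str]:
    """Return the grammatemes relevant for a sempos value such as 'n.denot'."""
    coarse = sempos.split(".", 1)[0]  # 'n.denot' -> 'n', 'v' -> 'v'
    return GRAMMATEMES_BY_CLASS.get(coarse, [])
```

A validator built on such a table could flag a tense value on a semantic noun, or a missing number on one, without any per-node special cases.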

M-layer POS tags vs. sempos
- "prototypical" relations between semantic and "traditional" parts of speech
- distribution of pronouns and numerals into semantic parts of speech
- classification following the derivational information
- [Diagram: traditional parts of speech (nouns, adjectives, pronouns, numerals, adverbs, verbs, prep., conj., part., interj.) linked to semantic parts of speech (semantic nouns, semantic adjectives, semantic adverbs, semantic verbs)]
- Examples of asymmetry:
  - m-layer possessive adjectives (e.g. matčin / mother's) converted to semantic nouns (matka / mother)
  - m-layer deadjectival adverbs (pěkně / nicely) converted to semantic adjectives (pěkný / nice)

Pro-forms: m-layer tags vs. t-layer sempos

Grammatemes: Annotation process
- implementation:
  - 2000 Perl LOCs in the ntred environment
  - 2000 lines of linguistic rules in a special notation
- extensive usage of m-layer and a-layer manual annotation -> mostly automatic annotation possible
- only 5 man-months of human annotation

More reading about grammatemes
- Chapter 2.4 in the t-layer manual (included in the PDT 2.0 documentation)
- Razímová, M., Žabokrtský, Z.: Morphological Meanings in the Prague Dependency Treebank 2.0. In: Proceedings of TSD
- Razímová, M., Žabokrtský, Z.: Annotation of Grammatemes in the Prague Dependency Treebank 2.0. Proceedings of Annotation Science Workshop, LREC
- Ševčíková Razímová, M., Žabokrtský, Z.: Systematic Parametrized Description of Pro-forms in the Prague Dependency Treebank 2.0. In: Proceedings of TLT. 2006