Morphological Meanings in the Prague Dependency Treebank
Magda Razímová, Zdeněk Žabokrtský
Institute of Formal and Applied Linguistics, Charles University, Prague

Prague Dependency Treebank
– based on Functional Generative Description (Sgall 1967)
– layered annotation scenario in PDT 2.0:
   – w-layer (word layer): original text, segmented on word boundaries
   – m-layer (morphological layer): morphological lemma and tag associated with each token
   – a-layer (analytical layer): surface-syntactic dependency tree; each token is represented by a node
   – t-layer (tectogrammatical layer): deep-syntactic dependency tree; only autosemantic words are represented as tree nodes

Interlinking the layers in PDT 2.0
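
The four layers listed above, and the references that tie them together, can be sketched as a few Python data classes. This is a minimal illustration under stated assumptions: the class and field names (WToken, MToken, ANode, TNode, lex, aux) are hypothetical and do not reproduce the PDT 2.0 file format, and the tags are PDT-style positional tags used only as examples. It shows how a t-node, which exists only for an autosemantic word, can still be traced back through the a- and m-layers to the original tokens, here for the analytic future budu chodit ('I will walk').

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WToken:
    """w-layer: one token of the original text."""
    id: str
    form: str


@dataclass
class MToken:
    """m-layer: morphological lemma and tag, linked to its w-layer token."""
    id: str
    lemma: str
    tag: str
    w: WToken


@dataclass
class ANode:
    """a-layer: one node per token in the surface-syntactic tree."""
    id: str
    m: MToken
    afun: str  # analytical function, e.g. "Pred" or "AuxV"


@dataclass
class TNode:
    """t-layer: a node for an autosemantic word only (hypothetical field names)."""
    id: str
    t_lemma: str
    functor: str
    lex: Optional[ANode] = None                            # the lexical (content-word) a-node
    aux: List[ANode] = field(default_factory=list)         # auxiliary a-nodes folded into this node
    children: List["TNode"] = field(default_factory=list)  # t-tree dependents


def surface_forms(t: TNode) -> List[str]:
    """Follow the t -> a -> m -> w links and collect the word forms behind one t-node."""
    a_nodes = ([t.lex] if t.lex is not None else []) + t.aux
    return [a.m.w.form for a in a_nodes]


# "budu chodit": two tokens and two a-nodes, but a single t-node
w1, w2 = WToken("w1", "budu"), WToken("w2", "chodit")
a1 = ANode("a1", MToken("m1", "být", "VB-S---1F-AA---", w1), "AuxV")
a2 = ANode("a2", MToken("m2", "chodit", "Vf--------A----", w2), "Pred")
t1 = TNode("t1", "chodit", "PRED", lex=a2, aux=[a1])
print(surface_forms(t1))  # ['chodit', 'budu']
```

Keeping the lexical a-node separate from the auxiliary ones is a design choice of this sketch; it makes it easy to tell which token supplies the t-lemma and which only contributes grammatical information.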

Why do we need morphological meanings on the t-layer?
– Peter met her youngest brother.
     meet PRED tense=ant
       Peter ACT
       brother PAT number=sg
         #PersPron APP
         young RSTR degree=sup
– Peter will meet her young brothers.
     meet PRED tense=post
       Peter ACT
       brother PAT number=pl
         #PersPron APP
         young RSTR degree=pos
– in FGD, morphological meanings are represented by grammatemes
– grammateme = node attribute (more precisely, an attribute-value pair)
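
To make the argument concrete, the following Python sketch checks that the two t-trees from the slide share exactly the same lemmas and functors, so the only thing that distinguishes the two sentences is the grammateme values. The trees are flattened into a hypothetical mapping from (t_lemma, functor) to grammatemes; the tree shape and the grammatemes of #PersPron ("her") are left out, as on the slide.

```python
from typing import Dict, Optional, Tuple

# Simplified t-trees, flattened to {(t_lemma, functor): grammatemes}
TTree = Dict[Tuple[str, str], Dict[str, str]]

met_youngest: TTree = {
    ("meet", "PRED"): {"tense": "ant"},
    ("Peter", "ACT"): {},
    ("brother", "PAT"): {"number": "sg"},
    ("#PersPron", "APP"): {},
    ("young", "RSTR"): {"degree": "sup"},
}

will_meet_young: TTree = {
    ("meet", "PRED"): {"tense": "post"},
    ("Peter", "ACT"): {},
    ("brother", "PAT"): {"number": "pl"},
    ("#PersPron", "APP"): {},
    ("young", "RSTR"): {"degree": "pos"},
}

Diff = Dict[Tuple[str, str], Tuple[Optional[str], Optional[str]]]


def grammateme_diff(a: TTree, b: TTree) -> Diff:
    """Return {(t_lemma, grammateme): (value_in_a, value_in_b)} for every differing value."""
    if a.keys() != b.keys():
        raise ValueError("the trees differ in lemmas or functors, not only in grammatemes")
    diff: Diff = {}
    for node in a:
        for name in sorted(a[node].keys() | b[node].keys()):
            va, vb = a[node].get(name), b[node].get(name)
            if va != vb:
                diff[(node[0], name)] = (va, vb)
    return diff


print(grammateme_diff(met_youngest, will_meet_young))
# {('meet', 'tense'): ('ant', 'post'), ('brother', 'number'): ('sg', 'pl'), ('young', 'degree'): ('sup', 'pos')}
```

If the grammatemes were dropped, the two mappings would be identical, which is exactly the information loss the slide warns about.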

Interesting issues (i)
– reduction of morphological information
   – e.g. categories imposed only by agreement are not stored on the t-layer (no person with verbs, no number with adjectives)
– relocation of morphological information
   – e.g. in the case of subject deletion, categories such as gender and person are formally expressed by the verb form, but logically they belong to the (unexpressed) subject
   – Ex.: Spala. [lit. sleep.past.fem.sg.3] ('She slept.')
       sleep PRED tense=ant
         #PersPron ACT num=sg gen=fem pers=3
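
Both effects can be illustrated with a small Python sketch for a finite predicate. Assumptions, all hypothetical: the m-layer information is given as a simplified feature dictionary rather than a PDT tag; whether the subject is expressed is passed in as a flag instead of being read off the a-layer tree; grammateme names are spelled out (gender, number, person) instead of the slide's abbreviations; and the tense mapping is deliberately naive.

```python
from typing import Dict, List, Tuple

TNodeSketch = Tuple[str, str, Dict[str, str]]  # (t_lemma, functor, grammatemes)

# On a finite verb these categories only mirror the subject through agreement.
AGREEMENT = ("gender", "number", "person")

# Deliberately naive: ignores aspect, although a perfective present form refers to the future.
TENSE = {"past": "ant", "present": "sim", "future": "post"}


def finite_verb_to_t(t_lemma: str, feats: Dict[str, str],
                     subject_expressed: bool) -> List[TNodeSketch]:
    """Build t-nodes for a finite predicate from simplified m-layer features."""
    # Reduction: only tense is kept on the verb; agreement categories are not copied.
    nodes: List[TNodeSketch] = [(t_lemma, "PRED", {"tense": TENSE[feats["tense"]]})]
    if not subject_expressed:
        # Relocation: the categories expressed by the verb form describe the deleted
        # subject, so they are moved to a newly created #PersPron actor.
        pron = {name: feats[name] for name in AGREEMENT if name in feats}
        nodes.append(("#PersPron", "ACT", pron))
    return nodes


# Spala. 'She slept.' (simplified features of the past form "spala")
spala = {"tense": "past", "gender": "fem", "number": "sg", "person": "3"}
for node in finite_verb_to_t("sleep", spala, subject_expressed=False):
    print(node)
# ('sleep', 'PRED', {'tense': 'ant'})
# ('#PersPron', 'ACT', {'gender': 'fem', 'number': 'sg', 'person': '3'})
```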

Interesting issues (ii)
– presence/absence of a given attribute? → the need for a hierarchy of node types
– two-level hierarchy of t-layer nodes used in PDT 2.0:
     tectogrammatical node
       – complex
           – semantic nouns
           – semantic adjectives
           – semantic adverbs
           – semantic verbs
       – atom
       – qcomplex
       – list
       – coap
       – dphr
       – fphr
       – root
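
The point of the hierarchy is that a node's class decides which attributes it may carry. The Python sketch below encodes the two-level hierarchy and uses it to flag grammatemes that should be absent. It is a sketch under loudly stated assumptions: the semantic part-of-speech labels (n, adj, adv, v) and the grammateme subsets are hypothetical and illustrative, not the full PDT 2.0 mapping, and it assumes that only complex nodes carry grammatemes at all.

```python
from typing import Dict, FrozenSet, List, Optional

# Level 1: node types; level 2: semantic parts of speech, defined only under "complex".
NODE_TYPES = frozenset({"complex", "atom", "qcomplex", "list", "coap", "dphr", "fphr", "root"})

# Illustrative subset only, NOT the full PDT 2.0 mapping from classes to grammatemes.
GRAMMATEMES_BY_SEMPOS: Dict[str, FrozenSet[str]] = {
    "n": frozenset({"number"}),
    "adj": frozenset({"degcmp"}),
    "adv": frozenset({"degcmp"}),
    "v": frozenset({"tense", "aspect", "verbmod"}),
}


def allowed_grammatemes(nodetype: str, sempos: Optional[str] = None) -> FrozenSet[str]:
    """Return the grammatemes a node of the given class may carry."""
    if nodetype not in NODE_TYPES:
        raise ValueError(f"unknown node type: {nodetype}")
    if nodetype != "complex":
        return frozenset()  # assumption: only complex nodes carry grammatemes
    if sempos not in GRAMMATEMES_BY_SEMPOS:
        raise ValueError("a complex node needs a semantic part of speech")
    return GRAMMATEMES_BY_SEMPOS[sempos]


def unexpected(nodetype: str, sempos: Optional[str], gram: Dict[str, str]) -> List[str]:
    """Return grammatemes present on a node although its class does not allow them."""
    return sorted(set(gram) - allowed_grammatemes(nodetype, sempos))


print(unexpected("complex", "v", {"tense": "ant", "degcmp": "pos"}))  # ['degcmp']
print(unexpected("coap", None, {"number": "sg"}))                     # ['number']
```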

Interesting issues (iii)
– differentiating between “traditional” and semantic parts of speech
   – e.g. učitelův (teacher’s): a possessive adjective on the m-layer, but a semantic noun on the t-layer
– various m-layer means for expressing the same t-layer meaning
   – future tense in Czech:
       – simple verb form for perfectives (přinesu, 'I will bring')
       – complex verb form for imperfectives (budu chodit, 'I will walk')
       – prefixed form for some verbs (půjdu, 'I will go')
– from inflection to derivation
   – e.g. regular systems in pronominal expressions: somebody, nobody, everybody, anybody, somewhere, nowhere, everywhere...
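
Two of these points lend themselves to a short Python sketch. First, three different surface means of expressing the future all map to the same t-layer value tense=post. Second, a regular derivational series is split into a shared base plus a value, in the spirit of the indeftype grammateme. Assumptions: the VerbGroup summary and its finite_tense field are hypothetical and presuppose that the lower layers already supply aspect and the tense of the finite form (including treating půjdu as a future form); the series labels (total, indef, indef-any, negat) are hypothetical stand-ins, not PDT 2.0 indeftype values; and the English examples from the slide are used although PDT annotates Czech.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class VerbGroup:
    """Hypothetical summary of a verb group as seen on the lower layers."""
    forms: str
    aspect: str        # "perf" or "impf"
    finite_tense: str  # tense of the finite form: "past", "present" or "future"


def tense_grammateme(v: VerbGroup) -> str:
    """Map different surface means of marking tense onto one t-layer value."""
    if v.finite_tense == "future":
        # budu chodit (future of "být" + infinitive) and půjdu (prefixed form)
        return "post"
    if v.finite_tense == "present":
        # přinesu: the present form of a perfective verb refers to the future
        return "post" if v.aspect == "perf" else "sim"
    if v.finite_tense == "past":
        return "ant"
    raise ValueError(f"unexpected tense: {v.finite_tense}")


# Hypothetical labels, NOT the PDT 2.0 indeftype values.
PREFIXES = (("every", "total"), ("some", "indef"), ("any", "indef-any"), ("no", "negat"))
BASES = frozenset({"body", "one", "thing", "where"})


def split_pronominal(word: str) -> Optional[Tuple[str, str]]:
    """Split e.g. 'nowhere' into a shared base and a derivational value."""
    for prefix, value in PREFIXES:
        if word.startswith(prefix) and word[len(prefix):] in BASES:
            return word[len(prefix):], value
    return None


for v in (VerbGroup("přinesu", "perf", "present"),
          VerbGroup("budu chodit", "impf", "future"),
          VerbGroup("půjdu", "impf", "future")):
    print(v.forms, tense_grammateme(v))  # each line ends in 'post'
print(split_pronominal("nowhere"), split_pronominal("everybody"))
# ('where', 'negat') ('body', 'total')
```

Treating the series as a base plus a value is what lets somebody, nobody and everybody share one t-lemma while keeping their differences as a regular, inflection-like attribute.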

Implementation
– system of 16 grammatemes:
   – number, gender, person, degcmp, verbmod, aspect, tense, numertype, indeftype, negation, politeness, deontmod, dispmod, resultative, iterativeness, sentmod
– (semi-)automatic procedure implemented in Perl, using the information from the two lower layers (m-layer and a-layer)
– all t-layer data in PDT 2.0 (50,000 Czech sentences) enriched with node classification and grammateme values
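
The slide does not show the assignment procedure itself. The Python sketch below (the original was written in Perl) only illustrates the general idea of rule-based filling of grammatemes from lower-layer information. Assumptions: PDT-style 15-position m-layer tags, with position 4 holding number and position 10 holding degree of comparison; two hypothetical rules (rule_number, rule_degcmp); grammateme names taken from the list above. The real procedure was semi-automatic and also drew on the a-layer, which is omitted here.

```python
from typing import Callable, Dict, List

# The grammateme names listed on the slide; anything else is rejected.
GRAMMATEMES = frozenset({
    "number", "gender", "person", "degcmp", "verbmod", "aspect", "tense",
    "numertype", "indeftype", "negation", "politeness", "deontmod",
    "dispmod", "resultative", "iterativeness", "sentmod",
})

NUMBER = {"S": "sg", "P": "pl"}                 # tag position 4
DEGREE = {"1": "pos", "2": "comp", "3": "sup"}  # tag position 10

Rule = Callable[[str, Dict[str, str]], None]


def rule_number(tag: str, gram: Dict[str, str]) -> None:
    # Nouns only: on adjectives number is imposed by agreement and is not stored.
    if tag[0] == "N" and tag[3] in NUMBER:
        gram["number"] = NUMBER[tag[3]]


def rule_degcmp(tag: str, gram: Dict[str, str]) -> None:
    # Adjectives (A) and adverbs (D) carry degree of comparison.
    if tag[0] in ("A", "D") and tag[9] in DEGREE:
        gram["degcmp"] = DEGREE[tag[9]]


RULES: List[Rule] = [rule_number, rule_degcmp]


def assign_grammatemes(tag: str) -> Dict[str, str]:
    """Apply each rule to the m-layer tag of a t-node's lexical a-node."""
    if len(tag) != 15:
        raise ValueError("expected a 15-position tag")
    gram: Dict[str, str] = {}
    for rule in RULES:
        rule(tag, gram)
    unknown = set(gram) - GRAMMATEMES
    if unknown:
        raise ValueError(f"not a grammateme: {sorted(unknown)}")
    return gram


print(assign_grammatemes("NNMS1-----A----"))  # bratr 'brother': {'number': 'sg'}
print(assign_grammatemes("AAMS1----3A----"))  # nejmladší 'youngest': {'degcmp': 'sup'}
```

Keeping each rule as a separate function mirrors the layered idea: rules can be added per grammateme without touching the others, and the final check against the slide's list catches misspelt attribute names.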