Morphological Meanings in the Prague Dependency Treebank
Magda Razímová, Zdeněk Žabokrtský
Institute of Formal and Applied Linguistics, Charles University, Prague
Prague Dependency Treebank
based on Functional Generative Description (Sgall 1967)
layered annotation scenario in PDT 2.0
–w-layer (word layer): original text, segmented on word boundaries
–m-layer (morphological layer): morphological lemma and tag associated with each token
–a-layer (analytical layer): surface-syntactic dependency tree; each token is represented by a node
–t-layer (tectogrammatical layer): deep-syntactic dependency tree; only autosemantic words are represented as tree nodes
Interlinking the layers in PDT 2.0
Why do we need morphological meanings on t-layer?
the two sentences below share the same t-layer tree structure and differ only in morphological meanings:
Peter met her youngest brother.
  meet PRED tense=ant
    Peter ACT
    brother PAT number=sg
      #PersPron APP
      young RSTR degree=sup
Peter will meet her young brothers.
  meet PRED tense=post
    Peter ACT
    brother PAT number=pl
      #PersPron APP
      young RSTR degree=pos
–in FGD, morphological meanings are represented by grammatemes
–grammateme = node attribute (more precisely, an attribute-value pair)
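The two example trees can be sketched as a small data structure in which grammatemes are attribute-value pairs attached to nodes. This is only an illustration: the class `TNode` and the helpers `build`, `flatten`, and `diff` are hypothetical names, not part of the PDT 2.0 tooling.

```python
from dataclasses import dataclass, field


@dataclass
class TNode:
    """A t-layer node: lemma, functor, grammatemes, children (hypothetical class)."""
    t_lemma: str
    functor: str
    grammatemes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def build(tense, number, degree):
    """Build the shared tree shape of both example sentences."""
    return TNode("meet", "PRED", {"tense": tense}, [
        TNode("Peter", "ACT"),
        TNode("brother", "PAT", {"number": number}, [
            TNode("#PersPron", "APP"),
            TNode("young", "RSTR", {"degree": degree}),
        ]),
    ])


def flatten(node):
    """Depth-first list of (lemma, functor, grammatemes) triples."""
    out = [(node.t_lemma, node.functor, dict(node.grammatemes))]
    for child in node.children:
        out.extend(flatten(child))
    return out


def diff(a, b):
    """Nodes whose grammatemes differ between two trees of the same shape."""
    return [(x[0], x[2], y[2])
            for x, y in zip(flatten(a), flatten(b)) if x[2] != y[2]]


s1 = build("ant", "sg", "sup")   # Peter met her youngest brother.
s2 = build("post", "pl", "pos")  # Peter will meet her young brothers.
for lemma, g1, g2 in diff(s1, s2):
    print(lemma, g1, g2)
```

Lemmas and functors are identical in both trees; only the grammatemes on meet, brother, and young tell the sentences apart.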
Interesting issues (i)
reduction of morphological information
–e.g. categories imposed only by agreement are not stored on t-layer (no person with verbs, no number with adjectives)
relocation of morphological information
–e.g. when the subject is dropped, categories such as gender/person are formally expressed by the verb form, but logically associated with the (unexpressed) subject
–Ex: Spala. [lit. sleep.past.fem.sg.3]
  sleep PRED tense=ant
    #PersPron ACT num=sg gen=fem pers=3
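Reduction and relocation can be sketched together in a few lines. The function `verb_to_tnodes` and the dictionary layout of its input are assumptions made for this illustration; the actual PDT 2.0 procedure works on m-layer tags and a-layer trees and is not reproduced here.

```python
# Categories that the verb form expresses only through agreement with the subject.
AGREEMENT = ("number", "gender", "person")


def verb_to_tnodes(lemma, morph, subject_expressed):
    """Return t-layer nodes (as plain dicts) for a finite verb.

    morph is a hypothetical dict of categories read off the verb form.
    Agreement categories are never stored on the verb node (reduction);
    when the subject is unexpressed they are moved to a generated
    #PersPron node (relocation).
    """
    verb = {"t_lemma": lemma, "functor": "PRED",
            "grammatemes": {"tense": morph["tense"]}}
    nodes = [verb]
    if not subject_expressed:
        nodes.append({"t_lemma": "#PersPron", "functor": "ACT",
                      "grammatemes": {k: morph[k] for k in AGREEMENT}})
    return nodes


# Spala. [lit. sleep.past.fem.sg.3]
spala = verb_to_tnodes(
    "sleep",
    {"tense": "ant", "number": "sg", "gender": "fem", "person": "3"},
    subject_expressed=False,
)
for node in spala:
    print(node)
```

When the subject is expressed, the sketch simply drops the agreement categories from the verb, on the assumption that the subject node carries its own grammatemes.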
Interesting issues (ii)
presence/absence of a given attribute?
the need for a hierarchy of node types
two-level hierarchy of t-layer nodes used in PDT 2.0:
–tectogrammatical node: complex, atom, qcomplex, list, coap, dphr, fphr, root
–complex: semantic adjectives, semantic adverbs, semantic verbs
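One way to read the hierarchy is as a lookup that decides which grammatemes a node may carry. The sketch below assumes that only complex nodes carry grammatemes and uses a deliberately partial table built from the attributes seen in the earlier examples (tense on a verb, number on a noun, degree on an adjective); it is not the full PDT 2.0 inventory, and the function names are hypothetical.

```python
NODE_TYPES = {"complex", "atom", "qcomplex", "list", "coap", "dphr", "fphr", "root"}

# Partial, illustrative table: semantic part of speech -> allowed grammatemes.
ALLOWED = {
    "semantic verb": {"tense"},
    "semantic noun": {"number"},
    "semantic adjective": {"degree"},
}


def allowed_grammatemes(node_type, sempos=None):
    """First level: node type. Second level: semantic part of speech."""
    if node_type not in NODE_TYPES:
        raise ValueError(f"unknown node type: {node_type}")
    if node_type != "complex":
        return set()          # assumption: non-complex nodes carry no grammatemes
    return ALLOWED[sempos]    # KeyError for classes this sketch does not cover


def unexpected(node_type, sempos, grammatemes):
    """Attributes present on a node although its type does not license them."""
    return sorted(set(grammatemes) - allowed_grammatemes(node_type, sempos))


print(unexpected("complex", "semantic adjective", {"degree": "sup", "number": "sg"}))
```

The printed list contains only `number`, which matches the reduction rule that agreement-imposed number is not stored with adjectives.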
Interesting issues (iii)
differentiating between “traditional” and semantic parts of speech
–e.g. učitelův (teacher’s): possessive adjective on m-layer, but semantic noun on t-layer
various m-layer means for expressing the same t-layer meaning
–future tense in Czech: simple verb form for perfectives (přinesu), complex verb form for imperfectives (budu chodit), prefixed form for some verbs (půjdu)
from inflection to derivation
–e.g. regular systems in pronominal expressions: somebody, nobody, everybody, anybody, somewhere, nowhere, everywhere...
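The future-tense case shows several surface means collapsing into one grammateme value. In the sketch below, the input dictionaries and their keys (`aspect`, `morph_tense`, `future_aux`, `future_prefix`) are hypothetical descriptions of a verb form, not the real m-layer tag format; only the three future patterns named above are handled.

```python
def tense_grammateme(form):
    """Return 'post' for the three Czech future patterns, else None.

    form is a hypothetical dict describing the surface verb form.
    """
    if form.get("future_aux"):        # complex form: budu chodit
        return "post"
    if form.get("future_prefix"):     # prefixed form: půjdu
        return "post"
    if form.get("aspect") == "perfective" and form.get("morph_tense") == "present":
        return "post"                 # simple form of a perfective: přinesu
    return None                       # other tenses are outside this sketch


examples = {
    "přinesu": {"aspect": "perfective", "morph_tense": "present"},
    "budu chodit": {"aspect": "imperfective", "future_aux": True},
    "půjdu": {"aspect": "imperfective", "future_prefix": True},
}
for surface, form in examples.items():
    print(surface, tense_grammateme(form))
```

All three surface forms map to the same value, `tense=post`, which is the point of storing the meaning rather than the form on the t-layer.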
Implementation
system of 16 grammatemes
–number, gender, person, degcmp, verbmod, aspect, tense, numertype, indeftype, negation, politeness, deontmod, dispmod, resultative, iterativeness, sentmod
(semi-)automatic procedure
–implemented in Perl
–using the information from the two lower layers
all t-layer data in PDT 2.0 (50,000 Czech sentences) enriched with node classification and grammateme values