Prague Arabic Dependency Treebank
Development in Data and Tools

Jan Hajič, Otakar Smrž, Petr Zemánek, Jan Šnaidauf, Emanuel Beška
Faculty of Mathematics and Physics
Faculty of Philosophy and Arts
Charles University in Prague

September 23, 2004
Prague Arabic Dependency Treebank: Development in Data and Tools

Project Release
  PADT 1.0: December 2004, Linguistic Data Consortium

  Morpho, Syntax
    AFP   13 000   N/A   France Presse    Penn ATB 1
    UMH   38 500   N/A   Ummah Press      Penn ATB 2
    XIN   13 500   N/A   Xinhua News      A Gigaword
    ALH                  Al-Hayat News    A Gigaword
    ANN                  An-Nahar News    A Gigaword
    XIA                  Xinhua News      A Gigaword

Open-Source Tools
  TrEd Tree Editor
    Multi-purpose annotation environment
    Suite of programming utilities
  Netgraph Search Engine
    Server/Client system architecture
    Easy-to-learn query language
  Encode::Arabic Perl Module
    Extension for processing of Arabic script
    ArabTeX, Buckwalter, Unicode, …
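
To make the idea of converting between a Latin-letter encoding and Unicode Arabic concrete, here is a minimal Python sketch of a character-by-character mapping for the standard Buckwalter transliteration. It is not the Encode::Arabic API and does not reproduce that module's behavior; the function names `from_buckwalter` and `to_buckwalter` are hypothetical, and only the core letters and diacritics are covered. The transliterated examples elsewhere in these slides use a richer notation (for instance capital I for a long vowel), which this sketch does not handle.

```python
# Assumption: standard Buckwalter symbols for the basic letters and diacritics.
# Anything outside this table is passed through unchanged.
BUCKWALTER_TO_UNICODE = {
    "'": "\u0621", "|": "\u0622", ">": "\u0623", "&": "\u0624",
    "<": "\u0625", "}": "\u0626", "A": "\u0627", "b": "\u0628",
    "p": "\u0629", "t": "\u062a", "v": "\u062b", "j": "\u062c",
    "H": "\u062d", "x": "\u062e", "d": "\u062f", "*": "\u0630",
    "r": "\u0631", "z": "\u0632", "s": "\u0633", "$": "\u0634",
    "S": "\u0635", "D": "\u0636", "T": "\u0637", "Z": "\u0638",
    "E": "\u0639", "g": "\u063a", "f": "\u0641", "q": "\u0642",
    "k": "\u0643", "l": "\u0644", "m": "\u0645", "n": "\u0646",
    "h": "\u0647", "w": "\u0648", "Y": "\u0649", "y": "\u064a",
    "F": "\u064b", "N": "\u064c", "K": "\u064d", "a": "\u064e",
    "u": "\u064f", "i": "\u0650", "~": "\u0651", "o": "\u0652",
}

# Translation tables for str.translate; the mapping is one-to-one,
# so the reverse table is simply the inverted dictionary.
_TO_ARABIC = str.maketrans(BUCKWALTER_TO_UNICODE)
_TO_LATIN = str.maketrans({v: k for k, v in BUCKWALTER_TO_UNICODE.items()})


def from_buckwalter(text: str) -> str:
    """Replace each Buckwalter symbol with its Arabic Unicode character."""
    return text.translate(_TO_ARABIC)


def to_buckwalter(text: str) -> str:
    """Replace each Arabic Unicode character with its Buckwalter symbol."""
    return text.translate(_TO_LATIN)


if __name__ == "__main__":
    print(from_buckwalter("kitAb"))
    print(to_buckwalter(from_buckwalter("madiynap")))
```

A plain lookup table works here because the Buckwalter scheme is one-to-one per character. Notations such as ArabTeX need context-sensitive rules, which is presumably why a dedicated module is worthwhile.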

PADT Functional Views
  Functional Generative Description
    Theory of linguistic meaning and its expression
    Prague Dependency Treebank for Czech
  Independence of representation levels
    Tectogrammatical – linguistic meaning
    Analytical – surface dependency syntax
    Morphological – categories and lexical units
  Abstraction of the relations across levels
    Strict distinction between form and function
    Different units of description on each level

Functional Morphology
  Provides syntax levels with their abstract language, not just giving letters in tokens
  Revives multiple senses of categories
  Completeness of generation
  Strict modeling of grammatical control
  MorphoTrees – ‘human tagging’
  Successful prototype feature-based tagger

Syntactic Levels of Description
  Analytical level
    Pragmatically motivated, close to surface syntax
    Every single token resulting from morphological level forms one node
    Tree-like dependency structure for every sentence
  Tectogrammatical level
    Linguistic (literal) meaning, deep relations, TFA
    Initial structures transformed from AL
    Nodes for autosemantic words only
    Decisive role of valency frames

Logic of Analytical Trees
  Concepts of dependency and valency
  Reduction: sentence must retain grammatical correctness if leaves (terminal nodes) are chopped off
  Trees: clause components, clauses, sentences, paragraphs etc.
  Subtrees of clauses exchangeable for non-clauses
  Nodes: words, tokenized parts of words, punctuation marks – marked by functions
  Edges: syntactic relations between a governing node and its dependent node/subtree
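
The notions of node, edge and reduction can be illustrated with a minimal Python sketch. It is not TrEd's or PADT's data format: the class `Node` and the functions `tokens` and `chop_leaves` are hypothetical names. The tokens and function labels are those of the verbal example shown on the slide of predication types, but the attachment of each dependent to its governor is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One token of the sentence, labelled with its analytical function."""
    form: str
    afun: str  # function label such as "Pred", "Sb", "Obj"
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def tokens(root: Node) -> List[str]:
    """Collect the forms of a subtree, governing node first."""
    out = [root.form]
    for child in root.children:
        out.extend(tokens(child))
    return out


def chop_leaves(root: Node) -> Node:
    """Return a copy of the tree without its current leaves (the root is kept)."""
    kept = [chop_leaves(c) for c in root.children if not c.is_leaf()]
    return Node(root.form, root.afun, kept)


# Assumed attachments: each edge links a governing node to a dependent.
tree = Node("dAma", "Pred", [
    Node("iqtirAHu", "Sb", [
        Node("-hu", "Atr"),
        Node("al-EamalIyata", "Obj"),
        Node("EalA", "AuxP", [
            Node("zumalA'i", "Obj", [Node("-hi", "Atr")]),
        ]),
    ]),
    Node("sAEatayni", "Adv"),
])

if __name__ == "__main__":
    print(tokens(tree))
    print(tokens(chop_leaves(tree)))
```

The sketch performs only the structural operation. Whether the reduced sentence is still grammatical remains a linguistic judgement, and that judgement is what the reduction criterion uses to decide which node should govern which.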

Some Syntax Issues of Arabic
  Non-verbal predication of several types
  Subordinate non-verbal clauses / modification
  Verb-like behavior of many nominal forms
  Mostly VSO in verbal sentences, but…
    vice-versa in non-verbal clauses
    different, depending on context boundness
  Compound verbs, fixed composite prepositions
  Grammatical co-reference, accusative of inner object, complex referencing, etc.

Problem I: Predication
  Head node of tree: PREDICATE
    Why? Steady role in sentence, cannot be omitted
  Verbal predicate: I-go to school
  Non-verbal predicate
    Nominal: The-house a-big (=the house is big)
    Existential: There a-city (=there is a city)
    Prepositional
      Possessive: For him a-house (=he has a house)
      Adverbial: The-mosque in the-city (=…is…)
    Conjunctional: The-problem that (=…is that)

Predication Types in Trees
  [Tree diagrams; node labels as shown on the slide]
  Nominal
    al-baytu [Sb] the-house [nom.]
    kabIrun [Pnom] a-big [nom.]
  Prepositional (possessive)
    la- [PredP] for
    -hu [Obj] him
    baytun [Sb] a-house [nom.]
  Existential
    vam~ata [PredE] there-is
    madInatun [Sb] a-city [nom.]
  Prepositional (adverbial, locative)
    al-jAmiEu [Sb] the-mosque [nom.]
    fI [PredP] in
    al-madInati [Adv] the-city [gen.]
  Verbal; verb-like behavior (object of noun?)
    dAma [Pred] lasted
    iqtirAHu [Sb] proposal
    -hu [Atr] his
    al-EamalIyata [Obj] the-operation [acc.]
    EalA [AuxP] on
    zumalA’i [Obj] colleagues
    -hi [Atr] his
    sAEatayni [Adv] two-hours [acc.]

Problem II: Clauses & Co-reference
  Recursiveness: subordinate clause is contained as subtree in place of simple element
    Head-node of clause gets the same function
    Problem: non-verbal structures – clauses or not?
    Compound verbs (mA zAla etc.) treated equally
  Grammatical co-reference: personal pronoun formally required by another element
    Pronoun must be marked to be treated as such
    Target of reference is unambiguously identifiable
    Often in subordinate clauses, mostly attributive
    Ex.: He-wrote a-book number its-pages hundred

Clauses & Co-reference in Trees
  [Tree diagrams; node labels as shown on the slide]
  Nodes
    kataba [Pred] he-wrote
    kitAban [Obj] a-book
    fI [Atr_PredP] in
    -hi [Adv_Ref] it
    mi’atu [Sb] hundred [nom.]
    SafHatin [Atr] pages [gen.]
    mA [AuxM] not
    zAlat [Pred] she-stopped
    zaynabu [Sb] Zaynab
    tuHis~u [Atv] she-feels
    anna [AuxC] that
    tuEjibu [Obj_Pred] they-impress
    -hA [Obj] her
    al-rajulu [Sb] the-man [nom.]
    jumalan [Sb] sentences [acc.]
    naHwu [Sb] grammar [nom.]
    -hA [Atr_Ref] their
    wADiHun [Atr_Pnom] clear [nom.]
  Annotations
    Attributive clause, prepositional predicate (adverbial)
    Referencing pronoun, as adverbial in clause
    Compound verb, formed as main verb and its complement
    Objective clause, verbal predicate
    Attributive clause, nominal predicate
    Referencing pronoun, as attribute in clause
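
The compound labels on the trees (Atr_PredP, Atr_Pnom, Obj_Pred, Atr_Ref, Adv_Ref) can be read as two parts: the function the element has in its governing structure, followed by either the predicate type of the clause it heads or the marker Ref for a referencing pronoun. The Python sketch below shows that reading. It is an interpretation of the slide annotations, not PADT's official label inventory, and `Label`, `parse_label` and `PREDICATE_TYPES` are hypothetical names.

```python
from typing import NamedTuple, Optional

# Predicate types named on the slides; assumed to be the only ones that
# can appear as the second part of a compound label.
PREDICATE_TYPES = {
    "Pred": "verbal predicate",
    "Pnom": "nominal predicate",
    "PredE": "existential predicate",
    "PredP": "prepositional predicate",
}


class Label(NamedTuple):
    function: str             # role in the governing structure, e.g. "Atr"
    predicate: Optional[str]  # predicate type if the node heads a clause
    is_reference: bool        # True for a pronoun marked with "_Ref"


def parse_label(label: str) -> Label:
    """Split a label such as 'Atr_PredP' or 'Adv_Ref' into its parts."""
    head, _, tail = label.partition("_")
    if tail == "Ref":
        return Label(head, None, True)
    if tail in PREDICATE_TYPES:
        return Label(head, tail, False)
    if tail:
        raise ValueError(f"unknown label suffix: {tail!r}")
    # A bare predicate label (e.g. 'Pred') heads a main clause.
    return Label(head, head if head in PREDICATE_TYPES else None, False)


if __name__ == "__main__":
    for raw in ["Pred", "Obj_Pred", "Atr_PredP", "Atr_Pnom", "Adv_Ref", "Sb"]:
        print(raw, parse_label(raw))
```

Keeping the clause's outer function and its predicate type in one label reflects the point that the head node of a subordinate clause receives the function a simple element would have in the same position.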

Future Prospects
  Implementation of Functional Morphology
  Tectogrammatical annotation
  Lexicons of valency frames
  Re-training the feature-based tagger on MorphoTrees
  Machine-learning on the treebank data for various purposes

Thank you
  Questions welcome!