
1 Prague Arabic Dependency Treebank: Development in Data and Tools
Jan Hajič, Otakar Smrž, Petr Zemánek, Jan Šnaidauf, Emanuel Beška
Faculty of Mathematics and Physics, Faculty of Philosophy and Arts
Charles University in Prague

September 23, 2004, Prague Arabic Dependency Treebank: Development in Data and Tools
2 Project Release – PADT 1.0
- December 2004, Linguistic Data Consortium
- 148 000 Morpho, 113 500 Syntax

AFP   13 000   N/A      France Presse    Penn ATB 1
UMH   38 500   N/A      Ummah Press      Penn ATB 2
XIN   13 500   N/A      Xinhua News      A Gigaword
ALH   10 000   73 500   Al-Hayat News    A Gigaword
ANN   12 500   25 500   An-Nahar News    A Gigaword
XIA   26 500   49 500   Xinhua News      A Gigaword

3 Open-Source Tools
- TrEd Tree Editor
  - Multi-purpose annotation environment
  - Suite of programming utilities
- Netgraph Search Engine
  - Server/Client system architecture
  - Easy-to-learn query language
- Encode::Arabic Perl Module
  - Extension for processing of Arabic script
  - ArabTeX, Buckwalter, Unicode, …

4 PADT Functional Views
- Functional Generative Description
  - Theory of linguistic meaning and its expression
  - Prague Dependency Treebank for Czech
- Independence of representation levels
  - Tectogrammatical – linguistic meaning
  - Analytical – surface dependency syntax
  - Morphological – categories and lexical units
- Abstraction of the relations across levels
  - Strict distinction between form and function
  - Different units of description on each level

5 Functional Morphology
- Provides syntax levels with their abstract language, not just giving letters in tokens
- Revives multiple senses of categories
- Completeness of generation
- Strict modeling of grammatical control
- MorphoTrees – ‘human tagging’
- Successful prototype feature-based tagger

6 Syntactic Levels of Description
- Analytical level
  - Pragmatically motivated, close to surface syntax
  - Every single token resulting from morphological level forms one node
  - Tree-like dependency structure for every sentence
- Tectogrammatical level
  - Linguistic (literal) meaning, deep relations, TFA
  - Initial structures transformed from AL
  - Nodes for autosemantic words only
  - Decisive role of valency frames
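The relation between the two levels can be illustrated with a small Python sketch. It assumes that every analytical node carries a function label and a pointer to its governor, and it treats any node whose label starts with `Aux` as a non-autosemantic word to be left out. That filter is only a stand-in for the first step of the transformation: valency frames, TFA, and everything else the tectogrammatical level adds are not modeled. The `Node` class, `drop_auxiliaries`, and the example sentence structure are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    ident: int
    form: str
    afun: str               # analytical function label, e.g. "Pred", "Sb", "AuxP"
    parent: Optional[int]   # ident of the governing node, None for the root


def drop_auxiliaries(nodes: List[Node]) -> List[Node]:
    """Remove Aux* nodes and reattach their dependents to the nearest kept ancestor."""
    by_id = {node.ident: node for node in nodes}

    def is_aux(node: Node) -> bool:
        return node.afun.startswith("Aux")

    def kept_ancestor(parent: Optional[int]) -> Optional[int]:
        # Climb past auxiliary governors until a kept node or the top is reached.
        while parent is not None and is_aux(by_id[parent]):
            parent = by_id[parent].parent
        return parent

    return [
        Node(node.ident, node.form, node.afun, kept_ancestor(node.parent))
        for node in nodes
        if not is_aux(node)
    ]


if __name__ == "__main__":
    # Hypothetical analytical tree; the edges are assumed for illustration.
    analytical = [
        Node(1, "lasted", "Pred", None),
        Node(2, "proposal", "Sb", 1),
        Node(3, "on", "AuxP", 2),
        Node(4, "colleagues", "Obj", 3),
    ]
    for node in drop_auxiliaries(analytical):
        print(node)
```

In the example, the preposition node disappears and its dependent is attached directly to the noun that governed the preposition, so the result is still a single tree.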

7 Logic of Analytical Trees
- Concepts of dependency and valency
- Reduction: sentence must retain grammatical correctness if leaves (terminal nodes) are chopped off
- Trees: clause components → clauses → sentences → paragraphs etc.
  - Subtrees of clauses exchangeable for non-clauses
- Nodes: words, tokenized parts of words, punctuation marks – marked by functions
- Edges: syntactic relations – governing node → dependent node/subtree
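The reduction idea can be sketched in Python as repeated removal of leaves from a dependency tree. The tree is stored as a mapping from each node to its governing node, which mirrors the description of edges above. The function names are hypothetical, and the edges in the example are assumed for illustration; only the word forms come from the predication examples later in the transcript. The sketch checks nothing about grammaticality, it only shows that chopping leaves always leaves a smaller tree with the same root.

```python
from typing import Dict, Optional, Set

Tree = Dict[str, Optional[str]]  # node -> governing node (None for the root)


def leaves(tree: Tree) -> Set[str]:
    """Return all non-root nodes that govern nothing."""
    governors = {parent for parent in tree.values() if parent is not None}
    return {
        node
        for node, parent in tree.items()
        if node not in governors and parent is not None
    }


def chop_leaves(tree: Tree) -> Tree:
    """Remove every current leaf once; the root is never removed."""
    doomed = leaves(tree)
    return {node: parent for node, parent in tree.items() if node not in doomed}


if __name__ == "__main__":
    # Assumed edges, for illustration only.
    tree = {
        "dAma": None,            # predicate, root
        "iqtirAHu": "dAma",      # subject
        "-hu": "iqtirAHu",       # attribute of the subject
        "sAEatayni": "dAma",     # adverbial
    }
    step = tree
    while leaves(step):
        step = chop_leaves(step)
        print(sorted(step))
```

Each pass strips the outermost dependents, so the loop ends with the predicate alone, which matches the later observation that the predicate is the one element that cannot be omitted.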

8 Some Syntax Issues of Arabic
- Non-verbal predication of several types
- Subordinate non-verbal clauses / modification
- Verb-like behavior of many nominal forms
- Mostly VSO in verbal sentences, but…
  - vice-versa in non-verbal clauses
  - different, depending on context boundness
- Compound verbs, fixed composite prepositions
- Grammatical co-reference, accusative of inner object, complex referencing, etc.

9 Problem I: Predication
- Head node of tree: PREDICATE
  - Why? Steady role in sentence, cannot be omitted
- Verbal predicate: I-go to school
- Non-verbal predicate
  - Nominal: The-house a-big (=the house is big)
  - Existential: There a-city (=there is a city)
  - Prepositional
    - Possessive: For him a-house (=he has a house)
    - Adverbial: The-mosque in the-city (=…is…)
  - Conjunctional: The-problem that (=…is that)

10 Predication Types in Trees
[Figure: dependency trees; each node is given as form [function] gloss]
- Nominal: al-baytu [Sb] the-house [nom.]; kabIrun [Pnom] a-big [nom.]
- Prepositional (possessive): la- [PredP] for; -hu [Obj] him; baytun [Sb] a-house [nom.]
- Existential: vam~ata [PredE] there-is; madInatun [Sb] a-city [nom.]
- Prepositional (adverbial, locative): fI [PredP] in; al-madInati [Adv] the-city [gen.]; al-jAmiEu [Sb] the-mosque [nom.]
- Verbal; verb-like behavior (object of noun?): dAma [Pred] lasted; iqtirAHu [Sb] proposal; -hu [Atr] his; al-EamalIyata [Obj] the-operation [acc.]; EalA [AuxP] on; zumalA’i [Obj] colleagues; -hi [Atr] his; sAEatayni [Adv] two-hours [acc.]
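The correspondence between function labels and predication types shown in these trees can be written down as a small lookup. This Python sketch assumes that a clause is identified by the first predicate-marking label found among its nodes; the table `PREDICATE_LABELS` and the function `predication_type` are hypothetical names, and only the four labels that appear in the trees above are covered.

```python
from typing import Iterable, Optional

# Labels read off the trees above; the mapping itself is an interpretation.
PREDICATE_LABELS = {
    "Pred": "verbal",
    "Pnom": "nominal",
    "PredE": "existential",
    "PredP": "prepositional",
}


def predication_type(afuns: Iterable[str]) -> Optional[str]:
    """Return the predication type signalled by the first predicate label, if any."""
    for afun in afuns:
        if afun in PREDICATE_LABELS:
            return PREDICATE_LABELS[afun]
    return None


if __name__ == "__main__":
    print(predication_type(["PredP", "Obj", "Sb"]))  # la- / -hu / baytun
    print(predication_type(["Sb", "Pnom"]))          # al-baytu / kabIrun
    print(predication_type(["PredE", "Sb"]))         # vam~ata / madInatun
```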

11 Problem II: Clauses & Co-reference
- Recursiveness: subordinate clause is contained as subtree in place of simple element
  - Head-node of clause gets the same function
  - Problem: non-verbal structures – clauses or not?
  - Compound verbs (mA zAla etc.) treated equally
- Grammatical co-reference: Personal pronoun formally required by another element
  - Pronoun must be marked to be treated as such
  - Target of reference is unambiguously identifiable
  - Often in subordinate clauses, mostly attributive
  - Ex.: He-wrote a-book number its-pages hundred

12 Clauses & Co-reference in Trees
[Figure: dependency trees; each node is given as form [function] gloss, listed here in extraction order]
- Nodes: naHwu [Sb] grammar [nom.]; jumalan [Sb] sentences [acc.]; fI [Atr_PredP] in; kataba [Pred] he-wrote; SafHatin [Atr] pages [gen.]; kitAban [Obj] a-book; mi’atu [Sb] hundred [nom.]; zAlat [Pred] she-stopped; tuHis~u [Atv] she-feels; anna [AuxC] that; -hA [Atr_Ref] their; -hA [Obj] her; wADiHun [Atr_Pnom] clear [nom.]; tuEjibu [Obj_Pred] they-impress; al-rajulu [Sb] the-man [nom.]; zaybabu [Sb] Zaynab; mA [AuxM] not; -hi [Adv_Ref] it
- Annotations:
  - Attributive clause, prepositional predicate (adverbial)
  - Objective clause, verbal predicate
  - Compound verb, formed as main verb and its complement
  - Referencing pronoun, as attribute in clause
  - Attributive clause, nominal predicate
  - Referencing pronoun, as adverbial in clause
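The composite labels in these trees (`Atr_PredP`, `Obj_Pred`, `Atr_Pnom`, `Atr_Ref`, `Adv_Ref`) can be taken apart mechanically. The Python sketch below assumes a reading suggested by the two slides on clauses and co-reference: the part before the underscore is the function the node fulfils for its governor, and the part after it is either the predicate type heading a subordinate clause or the marker `Ref` for a referencing pronoun. That reading is an interpretation, not a documented specification, and `split_label` is a hypothetical helper.

```python
from typing import Dict, Optional, Union

LabelParts = Dict[str, Union[str, bool, None]]


def split_label(label: str) -> LabelParts:
    """Split a composite label such as 'Atr_PredP' or 'Adv_Ref' into its parts."""
    function, _, suffix = label.partition("_")
    is_reference = suffix == "Ref"
    predicate: Optional[str] = suffix if suffix and not is_reference else None
    return {
        "function": function,          # role with respect to the governing node
        "predicate": predicate,        # predicate type of a clause head, if any
        "is_reference": is_reference,  # True for a marked referencing pronoun
    }


if __name__ == "__main__":
    for label in ["Atr_PredP", "Obj_Pred", "Atr_Pnom", "Atr_Ref", "Adv_Ref", "Sb"]:
        print(label, split_label(label))
```

Keeping the two parts separate lets the head of a subordinate clause be queried by the function the whole clause fulfils, which is how the slide on recursiveness describes it.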

13 Future Prospects
- Implementation of Functional Morphology
- Tectogrammatical annotation
- Lexicons of valency frames
- Re-training the feature-based tagger on MorphoTrees
- Machine-learning on the treebank data for various purposes

14 Thank you
Questions welcome!
http://ckl.mff.cuni.cz/padt/

