Published by Jaime Buff. Modified over 9 years ago.
1
Prague Dependency Treebank 1.0 CD-ROM PRESENTATION Dec 18, 2000
2
Functional Generative Description
3
Functional Generative Description
- theoretical framework based on the findings of European structural linguistics, esp. of the classical Prague School
- methodological requirements of a formal description
- levels:
  - tectogrammatical (underlying) representations (TRs) with dependency-based syntax
  - morphemics
  - phonemics and phonetics
TRs: see Sgall, Hajičová and Panevová 1986; formally specified by Petkevič, also in a declarative way
4
Dependency tree
My younger brother arrived there yesterday.
Linearized form, one-to-one relation:
((I) Appurt (younger) Rstr brother) Act arrive.Pret.Indic (Dir there) (Temp yesterday)
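The one-to-one relation between the tree and its linearized form can be illustrated with a minimal sketch. The `Node` class and `linearize` function are hypothetical names introduced here, not part of the PDT tools; the only convention taken from the slide is that each functor is written at the parenthesis facing the head.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a dependency tree (hypothetical structure, for illustration)."""
    label: str                      # lexical label, e.g. "brother"
    functor: Optional[str] = None   # relation to the head; None for the root
    left: List["Node"] = field(default_factory=list)   # dependents preceding the head
    right: List["Node"] = field(default_factory=list)  # dependents following the head


def linearize(node: Node) -> str:
    """Return the bracketed form of the subtree rooted at `node`."""
    parts = []
    for dep in node.left:
        # Left dependent: functor follows the closing parenthesis (head side).
        parts.append(f"({linearize(dep)}) {dep.functor}")
    parts.append(node.label)
    for dep in node.right:
        # Right dependent: functor follows the opening parenthesis (head side).
        parts.append(f"({dep.functor} {linearize(dep)})")
    return " ".join(parts)


# The example sentence from the slide.
brother = Node("brother", "Act", left=[Node("I", "Appurt"), Node("younger", "Rstr")])
root = Node(
    "arrive.Pret.Indic",
    left=[brother],
    right=[Node("there", "Dir"), Node("yesterday", "Temp")],
)
linearized = linearize(root)
print(linearized)
```

Because every dependent is wrapped in exactly one pair of parentheses and labelled on the head side, the tree can be read back from the string without ambiguity.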
5
Dependency Tree
- labels: lexical meanings (abstract symbols) with indices
  - functors
    - subscripts at parentheses, oriented towards the head
  - grammatemes: values of morphological categories
    - Tense, Modality, Number, Definiteness, etc.
- projectivity
- valency
  - arguments (inner participants) and adjuncts (circumstantials or 'free modifications')
  - obligatory and optional with a given head,
  - deletable or not
6
Dependency Tree
- participants (arguments) of verbs
  - Actor/Bearer (underlying subject)
  - Objective (Patient, underlying direct object)
  - Addressee (underlying indirect object)
  - Effect ('second' object: to choose someone as something)
  - Origin (to make something out of something)
- adjuncts
  - Locative, several Directional and Temporal modifications
  - Condition, Means, Manner, etc.
7
Dependency Tree: complementations dependent mainly on nouns
- inner participants
  - Material (Partitive): two baskets of sth.
  - Identity: the river Danube; the notion of operator
- free modifications
  - Possession (Appurtenance): my table; Jim's brother
  - Restrictive: rich man
  - Descriptive: the Swedes, who are a Scandinavian nation
8
Dependency Tree
- syntactic grammatemes
  - Loc, Dir: in, on, under, between...
  - Regard: with, without
- operational (testable) criteria
  - for distinguishing
    - arguments from adjuncts,
    - from each other
  - deletability (dialogue test)
9
Simplified valency frames
- read V Act Addr Obj
- change V Act Obj Orig Eff
- give V Act Addr Obj
- brother N Appurt
- man N
- glass N Material
- full A Material
Obligatory complementations are shown in blue on the slide.
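As a rough illustration, the frames above can be stored in a small lookup table and used to separate the complementations a frame lists from those it does not. The names `FRAMES` and `split_by_frame` are hypothetical. The sketch does not encode which complementations are obligatory, because that distinction was carried only by colour on the slide.

```python
# Simplified valency frames copied from the slide, keyed by (lemma, part of speech).
FRAMES = {
    ("read", "V"): ("Act", "Addr", "Obj"),
    ("change", "V"): ("Act", "Obj", "Orig", "Eff"),
    ("give", "V"): ("Act", "Addr", "Obj"),
    ("brother", "N"): ("Appurt",),
    ("man", "N"): (),
    ("glass", "N"): ("Material",),
    ("full", "A"): ("Material",),
}


def split_by_frame(lemma, pos, functors):
    """Split the functors of a head's dependents into those listed in its
    frame and those not listed (free modifications such as Temp or Rstr)."""
    frame = FRAMES[(lemma, pos)]
    listed = [f for f in functors if f in frame]
    unlisted = [f for f in functors if f not in frame]
    return listed, unlisted


# Hypothetical usage: "give" with three frame slots filled plus a temporal adjunct.
print(split_by_frame("give", "V", ["Act", "Addr", "Obj", "Temp"]))
```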
10
Topic-focus articulation
- contextual boundness
  - main verb CB/NB (T/F)
  - dependents to the left/right
- communicative dynamism
  - left-right (mother, sisters, transitive)
  - partial ordering
- underlying word order
  - left-right
  - linear ordering
The left-to-right order of nodes, together with the index T or (prototypically) F, indicates the TFA of the sentence (of the TR).
[Figure: dependency tree with TFA indices; surviving labels: young, there, T]
11
Topic-focus articulation
- TFA: one of the basic aspects of underlying structures
[Figure: dependency tree with TFA indices; surviving labels: young, there, T, yesterday, F]
12
Complex sentence
- a subordinated (dependent) clause (i.e. its main verb) depends on a word contained in its governing clause
My brother, whom you know, arrived there yesterday.
13
Complex sentence
- function words (synsemantic) are viewed as function morphemes, syntactically fixed to certain lexical (autosemantic) words: prepositions and articles to nouns, conjunctions and auxiliaries to verbs
Martin came there late, since he had to accompany his sick mother.
14
Complex sentence
Martin arrived late to the session, since he had to accompany his sick mother.
schematically (morphemes):
Martin arrive.ed late to the session since he have.ed to accompany he.s sick mother.
dot: close connection of morphemes ('semes')
15
- deleted items restored
  - order of items: difference between 'underlying' and surface (morphemic) word order
  - transductive components: Panevová, Oliva, Borota
- coordination (multidimensional)
  - Jim and Mary, who have two children, went to Boston.
  - the linearized notation is adequate:
  - ((Jim Mary) Conj ((who) Act have (Pat (two) Rstr children))) Act went (Dir Boston)
- structures close to Boolean, i.e. no complex 'innate properties' specific for natural language are needed.
16
Prague Dependency Treebank: corpus annotation
- an intermediate level: 'analytical' representations
  - dependency trees, not always projective
  - nodes for all word tokens, even for punctuation marks
- tectogrammatical tree: coordinating conjunction as the head
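Since the analytical trees are "not always projective", a check for projectivity is a natural utility. The sketch below uses the common definition: a tree is projective if, for every edge, all tokens lying between the head and the dependent in surface order are dominated by that head. The encoding (a list of head indices with -1 for the root) and the function name are assumptions made for this example; the input is assumed to be a well-formed tree.

```python
def is_projective(heads):
    """heads[i] is the index of the head of token i in surface order, -1 for the root."""

    def dominated_by(node, ancestor):
        # Walk up from `node` towards the root, looking for `ancestor`.
        while node != -1:
            if node == ancestor:
                return True
            node = heads[node]
        return False

    for dep, head in enumerate(heads):
        if head == -1:
            continue
        lo, hi = sorted((dep, head))
        # Every token strictly between head and dependent must hang under the head.
        for between in range(lo + 1, hi):
            if not dominated_by(between, head):
                return False
    return True


# "My younger brother arrived there yesterday." (tokens 0..5, root = "arrived")
print(is_projective([2, 2, 3, -1, 3, 3]))

# A toy non-projective tree: token 1 depends on token 3 across the root (token 2).
print(is_projective([2, 3, -1, 2]))
```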
18
Morphological Layer
19
ACKNOWLEDGEMENTS
20
ANNOTATED CORPORA
- PDT version 1.0, 2000 (1996 - 2000)
- Penn Treebank, release 3, 1999 (1989 - 1999)
21
TAG SETs
Czech: ambiguous inflective language
nový, nového, novému, novém, novým, nová, nové, novou, nových, novým, novými, … novější, novějšího, novějšímu, novějším, … nejnovější, nejnovějšího, nejnovějšímu, nejnovějším, … nejnovějších, nejnovějším, …
English: language with poor inflection
work, works, worked, working
23
TEXT SOURCES
PDT (texts taken from the Czech National Corpus):
- Lidové noviny
- Mladá fronta Dnes
- Vesmír
- Českomoravský Profit
Penn Treebank:
- '88, '89 WSJ articles
- Air Travel Information System transcripts
- Brown Corpus
- Switchboard transcripts
24
ANNOTATION STRATEGY - Penn Treebank
- TEXT
- Ken Church's stochastic tagger, Eric Brill's transformation tagger
- corrections by annotator (GNU Emacs Lisp based package)
25
ANNOTATION STRATEGY - PDT
- Automatic Morphological Analyzer (AMA)
- two independent annotators; Linux, Win tools
- differences resolved by third annotator
- comparison with the current AMA; manual resolution; Win tools
26
INTERNAL FORMAT
- SGML coding, csts dtd
- word/tag(|tag)*
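The `word/tag(|tag)*` pattern can be read as: a word form, a slash, then one or more tags separated by `|` when the tagging is left ambiguous. A minimal parsing sketch follows; the function name is hypothetical and `work/NN|VB` is an invented token used only to show the ambiguous case.

```python
def parse_token(token):
    """Split one 'word/tag(|tag)*' token into (word, [tags]).

    The last slash is taken as the separator, so a token such as './.'
    is read as the word '.' with the tag '.'.
    """
    word, sep, tags = token.rpartition("/")
    if not sep or not word or not tags:
        raise ValueError(f"not a word/tag token: {token!r}")
    return word, tags.split("|")


sentence = "The/DT envelope/NN arrives/VBZ in/IN the/DT mail/NN ./."
parsed = [parse_token(tok) for tok in sentence.split()]
print(parsed)

# Invented example of an ambiguous token with two candidate tags.
print(parse_token("work/NN|VB"))
```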
27
SAMPLES
PDT (form, lemma, tag):
Pokus   pokus   NNIS1-----A----
o       o       RR--4----------
zázrak  zázrak  NNIS4-----A----
.       .       Z:-------------
Penn Treebank (word/tag):
The/DT envelope/NN arrives/VBZ in/IN the/DT mail/NN ./.
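Each Czech tag in the samples is a string of 15 characters in which every position stands for one morphological category and `-` means "not applicable". The sketch below decodes such a tag. The position names are an assumption based on the published description of the PDT positional tag set; they do not appear on the slide.

```python
# Assumed names of the 15 positions (not given on the slide).
POSITIONS = [
    "POS", "SUBPOS", "GENDER", "NUMBER", "CASE",
    "POSSGENDER", "POSSNUMBER", "PERSON", "TENSE", "GRADE",
    "NEGATION", "VOICE", "RESERVE1", "RESERVE2", "VAR",
]


def decode_tag(tag):
    """Return the categories of a positional tag whose value is not '-'."""
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} characters, got {len(tag)}")
    return {name: value for name, value in zip(POSITIONS, tag) if value != "-"}


for form, tag in [("Pokus", "NNIS1-----A----"),
                  ("o", "RR--4----------"),
                  ("zázrak", "NNIS4-----A----"),
                  (".", "Z:-------------")]:
    print(form, decode_tag(tag))
```

Under these assumptions, `Pokus` and `zázrak` differ only in the CASE position (1 versus 4), which matches the preposition `o` carrying case 4.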
28
CONVERSION
- SGML coding
- word/tag
- word/lemma/tag
Scripts: pdt2wsj.pl, pdt2wsjFLT.pl
29
DATA SIZE
30
DATA SETs of MORPHOLOGICALLY ANNOTATED DATA
31
TOOLS
- Automatic Morphological Analyser/Generator of Czech
  - HMAnalyze.pl, HMGenerate.pl
  - Dictionary: CZE_a
  - Remote Access
- Czech Taggers
  - HMM
  - Exponential
33
Analytical Layer in PDT
34
Introduction
- Input: morphologically tagged sentences
- Graph Editor: "user-friendly" software
- Output: ATS structure
  - "surface" syntax tree structure
  - nodes labelled by the analytical functions
35
Two stages (chronologically)
- (A) manual "analytic" annotation (ATS)
  - training data for (B)(a)
- (B)
  - (a) semiautomatic procedure (Collins's parser)
  - (b) manual correcting of (B)(a)
36
Constraints and limitations
- any string has a node of its own
  - word-form, punctuation mark, etc.
  - AuxV, AuxP, AuxC, AuxX, AuxG…
- reflecting the coordination and apposition relations
  - so-called third dimension of the graph in the plain tree (X_Co, X_Ap, X_Pa, where X is one of the analytic functions, such as Sb, Obj, Adv, etc.)
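The `X_Co`, `X_Ap`, `X_Pa` convention packs two pieces of information into one label: the analytic function itself and a marker of the "third dimension". A minimal sketch of splitting the two is given below. The function name is hypothetical, and reading `Pa` as parenthesis is an assumption, since the slide only names coordination and apposition.

```python
# Suffix meanings; "Pa" = parenthesis is an assumption not spelled out on the slide.
SUFFIXES = {"Co": "coordination", "Ap": "apposition", "Pa": "parenthesis"}


def split_afun(afun):
    """Split an analytic function label into (base, suffix or None)."""
    base, sep, suffix = afun.rpartition("_")
    if sep and suffix in SUFFIXES:
        return base, suffix
    # No recognised suffix: the whole label is the analytic function.
    return afun, None


for label in ["Sb_Co", "Obj_Ap", "Adv", "Ex_D"]:
    print(label, split_afun(label))
```

Checking the suffix against a closed set keeps labels such as `Ex_D`, which contain an underscore for other reasons, intact.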
37
Constraints and limitations
- no missing nodes (on the surface) can be added
  - analytic function Ex_D is used
- relations between semi-automatic and manual procedure
  - 80% of edges are established correctly automatically
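One plausible reading of the 80% figure is the share of tokens whose head assigned by the parser agrees with the head in the manually corrected tree. The sketch below computes that share; the function name and the head-index encoding are assumptions, and the data is a toy example, not PDT output.

```python
def edge_accuracy(auto_heads, gold_heads):
    """Fraction of tokens whose automatically assigned head equals the manual one."""
    if len(auto_heads) != len(gold_heads):
        raise ValueError("both analyses must cover the same tokens")
    if not gold_heads:
        raise ValueError("empty sentence")
    correct = sum(a == g for a, g in zip(auto_heads, gold_heads))
    return correct / len(gold_heads)


# Toy data: six tokens, the parser attaches the last one to the wrong head.
gold = [2, 2, 3, -1, 3, 3]
auto = [2, 2, 3, -1, 3, 4]
print(edge_accuracy(auto, gold))
```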
38
Project organization
- team consisting of 5-6 annotators
- handbook for ATS structure annotation
- 1999: 100000 sentences on ATS
- tectogrammatical annotation follows
39
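[Figure: analytical tree of the sentence below; surviving labels: Adv, AuxT]
První restituční zákon českého parlamentu se do sněmovních lavic může vrátit jako bumerang.
(The first restitution act of the Czech parliament may return to the parliamentary benches like a boomerang.)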
41
From the Analytical towards the Tectogrammatical layer
42
Introduction
- ATS annotation
  - nodes: word forms, punctuation, graphical symbols
  - edges: surface relations
- TGTS annotation
  - nodes: autosemantic words, deletions
  - edges: deep layer functions
43
PDT1.0 Annotation process
- Input Czech sentence
- Tokenization
- Morphological tagging and lexical disambiguation
- Syntactic parsing and analytic function assignment (ATS)
- Tree structure pruning
- Attribute assignments (TGTS)
44
Transition procedure
- deterministic procedure operating on trees
- macro language for Graph Editor (C++ like)
- automatic changes & tools for annotators
- Requirements
  - new attributes for tectogrammatical layer
  - ATS is recoverable from TGTS
  - automatized to a maximally high degree
45
New attributes
- trlemma: lemma of the original node or lemma composed of joined nodes
- morphological grammatemes
  - gender, number, degree of comparison, tense,
  - aspect, iterativeness, verbal modality, deontic modality, sentence modality
- position of the node
  - functor, topic-focus articulation, syntactic grammateme,
  - type of relation (dependency, coordination, apposition),
  - phraseme, deletion, quoted word, direct speech,
  - coreference, antecedent
46
Tree Structure Pruning
- U toho, kdo začíná opravdu od nuly, není daňový výnos pro stát podstatný.
- For those who actually start from zero, the tax yield for the state is not substantial.
47
Tree Structure Pruning
- U toho, kdo začíná opravdu od nuly, není daňový výnos pro stát podstatný.
- For those who actually start from zero, the tax yield for the state is not substantial.
Label shown in the tree: REG
48
Verbal Nodes
… podnikatelé by měli mít daně …
… entrepreneurs should have (their) taxes …
PRED verbmod=CDN deontmod=HRT
49
Attribute Assignments
- prepositions stored as fw attribute
- quoted words
  - clause in quotes -> DSP
  - one pair of quotes in the sentence -> DSPP
  - string in quotes -> QUOT
- gender, number, tense, degcmp, aspect
- default values
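The idea of pruning a preposition node while keeping its form in an `fw` attribute can be sketched as follows. This is not the Graph Editor macro code: the dictionary-based node format, the function name, and the toy fragment (with its `Pred` and `Adv` labels) are assumptions made for illustration, and cases such as coordinated prepositions are ignored.

```python
def prune_prepositions(nodes):
    """Drop AuxP nodes; re-attach their dependents to the nearest non-AuxP
    ancestor and record the dropped preposition(s) in the 'fw' attribute."""
    by_id = {n["id"]: n for n in nodes}
    pruned = []
    for n in nodes:
        if n["afun"] == "AuxP":
            continue  # the preposition loses its own node
        new = dict(n)
        parent = by_id.get(n["head"])
        fw = []
        # Climb over any chain of preposition nodes above this node.
        while parent is not None and parent["afun"] == "AuxP":
            fw.append(parent["form"])
            parent = by_id.get(parent["head"])
        new["head"] = parent["id"] if parent is not None else 0  # 0 = artificial root
        if fw:
            new["fw"] = " ".join(reversed(fw))
        pruned.append(new)
    return pruned


# Toy fragment "vrátit do lavic" (return to the benches); annotation assumed.
fragment = [
    {"id": 1, "form": "vrátit", "afun": "Pred", "head": 0},
    {"id": 2, "form": "do", "afun": "AuxP", "head": 1},
    {"id": 3, "form": "lavic", "afun": "Adv", "head": 2},
]
result = prune_prepositions(fragment)
print(result)
```

Keeping the preposition's form on the remaining node is one way to meet the requirement that the ATS stay recoverable from the TGTS.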
50
Macros for Annotators
- keyboard shortcuts (in Graph Editor)
  - structure changes
    - hide/recover nodes
    - merge nodes
  - add new nodes
  - functor assignments
51
Manual annotation
- structure checking
- functors
- deletions of obligatory modifications
- feedback for formulating the handbook for annotators
53
Tectogrammatical Layer
55
[Figure: tree with topic-focus indices on its nodes; surviving labels: CT T T T T F F T T]
56
- Jirka se včera opil do němoty a Honza dneska.
- Literal gloss: George himself yesterday drank to silence and Honza today.
- Free translation: Jirka got blind drunk yesterday, and Honza today.
57
Attributes of Coreferential relations
- only in MC
- grammatical coreference
attribute   values
coref       the lemma of the antecedent
corsnt      NIL: in the same sentence; PREV1 ... PREVi: position of the sentence which includes the antecedent
antec       the functor of the antecedent
58
Example
Honza slíbil přijít včas.
Honza promised to come in time.
coref:Honza corsnt:NIL cornum:1 antec:ACT
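The attribute string in the example can be read into a small record, as in the sketch below. The function names are hypothetical. Interpreting `PREVi` as "i sentences back" is an assumption about the slide's wording, and `cornum` is kept as an opaque value because the slides do not define it.

```python
import re


def parse_coref(annotation):
    """Turn 'name:value name:value ...' into a dictionary of strings."""
    return dict(pair.split(":", 1) for pair in annotation.split())


def sentence_offset(corsnt):
    """Return 0 for NIL (same sentence) and i for PREVi (assumed: i sentences back)."""
    if corsnt == "NIL":
        return 0
    match = re.fullmatch(r"PREV(\d+)", corsnt)
    if match is None:
        raise ValueError(f"unexpected corsnt value: {corsnt!r}")
    return int(match.group(1))


record = parse_coref("coref:Honza corsnt:NIL cornum:1 antec:ACT")
print(record)
print(sentence_offset(record["corsnt"]))
```

In the example, the unexpressed subject of "přijít" (to come) points to the lemma Honza, found in the same sentence, where it carries the functor ACT.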