Issues of Valency in Prague Dependency Treebank: Creating Valency Lexicon of Verbs Markéta Lopatková Center for Computational Linguistics MFF UK, Prague.

1 Issues of Valency in Prague Dependency Treebank: Creating Valency Lexicon of Verbs
Markéta Lopatková
Center for Computational Linguistics MFF UK, Prague
CIL XVII, Prague, July 26, 2003

2 Motivation
- ‘traditional’ linguistics
  - source of data for linguistic research
  - verification of theoretical criteria set up
- natural language processing
  - lemmatization
  - morphological tagging
  - syntactic analysis
  - word sense disambiguation
  - ‘semantic analysis’
  - machine translation
- building other resources
- language acquisition

3 Syntactic vs. semantic approach I.
- ‘Levin Verb Classes’ (Levin, 1993)
  - hypothesis: syntactic features of verbs are semantically determined
  - method: syntactic behavior → semantic classes
  - ‘alternation’ ~ a change in the realization of the argument structure of a verb
    - ‘conative alternation’: Edith cuts the bread / Edith cuts at the bread
  - classes = verbs which undergo certain types of alternations
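The idea that classes are sets of verbs sharing the same alternations can be sketched in a few lines of Python. The verb-to-alternation table below is an illustrative assumption based on the well-known cut / hit / touch / break contrast discussed in Levin (1993); it is not data from the slides, and `group_by_alternations` is a hypothetical helper name.

```python
from collections import defaultdict

# Illustrative table (assumption, not from the slides): which alternations
# each verb is taken to undergo, following the classic contrast in Levin (1993).
ALTERNATIONS = {
    "cut": {"conative", "body-part possessor ascension", "middle"},
    "hack": {"conative", "body-part possessor ascension", "middle"},
    "hit": {"conative", "body-part possessor ascension"},
    "kick": {"conative", "body-part possessor ascension"},
    "touch": {"body-part possessor ascension"},
    "break": {"middle"},
    "crack": {"middle"},
}


def group_by_alternations(table):
    """Group verbs that undergo exactly the same set of alternations."""
    classes = defaultdict(list)
    for verb, alternations in table.items():
        classes[frozenset(alternations)].append(verb)
    # Sort the members so the result is deterministic.
    return {key: sorted(verbs) for key, verbs in classes.items()}


classes = group_by_alternations(ALTERNATIONS)
for alternations, verbs in classes.items():
    print(sorted(alternations), "->", verbs)
```

Verbs with identical syntactic behavior end up in the same class, which is the sense in which syntactic behavior is used to induce semantic classes.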

4 Syntactic vs. semantic approach II.
- PropBank (Palmer et al., 2001)
  - ‘layer of semantic annotation’ in the Penn Treebank
  - argument structure for verbs
    - arguments: Arg0, ... Arg5
    - modifiers: ArgM (LOC, TMP, EXT, PRP, ADV)
  - He was drawing diagrams and sketches for his patron.
      Arg0: he
      Rel: drawing
      Arg1: diagrams and sketches
      Arg2-for: his patron
  - He keeps sth in the fridge.
      Arg0: he
      Rel: keeps
      Arg1: sth
      Arg2-in: the fridge
  (also Hajičová, Kučerová, 2002)
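The two annotated sentences above can be written down as a small data structure. This is only a sketch: the `Proposition` class and its method are hypothetical names chosen for illustration and do not reproduce PropBank's actual file format.

```python
from dataclasses import dataclass, field


@dataclass
class Proposition:
    """One predicate (Rel) with its labeled arguments (hypothetical layout)."""
    rel: str
    args: dict = field(default_factory=dict)

    def numbered_args(self):
        """Labels of numbered arguments (Arg0 ... Arg5), skipping ArgM modifiers."""
        return sorted(label for label in self.args if not label.startswith("ArgM"))


drawing = Proposition(
    rel="drawing",
    args={"Arg0": "he", "Arg1": "diagrams and sketches", "Arg2-for": "his patron"},
)
keeps = Proposition(
    rel="keeps",
    args={"Arg0": "he", "Arg1": "sth", "Arg2-in": "the fridge"},
)

print(drawing.numbered_args())
print(keeps.args["Arg2-in"])
```

Note that the numbered labels are verb-specific: Arg2 is introduced by "for" with one verb and by "in" with the other.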

5 Syntactic vs. semantic approach III.
- FrameNet (Fillmore, 2002)
  - it groups lexical items with parallel semantic characterization
  - the structure and particular components correspond to ‘semantic roles’ of the common semantic frame
  - verbs, nouns, adjectives, prepositions
- ‘Communication’: ‘Speaker’, ‘Message’, ‘Addressee’, ‘Topic’, ‘Medium’
    Tom communicates with Kim about the festival.
    Tom communicates with Kim by letter.
    Tom communicates the message to me.
- ‘Reciprocality’: ‘Protagonists’, ‘Prot-1’, ‘Prot-2’
    Tom fought with Kim.
    Tom and Kim fought.

6 Syntactic vs. semantic approach IV.
- LCS Database (Lexical Conceptual Structure) (Dorr, 2001)
  - semantic representation
  - semantic structure + semantic content
- verb cut down
  - lexical item:
      (act_on loc (* thing 1) (* thing 2) ((* [on] 23) loc (*head*) (thing 24)) (cut+ingly 26) (down+/m))
  - sentence: United States cut down (the) quota.
      (act_on loc (us+) (quota+) ((* [on] 23) loc (*head*) (thing 24)) (cut+ingly 26) (down+/m))
- logical arguments (ag, exp, th, src, goal, info, perc, loc, poss, time, prop)
- logical modifiers (mod-poss, ben, instr, purp, mod-loc, manner, mod-prop)
- cut down: _ag_th,mod-loc(on)

7 Prague Dependency Treebank
- based on
  - Functional Generative Description (FGD) (Sgall et al., 1986)
    - dependency-oriented
    - stratificational
    - level of underlying representation (‘tectogrammatical level’) (described in Hajičová et al., 2000)
  - valency theory (esp. Panevová, 1994)

8 Valency in FGD I.
- complementations:
  - inner participants vs. free modifications
  - obligatory vs. optional
  (table: inner participants / free modifications × obligatory / optional)
- valency frame:
    Matka.ACT předělala dětem.ADDR loutku.PAT z Kašpárka.ORIG na čerta.EFF. (Panevová)
    [Mother re-made a puppet for the children from a Punch into an imp.]
    V Praze.LOC se sejdeme na Hlavním nádraží.LOC u pokladen.LOC. (Panevová)
    [In Prague we will meet at the Main Station near the booking offices.]

9 Valency in FGD II.
- a ‘middle position’:
  - syntactic criteria are used for the identification of Actor and Patient (the first inner participant is the Actor, the second is always the Patient)
  - other inner participants (Addressee, Origin and Effect) as well as free modifications are determined in accordance with semantic considerations
- concept of ‘shifting’ (Panevová, 1974-75)
  (diagram: shifting among Origin, Actor, Patient, Addressee and Effect)
    Kniha.ACT vyšla. (Panevová)
    [The book came out.]
    Chlapec.ACT vyrostl v muže.PAT. (Panevová)
    [The boy grew up into a man.]
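The syntactic part of this ‘middle position’ can be sketched as a small function. This is a simplified reading, not Panevová's full formulation: it assumes the inner participants arrive as an ordered list of semantically motivated labels, and `assign_functors` is a hypothetical name.

```python
def assign_functors(semantic_labels):
    """Apply the 'shifting' idea to the inner participants of one verb.

    Simplifying assumptions (not from the slides):
    - the input lists only inner participants, in order;
    - with one participant it is labeled ACT;
    - with two participants they are labeled ACT and PAT, whatever their
      semantically motivated labels were;
    - with three or more, the labels are taken as already decided
      (ACT and PAT syntactically, the rest semantically).
    """
    labels = list(semantic_labels)
    if len(labels) == 1:
        return ["ACT"]
    if len(labels) == 2:
        return ["ACT", "PAT"]
    return labels


# 'Chlapec vyrostl v muže': the semantic Effect is shifted to Patient.
print(assign_functors(["ACT", "EFF"]))
# 'Kniha vyšla': a single inner participant is the Actor.
print(assign_functors(["PAT"]))
```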

10 Valency in FGD III.
- valency of autosemantic words
  - verbs (Panevová, from the seventies)
    - 5 inner participants: Actor, Patient, Addressee, Origin, Effect
    - approx. 45 free modifications
    - ‘shifting of cognitive roles’ for inner participants
  - nouns (esp. Panevová, 2000, Řezníčková, manuscript)
    - verbal complementations
    - spec. nominal complementations: Identity, Partitive, Appurtenance, Restrictive and Descriptive Attribute
  - adjectives (Panevová, 1998)
    - verbal complementations
    - spec. adjectival complementations

11 Valency structure on TR level of PDT
- the core of annotation on the tectogrammatical level
- problem of consistency
- valency lexicon
  - verbs: two branches
    - lists of verbs with their complementations being created and used by annotators (PDT-VALLEX)
    - complex valency lexicon (VALLEX)
  - nouns
    - the theoretical aspects and methodology are currently being refined (Řezníčková, manuscript)
    - lists of nouns with their complementations
  - adjectives
    - lists of adjectives with their complementations

12 Valency lexicon of verbs – PDT-VALLEX
- lists being created and used by annotators
- valency frames of verbs in their particular meanings, as they appear during annotation; the lexeme as a whole is not analyzed
- the information specifying elements of frames:
  - ‘functor’, i.e. name of complementation type
  - obligatory / optional
  - possible morphemic form(s)
- example(s)
- it serves for consistency of annotation
- approx. 4 700 verbs with 7 150 valency frames (i.e. 1.5 frames per verb)
    dát [to give] ... ACT(1;obl) ADDR(3;obl) PAT(4;obl)
    dát někomu knihu [to give sb a book]
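The frame notation used in the example, `FUNCTOR(form;obl|opt)`, is regular enough to be read mechanically. The following is a minimal sketch that parses only the notation as it appears on these slides; the `Slot` type and `parse_frame` function are hypothetical names, and the real lexicon files may use a richer format.

```python
import re
from typing import NamedTuple


class Slot(NamedTuple):
    functor: str      # name of the complementation type, e.g. ACT
    form: str         # morphemic form, e.g. "4" or "s+7"
    obligatory: bool  # True for 'obl', False for 'opt'


# One slot looks like ACT(1;obl) or ADDR(s+7;obl).
SLOT_RE = re.compile(r"([A-Z]+)\(([^;()]+);(obl|opt)\)")


def parse_frame(text):
    """Parse a frame string into a list of Slot tuples, in order of appearance."""
    return [
        Slot(functor, form, flag == "obl")
        for functor, form, flag in SLOT_RE.findall(text)
    ]


for slot in parse_frame("ACT(1;obl) ADDR(3;obl) PAT(4;obl)"):
    print(slot)
```

In the examples, a bare digit appears to denote a Czech case by its traditional number (1 nominative, 3 dative, 4 accusative), and a form such as `s+7` a preposition plus case.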

13 Valency lexicon of verbs – VALLEX
- complex information on the whole verb lexeme in all its meanings (Lopatková, Žabokrtský, 2002)
- the information on particular valency frames, corresponding to its meanings (described with gloss(es) and example(s))
- the information specifying elements of frames:
  - ‘functor’, i.e. name of complementation type
  - obligatory / optional
  - possible morphemic form(s)
    mluvit [to speak] ... ACT(1;obl) ADDR(s+7;obl) PAT(o+6;opt)
    mluvila s ním o dětech [she spoke with him about the children]
- additional syntactic information

14 Valency lexicon of verbs – VALLEX II.
- additional syntactic information for particular valency frames:
  - reflexivity (in progress)
  - reciprocity
  - control
  - aspect and aspectual counterparts
  - possible diatheses, passivization (future plans)
  - primary / secondary / idiomatic usage
  - syntactic/semantic class (in progress)
  - pointers to Czech EuroWordNet (in progress)
  - frequency of a particular frame in samples of ČNK (60 occurrences of each verb lexeme)

15 Valency lexicon of verbs – VALLEX III.
- current state: 1 400 verbs with 3 860 frames (i.e. approx. 2.8 frames per verb)
  - verbs chosen according to their frequency in the Czech National Corpus and PDT
  - coverage of about 85% of ‘running text’ in PDT
- open questions
  - enriched valency frame
  - syntactic-semantic classes
  - alternative frames
  - frozen collocations
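The frames-per-verb figures quoted for PDT-VALLEX and VALLEX follow from simple division of the counts given on the slides. A short check (the dictionary layout and function name are illustrative):

```python
# Counts as quoted on the PDT-VALLEX and VALLEX slides.
lexicons = {
    "PDT-VALLEX": {"verbs": 4700, "frames": 7150},
    "VALLEX": {"verbs": 1400, "frames": 3860},
}


def frames_per_verb(verbs, frames):
    """Average number of valency frames per verb."""
    return frames / verbs


ratios = {
    name: round(frames_per_verb(**counts), 2)
    for name, counts in lexicons.items()
}
print(ratios)  # {'PDT-VALLEX': 1.52, 'VALLEX': 2.76}
```

The higher ratio for VALLEX is consistent with its aim of describing each verb lexeme in all its meanings, whereas PDT-VALLEX records only the frames met during annotation.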

16 Valency lexicon of verbs
- Why two branches?
  - PDT-VALLEX ~ ‘extensive’
    - necessary for annotation
    - ‘recall’ improves relatively quickly
  - VALLEX ~ ‘intensive’
    - the whole lexeme is analyzed en bloc
    - adequate and consistent description
    - ‘precision’ improves
- the two branches are supposed to be merged
  - PDT-VALLEX ~ valuable source for VALLEX

18 Enriched valency frames I.
- inner participants
  - each inner participant can occur only once (with a single occurrence of a verb)
  - combination of inner participants must be listed for a particular verb
  - morphemic form is predicted by the governing verb
  - concept of ‘shifting’ is applied
- free modifications
  - each free modification can be repeated
  - syntactically, they can modify any verb (only semantic restrictions are often present)
  - they have typical semantics
  - they do not undergo the ‘shifting’

19 Enriched valency frames II.
- quasi-valency complementations (also Panevová, 2003)
  - each quasi-valency complementation can occur only once (with any occurrence of a verb)
  - each quasi-valency complementation is characteristic of a limited list of verbs
  - morphemic form is predicted by the governing verb
  - they have typical semantics
  - they do not undergo the shifting
- Obstacle
    uhodit hlavou o větev.OBST [to bump one's head against a bough]
    zavadit o stůl.OBST [to brush against a table]
- Difference
    prodloužit o hodinu.DIFF [to prolong by one hour]
- Mediator
    vzít někoho za ruku.MDT [to take sb by the hand]
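The occurrence constraints stated for inner participants, free modifications and quasi-valency complementations can be sketched as a simple check over the functors attached to one verb occurrence. The function name is hypothetical, and the functor sets contain only the labels that appear on these slides, so this is an illustration rather than a complete inventory.

```python
from collections import Counter

# Labels taken from the slides; the sets are deliberately incomplete.
INNER_PARTICIPANTS = {"ACT", "PAT", "ADDR", "ORIG", "EFF"}
QUASI_VALENCY = {"OBST", "DIFF", "MDT"}


def repeated_once_only_functors(functors):
    """Return functors that occur more than once although they may occur only once.

    Inner participants and quasi-valency complementations are limited to one
    occurrence per verb occurrence; any other functor is treated as a free
    modification, which may be repeated.
    """
    counts = Counter(functors)
    once_only = INNER_PARTICIPANTS | QUASI_VALENCY
    return sorted(f for f, n in counts.items() if n > 1 and f in once_only)


# 'V Praze se sejdeme na Hlavním nádraží u pokladen': LOC repeated three times is fine.
print(repeated_once_only_functors(["LOC", "LOC", "LOC"]))
# A second Patient with the same verb occurrence violates the constraint.
print(repeated_once_only_functors(["ACT", "PAT", "PAT"]))
```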

20 Enriched valency frames III.
- typical modifications
  - optional free modifications ‘commonly’ used with a verb
  - usually modify a group of verbs with similar meaning
  - morphemic form
    - prototypical for some modifications, e.g. Dative case or prep. group pro [for]+Acc for Benefactor
    - determined by the typical semantics of the modifying members, e.g. prep. groups na [on]+Loc and v [in]+Loc typically specify Location
- ‘verbs of motion’ – typically modified by Direction modification (provided that Direction is not obligatory)
    jít do kina / přes les / jít z domova [to go to the cinema / through the wood / from home]
- ‘verbs of exchange’ – typically modified by modification of Recompense
    dát / dostat / získat / kupovat / brát něco.PAT za něco.RCMP [to give / get / obtain / buy / take something for something]

21 Exploitation of the valency lexicon
- achieving consistency in assigning the valency structure (PDT-VALLEX)
- automatic syntactic analysis (‘shallow parsing’)
- ‘tectogrammatical parser’
  - automatic system for creating an underlying representation of Czech sentences
- source data for building the valency lexicon of nouns

22 Resources
- theoretical articles on valency (Panevová)
- The Manual for Tectogrammatical Tagging of the Prague Dependency Treebank (Hajičová et al., 2000)
- lists of particular valency frames created by annotators
- electronic valency dictionary of surface realizations of verbal modifiers (FI MU Brno, Pala, Ševeček, 1997)
- printed dictionaries
  - Slovesa pro praxi (SPP, 1997), valency specification of 767 most frequent verbs
  - Slovník spisovného jazyka českého (SSJČ, 1964)
  - Slovník spisovné češtiny pro školu a veřejnost (SSČ, 1978)
  - Slovník českých synonym (SČS, 1994)
  - Slovník české frazeologie a idiomatiky (SČFI, 1983)
- Czech National Corpus (ČNK)
- EuroWordNet, Czech WordNet

23 References I.
- Dorr, B.J. (2001) LCS Verb Database, Online Software Database of Conceptual Structures and Documentations, UMCP.
- Fillmore, Ch. (2002) FrameNet and the Linking between Semantic and Syntactic Relations. In: COLING 2002, Proceedings, pp. xxviii-xxxvi.
- Hajičová, E. et al. (2000) A Manual for Tectogrammatical Tagging of the Prague Dependency Treebank. UFAL/CKL Technical Report TR-2000-09.
- Hajičová, E., Kučerová, I. (2002) Argument/Valency Structure in PropBank, LCS Database and Prague Dependency Treebank. In: LREC 2002, Proceedings, pp. 846-851.
- Levin, B. (1993) English Verb Classes and Alternations: A Preliminary Investigation. Chicago: University of Chicago Press.
- Lopatková, M. et al. (2002) Tektogramaticky anotovaný valenční slovník českých sloves. UFAL/CKL Technical Report TR-2002-15.
- Lopatková, M., Žabokrtský, Z. (2002) Valency Dictionary of Czech Verbs. In: LREC 2002, Proceedings, pp. 949-956.
- Lopatková, M. (2003) Valency in the Prague Dependency Treebank: Building the Valency Lexicon. PBML 79. (in press)

24 References II.
- Pala, K., Ševeček, P. (1997) Valence českých sloves. In: Sborník prací FFUB, Brno.
- Palmer, M. et al. (2001) Automatic Predicate Argument Analysis of the Penn TreeBank. In: HLT 2001, Proceedings, San Francisco: Morgan Kaufmann.
- Panevová, J. (1974-75) On Verbal Frames in Functional Generative Description. Part I, PBML 22, pp. 3-40, Part II, PBML 23, pp. 17-52.
- Panevová, J. (1994) Valency Frames and the Meaning of the Sentence. In: Luelsdorff (ed.) The Prague School of Structural and Functional Linguistics, John Benjamins, pp. 223-243.
- Panevová, J. (1998) Ještě k teorii valence. Slovo a slovesnost 59, pp. 1-14.
- Panevová, J. (2000) Poznámky k valenci podstatných jmen. Čeština - univerzália a specifika 2, Masarykova Univerzita, Brno, pp. 173-180.
- Panevová, J. (2003) Some Issues of Syntax and Semantics of Verbal Modifications. In: Proceedings of MTT 2003, Paris. (in press)
- Sgall, P. et al. (1986) The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. Dordrecht: Reidel, Prague: Academia.

