Published by Paul Hensley. Modified over 9 years ago.
1
The SALSA experience: semantic role annotation
Katrin Erk
University of Texas at Austin
2
Semantic role annotation in SALSA
SALSA: The Saarbrücken Lexical Semantics Annotation and Analysis project
  Manual annotation of the German TIGER corpus with lexical semantic information
  Basis: the Berkeley FrameNet database
  Verbs annotated with their frame (~ sense), plus semantic roles
TIGER corpus: 1.5 million words / 80K sentences of German newspaper text (Frankfurter Rundschau)
  Stuttgart/Potsdam/Saarbrücken
  Phrase types and grammatical functions
3
Annotation Scheme
(They didn't want to pay the move back because the employee had quit.)
Semantics:
  Independent frames
  Trees of depth one
  One edge points to the target, the others to frame elements
  Semantic roles point to syntactic constituents
TIGER syntax:
  Node labels: constituents
  Edge labels: grammatical functions
  Crossing edges
  POS
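A minimal sketch of the semantic side of this scheme as a data structure: each frame is a tree of depth one whose edges point to the target and to the frame elements, all given as IDs of syntactic nodes. The class name `Frame`, the node IDs, and the choice of the frame Quitting with the role Employee are illustrative assumptions, not taken from the slide's figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Frame:
    """One frame instance: a tree of depth one over syntactic node IDs."""
    name: str                      # frame name (~ sense of the target)
    target: List[str]              # IDs of the syntactic nodes evoking the frame
    roles: Dict[str, List[str]] = field(default_factory=dict)  # role name -> node IDs

    def edges(self) -> List[Tuple[str, str, str]]:
        """List the labelled edges from the frame node to syntactic nodes."""
        out = [(self.name, "target", node_id) for node_id in self.target]
        for role, node_ids in self.roles.items():
            out.extend((self.name, role, node_id) for node_id in node_ids)
        return out


# Hypothetical node IDs; frames in one sentence are stored independently of each other.
sentence_frames = [
    Frame("Quitting", target=["s1_12"], roles={"Employee": ["s1_503"]}),
]
for frame in sentence_frames:
    print(frame.edges())
```

Storing node IDs as lists leaves room for a target or role that spans more than one constituent.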
4
Experiences with the semantic role annotation in SALSA
Frame (~ sense) assignment is more difficult than role assignment
Multiple tags possible, at frame level and at role level
Limited compositionality phenomena, each with a separate annotation format in SALSA: light verbs, metaphor, idioms
  Distinction often difficult: metaphor vs. idiom, bleaching
  If I did this again: one format, multiple tags possible
Annotation beyond the sentence boundary
  Message role in Communication frames
Annotation below the word boundary: German noun compounds
  Mietrechtsdiskussion: discussion of tenant law
5
Encoding semantic role annotation: TIGER XML as a great basis
TIGER XML: each constituent is an XML element with a globally unique ID
  Syntactic edges explicitly encoded: edge elements link two nodes, referring to their IDs
  Models discontinuous constituents
Salsa/Tiger XML: semantic annotation by adding a modular block to the XML structure of a sentence
  Semantics points to syntactic constituents using their IDs
  Annotation beyond the sentence boundary is possible because syntactic IDs are globally unique
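A rough sketch of how such an encoding can be read: the semantic block refers to syntactic nodes by ID, and the words of a role are recovered by following the syntactic edges down to the terminals. The element and attribute names (`t`, `nt`, `edge`, `idref`, `sem`, `frame`, `target`, `fe`, `fenode`) are written from memory of the TIGER XML and Salsa/Tiger XML formats and should be checked against the actual schema. The sentence, the IDs, and the frame are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-sentence fragment; names are assumptions about the format.
SAMPLE = """
<s id="s1">
  <graph root="s1_500">
    <terminals>
      <t id="s1_1" word="Der" pos="ART"/>
      <t id="s1_2" word="Angestellte" pos="NN"/>
      <t id="s1_3" word="kündigte" pos="VVFIN"/>
    </terminals>
    <nonterminals>
      <nt id="s1_501" cat="NP">
        <edge label="NK" idref="s1_1"/>
        <edge label="NK" idref="s1_2"/>
      </nt>
      <nt id="s1_500" cat="S">
        <edge label="SB" idref="s1_501"/>
        <edge label="HD" idref="s1_3"/>
      </nt>
    </nonterminals>
  </graph>
  <sem>
    <frames>
      <frame name="Quitting" id="s1_f1">
        <target><fenode idref="s1_3"/></target>
        <fe name="Employee" id="s1_f1_e1"><fenode idref="s1_501"/></fe>
      </frame>
    </frames>
  </sem>
</s>
"""


def frames(xml_text):
    """Return (frame name, target words, {role: words}) for each frame."""
    sentence = ET.fromstring(xml_text.strip())
    words = {t.get("id"): t.get("word") for t in sentence.iter("t")}
    position = {tid: i for i, tid in enumerate(words)}  # surface order
    children = {nt.get("id"): [e.get("idref") for e in nt.findall("edge")]
                for nt in sentence.iter("nt")}

    def terminals_under(node_id):
        # Follow syntactic edges by ID until terminals are reached.
        if node_id in words:
            return [node_id]
        found = []
        for child in children.get(node_id, []):
            found.extend(terminals_under(child))
        return found

    def text_of(element):
        ids = set()
        for fenode in element.findall("fenode"):
            ids.update(terminals_under(fenode.get("idref")))
        # Sorting by surface position also copes with discontinuous constituents.
        return " ".join(words[i] for i in sorted(ids, key=position.get))

    result = []
    for frame in sentence.iter("frame"):
        roles = {fe.get("name"): text_of(fe) for fe in frame.findall("fe")}
        result.append((frame.get("name"), text_of(frame.find("target")), roles))
    return result


print(frames(SAMPLE))
```

Because the lookup goes through IDs only, the semantic block can be added or removed without touching the syntactic part of the sentence.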
6
Extracting a lexicon: need for a deeper, richer syntax
Extracting the syntax/semantics mapping: need to identify the grammatical functions filled by semantic roles
Problems:
  Constituent structure rather than dependencies: subjects hard to retrieve
  TIGER does not mark voice
  Shallow format for PPs: determining heads is hard
  Coordination is a pain
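A naive sketch of the mapping step, to show where it breaks: a role receives a grammatical function only if its constituent is a sister of the target, in which case the function is the label on its incoming edge. The toy tree, the node names, and the role names are hypothetical. The edge labels follow TIGER conventions (SB subject, HD head, OA accusative object, NK noun kernel).

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (parent node, edge label, child node)

# Hypothetical toy tree with TIGER-style edge labels.
EDGES: List[Edge] = [
    ("S", "SB", "NP1"),
    ("S", "HD", "V"),
    ("S", "OA", "NP2"),
    ("NP2", "NK", "N2"),
]


def incoming(node: str, edges: List[Edge]) -> Optional[Tuple[str, str]]:
    """Return (parent, label) of the edge entering node, or None for the root."""
    for parent, label, child in edges:
        if child == node:
            return parent, label
    return None


def role_functions(target: str, roles: Dict[str, str],
                   edges: List[Edge]) -> Dict[str, Optional[str]]:
    """Map each role to the edge label of its constituent, if it is a sister of the target."""
    target_edge = incoming(target, edges)
    clause = target_edge[0] if target_edge else None
    result: Dict[str, Optional[str]] = {}
    for role, node in roles.items():
        edge = incoming(node, edges)
        # Only a sister of the target gets a function; anything else needs deeper analysis.
        if edge and clause is not None and edge[0] == clause:
            result[role] = edge[1]
        else:
            result[role] = None
    return result


mapping = role_functions("V", {"Employee": "NP1", "Position": "N2"}, EDGES)
print(mapping)
```

The role embedded inside NP2 gets no function at all, and even the SB label says nothing about voice, so the lookup cannot tell an active subject from a passive one.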