The SALSA experience: semantic role annotation
Katrin Erk, University of Texas at Austin

Semantic role annotation in SALSA
- SALSA: The Saarbrücken Lexical Semantics Annotation and Analysis project
- Manual annotation of the German TIGER corpus with lexical semantic information
  - Basis: the Berkeley FrameNet database
  - Verbs annotated with their frame (~ sense), plus semantic roles
- TIGER corpus:
  - 1.5 million words / 80K sentences of German newspaper text (Frankfurter Rundschau)
  - Stuttgart/Potsdam/Saarbrücken
  - Phrase types and grammatical functions

Annotation Scheme
(They didn't want to pay the move back because the employee had quit.)
Semantics:
- Independent frames
- Trees of depth one
- One edge points to target, others to frame elements
- Sem. roles point to syn. constituents
TIGER Syntax:
- Node labels: constituents
- Edge labels: gramm. functions
- Crossing edges
- POS
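
The semantic side of the scheme above can be sketched as a small data structure. This is only an illustration, not the SALSA tools' actual representation: the class name, the field names, and the node IDs are hypothetical, and the frame and role labels are chosen to match the "had quit" clause of the example. For brevity each role points to a single constituent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Frame:
    """One frame instance: a depth-one tree over syntactic node IDs."""
    name: str                                             # frame label (~ sense)
    target: str                                           # ID of the frame-evoking node
    roles: Dict[str, str] = field(default_factory=dict)   # role name -> node ID

    def edges(self) -> List[Tuple[str, str]]:
        """Return (edge label, syntactic node ID) pairs: target first, then roles."""
        return [("target", self.target)] + list(self.roles.items())


# Hypothetical node IDs standing in for TIGER constituents.
quit_frame = Frame("Quitting", target="s1_12", roles={"Employee": "s1_503"})

# Frames of a sentence are independent: a flat list, no frame-to-frame links.
sentence_frames = [quit_frame]
```

Because every edge ends in a syntactic node ID, the semantic layer never copies words or constituents; it only refers to them.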

Experiences with the semantic role annotation in Salsa
- Frame (~ sense) assignment more difficult than role assignment
- Multiple tags possible, at frame level and at role level
- Limited compositionality phenomena, each with separate annotation format in Salsa:
  - Light verbs, metaphor, idioms
  - Distinction often difficult: metaphor vs idiom, bleaching
  - If I did this again: one format, multiple tags possible
- Annotation beyond the sentence boundary
  - Message role in Communication frames
- Annotation below the word boundary: German noun compounds
  - Mietrechtsdiskussion: discussion of tenant law

Encoding sem. role annotation: TIGER XML as a great basis
- TIGER XML: each constituent is an XML element with a globally unique ID
  - Syn. edges explicitly encoded: edge elements link two nodes, referring to their IDs
  - Models discontinuous constituents
- Salsa/Tiger XML:
  - Sem. annotation by adding a modular block to the XML structure of a sentence
  - Semantics points to syn. constituents using their IDs
  - Annotation beyond sentence boundary possible: globally unique syn. IDs
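
A minimal sketch of how such an encoding can be read, using Python's standard `xml.etree.ElementTree`. The fragment is a simplified, hand-written approximation: the element and attribute names (`t`, `nt`, `edge`, `idref`, `sem`, `frame`, `fe`, `fenode`) and the toy sentence are assumptions for illustration and should be checked against the actual TIGER XML and Salsa/Tiger XML documentation.

```python
import xml.etree.ElementTree as ET

# Toy sentence ("the employee quit"); structure and names are illustrative.
SENTENCE = """
<s id="s1">
  <graph root="s1_500">
    <terminals>
      <t id="s1_1" word="der" pos="ART"/>
      <t id="s1_2" word="Mitarbeiter" pos="NN"/>
      <t id="s1_3" word="kündigte" pos="VVFIN"/>
    </terminals>
    <nonterminals>
      <nt id="s1_501" cat="NP">
        <edge label="NK" idref="s1_1"/>
        <edge label="NK" idref="s1_2"/>
      </nt>
      <nt id="s1_500" cat="S">
        <edge label="SB" idref="s1_501"/>
        <edge label="HD" idref="s1_3"/>
      </nt>
    </nonterminals>
  </graph>
  <sem>
    <frames>
      <frame name="Quitting" id="s1_f1">
        <target><fenode idref="s1_3"/></target>
        <fe name="Employee" id="s1_f1_e1"><fenode idref="s1_501"/></fe>
      </frame>
    </frames>
  </sem>
</s>
"""


def index_nodes(sentence):
    """Map each syntactic node ID to its XML element."""
    return {n.get("id"): n for n in sentence.iter() if n.tag in ("t", "nt")}


def words(node_id, nodes):
    """Collect the words under a node by following edge idrefs downwards."""
    node = nodes[node_id]
    if node.tag == "t":
        return [node.get("word")]
    collected = []
    for edge in node.findall("edge"):
        collected.extend(words(edge.get("idref"), nodes))
    return collected  # edge order, not necessarily surface order


def read_frames(sentence):
    """Resolve every frame's target and roles to the words they cover."""
    nodes = index_nodes(sentence)
    result = []
    for frame in sentence.iter("frame"):
        target = [w for fn in frame.findall("target/fenode")
                  for w in words(fn.get("idref"), nodes)]
        roles = {fe.get("name"): [w for fn in fe.findall("fenode")
                                  for w in words(fn.get("idref"), nodes)]
                 for fe in frame.findall("fe")}
        result.append((frame.get("name"), target, roles))
    return result


parsed = read_frames(ET.fromstring(SENTENCE.strip()))
```

The semantic block can be dropped or replaced without touching the syntax, which is the modularity the slide refers to.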

Extracting a lexicon: need for a deeper, richer syntax
- Extracting syntax/semantics mapping: needs to identify gramm. functions filled by sem. roles
- Problems:
  - Constituent structure rather than dependencies: subjects hard to retrieve
  - TIGER does not mark voice
  - Shallow format for PPs: determining heads is hard
  - Coordination is a pain
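
The simplest version of the mapping step can be sketched as follows: for each role, look up the label of the syntactic edge that enters the role's constituent. This is a deliberately naive illustration with made-up node IDs and a toy edge list; it is not the extraction procedure used in SALSA, and it ignores the problems listed above (voice, PP heads, coordination).

```python
# Syntactic edges as (parent ID, edge label, child ID); all IDs are hypothetical.
EDGES = [
    ("s1_500", "SB", "s1_501"),
    ("s1_500", "HD", "s1_3"),
    ("s1_501", "NK", "s1_1"),
    ("s1_501", "NK", "s1_2"),
]

# Semantic roles of one frame, each pointing to a syntactic node ID.
ROLES = {"Employee": "s1_501"}


def incoming_label(node_id, edges):
    """Return the label of the first edge entering node_id, or None for a root."""
    labels = [label for _, label, child in edges if child == node_id]
    return labels[0] if labels else None


def role_to_function(roles, edges):
    """Map each role name to the grammatical function of its constituent."""
    return {role: incoming_label(node, edges) for role, node in roles.items()}


mapping = role_to_function(ROLES, EDGES)
```

Here the Employee role comes out as SB (subject). Without voice marking, an SB label alone does not say whether the sentence is active or passive, which is one reason the slide asks for a deeper, richer syntax.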