The Impact of Grammar Enhancement on Semantic Resources Induction
Luca Dini, Giampaolo Mazzini


Objectives
> Bridging from dependency parsing to knowledge representation:
  > Need for an intermediate level
  > Semantic Role Labelling
    – Easily configurable
    – Rule based
    – Moderately learning based (MLN)
> Production of a reasonably large repository of lexical units with assigned frames and mappings to syntax.
> Objective of this presentation: to measure the impact of grammar enhancement on the derivation of semantic resources.

Plan of This Talk
> Architecture and Methodology
> First Evaluation
> The Effect of Grammar Improvement

Architecture and Methodology;

Architecture
[Diagram; labels in transcript order: Source Annotation, Dep. Extraction, FE alignment, annotated Example, Parsing, parsed Annotation, Parsing, parsed Example, Machine Translation, Target Example, Target LU Identification]

Example
> English: …foreign policy dispute…
> Italian: disputa di politica straniera

Ingredients
> Bilingual MT System (Systran)
> Comparable parsers for Italian and English (XIP, Xerox Incremental Parser)
  > Lexicon look-up module (it en)
  > Word sense disambiguation and clustering
  > Semantic vectors for source and target

Challenges
> Ambiguity of translation:
  > Write.v -> {scrivere, fare lo scrittore, scolpire, vergare, documentare, comporre, scrivere una lettera, cantare, trascrivere}.
> Lack of translation.
> Identification of the semantic head of the Frame Element.
> Grammatical transformations.
> Grammar errors.
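
The translation-ambiguity challenge can be illustrated with a minimal sketch. It assumes that candidate translations of a lexical unit are filtered by checking which of them occur in the machine-translated example; the function name `candidates_in_example`, the example sentence, and the plain token matching (no lemmatization) are all hypothetical and not taken from the slides.

```python
from typing import List


def candidates_in_example(candidates: List[str], example_tokens: List[str]) -> List[str]:
    """Return the candidate translations that occur as a contiguous
    token sequence in the translated example (case-insensitive)."""
    lowered = [t.lower() for t in example_tokens]
    found = []
    for cand in candidates:
        words = cand.lower().split()
        n = len(words)
        # Slide a window of the candidate's length over the example.
        if any(lowered[i:i + n] == words for i in range(len(lowered) - n + 1)):
            found.append(cand)
    return found


# Candidate translations of write.v as listed on the slide.
candidates = [
    "scrivere", "fare lo scrittore", "scolpire", "vergare", "documentare",
    "comporre", "scrivere una lettera", "cantare", "trascrivere",
]
# Hypothetical translated example, invented for illustration.
example = "Voleva scrivere una lettera al direttore".split()

matches = candidates_in_example(candidates, example)
print(matches)
```

Even in this toy case more than one candidate survives, which suggests why a further disambiguation step is among the ingredients.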

Results (1)

Results (2)

Results (3)

Evaluation

Evaluation (1): SRL (1)
> Manual annotation of the TUT corpus (Lesmo et al., 2002):
  > 1000 sentences
  > Corpus annotated only with frame-bearing induced LUs
  > Selection of the correct frame (if any)
  > FE annotation of all dependants
  > Export in CoNLL format

Evaluation (1): SRL (2)
> Second step: “parse” the corpus for SRL:
  > No real parser
  > Very simple algorithm for assignment
  > Random choice in case of ambiguity
> Results, according to the F-measure metrics of Toutanova et al. (2008):
  – Precision of 0.53, recall of 0.33, and a consequent F-measure of
> Compares poorly with state-of-the-art SRL.
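
For reference, the relation between precision, recall and F-measure can be sketched as follows. This assumes the standard balanced F-measure (harmonic mean, beta = 1); the slide's own figure is missing from the transcript, and the exact computation used by the authors may differ.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported on the slide; balanced F1 is an assumption.
f1 = f_measure(0.53, 0.33)
print(f"F1 = {f1:.2f}")
```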

Evaluation (2)
> “Standard” corpus annotation:
  > 20 sentences × 20 lexical units (no ambiguity).
> Creation of a DB of triples.
> Comparison with induced resources based on standard precision and recall metrics:
  > A hit counts as positive if part of speech, grammatical function and frame element all match.
  > A “boost” was assigned on the basis of the importance of valence population (based on both the number and the variety of realizations).
  > Global precision and recall are the arithmetic mean of all weights:
    – Precision: 0.65
    – Recall: 0.41
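
The triple-based comparison can be sketched as below. This is a minimal illustration that assumes exact matching on (part of speech, grammatical function, frame element) and leaves out the “boost” weighting, whose formula the slides do not give; the tag names and triples are hypothetical.

```python
from typing import Set, Tuple

# (part of speech, grammatical function, frame element)
Triple = Tuple[str, str, str]


def precision_recall(induced: Set[Triple], gold: Set[Triple]) -> Tuple[float, float]:
    """Precision and recall of induced triples against manually annotated ones.
    A hit requires all three components to match."""
    hits = induced & gold
    precision = len(hits) / len(induced) if induced else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical triples for one lexical unit.
gold = {("N", "subj", "Speaker"), ("N", "obj", "Message"), ("PP", "mod", "Addressee")}
induced = {("N", "subj", "Speaker"), ("N", "obj", "Topic")}

p, r = precision_recall(induced, gold)
print(f"precision = {p:.2f}, recall = {r:.2f}")
```

Averaging such per-unit scores over all lexical units would give global figures of the kind reported above, up to the unspecified weighting.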

The Effects of Grammar Improvement;

Errors
> No translation for a lexical unit (7,815).
> Absence of examples in the source FrameNet (4,922).
> No translated example contains the candidate translation(s) of the lexical unit (1,736).
> No head could be identified for the English frame element realization (parse error or difficult structure, e.g. coordination) (6,191).
> The translation of the semantic head of the frame element or of the frame-bearing head could not be matched in the Italian example (99,808).
> The semantic heads of both the lexical unit and the frame element are found in the Italian example, but the parser could not find any dependency between them (94,004).
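
The relative weight of each error category can be worked out with a short tally. Treating the six counts as one flat, disjoint total is an assumption: the slide does not say whether all categories count the same kind of item. The dictionary keys are shortened labels introduced here for illustration.

```python
# Error counts as listed on the slide; keys are shortened, hypothetical labels.
error_counts = {
    "no translation for lexical unit": 7815,
    "no examples in source FrameNet": 4922,
    "candidate translation not in translated example": 1736,
    "no head for English frame element": 6191,
    "head translation not matched in Italian example": 99808,
    "no dependency found between matched heads": 94004,
}

total = sum(error_counts.values())
shares = {name: count / total for name, count in error_counts.items()}

# Print the categories from most to least frequent.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{share:6.1%}  {name}")
```

Under that assumption, the last two categories (unmatched head translations and missing dependencies in the Italian parse) account for the large majority of failures.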

The Enhancement Phase
> Improvements concerned only one side of the parsing mechanism, i.e. the Italian Dependency Grammar.
> Development:
  > Using the XIP IDE (Mokhtar et al., 2001).
  > The development period lasted about 6 months (Testa et al., 2009).
  > It was based on iterative verification on different corpora (TUT/ISST).
> Improvement in LAS: 40% -> 70%
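
LAS (labeled attachment score) is the share of tokens that receive both the correct head and the correct dependency label. A minimal sketch of the computation follows; the toy sentence, head indices and label names are hypothetical and not drawn from TUT or ISST.

```python
from typing import List, Tuple

# One arc per token: (index of the head token, dependency label); 0 marks the root.
Arc = Tuple[int, str]


def labeled_attachment_score(gold: List[Arc], predicted: List[Arc]) -> float:
    """Fraction of tokens whose predicted head and label both match the gold arc."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Hypothetical four-token sentence: the parser mislabels the second token.
gold = [(2, "det"), (3, "subj"), (0, "root"), (3, "obj")]
pred = [(2, "det"), (3, "obj"), (0, "root"), (3, "obj")]

las = labeled_attachment_score(gold, pred)
print(f"LAS = {las:.0%}")
```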

Consequences
> The architecture was kept exactly the same and the source code “frozen” during the six-month period.
> Results:

          Old P   New P   Old R   New R
Eval 1    0.53    0.59    0.33    0.34
Eval 2    0.65    0.71    0.41    0.51

Comments
> Both evaluation types show an increase in precision of about 6 percentage points.
> Strangely, recall stays almost constant in Eval 1, while it increases considerably in Eval 2.
> Explanation (?):
  > Unmapped phenomena
  > “Random” effect due to the small evaluation set.
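
The changes discussed in these comments can be checked directly from the reported scores. The sketch below computes absolute differences only; the dictionary layout and key names are introduced here for illustration.

```python
# Old and new precision (p) and recall (r) as reported in the results.
results = {
    "Eval 1": {"old_p": 0.53, "new_p": 0.59, "old_r": 0.33, "new_r": 0.34},
    "Eval 2": {"old_p": 0.65, "new_p": 0.71, "old_r": 0.41, "new_r": 0.51},
}


def deltas(row: dict) -> dict:
    """Absolute change in precision and recall, rounded to two decimals."""
    return {
        "precision": round(row["new_p"] - row["old_p"], 2),
        "recall": round(row["new_r"] - row["old_r"], 2),
    }


changes = {name: deltas(row) for name, row in results.items()}
for name, change in changes.items():
    print(name, change)
```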

Issues & Conclusions
> Was it worth six months of labour?
  > Probably not, if grammar enhancement is aimed solely at the acquisition of the resources.
  > Probably yes, if it is independently motivated.
> In general, evaluating the impact of lower-level modules on high-level applications is crucial for strategic choices, yet it is a rather “neglected” aspect.
> We need to understand the correct trade-off.
  > Convergence: IFRAME project (

Thank You!