Comments on Guillaume Pitel: “Using bilingual LSA for FrameNet annotation of French text from generic resources”
Gerd Fliedner, Computational Linguistics, Saarland University

Comments/Thoughts
• Useful approach, as it can potentially speed up and support annotation and thus the building of new FrameNets.
• Uses only a few resources and is therefore extendable to other language pairs (in principle).
• First experiments ‘only’.

Multilingual FrameNets
• Having FrameNets for as many languages as possible would be nice.
• There are numerous monolingual and cross-lingual applications.
• BUT: building ‘a FrameNet’ is knowledge- and labour-intensive work and thus expensive; funding may be a problem.

Bootstrapping Multilingual FNs
• (Re-)use as much knowledge from existing FrameNets as possible.
• Ease the annotators’ task by making useful suggestions.
• Use automatic methods for knowledge acquisition.
[Figure: labels “LSA” and “Swamp of Language”]

More than one strand of hair may be needed…
• By the way: Change_hair_configuration is not yet in FN.

FR.FrameNet
• In FR.FrameNet, several methods have been explored that could reduce the time and cost of building new FrameNets.
• Tasks explored:
  ◦ Lexical Unit (Frame Evoking Element) transfer
  ◦ Identifying Frame Elements
  ◦ Disambiguating the LU-Frame assignment

Lexical Unit Transfer
• Can be seen as the task of finding and disambiguating translation pairs (links to Machine Translation and lexicography).
• Extract disambiguated translations from an existing ‘cluster-based’ dictionary.
• Some manual annotation is required, but this is a relatively fast and simple way of acquiring a solid core lexicon.
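
A minimal sketch of this transfer step follows. It assumes a toy ‘cluster-based’ dictionary in which each English lemma maps to sense clusters of French translations, and each cluster has already been linked (manually) to a FrameNet frame. The slides do not describe the dictionary’s actual format. The entries, the frame links and the function transfer_lexical_units are all hypothetical.

```python
from collections import defaultdict

# Hypothetical English FrameNet lexical units: (lemma, part of speech, frame).
ENGLISH_LUS = [
    ("say", "v", "Statement"),
    ("state", "v", "Statement"),
    ("buy", "v", "Commerce_buy"),
]

# Hypothetical cluster-based bilingual dictionary: each English lemma maps to
# sense clusters of French translations. The "frame" tag stands in for the
# (partly manual) step linking a sense cluster to a frame; None = no link.
DICTIONARY = {
    "say": [{"frame": "Statement", "translations": ["dire", "déclarer"]}],
    "state": [
        {"frame": "Statement", "translations": ["déclarer", "affirmer"]},
        {"frame": None, "translations": ["état"]},  # noun sense
    ],
    "buy": [
        {"frame": "Commerce_buy", "translations": ["acheter"]},
        {"frame": None, "translations": ["affaire"]},  # noun sense ("a good buy")
    ],
}


def transfer_lexical_units(english_lus, dictionary):
    """Map each frame to French LU candidates, keeping only translations from
    sense clusters linked to the same frame as the English LU."""
    french = defaultdict(set)
    for lemma, pos, frame in english_lus:
        for cluster in dictionary.get(lemma, []):
            if cluster["frame"] == frame:
                for word in cluster["translations"]:
                    french[frame].add(f"{word}.{pos}")
    return {frame: sorted(words) for frame, words in french.items()}


print(transfer_lexical_units(ENGLISH_LUS, DICTIONARY))
```

A translation reached from several English LUs (here déclarer, from both say and state) collapses into a single candidate. Deciding which cluster belongs to which frame is the part that, according to the slide, still needs some manual annotation.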

Manual Filtering
• Is frame information currently used for disambiguation?
• How is the manual annotation done? It sounds like rules of thumb. Are there guidelines?
• How is it evaluated?

Resources needed
• Lexical unit transfer:
  ◦ English FrameNet
  ◦ Large-coverage bilingual dictionary (source ► target language, optimally sense-disambiguated): may be a problem for ‘small’ languages
  ◦ Corpus in target language
  ◦ (Some) manual annotation

Lexical Unit Transfer: Other Possibilities
• Using ‘human-readable’ resources:
  ◦ Use existing dictionaries
  ◦ Problem: disambiguation
• Using machine-readable resources:
  ◦ Use EuroWordNet or similar
  ◦ Problem again: disambiguation
• Use parallel corpora:
  ◦ Padó & Lapata, AAAI-05

Identify Frame Elements
• Core idea: the same semantic restrictions/preferences should apply to Frame Elements in the source and target language.
• How can these semantic preferences be learned?
  ◦ First step: learn cross-lingual semantic similarity.
  ◦ Second step: identify Frame Elements in one language and transfer them.

Bilingual Infomap/Latent Semantic Analysis (LSA)
• Originally used for cross-lingual information retrieval.
• Use a bilingual, parallel ‘core’ corpus.
• Parallel documents/paragraphs/… are put together and count as one text.
• Build a vector space.
• Monolingual and cross-lingual similarities will ‘fall out’.
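
The following is a minimal LSA sketch that makes the ‘fall out’ claim concrete. It is written in Python with NumPy, which is an assumption: the slides name Infomap, whose interface is not reproduced here. Each aligned English/French pair is merged into one document, a term-by-document count matrix is built, and a truncated SVD keeps k latent dimensions. The toy corpus, the stopword list and the function names are hypothetical. Real setups use far larger corpora and often weight the counts (e.g. log-entropy) before the SVD.

```python
import numpy as np

# Toy parallel corpus (hypothetical): aligned English/French paragraphs.
PARALLEL = [
    ("the cat eats fish", "le chat mange du poisson"),
    ("the dog eats meat", "le chien mange de la viande"),
    ("the cat sleeps", "le chat dort"),
    ("the dog barks", "le chien aboie"),
]
# Small stopword list (an assumption) so that only content words are counted.
STOPWORDS = {"the", "le", "la", "du", "de"}


def tokens(text, lang):
    # Prefix each token with its language so identical spellings stay distinct.
    return [f"{lang}:{w}" for w in text.lower().split() if w not in STOPWORDS]


def build_space(pairs, k=2):
    # Each aligned pair becomes ONE document, so translations share contexts.
    docs = [tokens(en, "en") + tokens(fr, "fr") for en, fr in pairs]
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for t in doc:
            counts[index[t], j] += 1
    # Truncated SVD: term vectors in the k strongest latent dimensions.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return vocab, u[:, :k] * s[:k]


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def nearest(word, vocab, vectors, target_lang):
    # Most similar term of the other language (cross-lingual similarity).
    i = vocab.index(word)
    candidates = [j for j, t in enumerate(vocab) if t.startswith(target_lang + ":")]
    best = max(candidates, key=lambda j: cosine(vectors[i], vectors[j]))
    return vocab[best]


vocab, vectors = build_space(PARALLEL)
print(nearest("en:cat", vocab, vectors, "fr"))  # fr:chat
print(nearest("en:dog", vocab, vectors, "fr"))  # fr:chien
```

On this toy corpus, en:cat and fr:chat occur in exactly the same merged documents, so their vectors coincide. That is the sense in which cross-lingual similarity falls out of the shared document contexts, without any dictionary.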

Identify and transfer Frame Elements
• Use the Berkeley FrameNet corpus as training corpus (English): Frame Elements (content words + POS) from annotated examples are used as the starting point.
• Use the semantic space (generated by LSA) to find good (hopefully semantically related) translation candidates for the words making up a Frame Element.
• To identify a French Frame Element: find the ‘closest’ vector.
• Several good examples, some less good ones.
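
Here is one possible reading of ‘find the closest vector’, as a sketch. Each French content word in a sentence is scored by its highest cosine similarity to any English filler of the Frame Element. The best candidate is labelled if its score clears a threshold. The three-dimensional vectors, the filler list (Ingestibles of the Ingestion frame) and min_score are invented for illustration and are not taken from the experiments.

```python
import math

# Hypothetical 3-D LSA vectors standing in for a real bilingual semantic space.
SPACE = {
    "en:fish": [0.90, 0.10, 0.00],
    "en:bread": [0.80, 0.20, 0.10],
    "en:soup": [0.85, 0.05, 0.20],
    "fr:chat": [0.10, 0.90, 0.10],
    "fr:poisson": [0.88, 0.12, 0.02],
    "fr:matin": [0.10, 0.20, 0.90],
}

# Hypothetical head words of Ingestibles fillers from annotated English
# examples of the Ingestion frame.
ENGLISH_FILLERS = ["en:fish", "en:bread", "en:soup"]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def label_frame_element(candidates, fillers, space, min_score=0.5):
    """Return (word, score) for the candidate whose vector is closest to any
    English filler vector, or None if no candidate reaches min_score."""
    best_word, best_score = None, -1.0
    for word in candidates:
        score = max(cosine(space[word], space[f]) for f in fillers)
        if score > best_score:
            best_word, best_score = word, score
    return (best_word, best_score) if best_score >= min_score else None


# Content words of "Le chat mange le poisson ce matin".
print(label_frame_element(["fr:chat", "fr:poisson", "fr:matin"], ENGLISH_FILLERS, SPACE))
```

The threshold lets the method return nothing rather than force a label. This matters given that some results are ‘less good’.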

Add Clustering
• Inspection of the data shows that Frame Elements may have semantically different fillers.
• Thus, clustering the LSA vectors seems promising.
• Identifying French Frame Elements: instead of finding the closest vector, check whether the word vector belongs to one of the clusters.
• Problems: identifying the optimal number of clusters, sparse data, …
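
The clustering variant can be sketched as follows. The sketch assumes spherical k-means (k-means on cosine similarity, with a deterministic farthest-first start) as the clustering method, which the slides do not specify. The 2-D vectors stand for fillers of a hypothetical heterogeneous Frame Element (vehicles vs. food, e.g. Goods in Commerce_buy). Both the vectors and the 0.95 threshold are made up.

```python
import math

# Hypothetical 2-D LSA vectors for English fillers of one Frame Element whose
# fillers fall into two semantic groups (vehicles vs. food).
FILLERS = {
    "en:car": [1.0, 0.1],
    "en:truck": [0.9, 0.2],
    "en:bread": [0.1, 1.0],
    "en:apple": [0.2, 0.9],
}
# Hypothetical vectors for French words to be tested.
FRENCH = {
    "fr:voiture": [0.95, 0.15],
    "fr:pain": [0.15, 0.95],
    "fr:idée": [0.7, 0.7],
}


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def mean_direction(vectors):
    # Sum of unit vectors, renormalised: the centroid used by spherical k-means.
    total = [sum(xs) for xs in zip(*(normalize(v) for v in vectors))]
    return normalize(total)


def spherical_kmeans(vectors, k, iterations=10):
    """Cluster by cosine similarity, seeding deterministically (farthest-first)."""
    centroids = [normalize(vectors[0])]
    while len(centroids) < k:
        # Next seed: the vector least similar to every existing centroid.
        seed = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(normalize(seed))
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in vectors:
            best = max(range(len(centroids)), key=lambda i: cosine(v, centroids[i]))
            groups[best].append(v)
        # Keep the old centroid if a cluster happens to end up empty.
        centroids = [mean_direction(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def fits_frame_element(word_vector, centroids, threshold=0.95):
    # Accept the word if it is close enough to at least one cluster centroid.
    return any(cosine(word_vector, c) >= threshold for c in centroids)


centroids = spherical_kmeans(list(FILLERS.values()), k=2)
for word, vec in FRENCH.items():
    print(word, fits_frame_element(vec, centroids))
```

With k=2, fr:voiture and fr:pain are accepted and fr:idée is rejected. With k=1, the single centroid points between the two groups, so on these toy vectors the decision flips: fr:idée is accepted and fr:voiture is rejected. Picking k is the ‘optimal number of clusters’ problem noted on the slide.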

Resources Needed
• Frame Element identification/transfer:
  ◦ English FrameNet
  ◦ Parallel corpus in source/target language: may be a problem for ‘small’ languages
  ◦ Additional corpora in both languages
  ◦ Corpus in target language
  ◦ (Tagger in source/target language: may be a problem for ‘small’ languages)
  ◦ (Not so little) manual annotation: may be a problem for small projects

Use information from WordNet?
• For French: use (Euro)WordNet alternatively or in addition:
  ◦ Use EuroWordNet links (translations)
  ◦ Use WordNet to expand ‘queries’
  ◦ Use similarity measures such as Jiang & Conrath 97
• For other languages that do not have a WordNet: ???
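
For reference, the Jiang & Conrath (1997) measure combines the WordNet hierarchy with corpus statistics through information content. In its usual formulation, P(c) is the corpus probability of concept c (counting its hyponyms) and lso(c1, c2) is the lowest common subsumer of the two concepts. A smaller distance means more similar concepts. Some implementations (e.g. NLTK’s jcn_similarity) report the reciprocal as a similarity. The probabilities in the worked line are purely hypothetical.

```latex
\mathrm{IC}(c) = -\log P(c)

\mathrm{dist}_{JC}(c_1, c_2) = \mathrm{IC}(c_1) + \mathrm{IC}(c_2) - 2\,\mathrm{IC}\big(\mathrm{lso}(c_1, c_2)\big)
                             = \log \frac{P(\mathrm{lso}(c_1, c_2))^2}{P(c_1)\,P(c_2)}

% Worked example (hypothetical): P(c_1) = 0.01, P(c_2) = 0.02, P(lso) = 0.1, natural log
\mathrm{dist}_{JC} = \ln \frac{0.1^2}{0.01 \cdot 0.02} = \ln 50 \approx 3.91
```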

Syntax
• Certain Frame Elements are semantically totally heterogeneous but syntactically (relatively) easy to identify.
  ◦ For example: Statement.Message (Engl.: say that X; Fr.: dire que X)
• Problem: semantic transfer can be learned using LSA, syntactic transfer (that ≈ que) cannot.
• Could (partially) parsed parallel corpora be used to learn syntactic transfer? Can ‘syntactic’ and ‘semantic’ Frame Element identification be combined? Alternatively: can ‘syntactic’ Frame Elements be recognised and left to the annotators altogether?
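
The slide asks whether syntactic transfer could be learned from parsed parallel corpora. The sketch below simply hard-codes that transfer to show what a ‘syntactic’ identifier for Statement.Message might look like. The complementizer sets, the verb forms and find_message are hypothetical. The input is assumed to be whitespace-tokenised, and a real system would use a parser rather than adjacent-token matching.

```python
# Hypothetical complementizers introducing a Message clause
# (the that ≈ que correspondence from the slide, plus elided qu').
COMPLEMENTIZERS = {"en": {"that"}, "fr": {"que", "qu'"}}

# Hypothetical inflected forms of Statement verbs (no lemmatiser assumed).
STATEMENT_FORMS = {
    "en": {"say", "says", "said", "state", "states", "stated"},
    "fr": {"dire", "dit", "disent", "déclare", "déclaré"},
}

PUNCTUATION = {".", "!", "?"}


def find_message(tokens, lang):
    """Return the Message tokens following '<Statement verb> <complementizer>',
    or None if the pattern does not occur."""
    lowered = [t.lower() for t in tokens]
    for i in range(len(tokens) - 1):
        if lowered[i] in STATEMENT_FORMS[lang] and lowered[i + 1] in COMPLEMENTIZERS[lang]:
            message = tokens[i + 2:]
            # Drop sentence-final punctuation from the clause.
            while message and message[-1] in PUNCTUATION:
                message = message[:-1]
            return message or None
    return None


print(find_message("She said that the train was late .".split(), "en"))
print(find_message("Elle a dit que le train était en retard .".split(), "fr"))
```

The rule ignores what the Message clause contains, so it covers exactly the semantically heterogeneous fillers. Its weakness is recall: for example, "said to the press that" breaks the adjacency and is missed.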

Frame Element Preferences
• Knowing more about Frame Elements (explicitly) would be very helpful for:
  ◦ Automatic Frame/Frame Element assignment
  ◦ Manual annotation/guidelines
  ◦ Transfer to other languages
• Encoding preferences as links within FrameNet
• Encoding preferences as links to external resources (WordNet? SUMO/MILO?), cf. work by Aljoscha Burchardt
• Cf. yesterday’s talk by Michael Ellsworth

Conclusions
• (Some) more research is required.
• Optimising the annotation process is probably very important, e.g.:
  ◦ Use several cycles (start with ‘more certain’ cases, re-train with the additional data, …)
  ◦ Integrate different strategies, e.g. ‘syntax’ and ‘semantics’
• Which decisions can be made automatically? Can suggestions be made? How good are they? Recall vs. precision optimisations.
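
The ‘several cycles’ suggestion resembles a self-training loop. Here is a minimal sketch, which assumes the Frame Element model is nothing more than the mean direction of the accepted filler vectors in a hypothetical 2-D space. Each cycle accepts the candidates whose cosine to that direction reaches threshold, adds them to the training data and recomputes. The word vectors (given as angles) and all names are invented for illustration.

```python
import math


def unit(deg):
    # Hypothetical 2-D word vector given by its angle in degrees.
    r = math.radians(deg)
    return (math.cos(r), math.sin(r))


def cosine(a, b):
    return (a[0] * b[0] + a[1] * b[1]) / (math.hypot(*a) * math.hypot(*b))


def centroid(vectors):
    # Sum of vectors; only its direction matters for the cosine.
    return (sum(v[0] for v in vectors), sum(v[1] for v in vectors))


def bootstrap(seed, candidates, threshold=0.95, max_cycles=10):
    """Accept candidates close to the current centroid, add them to the
    training data, recompute, repeat until nothing changes.
    Returns the list of (cycle, word) acceptances."""
    accepted = dict(seed)
    pending = dict(candidates)
    log = []
    for cycle in range(1, max_cycles + 1):
        center = centroid(list(accepted.values()))
        new = [w for w, v in pending.items() if cosine(v, center) >= threshold]
        if not new:
            break
        for w in new:
            accepted[w] = pending.pop(w)
            log.append((cycle, w))
    return log


SEED = {"en:fish": unit(0)}
CANDIDATES = {"fr:poisson": unit(10), "fr:saumon": unit(22), "fr:voiture": unit(80)}
print(bootstrap(SEED, CANDIDATES))  # [(1, 'fr:poisson'), (2, 'fr:saumon')]
```

Here fr:saumon is only reached in the second cycle, after fr:poisson has pulled the centroid towards it. Lowering threshold to 0.9 accepts both in the first cycle, which illustrates the recall vs. precision trade-off.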