Translating Collocations for Bilingual Lexicons

Presentation transcript:

Translating Collocations for Bilingual Lexicons
Collocations (idiomatic multi-word expressions)
  difficult to translate
  semantically opaque
  cannot be translated word-by-word
  a major obstacle to second language acquisition
Example: demonstrate support → prouver son adhésion (prove adherence)

The Champollion approach
Input: large parallel corpora
Output: list of collocations in each language, and equivalence mappings between these collocations
The method is statistical and language-independent

Algorithm
1. Align sentences across the two corpora
2. Extract collocations from co-occurrence statistics
3. Identify all target-language words that frequently appear in sentences aligned with a source collocation
4. Iteratively consider and score combinations of those words
5. Select the best set of words for the translation
6. Determine word order and fill in prepositions
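
The word-selection steps of the algorithm (finding correlated target words, scoring combinations, picking the best set) can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the presenters' implementation: the slides do not name the scoring measure, so the sketch uses the Dice coefficient as a stand-in, and the names `dice`, `translate_collocation`, `threshold`, `max_len` and the four-sentence toy corpus are hypothetical. Sentence alignment is taken as already done, and word order and prepositions are not handled.

```python
def dice(both, n_src, n_tgt):
    """Dice coefficient 2*|X and Y| / (|X| + |Y|); an assumed scoring measure."""
    total = n_src + n_tgt
    return 2.0 * both / total if total else 0.0


def translate_collocation(source_words, aligned_pairs, threshold=0.7, max_len=3):
    """Return (score, frozenset of target words) for a source collocation.

    aligned_pairs: list of (source_tokens, target_tokens), already aligned.
    threshold and max_len are hypothetical tuning parameters.
    """
    source_words = set(source_words)
    targets = [set(tgt) for _, tgt in aligned_pairs]
    # Target sides of the pairs whose source side contains the collocation.
    hits = [set(tgt) for src, tgt in aligned_pairs if source_words <= set(src)]
    if not hits:
        return 0.0, frozenset()

    def score(combo):
        both = sum(1 for t in hits if combo <= t)      # co-occurs with source
        n_tgt = sum(1 for t in targets if combo <= t)  # occurs anywhere
        return dice(both, len(hits), n_tgt)

    # Single target words correlated with the source collocation.
    vocab = sorted(set().union(*hits))
    frontier = [frozenset([w]) for w in vocab if score(frozenset([w])) >= threshold]
    words = [next(iter(c)) for c in frontier]
    survivors = list(frontier)

    # Iteratively grow surviving combinations by one word and rescore.
    for _ in range(max_len - 1):
        extended = {c | {w} for c in frontier for w in words if w not in c}
        frontier = sorted((c for c in extended if score(c) >= threshold), key=sorted)
        if not frontier:
            break
        survivors.extend(frontier)

    if not survivors:
        return 0.0, frozenset()
    # Highest score wins; ties go to the longer combination.
    best = max(survivors, key=lambda c: (score(c), len(c)))
    return score(best), best


# Toy aligned corpus (invented for this example).
pairs = [
    ("additional costs are high".split(), "les coûts supplémentaires sont élevés".split()),
    ("we expect additional costs".split(), "nous prévoyons des coûts supplémentaires".split()),
    ("the costs are low".split(), "les coûts sont faibles".split()),
    ("we expect rain".split(), "nous prévoyons de la pluie".split()),
]
best_score, best_words = translate_collocation(["additional", "costs"], pairs)
print(best_score, sorted(best_words))
```

On this toy corpus the sketch selects the pair coûts + supplémentaires: "supplémentaires" alone scores just as high, but the tie is broken in favour of the longer combination. A real system would need a much larger corpus and a separate step to order the selected words.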

Sample translations
additional costs → coûts supplémentaires
affirmative action → action positive
free trade → libre-échange
freer trade → libéralisation … échanges
take … steps → prendre … mesures
stock market → bourse

Evaluation results
Corpus of 3.5 million words, collocations selected from the same corpus: 78%
Corpus of 8.5 million words, collocations selected from the same corpus: 74%
Corpus of 3.5 million words, collocations selected from a different corpus: 65%

Conclusion
Champollion provides a method for collocation translation
  Robust
  Language-independent
  Requires no tools
But: requires parallel corpora