Josef van Genabith & Andy Way: TransBooster (2003-2006), LaDEva: Labelled Dependency-Based MT Evaluation (2006-2008), GramLab (2001-2008). Previous MT Work & GramLab

Presentation transcript:

1 Josef van Genabith & Andy Way
- TransBooster (2003-2006)
- LaDEva: Labelled Dependency-Based MT Evaluation (2006-2008)
- GramLab (2001-2008)
Previous MT Work & GramLab

2 TransBooster (2003-2006)
- Enterprise Ireland funded Basic Research Project
- PI: Josef van Genabith
- CoI: Andy Way
- Students: Bart Mellebeek, Anna Khasin, Karolina Owczarzak

3 TransBooster: Basic Idea
MT systems are better on short (= simple) sentences than on longer ones. Capitalise on this!
- Divide up long sentences (automatically) into shorter components
- Feed those components to the MT system
- Translate (get better results for shorter components)
- Put the (better) translations together in the target language (= get a better translation)
A bit like Controlled Language, but automatic and without the restrictions (to particular syntax etc.)!
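The divide, translate and recombine loop can be sketched in a few lines of Python. This is only a toy illustration, not the TransBooster implementation: the slides say the real system relies on parsers and head and argument/adjunct finding rules, whereas the hypothetical `decompose` below simply splits at commas and semicolons, and `toy_mt` is a made-up stand-in for an MT engine.

```python
import re
from typing import Callable, List


def decompose(sentence: str) -> List[str]:
    """Toy stand-in for parser-driven decomposition: split at commas/semicolons."""
    parts = re.split(r"\s*[,;]\s*", sentence.strip().rstrip("."))
    return [p for p in parts if p]


def translate_by_parts(sentence: str, mt: Callable[[str], str]) -> str:
    """Translate each short component separately, then glue the results together."""
    translated = [mt(chunk) for chunk in decompose(sentence)]
    return ", ".join(translated) + "."


def toy_mt(chunk: str) -> str:
    """Hypothetical MT engine: upper-cases its input so the data flow is visible."""
    return chunk.upper()


source = "The chairman, a long-time rival of Bill Gates, likes fast cars."
chunks = decompose(source)
result = translate_by_parts(source, toy_mt)
print(chunks)
print(result)
```

Passing the engine in as a function mirrors the idea that the same wrapper can sit in front of different MT systems.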

4 TransBooster Example

5 TransBooster: Wrapper technology
Tricks the MT system into producing better results …

6 TransBooster
- TransBooster needs
  - Good parsers
  - Head and argument/adjunct finding rules
- TransBooster with
  - Rule-Based MT (Systran, Logomedia)
  - Example-Based MT (DCU system)
  - Statistical MT (standard Aachen PBSMT)
  - Multi-engine MT
- Improves results! => full details: Bart Mellebeek’s PhD & publications

7 TransBooster Bart Mellebeek’s PhD dissertation 2007

8 LaDEva: Labelled Dependency-Based Evaluation for MT (2006-2008)
- Microsoft Ireland funded Basic Research Project
- PIs: Josef van Genabith/Andy Way
- Student: Karolina Owczarzak

9 LaDEva: Basic Idea
Automatic evaluation methods are extremely important for MT.
String-based MT evaluation (BLEU etc.) unfairly penalises perfectly valid
- lexical variation/paraphrases
- syntactic variation/paraphrases
Compare: "John resigned yesterday." and "Yesterday, John quit."
Use labelled dependencies (instead of surface strings) for automatic evaluation.

10 LaDEva example (syntactic variation)
Use WordNet and PBSMT alignments for lexical variation …
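As a rough illustration of scoring on labelled dependencies rather than surface strings, the sketch below compares two hand-written sets of (label, head, dependent) triples for "John resigned yesterday." and "Yesterday, John quit." The triples, the `f_score` and `normalise` helpers and the one-entry synonym table (a stand-in for a WordNet lookup) are all assumptions made for this example. They are not LaDEva's actual data structures or scoring formula.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (label, head lemma, dependent lemma)


def f_score(candidate: Set[Triple], reference: Set[Triple]) -> float:
    """Harmonic mean of precision and recall over matching triples."""
    matched = len(candidate & reference)
    if matched == 0:
        return 0.0
    precision = matched / len(candidate)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


def normalise(triples: Set[Triple], synonyms: Dict[str, str]) -> Set[Triple]:
    """Map each lemma to a canonical synonym so paraphrases can match."""
    return {(lab, synonyms.get(h, h), synonyms.get(d, d)) for lab, h, d in triples}


# Hand-written dependency triples; word order plays no role in them.
reference = {("subj", "resign", "John"), ("adjunct", "resign", "yesterday")}
candidate = {("subj", "quit", "John"), ("adjunct", "quit", "yesterday")}

synonyms = {"quit": "resign"}  # hypothetical stand-in for a WordNet lookup

strict = f_score(candidate, reference)
relaxed = f_score(normalise(candidate, synonyms), normalise(reference, synonyms))
print(strict, relaxed)  # 0.0 1.0
```

The fronted "Yesterday" costs nothing because the triples ignore word order, and the resigned/quit mismatch disappears once the lemmas are normalised.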

11 LaDEva
- LaDEva needs
  - Very (!) robust dependency parsers that can parse MT output (as opposed to grammatical language)
    - DCU GramLab treebank-based LFG parsers
    - Microsoft parsers
  - WordNet, PBSMT alignments
- Evaluate LaDEva against BLEU, NIST, GTM and Meteor in terms of correlation with human judgments
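Correlation with human judgments is commonly measured with a correlation coefficient between per-segment metric scores and human scores. The sketch below computes Pearson's r by hand. The slides do not say which coefficient the project used, so Pearson is an assumption here, and every number in the example is an invented placeholder, not a project result.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Invented placeholder values, NOT measurements from the project.
human_scores = [1.0, 2.0, 3.0, 4.0]
metric_a = [0.10, 0.20, 0.30, 0.40]  # rises with the human scores
metric_b = [0.40, 0.30, 0.20, 0.10]  # falls as the human scores rise

r_a = pearson(human_scores, metric_a)
r_b = pearson(human_scores, metric_b)
print(round(r_a, 3), round(r_b, 3))
```

A metric that tracks human judgments well yields a coefficient near 1, which is the sense in which competing metrics can be ranked.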

12 LaDEva

13 LaDEva Karolina Owczarzak’s PhD thesis 2008

14 GramLab (2001 – 2008)
- Automatic Annotation of Penn-II Treebank with LFG F-Structures ( )
  Enterprise Ireland funded Basic Research Project
  Team: PI: Josef van Genabith, CoI: Andy Way, Aoife Cahill, Mairead McCarthy, Mick Burke, Ruth O’Donovan
- GramLab: Chinese, Japanese, Arabic, Spanish, French, German, English ( )
  Science Foundation Ireland funded Principal Investigatorship
  Team: PI: Josef van Genabith, Grzegorz Chrupala, Natalie Schluter, Ines Rehbein, Yuqing Guo, Masanori Oya, Amine Akrout, Dr. Aoife Cahill, Dr. Yaffa Al-Raheb, Dr. Deirdre Hogan, Dr. Sisay Adafre, Dr. Lamia Tounsi, Dr. Mohammed Attia

15 GramLab (2001 – 2008): Basic Idea
- Handcrafting deep wide-coverage grammars is time-consuming, expensive and difficult to scale to unrestricted text.
- Acquire grammars automatically from treebanks => shallow grammars
- New: acquire deep grammars automatically from treebanks

16 GramLab
- Shallow Grammar: defines a language as a set of strings and associates syntactic structure with each string
- Deep Grammar: shallow grammar + maps strings to information (meaning, dependencies, predicate argument structure: “who did what to whom”) + non-local dependency resolution
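To make the contrast concrete, the sketch below stores a shallow analysis as a bracketed tree and a deep analysis as an attribute-value structure in the spirit of an LFG f-structure, then reads "who did what to whom" style dependencies off the latter. The tree, the attribute names and the `to_dependencies` helper are hand-made assumptions for illustration. They are not output of the GramLab parsers.

```python
from typing import Any, Dict, List, Tuple

# Shallow analysis: syntactic structure over the string only.
c_structure = ("S", ("NP", "John"), ("VP", ("V", "resigned"), ("ADVP", "yesterday")))

# Deep analysis: attribute-value structure with predicate-argument information.
f_structure: Dict[str, Any] = {
    "PRED": "resign<SUBJ>",
    "TENSE": "past",
    "SUBJ": {"PRED": "John"},
    "ADJUNCT": [{"PRED": "yesterday"}],
}


def lemma(fs: Dict[str, Any]) -> str:
    """Strip the argument list from a PRED value, e.g. 'resign<SUBJ>' -> 'resign'."""
    return fs["PRED"].split("<")[0]


def to_dependencies(fs: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Flatten nested structures into (label, head, dependent) triples."""
    deps: List[Tuple[str, str, str]] = []
    head = lemma(fs)
    for attr, value in fs.items():
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, dict):  # atomic features such as TENSE are skipped
                deps.append((attr.lower(), head, lemma(child)))
                deps.extend(to_dependencies(child))
    return deps


dependencies = to_dependencies(f_structure)
print(dependencies)
```

The bracketed tree records only constituency, while the attribute-value structure states directly that John is the subject of "resign".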

17 GramLab

18 GramLab
- Probabilistic Parsing & Probabilistic Generation
- Used in MT Evaluation (Karo), Question Answering System (Sisay)
- Outperforms best hand-crafted resources (XLE, RASP) for English
- Lots of publications, including 2 Computational Linguistics Journal Papers, 6 ACL, COLING, EMNLP Papers ( )
  - Aoife Cahill, Michael Burke, Ruth O'Donovan, Stefan Riezler, Josef van Genabith and Andy Way (2008) Wide-Coverage Deep Statistical Parsing using Automatic Dependency Structure Annotation, Computational Linguistics.
  - Ruth O'Donovan, Michael Burke, Aoife Cahill, Josef van Genabith and Andy Way (2005) Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks, Computational Linguistics.
- Transfer-based probabilistic data-driven MT … (Yvette Graham)
- LORG industry strength parsers and generators for IE/IR & QA (Jennifer & Deirdre)