1 Previous MT Work & GramLab. Josef van Genabith & Andy Way. Topics: TransBooster; LaDEva: Labelled Dependency-Based MT Evaluation; GramLab.
2 TransBooster TransBooster: Enterprise Ireland-funded Basic Research Project. PI: Josef van Genabith. Col: Andy Way. Students: Bart Mellebeek, Anna Khasin, Karolina Owczarzak.
3 TransBooster TransBooster Basic Idea: MT systems are better on short (= simple) sentences than on longer ones. Capitalise on this! Divide up long sentences (automatically) into shorter components; feed those components to the MT system; translate (get better results for shorter components); put the (better) translations together in the target (= get a better translation). A bit like Controlled Language, but automatic and without the restrictions (to particular syntax etc.)!
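The divide / translate / recombine loop can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `split_into_chunks`, `translate_in_chunks` and `toy_mt` are hypothetical names, the splitter simply breaks at commas, and `toy_mt` is a stand-in for a real MT engine. TransBooster itself relies on a parser and head and argument/adjunct finding rules, which this sketch does not attempt to reproduce.

```python
from typing import Callable, List


def split_into_chunks(sentence: str, max_words: int = 8) -> List[str]:
    """Hypothetical splitter: break at commas, then merge neighbouring
    pieces while the merged chunk stays within max_words.
    (A stand-in for parser-based decomposition, not the real method.)"""
    pieces = [p.strip() for p in sentence.split(",") if p.strip()]
    chunks: List[str] = []
    for piece in pieces:
        if chunks and len(chunks[-1].split()) + len(piece.split()) <= max_words:
            chunks[-1] = f"{chunks[-1]}, {piece}"
        else:
            chunks.append(piece)
    return chunks


def translate_in_chunks(
    sentence: str,
    translate: Callable[[str], str],
    max_words: int = 8,
) -> str:
    """Translate each short chunk separately, then glue the results
    back together in the original order."""
    chunks = split_into_chunks(sentence, max_words)
    return ", ".join(translate(chunk) for chunk in chunks)


def toy_mt(text: str) -> str:
    """Placeholder 'MT system' (assumption): upper-cases its input."""
    return text.upper()
```

Calling `translate_in_chunks(sentence, toy_mt)` sends each chunk to the engine on its own. Because the engine is passed in as a plain callable, the same wrapper could in principle sit in front of any MT system, which is the sense in which the approach is wrapper technology.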
4 TransBooster TransBooster Example
5 TransBooster Wrapper technology: tricks the MT system into producing better results …
6 TransBooster TransBooster needs: good parsers; head and argument/adjunct finding rules. TransBooster with: Rule-Based MT (Systran, Logomedia); Example-Based MT (DCU system); Statistical MT (standard Aachen PBSMT); Multi-engine MT. Improves results! => full details: Bart Mellebeek’s PhD & publications
7 TransBooster Bart Mellebeek’s PhD dissertation, 2007
8 LaDEva LaDEva: Labelled Dependency-Based Evaluation for MT. Microsoft Ireland-funded Basic Research Project. PIs: Josef van Genabith/Andy Way. Student: Karolina Owczarzak
9 LaDEva Basic Idea: Automatic evaluation methods are extremely important for MT. String-based MT evaluation (BLEU etc.) unfairly penalises perfectly valid lexical variation/paraphrases and syntactic variation/paraphrases. Compare: "John resigned yesterday." vs. "Yesterday, John quit." Use labelled dependencies (instead of surface strings) for automatic evaluation.
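To make the idea concrete, here is a minimal Python sketch that scores a candidate translation against a reference by comparing sets of labelled dependency triples rather than word sequences. The triple format `(relation, head, dependent)`, the function name `dependency_f_score` and the choice of a plain F-score are assumptions for illustration; the slides do not specify LaDEva's exact scoring formula, and in practice the triples would come from a parser rather than being written by hand.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumed representation: (relation, head lemma, dependent lemma).
Triple = Tuple[str, str, str]


def dependency_f_score(candidate: Iterable[Triple],
                       reference: Iterable[Triple]) -> float:
    """F-score over labelled dependency triples (illustrative sketch).

    Word order does not matter: only which labelled relations hold.
    """
    cand, ref = Counter(candidate), Counter(reference)
    # Multiset intersection counts each triple at most as often
    # as it occurs in both analyses.
    matched = sum((cand & ref).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(cand.values())
    recall = matched / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hand-written (hypothetical) analyses of two word orders:
# "John resigned yesterday." / "Yesterday, John resigned."
order_a = [("subj", "resign", "john"), ("adjunct", "resign", "yesterday")]
order_b = [("adjunct", "resign", "yesterday"), ("subj", "resign", "john")]
```

With these analyses `dependency_f_score(order_a, order_b)` is 1.0, because the reordering leaves the dependencies unchanged. Replacing "resign" with "quit" would still score 0.0 here, since the lexical items differ; handling that case needs a synonym resource on top of the dependency match.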
10 LaDEva LaDEva example (syntactic variation): Use WordNet and PBSMT alignments for lexical variation …
11 LaDEva LaDEva needs: very (!) robust dependency parsers that can parse MT output (as opposed to grammatical language): DCU GramLab treebank-based LFG parsers, Microsoft parsers; WordNet, PBSMT alignments. Evaluate LaDEva alongside BLEU, NIST, GTM and Meteor in terms of correlation with human judgments.
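Correlation with human judgments is commonly measured with a correlation coefficient between metric scores and human scores over the same set of translations. The sketch below computes Pearson's r in plain Python; the choice of Pearson (rather than, say, a rank correlation) and the function name `pearson` are assumptions, since the slides do not say which coefficient was used.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists,
    e.g. automatic metric scores and human judgments per segment."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)
```

A metric whose scores rise and fall with the human scores yields a value near 1; comparing that value across metrics gives a ranking of how well each tracks human judgment.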
12 LaDEva
13 LaDEva Karolina Owczarzak’s PhD thesis 2008
14 GramLab GramLab (2001 – 2008) - Automatic Annotation of Penn-II Treebank with LFG F-Structures: Enterprise Ireland-funded Basic Research Project. Team: PI: Josef van Genabith, Col: Andy Way, Aoife Cahill, Mairead McCarthy, Mick Burke, Ruth O’Donovan - GramLab: Chinese, Japanese, Arabic, Spanish, French, German, English: Science Foundation Ireland-funded Principal Investigatorship. Team: PI: Josef van Genabith, Grzegorz Chrupala, Natalie Schluter, Ines Rehbein, Yuqing Guo, Masanori Oya, Amine Akrout, Dr. Aoife Cahill, Dr. Yaffa Al-Raheb, Dr. Deirdre Hogan, Dr. Sisay Adafre, Dr. Lamia Tounsi, Dr. Mohammed Attia
15 GramLab GramLab (2001 – 2008) Basic Idea: Handcrafting deep, wide-coverage grammars is time-consuming, expensive and difficult to scale to unrestricted text. Acquire grammars automatically from treebanks => shallow grammars. New: acquire deep grammars automatically from treebanks.
16 GramLab Shallow Grammar: defines a language as a set of strings and associates a syntactic structure with each string. Deep Grammar: shallow grammar + maps strings to information (meaning, dependencies, predicate argument structure – “who did what to whom”) + non-local dependency resolution.
17 GramLab
18 GramLab Probabilistic Parsing & Probabilistic Generation. Used in MT Evaluation (Karo), Question Answering System (Sisay). Outperforms best hand-crafted resources (XLE, RASP) for English. Lots of publications, including 2 Computational Linguistics journal papers and 6 ACL, COLING, EMNLP papers: Aoife Cahill, Michael Burke, Ruth O'Donovan, Stefan Riezler, Josef van Genabith and Andy Way (2008) Wide-Coverage Deep Statistical Parsing using Automatic Dependency Structure Annotation, Computational Linguistics; Ruth O'Donovan, Michael Burke, Aoife Cahill, Josef van Genabith and Andy Way (2005) Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks, Computational Linguistics. Transfer-based probabilistic data-driven MT … (Yvette Graham). LORG: industry-strength parsers and generators for IE/IR & QA (Jennifer & Deirdre)