Current Trends in MT Andy Way NCLT, School of Computing, Dublin City University, Dublin 9, Ireland


1 Current Trends in MT. Andy Way, NCLT, School of Computing, Dublin City University, Dublin 9, Ireland. away@computing.dcu.ie, www.nclt.dcu.ie/mt/

2 NCLT, Dublin, April 2007. Overview of Talk: Current Trends; From EACL-06 to ACL-07; Topics; Country of Origin; Ongoing and Future Work at DCU; Other Important Research; Future General Directions; Increased convergence within MT; Increased convergence between MT and rest of NLP; Concluding Remarks

3 Current Trends. EACL-06 MT Track featured 24 papers in a number of areas: SMT 8; Evaluation 5; Word Alignment 4; Applications 2; Lexicon/WSD 1; RBMT 1; EBMT 1; Corpus Building 1; Hybrid MT 1

4 Current Trends: Country of Origin. Of the 24 MT papers: 18 (75%) were from Europe (6 from UK, 6 from Spain, 3 from Germany, 1 each from Romania, Italy & Ireland); 6 (25%) were from N. America (5 from USA); 0 were from Asia

5 Current Trends: Success Rates (by Country). Of the 24 MT papers, 7 (29%) were accepted (general EACL acceptance rate 19.7%: 52/264): 2 from USA (out of 5); 2 from Germany (out of 3); 1 from UK (out of 6); 1 from Romania (out of 1); 1 from Canada (out of 1)
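
The percentages on this slide can be checked with a few lines of arithmetic. This is a minimal sketch in Python; the helper name `acceptance_rate` is introduced here purely for illustration, and the counts are the ones quoted on the slide.

```python
def acceptance_rate(accepted: int, submitted: int) -> float:
    """Return the acceptance rate as a percentage, rounded to one decimal."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return round(100.0 * accepted / submitted, 1)


# Counts quoted on the slide for EACL-06.
mt_rate = acceptance_rate(7, 24)          # MT papers
overall_rate = acceptance_rate(52, 264)   # all EACL-06 papers

print(f"MT papers: {mt_rate}%  overall: {overall_rate}%")
```

On these figures the MT papers did somewhat better than the conference as a whole, which is the comparison the slide is drawing.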

6 Current Trends: Success Rates (by Topic). Of the 7 accepted MT papers: 2 were on SMT (out of 8); 2 were on word alignment (out of 4); 2 were on evaluation (out of 5); 1 was on hybrid MT (out of 1)

7 Current Trends. ACL-07 MT Track features 67 papers in a number of areas: SMT 29; Word Alignment 10; Evaluation 9; Lexicon/WSD 6; Tree-String 4; RBMT 3; EBMT 2; Corpus Building 2; Hybrid MT 1; Applications 1

8 Current Trends. ACL-07 SMT Track features 29 papers in a number of areas: General Issues 11; Reordering 5; Parsing/Structure 5; Phrases 3; LM 2; Decoding 2; Sent. Alignment 1

9 Current Trends: Summary of Themes. Of the 67 MT papers: 54 (80%) involve corpus-based MT; 9 (13%) involve evaluation; 3 (4%) involve RBMT

10 Current Trends: Country of Origin. Of the 67 MT papers: 32 (48%) are from Asia; 19 (28%) are from N. America (18 from USA); 16 (24%) are from Europe

11 Current Trends: Country of Origin. Of the 32 papers from Asia: China 20; Taiwan 3; Japan 3; India 2; Hong Kong 1; Korea 1; Thailand 1; Singapore 1

12 Current Trends: Country of Origin. Of the 16 papers from Europe: Spain 3; Ireland 3; UK 2; Germany 2; France 1; Italy 1; Denmark 1; Turkey 1; Czech Rep. 1; Hungary 1

13 Change 06-07 (by Topic)
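
The chart for this slide did not survive in the transcript. The sketch below recomputes one plausible view of it from the topic counts quoted on the EACL-06 and ACL-07 slides: the raw counts side by side, plus the change in each topic's share of the track in percentage points. Whether the original chart showed counts or shares is not known, so treat this as an assumption; the function names are hypothetical.

```python
# Topic counts as quoted on the EACL-06 and ACL-07 slides.
eacl06 = {
    "SMT": 8, "Evaluation": 5, "Word Alignment": 4, "Applications": 2,
    "Lexicon/WSD": 1, "RBMT": 1, "EBMT": 1, "Corpus Building": 1,
    "Hybrid MT": 1,
}
acl07 = {
    "SMT": 29, "Word Alignment": 10, "Evaluation": 9, "Lexicon/WSD": 6,
    "Tree-String": 4,  # the connective symbol is lost in the transcript
    "RBMT": 3, "EBMT": 2, "Corpus Building": 2, "Hybrid MT": 1,
    "Applications": 1,
}


def shares(counts):
    """Convert raw counts into percentage shares of the track."""
    total = sum(counts.values())
    return {topic: 100.0 * n / total for topic, n in counts.items()}


def change_by_topic(before, after):
    """Map topic -> (count before, count after, share change in points)."""
    s_before, s_after = shares(before), shares(after)
    topics = sorted(set(before) | set(after))
    return {
        t: (before.get(t, 0), after.get(t, 0),
            round(s_after.get(t, 0.0) - s_before.get(t, 0.0), 1))
        for t in topics
    }


changes = change_by_topic(eacl06, acl07)
for topic, (n06, n07, delta) in changes.items():
    print(f"{topic:16s} {n06:3d} -> {n07:3d}  ({delta:+.1f} points)")
```

Comparing shares rather than raw counts matters here because the track grew from 24 to 67 papers, so almost every topic rises in absolute terms.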

14 Change 06-07 (by Country)

15 Current Trends: Success Rates (by Country). Of the 67 MT papers, 17 were accepted (25.4%; overall acceptance rate 22.4%), from the following countries: USA 8 (out of 18); China 3 (out of 20); Ireland 2 (out of 3); UK 2 (out of 2); Canada 1 (out of 1); Singapore 1 (out of 1)

16 Current Trends: Success Rates (by Topic). Of the 17 successful MT papers: 3 were on language modelling/decoding; 2 were on evaluation; 2 were on word alignment; 2 were on reordering; 1 was on word-sense disambiguation; 1 was on tree-string models; 1 was on SMT via pivot languages; 1 was on multi-parallel corpora; 1 was on hybrid MT; 1 was on transductive learning

17 Consequences of these Trends. The ‘system’ is at breaking point. Do we need a pre-selection phase? As in many other areas, a ‘new world order’ is emerging. There is very little internal QA as yet. Standard of English and basic structure is lacking. But … they’re doing OK already, and they’ll improve! Relatively few ‘world centres’ in MT at present. Despite massive increase in MT use, big decrease in teaching of MT: a paradox!

18 Ongoing Work in DCU. Integrating Syntax into SMT: supertag translation and target language models; adding source language information; tree-to-tree translation (DOT, LFG-DOT; also tree-string models), inc. porting monolingual parsing techniques to the bilingual case. Applications: automatic translation of DVD subtitles; sign-language MT; large-scale open evaluation (inc. parallel computation). New Language Pairs, Corpora etc.

19 System Development (system: language pairs, number of sentence pairs). Gaijin ‘97: EN-DE, 1836. wEBMT ‘03: FR-EN, 219,000 (Penn-II NPs & VPs). TMI-04: FR-EN, 203,000. ACL-05: FR-EN, 322,000. MaTreX OpenLab: ES-EN, 958,000. MaTreX NIST-06: Chinese-EN and Arabic-EN, 3,000,000.

20 Ongoing Work in DCU (cont’d). Dependency- (and Semantically) Marked-Up Corpora; New models of Word Alignment; New integrated models of subtree/substring alignment; New dependency-based Evaluation metrics; New Decoders (EBMT, Memory-Based); Open-Source Components

21 Ongoing Work in DCU (cont’d). Collaborative work: Tilburg (Memory-based Decoding); Donostia (Basque MT); Aachen (Sign-Language MT); Amsterdam (Integrating Syntax & SMT); St Andrews (DOT); Edinburgh (SMT); CMU (Hybrid SMT-EBMT)

22 Future Work in DCU. Spoken Language Translation

23 Future Work in DCU. MT via SMS; Automatic Interpreting; Enhanced hybrid models; Scalability; Tuning MT to text type & genre; MT using Pivot languages (‘triangulation’); Better quality phrases (cf. CoNLL monolingual chunking shared task); …
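
The slide does not say how pivot-language translation would be done. One common formulation of ‘triangulation’ builds a source-to-target phrase table by summing over pivot phrases, p(tgt | src) ≈ Σ p(pivot | src) · p(tgt | pivot). The sketch below illustrates that idea only; it is not a description of the DCU system, and the function name and the toy probabilities are made up for the example.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Combine two phrase tables through a pivot language.

    Each table maps a phrase to {translation: probability}. The result
    approximates p(tgt | src) as the sum over pivot phrases of
    p(pivot | src) * p(tgt | pivot).
    """
    src_to_tgt = {}
    for src, pivots in src_to_pivot.items():
        scores = defaultdict(float)
        for pivot, p_pivot in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for tgt, p_tgt in pivot_to_tgt.get(pivot, {}).items():
                scores[tgt] += p_pivot * p_tgt
        if scores:
            src_to_tgt[src] = dict(scores)
    return src_to_tgt


# Toy, made-up tables: Spanish -> English (pivot) -> German.
es_en = {"casa": {"house": 0.8, "home": 0.2}}
en_de = {
    "house": {"Haus": 0.9, "Heim": 0.1},
    "home": {"Heim": 0.6, "Zuhause": 0.4},
}

es_de = triangulate(es_en, en_de)
for tgt, p in sorted(es_de["casa"].items(), key=lambda kv: -kv[1]):
    print(f"casa -> {tgt}: {p:.2f}")
```

A target phrase reachable through several pivot phrases (here "Heim") accumulates probability from each path, which is the main appeal of the approach when direct source-target data is scarce.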

24 Future General Directions. Corpus Building (integrating syntax, semantics … discourse …): cf. data size vs. data quality …; filtering/pruning training data (‘safe’ alignments). Word Alignment. Language Modelling. Decoding. Evaluation Methods. Large-scale Open Evaluations. Further Convergence between models.

25 Dekai Wu’s 3D MT Space

26 Convergence between MT and Rest of NLP. For some time now, not many MT researchers have been doing syntax, and vice versa. With the move (back) to trees instead of strings: reconnect with the wealth of tree automata literature; get lots of implemented algorithms for free!

27 Concluding Remarks. So … there’s plenty for us still to do! Two worries: (1) MT R&D seems to be at an all-time high, yet we’re not teaching MT any more; (2) most (S)MT people come from different backgrounds, but there is a huge danger that some people are merely reinventing the wheel …

28 Thanks! The end beginning!

