Published by Chastity Cummings. Modified over 9 years ago.
1
Tapta4IPC: helping the translation of IPC definitions
Bruno Pouliquen (Bruno.Pouliquen@wipo.int)
25 Feb 2013, IPC workshop
Translation assistant for patent titles and abstracts in PATENTSCOPE: potential use in translating IPC definitions
Collaboration
2
Introduction
Statistical Machine Translation: a bottom-up approach
– No rules, no grammar, no dictionary, no terminology: only parallel texts (bitexts)
– We use an open-source system: Moses
Tapta: Translation of Patent Titles and Abstracts
– Originally built to translate patent applications
– Adapted to various applications
[Diagram: data → system]
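How can a system learn translations "with no dictionary, only bitexts"? The classic illustration is IBM Model 1, which estimates word translation probabilities purely from co-occurrence in sentence pairs using expectation-maximisation. The sketch below is a toy version (it omits Model 1's NULL word) meant only to show the principle; Moses' actual training pipeline uses its own word aligners and phrase extraction, and the function name is hypothetical.

```python
from collections import defaultdict


def train_ibm_model1(bitext, iterations=10):
    """Estimate word translation probabilities t(f | e) from sentence pairs alone.

    bitext: list of (english_tokens, french_tokens) pairs.
    Returns a dict mapping (french_word, english_word) -> t(f | e).
    """
    src_vocab = {e for src, _ in bitext for e in src}
    tgt_vocab = {f for _, tgt in bitext for f in tgt}
    # Uniform start: without a dictionary, every word pairing is equally likely.
    t = {(f, e): 1.0 / len(tgt_vocab) for f in tgt_vocab for e in src_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for src, tgt in bitext:
            for f in tgt:
                # E-step: share f's alignment mass among the source words of this pair.
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: turn expected counts back into conditional probabilities.
        t = {(f, e): (count[(f, e)] / total[e] if total[e] else 0.0) for (f, e) in t}
    return t


# Toy usage: "maison" co-occurs with "house" in two pairs, "verte" with "green" in two.
bitext = [
    ("green house".split(), "maison verte".split()),
    ("green car".split(), "voiture verte".split()),
    ("the house".split(), "la maison".split()),
]
t = train_ibm_model1(bitext)
```

After a few iterations, "maison" becomes the most probable translation of "house" and "verte" of "green", although no word pair was ever given explicitly.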
3
Tapta framework
Our system prepares the data for Moses, applies some post-processing (filtering, pruning, binarization, optimization…) and offers a Web interface for translation.
[Diagram: source-language and target-language bitexts → gather/convert data → clean → re-clean → train model → post-filter → prune → binarize → optimize → publish]
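The slide names the stages but not what each one does. As a hedged illustration of the first one, `clean`, here is a minimal sketch assuming it drops empty, overlong, badly length-mismatched and duplicate sentence pairs. The function name and thresholds are hypothetical, chosen in the spirit of the `clean-corpus-n.perl` script that ships with Moses, not taken from Tapta.

```python
def clean_bitext(pairs, max_tokens=80, max_ratio=9.0):
    """Hypothetical 'clean' stage: keep only plausible (source, target) sentence pairs."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        # Normalise whitespace so trivially different copies compare equal.
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        if not src or not tgt:
            continue  # one side is empty
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src > max_tokens or n_tgt > max_tokens:
            continue  # too long to be aligned reliably
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # lengths too different to be translations of each other
        if (src, tgt) in seen:
            continue  # exact duplicate pair
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept
```

The later stages (train-model, prune, binarize, optimize) are Moses training and model-processing steps; the slide does not give their settings.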
4
Introduction: Tapta
– In WIPO, as part of PATENTSCOPE (English, French, German, Chinese, Japanese), e.g. automatic translation of a patent application available only in Japanese: http://patentscope.wipo.int/translate/simpleTranslate.jsf?id=JP75694586&langpair=jaen
– In the United Nations (English from/into Arabic, French, Spanish, Russian and Chinese)
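The example URL suggests that the PATENTSCOPE translation page takes two query parameters: `id` (the document identifier) and `langpair` (source and target language codes concatenated, e.g. `jaen` for Japanese to English). The helper below is a minimal sketch built only on that single example; the function name is hypothetical, and whether the endpoint is still served is not stated in the talk.

```python
from urllib.parse import urlencode

# Base URL taken from the example link on the slide.
BASE = "http://patentscope.wipo.int/translate/simpleTranslate.jsf"


def translate_url(doc_id, src_lang, tgt_lang):
    """Build a translation-page URL; langpair = source code + target code (assumed format)."""
    return BASE + "?" + urlencode({"id": doc_id, "langpair": src_lang + tgt_lang})
```

For instance, `translate_url("JP75694586", "ja", "en")` reproduces the link above.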
5
Technical workflow
[Diagram: source- and target-language bitexts are sentence-split, sentence-aligned and tokenized, with filters for wrong language and for alignment score, giving bitexts aligned at sentence level; Moses training builds the phrase table and reordering model, which the Moses decoder uses together with a language model behind a translation server queried by a translation client.]
Example translations (EN → ES):
– Strengthening of forum for human dignity: legal aid → Fortalecimiento del foro para la dignidad humana – asistencia jurídica
– must respect all aspects of human dignity → debe respetar todos los aspectos de la dignidad humana
– should fully respect human dignity → se deben respetar plenamente la dignidad humana
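The workflow includes a sentence-alignment step but the talk does not say which aligner Tapta uses. The sketch below is a simplified length-based aligner in the spirit of Gale and Church (1993), assuming that translated sentences have similar character lengths. It uses raw length differences and a fixed skip penalty instead of Gale and Church's probabilistic cost; all names and parameters are hypothetical.

```python
def align_sentences(src_sents, tgt_sents, skip_penalty=10.0):
    """Align two sentence lists by dynamic programming over character lengths.

    Allowed "beads": 1-1, 1-0, 0-1, 2-1, 1-2. A bead costs the absolute difference
    of the summed lengths, plus skip_penalty if one side is empty.
    Returns a list of (source_indices, target_indices) beads.
    """
    n, m = len(src_sents), len(tgt_sents)
    slen = [len(s) for s in src_sents]
    tlen = [len(t) for t in tgt_sents]
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    beads = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
    # Row-major order guarantees every predecessor of (i, j) is final before (i, j) expands.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = abs(sum(slen[i:ni]) - sum(tlen[j:nj]))
                if di == 0 or dj == 0:
                    c += skip_penalty  # discourage leaving a sentence unaligned
                if cost[i][j] + c < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + c
                    back[ni][nj] = (i, j)
    # Trace the cheapest path back from the end of both texts.
    path, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        path.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return path[::-1]
```

Real aligners also normalise for the typical length ratio between the two languages; a later filter, like the "score alignment / filter align." boxes in the diagram, can discard beads whose cost is too high.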
6
IPC context
– Gather data:
  – Get existing definitions
  – Add the IPC scheme (XML on the WIPO website)
  – Add a “few” texts from patents
– “Learn” the translation model
– Translate new texts
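Gathering data from the IPC scheme amounts to pulling out, for each entry, the English title and its French counterpart. The sketch below assumes a hypothetical XML layout (`entry` elements holding `title` elements with a `lang` attribute); the real IPC scheme files published by WIPO use their own element names, so the tag and attribute names would need adapting.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout for illustration only; not the real IPC scheme format.
SAMPLE = """<scheme>
  <entry id="entry-1">
    <title lang="en">Wheel guards</title>
    <title lang="fr">Couvre-roues</title>
  </entry>
  <entry id="entry-2">
    <title lang="en">Title with no French version</title>
  </entry>
</scheme>"""


def extract_title_pairs(xml_text, src="en", tgt="fr"):
    """Return (source_title, target_title) pairs for entries that have both languages."""
    root = ET.fromstring(xml_text)
    pairs = []
    for entry in root.iter("entry"):
        titles = {t.get("lang"): (t.text or "").strip() for t in entry.findall("title")}
        if titles.get(src) and titles.get(tgt):
            pairs.append((titles[src], titles[tgt]))
    return pairs
```

Entries lacking one of the two languages are skipped, since a half-pair is useless as training material.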
7
Get existing data, build parallel texts
[Figure: parallel text gathered from the IPC scheme, patent texts and existing definitions, e.g. patent WO/2013/014517: (EN) TYRE FOR VEHICLE WHEELS / (FR) PNEUMATIQUE POUR ROUES DE VÉHICULE]
Bitext (training material):
– Wheels | roues
– Wheel guards | Couvre-roues
– Tyre for vehicle wheels | Pneumatique pour roues de véhicule
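The example shows all-caps patent titles ending up in sentence case in the bitext ("TYRE FOR VEHICLE WHEELS" becomes "Tyre for vehicle wheels"). The sketch below assumes that normalisation plus deduplication, then writes the two line-aligned files that Moses training takes as input (line i of one file translates line i of the other). Function names and the corpus stem are hypothetical; the simple case rule would also lowercase acronyms inside all-caps titles.

```python
from pathlib import Path


def normalise_case(text):
    """Collapse whitespace and turn an all-caps title into sentence case."""
    text = " ".join(text.split())
    if text.isupper():
        return text[:1] + text[1:].lower()
    return text


def build_bitext(pairs):
    """Normalise and deduplicate (english, french) pairs gathered from all sources."""
    seen = set()
    bitext = []
    for en, fr in pairs:
        pair = (normalise_case(en), normalise_case(fr))
        if all(pair) and pair not in seen:  # skip pairs with an empty side, and repeats
            seen.add(pair)
            bitext.append(pair)
    return bitext


def write_bitext(bitext, stem, src="en", tgt="fr"):
    """Write line-aligned files <stem>.<src> and <stem>.<tgt>."""
    src_path, tgt_path = Path(f"{stem}.{src}"), Path(f"{stem}.{tgt}")
    with src_path.open("w", encoding="utf-8") as fs, tgt_path.open("w", encoding="utf-8") as ft:
        for en, fr in bitext:
            fs.write(en + "\n")
            ft.write(fr + "\n")
    return src_path, tgt_path
```

Hypothetical usage: `write_bitext(build_bitext(scheme_pairs + definition_pairs + patent_pairs), "ipc-corpus")`.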
8
How well does it work?
Automatic evaluation: BLEU score
– Principle: n-gram similarity between the evaluated (machine-translated) sentences and reference translations
– On IPC definitions, English-French: BLEU = 48% (without patent data: 44%)
– Good quality still requires human post-editing
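To make the BLEU principle concrete, here is a minimal corpus-level BLEU in the standard formulation of Papineni et al. (2002): the geometric mean of clipped 1- to 4-gram precisions, times a brevity penalty for output shorter than the reference. It assumes one reference per sentence and pre-tokenized input, with no smoothing. The talk does not say which BLEU tool produced the 48%/44% figures; published scores usually come from standard scripts, and tokenization choices change the numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU in [0, 1] for token-list hypotheses, one reference each."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clipping: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match: the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)
```

A score of 48% corresponds to 0.48 on this scale; a perfect match with the reference scores 1.0.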
9
Tapta4IPC prototype (1)
Live demo: http://patentscope.wipo.int/translateUN/translateIPC.jsf
10
Tapta4IPC prototype (2)
http://fulty3.wipo.int:8080/Wtapta/translateIPC.jsf
11
Conclusion / future work
– This is a prototype, but the quality already looks acceptable
– Human evaluation?
– Better integration of the tool, e.g. in PCA6TRANSDEF?
– Other languages?
12
Tapta4IPC in various languages
Tapta4IPC should work reasonably well for the following languages (we have built some language-specific tools and we have patent corpora): German, Japanese, Korean, Spanish, Dutch, Portuguese, Chinese, Russian
More challenging:
– Czech, Slovak, Polish (many word forms; is there enough training corpus?)
– Estonian (even more word forms; would in theory require a larger training corpus)
Other languages: Arabic, Italian, Danish, Swedish, etc.
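Why do "many word forms" call for more training data? One rough indicator is the out-of-vocabulary (OOV) rate: the share of words in new text never seen in training. A phrase-based system cannot translate a form it has never seen, and an inflected noun may appear in several case forms where English uses one. The function and toy examples below are illustrative only, not measurements from Tapta.

```python
def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training tokens."""
    vocab = set(train_tokens)
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


# Toy illustration: Polish "koło" (wheel) in four cases vs. English "wheel".
print(oov_rate(["koło", "koła"], ["kołem", "kole"]))  # both test forms unseen
print(oov_rate(["wheel", "wheel"], ["wheel", "wheel"]))  # single form, always seen
```

With the same amount of training text, a language with more forms per word leaves more test words unseen, which is the intuition behind needing a larger corpus for Czech, Slovak, Polish and, even more so, Estonian.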
13
Thank you for your attention شكرا لكم على اهتمامكم Merci pour votre attention! 感谢您的关注 Grazie per la vostra attenzione! ¡Gracias por su atención! Vielen Dank für Ihre Aufmerksamkeit! Obrigado pela vossa atenção! Dziękuję bardzo za Państwa uwagę! Děkujeme za Vaši pozornost! Ďakujem ti veľmi pekne za tvoju pozornosť Tänan tähelepanu eest! Благодарим за Вашето внимание! Tak for Jeres opmærksomhed! Thank you for your attention!