Published by Philip Farmer. Modified over 9 years ago.
1
machine translation the Wiki way
Bittlingmayer Adam Mathias
27 February 2007
University of Washington
LING 575 – Machine Translation
Машинен превод Strojový překlad Maskinoversættelse Maschinelle Übersetzung Maŝintradukado Traducción automática Itzulpengintza automatiko ترجمه ماشینی Konekäännin Traduction automatique תרגום מכונה Strojno prevođenje Gépi fordítás 機械翻訳 기계 번역 Terjemahan mesin Computervertaling Maskinoversettelse Tłumaczenie maszynowe Tradução automática Traducere automată Машинный перевод Maskinöversättning การแปลภาษาอัตโนมัติ 机器翻译
2
machine translation the Wiki way
  introduction to Wikipedia
  technical details and editing
  low-density languages
  parallelness of corpora
  named entities
  other entities
  disambiguation
  categorization
  problems
  papers
3
introduction to Wikipedia en.wikipedia.org
4
introduction to Wikipedia en.wikipedia.org
Wikipedia (IPA: /ˌwiːkiːˈpiːdi.ə/ or /ˌwɪːkiːˈpiːdi.ə/) is a multilingual, Web-based, free content encyclopedia project. Wikipedia is written collaboratively by volunteers; its articles can be edited by anyone with access to the Web site.
5
introduction to Wikipedia en.wikipedia.org
  the Wiki family
  lots of languages – unevenly distributed
  lots of topics – unevenly distributed
  growing fast
  respectability
6
technical details and editing
technical details
  structure
  layout
  content
  rules
  tags and templates
  redirect and disambiguation
  markup
7
technical details and editing
editing
  anyone
  locking and blocking
  disputes
  version control
8
technical details and editing Fei_Xia example
9
low-density languages
  predictably lacking
  X-English / English-X usually good
  using related languages
10
parallelness of corpora
  degrees
  determinants of parallelness
  mapping
11
named entities
  article titles
  abbreviations and acronyms
  place names
  company names
  personal names
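The slide does not say how article titles are matched across languages. One plausible reading is that an article's interlanguage links give title pairs that can serve as named-entity translations. The sketch below assumes such links appear in the article wikitext as `[[xx:Title]]` with a two- or three-letter language code; the function name `title_translations` and the sample text are illustrative only and do not come from the talk.

```python
import re

# Assumption: interlanguage links are written in the wikitext as [[xx:Title]],
# where xx is a two- or three-letter lowercase language code.
# Other prefixed links of the same shape would also match this pattern.
INTERLANG = re.compile(r"\[\[([a-z]{2,3}):([^\]|]+)\]\]")


def title_translations(source_title, wikitext):
    """Return {source_title: {language code: linked article title}}."""
    pairs = {}
    for lang, title in INTERLANG.findall(wikitext):
        pairs[lang] = title.strip()
    return {source_title: pairs}


# Hypothetical wikitext fragment for illustration.
sample = (
    "'''Machine translation''' is a sub-field of computational linguistics.\n"
    "[[de:Maschinelle Übersetzung]]\n"
    "[[fr:Traduction automatique]]\n"
)

print(title_translations("Machine translation", sample))
```

Title pairs gathered this way would cover only entities that have an article in both languages, which matches the incompleteness concern raised later in the deck.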
12
other entities
  events
  dates
  titles
  technical terms
13
disambiguation
14
categorization
15
problems
  incompleteness
  inconsistency
  foreign words
  moving target
16
papers
  monolingual
  semantics
  errors and reliability
  WordNet
  using Wikipedia’s structure
  multilingual
  named entities
  parallel sentence generation
17
papers
parallel sentence generation
  1. compare with Babelfished version
     create aligned sentences with Babelfish
     pair off with best scoring sentence from the Wiki article
  2. bootstrap from article titles
     create aligned sentences by replacing linked words with equivalent
     translate the rest by throwing shrinking N-grams into Wiki search
     pair off with best scoring sentence from the Wiki article
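The second approach above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method from the cited papers: a plain dictionary `lookup` stands in for Wiki search, the scoring function is a simple word-overlap (Jaccard) measure chosen for brevity, and all function names are hypothetical.

```python
def ngrams_shrinking(tokens):
    """Yield (start index, n-gram) pairs from the longest n-gram to the shortest."""
    for n in range(len(tokens), 0, -1):
        for i in range(len(tokens) - n + 1):
            yield i, tuple(tokens[i:i + n])


def translate_by_lookup(tokens, lookup):
    """Translate the longest n-gram found in lookup, then recurse on both sides.

    Assumption: lookup is a dict standing in for a Wiki search that maps
    a source n-gram (tuple of words) to a target-language phrase.
    """
    if not tokens:
        return []
    for i, gram in ngrams_shrinking(tokens):
        if gram in lookup:
            left = translate_by_lookup(tokens[:i], lookup)
            right = translate_by_lookup(tokens[i + len(gram):], lookup)
            return left + [lookup[gram]] + right
    return list(tokens)  # nothing matched: leave the words untranslated


def overlap_score(a, b):
    """Jaccard overlap between the lowercase word sets of two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def best_match(candidate, article_sentences):
    """Pair the candidate with the best scoring sentence from the article."""
    return max(article_sentences, key=lambda s: overlap_score(candidate, s))


# Hypothetical data for illustration.
lookup = {
    ("maschinelle", "übersetzung"): "machine translation",
    ("ist",): "is",
    ("schwierig",): "difficult",
}
rough = " ".join(translate_by_lookup("maschinelle übersetzung ist schwierig".split(), lookup))
article = [
    "Machine translation is a difficult problem.",
    "Wikipedia is an encyclopedia.",
]
print(rough)
print(best_match(rough, article))
```

Trying the longest n-grams first lets multi-word titles such as named entities be translated as a unit before falling back to single words.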
18
conclusions
  seed or bootstrap with traditional methods
  fill holes with Wikipedia
  hybrid systems
  lots of research to be done
19
questions
  general
  Chinese
  company names
  cn/hk/tw issues
  abbreviations/acronyms
  many languages with one writing system
  using links to find word divisions