Machine translation the Wiki way. Bittlingmayer Adam Mathias, 27 February 2007, University of Washington, LING 575 – Machine Translation

Presentation transcript:

machine translation the Wiki way
Bittlingmayer Adam Mathias
27 February 2007
University of Washington LING 575 – Machine Translation
Машинен превод Strojový překlad Maskinoversættelse Maschinelle Übersetzung Maŝintradukado Traducción automática Itzulpengintza automatiko ترجمه ماشینی Konekäännin Traduction automatique תרגום מכונה Strojno prevođenje Gépi fordítás 機械翻訳 기계 번역 Terjemahan mesin Computervertaling Maskinoversettelse Tłumaczenie maszynowe Tradução automática Traducere automată Машинный перевод Maskinöversättning การแปลภาษาอัตโนมัติ 机器翻译

machine translation the Wiki way
   introduction to Wikipedia
   technical details and editing
   low-density languages
   parallelness of corpora
   named entities
   other entities
   disambiguation
   categorization
   problems
   papers

introduction to Wikipedia en.wikipedia.org

introduction to Wikipedia en.wikipedia.org Wikipedia (IPA: /ˌwiːkiːˈpiːdi.ə/ or /ˌwɪːkiːˈpiːdi.ə/) is a multilingual, Web-based, free content encyclopedia project. Wikipedia is written collaboratively by volunteers; its articles can be edited by anyone with access to the Web site.

introduction to Wikipedia en.wikipedia.org
   the Wiki family
   lots of languages – unevenly distributed
   lots of topics – unevenly distributed
   growing fast
   respectability

technical details and editing technical details structure layout content rules tags and templates redirect and disambiguation markup
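The redirect and link markup named on this slide can be illustrated with a small parser. This is a minimal sketch, assuming the standard MediaWiki forms `#REDIRECT [[Target]]` and `[[Target|label]]`; the function names `redirect_target` and `internal_links` are hypothetical, and real wikitext (nested templates, images, section anchors) needs a proper parser rather than regular expressions.

```python
import re

# Assumed markup forms: "#REDIRECT [[Target]]" and "[[Target|label]]".
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def redirect_target(wikitext):
    """Return the redirect target, or None if the page is not a redirect."""
    match = REDIRECT_RE.match(wikitext)
    return match.group(1).strip() if match else None


def internal_links(wikitext):
    """Return (target, label) pairs for every [[target]] or [[target|label]] link."""
    pairs = []
    for match in LINK_RE.finditer(wikitext):
        target = match.group(1).strip()
        # An unpiped link displays its own target as the label.
        label = match.group(2) if match.group(2) is not None else target
        pairs.append((target, label.strip()))
    return pairs
```

For MT purposes a redirect is useful as a synonym or spelling variant of its target, and a piped link gives a surface form paired with a canonical title.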

technical details and editing editing anyone locking and blocking disputes version control

technical details and editing Fei_Xia example

low-density languages predictably lacking X-English / English-X usually good using related languages

parallelness of corpora degrees determinants of parallelness mapping

named entities article titles abbreviations and acronyms place names company names personal names
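One way article titles can yield named-entity translations is through interlanguage links. The sketch below assumes the article's wikitext carries links of the form `[[de:Titel]]`; the function name `title_translations` and the language-code pattern are illustrative assumptions, and the pattern is a heuristic that could also match other short lowercase prefixes.

```python
import re

# Heuristic: a lowercase 2-3 letter code, optionally with hyphenated parts
# (for example "zh-min-nan"), followed by a colon and the foreign title.
LANG_LINK_RE = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\[\]|]+)\]\]")


def title_translations(wikitext):
    """Map language code -> article title in that language."""
    translations = {}
    for code, title in LANG_LINK_RE.findall(wikitext):
        # Keep the first link seen for each language.
        translations.setdefault(code, title.strip())
    return translations
```

Applied to an article on a person, place, or company, the resulting dictionary is a candidate set of translations for that name, to be checked before use.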

other entities events dates titles technical terms

disambiguation

categorization

problems incompleteness inconsistency foreign words moving target

papers monolingual semantics errors and reliability WordNet using Wikipedia’s structure multilingual named entities parallel sentence generation

papers parallel sentence generation
1. compare with Babelfished version
   create aligned sentences with Babelfish
   pair off with best scoring sentence from the Wiki article
2. bootstrap from article titles
   create aligned sentences by replacing linked words with equivalent
   translate the rest by throwing shrinking N-grams into Wiki search
   pair off with best scoring sentence from the Wiki article
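Two steps shared by both approaches can be sketched in code: pairing a candidate sentence with the best scoring sentence of the article, and generating shrinking N-grams to use as search queries. The slides do not say how sentences are scored, so Jaccard word overlap is used here purely as a stand-in; the names `jaccard`, `best_match`, and `shrinking_ngrams` are hypothetical, and whitespace tokenization is an assumption that does not hold for every language.

```python
def jaccard(a, b):
    """Word-overlap score between two sentences (an assumed scoring function)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def best_match(candidate, article_sentences):
    """Return (score, sentence) for the highest-scoring article sentence.

    Raises ValueError if article_sentences is empty.
    """
    scored = ((jaccard(candidate, s), s) for s in article_sentences)
    return max(scored, key=lambda pair: pair[0])


def shrinking_ngrams(tokens):
    """Yield N-grams from the longest down to single tokens."""
    for n in range(len(tokens), 0, -1):
        for start in range(len(tokens) - n + 1):
            yield tokens[start:start + n]
```

In use, the candidate would be the machine-translated or title-substituted sentence, and a score threshold (not specified in the slides) would decide whether the pair is kept.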

conclusions seed or bootstrap with traditional methods fill holes with Wikipedia hybrid systems lots of research to be done

questions general Chinese company names cn/hk/tw issues abbreviations/acronyms many languages with one writing system using links to find word divisions
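The last question, using links to find word divisions, can be made concrete with a small sketch. It assumes that in a language written without spaces each linked span marks one word or term, so its edges are known boundaries; the function name `link_boundaries` is hypothetical and the example sentence in the usage note is invented for illustration.

```python
import re

# Assumed link forms: [[target]] and [[target|label]].
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def link_boundaries(wikitext):
    """Split text into (segment, is_linked) pairs at link edges.

    Linked segments are treated as known words; unlinked segments
    still need a segmenter.
    """
    segments = []
    position = 0
    for match in LINK_RE.finditer(wikitext):
        if match.start() > position:
            segments.append((wikitext[position:match.start()], False))
        # The visible label is the pipe text if present, else the target.
        label = match.group(2) or match.group(1)
        segments.append((label, True))
        position = match.end()
    if position < len(wikitext):
        segments.append((wikitext[position:], False))
    return segments
```

For example, `link_boundaries("我喜欢[[机器翻译]]和[[维基百科]]")` separates the two linked terms from the unlinked text around them, which gives a segmenter some fixed boundaries to work from.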