Named Entities in Domain Unlimited Speech Translation Alex Waibel, Stephan Vogel, Tanja Schultz Carnegie Mellon University Interactive Systems Labs.

Slides:



Advertisements
Similar presentations
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Advertisements

The Application of Machine Translation in CADAL Huang Chen, Chen Haiying Zhejiang University Libraries, Hangzhou, China
Problems for Statistical MT Preprocessing Language modeling Translation modeling Decoding Parameter optimization Evaluation.
Chinese Word Segmentation Method for Domain-Special Machine Translation Su Chen; Zhang Yujie; Guo Zhen; Xu Jin’an Beijing Jiaotong University.
Speech and Language Technologies in the Next Generation Localisation CSET Prof. Andy Way, School of Computing, DCU.
Machine Translation (Level 2) Anna Sågvall Hein GSLT Course, September 2004.
Center for Computational Learning Systems Independent research center within the Engineering School NLP people at CCLS: Mona Diab, Nizar Habash, Martin.
Tanja Schultz, Alan Black, Bob Frederking Carnegie Mellon University West Palm Beach, March 28, 2003 Towards Dolphin Recognition.
Jumping Off Points Ideas of possible tasks Examples of possible tasks Categories of possible tasks.
“Applying Morphology Generation Models to Machine Translation” By Kristina Toutanova, Hisami Suzuki, Achim Ruopp (Microsoft Research). UW Machine Translation.
Course Summary LING 575 Fei Xia 03/06/07. Outline Introduction to MT: 1 Major approaches –SMT: 3 –Transfer-based MT: 2 –Hybrid systems: 2 Other topics.
1 The Web as a Parallel Corpus  Parallel corpora are useful  Training data for statistical MT  Lexical correspondences for cross-lingual IR  Early.
An innovative platform to allow translation and indexing of internet sites Localization World
Search is not only about the Web An Overview on Printed Documents Search and Patent Search Walid Magdy Centre for Next Generation Localisation School of.
Prototype & Design Computer Inputs. How to Prototype & Design Computer Inputs Step 1: Review Input Requirements Step 2: Select the GUI Controls Step 3:
Encouraging Consistent Translation Choices Ferhan Ture, Douglas W. Oard, Philip Resnik University of Maryland NAACL-HLT’12 June 5,
® Automatic Scoring of Children's Read-Aloud Text Passages and Word Lists Klaus Zechner, John Sabatini and Lei Chen Educational Testing Service.
Word-subword based keyword spotting with implications in OOV detection Jan “Honza” Černocký, Igor Szöke, Mirko Hannemann, Stefan Kombrink Brno University.
Machine Translation Dr. Radhika Mamidi. What is Machine Translation? A sub-field of computational linguistics It investigates the use of computer software.
The Evolution of Shared-Task Evaluation Douglas W. Oard College of Information Studies and UMIACS University of Maryland, College Park, USA December 4,
Transliteration Transliteration CS 626 course seminar by Purva Joshi Mugdha Bapat Aditya Joshi Manasi Bapat
Direct Translation Approaches: Statistical Machine Translation
An Integrated Approach for Arabic-English Named Entity Translation Hany Hassan IBM Cairo Technology Development Center Jeffrey Sorensen IBM T.J. Watson.
Profile The METIS Approach Future Work Evaluation METIS II Architecture METIS II, the continuation of the successful assessment project METIS I, is an.
CLEF 2005: Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists Luo Si & Jamie Callan Language Technology Institute School of Computer.
AnswerBus Question Answering System Zhiping Zheng School of Information, University of Michigan HLT 2002.
2012: Monolingual and Crosslingual SMS-based FAQ Retrieval Johannes Leveling CNGL, School of Computing, Dublin City University, Ireland.
17.0 Distributed Speech Recognition and Wireless Environment References: 1. “Quantization of Cepstral Parameters for Speech Recognition over the World.
Recent Major MT Developments at CMU Briefing for Joe Olive February 5, 2008 Alon Lavie and Stephan Vogel Language Technologies Institute Carnegie Mellon.
02/19/13English-Indian Language MT (Phase-II)1 English – Indian Language Machine Translation Anuvadaksh Phase – II - The SMT Team, CDAC Mumbai.
A Phonetic Search Approach to the 2006 NIST Spoken Term Detection Evaluation Roy Wallace, Robbie Vogt and Sridha Sridharan Speech and Audio Research Laboratory,
Improving out of vocabulary name resolution The Hanks David Palmer and Mari Ostendorf Computer Speech and Language 19 (2005) Presented by Aasish Pappu,
14/12/2009ICON Dipankar Das and Sivaji Bandyopadhyay Department of Computer Science & Engineering Jadavpur University, Kolkata , India ICON.
Alignment of Bilingual Named Entities in Parallel Corpora Using Statistical Model Chun-Jen Lee Jason S. Chang Thomas C. Chuang AMTA 2004.
NRC Report Conclusion Tu Zhaopeng NIST06  The Portage System  For Chinese large-track entry, used simple, but carefully- tuned, phrase-based.
UNSUPERVISED CV LANGUAGE MODEL ADAPTATION BASED ON DIRECT LIKELIHOOD MAXIMIZATION SENTENCE SELECTION Takahiro Shinozaki, Yasuo Horiuchi, Shingo Kuroiwa.
Xinhao Wang, Jiazhong Nie, Dingsheng Luo, and Xihong Wu Speech and Hearing Research Center, Department of Machine Intelligence, Peking University September.
Improving Named Entity Translation Combining Phonetic and Semantic Similarities Fei Huang, Stephan Vogel, Alex Waibel Language Technologies Institute School.
From Text to Image: Generating Visual Query for Image Retrieval Wen-Cheng Lin, Yih-Chen Chang and Hsin-Hsi Chen Department of Computer Science and Information.
Results of the 2000 Topic Detection and Tracking Evaluation in Mandarin and English Jonathan Fiscus and George Doddington.
Haitham Elmarakeby.  Speech recognition
Multi-level Bootstrapping for Extracting Parallel Sentence from a Quasi-Comparable Corpus Pascale Fung and Percy Cheung Human Language Technology Center,
Rapid Development in new languages Limited training data (6hrs) provided by NECTEC from 34 speakers, + 8 spks for development and test Romanization of.
1 Minimum Error Rate Training in Statistical Machine Translation Franz Josef Och Information Sciences Institute University of Southern California ACL 2003.
A New Approach for English- Chinese Named Entity Alignment Donghui Feng Yayuan Lv Ming Zhou USC MSR Asia EMNLP-04.
Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Error Analysis of Two Types of Grammar for the purpose of Automatic Rule Refinement Ariadna Font Llitjós, Katharina Probst, Jaime Carbonell Language Technologies.
Language Technologies Capability Demonstration Alon Lavie, Lori Levin, Alex Waibel Language Technologies Institute Carnegie Mellon University CATANAL Planning.
Natural Language Processing Group Computer Sc. & Engg. Department JADAVPUR UNIVERSITY KOLKATA – , INDIA. Professor Sivaji Bandyopadhyay
Multilingual Search Shibamouli Lahiri
Large Vocabulary Data Driven MT: New Developments in the CMU SMT System Stephan Vogel, Alex Waibel Work done in collaboration with: Ying Zhang, Alicia.
1 ICASSP Paper Survey Presenter: Chen Yi-Ting. 2 Improved Spoken Document Retrieval With Dynamic Key Term Lexicon and Probabilistic Latent Semantic Analysis.
Review: Review: Translating without in-domain corpus: Machine translation post-editing with online learning techniques Antonio L. Lagarda, Daniel Ortiz-Martínez,
Discriminative n-gram language modeling Brian Roark, Murat Saraclar, Michael Collins Presented by Patty Liu.
DARPA TIDES MT Group Meeting Marina del Rey Jan 25, 2002 Alon Lavie, Stephan Vogel, Alex Waibel (CMU) Ulrich Germann, Kevin Knight, Daniel Marcu (ISI)
LingWear Language Technology for the Information Warrior Alex Waibel, Lori Levin Alon Lavie, Robert Frederking Carnegie Mellon University.
Tight Coupling between ASR and MT in Speech-to-Speech Translation Arthur Chan Prepared for Advanced Machine Translation Seminar.
Search in Google's N-grams
RECENT TRENDS IN SMT By M.Balamurugan, Phd Research Scholar,
Statistical Machine Translation
Speaker : chia hua Authors : Long Qin, Ming Sun, Alexander Rudnicky
MARKETING PROCESS.
Tracking parameter optimization
Statistical Machine Translation Papers from COLING 2004
Statistical n-gram David ling.
LINGUA INGLESE 2A – a.a. 2018/2019 Computer-Aided Translation Technology LESSON 3 prof. ssa Laura Liucci –
PURE Learning Plan Richard Lee, James Chen,.
Word embeddings (continued)
AMTEXT: Extraction-based MT for Arabic
Translating Collocations for Bilingual Lexicons
Presentation transcript:

Named Entities in Domain Unlimited Speech Translation Alex Waibel, Stephan Vogel, Tanja Schultz Carnegie Mellon University Interactive Systems Labs

Nov 17, 2003ITIC MT Integration Meeting2 Objective Extraction and Translation of Arabic Named Entities from Speech Problem: –How do we do Domain-Unlimited Speech Translation? –What to do with Named Entities in Speech? Named Entities are Typically OoV’s  Recognizer will Replace it with a WRONG Word  Named Entity is Unlikely to be Handled Right Translation of Named Entities  Named Entities Frequently not in Lexicon

Nov 17, 2003ITIC MT Integration Meeting3 Approach – Speech Translation Piggy-Back on STR-DUST (NSF-ITR Project): –Speech Translation on Domain Unlimited Speech Tasks Approach: –Recognition: Statistical Speech Recognition –Consolidation: Statistical Reduction and Extraction –Translation: Statistical MT Opportunity: –Cascade of Statistical Source-Channel Models –Integration and Optimization –Combine and Compute Joint Models –Working with Errors: Lattices to Communicate between Modules

Nov 17, 2003ITIC MT Integration Meeting4 Approach – Named Entities Two Pass Decoding Strategy –OoV’s in Speech: Recover Named Entity in Dictionary –Identify Relevant Names from Very Large Name Lists –Search for Relevant New Names on Internet –Insert Named Entities in Dictionary, Iterate New Word Model –Model Unseen Words by New-Word-Model –Assign Named Entity Tag to New-Word –Bi-Lingual Named Entity Tagging Recover Named Entity –Identify Relevant Names from Translation Output –IR of Relevant Texts in Target Language –Use Transliteration Model to Update Lexicon

Nov 17, 2003ITIC MT Integration Meeting5 Input/Output Input: –Speech in source language (Arabic) –Text in source language (Arabic) Output: –English translation of transcript –English translation of extracted entities القاعدة بزعامة أسامة بن لادن الهجومين اللذين استهدفا كنيسين يهوديين في إسطنبول واللذين أسفرا عن مقتل 23 شخصا وإصابة 300 آخرين. وهدد البيان بتوجيه مزيد من الضربات للولايات المتحدة وحلفائها في جميع أنحاء العالم. NE Search and Translation Name: Abu Hafz Orgnz: al-Qaida Location: Baghdad Reco

Nov 17, 2003ITIC MT Integration Meeting6 Evaluation Correct Named Entity Detection –Word Correct from Arabic Speech –NE-Tag Correct from Arabic Transcript Correct Translation –Of Output Text (NIST, Bleu) –Of Output Named Entity

Nov 17, 2003ITIC MT Integration Meeting7 First Results NE Translation (Chinese) Test data: 887 sentences Small track NIST score Large track NIST score baseline Offline NE Online NE Online NE translation gives improvements for both tracks 2.Online NE translation works better on uncommon NE translation, and gives more improvement