CAS-IA System Description Jinhua Du CNGL July 23, 2008.

Slides:



Advertisements
Similar presentations
The University of Washington Machine Translation System for the IWSLT 2007 Competition Katrin Kirchhoff & Mei Yang Department of Electrical Engineering.
Advertisements

Statistical Machine Translation Part II – Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
TAP-ET: TRANSLATION ADEQUACY AND PREFERENCE EVALUATION TOOL Mark Przybocki, Kay Peterson, Sébastien Bronsart May LREC 2008 Marrakech, Morocco.
Automatic Identification of Cognates, False Friends, and Partial Cognates University of Ottawa, Canada University of Ottawa, Canada.
Linear Model Incorporating Feature Ranking for Chinese Documents Readability Gang Sun, Zhiwei Jiang, Qing Gu and Daoxu Chen State Key Laboratory for Novel.
Chinese Word Segmentation Method for Domain-Special Machine Translation Su Chen; Zhang Yujie; Guo Zhen; Xu Jin’an Beijing Jiaotong University.
Novel Reordering Approaches in Phrase-Based Statistical Machine Translation S. Kanthak, D. Vilar, E. Matusov, R. Zens & H. Ney ACL Workshop on Building.
Resources Distributors Linguistic Data Consortium NEMLAR (Network for Euro-Mediterranean LAnguage Resources)NEMLAR (Network for Euro-Mediterranean LAnguage.
“Applying Morphology Generation Models to Machine Translation” By Kristina Toutanova, Hisami Suzuki, Achim Ruopp (Microsoft Research). UW Machine Translation.
Minimum Error Rate Training in Statistical Machine Translation By: Franz Och, 2003 Presented By: Anna Tinnemore, 2006.
Course Summary LING 575 Fei Xia 03/06/07. Outline Introduction to MT: 1 Major approaches –SMT: 3 –Transfer-based MT: 2 –Hybrid systems: 2 Other topics.
MT Summit VIII, Language Technologies Institute School of Computer Science Carnegie Mellon University Pre-processing of Bilingual Corpora for Mandarin-English.
1 The Web as a Parallel Corpus  Parallel corpora are useful  Training data for statistical MT  Lexical correspondences for cross-lingual IR  Early.
Tapta4IPC: helping translation of IPC definitions Bruno Pouliquen 25 feb 2013, IPC workshop Translation.
A New Approach for Cross- Language Plagiarism Analysis Rafael Corezola Pereira, Viviane P. Moreira, and Renata Galante Universidade Federal do Rio Grande.
Machine translation Context-based approach Lucia Otoyo.
The CMU-UKA Statistical Machine Translation Systems for IWSLT 2007 Ian Lane, Andreas Zollmann, Thuy Linh Nguyen, Nguyen Bach, Ashish Venugopal, Stephan.
English-Persian SMT Reza Saeedi 1 WTLAB Wednesday, May 25, 2011.
Technical Report of NEUNLPLab System for CWMT08 Xiao Tong, Chen Rushan, Li Tianning, Ren Feiliang, Zhang Zhuyu, Zhu Jingbo, Wang Huizhen
Achieving Domain Specificity in SMT without Over Siloing William Lewis, Chris Wendt, David Bullock Microsoft Research Machine Translation.
Large Language Models in Machine Translation Conference on Empirical Methods in Natural Language Processing 2007 報告者:郝柏翰 2013/06/04 Thorsten Brants, Ashok.
Combining Lexical Semantic Resources with Question & Answer Archives for Translation-Based Answer Finding Delphine Bernhard and Iryna Gurevvch Ubiquitous.
An Integrated Approach for Arabic-English Named Entity Translation Hany Hassan IBM Cairo Technology Development Center Jeffrey Sorensen IBM T.J. Watson.
Localizing Prestashop eCommerce Site with Wordfast
METEOR-Ranking & M-BLEU: Flexible Matching & Parameter Tuning for MT Evaluation Alon Lavie and Abhaya Agarwal Language Technologies Institute Carnegie.
Kyoshiro SUGIYAMA, AHC-Lab., NAIST An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Question Answering Kyoshiro Sugiyama, Masahiro.
Sustainability of the work and PANL10n network: Vision beyond 2010 Regional Conference on Localized ICT Development & Dissemination Across Asia PAN Localization.
SLTU 2014 – 4th Workshop on Spoken Language Technologies for Under-resourced Languages St. Petersburg, Russia KIT – University of the State.
INSTITUTE OF COMPUTING TECHNOLOGY Bagging-based System Combination for Domain Adaptation Linfeng Song, Haitao Mi, Yajuan Lü and Qun Liu Institute of Computing.
2010 Failures in Czech-English Phrase-Based MT 2010 Failures in Czech-English Phrase-Based MT Full text, acknowledgement and the list of references in.
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Chinese Word Segmentation and Statistical Machine Translation Presenter : Wu, Jia-Hao Authors : RUIQIANG.
Enhanced Infrastructure for Creation & Collection of Translation Resources Zhiyi Song, Stephanie Strassel (speaker), Gary Krug, Kazuaki Maeda.
1 The LIG Arabic / English Speech Translation System at IWSLT07 Laurent BESACIER, Amar MAHDHAOUI, Viet-Bac LE LIG*/GETALP (Grenoble, France)
Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and.
The ICT Statistical Machine Translation Systems for IWSLT 2007 Zhongjun He, Haitao Mi, Yang Liu, Devi Xiong, Weihua Luo, Yun Huang, Zhixiang Ren, Yajuan.
Coşkun Mermer, Hamza Kaya, Mehmet Uğur Doğan National Research Institute of Electronics and Cryptology (UEKAE) The Scientific and Technological Research.
NUDT Machine Translation System for IWSLT2007 Presenter: Boxing Chen Authors: Wen-Han Chao & Zhou-Jun Li National University of Defense Technology, China.
02/19/13English-Indian Language MT (Phase-II)1 English – Indian Language Machine Translation Anuvadaksh Phase – II - The SMT Team, CDAC Mumbai.
Advanced MT Seminar Spring 2008 Instructors: Alon Lavie and Stephan Vogel.
Compact WFSA based Language Model and Its Application in Statistical Machine Translation Xiaoyin Fu, Wei Wei, Shixiang Lu, Dengfeng Ke, Bo Xu Interactive.
Chinese Word Segmentation Adaptation for Statistical Machine Translation Hailong Cao, Masao Utiyama and Eiichiro Sumita Language Translation Group NICT&ATR.
NRC Report Conclusion Tu Zhaopeng NIST06  The Portage System  For Chinese large-track entry, used simple, but carefully- tuned, phrase-based.
Improving Named Entity Translation Combining Phonetic and Semantic Similarities Fei Huang, Stephan Vogel, Alex Waibel Language Technologies Institute School.
From Text to Image: Generating Visual Query for Image Retrieval Wen-Cheng Lin, Yih-Chen Chang and Hsin-Hsi Chen Department of Computer Science and Information.
Addressing the Rare Word Problem in Neural Machine Translation
LREC 2004, 26 May 2004, Lisbon 1 Multimodal Multilingual Resources in the Subtitling Process S.Piperidis, I.Demiros, P.Prokopidis, P.Vanroose, A. Hoethker,
Imposing Constraints from the Source Tree on ITG Constraints for SMT Hirofumi Yamamoto, Hideo Okuma, Eiichiro Sumita National Institute of Information.
MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011.
1 Minimum Error Rate Training in Statistical Machine Translation Franz Josef Och Information Sciences Institute University of Southern California ACL 2003.
微软亚洲研究院汉英翻译系统 CWMT2008 评测技术报告 张冬冬 李志灏 李沐 周明 微软亚洲研究院.
Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Error Analysis of Two Types of Grammar for the purpose of Automatic Rule Refinement Ariadna Font Llitjós, Katharina Probst, Jaime Carbonell Language Technologies.
Multilingual Information Retrieval using GHSOM Hsin-Chang Yang Associate Professor Department of Information Management National University of Kaohsiung.
Large Vocabulary Data Driven MT: New Developments in the CMU SMT System Stephan Vogel, Alex Waibel Work done in collaboration with: Ying Zhang, Alicia.
A Multilingual Hierarchy Mapping Method Based on GHSOM Hsin-Chang Yang Associate Professor Department of Information Management National University of.
A Syntax-Driven Bracketing Model for Phrase-Based Translation Deyi Xiong, et al. ACL 2009.
Intelligent Database Systems Lab Presenter : YU-TING LU Authors : Hsin-Chang Yang, Han-Wei Hsiao, Chung-Hong Lee IPM Multilingual document mining.
Review: Review: Translating without in-domain corpus: Machine translation post-editing with online learning techniques Antonio L. Lagarda, Daniel Ortiz-Martínez,
An Improved Hierarchical Word Sequence Language Model Using Word Association NARA Institute of Science and Technology Xiaoyi WuYuji Matsumoto.
DARPA TIDES MT Group Meeting Marina del Rey Jan 25, 2002 Alon Lavie, Stephan Vogel, Alex Waibel (CMU) Ulrich Germann, Kevin Knight, Daniel Marcu (ISI)
Chenchen Ding, Masao Utiyama, Eiichiro Sumita
Speaker : chia hua Authors : Long Qin, Ming Sun, Alexander Rudnicky
Ankit Srivastava CNGL, DCU Sergio Penkale CNGL, DCU
Urdu-to-English Stat-XFER system for NIST MT Eval 2008
Enabling System Elements
Eiji Aramaki* Sadao Kurohashi* * University of Tokyo
Memory-augmented Chinese-Uyghur Neural Machine Translation
Domain Mixing for Chinese-English Neural Machine Translation
The XMU SMT System for IWSLT 2007
1-P-30 Speech-to-Speech Translation using Dual Learning and Prosody Conversion Zhaojie Luo, Yoichi Takashima, Tetsuya Takiguchi, and Yasuo Ariki (Kobe.
Presentation transcript:

CAS-IA System Description Jinhua Du CNGL July 23, 2008

Outline Hardware in IA Pre-process & Data MT System Configuration for Evaluation Achievements Conclusions

Hardware Machines Parallel Computing –Condor –Grid Computing Module developed by ASR group TypeOperating SystemNumberCPURAM Desktop PCWindows 20039Pentium 4, 3.0G2.0G ServerLinux (Ubuntu)1Xeon 2.0G×416.0G

Pre-process & Data Pre-processing –encoding conversion & filter –punctuation and number conversion (full-shaped -> half-shaped, etc.) –case conversion (only the initial alphabet of the initial word), abbreviation processing –Chinese word segment (ICT or IA tool), English tokenization Data for NIST –Parallel: 3.4 M (if adds UN corpus, up to 10M) –Monolingual: 3.4M + 9.6M(gigaword1&2) + 1.4M(giga3) = 14.4M Data for IWSLT –Parallel: BTEC(20K or 40K); LDC –Monolingual: BTEC; Gigaword –Data Filter: only need the high correlation data, very important for spoken evaluation (More better data, more better performance)

System Configuration Modules –Pre-processing –Alignment Post-preprocessing & Models Generation –Decoding & MER Training –System Combination & Post-Processing

Achievements (zh-en) The 3 rd MT Symposia in China ( rank 3) –Limited (830K pairs) –Unlimited (3M pairs)

Achievements (zh-en) NIST MT Eval SystemBLEU-4IBM BLEU Primary ( combination ) HPB STTB PB

Achievements (zh-en) IWSLT2008 –More systems to be combined 2 PB systems developed by CASIA Moses SAMT (CMU) Hierarchical PB BTG-based system (Xiong) –Better performance (bleu+meteor)/ 2 bleumeteor (bleu+meteor)/ 2 bleumeteor tch.CRR nlpr.CRR

Conclusions More better data, better performance System combination is very helpful to improve the performance Evaluation is different from theoretical research: empirical methods and tricks are usually more effective For better rank, should be prepare in advance and build a temporarily team for evaluation Evaluation is a horrible thing for student: more time, more energy and no paper (joke but true) Develop systems for application purpose

Thanks