Bagging-based System Combination for Domain Adaptation. Linfeng Song, Haitao Mi, Yajuan Lü and Qun Liu, Institute of Computing Technology.

Presentation transcript:

Bagging-based System Combination for Domain Adaptation
Linfeng Song, Haitao Mi, Yajuan Lü and Qun Liu
Institute of Computing Technology, Chinese Academy of Sciences

An Example (slide 2)

An Example (slide 3)
Initial MT system.

An Example (slide 4)
Development set (A: 90%, B: 10%) → initial MT system → tuned MT system that fits domain A. The translation styles of A and B are quite different.

An Example (slide 5)
Development set (A: 90%, B: 10%) → initial MT system → tuned MT system that fits domain A. Test set: A: 10%, B: 90%.

An Example (slide 6)
Development set (A: 90%, B: 10%) → tuned MT system that fits domain A, but the test set is A: 10%, B: 90%. The translation style fits A, while we mainly want to translate B.

Traditional Methods (slide 7)
Monolingual data with domain annotation.

Traditional Methods (slide 8)
Monolingual data with domain annotation → domain recognizer.

Traditional Methods (slide 9)
Bilingual training data.

Traditional Methods (slide 10)
Bilingual training data → domain recognizer → training data for domain A and training data for domain B.

Traditional Methods (slide 11)
Bilingual training data → domain recognizer → training data for domain A / domain B → MT system for domain A and MT system for domain B.

Traditional Methods (slide 12)
Test set.

Traditional Methods (slide 13)
Test set → domain recognizer → test set for domain A and test set for domain B.

Traditional Methods (slide 14)
Test set for domain A → MT system for domain A → translation result for domain A; test set for domain B → MT system for domain B → translation result for domain B.
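The routing pipeline on this slide can be sketched as follows. Everything here is an illustrative stand-in: `recognize_domain` is a toy keyword rule, and the two `translate_*` functions merely tag their input in place of real per-domain MT systems.

```python
# Hypothetical sketch of the traditional pipeline: a domain recognizer
# routes each test sentence to the MT system trained for its domain.

def recognize_domain(sentence: str) -> str:
    """Toy recognizer: pretend domain-A sentences contain 'claim'."""
    return "A" if "claim" in sentence else "B"

def translate_a(sentence: str) -> str:
    return f"[sys-A] {sentence}"   # stand-in for the domain-A MT system

def translate_b(sentence: str) -> str:
    return f"[sys-B] {sentence}"   # stand-in for the domain-B MT system

SYSTEMS = {"A": translate_a, "B": translate_b}

def translate_test_set(test_set):
    # Route each sentence to the system matching its predicted domain
    # and collect the per-sentence translation results.
    return [SYSTEMS[recognize_domain(s)](s) for s in test_set]

print(translate_test_set(["the claim covers a device", "hello world"]))
```

A classification error at `recognize_domain` sends the sentence to the wrong system, which is exactly the drawback discussed later in the talk.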

The merits (slide 15)
Simple and effective. Fits human intuition.

The drawbacks (slide 16)
Classification error (CE), especially for unsupervised methods. Supervised methods can keep CE low, but the need for annotated data limits their usage.

Our motivation (slide 17)
Break away from doing adaptation directly: statistical methods such as Bagging can help.

Preliminary (slide 18)
The general framework of Bagging.

General framework of Bagging (slide 19)
Training set D.

General framework of Bagging (slide 20)
Training set D → bootstrapped training sets D1, D2, D3, … → classifiers C1, C2, C3, …

General framework of Bagging (slide 21)
A test sample is fed to C1, C2, C3, …

General framework of Bagging (slide 22)
Test sample → results of C1, C2, C3, … → voting result.
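The framework on these slides can be sketched in a few lines: bootstrap-sample D into D1..DN, train one model per sample, then combine predictions by majority vote. The base learner below (a toy 1-nearest-neighbour rule) and the data are illustrative stand-ins; Bagging itself only needs the bootstrap-then-vote loop.

```python
import random
from collections import Counter

def bootstrap(data, rng):
    # Sample |data| items with replacement, as on slide 20.
    return [rng.choice(data) for _ in range(len(data))]

def train_1nn(sample):
    # Trivial 1-nearest-neighbour "classifier" over (x, label) pairs.
    def classify(x):
        return min(sample, key=lambda p: abs(p[0] - x))[1]
    return classify

def bag(data, n_models=15, seed=0):
    rng = random.Random(seed)
    models = [train_1nn(bootstrap(data, rng)) for _ in range(n_models)]
    def vote(x):
        # Majority vote over the ensemble, as on slide 22.
        return Counter(m(x) for m in models).most_common(1)[0][0]
    return vote

data = [(0.1, "A"), (0.2, "A"), (0.9, "B"), (1.1, "B")]
predict = bag(data)
print(predict(0.15), predict(1.0))
```

Any learner whose output varies with its training sample can replace `train_1nn`; the talk's contribution is plugging MT subsystem tuning into this slot.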

Our method (slide 23)

Training (slide 24)
Suppose there is a development set. For simplicity, it contains only 5 sentences: 3 belong to domain A and 2 to domain B (A, A, A, B, B).

Training (slide 25)
We bootstrap N new development sets, e.g. (A, B, B, B, B), (A, A, B, B, B), (A, A, A, B, B), (A, A, A, A, B), …

Training (slide 26)
For each bootstrapped set, a subsystem is tuned: MT system-1, MT system-2, MT system-3, MT system-4, MT system-5, …

Decoding (slide 27)
For simplicity, suppose only 2 subsystems have been tuned, subsystem-1 and subsystem-2, each with its own tuned weight vector W.

Decoding (slide 28)
Now a sentence "A B" needs a translation.

Decoding (slide 29)
After translation, each subsystem generates its N-best candidates: subsystem-1 produces "a b; a c" and subsystem-2 produces "a b; a d".

Decoding (slide 30)
Fuse these N-best lists and eliminate duplicates: a b; a c; a b; a d → a b; a c; a d.

Decoding (slide 31)
Candidates are identical only if their target strings and feature values are entirely equal.
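A small sketch of this fusion step. Following the identity criterion on the slide, each candidate is modelled as a (target string, feature values) pair and two candidates are duplicates only when both components match; the candidates and feature values below are made up for illustration.

```python
def fuse_nbest(nbest_lists):
    # Merge the subsystems' N-best lists in order, keeping the first
    # occurrence of each (target string, features) pair.
    seen, fused = set(), []
    for nbest in nbest_lists:
        for target, features in nbest:
            key = (target, features)      # string AND features must match
            if key not in seen:
                seen.add(key)
                fused.append((target, features))
    return fused

# Toy N-best lists for the two subsystems of the running example.
nbest_sys1 = [("a b", (-0.2, 0.5)), ("a c", (-0.3, 0.4))]
nbest_sys2 = [("a b", (-0.2, 0.5)), ("a d", (-0.4, 0.3))]

print(fuse_nbest([nbest_sys1, nbest_sys2]))
# "a b" is kept once: its string and features match across both systems
```

Note that the same target string with different feature values would survive as two entries, which is exactly what the slide's "entirely equal" condition requires.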

Decoding (slide 32)
Calculate the voting score for each fused candidate, where S represents the number of subsystems (e.g. a c: -0.1, a d: -0.18).

Decoding (slide 33)
The candidate with the highest voting score wins; here a b, produced by both subsystems, beats a c (-0.1) and a d (-0.18).

Decoding (slide 34)
The candidate with the highest voting score wins. Since the subsystems are different copies of the same model and share a single set of training data, score calibration across them is unnecessary.
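The slides give example voting scores but not the formula itself, so the sketch below is a hedged illustration: it assumes the voting score combines a candidate's model score with the fraction of subsystems that voted for it (votes/S). The combination rule, the model scores, and the vote bookkeeping are all assumptions, not the paper's exact definition.

```python
S = 2  # number of subsystems in the running example

# Fused candidates: target string -> (model score, subsystems voting
# for it). Scores are made up; "a b" appears in both N-best lists.
candidates = {
    "a b": (-0.09, 2),
    "a c": (-0.10, 1),
    "a d": (-0.18, 1),
}

def voting_score(model_score, votes, s=S):
    # Assumed combination: reward candidates that more subsystems
    # agree on, normalised by the ensemble size S.
    return model_score + votes / s

best = max(candidates, key=lambda c: voting_score(*candidates[c]))
print(best)
```

Under this assumed formula, the agreement of both subsystems lifts "a b" above the single-system candidates, matching the outcome shown on the slide.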

Experiments (slide 35)

Basic Setups (slide 36)
Data: NTCIR-9 Chinese-English patent corpus; 1k sentence pairs as the development set, another 1k pairs as the test set, and the remainder for training.
System: hierarchical phrase-based model.
Alignment: GIZA++ with grow-diag-final.

Effectiveness: Show and Prove (slide 37)
Tune 30 subsystems using Bagging, and tune 30 subsystems with random initial weights. Evaluate and compare the fusion results of the first N (N = 5, 10, 15, 20, 30) subsystems of both.

Results: 1-best (slide 38)
[Chart: score vs. number of subsystems; gain of +0.82.]

Results: 1-best (slide 39)
[Chart: score vs. number of subsystems; gain of +0.70.]

Results: Oracle (slide 40)
[Chart: oracle score vs. number of subsystems; gain of +6.22.]

Results: Oracle (slide 41)
[Chart: oracle score vs. number of subsystems; gain of +3.71.]

Compare with traditional methods (slide 42)
We evaluate a supervised method; to tackle data sparsity, it operates only on the development set and the test set. We also evaluate an unsupervised method similar to Yamada (2007); to avoid data sparsity, only the language model is domain-specific.

Results (slide 43)

Conclusions (slide 44)
We propose a Bagging-based method to address the multi-domain translation problem. Experiments show that Bagging is effective for the domain adaptation problem, and that our method clearly surpasses the baseline and even outperforms some traditional methods.

Thank you for listening. Any questions? (slide 45)