INSTITUTE OF COMPUTING TECHNOLOGY
Bagging-based System Combination for Domain Adaptation
Linfeng Song, Haitao Mi, Yajuan Lü and Qun Liu
Institute of Computing Technology, Chinese Academy of Sciences
An Example
An initial MT system is tuned on a development set (domain A: 90%, domain B: 10%), yielding a tuned MT system that fits domain A.
The translation styles of A and B are quite different.
The test set, however, is domain A: 10%, domain B: 90%.
The translation style fits A, but we mainly want to translate B.
Traditional Methods
Monolingual data with domain annotation is used to train a domain recognizer.
The domain recognizer splits the bilingual training data into training data for domain A and training data for domain B.
From these, an MT system for domain A and an MT system for domain B are trained.
The domain recognizer likewise splits the test set into a domain-A part and a domain-B part.
Each part is translated by the matching MT system, and the two translation results are combined into the final output.
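The traditional pipeline above can be sketched as follows. `recognize_domain` and the per-domain translators are hypothetical stand-ins, not the paper's actual components; the point is only the route-then-merge structure.

```python
# Hypothetical sketch of the traditional pipeline: a domain recognizer
# routes each test sentence to the MT system trained for its predicted
# domain, and the per-domain outputs are merged into one result list.

def recognize_domain(sentence):
    # Toy recognizer (illustrative only): pretend domain-A sentences
    # mention "patent".
    return "A" if "patent" in sentence else "B"

def translate_a(sentence):
    # Stand-in for the domain-A MT system.
    return f"[A-system] {sentence}"

def translate_b(sentence):
    # Stand-in for the domain-B MT system.
    return f"[B-system] {sentence}"

def traditional_pipeline(test_set):
    systems = {"A": translate_a, "B": translate_b}
    # Route each sentence to the system of its recognized domain.
    return [systems[recognize_domain(s)](s) for s in test_set]

result = traditional_pipeline(["a patent claim", "a news story"])
```

Any misrouting by `recognize_domain` here is exactly the classification error (CE) the slides criticize below.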
The merits
Simple and effective
Fits human intuition
The drawbacks
Classification error (CE), especially for unsupervised methods
Supervised methods can keep CE low, yet the need for annotated data limits their usage
Our motivation
Step away from doing adaptation directly.
Statistical methods (such as Bagging) can help.
Preliminary
The general framework of Bagging
General framework of Bagging
The training set D is bootstrap-resampled into training sets D1, D2, D3, …
A classifier is trained on each resampled set: C1, C2, C3, …
Given a test sample, each classifier Ci produces its own result.
The final output is the voting result over the results of C1, C2, C3, …
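The framework above can be sketched in a few lines: bootstrap-resample the training set, train one classifier per resample, and combine predictions by majority vote. The tiny threshold "classifier" is only a stand-in for any base learner.

```python
# Minimal sketch of the general Bagging framework.
import random
from collections import Counter

def bootstrap(data, rng):
    # Sample len(data) items with replacement (a bootstrap resample).
    return [rng.choice(data) for _ in data]

def train_threshold(data):
    # data: list of (x, label); learn the mean of x as a decision threshold.
    t = sum(x for x, _ in data) / len(data)
    return lambda x: "pos" if x >= t else "neg"

def bagging_predict(train_set, x, n_models=15, seed=0):
    rng = random.Random(seed)
    # One classifier Ci per bootstrapped training set Di.
    models = [train_threshold(bootstrap(train_set, rng))
              for _ in range(n_models)]
    # Majority vote over the classifiers' results.
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

data = [(0.1, "neg"), (0.2, "neg"), (0.9, "pos"), (1.0, "pos")]
prediction = bagging_predict(data, 0.95)
```

Because each resample differs slightly, the classifiers disagree on borderline inputs, and voting averages that variance away.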
Our method
Training
Suppose there is a development set; for simplicity it has only 5 sentences, 3 belonging to domain A and 2 to domain B: (A, A, A, B, B).
We bootstrap N new development sets, e.g. (A, B, B, B, B), (A, A, B, B, B), (A, A, A, B, B), (A, A, A, A, B), …
For each set, a subsystem is tuned: MT system-1, MT system-2, MT system-3, …
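The training step can be sketched as follows: bootstrap N new development sets from the original one, so that each tuned subsystem sees a different domain mixture. Domain labels stand in for development-set sentences; the function name is illustrative.

```python
# Sketch of the training step: bootstrap N development sets so each
# subsystem is tuned on a different domain mixture.
import random

def bootstrap_dev_sets(dev_set, n_sets, seed=42):
    rng = random.Random(seed)
    # Each new set has the same size as the original, sampled with
    # replacement, so the A/B proportions vary from set to set.
    return [[rng.choice(dev_set) for _ in dev_set] for _ in range(n_sets)]

dev = ["A", "A", "A", "B", "B"]  # the 5-sentence example: 3 from A, 2 from B
new_sets = bootstrap_dev_sets(dev, 4)
for s in new_sets:
    print(s, "-> A:", s.count("A"), "B:", s.count("B"))
# Each subsystem would then be tuned (e.g. by MERT) on one of these sets.
```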
Decoding
For simplicity, suppose only 2 subsystems have been tuned: Subsystem-1 and Subsystem-2, each with its own tuned weight vector W.
Now a sentence "A B" needs a translation.
After translation, each subsystem generates its N-best candidates: Subsystem-1 gives "a b" and "a c"; Subsystem-2 gives "a b" and "a d".
Fuse these N-best lists and eliminate duplicates. Candidates are identical only if their target strings and feature values are entirely equal, so the two "a b" candidates (which differ in feature values) are both kept.
Calculate the voting score for each candidate, where S represents the number of subsystems: a b: -0.16; a b: +0.04; a c: -0.1; a d: -0.18.
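The fusion-and-voting step might be sketched as follows. The exact scoring formula is not spelled out in this transcript; as an illustrative assumption, each candidate's model score gets a bonus proportional to the fraction of the S subsystems whose N-best lists contain the same target string, and the raw scores below are chosen so the combined scores echo the example values (-0.16, +0.04, -0.1, -0.18).

```python
# Hedged sketch of decoding-time combination: fuse the subsystems'
# N-best lists, drop duplicates (identical only when target string AND
# feature values match), then rank by an assumed voting score.

def fuse_and_vote(nbest_lists, alpha=0.1):
    S = len(nbest_lists)  # S: the number of subsystems
    # Deduplicate on (target string, feature values).
    seen, fused = set(), []
    for nbest in nbest_lists:
        for target, feats, score in nbest:
            if (target, feats) not in seen:
                seen.add((target, feats))
                fused.append((target, feats, score))
    # Count how many subsystems proposed each target string.
    support = {t: sum(any(c[0] == t for c in nb) for nb in nbest_lists)
               for t, _, _ in fused}
    # Assumed voting score: model score plus a support bonus scaled by 1/S.
    scored = [(score + alpha * support[t] / S, t, feats)
              for t, feats, score in fused]
    return max(scored)  # the candidate with the highest voting score wins

# Toy N-best lists (target string, feature values, model score).
sub1 = [("a b", (0.3, -1.0), -0.26), ("a c", (0.5, -1.2), -0.15)]
sub2 = [("a b", (0.4, -0.9), -0.06), ("a d", (0.2, -1.5), -0.23)]
best = fuse_and_vote([sub1, sub2])
```

Under this assumed bonus, "a b" benefits from being proposed by both subsystems, and its higher-scoring copy wins at +0.04.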
The one with the highest voting score wins: here "a b" with +0.04.
Since the subsystems are different copies of the same model and share the same training data, score calibration across subsystems is unnecessary.
Experiments
Basic Setups
Data: NTCIR-9 Chinese-English patent corpus
1k sentence pairs as the development set; another 1k pairs as the test set; the remaining pairs are used for training
System: hierarchical phrase-based model
Alignment: GIZA++ with grow-diag-final
Effectiveness: Show and Prove
Tune 30 subsystems using Bagging
Tune 30 subsystems with random initial weights
Evaluate the fused results of the first N (N = 5, 10, 15, 20, 30) subsystems of both settings and compare
Results: 1-best
[charts: score vs. number of subsystems; gains of +0.82 and +0.70]
Results: Oracle
[charts: score vs. number of subsystems; gains of +6.22 and +3.71]
Compare with traditional methods
Evaluate a supervised method; to tackle data sparsity, it only operates on the development set and test set
Evaluate an unsupervised method similar to Yamada (2007); to avoid data sparsity, only the language model is domain-specific
Results
[comparison table]
Conclusions
We propose a bagging-based method to address the multi-domain translation problem.
Experiments show that:
Bagging is effective for the domain adaptation problem
Our method clearly surpasses the baseline, and is even better than some traditional methods
Thank you for listening. Any questions?