INSTITUTE OF COMPUTING TECHNOLOGY
Bagging-based System Combination for Domain Adaptation
Linfeng Song, Haitao Mi, Yajuan Lü and Qun Liu
Institute of Computing Technology, Chinese Academy of Sciences
An Example
An initial MT system is tuned on a development set (domain A: 90%, domain B: 10%), yielding a tuned MT system that fits domain A.
The translation styles of A and B are quite different.
The test set, however, is domain A: 10%, domain B: 90%.
The translation style fits A, but we mainly want to translate B.
Traditional Methods
Monolingual data with domain annotation is used to train a domain recognizer.
The domain recognizer splits the bilingual training data into training data for domain A and training data for domain B.
From these, an MT system for domain A and an MT system for domain B are trained.
The domain recognizer likewise splits the test set into a domain-A part and a domain-B part.
Each part is translated by the matching MT system, and the two translation results are combined into the final output.
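The traditional pipeline above can be sketched as follows. `recognize_domain` and the per-domain translators are hypothetical stand-ins, not the paper's actual components; the point is only the route-then-merge structure.

```python
# Hypothetical sketch of the traditional pipeline: a domain recognizer
# routes each test sentence to the MT system trained for its predicted
# domain, and the per-domain outputs are merged into one result list.

def recognize_domain(sentence):
    # Toy recognizer (illustrative only): pretend domain-A sentences
    # mention "patent".
    return "A" if "patent" in sentence else "B"

def translate_a(sentence):
    # Stand-in for the domain-A MT system.
    return f"[A-system] {sentence}"

def translate_b(sentence):
    # Stand-in for the domain-B MT system.
    return f"[B-system] {sentence}"

def traditional_pipeline(test_set):
    systems = {"A": translate_a, "B": translate_b}
    # Route each sentence to the system of its recognized domain.
    return [systems[recognize_domain(s)](s) for s in test_set]

result = traditional_pipeline(["a patent claim", "a news story"])
```

Any misrouting by `recognize_domain` here is exactly the classification error (CE) the slides criticize below.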
The merits
Simple and effective
Fits human intuition
The drawbacks
Classification error (CE), especially for unsupervised methods
Supervised methods can keep CE low, yet the need for annotated data limits their usage
Our motivation
Step away from doing adaptation directly.
Statistical methods (such as Bagging) can help.
Preliminary
The general framework of Bagging
General framework of Bagging
The training set D is bootstrap-resampled into training sets D1, D2, D3, …
A classifier is trained on each resampled set: C1, C2, C3, …
Given a test sample, each classifier Ci produces its own result.
The final output is the voting result over the results of C1, C2, C3, …
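The framework above can be sketched in a few lines: bootstrap-resample the training set, train one classifier per resample, and combine predictions by majority vote. The tiny threshold "classifier" is only a stand-in for any base learner.

```python
# Minimal sketch of the general Bagging framework.
import random
from collections import Counter

def bootstrap(data, rng):
    # Sample len(data) items with replacement (a bootstrap resample).
    return [rng.choice(data) for _ in data]

def train_threshold(data):
    # data: list of (x, label); learn the mean of x as a decision threshold.
    t = sum(x for x, _ in data) / len(data)
    return lambda x: "pos" if x >= t else "neg"

def bagging_predict(train_set, x, n_models=15, seed=0):
    rng = random.Random(seed)
    # One classifier Ci per bootstrapped training set Di.
    models = [train_threshold(bootstrap(train_set, rng))
              for _ in range(n_models)]
    # Majority vote over the classifiers' results.
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

data = [(0.1, "neg"), (0.2, "neg"), (0.9, "pos"), (1.0, "pos")]
prediction = bagging_predict(data, 0.95)
```

Because each resample differs slightly, the classifiers disagree on borderline inputs, and voting averages that variance away.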
Our method
Training
Suppose there is a development set; for simplicity it has only 5 sentences, 3 belonging to domain A and 2 to domain B: (A, A, A, B, B).
We bootstrap N new development sets, e.g. (A, B, B, B, B), (A, A, B, B, B), (A, A, A, B, B), (A, A, A, A, B), …
For each set, a subsystem is tuned: MT system-1, MT system-2, MT system-3, …
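The training step can be sketched as follows: bootstrap N new development sets from the original one, so that each tuned subsystem sees a different domain mixture. Domain labels stand in for development-set sentences; the function name is illustrative.

```python
# Sketch of the training step: bootstrap N development sets so each
# subsystem is tuned on a different domain mixture.
import random

def bootstrap_dev_sets(dev_set, n_sets, seed=42):
    rng = random.Random(seed)
    # Each new set has the same size as the original, sampled with
    # replacement, so the A/B proportions vary from set to set.
    return [[rng.choice(dev_set) for _ in dev_set] for _ in range(n_sets)]

dev = ["A", "A", "A", "B", "B"]  # the 5-sentence example: 3 from A, 2 from B
new_sets = bootstrap_dev_sets(dev, 4)
for s in new_sets:
    print(s, "-> A:", s.count("A"), "B:", s.count("B"))
# Each subsystem would then be tuned (e.g. by MERT) on one of these sets.
```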
Decoding
For simplicity, suppose only 2 subsystems have been tuned: Subsystem-1 and Subsystem-2, each with its own tuned weight vector W.
Now a sentence "A B" needs a translation.
After translation, each subsystem generates its N-best candidates: Subsystem-1 gives "a b" and "a c"; Subsystem-2 gives "a b" and "a d".
Fuse these N-best lists and eliminate duplicates. Candidates are identical only if their target strings and feature values are entirely equal, so the two "a b" candidates (which differ in feature values) are both kept.
Calculate the voting score for each candidate, where S represents the number of subsystems: a b: -0.16; a b: +0.04; a c: -0.1; a d: -0.18.
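The fusion-and-voting step might be sketched as follows. The exact scoring formula is not spelled out in this transcript; as an illustrative assumption, each candidate's model score gets a bonus proportional to the fraction of the S subsystems whose N-best lists contain the same target string, and the raw scores below are chosen so the combined scores echo the example values (-0.16, +0.04, -0.1, -0.18).

```python
# Hedged sketch of decoding-time combination: fuse the subsystems'
# N-best lists, drop duplicates (identical only when target string AND
# feature values match), then rank by an assumed voting score.

def fuse_and_vote(nbest_lists, alpha=0.1):
    S = len(nbest_lists)  # S: the number of subsystems
    # Deduplicate on (target string, feature values).
    seen, fused = set(), []
    for nbest in nbest_lists:
        for target, feats, score in nbest:
            if (target, feats) not in seen:
                seen.add((target, feats))
                fused.append((target, feats, score))
    # Count how many subsystems proposed each target string.
    support = {t: sum(any(c[0] == t for c in nb) for nb in nbest_lists)
               for t, _, _ in fused}
    # Assumed voting score: model score plus a support bonus scaled by 1/S.
    scored = [(score + alpha * support[t] / S, t, feats)
              for t, feats, score in fused]
    return max(scored)  # the candidate with the highest voting score wins

# Toy N-best lists (target string, feature values, model score).
sub1 = [("a b", (0.3, -1.0), -0.26), ("a c", (0.5, -1.2), -0.15)]
sub2 = [("a b", (0.4, -0.9), -0.06), ("a d", (0.2, -1.5), -0.23)]
best = fuse_and_vote([sub1, sub2])
```

Under this assumed bonus, "a b" benefits from being proposed by both subsystems, and its higher-scoring copy wins at +0.04.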
The one with the highest voting score wins: here "a b" with +0.04.
Since the subsystems are different copies of the same model and share the same training data, score calibration across subsystems is unnecessary.
Experiments
Basic Setups
Data: NTCIR-9 Chinese-English patent corpus
1k sentence pairs as the development set; another 1k pairs as the test set; the remaining pairs are used for training
System: hierarchical phrase-based model
Alignment: GIZA++ with grow-diag-final
Effectiveness: Show and Prove
Tune 30 subsystems using Bagging
Tune 30 subsystems with random initial weights
Evaluate the fused results of the first N (N = 5, 10, 15, 20, 30) subsystems of both settings and compare
Results: 1-best
[charts: score vs. number of subsystems; gains of +0.82 and +0.70]
Results: Oracle
[charts: score vs. number of subsystems; gains of +6.22 and +3.71]
Compare with traditional methods
Evaluate a supervised method; to tackle data sparsity, it only operates on the development set and test set
Evaluate an unsupervised method similar to Yamada (2007); to avoid data sparsity, only the language model is domain-specific
Results
[comparison table]
Conclusions
We propose a bagging-based method to address the multi-domain translation problem.
Experiments show that:
Bagging is effective for the domain adaptation problem
Our method clearly surpasses the baseline, and is even better than some traditional methods
Thank you for listening. Any questions?