Instance Weighting for Domain Adaptation in NLP
Jing Jiang & ChengXiang Zhai
University of Illinois at Urbana-Champaign
June 25, 2007
Domain Adaptation
Many NLP tasks are cast as classification problems
Lack of training data in new domains
Domain adaptation
–POS tagging: WSJ → biomedical text
–NER: news → blog, speech
–Spam filtering: public email corpus → personal inboxes
Domain overfitting (NER F1 drops sharply across domains):

Task                                          Train → Test     F1
find PER, LOC, ORG in news text               NYT → NYT        0.855
                                              Reuters → NYT    0.641
find gene/protein in biomedical literature    mouse → mouse    0.541
                                              fly → mouse      0.281
Existing Work on Domain Adaptation
Existing work
–Prior on model parameters [Chelba & Acero 04]
–Mixture of general and domain-specific distributions [Daumé III & Marcu 06]
–Analysis of representation [Ben-David et al. 07]
Our work
–A fresh instance weighting perspective
–A framework that incorporates both labeled and unlabeled instances
Outline
Analysis of domain adaptation
Instance weighting framework
Experiments
Conclusions
The Need for Domain Adaptation
(figure: source domain vs. target domain instance distributions)
Where Does the Difference Come from?
p(x, y) = p(x) p(y | x)
–Labeling difference: p_s(y | x) ≠ p_t(y | x) → labeling adaptation
–Instance difference: p_s(x) ≠ p_t(x) → instance adaptation
An Instance Weighting Solution (Labeling Adaptation)
–Where p_t(y | x) ≠ p_s(y | x): remove/demote those source instances
An Instance Weighting Solution (Instance Adaptation: p_t(x) < p_s(x))
–Where p_t(x) < p_s(x): remove/demote those source instances
An Instance Weighting Solution (Instance Adaptation: p_t(x) > p_s(x))
–Where p_t(x) > p_s(x): promote those source instances
An Instance Weighting Solution (Instance Adaptation: p_t(x) > p_s(x))
–Labeled target domain instances are useful
–Unlabeled target domain instances may also be useful
The Exact Objective Function
–Maximize the expected log likelihood (log loss function) over the target domain
–Problem: the true marginal p_t(x) and conditional p_t(y | x) of the target domain are unknown
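Written out explicitly, the objective this slide alludes to takes the following form; this is a reconstruction from the slide's wording (maximize target-domain expected log likelihood), with p(y | x; θ) denoting the model being trained:

```latex
\theta^{*} \;=\; \arg\max_{\theta} \int_{\mathcal{X}} p_t(x) \sum_{y \in \mathcal{Y}} p_t(y \mid x)\, \log p(y \mid x;\, \theta)\, dx
```

Since neither p_t(x) nor p_t(y | x) is observable, the expectation has to be approximated from the three instance sets introduced next.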
Three Sets of Instances
–D_s: labeled instances from the source domain
–D_t,l: labeled instances from the target domain
–D_t,u: unlabeled instances from the target domain
Three Sets of Instances: Using D_s
–Estimating instance weights: in principle, non-parametric density estimation; in practice, hard with high-dimensional data (future work)
–Detecting labeling differences needs labeled target data
Three Sets of Instances: Using D_t,l
–Small sample size, so estimation is not accurate
Three Sets of Instances: Using D_t,u
–Use pseudo labels (e.g. bootstrapping, EM)
Using All Three Sets of Instances
–How should D_s + D_t,l + D_t,u be combined?
A Combined Framework
–A flexible setup covering both standard methods and new domain adaptive methods
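The combined framework can be sketched as one weighted objective over the three instance sets. This is a minimal sketch, assuming a logistic-regression p(y | x; θ); the helper names are mine, and the paper's actual parameterization and normalization are richer than this:

```python
# Sketch of the combined instance-weighting objective: a weighted mix of
# log-likelihood terms over D_s, D_t,l, and (pseudo-labeled) D_t,u.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_lik(w, x, y):
    """Log-likelihood of one labeled instance under logistic regression."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return math.log(p if y == 1 else 1.0 - p)

def combined_objective(w, D_s, D_tl, D_tu, alpha, beta, gamma,
                       lam_s=1.0, lam_tl=0.0, lam_tu=0.0):
    """lam_s, lam_tl, lam_tu weight the three sets; alpha_i handles the
    labeling difference, beta_i the p_t(x)/p_s(x) instance difference,
    gamma_k(y) selects pseudo labels for unlabeled target instances."""
    src = sum(a * b * log_lik(w, x, y)
              for (x, y), a, b in zip(D_s, alpha, beta))
    tgt_l = sum(log_lik(w, x, y) for x, y in D_tl)
    tgt_u = sum(g * log_lik(w, x, y)
                for (x, ys), gs in zip(D_tu, gamma)
                for y, g in zip(ys, gs))
    return lam_s * src + lam_tl * tgt_l + lam_tu * tgt_u
```

Setting lam_s = 1, lam_tl = lam_tu = 0 and all alpha_i = beta_i = 1 recovers standard supervised learning on D_s, matching the special cases on the following slides.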
Standard Supervised Learning Using Only D_s
–α_i = β_i = 1, λ_s = 1, λ_t,l = λ_t,u = 0
Standard Supervised Learning Using Only D_t,l
–λ_t,l = 1, λ_s = λ_t,u = 0
Standard Supervised Learning Using Both D_s and D_t,l
–α_i = β_i = 1, λ_s = N_s / (N_s + N_t,l), λ_t,l = N_t,l / (N_s + N_t,l), λ_t,u = 0
Domain Adaptive Heuristic 1: Instance Pruning
–α_i = 0 if (x_i, y_i) is predicted incorrectly by a model trained from D_t,l; α_i = 1 otherwise
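Heuristic 1 can be sketched as follows. The 1-nearest-neighbour classifier here is an assumed stand-in for "a model trained from D_t,l"; the slide does not prescribe a particular model:

```python
# Sketch of instance pruning: drop (alpha_i = 0) any source instance that a
# target-trained model mislabels, since its labeling likely differs from
# the target domain's p_t(y | x).
import math

def nn_predict(D_tl, x):
    """Predict the label of x from its nearest labeled target instance."""
    return min(D_tl, key=lambda xy: math.dist(xy[0], x))[1]

def prune_weights(D_s, D_tl):
    """alpha_i = 0 if the target-trained model mislabels (x_i, y_i), else 1."""
    return [1.0 if nn_predict(D_tl, x) == y else 0.0 for x, y in D_s]
```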
Domain Adaptive Heuristic 2: D_t,l with Higher Weights
–Raise λ_t,l above its natural share N_t,l / (N_s + N_t,l), lowering λ_s below N_s / (N_s + N_t,l)
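The arithmetic behind this heuristic is simple; a small sketch (helper names are mine), where k is the per-instance multiplier used in the "D_s + k D_t,l" rows of the result tables:

```python
# Sketch of Heuristic 2: up-weight the labeled target instances so they are
# not drowned out by the much larger source set.
def instance_weights(n_s, n_tl, k=10):
    """Per-instance weights: 1 for each D_s instance, k for each D_t,l one."""
    return [1.0] * n_s + [float(k)] * n_tl

def lambda_weights(n_s, n_tl, k=10):
    """Equivalent set-level (lambda_s, lambda_tl), normalized to sum to 1."""
    total = n_s + k * n_tl
    return n_s / total, k * n_tl / total
```

With k = 1 this reduces to the standard balanced case λ_s = N_s / (N_s + N_t,l); larger k shifts weight toward the target domain.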
Standard Bootstrapping
–γ_k(y) = 1 if p(y | x_k) is large; 0 otherwise
Domain Adaptive Heuristic 3: Balanced Bootstrapping
–γ_k(y) = 1 if p(y | x_k) is large; 0 otherwise
–λ_s = λ_t,u = 0.5
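One round of the balanced variant can be sketched like this. `predict_proba` is a hypothetical stand-in for the current model's p(y | x); equalizing the total weight of the two sets corresponds to λ_s = λ_t,u = 0.5:

```python
# Sketch of balanced bootstrapping: pseudo-label confident unlabeled target
# instances (gamma_k(y) = 1 when p(y | x_k) is large), then promote them so
# they carry as much total weight as the whole source set.
def balanced_bootstrap_round(D_s, D_tu, predict_proba, threshold=0.9):
    """One round: returns (pseudo-labeled instances, per-instance weights)."""
    pseudo = []
    for x in D_tu:
        probs = predict_proba(x)                  # dict: label -> p(y | x)
        y, p = max(probs.items(), key=lambda kv: kv[1])
        if p >= threshold:                        # gamma_k(y) = 1
            pseudo.append((x, y))
    if not pseudo:
        return [], []
    w_t = len(D_s) / len(pseudo)   # equalize total weight of both sets
    return pseudo, [w_t] * len(pseudo)
```

Standard bootstrapping would instead give each pseudo-labeled instance weight 1, letting the large source set dominate.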
Experiments
Three NLP tasks
–POS tagging: WSJ (Penn TreeBank) → Oncology (biomedical) text (Penn BioIE)
–NE type classification: newswire → conversational telephone speech (CTS) and web log (WL) (ACE 2005)
–Spam filtering: public email collection → personal inboxes u01, u02, u03 (ECML/PKDD 2006)
Experiments
Three heuristics
1. Instance pruning
2. D_t,l with higher weights
3. Balanced bootstrapping
Performance measure: accuracy
Instance Pruning: Removing “Misleading” Instances from D_s (accuracy; k = number of source instances removed)

POS:
  k        Oncology
  0        0.8630
  8000     0.8709
  16000    0.8714
  all      0.8720

NE Type:
  k      CTS      k      WL
  0      0.7815   0      0.7045
  1600   0.8640   1200   0.6975
  3200   0.8825   2400   0.6795
  all    0.8830   all    0.6600

Spam:
  k      User 1   User 2   User 3
  0      0.6306   0.6950   0.7644
  300    0.6611   0.7228   0.8222
  600    0.7911   0.8322   0.8328
  all    0.8106   0.8517   0.8067

Useful in most cases; failed in some cases
When is it guaranteed to work? (future work)
D_t,l with Higher Weights until D_s and D_t,l Are Balanced (accuracy)

POS:
  method           Oncology
  D_s              0.8630
  D_s + D_t,l      0.9349
  D_s + 10 D_t,l   0.9429
  D_s + 20 D_t,l   0.9443

NE Type:
  method           CTS      WL
  D_s              0.7815   0.7045
  D_s + D_t,l      0.9340   0.7735
  D_s + 5 D_t,l    0.9360   0.7820
  D_s + 10 D_t,l   0.9355   0.7840

Spam:
  method           User 1   User 2   User 3
  D_s              0.6306   0.6950   0.7644
  D_s + D_t,l      0.9572   0.9461
  D_s + 5 D_t,l    0.9628   0.9611   0.9601
  D_s + 10 D_t,l   0.9639   0.9628   0.9633

D_t,l is very useful; promoting D_t,l is even more useful
Instance Pruning + D_t,l with Higher Weights (accuracy; D_s' denotes the pruned source set)

POS:
  method            Oncology
  D_s + 20 D_t,l    0.9443
  D_s' + 20 D_t,l   0.9422

NE Type:
  method            CTS      WL
  D_s + 10 D_t,l    0.9355   0.7840
  D_s' + 10 D_t,l   0.8950   0.6670

Spam:
  method            User 1   User 2   User 3
  D_s + 10 D_t,l    0.9639   0.9628   0.9633
  D_s' + 10 D_t,l   0.9717   0.9478   0.9494

The two heuristics do not work well together
How to combine heuristics? (future work)
Balanced Bootstrapping (accuracy)

POS:
  method               Oncology
  supervised           0.8630
  standard bootstrap   0.8728
  balanced bootstrap   0.8750

NE Type:
  method               CTS      WL
  supervised           0.7781   0.7351
  standard bootstrap   0.8917   0.7498
  balanced bootstrap   0.8923   0.7523

Spam:
  method               User 1   User 2   User 3
  supervised           0.6476   0.6976   0.8068
  standard bootstrap   0.8720   0.9212   0.9760
  balanced bootstrap   0.8816   0.9256   0.9772

Promoting target instances is useful, even with pseudo labels
Conclusions
Formally analyzed domain adaptation from an instance weighting perspective
Proposed an instance weighting framework for domain adaptation
–Covers both labeled and unlabeled instances
–Various weight parameters
Proposed a number of heuristics to set the weight parameters
Experiments showed the effectiveness of the heuristics
Future Work
Combining different heuristics
Principled ways to set the weight parameters
–Density estimation for setting β
Thank You!