
Slide 1: Instance Weighting for Domain Adaptation in NLP
Jing Jiang & ChengXiang Zhai, University of Illinois at Urbana-Champaign, June 25, 2007

Slide 2: Domain Adaptation
- Many NLP tasks are cast as classification problems.
- New domains often lack training data.
- Domain adaptation examples:
  - POS tagging: WSJ → biomedical text
  - NER: news → blogs, speech
  - Spam filtering: public email corpus → personal inboxes
- Domain overfitting:

| NER task | Train → Test | F1 |
| find PER, LOC, ORG in news text | NYT → NYT | 0.855 |
| find PER, LOC, ORG in news text | Reuters → NYT | 0.641 |
| find gene/protein in biomedical literature | mouse → mouse | 0.541 |
| find gene/protein in biomedical literature | fly → mouse | 0.281 |

Slide 3: Existing Work on Domain Adaptation
Existing work:
- Prior on model parameters [Chelba & Acero 04]
- Mixture of general and domain-specific distributions [Daumé III & Marcu 06]
- Analysis of representation [Ben-David et al. 07]
Our work:
- A fresh instance weighting perspective
- A framework that incorporates both labeled and unlabeled instances

Slide 4: Outline
- Analysis of domain adaptation
- Instance weighting framework
- Experiments
- Conclusions

Slides 5-6: The Need for Domain Adaptation
(figures: source domain vs. target domain)

Slide 7: Where Does the Difference Come From?
p(x, y) = p(x) p(y | x)
- Labeling difference: p_s(y | x) ≠ p_t(y | x) → labeling adaptation
- Instance difference: p_s(x) ≠ p_t(x) → instance adaptation
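The factorization above underlies the instance-weighting view; a standard change-of-measure identity (not on the slide itself, but consistent with its notation) shows how reweighted source instances can stand in for target instances:

```latex
\mathbb{E}_{(x,y) \sim p_t}\!\left[ \ell(x, y; \theta) \right]
  = \mathbb{E}_{(x,y) \sim p_s}\!\left[
      \frac{p_t(x)}{p_s(x)} \,
      \frac{p_t(y \mid x)}{p_s(y \mid x)} \,
      \ell(x, y; \theta) \right]
```

The first ratio corresponds to instance adaptation (p_s(x) ≠ p_t(x)), the second to labeling adaptation (p_s(y | x) ≠ p_t(y | x)); removing, demoting, or promoting instances approximates these ratios.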

Slides 8-10: An Instance Weighting Solution (Labeling Adaptation)
(figure) Where p_t(y | x) ≠ p_s(y | x): remove/demote source instances.

Slides 11-13: An Instance Weighting Solution (Instance Adaptation: p_t(x) < p_s(x))
(figure) Where p_t(x) < p_s(x): remove/demote source instances.

Slides 14-16: An Instance Weighting Solution (Instance Adaptation: p_t(x) > p_s(x))
(figure) Where p_t(x) > p_s(x): promote source instances.

Slide 17: An Instance Weighting Solution (Instance Adaptation: p_t(x) > p_s(x))
- Labeled target-domain instances are useful.
- Unlabeled target-domain instances may also be useful.

Slide 18: The Exact Objective Function
(equation: log likelihood / log loss function)
- The true marginal and conditional probabilities in the target domain are unknown.
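The equation itself was lost in the transcript; a reconstruction of the target-domain log-loss objective the slide refers to, in the slides' own notation, is:

```latex
\theta^{*} = \arg\min_{\theta}
  \int_{\mathcal{X}} p_t(x)
  \sum_{y \in \mathcal{Y}} p_t(y \mid x)
  \bigl( -\log p(y \mid x; \theta) \bigr) \, dx
```

Both p_t(x) and p_t(y | x) are unknown, which is why the framework approximates this objective using the available instance sets.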

Slide 19: Three Sets of Instances
- D_s: labeled source-domain instances
- D_t,l: labeled target-domain instances
- D_t,u: unlabeled target-domain instances

Slide 20: Three Sets of Instances: Using D_s (x ∈ D_s)
- In principle, non-parametric density estimation; in practice, the data are high-dimensional (future work).
- Need labeled target data.

Slide 21: Three Sets of Instances: Using D_t,l (x ∈ D_t,l)
- Small sample size, so estimation is not accurate.

Slide 22: Three Sets of Instances: Using D_t,u (x ∈ D_t,u)
- Pseudo-labels (e.g. bootstrapping, EM).

Slide 23: Using All Three Sets of Instances
x ∈ D_s ∪ D_t,l ∪ D_t,u ?

Slide 24: A Combined Framework
A flexible setup covering both standard methods and new domain-adaptive methods.
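As a concrete sketch (hypothetical code, not from the slides), the combined framework can be read as a weighted sum of log losses over the three instance sets, with per-instance weights α_i, β_i, γ_k(y) and set weights λ_s, λ_t,l, λ_t,u; normalization constants are omitted for simplicity:

```python
import math

def combined_loss(predict, D_s, D_t_l, D_t_u,
                  alpha, beta, gamma,
                  lam_s, lam_t_l, lam_t_u):
    """Simplified, unnormalized sketch of the combined objective.

    predict(x) -> dict mapping each label y to p(y | x; theta).
    D_s, D_t_l: lists of (x, y) pairs; D_t_u: list of unlabeled x.
    alpha[i]: prunes/demotes source instances with mismatched labels.
    beta[i]: approximates p_t(x) / p_s(x) for source instances.
    gamma[k]: dict of pseudo-label weights for unlabeled instance k.
    lam_*: trade-off weights for the three terms.
    """
    # Source term: each instance scaled by alpha_i * beta_i.
    loss_s = sum(a * b * -math.log(predict(x)[y])
                 for (x, y), a, b in zip(D_s, alpha, beta))
    # Labeled-target term: plain log loss on D_t,l.
    loss_t_l = sum(-math.log(predict(x)[y]) for x, y in D_t_l)
    # Unlabeled-target term: log loss under pseudo-label weights gamma.
    loss_t_u = sum(g * -math.log(predict(x)[y])
                   for x, gs in zip(D_t_u, gamma)
                   for y, g in gs.items())
    return lam_s * loss_s + lam_t_l * loss_t_l + lam_t_u * loss_t_u
```

Setting α_i = β_i = 1, λ_s = 1, λ_t,l = λ_t,u = 0 recovers standard supervised learning on D_s alone, matching the special cases on the following slides.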

Slide 25: Standard Supervised Learning Using Only D_s
α_i = β_i = 1, λ_s = 1, λ_t,l = λ_t,u = 0

Slide 26: Standard Supervised Learning Using Only D_t,l
λ_t,l = 1, λ_s = λ_t,u = 0

Slide 27: Standard Supervised Learning Using Both D_s and D_t,l
α_i = β_i = 1, λ_s = N_s / (N_s + N_t,l), λ_t,l = N_t,l / (N_s + N_t,l), λ_t,u = 0

Slide 28: Domain Adaptive Heuristic 1: Instance Pruning
α_i = 0 if (x_i, y_i) is predicted incorrectly by a model trained on D_t,l; α_i = 1 otherwise.
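Heuristic 1 could be implemented roughly as follows (a sketch; `classify` stands in for the model trained on D_t,l, whose details the slides do not specify):

```python
def prune_source(D_s, classify):
    """Instance pruning: alpha_i = 0 for source instances that the
    target-trained model misclassifies, 1 otherwise.

    D_s: list of (x, y) pairs; classify(x) -> predicted label.
    Returns the alpha weights and the surviving source instances.
    """
    alpha = [1 if classify(x) == y else 0 for x, y in D_s]
    kept = [pair for pair, a in zip(D_s, alpha) if a == 1]
    return alpha, kept
```

Instances the target model disagrees with are treated as "misleading" and dropped, which is what the later experiments on instance pruning measure.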

Slide 29: Domain Adaptive Heuristic 2: D_t,l with Higher Weights
λ_s < N_s / (N_s + N_t,l), λ_t,l > N_t,l / (N_s + N_t,l)

Slide 30: Standard Bootstrapping
γ_k(y) = 1 if p(y | x_k) is large, 0 otherwise.

Slide 31: Domain Adaptive Heuristic 3: Balanced Bootstrapping
γ_k(y) = 1 if p(y | x_k) is large, 0 otherwise; λ_s = λ_t,u = 0.5
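The difference between standard and balanced bootstrapping reduces to how γ_k and the λ weights are set; a hypothetical sketch (the 0.9 confidence threshold is an assumption, not from the slides):

```python
def bootstrap_weights(D_s, pseudo, threshold=0.9, balanced=True):
    """Set gamma and lambda for bootstrapping over unlabeled data.

    pseudo: list of (x, y_hat, p) with model confidence p = p(y_hat | x).
    gamma_k = 1 for confident pseudo-labels, 0 otherwise.
    Balanced bootstrapping fixes lam_s = lam_t_u = 0.5, promoting the
    (typically much smaller) target set; standard bootstrapping in
    effect weights each set by its size.
    """
    gamma = [1 if p >= threshold else 0 for _, _, p in pseudo]
    n_t = sum(gamma)
    if balanced:
        lam_s, lam_t_u = 0.5, 0.5
    else:
        lam_s = len(D_s) / (len(D_s) + n_t)
        lam_t_u = n_t / (len(D_s) + n_t)
    return gamma, lam_s, lam_t_u
```

Because D_t,u is usually far smaller than D_s, the balanced setting gives each confident target instance much more influence than a source instance, which is the point of the heuristic.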

Slide 32: Experiments
Three NLP tasks:
- POS tagging: WSJ (Penn TreeBank) → Oncology (biomedical) text (Penn BioIE)
- NE type classification: newswire → conversational telephone speech (CTS) and web log (WL) (ACE 2005)
- Spam filtering: public email collection → personal inboxes (u01, u02, u03) (ECML/PKDD 2006)

Slide 33: Experiments
Three heuristics:
1. Instance pruning
2. D_t,l with higher weights
3. Balanced bootstrapping
Performance measure: accuracy.

Slide 34: Instance Pruning: Removing “Misleading” Instances from D_s

NE Type:
| k | CTS | k | WL |
| 0 | 0.7815 | 0 | 0.7045 |
| 1600 | 0.8640 | 1200 | 0.6975 |
| 3200 | 0.8825 | 2400 | 0.6795 |
| all | 0.8830 | all | 0.6600 |

POS:
| k | Oncology |
| 0 | 0.8630 |
| 8000 | 0.8709 |
| 16000 | 0.8714 |
| all | 0.8720 |

Spam:
| k | User 1 | User 2 | User 3 |
| 0 | 0.6306 | 0.6950 | 0.7644 |
| 300 | 0.6611 | 0.7228 | 0.8222 |
| 600 | 0.7911 | 0.8322 | 0.8328 |
| all | 0.8106 | 0.8517 | 0.8067 |

Useful in most cases; failed in some cases. When is it guaranteed to work? (future work)

Slide 35: D_t,l with Higher Weights until D_s and D_t,l Are Balanced

NE Type:
| method | CTS | WL |
| D_s | 0.7815 | 0.7045 |
| D_s + D_t,l | 0.9340 | 0.7735 |
| D_s + 5 D_t,l | 0.9360 | 0.7820 |
| D_s + 10 D_t,l | 0.9355 | 0.7840 |

POS:
| method | Oncology |
| D_s | 0.8630 |
| D_s + D_t,l | 0.9349 |
| D_s + 10 D_t,l | 0.9429 |
| D_s + 20 D_t,l | 0.9443 |

Spam:
| method | User 1 | User 2 | User 3 |
| D_s | 0.6306 | 0.6950 | 0.7644 |
| D_s + D_t,l | 0.9572 | 0.9461 | |
| D_s + 5 D_t,l | 0.9628 | 0.9611 | 0.9601 |
| D_s + 10 D_t,l | 0.9639 | 0.9628 | 0.9633 |

D_t,l is very useful; promoting D_t,l is even more useful.

Slide 36: Instance Pruning + D_t,l with Higher Weights

NE Type:
| method | CTS | WL |
| D_s + 10 D_t,l | 0.9355 | 0.7840 |
| D_s' + 10 D_t,l | 0.8950 | 0.6670 |

POS:
| method | Oncology |
| D_s + 20 D_t,l | 0.9443 |
| D_s' + 20 D_t,l | 0.9422 |

Spam:
| method | User 1 | User 2 | User 3 |
| D_s + 10 D_t,l | 0.9639 | 0.9628 | 0.9633 |
| D_s' + 10 D_t,l | 0.9717 | 0.9478 | 0.9494 |

The two heuristics do not work well together. How to combine heuristics? (future work)

Slide 37: Balanced Bootstrapping

NE Type:
| method | CTS | WL |
| supervised | 0.7781 | 0.7351 |
| standard bootstrap | 0.8917 | 0.7498 |
| balanced bootstrap | 0.8923 | 0.7523 |

POS:
| method | Oncology |
| supervised | 0.8630 |
| standard bootstrap | 0.8728 |
| balanced bootstrap | 0.8750 |

Spam:
| method | User 1 | User 2 | User 3 |
| supervised | 0.6476 | 0.6976 | 0.8068 |
| standard bootstrap | 0.8720 | 0.9212 | 0.9760 |
| balanced bootstrap | 0.8816 | 0.9256 | 0.9772 |

Promoting target instances is useful, even with pseudo-labels.

Slide 38: Conclusions
- Formally analyzed domain adaptation from an instance weighting perspective.
- Proposed an instance weighting framework for domain adaptation:
  - handles both labeled and unlabeled instances
  - provides various weight parameters
- Proposed a number of heuristics for setting the weight parameters.
- Experiments showed the effectiveness of the heuristics.

Slide 39: Future Work
- Combining different heuristics
- Principled ways to set the weight parameters
  - Density estimation for setting β

Slide 40: Thank You!

