Instance Weighting for Domain Adaptation in NLP
Jing Jiang & ChengXiang Zhai
University of Illinois at Urbana-Champaign
June 25, 2007
Domain Adaptation
Many NLP tasks are cast as classification problems, but labeled training data is often lacking in new domains. Domain adaptation examples:
– POS tagging: WSJ → biomedical text
– NER: news → blogs, speech
– Spam filtering: public corpus → personal inboxes
Domain overfitting (NER F1):
Task | Train → Test | F1
find PER, LOC, ORG in news text | NYT → NYT | 0.855
find PER, LOC, ORG in news text | Reuters → NYT | 0.641
find gene/protein in biomedical literature | mouse → mouse | 0.541
find gene/protein in biomedical literature | fly → mouse | 0.281
Existing Work on Domain Adaptation
Existing work:
– Prior on model parameters [Chelba & Acero 04]
– Mixture of general and domain-specific distributions [Daumé III & Marcu 06]
– Analysis of representations [Ben-David et al. 07]
Our work:
– A fresh instance weighting perspective
– A framework that incorporates both labeled and unlabeled instances
Outline
– Analysis of domain adaptation
– Instance weighting framework
– Experiments
– Conclusions
The Need for Domain Adaptation
[Figure: labeled source-domain instances vs. differently distributed target-domain instances]
Where Does the Difference Come From?
The joint distribution factors as p(x, y) = p(x) p(y | x), so a difference between domains can come from two sources:
– labeling difference: p_s(y | x) ≠ p_t(y | x) → calls for labeling adaptation
– instance difference: p_s(x) ≠ p_t(x) → calls for instance adaptation
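To make the factorization concrete, here is a toy numeric illustration (not from the talk; all numbers invented): the same conditional p(y | x) combined with different marginals p(x) gives a pure instance difference, while changing p(y | x) under a fixed p(x) would give a labeling difference.

```python
import numpy as np

def joint(p_x, p_y_given_x):
    """Joint p(x, y) from a marginal and a conditional; x, y binary."""
    return p_x[:, None] * p_y_given_x

p_y_given_x = np.array([[0.9, 0.1],   # p(y | x=0)
                        [0.2, 0.8]])  # p(y | x=1)

# Instance difference only: same conditional, shifted marginal p(x).
source = joint(np.array([0.8, 0.2]), p_y_given_x)
target = joint(np.array([0.3, 0.7]), p_y_given_x)

# A labeling difference would instead change p(y | x) while keeping p(x),
# e.g., target_cond = np.array([[0.6, 0.4], [0.2, 0.8]]).
print(source)
print(target)
```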
An Instance Weighting Solution (Labeling Adaptation)
When p_t(y | x) ≠ p_s(y | x): remove or demote the source instances whose labels are misleading in the target domain.
[Figure: source instances conflicting with target labels are removed/demoted]
An Instance Weighting Solution (Instance Adaptation: p_t(x) < p_s(x))
When a region of the instance space is less likely in the target domain (p_t(x) < p_s(x)): remove or demote the source instances in that region.
[Figure: over-represented source instances are removed/demoted]
An Instance Weighting Solution (Instance Adaptation: p_t(x) > p_s(x))
When a region is more likely in the target domain (p_t(x) > p_s(x)): promote the instances in that region.
[Figure: under-represented instances are promoted]
An Instance Weighting Solution (Instance Adaptation: p_t(x) > p_s(x))
– Labeled target-domain instances are useful
– Unlabeled target-domain instances may also be useful
[Figure: target-domain instances fill the region where p_t(x) > p_s(x)]
The Exact Objective Function
θ* = argmax_θ ∫_X Σ_y p_t(x) p_t(y | x) log p(y | x; θ) dx
– p_t(x), p_t(y | x): unknown true marginal and conditional probabilities in the target domain
– log p(y | x; θ): log-likelihood (log loss function)
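As a sanity check on why this objective is "exact", here is a minimal sketch (not from the talk) on a discrete toy domain where p_t is known: the expected log-likelihood is maximized exactly when the model matches the target conditional. In practice p_t(x) and p_t(y | x) are unknown, which is why the objective must be approximated.

```python
import numpy as np

# Toy target distribution over binary x and y, assumed known here.
p_t_x = np.array([0.3, 0.7])               # target marginal p_t(x)
p_t_y_given_x = np.array([[0.9, 0.1],      # target conditional p_t(y | x)
                          [0.2, 0.8]])

def ideal_objective(model_p_y_given_x):
    """Expected log-likelihood under the true target distribution."""
    return np.sum(p_t_x[:, None] * p_t_y_given_x *
                  np.log(model_p_y_given_x + 1e-12))

# Maximized by matching the target conditional exactly (Gibbs' inequality):
print(ideal_objective(p_t_y_given_x))                 # best achievable
print(ideal_objective(np.array([[0.5, 0.5],
                                [0.5, 0.5]])))        # strictly worse
```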
Three Sets of Instances
– D_s: labeled instances from the source domain
– D_t,l: labeled instances from the target domain
– D_t,u: unlabeled instances from the target domain
Three Sets of Instances: Using D_s
Approximate the integral over X with the source instances in D_s, reweighting each instance by:
– p_t(y | x) / p_s(y | x): needs labeled target data to estimate
– p_t(x) / p_s(x): in principle, non-parametric density estimation; in practice, hard for high-dimensional data (future work)
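The talk leaves density estimation for the ratio p_t(x) / p_s(x) as future work. One common practical stand-in (an assumption of this sketch, not the authors' method) is a discriminative estimate: train a source-vs-target domain classifier and convert its probabilities into the ratio.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def density_ratio_beta(Xs, Xt_unlabeled):
    """Approximate beta_i = p_t(x) / p_s(x) on the source instances via a
    domain discriminator d(x) = P(domain = target | x):
    p_t(x)/p_s(x) = [d(x) / (1 - d(x))] * (N_s / N_t)."""
    X = np.vstack([Xs, Xt_unlabeled])
    domain = np.concatenate([np.zeros(len(Xs)), np.ones(len(Xt_unlabeled))])
    disc = LogisticRegression().fit(X, domain)
    d = disc.predict_proba(Xs)[:, 1]                  # P(target | x) on D_s
    ratio = d / np.clip(1.0 - d, 1e-6, None)
    return ratio * (len(Xs) / len(Xt_unlabeled))      # correct for set sizes
```

Training on D_s with these weights then approximates training under the target marginal p_t(x).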
Three Sets of Instances: Using D_t,l
Approximate the integral over X with D_t,l directly; but the sample size is small, so the estimation is not accurate.
Three Sets of Instances: Using D_t,u
Approximate the integral over X with D_t,u; the labels are unknown, so use pseudo-labels (e.g., from bootstrapping or EM).
Using All Three Sets of Instances
Can we approximate the objective using D_s + D_t,l + D_t,u together?
A Combined Framework
θ* = argmax_θ [ λ_s Σ_i α_i β_i log p(y_i | x_i; θ) / C_s
              + λ_t,l Σ_j log p(y_j | x_j; θ) / C_t,l
              + λ_t,u Σ_k Σ_y γ_k(y) log p(y | x_k; θ) / C_t,u ] + log p(θ)
– α_i, β_i: per-instance weights on D_s; γ_k(y): weight of tentative label y for unlabeled instance x_k
– λ_s + λ_t,l + λ_t,u = 1 trade off the three sets; the C's normalize the weights
A flexible setup covering both standard methods and new domain-adaptive methods.
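A minimal numpy sketch of the combined objective, written as a negative log-likelihood for a binary logistic model; the function name, the {0, 1} label encoding, and the Gaussian prior standing in for log p(θ) are choices of this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def combined_nll(theta, Xs, ys, alpha, beta, Xtl, ytl, Xtu, gamma,
                 lam_s, lam_tl, lam_tu, l2=1.0):
    """Negative combined objective for p(y=1 | x) = sigmoid(x . theta).
    gamma[k, y] weights the tentative label y for unlabeled instance k."""
    def log_p(X, y):
        p = sigmoid(X @ theta)
        return np.where(y == 1, np.log(p + 1e-12), np.log(1.0 - p + 1e-12))

    w = alpha * beta                                   # source instance weights
    term_s = np.sum(w * log_p(Xs, ys)) / max(w.sum(), 1e-12)
    term_tl = np.mean(log_p(Xtl, ytl)) if len(ytl) else 0.0
    zeros, ones = np.zeros(len(Xtu)), np.ones(len(Xtu))
    term_tu = np.sum(gamma[:, 0] * log_p(Xtu, zeros) +
                     gamma[:, 1] * log_p(Xtu, ones)) / max(gamma.sum(), 1e-12)
    prior = -0.5 * l2 * theta @ theta                  # Gaussian log p(theta)
    return -(lam_s * term_s + lam_tl * term_tl + lam_tu * term_tu + prior)
```

Minimizing this with any gradient-based optimizer (e.g., scipy.optimize.minimize) recovers the special cases on the following slides by fixing the λ's and instance weights accordingly.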
Standard Supervised Learning Using Only D_s
α_i = β_i = 1, λ_s = 1, λ_t,l = λ_t,u = 0
Standard Supervised Learning Using Only D_t,l
λ_t,l = 1, λ_s = λ_t,u = 0
Standard Supervised Learning Using Both D_s and D_t,l
α_i = β_i = 1, λ_s = N_s / (N_s + N_t,l), λ_t,l = N_t,l / (N_s + N_t,l), λ_t,u = 0
(all instances weighted equally, so each set's weight is proportional to its size)
Domain Adaptive Heuristic 1: Instance Pruning
α_i = 0 if (x_i, y_i) is predicted incorrectly by a model trained on D_t,l; α_i = 1 otherwise
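A minimal scikit-learn sketch of this heuristic; the choice of logistic regression is illustrative, and the variant in the result slides that prunes only the top k most misleading instances is omitted.

```python
from sklearn.linear_model import LogisticRegression

def prune_source(Xs, ys, Xtl, ytl):
    """Heuristic 1: set alpha_i = 0 for source instances that a model
    trained on the small labeled target set D_t,l misclassifies."""
    target_model = LogisticRegression().fit(Xtl, ytl)
    alpha = (target_model.predict(Xs) == ys).astype(float)
    return alpha  # 1 = keep, 0 = prune
```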
Domain Adaptive Heuristic 2: D_t,l with Higher Weights
Give D_t,l more than its proportional share: λ_t,l > N_t,l / (N_s + N_t,l), equivalently λ_s < N_s / (N_s + N_t,l)
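A sketch of this heuristic using per-instance sample weights, matching the "D_s + a·D_t,l" settings that appear in the result slides; the classifier choice is illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_with_promoted_target(Xs, ys, Xtl, ytl, a=10.0):
    """Heuristic 2: give each D_t,l instance a times the weight of a
    source instance (the "D_s + a*D_t,l" setting)."""
    X = np.vstack([Xs, Xtl])
    y = np.concatenate([ys, ytl])
    w = np.concatenate([np.ones(len(ys)), a * np.ones(len(ytl))])
    return LogisticRegression().fit(X, y, sample_weight=w)
```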
Standard Bootstrapping
γ_k(y) = 1 if p(y | x_k) is large; 0 otherwise (self-train on confidently pseudo-labeled target instances)
Domain Adaptive Heuristic 3: Balanced Bootstrapping
γ_k(y) = 1 if p(y | x_k) is large; 0 otherwise
λ_s = λ_t,u = 0.5 (source and pseudo-labeled target get equal total weight)
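A sketch of balanced bootstrapping as self-training with λ_s = λ_t,u = 0.5; the confidence criterion (top fraction per round) and the round count are assumptions of this sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def balanced_bootstrap(Xs, ys, Xtu, rounds=5, top_frac=0.1):
    """Self-train on unlabeled target data, giving the pseudo-labeled
    target set the same total weight as D_s (lambda_s = lambda_t,u = 0.5)
    rather than weight proportional to its size."""
    model = LogisticRegression().fit(Xs, ys)
    X_pl = np.empty((0, Xs.shape[1]))
    y_pl = np.empty(0, dtype=ys.dtype)
    remaining = Xtu
    for _ in range(rounds):
        if len(remaining) == 0:
            break
        proba = model.predict_proba(remaining)
        k = max(1, int(top_frac * len(remaining)))
        idx = np.argsort(-proba.max(axis=1))[:k]      # most confident first
        X_pl = np.vstack([X_pl, remaining[idx]])
        y_pl = np.concatenate([y_pl, model.classes_[proba[idx].argmax(axis=1)]])
        remaining = np.delete(remaining, idx, axis=0)
        # Balanced: source and pseudo-labeled target each get total weight 0.5.
        w = np.concatenate([np.full(len(ys), 0.5 / len(ys)),
                            np.full(len(y_pl), 0.5 / len(y_pl))])
        model = LogisticRegression().fit(np.vstack([Xs, X_pl]),
                                         np.concatenate([ys, y_pl]),
                                         sample_weight=w)
    return model
```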
Experiments
Three NLP tasks:
– POS tagging: WSJ (Penn TreeBank) → Oncology text (Penn BioIE)
– NE type classification: newswire → conversational telephone speech (CTS) and web logs (WL) (ACE 2005)
– Spam filtering: public collection → personal inboxes (u01, u02, u03) (ECML/PKDD 2006)
Experiments
Three heuristics:
1. Instance pruning
2. D_t,l with higher weights
3. Balanced bootstrapping
Performance measure: accuracy
Instance Pruning: Removing "Misleading" Instances from D_s
[Results tables: accuracy vs. number k of pruned instances for NE type (CTS, WL), POS (Oncology), and spam (Users 1–3); the values were lost in extraction, apart from one surviving entry (0.8830)]
– Useful in most cases; failed in some cases
– When is it guaranteed to work? (future work)
D_t,l with Higher Weights until D_s and D_t,l Are Balanced
[Results tables (accuracy values lost in extraction):
– NE type (CTS, WL): D_s vs. D_s + D_t,l vs. D_s + 5 D_t,l vs. D_s + 10 D_t,l
– POS (Oncology): D_s vs. D_s + D_t,l vs. D_s + 10 D_t,l vs. D_s + 20 D_t,l
– Spam (Users 1–3): D_s vs. D_s + D_t,l vs. D_s + 5 D_t,l vs. D_s + 10 D_t,l]
– D_t,l is very useful
– Promoting D_t,l is more useful
Instance Pruning + D_t,l with Higher Weights
[Results tables (accuracy values lost in extraction): D_s + a D_t,l vs. pruned D_s' + a D_t,l, with a = 10 for NE type (CTS, WL) and spam (Users 1–3), a = 20 for POS (Oncology)]
– The two heuristics do not work well together
– How to combine heuristics? (future work)
Balanced Bootstrapping
[Results tables (accuracy values lost in extraction): supervised vs. standard bootstrapping vs. balanced bootstrapping on NE type (CTS, WL), POS (Oncology), and spam (Users 1–3)]
– Promoting target instances is useful, even with pseudo-labels
Conclusions
– Formally analyzed domain adaptation from an instance weighting perspective
– Proposed an instance weighting framework for domain adaptation that incorporates both labeled and unlabeled instances through various weight parameters
– Proposed a number of heuristics for setting the weight parameters
– Experiments showed the effectiveness of the heuristics
Future Work
– Combining different heuristics
– Principled ways to set the weight parameters (e.g., density estimation for setting β)
Thank You!