Prior Knowledge Driven Domain Adaptation
Gourab Kundu, Ming-Wei Chang, and Dan Roth

Presentation transcript:

Domain Adaptation Problem

The performance of statistical systems drops significantly when they are tested on a domain different from the training domain. Example: in the CoNLL 2007 shared task, the annotation standard differed between the source and target domains. When a POS tagger trained on the WSJ domain is tested on the biomedical domain, F1 drops 9%. When an SRL system trained on the WSJ domain is tested on OntoNotes, F1 drops 18%.

Motivation: Prior knowledge is cheap and readily available for many domains.
Solution: Use prior knowledge about the target domain for better adaptation.

Tasks

POS Tagging. Example: "I eat fruits ." is tagged PRP VB NNS .
Semantic Role Labeling (SRL). Example: in "I eat fruits", I is A0, eat is V, and fruits is A1.

Constrained Conditional Model (CCM)

Incorporate prior knowledge as constraints c = {C_j(.)}. Learn the weight vector w ignoring c. Impose the constraints c at inference time. (See the inference sketch at the end of this page.)

Prior Knowledge on BioMed (annotation wiki)

"Only names of persons, locations etc. are proper nouns which are very few. Gene, disease, drug names etc. are marked as common nouns."
- Hyphenated compounds are tagged as NN. Example: H-ras.
- Digit-letter combinations should be tagged as NN. Example: CTNNB1.
- A hyphen should be tagged as HYPH.
- Any word unseen in the source domain and followed by the word "gene" should be tagged as NN. Example: ras gene.
- If a word never appears with the tag NNP in the training data, predict NN instead of NNP. Example: polymerase chain reaction ( PCR ).
(These rules appear as code at the end of this page.)

Prior Knowledge on OntoNotes (frame file of the "be" verb)

Be verbs are unseen in the training domain.
- If a be verb is immediately followed by a verb, there can be no core argument. Example: John is eating.
- If a be verb is followed by the word "like", the core arguments A0 and A1 are possible. Example: And he's like why's the door open?
- Otherwise, A1 and A2 are possible. Example: John is a good man.

For POS tagging, we do not have any domain-independent knowledge. For SRL, we use some domain-independent knowledge. Example: two arguments cannot overlap.

PDA-KW

Incorporate target-domain-specific knowledge c' = {C'_k(.)} as constraints. Impose the constraints c and c' at inference time. Adaptation without retraining.

PDA-ST

Motivation: Constraints are accurate but apply rarely. Can we generalize to the cases where the constraints did not apply? Solution: Embed the constraints into self-training. (See the self-training sketch at the end of this page.)
Notation: D_s is source domain labeled data, D_u is target domain unlabeled data, D_t is target domain test data.

Self-training

Motivation: How good is self-training without knowledge? The procedure is the same as PDA-ST, except that the constrained-inference step (the red boxed line in the poster's algorithm) is replaced by unconstrained inference.

Experimental Results

[Table: POS and SRL accuracy (all verbs / be verbs) for Baseline, Self-training, PDA-KW, and PDA-ST; the numeric cells were not recovered from the poster.]
After adding knowledge, the POS tagging error is reduced by 42%, and the SRL error is reduced by 25% on be verbs and by 9% on all verbs. Without using any labeled data, prior knowledge reduces the error by 38% compared to using 300 labeled sentences.

Conclusion

Prior knowledge gives results competitive with using labeled data.

Future Work

- Improve the results for self-training.
- Find theoretical justifications for self-training.
- Apply PDA to more tasks/domains. Suggestions?

References

J. Jiang and C. Zhai. Instance Weighting for Domain Adaptation in NLP. ACL 2007.
G. Kundu and D. Roth. Adapting Text instead of the Model: An Open Domain Approach. CoNLL 2011.
J. Blitzer, R. McDonald, and F. Pereira. Domain Adaptation with Structural Correspondence Learning. EMNLP 2006.
Comparison with JiangZh07

Without using any labeled data, prior knowledge recovers 72% of the accuracy gain of adding 2730 labeled sentences.

[Table: POS accuracy vs. amount of target labeled data. PDA-ST: 92.00 with no target labeled data; the two JiangZh07 rows use 300 and 2730 labeled sentences, but their accuracy cells were not recovered.]

This research is sponsored by ARL and DARPA under the Machine Reading program.
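Implementation sketches

To make the CCM recipe concrete (learn the weight vector w ignoring the constraints, then impose them at inference time), here is a minimal sketch of constrained inference. It is not the poster's implementation: the names `ccm_predict` and `score` are invented for illustration, and the brute-force search over tag sequences stands in for the Viterbi- or ILP-style inference a real system would use.

```python
# Minimal sketch of inference in a Constrained Conditional Model (CCM).
# Assumed interface (not from the poster): score(words, tags) returns the
# model score under the weight vector w, which was learned WITHOUT the
# constraints; the constraints are imposed only here, at inference time.
from itertools import product

def ccm_predict(words, tagset, score, constraints):
    """Return the highest-scoring tag sequence satisfying all constraints.

    constraints: list of boolean functions C_j(words, tags) encoding the
    prior knowledge; an assignment violating any of them is pruned.
    """
    best, best_score = None, float("-inf")
    for tags in product(tagset, repeat=len(words)):  # brute force; a sketch only
        if not all(c(words, tags) for c in constraints):
            continue  # a hard constraint is violated: prune this assignment
        s = score(words, tags)
        if s > best_score:
            best, best_score = tags, s
    return best
```

Under this interface, PDA-KW needs no new machinery: pass the union of the source constraints c and the target-domain constraints c' as `constraints`, and the model adapts with no retraining.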
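The BioMed POS rules listed above are simple enough to write down as deterministic corrections to a tagger's output. A sketch, assuming precomputed sets `source_vocab` (words seen in the source training data) and `nnp_vocab` (words that ever carried the tag NNP there); the function name and the rule order are illustrative, not taken from the authors' code.

```python
import re

def apply_biomed_rules(words, tags, source_vocab, nnp_vocab):
    """Apply the poster's BioMed prior-knowledge rules to a tagged sentence."""
    tags = list(tags)
    for i, (w, t) in enumerate(zip(words, tags)):
        if w == "-":
            tags[i] = "HYPH"  # a hyphen token is tagged HYPH
        elif "-" in w:
            tags[i] = "NN"    # hyphenated compounds, e.g. H-ras
        elif re.search(r"\d", w) and re.search(r"[A-Za-z]", w):
            tags[i] = "NN"    # digit-letter combinations, e.g. CTNNB1
        elif w not in source_vocab and i + 1 < len(words) and words[i + 1] == "gene":
            tags[i] = "NN"    # word unseen in the source domain before "gene"
        elif t == "NNP" and w not in nnp_vocab:
            tags[i] = "NN"    # never seen as NNP in training: predict NN instead
    return tags
```

Post-hoc correction is only one way to impose such rules; they can equally be handed to `ccm_predict` above as boolean constraint functions.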
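Finally, a sketch of the PDA-ST idea: label the unlabeled target data D_u with constrained inference, then retrain, so that rarely-firing knowledge generalizes to cases where no constraint applies. `train` is an assumed helper that fits the base model; the poster's algorithm box (with its red boxed line) was not recoverable, so details such as confidence filtering and the number of rounds are guesses.

```python
def pda_st(D_s, D_u, tagset, constraints, rounds=1):
    """Constraint-driven self-training (sketch).

    D_s: labeled source data as (words, tags) pairs.
    D_u: unlabeled target-domain sentences (lists of words).
    """
    model = train(D_s)  # learn w on the source data, ignoring the constraints
    for _ in range(rounds):
        # Label the target data WITH the constraints imposed, so that the
        # knowledge is propagated into the model weights by retraining.
        pseudo = [(w, ccm_predict(w, tagset, model.score, constraints))
                  for w in D_u]
        model = train(list(D_s) + pseudo)  # retrain on source + pseudo-labels
    return model

# The plain self-training baseline is the same loop with constraints=[],
# i.e., the constrained-inference line replaced by unconstrained inference.
```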
