Prior Knowledge Driven Domain Adaptation
Gourab Kundu, Ming-wei Chang, and Dan Roth

Domain Adaptation Problem
The performance of statistical systems drops significantly when they are tested on a domain different from the training domain. Example: in the CoNLL 2007 shared task, the annotation standard differed between the source and target domains.
- When a POS tagger trained on the WSJ domain is tested on the Bio domain, F1 drops 9%. POS tagging example: "I eat fruits ." is tagged PRP VB NNS .
- When an SRL system trained on the WSJ domain is tested on Ontonotes, F1 drops 18%. Semantic Role Labeling (SRL) example: "I eat fruits ." is labeled A0 V A1.

Motivation: Prior knowledge is cheap and readily available for many domains.
Solution: Use prior knowledge about the target domain for better adaptation.

Constrained Conditional Model
- Incorporate prior knowledge as constraints c = {C_j(.)}.
- Learn the weight vector w ignoring c.
- Impose the constraints c at inference time.

PDA-KW
- Incorporate target-domain-specific knowledge c' = {C'_k(.)} as constraints.
- Impose the constraints c and c' at inference time.
- Adaptation without retraining.

Prior Knowledge on Ontonotes (from the frame file of the "be" verb)
Be verbs are unseen in the training domain.
- If a be verb is immediately followed by a verb, there can be no core argument. Example: "John is eating."
- If a be verb is followed by the word "like", core arguments A0 and A1 are possible. Example: "And he 's like why 's the door open ?"
- Otherwise, A1 and A2 are possible. Example: "John is a good man."

Prior Knowledge on BioMed (from the annotation wiki)
"Only names of persons, locations etc. are proper nouns, which are very few. Gene, disease, drug names etc. are marked as common nouns."
- Hyphenated compounds are tagged as NN. Example: H-ras
- Digit-letter combinations are tagged as NN. Example: CTNNB1
- A hyphen is tagged as HYPH.
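Because the BioMed tagging rules are deterministic, they can be expressed as hard constraints that override a statistical tagger's predictions. Below is a minimal sketch in Python, shown as a post-hoc correction for simplicity (the paper imposes the constraints during inference); the function and argument names are illustrative stand-ins, not the authors' code:

```python
import re

def apply_biomed_constraints(tokens, tags, source_vocab, nnp_vocab):
    """Override predicted POS tags with the BioMed annotation rules.

    tokens       -- list of word strings
    tags         -- list of predicted POS tags, same length as tokens
    source_vocab -- words seen in the source (WSJ) training data
    nnp_vocab    -- words seen with tag NNP in the training data
    All names here are illustrative, not taken from the paper's system.
    """
    fixed = list(tags)
    for i, (word, tag) in enumerate(zip(tokens, tags)):
        if word == "-":
            fixed[i] = "HYPH"   # a bare hyphen is tagged HYPH
        elif "-" in word:
            fixed[i] = "NN"     # hyphenated compound, e.g. "H-ras"
        elif re.search(r"\d", word) and re.search(r"[A-Za-z]", word):
            fixed[i] = "NN"     # digit-letter combination, e.g. "CTNNB1"
        elif word not in source_vocab and i + 1 < len(tokens) and tokens[i + 1] == "gene":
            fixed[i] = "NN"     # unseen word before "gene", e.g. "ras gene"
        elif tag == "NNP" and word not in nnp_vocab:
            fixed[i] = "NN"     # never seen as NNP in training -> NN
    return fixed
```

Each branch corresponds to one rule from the annotation wiki; the last two rely on vocabularies extracted from the source training data.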
- Any word unseen in the source domain that is followed by the word "gene" is tagged as NN. Example: ras gene
- If a word never appears with tag NNP in the training data, predict NN instead of NNP. Example: polymerase chain reaction ( PCR )

PDA-ST
Motivation: Constraints are accurate but apply rarely. Can we generalize to cases where the constraints do not apply?
Solution: Embed the constraints into self-training.
Notation: Ds = source domain labeled data; Du = target domain unlabeled data; Dt = target domain test data.
For POS tagging, we have no domain-independent knowledge. For SRL, we use some domain-independent knowledge. Example: two arguments cannot overlap.

Self-training
Motivation: How good is self-training without knowledge? This baseline is the same as PDA-ST, except that the red-boxed line of the algorithm (the constrained-inference step) is replaced with unconstrained prediction.

Experimental Results

System         POS    SRL (All Verbs)   SRL (Be Verbs)
Baseline       86.2   58.1              15.5
Self-training  86.2   58.3              13.7
PDA-KW         91.8   62.1              34.5
PDA-ST         92.0   62.4              36.4

After adding knowledge, POS tagging error is reduced by 42%, and SRL error is reduced by 25% on be verbs and by 9% on all verbs.

Comparison with JiangZh07

System       POS    Target labeled sentences
PDA-ST       92.0   0
JiangZh07-1  87.2   300
JiangZh07-2  94.2   2730

Without using any labeled data, prior knowledge reduces error by 38% relative to using 300 labeled sentences, and recovers 72% of the accuracy gain obtained by adding 2730 labeled sentences.

Conclusion
Prior knowledge gives results competitive with using labeled data.

Future Work
- Improve the results for self-training.
- Find theoretical justifications for self-training.
- Apply PDA to more tasks and domains.
Suggestions?

References
- J. Jiang and C. Zhai. Instance Weighting for Domain Adaptation in NLP. ACL 2007.
- G. Kundu and D. Roth. Adapting Text instead of the Model: An Open Domain Approach. CoNLL 2011.
- J. Blitzer, R. McDonald, and F. Pereira. Domain Adaptation with Structural Correspondence Learning. EMNLP 2006.

This research is sponsored by ARL and DARPA under the Machine Reading program.
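The PDA-ST scheme described above, train on source data, then repeatedly label target-domain data with constraints imposed at inference time and retrain on the result, can be sketched as a generic loop. Here `train_model` and `constrained_predict` are hypothetical callables standing in for the base learner and for constrained inference; this is a sketch of the scheme under those assumptions, not the authors' implementation:

```python
def pda_self_train(train_model, constrained_predict, labeled, unlabeled, rounds=1):
    """Constraint-driven self-training (PDA-ST style), schematically.

    labeled   -- source-domain (x, y) pairs, i.e. Ds
    unlabeled -- target-domain inputs x, i.e. Du
    The prior knowledge lives inside constrained_predict: every
    pseudo-label produced on Du must satisfy the constraints.
    """
    model = train_model(list(labeled))
    for _ in range(rounds):
        # Label target-domain data with constraints imposed at inference time...
        pseudo = [(x, constrained_predict(model, x)) for x in unlabeled]
        # ...then retrain on the source data plus the newly labeled target data.
        model = train_model(list(labeled) + pseudo)
    return model
```

The plain self-training baseline is the same loop with `constrained_predict` swapped for unconstrained prediction, which is why its gains vanish (58.3 vs 62.4 SRL F1 above): without the constraints, the pseudo-labels inherit the source model's target-domain errors.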