Published by Audra Hoover. Modified over 9 years ago.
The Necessity of Combining Adaptation Methods
Ming-Wei Chang, Michael Connor and Dan Roth
Cognitive Computation Group, University of Illinois

Take-home message

Simple "Source+Target" + "cluster-like features" is often the best approach! (More details later.)

Contributions

- Propose a theoretical analysis of the "Frustratingly Easy" (FE) framework [Daume07].
- Demonstrate the complex interaction between unlabeled and labeled approaches (via artificial experiments).
- State-of-the-art adaptation performance!

Domain Adaptation

While recent advances in statistical modeling for natural language processing are exciting, the problem of domain adaptation remains a big challenge. It is widely known that a classifier trained on one domain (e.g. the news domain) usually performs poorly on a different domain (e.g. the medical domain). The inability of current statistical models to handle multiple domains is one of the key obstacles hindering the progress of NLP.

"It is necessary to combine labeled and unlabeled adaptation frameworks!"

Most works focus on only one aspect. We argue this is not enough because:

1. Mutual Benefit: We analyze these two types of frameworks and find that they address different adaptation issues.
2. Complex Interaction: These two types of frameworks are not independent.

Current Approaches

Focuses on P(X) (Unlabeled)
This type of adaptation algorithm attempts to resolve the difference between the feature space statistics of two domains. While many different techniques have been proposed, the common goal of these algorithms is to find (or append) a better shared representation that brings the source domain and the target domain closer. Often these algorithms do not use labeled examples in the target domain. The works [BlitzerMcPe06, HuangYa09] belong to this category.

Focuses on P(Y|X) (Labeled)
These adaptation algorithms assume that there exists a small amount of labeled data for the target domain. Instead of training two weight vectors independently (one for the source and the other for the target domain), these algorithms try to relate the source and target weight vectors. This is often achieved by using a specially designed regularization term. The works [ChelbaAc04, Daume07, FinkelMa09] belong to this category.

Adaptation Frameworks

| Framework | Labeled Data | Unlabeled Data | Approach |
|---|---|---|---|
| Unlabeled | Source | Cover source and target | Generate features that span domains |
| Labeled | Source plus target | None | Train classifier using both source and target data |

Experimental Results

Artificial Adaptation Experiments

To demonstrate some of the complexities and benefits of combining adaptation approaches, we ran experiments on artificial data showing the performance of three adaptation frameworks as the similarity between two domains was controlled. In both experiments, training and test data were generated for two domains according to random hyperplanes whose difference (cosine) was controlled.

Tgt: Train on target only.
FE: Frustratingly Easy.
S+T: Train on source and target labeled data together as one.

[Figure panels: Adaptation Without Clusters; Adaptation With Clusters]

In the first experiment (without clusters) we see that the tasks need to be similar for FE to work. Once they are nearly identical, the simpler S+T is better. In the second experiment a set of identical shared features (clusters) is added to both hyperplanes, so both adaptation algorithms improve, and the cluster adaptation has effectively moved the two tasks closer, enlarging the region where S+T improves over FE.

Addition of clusters allows a simpler algorithm.

Named Entity Recognition

The goal of this adaptation experiment is to maximize performance on the MUC7 test data using CoNLL training data and (some) MUC7 labeled data. As an unlabeled adaptation method to address feature sparsity, we add cluster-like features based on the gazetteers and word clustering resources used in (Ratinov and Roth, 2009) to bridge the source and target domains.

TGT: Uses only the target labeled training dataset.
FE: Uses both labeled datasets.
FE+: Modification of FE, equivalent to multiplying the "shared" part of the FE feature vector by 10 (Finkel and Manning, 2009).
S+T: Uses both source and target labeled datasets to train a single model with all labeled data directly.

NER Experiments (Token F1)

| Algorithm | TGT | FE | FE+ | S+T |
|---|---|---|---|---|
| SRC labeled data? | No | Yes | Yes | Yes |
| TGT labeled data: MUC7 Dev | 58.6 | 70.5 | 74.3 | 73.1 |
| TGT labeled data: MUC7 Dev + cluster | 77.5 | 82.5 | 83.3 | |
| TGT labeled data: MUC7 Train | 73.0 | 78.2 | 80.1 | 78.7 |
| TGT labeled data: MUC7 Train + cluster | 85.4 | 86.4 | 86.2 | 86.5 |

Only three values survive for the MUC7 Dev + cluster row; they appear in their original order, so their column alignment is uncertain.

Importantly, adding cluster-like features changes the behavior of the labeled adaptation algorithms. When the cluster-like features are not added, the FE+ algorithm is in general the best labeled adaptation framework. However, after adding the cluster-like features, the simple S+T approach becomes very competitive with both FE and FE+.

Resolving feature sparsity changes the behavior of labeled adaptation frameworks.

NER Comparison

| System | Unlabeled? | Labeled? | P.F1 | T.F1 |
|---|---|---|---|---|
| FM09 | No | Yes | 79.98 | N/A |
| RR09 | Yes | No | N/A | 83.2 |
| RR09 + global | Yes | No | N/A | 86.2 |
| Our NER | Yes | Yes | 84.1 | 86.5 |

Selected References

John Blitzer, Ryan McDonald, and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. In EMNLP.
Ciprian Chelba and Alex Acero. 2004. Adaptation of maximum entropy capitalizer: Little data can help a lot. In EMNLP.
Hal Daumé III. 2007. Frustratingly easy domain adaptation. In ACL.
J. R. Finkel and C. D. Manning. 2009. Hierarchical Bayesian domain adaptation. In NAACL.
Fei Huang and Alexander Yates. 2009. Distributional representations for handling sparsity in supervised sequence-labeling. In ACL.
L. Ratinov and D. Roth. 2009. Design challenges and misconceptions in named entity recognition. In CoNLL.
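The labeled frameworks compared in the NER experiments (S+T, FE, FE+) differ mainly in how each example's feature vector is laid out before a single classifier is trained. The following is a minimal sketch of those feature maps, assuming dense feature lists and the block order shared, source-only, target-only from the feature-augmentation idea in [Daume07]. The function names and the `shared_scale` parameter are illustrative and not taken from the poster; the factor 10 for FE+ is the one stated in the algorithm definitions.

```python
from typing import List, Sequence


def s_plus_t(x: Sequence[float]) -> List[float]:
    """S+T: source and target examples keep the same features,
    so one model is trained on the union of both labeled sets."""
    return list(x)


def frustratingly_easy(x: Sequence[float], domain: str,
                       shared_scale: float = 1.0) -> List[float]:
    """FE: triple the feature space into <shared, source-only, target-only>.

    A source example fills the shared and source-only blocks; a target
    example fills the shared and target-only blocks. `shared_scale`
    (an illustrative parameter) multiplies the shared block.
    """
    zeros = [0.0] * len(x)
    shared = [shared_scale * v for v in x]
    if domain == "source":
        return shared + list(x) + zeros
    if domain == "target":
        return shared + zeros + list(x)
    raise ValueError("domain must be 'source' or 'target'")


def fe_plus(x: Sequence[float], domain: str) -> List[float]:
    """FE+: FE with the shared block multiplied by 10."""
    return frustratingly_easy(x, domain, shared_scale=10.0)


if __name__ == "__main__":
    x = [1.0, 2.0]
    print(s_plus_t(x))
    print(frustratingly_easy(x, "source"))
    print(fe_plus(x, "target"))
```

Any linear learner can then be trained on the mapped vectors. Scaling the shared block, as in FE+, makes it cheaper under a norm-based regularizer for the model to place weight on shared features, which pulls the source and target weight vectors toward each other.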
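The artificial experiments generate data for two domains from random hyperplanes whose cosine is controlled. One way such a pair could be constructed is sketched below. The poster does not give the generation details, so the Gaussian inputs, unit-norm hyperplanes, sign-based labels, and all function names here are assumptions made for illustration only.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def random_unit_vector(dim: int, rng: random.Random) -> Vector:
    return normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])


def hyperplane_pair(dim: int, cosine: float,
                    rng: random.Random) -> Tuple[Vector, Vector]:
    """Return unit vectors (w_src, w_tgt) with dot(w_src, w_tgt) == cosine.

    Requires dim >= 2 and -1 <= cosine <= 1.
    """
    w_src = random_unit_vector(dim, rng)
    # Build a unit vector orthogonal to w_src (one Gram-Schmidt step).
    r = random_unit_vector(dim, rng)
    proj = dot(r, w_src)
    u = normalize([ri - proj * wi for ri, wi in zip(r, w_src)])
    # Rotate w_src toward u by the angle whose cosine is `cosine`.
    sine = math.sqrt(1.0 - cosine ** 2)
    w_tgt = [cosine * wi + sine * ui for wi, ui in zip(w_src, u)]
    return w_src, w_tgt


def sample(w: Vector, n: int,
           rng: random.Random) -> List[Tuple[Vector, int]]:
    """Draw n Gaussian points and label each by its side of hyperplane w."""
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w]
        data.append((x, 1 if dot(w, x) >= 0 else -1))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    w_src, w_tgt = hyperplane_pair(dim=20, cosine=0.8, rng=rng)
    source_train = sample(w_src, 1000, rng)
    target_train = sample(w_tgt, 50, rng)
    print(round(dot(w_src, w_tgt), 6))
```

Sweeping `cosine` from 0 toward 1 moves the two tasks from unrelated to identical, which is the axis along which the comparison of Tgt, FE, and S+T is described.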